EDEP 768 - University of Hawaii

Ronald H. Heck
The University of Hawai‘i at Mānoa
January 11, 2016
Conducting Multi-Parameter Tests Between Nested Models in SPSS
Often we wish to compare a series of competing models during the course of model building. The overall
goal is to develop a model that summarizes proposed relationships in a parsimonious
manner (i.e., with few extraneous parameters) compared with one or more alternative models.
There are typically two different situations that arise. In the first situation, we may just generally
wish to examine the fit of two or more proposed models in terms of which provides a better fit to
the data, given the number of parameters it has. The models may have some variables and
parameters in common, but they differ in that one cannot be completely constructed from the
others. In this first situation, to compare models we can make use of what are called “information
indices”; for example, we can make use of Akaike’s Information Criterion (AIC), Consistent
Akaike’s Information Criterion (CAIC), or the BIC (Bayesian Information Criterion). The goal is
to find the model that produces the lowest index value, given its number of parameters. Each
index imposes a “penalty” for the number of parameters in the model. The BIC imposes a heavier
penalty for additional parameters than the AIC and hence favors models with fewer parameters;
the CAIC likewise adds a correction to the AIC that favors models with fewer
parameters.
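As a rough illustration of how these penalties work, the sketch below computes the three indices from a model's -2 log likelihood using their commonly cited definitions. The function name, the two models, and all numeric values are hypothetical, and the sample size that SPSS uses in the penalty term may depend on the estimation method, so treat this as a sketch rather than a reproduction of SPSS output.

```python
import math

def information_criteria(neg2ll, k, n):
    """Return AIC, BIC, and CAIC for a fitted model.

    neg2ll: the -2 log likelihood at convergence
    k: number of estimated parameters
    n: sample size used in the penalty term (hypothetical here)
    """
    return {
        "AIC": neg2ll + 2 * k,                   # penalty of 2 per parameter
        "BIC": neg2ll + k * math.log(n),         # penalty of ln(n) per parameter
        "CAIC": neg2ll + k * (math.log(n) + 1),  # penalty of ln(n) + 1 per parameter
    }

# Two hypothetical non-nested models fit to the same data
model_a = information_criteria(neg2ll=2510.4, k=6, n=400)
model_b = information_criteria(neg2ll=2505.9, k=9, n=400)

for name in ("AIC", "BIC", "CAIC"):
    preferred = "A" if model_a[name] < model_b[name] else "B"
    print(name, round(model_a[name], 2), round(model_b[name], 2), "-> prefer", preferred)
```

In each case the model with the lower index value is preferred, so a model with a slightly higher -2 log likelihood can still be chosen if it estimates fewer parameters.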
In the other situation, one model may be directly developed from the other model by adding or
removing parameters. They are referred to as “nested” models; more specifically, a model that
estimates a lower number of parameters is nested within a model that estimates a larger number
of parameters if fixing one or more parameter estimates to zero in the larger model results in the
smaller model (e.g., Hox, 2010; Marcoulides & Hershberger, 1997; Peugh & Heck, in press).
The goal is generally to see whether the restricted (or nested) model, with fewer parameters
estimated, fits the data as well as (or better than) the alternative model with more estimated
parameters. Nested models are typically examined using likelihood ratio tests (which are
distributed as chi-square with degrees of freedom equal to the difference in model parameters
estimated). The difference between the models is examined with respect to the change in log likelihood
between the restricted model (the nested model with fewer estimated parameters) and the alternative model.
Maximum likelihood (ML) estimation summarizes the fit of a proposed model with respect to a
discrepancy function between the sample covariance (or correlation) matrix and the model-implied
covariance matrix. A model that fits the data perfectly would provide no discrepancy
between the two covariance matrices. In order to evaluate the fit of a proposed model against the
data, ML estimation produces a model deviance statistic, defined as –2*log likelihood (–2LL),
where likelihood is the value of the likelihood function at convergence and log is the natural
logarithm. The deviance is an indicator of how well the model fits the data. Models with lower
deviance (i.e., a smaller discrepancy function) fit better than models with higher deviance.
Nested models (i.e., where a more specific model is formed from a more general one) can be
compared by examining differences in these deviance coefficients under specified conditions
(e.g., changes in deviance between models per differences in degrees of freedom).
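To make the deviance concrete, the following minimal sketch computes –2LL for a deliberately simple case: a single normal distribution fit to a handful of scores. The data values and the function name are hypothetical, and a multilevel model's likelihood is far more involved, but the principle that the ML estimates yield the lowest deviance is the same.

```python
import math

def normal_deviance(y, mu, sigma2):
    """Return -2 * log likelihood of y under a normal model
    with mean mu and variance sigma2."""
    n = len(y)
    loglik = (-0.5 * n * math.log(2 * math.pi * sigma2)
              - sum((v - mu) ** 2 for v in y) / (2 * sigma2))
    return -2.0 * loglik

y = [48.0, 52.0, 55.0, 61.0, 64.0]  # hypothetical scores
mean = sum(y) / len(y)                              # ML estimate of the mean
var_ml = sum((v - mean) ** 2 for v in y) / len(y)   # ML estimate of the variance

dev_at_ml = normal_deviance(y, mean, var_ml)
dev_elsewhere = normal_deviance(y, mean + 5.0, var_ml)

# The deviance is lower (better fit) at the ML estimate of the mean
print(dev_at_ml < dev_elsewhere)
```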
IBM SPSS MIXED currently offers two estimation choices: full information ML estimation
(often abbreviated as ML) and restricted maximum likelihood (REML) estimation, which is the
default setting. It is important to note there are differences between ML and REML parameter
estimation in comparing models in multilevel modeling situations (e.g., for more detailed
discussion, see Goldstein, 2011, pp. 57-59; Hox, 2010, pp. 40-42; Singer & Willett, 2003, pp.
88-92; Snijders & Bosker, 2012, pp. 60-61). In ML estimation, both regression coefficients and
variance components are included in the likelihood function, while in REML estimation, only
the variance components are included in estimating the likelihood function, with the regression
coefficients estimated in a second step (Hox, 2010). REML, therefore, is referred to as a
restricted solution. One of the shortcomings of ML estimation for comparing nested multilevel
models is that the estimation process does not take into account the loss in degrees of
freedom due to the estimation of the P + 1 regression coefficients in the proposed model (Hox,
2010). This failure to allocate degrees of freedom properly results in negatively biased random-effect
parameter estimates due to positively biased degrees of freedom for parameter estimation.
With more parameters in the model and smaller sample sizes, therefore, the variance estimates
obtained through ML may be too small, which leads to overly liberal hypothesis tests
(Raudenbush & Bryk, 2002).
In contrast, because REML considers fixed effect parameters separately, unbiased random effect
estimates can be obtained after the fixed effects and their degrees of freedom are removed from
the likelihood function. In other words, REML takes into account the loss in degrees of freedom
due to the estimation of the P + 1 regression coefficients in the model in order to obtain unbiased
estimation of the variance components (Snijders & Bosker, 2012). This correction in the
denominators used to calculate the variance will be greatest when the sample size is small.
Where sample sizes are balanced in multilevel data, REML will produce estimates consistent
with estimates produced by ANOVA, which are optimal (Searle, Casella, & McCulloch, 1992).
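The denominator correction described above can be sketched with the simplest possible analogy, a single-level regression in which the residual variance is the residual sum of squares divided by either n (ML-style) or n – (P + 1) (REML-style). The function name and the values are hypothetical, and the adjustment in a multilevel model is more complex than this.

```python
def variance_estimates(rss, n, p):
    """Residual variance with ML-style and REML-style denominators.

    rss: residual sum of squares
    n: number of observations
    p: number of predictors (P + 1 coefficients with the intercept)
    """
    ml = rss / n                # ignores the degrees of freedom used by the coefficients
    reml = rss / (n - (p + 1))  # removes P + 1 degrees of freedom
    return ml, reml

for n in (10, 50, 1000):
    rss = 2.0 * n  # hypothetical: keeps the ML estimate at 2.0 for every n
    ml, reml = variance_estimates(rss, n, p=3)
    # The ratio ml / reml equals (n - 4) / n and approaches 1 as n grows
    print(n, round(ml, 3), round(reml, 3), round(ml / reml, 3))
```

The gap between the two estimates shrinks as n grows, which is why the choice between ML and REML matters most in small samples.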
ML is widely used in model comparison, however, since the computations are easier and
generally efficient (with sufficient sample sizes), and because both regression coefficients and
variance components are included in estimating the likelihood function, a chi-square test
between competing models can easily be constructed to compare nested models (Hox, 2010).
Where competing models are nested in random effects, REML estimation can be used. Nested
models can be compared using a likelihood ratio test, which involves first computing both the
difference in the -2LogL (or chi-square) model fit statistics (i.e., Δ -2LogL = [-2LogL_smaller] – [-2LogL_larger])
and the difference in the number of estimated parameters (Δp = p_larger – p_smaller)
between the two nested models. An alternative means of comparing nested models, which can
be used with REML estimation (which facilitates comparing nested models with both regression
coefficients and random effects), is the multi-parameter Wald test (Schafer, 1997). In the
following section, we will illustrate each approach.
A Short Illustration Using a Likelihood Ratio Test
For example, let’s suppose we are comparing two nested three-level models. The goal is to
determine whether allowing a level-2 predictor’s slope coefficient (in this case, gender) to vary
randomly at level 3 improves the fit of the model against the model where the gender slope is
fixed at level 2. In this first case, we can use a likelihood ratio test, since we are examining
whether adding a random effect improves the model fit relative to the restricted model (with the
slope variance fixed to 0). Hence, there are no fixed effects involved in the comparison between models.
The first model, with the gender slope fixed (i.e., not varying randomly) at level 3, is nested with respect to the
alternative model allowing the level-2 predictor to vary randomly at level 3. This is because
fixing the random slope effect associated with gender to 0 in the alternative model results in the
restricted model (i.e., with a greater number of degrees of freedom). A likelihood ratio test
comparing the two models can be conducted by first using the -2LL (or deviance) information in
the SPSS output to compute the difference in deviance [Δ -2LogL = (-2LogL_smaller) – (-2LogL_larger)]
and the difference in the number of estimated parameters.
The actual nested model “test” involves referring the -2LogL difference value (Δ -2LogL) to a
chi-square sampling distribution with degrees of freedom equal to the difference in the number of
estimated parameters. In this case, the difference in model degrees of freedom is 3, owing to the
presence of other random effects in the level-3 model. The alternative model (with 14
parameters) is tested against the restricted model by adding a random effect for gender
(GROUP_GENDER) in the last line of the model syntax.
The model with random slope at level 3 has 14 estimated parameters and a -2LL (deviance) of
109293.033. The nested model (without the level-3 random gender effect) has 11 estimated
parameters and a -2LL of 109302.040. The chi-square difference is 9.007, which exceeds the
critical chi-square value of 7.82 (for 3 degrees of freedom, at p = .05) and is therefore significant at
p < .05. The results of the likelihood ratio test therefore show that allowing the effects of gender to vary
randomly at level 3 would result in a significant improvement in the fit of the analysis model to
the data.
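The arithmetic of this likelihood ratio test can be checked with a short sketch. The deviance values and parameter counts are those reported above; the helper chi2_sf is a hypothetical name for a small routine, written here only for whole-number degrees of freedom, that returns the upper-tail chi-square probability so that no statistical library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with
    positive integer degrees of freedom df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # exact tail for df = 1
    # Step up two degrees of freedom at a time to reach the requested df
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

dev_restricted = 109302.040   # nested model, 11 estimated parameters
dev_alternative = 109293.033  # random gender slope model, 14 estimated parameters

delta_dev = dev_restricted - dev_alternative  # chi-square difference
delta_p = 14 - 11                             # difference in estimated parameters
p_value = chi2_sf(delta_dev, delta_p)

print(round(delta_dev, 3), delta_p, round(p_value, 4))
```

The resulting probability falls below .05, in agreement with the comparison against the critical value of 7.82.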
Comparing Nested Models Using a Multi-parameter Test
In similar fashion, the model that allows gender to vary randomly at level 3 is nested with respect
to a more complex alternative model that includes a level-3 predictor and the presence of three
other added variable interactions because fixing the four fixed effects to 0 in the level-3 predictor
model results in the more restricted model with the level-2 predictor varying randomly at level 3.
As noted earlier, this latter model with fixed-effect and random-effect parameters can be
compared against the nested model using REML estimation, instead of ML estimation, and a
multi-parameter Wald test (Enders, 2010, pp. 233-239; Li, Raghunathan, & Rubin, 1991; Peugh
& Heck, in press; Schafer, 1997) instead of a likelihood ratio test. The multi-parameter test is
similar to the way the coefficient of determination (i.e., R²) in a multiple regression analysis is tested:
all predictors are tested simultaneously (using an F-ratio test) to determine whether they explain
a significantly non-zero proportion of response variable variance.
The random gender effects model can be compared to the free/reduced lunch percentages model
(both estimated using REML) via a multi-parameter Wald test (with a difference of 4 degrees of
freedom). The /TEST = subcommand is used in SPSS for multi-parameter Wald testing. A text title
(“MPW” for multi-parameter Wald) is included, and each of the four fixed-effect parameters is
listed to indicate its inclusion in the test, followed by a “1” that indicates each fixed effect is
weighted equally. The parameters are separated by semicolons to allow the four effects to be
tested as a unified set. Results for the multi-parameter Wald test in SPSS (F[4, 55.064] = 5.83, p
< .001) showed that the inclusion of the four fixed effects in the level-3 predictor model (with 18
parameters estimated versus 14) resulted in a significant improvement in model fit.
Table: Test of Contrasts (Dependent Variable: MATH_DV)
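The logic of a multi-parameter Wald test can be sketched as follows: the estimates of the fixed effects being tested are weighted by the inverse of their covariance matrix to form a single statistic, which is then divided by the number of parameters tested to give an F-ratio. The estimates and covariance matrix below are hypothetical, as are the function names, and the sketch does not compute the denominator degrees of freedom that SPSS reports.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def wald_f(estimates, cov):
    """Return the Wald statistic b' V^-1 b and the F-ratio (Wald / q)
    for the joint test that all q listed fixed effects equal zero."""
    q = len(estimates)
    v_inv_b = solve(cov, estimates)
    wald = sum(b * z for b, z in zip(estimates, v_inv_b))
    return wald, wald / q

# Hypothetical estimates and covariance matrix for four fixed effects
b = [0.50, -0.20, 0.10, 0.30]
v = [[0.040, 0.002, 0.000, 0.001],
     [0.002, 0.010, 0.001, 0.000],
     [0.000, 0.001, 0.010, 0.002],
     [0.001, 0.000, 0.002, 0.090]]

wald, f_ratio = wald_f(b, v)
print(round(wald, 3), round(f_ratio, 3))
```

Because the four effects are tested together, a single F-ratio summarizes their joint contribution instead of four separate tests.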
References
Enders, C.K. (2010). Applied missing data analysis. New York: Guilford.
Goldstein, H. (2011). Multilevel statistical models (4th ed.). West Sussex, UK: Wiley.
Hox, J.J. (2010). Multilevel analysis methods: Techniques and applications. New York:
Li, K.H., Raghunathan, T.E., & Rubin, D.B. (1991). Large sample significance levels from
multiply imputed data using moment-based statistics and an F reference distribution.
Journal of the American Statistical Association, 86, 1065-1073.
Peugh, J. L. & Heck, R. H. (in press). Conducting three-level longitudinal analyses. Journal
of Adolescent Development.
Schafer, J.L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman &
Searle, S.R., Casella, G., & McCulloch, C.E. (1992). Variance components. New York: Wiley.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and
event occurrence. New York: Oxford University Press.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and
advanced multilevel modeling. London: Sage.