HIERARCHICAL LINEAR MODELING OF DYADIC DATA

Nonconvergence and sample bias 1
Running head: HIERARCHICAL LINEAR MODELING OF DYADIC DATA
Nonconvergence and Sample Bias in Hierarchical Linear Modeling of Dyadic Data
Jason T. Newsom and Masami Nishishiba
Portland State University
Draft: 2/13/02
We thank Tasha Beretvas, Joop Hox, and Aloen Townsend for helpful comments on an earlier
draft of the manuscript. Address correspondence to Jason T. Newsom, Ph.D., Institute on Aging,
School of Community Health, Portland State University, P.O. Box 751, Portland, OR 97207-0751, or newsomj@pdx.edu.
Abstract
Recent statistical developments and software availability for hierarchical linear models have led to
an increasingly wide range of research applications. Although researchers have begun to use
these techniques and even recommend their use with dyadic data, very little is known about
estimation with only two observations per group. This Monte Carlo study examines the effects
of the number of dyads and the intraclass correlation coefficient on convergence difficulties and
bias of parameter estimates and their standard errors. Results show that convergence problems
with intercept and slope variance estimates are extremely common. Bias was generally low for
fixed effects and their standard errors, but random effects and their standard errors frequently
showed serious bias. The findings suggest that random effect estimation and significance tests
are largely impractical with dyadic data.
Convergence Difficulties and Sample Bias in Hierarchical Linear Modeling of Dyadic Data
With the recent development and widespread availability of hierarchical linear modeling
(HLM) techniques, new analysis strategies for a variety of research designs have emerged. HLM
is an appropriate analytic technique for analysis of nested or hierarchically structured data in
which individual observations are nested within groups. A common example is data that involve
students nested within classrooms. Students who share the same teacher, facilities, or curriculum
tend to have related or dependent scores. These data structures lead to violation of standard
independence assumptions of traditional regression analysis whenever the measure of
nonindependence, the intraclass correlation coefficient (ICC), is greater than zero. Among
several new areas of the application of HLM is the analysis of dyadic data, such as couples,
twins, parent-child interaction, or friendship pairs. There is a growing body of published reports
that use HLM for analysis of dyadic data (e.g., Barnett, Marshall, Raudenbush, & Brennan,
1993; Kurdek, 1997; Windle & Dumenci, 1997), and a few sources that suggest HLM as an
option for dyadic data (e.g., Maguire, 1999; Kashy & Kenny, 2000; Newsom, in press). Analysis
of dyadic data is appropriate under the rationale of HLM, because dyad members are individuals
nested within groups of two. However, little simulation work has been conducted that examines
the behavior of estimates and their standard errors when data are from individuals nested within
very small groups, and no studies that we could find have examined dyads. The growing
application of HLM to dyadic data has been proceeding with virtually no information available
to researchers about the practical difficulties of estimation, such as convergence failures, bias in
regression estimates, or bias in standard errors and significance tests. The present study
examines these issues in an extensive Monte Carlo study of the HLM with dyadic data under a
variety of conditions.
Hierarchical Linear Models
HLM, sometimes referred to as "multilevel regression" or "multilevel modeling", is a
regression-based analysis that can be conceptualized as a two-level regression (Aitkin &
Longford, 1986; de Leeuw & Kreft, 1986; Goldstein, 1986; Mason, Wong, & Entwisle, 1984;
Raudenbush & Bryk, 1986). The first level of analysis involves an identical regression analysis
repeated within each group. This regression model follows the ordinary least squares regression
model¹ with as many as p independent variables:
Level 1:
$$Y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + \dots + \beta_{pj} x_{pij} + r_{ij} \tag{1.1}$$
In the above equation, subscripts i and j represent individuals and groups, respectively. The
subscript, p, designates the number of predictors at level 1. In the case of dyads, the number of
observations per group, $n_j$, is equal to two. In the second level of analysis, the intercept, $\beta_{0j}$, and
slopes, $\beta_{pj}$, serve as dependent variables in another regression analysis using predictors measured
at the group level. Such predictors might include classroom size, the teacher’s race, or average
student socioeconomic status with classroom data, or household income, the number of years
married, or twin’s age in the dyadic example. As in ordinary least squares regression analysis,
the intercept represents the value of the dependent variable when the predictors equal zero. It is
also possible to “center” predictors, by computing deviations from the group mean or grand
mean. Centering produces level-1 intercepts that represent the group mean or grand mean for
each group (for a more complete discussion of this topic, see Kreft, de Leeuw, & Aiken, 1995).
Thus, the level-two analysis is a regression that predicts the intercept or a particular slope for
each group. Under certain coding schemes, the level-2 intercept can be interpreted as a grand
mean. The level-2 slopes can be interpreted as the effect of a level-2 predictor on the group
average when predicting intercepts or the effect of the level-2 predictor on the relationship
between a level-1 predictor and the dependent variable. The latter is referred to as a “cross-group interaction” (e.g., Kreft & de Leeuw, 1998). There are p + 1 possible level-2 equations,
but, for simplicity, we present only two level-2 equations, based on an analysis with only one
level-1 predictor, $x_{1ij}$:
Level-2 equations:
$$\beta_{0j} = \gamma_{00} + \gamma_{01} z_{1j} + \dots + \gamma_{0q} z_{qj} + u_{0j} \tag{1.2}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} z_{1j} + \dots + \gamma_{1q} z_{qj} + u_{1j} \tag{1.3}$$
In the equations above, there are q possible level-2 predictors, $z_{1j}$ through $z_{qj}$, and $\gamma$ represents the coefficient for
the intercept or slope. $\gamma_{00}$ and $\gamma_{10}$ are intercepts and $\gamma_{0q}$ and $\gamma_{1q}$ are slopes, and both are commonly
referred to as “fixed effects.” $u_{0j}$ and $u_{1j}$ are level-2 residuals. Their variances, $\tau_{00}$ and $\tau_{11}$, which
represent the variation of the intercept or slope across groups, are referred to as “random effects”
and can be of principal interest to researchers.
The level-1 and level-2 equations can be written as a single regression equation using
algebraic substitution (shown here for the single level-1 predictor, $x_{1ij}$):
$$Y_{ij} = \gamma_{00} + \gamma_{01} z_{1j} + \dots + \gamma_{0q} z_{qj} + \gamma_{10} x_{1ij} + \gamma_{11} z_{1j} x_{1ij} + \dots + \gamma_{1q} z_{qj} x_{1ij} + u_{0j} + u_{1j} x_{1ij} + r_{ij} \tag{1.4}$$
The extent to which observations within a group are related can be expressed as
an estimate of the ratio of the between-group variation to the total variation in the
population, called the ICC:
$$\rho = \frac{\tau_{00}}{\sigma^2 + \tau_{00}} \tag{1.5}$$
In this equation, $\rho$ is the ICC, $\tau_{00}$ is the variance of the intercept when there are no predictors,²
and $\sigma^2$ is the within-group variance (i.e., the variance of $r$). When the ICC equals 0, there is no
difference between OLS regression estimates and those obtained with HLM, because no
clustering exists. OLS standard error estimates become increasingly negatively biased as
clustering within groups increases.
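As a minimal sketch of the ICC defined above, the ratio of intercept variance to total variance can be computed directly from the two variance components. The function name and the example values are illustrative assumptions, not values taken from the study.

```python
def intraclass_correlation(tau00, sigma2):
    """Return the ICC: between-group variance over total variance."""
    total = sigma2 + tau00
    if total <= 0:
        raise ValueError("total variance must be positive")
    return tau00 / total


# Hypothetical variance components, chosen only for illustration.
print(intraclass_correlation(tau00=0.3, sigma2=0.7))  # some clustering
print(intraclass_correlation(tau00=0.0, sigma2=1.0))  # no clustering
```

When the intercept variance is zero the ratio is zero, which corresponds to the case in which OLS and HLM estimates do not differ.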
Behavior of HLM Estimates and Their Standard Errors
Over the last decade, there has been a rapid growth in the popularity of multilevel models
in the social sciences and psychology, but it is surprising to learn that there is a great paucity of
published Monte Carlo studies examining the bias or efficiency of coefficients, random effects,
and their standard errors. Textbook discussions frequently cite unpublished doctoral
dissertations or technical reports, and conclude that the most commonly used estimator, restricted
maximum likelihood (REML), shows little bias in fixed effects estimates.³ Although much of
this work is relevant, researchers interested in applying hierarchical models to dyads do not have
ready access to these findings nor are the findings easy to extrapolate to the dyadic case.
Moreover, in practice, researchers often encounter convergence difficulties due to nonpositive
definite variance estimates, but there is little work documenting the conditions under which these
problems are most likely to occur. Of the few Monte Carlo studies conducted, most have
attempted to retain a constant total sample size, while attempting to evaluate the effects of
differing numbers of groups and observations per group. Thus, their findings make it difficult to
extrapolate to the effects of varying the number of groups when group sizes are small. In
addition, varying both the number of groups and the number of observations per group simultaneously may
have other problems. Bliese (1998), for instance, shows that the ICC is dependent on group size,
making it difficult to assess the independent effects of ICC and group size without special
precautions.
Perhaps the most frequently discussed simulation work was done by Bassiri (1988),
reported in an unpublished doctoral dissertation. Bassiri examined the effects of intraclass
correlation (.10 vs. .25), the number of observations per group (5 through 150), and the number
of groups (10 through 150) on fixed effects estimates, their standard errors, random effects
estimates, and Type I and Type II error rates with REML estimation. She concluded that, in
general, level-2 estimates (fixed and random effects) are unbiased, consistent, and asymptotically
efficient, that standard errors are primarily a function of the number of groups rather than the
size of the groups, and higher ICC values are associated with poorer precision in fixed and
random effects estimates.
In another doctoral dissertation, Kim (1990) examined the magnitude of slope estimates,
the number of groups (25, 50, 100, and 200), and the number of observations within each group
(10, 20, 30). GLS and full ML estimates were compared using relatively complex regression
models (i.e., 9 parameters). His findings suggest that the greatest bias in fixed and random
effects occurs when there are a relatively large number of observations per group and there are a
small number of groups (i.e., 40 cases per group and 25 groups). Kim, however, did not
manipulate group size and the number of groups independently (e.g., in the conditions in which
there were a small number of observations per group there were also more groups), so it is
difficult to discern which of these factors is responsible for parameter bias. In addition, Kim’s
study only included 50 replications per cell, which will lead to less reliable estimates of the true
sampling variability (Efron, 1990).
In another small study, Mok (1995) reached similar conclusions. She concluded that the
number of groups is a more important factor than the number of observations per group in both
bias and efficiency of estimates. She states, "…if resources were available for a sample size n,
comprising J schools with I students from each school, then less bias and more efficiency would
be expected from sample designs involving more schools (large J), and fewer students per school
(small I) than sample designs involving fewer schools (small J), and more students per school
(large I)" (p. 6).
Busing (1993; see also van der Leeden & Busing, 1994) conducted the most extensive
simulation study to date, using a large number of replications and examining a wide range of
ICC values (.2, .4, .6, and .8), number of groups (5, 10, 25, 50, 100, and 300), and the number of
observations within groups (5, 10, 25, 50, and 100). In addition to parameter and standard error
bias, he examined convergence difficulties and improper solutions. Because the number of
observations per group and the number of groups were independently manipulated and because
as few as 5 observations per group were investigated, these findings are perhaps the most
applicable to the dyadic case. Although fixed estimates performed well in all conditions, Busing
reports that random effects estimates were biased unless more than 300 groups were used. These
results are consistent with those reported by both Kim (1990) and Mok (1995), which show that
random effects estimates are affected more by the number of groups than by the number of
observations per group. With very few observations per group, there was a greater propensity
for nonconvergence or improper solutions (i.e., nonpositive definite matrix), although the largest
percentage of convergence problems, which was found in the smallest group size condition, was
under 2%. The extent to which these results will generalize to as few as 2 observations per group
is unclear, however.
Donoghue and Jenkins (1992) reported on a small simulation study with 20 replications
in each cell of the design, examining the effects of model misspecification. The study compared
misspecified models (i.e., inclusion of a predictor in the model when it was unrelated to the
outcome and failure to include a predictor in the model even though it had a large relationship
with the outcome) to correctly specified models. The authors report no bias of within-group
error estimates, slopes, intercepts, or covariance estimates in most conditions. These authors also
examine convergence difficulties. Although they indicate that nonpositive definite matrices or
convergence failures were more common with only 5 observations per group, detailed results for
all conditions in their study were not presented. The number of groups and the number of
individuals per group were not independently manipulated, however, because every condition
contained 1500 observations total (ranging from 10 groups with 150 observations to 300 groups
with 5 observations).
In a recent simulation by Maas and Hox (2002), the authors examine the effects of ICC
(.10 through .30), group size, and the number of groups on parameter bias. Their study includes
sample sizes as low as 5 observations, with the number of groups varying from 10 to 100.
Although they report that fixed effects are unbiased under all conditions, they do find evidence
of important bias in variance estimates and their standard errors with a small number of
observations per group and a small number of groups.
Although they do not present Monte Carlo findings, Bryk and Raudenbush (1992) and
Snijders and Bosker (1994, 1999) describe pertinent analytic work. Bryk and Raudenbush state
that estimates will be unbiased with balanced data (i.e., equal number of observations per group)
but are too small with unbalanced data, and standard errors will tend to be negatively biased
unless large total sample sizes are used. In addition, they speculate that a small number of
observations per group may be problematic: "We suspect that the likelihood of $\mathbf{T}$ [matrix of
variance estimates] can be quite skewed if nj [number of observations per group] is small, even
if J [number of groups] is large, thus rendering test results inaccurate" (p. 224, bracketed text
added).
Snijders and Bosker (1994, 1999), based on analytic work with power analysis, discuss
the relationship between standard errors, group size, number of groups, and intraclass
correlation. They show a nonmonotonic decrease in standard errors associated with more
observations per group in some conditions. The shape of the decline across sample sizes,
however, differed by ICC, and, although there was little difference between 2 and 5 observations
per group as a function of ICC, larger group sizes only led to an overall decline in standard errors
when the ICC was low. Higher ICC values (e.g., .2 to .4) showed a slight or no increase in
standard error with larger samples. This finding suggests that, with dyadic data, increasing the
number of groups may not produce consistency in standard errors for larger ICC values. As with
several other studies, the effects of group size and number of groups are difficult to distinguish,
because group size and the number of groups were not independently examined (i.e., a constant
total sample size was used).
Finally, based on their synthesis of existing simulation work and experience with HLM,
some authors have provided recommendations for the optimal number of observations per group
and number of groups. Kreft (1996) proposes a general 30/30 rule, in which there are 30 groups
and 30 observations per group. Elaborating on Kreft’s recommendations, Hox (1998) suggests a
minimum ratio of 50 groups to 20 observations per group in order to test cross-level interactions,
and suggests a minimum ratio of 100 to 10 to test random effects. No specific guidance is
provided on the lowest possible number of observations per group or the minimum number of
groups required when there are a small number of observations, but these recommendations carry
the implication that smaller group sizes or numbers of groups may be problematic.
In our review of the simulation and analytic work on HLM estimates, it appears that
dyadic researchers have little information to draw from in making analytic decisions. Authors
have most often concluded that the number of groups is more important than group size. No
simulation studies to date have investigated group sizes as small as two. Although two studies
have noted problems with convergence or nonpositive definite matrices that tend to increase with
fewer observations per group, more information is clearly needed on these estimation problems.
Thus, a variety of questions remain about convergence difficulties and the behavior of parameter
estimates and their standard errors when dyadic data are analyzed with hierarchical models. For
instance, what is the minimum number of dyads needed to obtain unbiased parameter estimates?
How likely are convergence difficulties in estimating random effects when dyadic data are used,
and under what conditions are they most or least likely to occur? To what extent do bias and
convergence problems depend on the number of dyads or the ICC?
To address these questions, we examine nonconvergence problems, bias in fixed effects
estimates (intercept and slope estimates), bias in random effects estimates (variance), and bias in
standard errors for these estimates in a large simulation study. These factors are examined under
a wide variety of ICC values and sample sizes in which population values are known.
Method
Design
A Monte Carlo study was undertaken to examine the effects of intraclass correlation and
sample size on nonconvergence, parameter bias, and standard error bias. The study was a 4
(intraclass correlation) X 5 (sample size) experimental design. With a group size of two, a model
estimating both intercept and slope variance is not identified.⁴ Thus, two separate models were
tested in order to examine nonconvergence, parameter bias, and standard error bias in the
variance estimates of intercepts and slopes. Model 1 estimated the variance of the intercept and
Model 2 estimated the variance of the slope. Each cell of the design contained 200 samples
randomly drawn from a larger population so that expected values could be compared with known
population values.
Data Generation
A total of 400,000 observations were generated representing population data for the four
intraclass correlation conditions. The first step involved the generation of a dependent variable
representing paired, clustered observations. A normal deviate (with mean of 0 and SD of 1) for
50,000 observations was generated using the pseudorandom number generator in SPSS Version
10.0 (i.e., the RV.NORMAL function). To create clustered dyads, a second variable was
generated based on the normal deviate plus random error. The degree of random error in the
computation of the second variable was then varied to create four data sets with differing
correlations between the normal deviate and the second variable (r=.05, .10, .20, and .30). A
third variable, representing a predictor with a known correlation (r=.3), was generated using the
normal deviate plus random error. The data were then disaggregated to create a dependent
variable with one correlated predictor for the four intraclass correlation conditions, representing
four populations of 100,000 observations each.
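The generation of correlated dyad members described above can be sketched as follows. This is an illustration in Python rather than the SPSS syntax the study used, and it rests on an assumption the text does not state: the error standard deviation is chosen as sqrt(1/r^2 - 1), which gives a correlation of r between a standard normal deviate and that deviate plus independent normal error. The function names, seeds, and population size in the example are hypothetical.

```python
import math
import random


def error_sd_for_correlation(r):
    """SD of added noise e so that corr(x, x + e) = r when x ~ N(0, 1)."""
    return math.sqrt(1.0 / (r * r) - 1.0)


def generate_dyads(n_dyads, r, seed=0):
    """Return (first, second) pairs whose population correlation is r."""
    rng = random.Random(seed)
    sd_e = error_sd_for_correlation(r)
    dyads = []
    for _ in range(n_dyads):
        first = rng.gauss(0.0, 1.0)              # normal deviate
        second = first + rng.gauss(0.0, sd_e)    # deviate plus random error
        dyads.append((first, second))
    return dyads


def pearson_r(pairs):
    """Sample Pearson correlation between the two members of each pair."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


for target in (0.05, 0.10, 0.20, 0.30):
    population = generate_dyads(50_000, target, seed=1)
    print(target, round(pearson_r(population), 3))
```

Under this construction the second member has a larger variance than the first, so the pairwise correlation is only a proxy for the ICC; a sketch that equates the two variances would rescale the second variable.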
The 4 X 5 design involved a total of 4,000 replications. Two hundred samples for each
of the 20 conditions were repeatedly drawn from one of the four population data sets using the
pseudorandom number generator in SPSS Version 10.0 (i.e., the SAMPLE procedure). Five
sample sizes, representing different numbers of dyads, J = 50, 100, 200, 500, and 1000 (i.e., 100,
200, 400, 1000, and 2000 individuals, respectively), were chosen to represent a range of sample
sizes likely to be employed in psychological research.
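The arithmetic of the design can be checked with a short enumeration. The constant and function names below are illustrative; the condition values are those stated in the text.

```python
from itertools import product

ICC_CONDITIONS = (0.05, 0.10, 0.20, 0.30)   # four population ICC values
DYAD_COUNTS = (50, 100, 200, 500, 1000)     # five sample sizes (dyads)
SAMPLES_PER_CELL = 200                      # replications per cell


def design_cells():
    """Enumerate every cell of the 4 X 5 design."""
    return [
        {"icc": icc, "dyads": dyads, "individuals": 2 * dyads,
         "samples": SAMPLES_PER_CELL}
        for icc, dyads in product(ICC_CONDITIONS, DYAD_COUNTS)
    ]


cells = design_cells()
total_replications = sum(cell["samples"] for cell in cells)
print(len(cells), total_replications)  # 20 cells, 4000 replications
```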
Model Specification and Data Analysis
Analyses were conducted using SAS Version 6.12 PROC MIXED with REML
estimation. Two different models were tested: Model 1 specified a random effect of the intercept
(i.e., estimation of $\tau_{00}$) and Model 2 specified a random effect of the slope (i.e., estimation of $\tau_{11}$).
The two-level equations for the models are as follows:
Model 1
$$Y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + r_{ij} \tag{1.6}$$
$$\beta_{0j} = \gamma_{00} + u_{0j} \tag{1.7}$$
$$\beta_{1j} = \gamma_{10} \tag{1.8}$$
Substituting, this model can be represented by a single equation as:
$$Y_{ij} = \gamma_{00} + \gamma_{10} x_{1ij} + u_{0j} + r_{ij} \tag{1.9}$$
Model 2
$$Y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + r_{ij} \tag{1.10}$$
$$\beta_{0j} = \gamma_{00} \tag{1.11}$$
$$\beta_{1j} = \gamma_{10} + u_{1j} \tag{1.12}$$
Expressed as a single equation:
$$Y_{ij} = \gamma_{00} + \gamma_{10} x_{1ij} + u_{1j} x_{1ij} + r_{ij} \tag{1.13}$$
Nonconvergence was assessed by the number of samples in which no estimate was
available for the random effect. In all cases, a failure to obtain an estimate of the random effect was
due to a nonpositive definite gamma matrix, which contains the random effects estimates. Nonpositive
definiteness occurs because an estimate generated during the iterative process is zero or negative,
and because the inversion of such a matrix is undefined, iterations are stopped and no estimate is
available. When a nonpositive definite matrix occurs, SAS PROC MIXED provides output for
fixed effects estimates but prints a warning regarding the random effects. Other packages may
handle nonpositive solutions differently. For example, by default HLM version 5 resets
nonpositive variances to zero during the iteration process.
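To illustrate how a zero or negative variance estimate can arise with dyads, the sketch below uses the one-way ANOVA (method-of-moments) estimator of intercept variance for groups of size two. This is a swapped-in technique for illustration only: it is not the REML algorithm in SAS PROC MIXED, it includes no predictor, and the data-generating model (total variance fixed at 1, normal components) is an assumption. All function names are hypothetical.

```python
import math
import random


def anova_components(dyads):
    """Return (tau00_hat, sigma2_hat) from ANOVA mean squares, group size 2."""
    n_groups = len(dyads)
    grand_mean = sum(a + b for a, b in dyads) / (2 * n_groups)
    ms_between = 2 * sum(((a + b) / 2 - grand_mean) ** 2
                         for a, b in dyads) / (n_groups - 1)
    # Within a dyad, squared deviations from the dyad mean sum to (a - b)^2 / 2.
    ms_within = sum((a - b) ** 2 / 2 for a, b in dyads) / n_groups
    # E[MS_between] = sigma2 + 2 * tau00 and E[MS_within] = sigma2.
    tau00_hat = (ms_between - ms_within) / 2
    return tau00_hat, ms_within


def share_nonpositive(n_dyads, icc, n_samples=200, seed=0):
    """Proportion of simulated samples with a zero or negative tau00_hat."""
    rng = random.Random(seed)
    sd_between = math.sqrt(icc)        # assumes total variance of 1
    sd_within = math.sqrt(1.0 - icc)
    count = 0
    for _ in range(n_samples):
        dyads = []
        for _ in range(n_dyads):
            u = rng.gauss(0.0, sd_between)   # shared dyad effect
            dyads.append((u + rng.gauss(0.0, sd_within),
                          u + rng.gauss(0.0, sd_within)))
        tau00_hat, _ = anova_components(dyads)
        if tau00_hat <= 0:
            count += 1
    return count / n_samples


print(share_nonpositive(n_dyads=50, icc=0.05))
print(share_nonpositive(n_dyads=50, icc=0.30))
```

The estimate is nonpositive whenever the between-dyad mean square does not exceed the within-dyad mean square, which sampling error makes likely when the true intercept variance is small and the number of dyads is modest.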
We computed bias estimates for the two fixed effects, $\gamma_{00}$ and $\gamma_{10}$, and the random effect,
$\tau_{00}$ for Model 1 or $\tau_{11}$ for Model 2. Bias was computed by subtracting the population value
from the expected sample value: $E(\hat{\theta}) - \theta$. To obtain a measure of the magnitude of this bias, we
computed a percentage bias measure using a z-value,
$$z = \frac{E(\hat{\theta}) - \theta}{SD} \tag{1.14}$$
in which SD represents the standard deviation of sample estimates of the parameter, $\hat{\theta}$. The
cumulative probability of the z-value was then obtained, and .5 was subtracted. The probability
value was multiplied by 100 to obtain a percent difference from the population mean based on
the normal distribution. We chose this computation of percent bias because of difficulties with
the use of a more common measure of percent bias that divides bias by the population value.
The more common measure of percent bias leads to division by zero if the population value is
zero, and, because of the use of random deviates, population values for the intercept were equal
to zero.
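The percent bias computation described above can be sketched as follows, assuming the expected sample value is taken as the mean of the estimates across samples and SD as their sample standard deviation (the text does not say whether an n or n - 1 denominator was used). Function names are illustrative.

```python
from statistics import NormalDist, mean, stdev


def percent_bias_from_z(z):
    """Cumulative normal probability of z, minus .5, times 100."""
    return (NormalDist().cdf(z) - 0.5) * 100.0


def percent_bias(estimates, population_value):
    """Percent bias of a set of sample estimates of one parameter."""
    z = (mean(estimates) - population_value) / stdev(estimates)
    return percent_bias_from_z(z)


# Hypothetical estimates centered exactly on the population value.
print(percent_bias([-0.1, 0.0, 0.1], population_value=0.0))
```

Because the measure is a normal probability minus .5, it remains defined when the population value is zero and is bounded between -50% and 50%.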
Bias in standard errors was computed by subtracting the standard deviation of the
parameter across samples from the expected value of the standard error for that parameter:
$$\%\,\mathrm{BIAS} = \frac{E(SE) - SD}{SD} \tag{1.15}$$
In this equation, SE represents the sample standard error estimate for the parameter, and SD
represents the standard deviation of sample estimates.
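A minimal sketch of the standard error bias computation follows, assuming the expected standard error is the mean of the estimated standard errors across samples and SD is the sample standard deviation of the parameter estimates. The function name and example values are hypothetical; multiplying the returned proportion by 100 expresses it as a percentage.

```python
from statistics import mean, stdev


def standard_error_bias(standard_errors, estimates):
    """Relative bias of estimated standard errors: (E[SE] - SD) / SD."""
    sd = stdev(estimates)   # empirical sampling variability of the estimates
    return (mean(standard_errors) - sd) / sd


# Hypothetical values: the empirical SD is 1.0, the average SE is 1.5.
print(standard_error_bias([1.5, 1.5, 1.5], [1.0, 2.0, 3.0]))  # 0.5
```

A positive value indicates that the estimated standard errors overstate the empirical sampling variability; a negative value indicates that they understate it.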
Results
Nonconvergence
The number and percentage of samples of 200 that had convergence failures are presented
in Table 1. For Model 1, in which intercept variance was estimated, nonconvergence was a
greater problem with fewer dyads and smaller intraclass correlation. As many as 39% of
samples had a nonpositive definite matrix in the condition with the lowest intraclass correlation
(ICC = .05) and 50 dyads. With a low ICC, even 1000 dyads were insufficient to avoid
convergence problems altogether. With higher ICC values, however, fewer dyads are required to
avoid convergence problems: J of 1000, 200, and 100 for ICCs of .10, .20, and .30, respectively.
For Model 2, which estimated slope variance, convergence failures were highly problematic and
seemingly unrelated to intraclass correlation or number of dyads. For nearly all cells of the
design, nearly 50% of variance estimates for slopes were unobtainable.
Parameter Bias
Results for the bias estimates of $\gamma_{00}$, $\gamma_{10}$, $\tau_{00}$, and $\tau_{11}$ appear in Table 2. Bias estimates of
fixed effects $\gamma_{00}$ and $\gamma_{10}$ show little if any bias across all conditions. Percentage bias of these
parameters was below 10% and nearly always below 5%. Moreover, bias estimates were not
consistently in a positive or negative direction.
Random effects results for $\tau_{00}$ and $\tau_{11}$ were quite different, however. Both variance
estimates were consistently positively biased. Bias in intercept variance estimates in Model 1,
$\tau_{00}$, decreases as the number of dyads increases. This decrease also appears to be dependent on
the ICC, with sample size having a greater effect for higher intraclass coefficients and a slight
tendency toward underestimation of intercept variance with high ICCs in large samples. In
Model 1, which estimates intercept variance (i.e., $\tau_{00}$), all percentage bias estimates were greater
than 10% when ICC = .05. For ICCs of .10, .20, and .30, 1000, 200, and 50 dyads, respectively,
appear to be sufficient to keep bias in the estimate of intercept variance under 10%. In Model 2,
which estimates slope variance (i.e., $\tau_{11}$), bias estimates were greater than 10% in all ICC
conditions and were not dependent on sample size.
In summary, it appears that there is little bias in intercept or slope parameter estimates.
Intercept variance estimates will be problematic unless the ICC is greater than .10 and there is
sufficient sample size. Slope variance estimates appear to be positively biased regardless of the
ICC value or the sample size and are therefore likely to be impractical for dyadic researchers.
Standard Error Bias
To investigate potential problems with significance tests of fixed or random effects, we
computed bias for the standard error estimates of $\gamma_{00}$, $\gamma_{10}$, $\tau_{00}$, and $\tau_{11}$. Results are presented in
Table 3. There are few apparent biases for standard errors of intercepts, with nearly every bias
estimate below 10%. However, there did appear to be a trend toward negative bias as ICC
values increased, and this may suggest some potential difficulties with Type I errors for very
large numbers of dyads and higher ICCs. Bias was also generally low in standard error
estimates of slopes. Although several cells showed greater than 10% overestimation, these
observations appear to be limited to the lower sample size conditions (e.g., fewer than 200
dyads) in Model 2.
There was evidence of severe bias for variance estimates in both Model 1 and Model 2.
Standard errors for variance estimates of intercepts in Model 1 showed positive bias as high as
69%. Although bias was reduced as sample size increased, as many as 1000 dyads were not
sufficient to reduce the bias in the standard error estimates for the intercept when ICC = .05. To
obtain unbiased standard error estimates for intercepts (Model 1), 500, 200, and 200 dyads
appear to be necessary for ICCs of .10, .20, and .30, respectively. For standard error estimates of
slopes (Model 2), the positive bias was quite high for all conditions, with an average bias of
approximately 64%.
Discussion
An extensive study of convergence difficulties and parameter bias shows important
problems with the use of HLM with dyadic data whenever random effects estimates are of
interest. Although fixed effects estimates of slopes and intercepts and their standard errors show
little bias across any sample size or ICC condition, random effects estimates led to frequent
convergence difficulties, positive bias in parameters, and positive bias in standard errors.
Although intercept random effects were less problematic with larger sample sizes and higher
ICC values, there were serious convergence difficulties, parameter bias, and standard error bias
for slope random effects. In practice, it will be difficult to estimate intercept or slope variances
in many dyadic applications.
These results provide greatly needed information on the application of HLM to dyadic
data and suggest that such applications are largely impractical when research questions concern
variability across dyads. Our findings indicate that convergence with random effects will often
be a serious problem, but a proportion of models specifying random effects will converge. In
these cases, researchers may be tempted to report their findings, but, because of serious problems
with bias in random effects estimates and their standard errors, neither the estimates nor the
significance tests using Wald ratios (i.e., z-test using ratio of the estimate to the standard error)
can be trusted.
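The Wald ratio mentioned above is simply the estimate divided by its standard error, referred to the standard normal distribution. The sketch below is illustrative only; the function name and input values are hypothetical, and a two-sided p-value is assumed.

```python
from statistics import NormalDist


def wald_test(estimate, standard_error):
    """Return the Wald z ratio and its two-sided normal p-value."""
    z = estimate / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical variance estimate with an accurate standard error,
# then the same estimate with a standard error inflated by 64%.
print(wald_test(0.20, 0.10))
print(wald_test(0.20, 0.10 * 1.64))
```

Because the standard error sits in the denominator, a positively biased standard error shrinks the ratio and a positively biased estimate inflates it, so the two kinds of bias reported here distort the test in opposite directions without any assurance that they cancel.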
Dyadic researchers not interested in random effects estimates could successfully employ
HLM, but estimates would have to be assumed to be invariant across dyads. Inclusion or
exclusion of random effects may have an impact on fixed effects estimates (Kreft & de Leeuw,
1998; Longford, 1993), however, and therefore constraints on random parameters are not
recommended without reasonable justification. When level-2 predictors are incorporated and the
variance is set to zero, researchers can conceptualize level-1 parameters as varying nonrandomly
(Raudenbush & Bryk, 2002). In this case, slopes or intercepts from level 1 vary completely
as a function of some level-2 variable. An additional problem for researchers is that a proportion
of models with random effects will converge and estimates of random effects and their
significance tests will be obtained.
In this study, our focus has been on the application of HLM to dyadic data, but the results
also have implications for growth curve models with two time points. One important difference
may be the range of ICC values expected with longitudinal data. Many variables may have
higher ICC values or consistency over time than were examined in our study (i.e., greater than
.3). Our choice of values for ICC was guided by common values encountered in the dyadic
literature rather than those expected for repeated measures. The results, however, suggest that
ICC values had no effect on slope variance estimates or their standard errors within the range of
values we examined, and, thus, there is little reason to expect a difference in this trend with
higher ICC values.
Although the present study represents valuable information on dyadic analysis with
HLM, there are some important limitations of our findings. Perhaps most importantly,
convergence difficulties and bias in random effects may be maximally estimated here, because
values of both the intercept and slope variance in the population were low. Nonpositive sample
variance estimates are most likely to occur under these circumstances, and, thus, bias in sample
estimates of intercept or slope variance may be less severe when their respective population
parameters are large. Nevertheless, such circumstances are likely to be common in practice
where researchers are unaware of true population values.
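The link between small population variances and nonpositive estimates can be illustrated with a brief simulation. The sketch below is not the procedure used in this study: it assumes a simple ANOVA-type estimator of the intercept variance, (MSB - MSW)/2, in place of REML, fixes the total variance at 1 so that the population intercept variance equals the ICC, and uses hypothetical function names throughout.

```python
import random


def anova_intercept_variance(dyads):
    """ANOVA-type estimate of the between-dyad variance, (MSB - MSW) / 2."""
    n_dyads = len(dyads)
    means = [(a + b) / 2 for a, b in dyads]
    grand_mean = sum(means) / n_dyads
    # Between-dyad mean square (two observations per dyad).
    msb = 2 * sum((m - grand_mean) ** 2 for m in means) / (n_dyads - 1)
    # Within-dyad mean square; each dyad contributes (a - b)^2 / 2.
    msw = sum((a - b) ** 2 / 2 for a, b in dyads) / n_dyads
    return (msb - msw) / 2


def simulate_dyads(n_dyads, icc, rng):
    """Draw dyads from a random-intercept model with total variance 1 (assumed)."""
    tau_sd = icc ** 0.5            # SD of the dyad-level intercept
    sigma_sd = (1 - icc) ** 0.5    # SD of the level-1 residual
    dyads = []
    for _ in range(n_dyads):
        u = rng.gauss(0, tau_sd)
        dyads.append((u + rng.gauss(0, sigma_sd), u + rng.gauss(0, sigma_sd)))
    return dyads


def proportion_nonpositive(n_dyads, icc, reps=200, seed=1):
    """Share of simulated samples whose variance estimate is zero or negative."""
    rng = random.Random(seed)
    count = sum(
        anova_intercept_variance(simulate_dyads(n_dyads, icc, rng)) <= 0
        for _ in range(reps)
    )
    return count / reps
```

Comparing proportion_nonpositive(50, 0.05) with proportion_nonpositive(50, 0.30) contrasts a low-ICC and a higher-ICC condition. Under these assumptions, a nonpositive estimate arises whenever the between-dyad mean square does not exceed the within-dyad mean square.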
More generally, we did not systematically vary whether random or fixed parameters were
correctly or incorrectly specified in the model, and we cannot be certain of the impact of model
misspecification on estimates or standard errors. For fixed effects, both models we tested
represent some degree of misspecification. With only two observations per group, one must
constrain either the intercept variance or the slope variance, together with their covariance, to zero to
identify the model. Furthermore, more explicit comparisons between correct and incorrect
model specification would be needed for exact information about Type I or Type II error rates.
In addition to these issues, there are several other limitations of the present study. First,
the models we examined were relatively simple and did not include level-2 predictors or multiple
level-1 predictors. It is difficult to imagine, however, that better estimates would be obtained
with more complex models. Second, our manipulation of the ICC was not independent of
within-group variance or between-group variance, but differences in these variance components
are not likely to be independent of ICC values in practice. Because sample ICC values were
compared to population ICC values, however, our results provide a fair comparison between
sample and population values and thus accurate estimates of bias in each of the ICC conditions.
Moreover, Bryk and Raudenbush (1992, p. 221) show that standard errors do not depend on
within- or between-group variance when balanced data are used. Third, a number of authors
have attempted to retain an equivalent sample size while examining the effects of the number of
observations per group and the number of groups (e.g., Bassiri, 1988; Kim, 1990; Mok, 1995).
Other authors (Busing, 1993; van der Leeden & Busing, 1994) have attempted to examine group
size and the number of groups independently, allowing total sample size to fluctuate. It may be
impossible to disentangle the importance of group size, number of groups, and total sample size.
Regardless of the resolution of this issue, our focus on dyadic data meant that the effects of the
number of groups and the total sample size could not be separated.
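The dependence noted in the second limitation above follows from the standard definition of the ICC as the between-group variance divided by the sum of the between- and within-group variances. The following is a minimal sketch, assuming a total variance fixed at 1 (an illustrative choice, not a description of the simulation design) and hypothetical function names.

```python
def variance_components(icc, total_variance=1.0):
    """Split an assumed total variance into between- and within-group parts."""
    tau00 = icc * total_variance        # between-group (intercept) variance
    sigma2 = total_variance - tau00     # within-group (level-1 residual) variance
    return tau00, sigma2


def icc_from_components(tau00, sigma2):
    """Standard ICC definition: tau00 / (tau00 + sigma2)."""
    return tau00 / (tau00 + sigma2)


# With the total variance held constant, each ICC condition implies a
# different pair of variance components.
for icc in (0.05, 0.10, 0.20, 0.30):
    tau00, sigma2 = variance_components(icc)
    print(f"ICC {icc:.2f}: tau00 = {tau00:.2f}, sigma2 = {sigma2:.2f}")
```

With the total variance held constant, raising the ICC necessarily raises the between-group variance and lowers the within-group variance, so the ICC cannot be manipulated independently of both components.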
Our findings give dyadic researchers interested in testing random effects little
reason for optimism, and we can point to few existing solutions. Some
authors (Bryk & Raudenbush, 1992; Snijders & Bosker, 1999) recommend that variance tests be
conducted via likelihood ratio tests using FIML instead of Wald tests. We did not obtain either
FIML estimates or nested model tests, and, thus, have no information on the usefulness of this
recommendation with dyadic data. Because REML and FIML estimates are similar with larger
sample sizes (Kim, 1990), we would expect little difference from our findings using FIML in
place of REML. Furthermore, given the rather severe bias in estimates for random effects, it is
unlikely that nested tests of these biased parameters would be a substantial improvement over Wald
ratio tests. Researchers also rarely report or test random effects using nested models, typically
relying on the Wald tests automatically generated by the software. Van der Leeden et al. (1997)
have suggested the use of bootstrap methods for standard error estimates, and future work on this
approach may prove useful for testing random effects with dyads. Another possible solution may involve
a bias correction factor that could be applied to parameter and standard error estimates. Until
such work is completed and tested in simulation studies, researchers should be cautioned against
the use of random effects with dyadic data.
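As a rough illustration of the bootstrap idea, the sketch below resamples whole dyads with replacement and takes the standard deviation of the re-estimated intercept variance as its standard error. It is a generic nonparametric case bootstrap, not necessarily the procedure of van der Leeden et al. (1997). An ANOVA-type estimator stands in for REML, and the function names and example scores are hypothetical.

```python
import random
import statistics


def intercept_variance(dyads):
    """ANOVA-type estimate of the between-dyad variance, (MSB - MSW) / 2."""
    n_dyads = len(dyads)
    means = [(a + b) / 2 for a, b in dyads]
    grand_mean = sum(means) / n_dyads
    msb = 2 * sum((m - grand_mean) ** 2 for m in means) / (n_dyads - 1)
    msw = sum((a - b) ** 2 / 2 for a, b in dyads) / n_dyads
    return (msb - msw) / 2


def bootstrap_se(dyads, estimator, n_boot=500, seed=0):
    """Standard error of an estimator from resampling dyads with replacement."""
    rng = random.Random(seed)
    n_dyads = len(dyads)
    estimates = []
    for _ in range(n_boot):
        # Resample at the dyad level so both members stay together.
        resample = [dyads[rng.randrange(n_dyads)] for _ in range(n_dyads)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Hypothetical scores for six dyads (illustration only, not study data).
example_dyads = [(4.1, 3.8), (2.9, 3.5), (5.0, 4.4),
                 (3.2, 2.7), (4.6, 4.9), (3.9, 3.1)]
se = bootstrap_se(example_dyads, intercept_variance)
```

Resampling at the dyad level preserves the dependence between the two members of each pair. Whether standard errors obtained this way perform adequately with dyads is precisely the open question raised above.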
References
Aitkin, M., & Longford, N. (1986). Statistical modeling issues in school effectiveness
studies. Journal of the Royal Statistical Society, Series A, 149, 1-43.
Barnett, R.C., Marshall, N.L., Raudenbush, S.W., & Brennan, R.T. (1993). Gender and
the relationship between job experiences and psychological distress: A study of dual-earner
couples. Journal of Personality and Social Psychology, 64, 794-806.
Bassiri, D. (1988). Large and small sample properties of maximum likelihood estimates
for the hierarchical linear model. Unpublished doctoral dissertation. East Lansing, MI,
Michigan State University.
Bliese, P.D. (1998). Group size, ICC values, and group-level correlations: A simulation.
Organizational Research Methods, 1, 355-373.
Bryk, A.S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and
data analysis methods. Newbury Park, CA: Sage.
Busing, F.M.T.A. (1993). Distribution characteristics of variance estimates in two-level
models. Preprint PRM 93-04. Psychometric and Research Methodology, Leiden, Netherlands.
de Leeuw, J. & Kreft, I. (1986). Random coefficient models for multilevel analysis.
Journal of Educational Statistics, 11, 57-85.
Donoghue, J.R., & Jenkins, F. (1992). A Monte Carlo study of the effects of model
misspecification on HLM estimates. Technical report, Educational Testing Service, Princeton,
NJ. November 1992.
Efron, B. (1990). More efficient bootstrap computations. Journal of the American Statistical
Association, 85, 79-89.
Goldstein, H. (1986). Multilevel mixed linear model analysis with iterative generalized
least squares. Biometrika, 73, 43-56.
Hox, J. (1998). Multilevel modeling: When and why. In R. Mathar & M. Schader (Eds.),
Classification, data analysis, and data highways. Berlin, Germany: Springer-Verlag.
Kashy, D.A., & Kenny, D.A. (2000). The analysis of data from dyads and groups. In
H.T. Reis & C.M. Judd (Eds.), Handbook of research methods in social and personality psychology
(pp. 451-477). Cambridge: Cambridge University Press.
Kim, K.-S. (1990). Multilevel data analysis: A comparison of analytic alternatives.
Unpublished doctoral dissertation. University of California, Los Angeles, CA.
Kreft, I.G.G. (1996). Are multilevel techniques necessary? An overview, including
simulation studies. Unpublished manuscript, California State University, Los Angeles, CA.
Kreft, I.G.G., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage.
Kreft, I.G.G., de Leeuw, J., & Aiken, L. (1995). The effect of different forms of
centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1-22.
Kurdek, L.A. (1997). Adjustment to relationship dissolution in gay, lesbian, and
heterosexual partners. Personal Relationships, 4, 145-161.
Longford, N. (1993). Random coefficient models. Oxford, England: Oxford University
Press.
Maas, C.J.M., & Hox, J.J. (2002). Sufficient sample sizes for multilevel modeling.
Unpublished manuscript, Utrecht University, The Netherlands.
Maguire, M. C. (1999). Treating the dyad as the unit of analysis: A primer on three
analytic approaches. Journal of Marriage and Family, 61, 213-223.
Mason, W., Wong, G., & Entwisle, B. (1984). Contextual analysis through the multilevel linear model. In S. Leinhardt, (Ed.), Sociological Methodology (pp. 72-103). San
Francisco: Jossey-Bass.
Mok, M. (1995). Sample size requirements for 2-level designs in educational research.
Unpublished manuscript, Macquarie University, Sydney, Australia.
Newsom, J.T. (in press). A multilevel structural equation model for dyadic data.
Structural Equation Modeling.
Raudenbush, S., & Bryk, A. (1986). A hierarchical model for studying school effects.
Sociology of Education, 59, 1-17.
Raudenbush, S., & Bryk, A. (2002). Hierarchical linear models: Applications and data
analysis methods (second edition). Thousand Oaks, CA: Sage.
Snijders, T. A.B., & Bosker, R.J. (1994). Modeled variance in two-level models.
Sociological Methods and Research, 22, 342-363.
Snijders, T. A.B., & Bosker, R.J. (1999). Multilevel analysis: An introduction to basic
and advanced multilevel modeling. Thousand Oaks, CA: Sage.
van der Leeden, R., & Busing, F.M.T.A. (1994). First iteration versus IGLS/RIGLS
estimates in two-level models: A Monte Carlo study with ML3. Preprint PRM 94-03.
Psychometrics and Research Methodology, Leiden, Netherlands.
Windle, M., & Dumenci, L. (1997). Parental and occupational stress as predictors of
depressive symptoms among dual-income couples: A multilevel modeling approach. Journal of
Marriage & the Family, 59, 625-634.
Table 1
Random Effect Nonconvergence in Models 1 (Intercept) and 2 (Slope).

                    ICC .05         ICC .10         ICC .20         ICC .30
Number of dyads       n      %        n      %        n      %        n      %

Model 1 (Intercept variance)
    50               78   39.0       54   27.0       14    7.0        5    2.5
   100               59   29.5       49   24.5        5    2.5        0    0.0
   200               54   27.0       26   13.0        0    0.0        0    0.0
   500               29   14.5        2    1.0        0    0.0        0    0.0
  1000               15    7.5        0    0.0        0    0.0        0    0.0

Model 2 (Slope variance)
    50              115   57.5      113   56.5      111   55.5      114   57.0
   100              114   57.0       99   49.5       98   49.0      109   54.5
   200              113   56.5      104   52.0       89   44.5      115   57.5
   500              118   59.0       98   49.0       81   40.5      114   57.0
  1000              114   57.0      113   56.5       71   35.5      122   61.0
Table 2
Sample Bias of Parameter Estimates for Intercepts, Slopes, and Random Effects.

                        B0                  B1               T00/T11
Number of dyads       bias       %        bias       %        bias       %

Model 1
ICC .05
    50             0.00362    1.64     0.00773    3.23     0.07344   32.69
   100            -0.00392   -2.37    -0.00303   -1.85     0.04269   28.18
   200            -0.00264   -1.98     0.00559    4.65     0.02815   24.58
   500            -0.00598   -7.74     0.00468    6.38     0.01518   18.29
  1000            -0.00212   -3.95     0.00348    6.66     0.00807   12.83
ICC .10
    50             0.00816    3.24    -0.00284   -1.15     0.05591   21.35
   100             0.00295    1.72     0.00914    5.11     0.03022   16.57
   200            -0.00323   -2.56     0.00455    3.88     0.01173    8.47
   500             0.00036    0.48     0.00137    1.88     0.00517    5.39
  1000            -0.00019   -0.36     0.00069    1.29     0.00003   0.042
ICC .20
    50            -0.00239   -0.95    -0.00418   -1.84     0.02768    9.10
   100            -0.00169   -0.92    -0.00303   -1.98     0.01978    9.45
   200             0.00015    0.12     0.00092    0.82     0.00436    2.77
   500            -0.00155   -1.95    -0.00139   -1.90     0.00073    0.67
  1000            -0.00112   -1.96     0.00012    0.22    -0.00013   -0.18
ICC .30
    50            -0.01604   -5.71    -0.00599   -2.59     0.01200    4.06
   100            -0.00641   -3.55    -0.00080   -0.48     0.00815    3.73
   200            -0.00095   -0.71     0.00173    1.46    -0.00006   -0.04
   500             0.00012    0.14    -0.00043   -0.59    -0.00254   -2.58
  1000            -0.00068   -1.19    -0.00069   -1.28    -0.00610   -7.80
(Table 2 continued)

                        B0                  B1               T00/T11
Number of dyads       bias       %        bias       %        bias       %

Model 2
ICC .05
    50             0.00526    2.33     0.00525    2.36     0.07022   39.00
   100            -0.00055   -0.34     0.00739    4.64     0.04671   39.15
   200            -0.00260   -1.99     0.00350    2.83     0.02936   39.72
   500            -0.00093   -1.15     0.00463    6.71     0.01955   39.34
  1000             0.00116    2.17     0.00252    4.68     0.01204   36.18
ICC .10
    50            -0.00454   -1.90     0.00120    0.47     0.07712   35.78
   100             0.00010   0.053     0.00670    4.03     0.04568   37.56
   200            -0.00096   -0.79     0.00285    2.17     0.03518   39.37
   500             0.00059    0.83     0.00083    1.10     0.01821   37.05
  1000            -0.00986    2.24     0.00070   -0.26     0.07924   43.08
ICC .20
    50            -0.00986   -4.03     0.00070    0.29     0.07924   31.01
   100            -0.00307   -1.79     0.00440    2.81     0.04561   35.51
   200            -0.00433   -3.26     0.00068    0.56     0.03281   36.30
   500            -0.00213   -2.65     0.00056    0.69     0.01817   34.42
  1000            -0.00189   -3.51     0.00107    1.96     0.01169   33.12
ICC .30
    50            -0.00142   -5.06    -0.01358   -5.48     0.06667   35.31
   100             0.00308    1.62     0.00168    1.01     0.04562   39.18
   200            -0.00154   -1.25     0.00590    5.11     0.03341   40.27
   500            -0.00180   -2.32    -0.00145   -2.22     0.01862   39.91
  1000             0.00279    4.97     0.00199    3.80     0.01222   41.34
Table 3
Sample Bias of Standard Error Estimates for Intercepts, Slopes, and Random Effects.

                              B0                  B1               T00/T11
Number of dyads    J        Bias       %        Bias       %        Bias       %

Model 1
ICC .05
    50           122     0.01251   14.18    -0.00074   -0.78     0.05404   69.31
   100           141     0.00390    5.91     0.00130    1.98     0.03617   65.96
   200           146    -0.00401   -7.52    -0.00029   -0.60     0.02215   52.04
   500           171     0.00032    1.03     0.00093    3.18     0.00876   27.48
  1000           185     0.00047    2.18     0.00051    2.46     0.00408   16.56
ICC .10
    50           146     0.00268    2.68    -0.00282   -2.85     0.03595   36.23
   100           151     0.00367    5.37    -0.00356   -5.00     0.02359   33.42
   200           174    -0.00018   -0.35     0.00100    2.13     0.01069   19.47
   500           198     0.00085    2.77     0.00092    3.18     0.00283    7.41
  1000           200     0.00156    7.54    -0.00002   -0.12     0.00129    4.67
ICC .20
    50           186     0.00499    4.97     0.00325    3.60     0.01590   13.22
   100           195     0.00076    1.04     0.00497    8.15     0.01161   14.05
   200           200    -0.00038   -0.73     0.00183    4.07     0.00311    4.96
   500           200     0.00108    3.41     0.00048    1.64    -0.00191   -4.41
  1000           200     0.00047    2.09    -0.00001   -0.08    -0.00057   -1.90
ICC .30
    50           195    -0.00355   -3.18    -0.00156   -1.70     0.01924   16.33
   100           200     0.00490    6.82    -0.00163   -2.45     0.00987   11.33
   200           200     0.00042    0.78    -0.00132   -2.80     0.00203    3.09
   500           200     0.00053    1.57     0.00017    0.57     0.00309    7.88
  1000           200     0.00120    5.27    -0.00098   -4.57    -0.00110   -3.49
(Table 3 continued)

                              B0                  B1               T00/T11
Number of dyads    J        Bias       %        Bias       %        Bias       %

Model 2
ICC .05
    50           122     0.00602    6.73     0.01743   19.86     0.04722   82.50
   100           141     0.00172    2.63     0.00792   12.49     0.02738   72.39
   200           146    -0.00464   -8.88     0.00057    1.16     0.01988   85.69
   500           171    -0.00227   -7.01     0.00362   13.23     0.00963   61.29
  1000           185    -0.00005   -0.24     0.00016    0.76     0.00614   55.50
ICC .10
    50           146     0.00034    0.36     0.00363    3.56     0.03425   47.55
   100           151    -0.00650   -8.81     0.00580    8.75     0.02711   68.43
   200           174    -0.00060   -1.25    -0.00188   -3.57     0.01665   59.02
   500           198     0.00190    6.73     0.00113    3.77     0.00997   61.80
  1000           200     0.00159    7.53     0.00038    1.79     0.00779   74.53
ICC .20
    50           186    -0.00275   -2.82     0.01020   10.75     0.01491   16.53
   100           195    -0.00112   -1.63     0.00976   15.62     0.02539   58.93
   200           200    -0.00535  -10.11     0.00199    4.15     0.01477   49.25
   500           200    -0.00191   -5.97    -0.00078   -2.44     0.00851   47.41
  1000           200    -0.00017   -0.77     0.00019    0.86     0.00612   50.15
ICC .30
    50           195    -0.01690  -15.15     0.00434    4.40     0.03493   55.01
   100           200    -0.00844  -11.16     0.00552    8.36     0.02891   78.35
   200           200    -0.00134   -2.74     0.00394    8.57     0.01916   74.39
   500           200    -0.00090   -2.91     0.00413   15.37     0.01179   80.77
  1000           200    -0.00105   -4.73     0.00078    3.75     0.00851   94.93
Footnotes
1
Although there are several estimators available with HLM, such as restricted maximum
likelihood, generalized least squares, or ordinary least squares, we will use “ordinary least squares” to
refer to standard, nonhierarchical linear regression, and “HLM” to refer to multilevel regression using
any of these possible estimators.
2
The notation for $\tau$ does not typically involve an exponent, although it is a variance rather than a
standard deviation.
3
Although several of the studies mentioned here (e.g., van der Leeden & Busing, 1994) address
issues of the relative efficiency of different estimators, such as full maximum likelihood and
generalized least squares, we focus our discussion on findings that are applicable to REML, by far the
most commonly used estimator.
4
It is not possible to estimate both variances (or “random effects”) in hierarchical linear models of dyadic data,
because the number of random parameters that can be estimated is limited by the number of variance-covariance
matrix elements within each level-2 unit (Snijders & Bosker, 1999). The variance-covariance matrix among
level-1 units is given by:

\[
\begin{bmatrix}
\tau_{00} + 2\tau_{01}x_{i1} + \tau_{11}x_{i1}^{2} + \sigma^{2} & \tau_{00} + \tau_{01}(x_{i1} + x_{i2}) + \tau_{11}x_{i1}x_{i2} \\
\tau_{00} + \tau_{01}(x_{i1} + x_{i2}) + \tau_{11}x_{i1}x_{i2} & \tau_{00} + 2\tau_{01}x_{i2} + \tau_{11}x_{i2}^{2} + \sigma^{2}
\end{bmatrix}
\]

where $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ are the variance estimates for the intercept and slope and the covariance between
intercept and slope, respectively, and $\sigma^{2}$ is the variance of the level-1 residual. $x_{i1}$ and $x_{i2}$ are the values of the
predictor variable for the two observations in a dyad (Goldstein, 1999). The level-1 variance-covariance
matrix for dyadic data (i.e., 2 observations per group or $n_j = 2$), given above, contains
$n_j(n_j + 1)/2$ or $2(2 + 1)/2 = 3$ unique elements. A model containing a single level-1 predictor with
random effects estimated for the intercept and slope, such as that in the models tested here, requires 4
parameters to be estimated: $\tau_{00}$, $\tau_{11}$, $\tau_{01}$, and $\sigma^{2}$. In general, $[(q + 1)(q + 2)]/2 + 1$ parameters are
required for a model with all possible random variances and covariances, where $q$ is the number of level-1
predictors with random slopes. With dyadic data, a model
with a random intercept and slope results in 1 too many parameters to be estimated given the number
of covariance elements available, leading to identification problems.
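The counting argument in this footnote can be checked with a few lines of code. The sketch assumes that q denotes the number of level-1 predictors with random slopes. The function names are hypothetical, and the comparison is only a necessary counting condition, not a full proof of identification.

```python
def unique_covariance_elements(group_size):
    """Unique elements in the level-1 covariance matrix: n(n + 1) / 2."""
    return group_size * (group_size + 1) // 2


def random_parameters(n_random_slopes):
    """Variances and covariances of q slopes and the intercept, plus sigma^2."""
    q = n_random_slopes
    return (q + 1) * (q + 2) // 2 + 1


def passes_counting_rule(group_size, n_random_slopes):
    """True when the parameters do not outnumber the covariance elements."""
    return random_parameters(n_random_slopes) <= unique_covariance_elements(group_size)


# Dyads (group size 2) with a random intercept and one random slope.
print(unique_covariance_elements(2), random_parameters(1), passes_counting_rule(2, 1))
```

For dyads with a random intercept and one random slope, the count of parameters exceeds the count of unique covariance elements by one, which is the shortfall described in the footnote.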