HIERARCHICAL LINEAR MODELING OF DYADIC DATA

Nonconvergence and sample bias 1 Running head: HIERARCHICAL LINEAR MODELING OF DYADIC DATA Nonconvergence and Sample Bias in Hierarchical Linear Modeling of Dyadic Data Jason T. Newsom and Masami Nishishiba Portland State University Draft: 2/13/02 We thank Tasha Beretvas, Joop Hox, and Aloen Townsend for helpful comments on an earlier draft of the manuscript. Address correspondence to Jason T. Newsom, Ph.D., Institute on Aging, School of Community Health, Portland State University, P.O. Box 751, Portland, OR 972070751 or newsomj@pdx.edu. Nonconvergence and sample bias 2 Abstract Recent statistical development and software availability for hierarchical linear models has led to an increasingly wide range of research applications. Although researchers have begun to use these techniques and even recommend their use with dyadic data, very little is known about estimation with only two observations per group. This Monte Carlo study examines the effects of the number of dyads and the intraclass correlation coefficient on convergence difficulties and bias of parameter estimates and their standard errors. Results show that convergence problems with intercept and slope variance estimates are extremely common. Bias was generally low for fixed effects and their standard errors, but random effects and their standard errors frequently showed serious bias. The findings suggest that random effect estimation and significance tests are largely impractical with dyadic data. Nonconvergence and sample bias 3 Convergence Difficulties and Sample Bias in Hierarchical Linear Modeling of Dyadic Data With the recent development and widespread availability of hierarchical linear modeling (HLM) techniques, new analysis strategies for a variety of research designs have emerged. HLM is an appropriate analytic technique for analysis of nested or hierarchically structured data in which individual observations are nested within groups. A common example is data that involve students nested within classrooms. Students who share the same teacher, facilities, or curriculum tend to have related or dependent scores. These data structures lead to violation of standard independence assumptions of traditional regression analysis whenever the measure of nonindependence, the intraclass correlation coefficient (ICC), is greater than zero. Among several new areas of the application of HLM is the analysis of dyadic data, such as couples, twins, parent-child interaction, or friendship pairs. There is a growing body of published reports that use HLM for analysis of dyadic data (e.g., Barnett, Marshall, Raudenbush, & Brennan, 1993; Kurdek, 1997; Windle & Dumenci, 1997), and a few sources that suggest HLM as an option for dyadic data (e.g., Maguire, 1999; Kashy & Kenny, 2000; Newsom, in press). Analysis of dyadic data is appropriate under the rationale of HLM, because dyad members are individuals nested within groups of two. However, little simulation work has been conducted that examines the behavior of estimates and their standard errors when data are from individuals nested within very small groups, and no studies that we could find have examined dyads. The growing application of HLM to dyadic data has been proceeding with virtually no information available to researchers about the practical difficulties of estimation, such as convergence failures, bias in regression estimates, or bias in standard errors and significance tests. The present study examines these issues in an extensive Monte Carlo study of the HLM with dyadic data under a variety of conditions. Nonconvergence and sample bias 4 Hierarchical Linear Models HLM, sometimes referred to as "multilevel regression" or "multilevel modeling", is a regression-based analysis that can be conceptualized as a two-level regression (Aitkin & Longford, 1986; de Leeuw & Kreft, 1986: Goldstein, 1986; Mason, Wong, & Entwisle, 1984; Raudenbush & Bryk, 1986). The first level of analysis involves an identical regression analysis repeated within each group. This regression model follows the ordinary least squares regression model1 with as many as p independent variables: Level 1: Yij   0 j  1 j x1ij  . . .   pj x pij  rij (1.1) In the above equation, subscripts i and j represent individuals and groups, respectively. The subscript, p, designates the number of predictors at level 1. In the case of dyads, the number of observations per group, nj, is equal to two. In the second-level of analysis, the intercept, 0, and slopes, p, serve as dependent variables in another regression analysis using predictors measured at the group level. Such predictors might include classroom size, the teacher’s race, or average student socioeconomic status with classroom data, or household income, the number of years married, or twin’s age in the dyadic example. As in ordinary least squares regression analysis, the intercept represents the value of the dependent variable when the predictors equal zero. It is also possible to “center” predictors, by computing deviations from the group mean or grand mean. Centering produces level-1 intercepts that represent the group mean or grand mean for each group (for a more complete discussion of this topic, see Kreft, de Leeuw, & Aiken, 1995). Thus, the level-two analysis is a regression that predicts the intercept or a particular slope for each group. Under certain coding schemes, the level-2 intercept can be interpreted as a grand mean. The level-2 slopes can be interpreted as the effect of a level-2 predictor on the group Nonconvergence and sample bias 5 average when predicting intercepts or the effect of the level-2 predictor on the relationship between a level-1 predictor and the dependent variable. The latter is referred to as a “crossgroup interaction” (e.g., Kreft and de Leew, 1998). There are p + 1 possible level-2 equations, but, for simplicity, we present only two level-2 equations, based on an analysis with only one level-1 predictor, 1: Level-2 equations:  0 j   00   01  . . .   0 q u0 j (1.2) 1 j   10   11  . . .   1q u1 j (1.3) In the equations above, there are q possible level-2 predictors and  represents the coefficient for the intercept or slope. 00 or 10 are intercepts and 0q and 1q are slopes and both are commonly referred to as “fixed effects.” u0j and u1j are level-2 residuals. Their variances, 00 and 11, which represent the variation of the intercept or slope across groups, are referred to as “random effects” and can be of principal interest to researchers. The level-1 and level-2 equations can be written as a single regression equation using algebraic substitution: Yij   00   10 x1ij  . . .   p 0 x pij   01 z1 j   11 z1 j u1 j  . . .   0 q xqj   1q xqj u1 j  u0 j  u1 j  rij (1.4) The extent to which observations within a group are related can be expressed as an estimate of the ratio the between-group variation relative to the total variation in the population, called the ICC: (1.5)  00    00 In this equation,  is the ICC, 00 is the variance of the intercept when there are no predictors2,  2 and  is the within-group variance (i.e, the variance of r). When the ICC equals 0, there is no Nonconvergence and sample bias 6 difference between OLS regression estimates and those obtained with HLM, because no clustering exists. OLS standard error estimates become increasingly negatively biased as clustering within groups increases. Behavior of HLM Estimates and Their Standard Errors Over the last decade, there has been a rapid growth in the popularity of multilevel models in the social sciences and psychology, but it is surprising to learn that there is a great paucity of published Monte Carlo studies examining the bias or efficiency of coefficients, random effects, and their standard errors. Textbook discussions frequently cite unpublished doctoral dissertations or technical reports, and conclude that the most commonly used estimator, restricted maximum likelihood (REML), shows little bias in fixed effects estimates. 3 Although much of this work is relevant, researchers interested in applying hierarchical models to dyads do not have ready access to these findings nor are the findings easy to extrapolate to the dyadic case. Moreover, in practice, researchers often encounter convergence difficulties due to nonpositive definite variance estimates, but there is little work documenting the conditions under which these problems are most likely to occur. Of the few Monte Carlo studies conducted, most have attempted to retain a constant total sample size, while attempting to evaluate the effects of differing numbers and observations per group. Thus, their findings make it difficult to extrapolate to the effects of varying the number of groups when groups sizes are small. In addition, varying both group size and the number of observations per group simultaneously may have other problems. Bliese (1998), for instance, shows that the ICC is dependent on group size, making it difficult to assess the independent effects of ICC and group size without special precautions. Nonconvergence and sample bias 7 Perhaps the most frequently discussed simulation work was done by Bassiri (1988), reported in an unpublished doctoral dissertation. Bassiri examined the effects of intraclass correlation (.10 vs. .25), the number of observations per group (5 through 150), and the number of groups (10 through 150) on fixed effects estimates, their standard errors, random effects estimates, and Type I and Type II error rates with REML estimation. She concluded that, in general, level-2 estimates (fixed and random effects) are unbiased, consistent, and asymptotically efficient, that standard errors are primarily a function of the number of groups rather than the size of the groups, and higher ICC values are associated with poorer precision in fixed and random effects estimates. In another doctoral dissertation, Kim (1990) examined the magnitude of slope estimates, the number of groups (25, 50, 100, and 200), and the number of observations within each group (10, 20, 30). GLS and full ML estimates were compared using relatively complex regression models (i.e., 9 parameters). His findings suggest that the greatest bias in fixed and random effects occurs when there are a relatively large number of observations per group and there are a small number of groups (i.e., 40 case per group and 25 groups). Kim, however, did not manipulate group size and the number of groups independently (e.g., in the conditions in which there were a small number of observations per group there were also more groups), so it is difficult to discern which of these factors is responsible for parameter bias. In addition, Kim’s study only included 50 replications per cell, which will lead to less reliable estimates of the true sampling variability (Efron, 1990). In another small study, Mok (1995) reached similar conclusions. She concluded that the number of groups is a more important factor than the number of observations per group in both bias and efficiency of estimates. She states, "…if resources were available for a sample size n, Nonconvergence and sample bias 8 comprising J schools with I students from each school, then less bias and more efficiency would be expected from sample designs involving more schools (large J), and fewer students per school (small I) than sample designs involving fewer schools (small J), and fewer students per school (small I)" (p.6). Busing (1993; see also van der Leeden and Busing; 1994) conducted the most extensive simulation study to date, using a large number of replications and examining a wide range of ICC values (.2, .4, .6., and .8), number of groups (5, 10, 25, 50, 100, and 300), and the number of observations within groups (5, 10, 25, 50, and 100). In addition to parameter and standard error bias, he examined convergence difficulties and improper solutions. Because the number of observations per group and the number of groups were independently manipulated and because as few as 5 observations per group were investigated, these findings are perhaps the most applicable to the dyadic case. Although fixed estimates performed well in all conditions, Busing reports that random effects estimates were biased unless more than 300 groups were used. These results are consistent with those reported by both Kim (1990) and Mok (1995) which show random effects estimates are affected more by the number of groups rather than the number of observations per group. With very few observations per group, there was a greater propensity for nonconvergence or improper solutions (i.e., nonpositive definite matrix), although the largest percentage of convergence problems, which was found in the smallest group size condition, was under 2%. It is unclear, however, the extent to which these results will generalize to as few as 2 observations per group. Donoghue and Jenkins (1992) reported on a small simulation study with 20 replications in each cell of the design, examining the effects of model misspecification. The study compared misspecified models (i.e., inclusion of a predictor in the model when it was unrelated to the Nonconvergence and sample bias 9 outcome and failure to included a predictor in the model even though it had a large relationship with the outcome) to correctly specified models. The authors report no bias of within-group error estimates, slopes, intercepts, or covariance estimates in most conditions. These authors also examine convergence difficulties. Although they indicate that nonpositive definite matrices or convergence failures were more common with only 5 observations per group, detailed results for all conditions in their study were not presented. The number of groups and the number of individuals per group, were not independently manipulated, however, because every condition contained 1500 observations total, ranging from 10 groups with 150 observations to 300 groups with 5 observations). In a recent simulation by Maas and Hox (2002), the authors examine the effects of ICC, (.10 through .30) group size, and the number of groups of parameter bias. Their study includes sample sizes as low as 5 observations, with the number of groups varying from 10 to 100. Although they report that fixed effects are unbiased under all conditions, they do find evidence of important bias in variance estimates and their standard errors with a small number of observations per group and a small number of groups. Although they do not present Monte Carlo findings, Bryk and Raudenbush (1992) and Snijders and Bosker (1994, 1999) describe pertinent analytic work. Bryk and Raudenbush state that estimates will be unbiased with balanced data (i.e., equal number of observations per group) but are too small with unbalanced data, and standard errors will tend to be negatively biased unless large total sample sizes are used. In addition, they speculate that a small number of observations per group may be problematic: "We suspect that the likelihood of  [matrix of variance estimates] can be quite skewed if nj [number of observations per group] is small, even Nonconvergence and sample bias 10 if J [number of groups] is large, thus rendering test results inaccurate" (p. 224, bracketed text added). Snijders and Bosker (1994,1999), based on analytic work with power analysis, discuss the relationship between standard errors, group size, number of groups, and intraclass correlation. They show a nonmonotonic decrease in standard errors associated with more observations per group in some conditions. The shape of the decline across sample sizes, however, differed by ICC, and, although there was little difference between 2 and 5 observations per group as a function of ICC, larger group sizes only led to an overall decline in standard errors when the ICC was low. Higher ICC values (e.g., .2 to .4) showed a slight or no increase in standard error with larger samples. This finding suggests that, with dyadic data, increasing the number of groups may not produce consistency in standard errors for larger ICC values. As with several other studies, the effects of group size and number of groups are difficult to distinguish, because group size and the number of groups were not independently examined (i.e., a constant total sample size was used). Finally, based on their synthesis of existing simulation work and experience with HLM, some authors have provided recommendations for the optimal number of observations per group and number of groups. Kreft (1996) proposes a general 30/30 rule, in which there are 30 groups and 30 observations per group. Elaborating on Kreft’s recommendations, Hox (1998) suggests a minimum ratio of 50 groups to 20 observations per group in order to test cross-level interactions, and suggests a minimum ratio of 100 to 10 to test random effects. No specific guidance is provided on the lowest possible number of observations per group or the minimum number of groups required when there are a small number of observations, but these recommendations carry the implication that smaller group sizes of number of groups may be problematic. Nonconvergence and sample bias 11 In our review of the simulation and analytic work on HLM estimates, it appears that dyadic researchers have little information to draw from in making analytic decisions. Authors have most often concluded that the number of groups is more important than group size. No simulation studies to date have investigated group sizes as small as two. Although two studies have noted problems with convergence or nonpositive definite matrices that tend to increase with fewer observations per group, more information is clearly needed on these estimation problems. Thus, a variety of questions remain about convergence difficulties and the behavior of parameter estimates and their standard errors when dyadic data are analyzed with hierarchical models. For instance, what is the minimum number of dyads needed to obtain unbiased parameter estimates? How likely are convergence difficulties in estimating random effects when dyadic data are used, and under what conditions are they most or least likely to occur? To what extent do bias and convergence problems depend on the number of dyads or the ICC? To address these questions, we examine nonconvergence problems, bias in fixed effects estimates (intercept and slope estimates), bias in random effects estimates (variance), and bias in standard errors for these estimates in a large simulation study. These factors are examined under a wide variety of ICC values and sample sizes in which population values are known. Method Design A Monte Carlo study was undertaken to examine the effects of intraclass correlation and sample size on nonconvergence, parameter bias, and standard error bias. The study was a 4 (intraclass correlation) X 5 (sample size) experimental design. With a group size of two, a model estimating both intercept and slope variance is not identified.4 Thus, two separate models were tested in order to examine nonconvergence, parameter bias, and standard error bias in the Nonconvergence and sample bias 12 variances estimates of intercepts and slopes. Model 1 estimated the variance of the intercept and Model 2 estimated the variance of the slope. Each cell of the design contained 200 samples randomly drawn from a larger population so that expected values could be compared with known population values. Data Generation A total of 400,000 observations were generated representing population data for the four intraclass correlation conditions. The first step involved the generation of a dependent variable representing paired, clustered observations. A normal deviate (with mean of 0 and SD of 1) for 50,000 observations was generated using the pseudorandom number generator in SPSS Version 10.0 (i.e., the RV.NORMAL function). To create clustered dyads, a second variable was generated based on the normal deviate plus random error. The degree of random error in the computation of the second variable was then varied to create four data sets with differing correlations between the normal deviate and the second variable (r=.05, .10, .20, and .30). A third variable, representing a predictor with a known correlation (r=.3), was generated using the normal deviate plus random error. The data were then disaggregated to create a dependent variable with one correlated predictor for the four intraclass correlation conditions, representing four populations of 100,000 observations each. The 4 X 5 design involved a total of 4,000 replications. Two hundred samples for each of the 20 conditions were repeatedly drawn from one of the four population data sets using the pseudorandom number generator in SPSS Version 10.0 (i.e., the SAMPLE procedure). Five sample sizes, representing different numbers of dyads, nj=50, 100, 200, 500, and 1000 (i.e., 100, 200, 400, 1000, and 2000 individuals respectively) were chosen to represent a range of sample sizes likely to be employed in psychological research. Nonconvergence and sample bias 13 Model Specification and Data Analysis Analyses were conducted using SAS Version 6.12 PROC MIXED with REML estimation. Two different models were tested: Model 1 specified a random effect of the intercept (i.e., estimation of 00) and Model 2 specified a random effect of the slope (i.e., estimate of 11). The two-level equations for the models are as follows: Model 1 Yij   0 j  1 j x1ij  rij (1.6)  0 j   00  u0 j (1.7) 1 j   10 (1.8) Substituting, this model can be represented by a single equation as: Yij   00   10 x1ij  u0 j  rij (1.9) Yij   0 j  1 j x1ij  rij (1.10)  0 j   00 (1.11) 1 j   10  u1 j (1.12) Model 2 Expressed as a single equation: Yij   00   10 x1ij  u1 j  rij (1.13) Nonconvergence was estimated by the number of samples in which no estimate was available for random effect. A failure to obtain an estimate of the random effect in all cases was due to a nonpositive gamma matrix, which contains the random effects estimates. Nonpositive definiteness occurs because an estimate generated during the iterative process is zero or negative, and because the inversion of such a matrix is undefined, iterations are stopped and no estimate is Nonconvergence and sample bias 14 available. When a nonpositive definite matrix occurs, SAS PROC MIXED provides output for fixed effects estimates but prints a warning regarding the random effects. Other packages, may handle nonpositive solutions differently. For example, by default HLM version 5 resets nonpositive variances to zero during the iteration process. We computed bias estimates for the two fixed effects, o and 1, and the random effect, 00 for Model 1 or 11 for Model 2. Bias was computed by subtracting the expected sample value  from the population value: E ˆ   . To obtain a measure of the magnitude of this bias, we computed a percentage bias measure using a z-value, z  E ˆ   (1.14) SD in which SD represents the standard deviation of sample estimates of the parameter, ˆ .The cumulative probability of the z-value was then obtained, and .5 was subtracted. The probability value was multiplied by 100 to obtain a percent difference from the population mean based on the normal distribution. We chose this computation of percent bias because of difficulties with the use of a more common measure of percent bias that divides bias by the population value. The more common measure of percent bias leads to division by zero if the population value is zero, and, because of the use of random deviates, population values for the intercept were equal to zero. Bias in standard errors was computed by subtracting the standard deviation of the parameter across samples from the expected value of the standard error for that parameter: % BIAS  E  SE   SD SD (1.15) Nonconvergence and sample bias 15 In this equation, SE represents the sample standard error estimate for the parameter, and SD represents the standard deviation of sample estimates. Results Nonconvergence The number and percentage of samples of 200 that had convergence failures is presented in Table 1. For Model 1, in which intercept variance was estimated, nonconvergence was a greater problem with fewer dyads and smaller intraclass correlation. As many as 39% of samples had a nonpositive definite matrix in the condition with the lowest intraclass correlation (ICC = .05) and 50 dyads. With a low ICC, even 1000 dyads were insufficient to avoid convergence problems altogether. With higher ICC values, however, fewer dyads are required to avoid convergence problems, J of 1000, 200, and 100 for ICCs of .10, .20, and .30, respectively. For Model 2, which estimated slope variance, convergence was highly problematic and were seemingly unrelated to intraclass correlation or number of dyads. For nearly all cells of the design, nearly 50% of variance estimates for slopes were unobtainable. Parameter Bias Results for the bias estimates of 0, 1, 00, and 11 appear in Table 2. Bias estimates of fixed effects 0 and 1 show little if any bias across all conditions. Percentage bias of these parameters was below 10% and nearly always below 5%. Moreover, bias estimates were not consistently in a positive or negative direction. Random effects results for 00, and 11, were quite different, however. Both variance estimates were consistently positively biased. Bias in intercept variance estimates in Model 1, 00, decreases as the number of dyads increases. This decrease also appears to be dependent on the ICC, with sample size having a greater effect for higher intraclass coefficients and a slight Nonconvergence and sample bias 16 tendency toward underestimation of intercept variance with high ICCs in large samples. In Model 1, which estimates intercept variance (i.e., 00,), all percentage bias estimates were greater than 10% when ICC = .05. For ICC of .10, .20, and .30, 1000, 200, and 50 dyads, respectively, appear to be sufficient to keep bias in the estimate of intercept variance under 10%. In Model 2, which estimates slope variance (i.e., 11,), bias estimates were greater than 10% in all ICC conditions and were not dependent on sample size. In summary, it appears that there is little bias in intercept or slope parameter estimates. Intercept variance estimates will be problematic unless the ICC is greater than .10 and there is sufficient sample size. Slope variance estimates appear to be positively biased regardless of the ICC value or the sample size and are therefore likely to be impractical for dyadic researchers. Standard Error Bias To investigate potential problems with significance tests of fixed or random effects, we computed bias for the standard error estimates of 0, 1, 00, and 11. Results are presented in Table 3. There are few apparent biases for standard errors of intercepts, with nearly every bias estimate below 10%. However, there did appear to be a trend toward negative bias as ICCs values increased, and this may suggest some potential difficulties with Type I errors for very large number of dyads sizes and higher ICCs. Bias was also generally low in standard error estimates of slopes. Although several cells showed greater than 10% overestimation, these observations appear to be limited to the lower sample size conditions (e.g., fewer than 200 dyads) in Model 2. There was evidence of severe bias for variance estimates in both Model 1 and Model 2. Standard errors for variance estimates of intercepts in Model 1 showed positive bias as high as 69%. Although bias was reduced as sample size increased, as many as 1000 dyads was not Nonconvergence and sample bias 17 sufficient to reduce the bias in the standard error estimates for the intercept when ICC = .05. To obtain unbiased standard error estimates for intercepts (Model 1), 500, 200, and 200 dyads appear to be necessary for ICCs of .10, .20, and .30, respectively. For standard error estimates of slopes (Model 2), the positive bias was quite high for all conditions, with an average bias of approximately 64%. Discussion An extensive study of convergence difficulties and parameter bias shows important problems with the use of HLM with dyadic data whenever random effects estimates are of interest. Although fixed effects estimates of slopes and intercepts and their standard errors show little bias across any sample size or ICC condition, random effects estimates led to frequent convergence difficulties, positive bias in parameters, and positive bias in standard errors. Although intercept random effects were less problematic with larger sample sizes and higher ICC values, there were serious convergence difficulties, parameter bias, and standard error bias for slope random effects. In practice, it will be difficult to estimate intercept or slope variances in many dyadic applications. These results provide greatly needed information on the application of HLM to dyadic data and suggest that such applications are largely impractical when research questions concern variability across dyads. Our findings indicate that convergence with random effects will often be a serious problem, but a proportion of models specifying random effects will converge. In these cases, researchers may be tempted to report their findings, but, because of serious problems with bias in random effects estimates and their standard errors, neither the estimates nor the significance tests using Wald ratios (i.e., z-test using ratio of the estimate to the standard error) can be trusted. Nonconvergence and sample bias 18 Dyadic researchers not interested in random effects estimates could successfully employ HLM, but estimates would have to be assumed to be invariant across dyads. Inclusion or exclusion of random effects may have an impact on fixed effects estimates (Kreft & de Leeuw, 1998; Longford, 1993), however, and therefore constraints on random parameters is not recommended without reasonable justification. When level-2 predictors are incorporated and the variance is set to zero, researchers can conceptualize level-1 parameters as varying nonrandomly (Raudenbush & Bryk, 2002). In this case, slopes or intercepts from level 1 varying completely as a function of some level-2 variable. An additional problem for researchers is that a proportion of models with random effects will converge and estimates of random effects and their significance tests will be obtained. In this study, our focus has been on the application of HLM to dyadic data, but the results also have implications for growth curve models with two time points. One important difference may be the range of ICC values expected with longitudinal data. Many variables may have higher ICC values or consistency over time than were examined in our study (i.e., greater than .3). Our choice of values for ICC was guided by common values encountered in the dyadic literature rather than those expected for repeated measures. The results, however, suggest that ICC values had no effect on slope variance estimates or their standard errors within the range of values we examined, and, thus, there is little reason to expect a difference in this trend with higher ICC values. Although the present study represents valuable information on dyadic analysis with HLM, there are some important limitations of our findings. Perhaps most importantly, convergence difficulties and bias in random effects may be maximally estimated here, because values of both the intercept and slope variance in the population were low. Nonpositive sample Nonconvergence and sample bias 19 variance estimates are most likely to occur under these circumstances, and, thus, bias in sample estimates of intercept or slope variance may be less severe when their respective population parameters are large. Nevertheless, such circumstances are likely to be common in practice where researchers are unaware of true population values. More generally, we did not systematically vary whether random or fixed parameters were correctly or incorrectly specified in the model, and we cannot be certain of the impact of model misspecification on estimates or standard errors. For fixed effects, both models we tested represent some degree of misspecification.. With only two observations per group, one must constrain either the variance estimate for the intercept or slope and their covariance to zero to identify the model. Furthermore, more explicit comparisons between correct and incorrect model specification would be needed for exact information about Type I or Type II error rates. In addition to these issues, there are several other limitations of the present study. First, the models we examined were relatively simple and did not include level-2 predictors or multiple level-1 predictors. It is difficult to imagine, however, that better estimates would be obtained with more complex models. Second, our manipulation of the ICC was not independent of within-group variance or between-group variance, but differences in these variance components are not likely to be independent of ICC values in practice. Because sample ICC values were compared to population ICC values, however, our results provide a fair comparison between sample and population values and thus accurate estimates of bias in each of the ICC conditions. Moreover, Bryk and Raudenbush (1992, p. 221) show that standard errors do not depend on within- or between-group variance when balanced data are used. Third, a number of authors have attempted to retain an equivalent sample size while examining the effects of the number of observations per group and the number of groups (e.g., Bassiri, 1988; Kim, 1990; Mok, 1995). Nonconvergence and sample bias 20 Other authors (Busing, 1993; van der Leeden & Busing, 1994) have attempted to examine group size and the number of groups independently, allowing total sample size to fluctuate. It may be impossible to disentangle the importance of group size, number of groups, and total sample size. Regardless of the resolution of this issue, our focus on dyadic data meant that the effects of the number of groups and the total sample size could not be separated. Our findings do not provide dyadic researchers interested in testing random effects with much reason for optimism, and we can offer few suggestions for existing solutions. Some authors (Bryk & Raudenbush, 1992; Snijders & Bosker, 1999) recommend that variance tests be conducted via likelihood ratio tests using FIML instead of Wald tests. We did not obtain either FIML estimates or nested model tests, and, thus, have no information on the usefulness of this recommendation with dyadic data. Because REML and FIML estimates are similar with larger sample sizes (Kim, 1990), we would expect little difference from our findings using FIML in place of REML. Furthermore, given the rather severe bias in estimates for random effects, it is unlikely nested tests of these biased parameters would be a substantial improvement over Wald ratio tests. Researchers also rarely report or test random effects using nested models, typically relying on the Wald tests automatically generated by the software. van der Leeden et al. (1997) have suggested the use of bootstrap methods for standard error estimates and future work on this may prove useful for testing random effects with dyads. Another possible solution may involve a bias correction factor that could be applied to parameters and standard error estimates. Until such work is completed and tested in simulation work, researchers need to be cautioned against the use of random effects with dyadic data. Nonconvergence and sample bias 21 References Aitkin, M., & Longford, N. (1986). Statistical modeling issues in school effectiveness studies. Journal of the Royal Statistical Society, Series A, 149, 1-43. Barnett, R.C., Marshall, N.L., Raudenbush, S.W., & Brennan, R.T. (1993). Gender and the relationship between job experiences and psychological distress: A study of dual-earner couples. Journal of Personality and Social Psychology, 64, 794-806. Bassiri, D. (1988). Large and small sample properties of maximum likelihood estimates for the hierarchical linear model. Unpublished doctoral dissertation. East Lansing, MI, Michigan State University. Bliese, P.D. (1998). Group size, ICC values, and group-level correlations: A simulation. Organizational Research Methods, 1, 355-373 Bryk, A.S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage. Busing, F.M.T.A. (1993). Distribution characteristics of variance estimates in two-level models. Preprint PRM 93-04. Psychometric and Research Methodology, Leiden, Netherlands. de Leeuw, J. & Kreft, I. (1986). Random coefficient models for multilevel analysis. Journal of Educational Statistics, 11, 57-85. Donoghue, J.R., & Jenkins, F. (1992). A Monte Carlo study of ethe effects of model misspecification on HLM estimates. Technical report, Educational Testing Service, Princeton, NJ. November 1992. Efron (1990). More efficient bootstrap computations. Journal of the American Statistical Association, 85, 79-89. Nonconvergence and sample bias 22 Goldstein, H. (1986). Multilevel mixed linear model analysis with iterative generalized least squares. Biometrika, 73, 43-56. Hox, J. (1998). Multilevel modeling: When and why. In R.Mathar & M. Schader, Classifcation, data analysis, and data highways. Berlin, Germany: Springer-Verlag. Kashy, D.A., & Kenny, D.A. (2000). The analysis of data from dyads and groups. In H.T. Reis & C. M. Judd, Handbook of research methods in social and personality psychology (pp. 451-477). Cambridge: Cambridge University Press. Kim, K.-S. (1990). Multilevel data analysis: A comparison of analytic alternatives. Unpublished doctoral dissteration. University of California, Los Angeles, CA. Kreft, I.G.G. (1996). Are multilevel techniques necessary? An overview, including simulation studies. Unpublished manuscript, California State University, Los Angeles, CA. Kreft, I.G.G., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage. Kreft, I.G.T., de Leeuw, J., & Aiken, L. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1-22. Kurdek, L.A. (1997). Adjustment to relationship dissolution in gay, lesbian, and heterosexual partners. Personal Relationships, 4, 145-161. Longford, N. (1993). Random coefficient models. Oxford, England: Oxford University Press. Maas, C.J.M., & Hox, J.J. (2002). Sufficient sample sizes for multilevel modeling. Unpublished manuscript, Utrecht University, The Netherlands. Maguire, M. C. (1999). Treating the dyad as the unit of analysis: A primer on three analytic approaches. Journal of Marriage and Family, 61, 213-223 Nonconvergence and sample bias 23 Mason, W., Wong, G., & Entwisle, B. (1984). Contextual analysis through the multilevel linear model. In S. Leinhardt, (Ed.), Sociological Methodology (pp. 72-103). San Francisco: Jossey-Bass. Mok, M. (1995) Sample size requirements for 2-level designs in educational research. Unpublished manuscript, Macquarie University, Sydney Australia. Newsom, J.T. (in press). A multilevel structural equation model for dyadic data. Structural Equation Modeling. Raudenbush, S., & Bryk, A. (1986). A hierarchical model for studying school effects. Sociology of Education, 59, 1-17. Raudenbush, S., & Bryk, A. (2002). Hierarchical linear models: Applications and data analysis methods (second edition). Thousand Oaks, CA: Sage. Kreft, IG.G., de Leeuw, J., & Aiken, L. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30, 1-22. Snijders, T. A.B., & Bosker, R.J. (1994). Modeled variance in two-level models. Sociological Methods and Research, 22, 342-363 Snijders, T. A.B., & Bosker, R.J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage. van der Leeden, R., & Busing, F.M.T.A. (1994). First iteration versus IGLS/RIGLS estimates in two-level models: A Monte Carlo study with ML3. Preprint PRM 94-03. Psychometrics and Research Methodology, Leiden, Netherlands. Windle, M., & Dumenci, L. (1997). Parental and occupational stress as predictors of depressive symptoms among dual-income couples: A multilevel modeling approach. Journal of Marriage & the Family, 59, 625-634. Nonconvergence and sample bias 24 Table 1 Random Effect Nonconvergence in Models 1 (Intercept) and 2 (Slope). ICC .05 Number of dyads ICC .10 ICC .20 ICC .30 n % n % n % n % 50 78 39.0 54 27.0 14 7.0 5 2.5 100 59 29.5 49 24.5 5 2.5 0 0.0 200 54 27.0 26 13.0 0 0.0 0 0.0 500 29 14.5 2 1.0 0 0.0 0 0.0 1000 15 7.5 0 0.0 0 0.0 0 0.0 50 115 57.5 113 56.5 111 55.5 114 57.0 100 114 57.0 99 49.5 98 49.0 109 54.5 200 113 56.5 104 52.0 89 44.5 115 57.5 500 118 59.0 98 49.0 81 40.5 114 57.0 1000 114 57.0 113 56.5 71 35.5 122 61.0 Model 1 (Intercept variance) Model 2 (Slope variance) Nonconvergence and sample bias 25 Table 2 Sample Bias of Parameter Estimates for Intercepts, Slopes, and Random Effects. B0 B1 T00/ T11 Number of dyads bias % bias % bias % 50 0.00362 1.64 0.00773 3.23 0.07344 32.69 100 -0.00392 -2.37 -0.00303 -1.85 0.04269 28.18 200 -0.00264 -1.98 0.00559 4.65 0.02815 24.58 500 -0.00598 -7.74 0.00468 6.38 0.01518 18.29 1000 -0.00212 -3.95 0.00348 6.66 0.00807 12.83 50 0.00816 3.24 -0.00284 -1.15 0.05591 21.35 100 0.00295 1.72 0.00914 5.11 0.03022 16.57 200 -0.00323 -2.56 0.00455 3.88 0.01173 8.47 500 0.00036 0.48 0.00137 1.88 0.00517 5.39 1000 -0.00019 -0.36 0.00069 1.29 0.00003 0.042 50 -0.00239 -0.95 -0.00418 -1.84 0.02768 9.10 100 -0.00169 -0.92 -0.00303 -1.98 0.01978 9.45 200 0.00015 0.12 0.00092 0.82 0.00436 2.77 500 -0.00155 -1.95 -0.00139 -1.90 0.00073 0.67 1000 -0.00112 -1.96 0.00012 0.22 -0.00013 -0.18 50 -0.01604 -5.71 -0.00599 -2.59 0.01200 4.06 100 -0.00641 -3.55 -0.00080 -0.48 0.00815 3.73 200 -0.00095 -0.71 0.00173 1.46 -0.00006 -0.04 500 0.00012 0.14 -0.00043 -0.59 -0.00254 -2.58 1000 -0.00068 -1.19 -0.00069 -1.28 -0.00610 -7.80 Model 1 ICC .05 ICC .10 ICC .20 ICC.30 Nonconvergence and sample bias 26 (Table 2 continued) B0 B1 T00/ T11 Number of dyads bias % bias % bias % Model 2 ICC .05 50 0.00526 2.33 0.00525 2.36 0.07022 39.00 100 -0.00055 -0.34 0.00739 4.64 0.04671 39.15 200 -0.00260 -1.99 0.00350 2.83 0.02936 39.72 500 -0.00093 -1.15 0.00463 6.71 0.01955 39.34 1000 0.00116 2.17 0.00252 4.68 0.01204 36.18 50 -0.00454 -1.90 0.00120 0.47 0.07712 35.78 100 0.00010 0.053 0.00670 4.03 0.04568 37.56 200 -0.00096 -0.79 0.00285 2.17 0.03518 39.37 500 0.00059 0.83 0.00083 1.10 0.01821 37.05 1000 -0.00986 2.24 0.00070 -0.26 0.07924 43.08 50 -0.00986 -4.03 0.00070 0.29 0.07924 31.01 100 -0.00307 -1.79 0.00440 2.81 0.04561 35.51 200 -0.00433 -3.26 0.00068 0.56 0.03281 36.30 500 -0.00213 -2.65 0.00056 0.69 0.01817 34.42 1000 -0.00189 -3.51 0.00107 1.96 0.01169 33.12 50 -0.00142 -5.06 -0.01358 -5.48 0.06667 35.31 100 0.00308 1.62 0.00168 1.01 0.04562 39.18 200 -0.00154 -1.25 0.00590 5.11 0.03341 40.27 500 -0.00180 -2.32 -0.00145 -2.22 0.01862 39.91 1000 0.00279 4.97 0.00199 3.80 0.01222 41.34 ICC .10 ICC .20 ICC .30 Nonconvergence and sample bias 27 Table 3 Sample Bias of Standard Error Estimates for Intercepts, Slopes, and Random Effects. B0 B1 T00 /T11 Number of dyads J Bias % Bias % Bias % Model 1 ICC .05 50 122 0.01251 14.18 -0.00074 -0.78 0.05404 69.31 100 141 0.00390 5.91 0.00130 1.98 0.03617 65.96 200 146 -0.00401 -7.52 -0.00029 -0.60 0.02215 52.04 500 171 0.00032 1.03 0.00093 3.18 0.00876 27.48 1000 185 0.00047 2.18 0.00051 2.46 0.00408 16.56 50 146 0.00268 2.68 -0.00282 -2.85 0.03595 36.23 100 151 0.00367 5.37 -0.00356 -5.00 0.02359 33.42 200 174 -0.00018 -0.35 0.00100 2.13 0.01069 19.47 500 198 0.00085 2.77 0.00092 3.18 0.00283 7.41 1000 200 0.00156 7.54 -0.00002 -0.12 0.00129 4.67 50 186 0.00499 4.97 0.00325 3.60 0.01590 13.22 100 195 0.00076 1.04 0.00497 8.15 0.01161 14.05 200 200 -0.00038 -0.73 0.00183 4.07 0.00311 4.96 500 200 0.00108 3.41 0.00048 1.64 -0.00191 -4.41 1000 200 0.00047 2.09 -0.00001 -0.08 -0.00057 -1.90 50 195 -0.00355 -3.18 -0.00156 -1.70 0.01924 16.33 100 200 0.00490 6.82 -0.00163 -2.45 0.00987 11.33 200 200 0.00042 0.78 -0.00132 -2.80 0.00203 3.09 500 200 0.00053 1.57 0.00017 0.57 0.00309 7.88 1000 200 0.00120 5.27 -0.00098 -4.57 -0.00110 -3.49 ICC .10 ICC.20 ICC .30 Nonconvergence and sample bias 28 (Table 3 Continued ) B0 B1 T00 /T11 Number of dyads J Bias % Bias % Bias % Model 2 ICC .05 50 122 0.00602 6.73 0.01743 19.86 0.04722 82.50 100 141 0.00172 2.63 0.00792 12.49 0.02738 72.39 200 146 -0.00464 -8.88 0.00057 1.16 0.01988 85.69 500 171 -0.00227 -7.01 0.00362 13.23 0.00963 61.29 1000 185 -0.00005 -0.24 0.00016 0.76 0.00614 55.50 50 146 0.00034 0.36 0.00363 3.56 0.03425 47.55 100 151 -0.00650 -8.81 0.00580 8.75 0.02711 68.43 200 174 -0.00060 -1.25 -0.00188 -3.57 0.01665 59.02 500 198 0.00190 6.73 0.00113 3.77 0.00997 61.80 1000 200 0.00159 7.53 0.00038 1.79 0.00779 74.53 50 186 -0.00275 -2.82 0.01020 10.75 0.01491 16.53 100 195 -0.00112 -1.63 0.00976 15.62 0.02539 58.93 200 200 -0.00535 -10.11 0.00199 4.15 0.01477 49.25 500 200 -0.00191 -5.97 -0.00078 -2.44 0.00851 47.41 1000 200 -0.00017 -0.77 0.00019 0.86 0.00612 50.15 50 195 -0.01690 -15.15 0.00434 4.40 0.03493 55.01 100 200 -0.00844 -11.16 0.00552 8.36 0.02891 78.35 200 200 -0.00134 -2.74 0.00394 8.57 0.01916 74.39 500 200 -0.00090 -2.91 0.00413 15.37 0.01179 80.77 1000 200 -0.00105 -4.73 0.00078 3.75 0.00851 94.93 ICC .10 ICC .20 ICC .30 Nonconvergence and sample bias 29 Footnotes 1 Although there are several estimators available with HLM, such as restricted maximum likelihood, generalized least squares, or ordinary least squares, we will use “ordinary least squares” to refer to standard, nonhierarchical linear regression, and “HLM” to refer to multilevel regression using any of these possible estimators. 2 The notation for  does not typically involve an exponent, although it is a variance rather than a standard deviation. 3 Although several of the studies mentioned here (e.g., van der Leeden & Busing, 1994) address issues of the relative efficiency of different estimators, such as full maximum likelihood and generalized least squares, we focus our discussion on findings that are applicable to REML, by far the most commonly used estimator. 4 It is not possible to estimate both variances (or “random effects”) in hierarchical linear models, because the number of random parameters that can be estimated is limited by the number of variancecovariance matrix elements within each level-2 unit (Snijders & Bosker, 1999). The variancecovariance matrix among level-1 units is given by:   00  2 01 xi 1   11 xi21   2  00   01  xi 1  xi 2    11 xi 1 xi 2     00  2 01 xi 2   11 xi22   2   00   01  xi 1  xi 2    11 xi 1 xi 2 where  00 ,  11 and  01 are the variance estimates for the intercept and slope and the covariance between intercept and slope, respectively, and  2 is the variance of the level-1 residual. xi 1 and xi  2 is the predictor variable for the two observations in a dyad (Goldstein, 1999). The level-1 variancecovariance matrix for dyadic data (i.e., 2 observations per group or nj = 2), given above, contains n j  n j  1 / 2 or 2  2  1 / 2  3 unique elements. A model containing a single level-1 predictor with random effects estimated for the intercept and slope, such as that in the models tested here requires 4 Nonconvergence and sample bias 30 parameters to be estimated:  00 ,  11 ,  01 , and  2 . In general,  q  1 q  2   / 2  1 parameters are required for a model with all possible random variances and covariances. With dyadic data, a model with a random intercept and slope results in 1 too many parameters to be estimated given the number of covariance elements available, leading to identification problems.

HIERARCHICAL LINEAR MODELING OF DYADIC DATA

Related documents

Products

Support

HIERARCHICAL LINEAR MODELING OF DYADIC DATA

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib