Heterogeneous Variance 1

Effect Size Estimation from Pretest-Posttest-Control Designs with Heterogeneous Variances

Scott B. Morris
Illinois Institute of Technology

Paper Presented at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA, April 2005.

The effectiveness of meta-analysis depends on the quality of the effect size estimates from primary research results. It is critical that effect size estimates be unbiased and that the sampling properties of the effect size estimates be known. In particular, meta-analytic procedures require estimates of the sampling variance of effect sizes in order to obtain optimal weights, to build confidence intervals, and to estimate between-study variance components (Hedges & Olkin, 1985).

The standardized mean difference, d, is a common index of effect size for meta-analysis of the effectiveness of organizational interventions. The techniques for meta-analysis of d were developed under the common assumptions of independence, normality and homogeneity of variance (Hedges & Olkin, 1985). Research has shown that violating these assumptions can bias meta-analytic results (Grissom & Kim, 2001; Harwell, 1997; Morris, 2004). Therefore, it is important to develop methods that are robust to violation of these assumptions.

This paper will discuss the Pretest-Posttest-Control (PPC) design, a popular design for assessing the effectiveness of organizational interventions. The PPC design involves two independent groups of participants assigned to alternate treatment conditions (e.g., treatment and control groups). All participants are measured before and after the intervention, allowing the measurement of individual change.
Change scores, however, may be influenced by maturation, spontaneous remission, or historical events that occur between measurement occasions (Cook & Campbell, 1979). By comparing the change in the treatment group to the change observed in a control or placebo group, bias due to the analysis of change scores can be reduced. Consequently, the PPC design is often preferred over designs where the outcome is measured only at posttest or designs with no control group (Cook & Campbell, 1979).

Methods for meta-analysis of effect sizes from the PPC design have been described in a number of sources (Becker, 1988; Carlson & Schmidt, 1999; Morris & DeShon, 2002). All of the existing methods were developed under the assumption that variances are homogeneous across treatment groups at both pretest and posttest. Although this assumption is often justified (Hedges, 1981), there are many situations where it is reasonable to expect differences in the variance of the outcome variable across treatment conditions.

A potential cause of variance heterogeneity is the treatment x subject interaction (Cook & Campbell, 1979). If the effectiveness of a treatment is not the same for all research participants, some individuals will show a greater change due to treatment than others will. Consequently, the post-treatment variance will reflect both initial individual differences and differences in the effectiveness of treatment. In contrast, the variance of scores in the control group will only reflect initial individual differences, because the individuals do not receive the treatment. This pattern of variances is common in studies of training effectiveness (Carlson & Schmidt, 1999).

Variance heterogeneity has implications for the definition of the effect size estimate as well as its sampling distribution.
The impact of variance heterogeneity on the definition of the effect size estimate was given thorough consideration in early work on meta-analysis (Glass, McGaw & Smith, 1981); however, this work did not address the sampling variance of the proposed effect size. This paper will review the rationale for the Glass et al. (1981) effect size estimate, and then derive the sampling distribution for this statistic.

Estimating the Standardized Mean Difference

In order for the effect size to have a consistent interpretation across studies, it must be expressed in a common metric. Because a collection of studies will often use a variety of measures, the mean difference between groups will not be directly comparable across studies. The standardized mean difference avoids this problem by dividing the mean difference by the within-group standard deviation. This removes differences due to the scaling of the dependent variables, and promotes comparability of effect sizes across studies.

This approach assumes that, except for scaling differences, the variance of the dependent variable is the same across studies. If two studies have a common scale but different variances, the effect sizes from the two studies will have different interpretations, and cannot be meaningfully combined in a meta-analysis.

In the PPC design, participants are assigned to either a treatment or control condition, and each participant is measured both before and after treatment occurs. Therefore, there are four means and four standard deviations that could be used to define the effect size. The pattern of heterogeneity among these four cells will depend on the theoretical mechanism causing the variances to differ. A likely source of variance heterogeneity is differential treatment effects.
To the extent that individuals receive differing amounts of treatment or treatment is more effective for some than others (a subject by treatment interaction), posttest variance in the treatment group will be inflated relative to pretest variance. For these reasons, Glass et al. (1981) argued that the posttest standard deviation of the treatment group might not be comparable across studies, even after removing differences due to scaling. Therefore, the posttest standard deviation of the treatment group is not a good standardizer.

The standard deviation of the pretest scores, on the other hand, is more likely to be consistent across studies (assuming participants are sampled from the same population). Because pretest scores are measured before treatment has been administered, they will not be affected by the differential treatment effect. Therefore, an effect size defined in terms of the pooled pretest standard deviation across treatment and control groups is likely to have metric comparability across studies. The posttest variance in the control group should also be unaffected by the differential treatment effect. However, pooling effect sizes across pretest and posttest scores in the control group complicates the distribution of the effect size (Morris, 2003), and will not be considered here.

Although the recommendation that effect sizes be defined using pretest standard deviations has been repeated in many treatments of meta-analysis (Becker, 1988; Carlson & Schmidt, 1999; Morris & DeShon, 2002), the impact of the recommendation on meta-analysis procedures has not been fully investigated. Specifically, there has been little consideration of the impact of variance heterogeneity on the sampling variance of effect size estimates. Sampling variance plays a central role in almost all uses of effect size estimates.
Sampling variance is used to construct confidence intervals around individual effect size estimates; it may be used to define weights for estimating the mean effect size in a meta-analysis; and it provides the basis for tests of homogeneity of effect size and estimates of random variance components in random effects meta-analysis. Using the pretest standard deviation does not eliminate the effect of variance heterogeneity on the sampling variance of the mean difference. The current research develops the sampling variance of the standardized mean difference from a PPC design when variances are heterogeneous, and illustrates how failure to use the correct sampling variance can lead to inaccurate conclusions in meta-analysis.

Definition of the Effect Size

The data are assumed to be randomly sampled from two populations, corresponding to treatment and control conditions. Pretest and posttest scores in each population have a bivariate normal distribution with correlation $\rho$. The pretest scores from both populations and the posttest scores in the control group are assumed to have equal variance, $\sigma^2$, while the posttest variance in the treatment group, $\sigma^2_{T,post}$, may differ from the others. The means are indicated by $\mu_{T,pre}$ for the treatment population pretest, $\mu_{T,post}$ for the treatment population posttest, $\mu_{C,pre}$ for the control group pretest, and $\mu_{C,post}$ for the control group posttest.

The standardized mean change in each population is defined as the mean difference between posttest and pretest scores, divided by the common pretest standard deviation. The standardized mean change for the treatment group ($\delta_T$) is

\[
\delta_T = \frac{\mu_{T,post} - \mu_{T,pre}}{\sigma}. \tag{1}
\]

The standardized mean change for the control group ($\delta_C$) is

\[
\delta_C = \frac{\mu_{C,post} - \mu_{C,pre}}{\sigma}. \tag{2}
\]

The effect size for the PPC design is defined as the difference between the standardized mean change for the treatment and control groups (Becker, 1988; Carlson & Schmidt, 1999; Morris & DeShon, 2002),

\[
\delta = \delta_T - \delta_C = \frac{(\mu_{T,post} - \mu_{T,pre}) - (\mu_{C,post} - \mu_{C,pre})}{\sigma}.
\]
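As a small worked illustration of the population definitions above (Equations 1 and 2 and their difference), the following Python sketch computes the PPC effect size from a set of population parameters. The function names and the numeric values are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```python
def standardized_mean_change(mu_post: float, mu_pre: float, sigma: float) -> float:
    """Equations 1 and 2: pre-post mean change in units of the pretest SD."""
    return (mu_post - mu_pre) / sigma


def ppc_effect_size(mu_t_pre: float, mu_t_post: float,
                    mu_c_pre: float, mu_c_post: float, sigma: float) -> float:
    """Population PPC effect size: delta_T minus delta_C."""
    delta_t = standardized_mean_change(mu_t_post, mu_t_pre, sigma)
    delta_c = standardized_mean_change(mu_c_post, mu_c_pre, sigma)
    return delta_t - delta_c


# Hypothetical values: the treatment group improves by 8 points, the control
# group by 2 points, and the common pretest SD is 10.
delta = ppc_effect_size(mu_t_pre=50.0, mu_t_post=58.0,
                        mu_c_pre=50.0, mu_c_post=52.0, sigma=10.0)
print(round(delta, 3))
```

With these made-up values the treatment group changes by 0.8 pretest standard deviations and the control group by 0.2, so the effect size is their difference. Note that the treatment-group posttest variance does not enter the definition at all, which is the point of standardizing on the pretest standard deviation.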
An individual study consists of $n_T$ participants receiving treatment and $n_C$ participants in the control group. The pretest and posttest means for the treatment group are indicated by $M_{pre,T}$ and $M_{post,T}$, respectively. The pretest and posttest means for the control group are indicated by $M_{pre,C}$ and $M_{post,C}$, respectively. A separate estimate of the standard deviation can be obtained for the treatment group at pretest ($SD_{pre,T}$) and posttest ($SD_{post,T}$), and for the control group at pretest ($SD_{pre,C}$) and posttest ($SD_{post,C}$). Previous research (Carlson & Schmidt, 1999; Morris, 2003) suggests defining the effect size using the pooled pretest standard deviation,

\[
SD_P = \sqrt{\frac{(n_T - 1)\,SD_{pre,T}^2 + (n_C - 1)\,SD_{pre,C}^2}{n_T + n_C - 2}}. \tag{3}
\]

An unbiased estimate of the effect size is given by

\[
d_{PPC} = c\,\frac{(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})}{SD_P}, \tag{4}
\]

where $c$ is Hedges' (1981) bias correction, which is approximately

\[
c \approx 1 - \frac{3}{4(n_T + n_C - 2) - 1}. \tag{5}
\]

Distribution of Effect Size Estimate

Under the assumptions of the model, the sample mean contrast in the numerator of the effect size estimate (i.e., the difference between treatment and control group change scores) is normally distributed, and is an unbiased estimate of the contrast among population means. If homogeneity of variance is assumed across all cells, the variance of the mean contrast is

\[
\mathrm{var}\left[(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})\right] = 2\sigma^2(1 - \rho)\left(\frac{1}{n_T} + \frac{1}{n_C}\right). \tag{6}
\]

However, when the variance of the treatment group posttest scores differs from the other conditions, the variance of the mean contrast becomes

\[
\mathrm{var}\left[(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})\right] = \frac{\sigma_{T,post}^2 + \sigma^2 - 2\rho\,\sigma\,\sigma_{T,post}}{n_T} + \frac{2\sigma^2(1 - \rho)}{n_C}. \tag{7}
\]

Let $h$ be the ratio of the variance of the mean contrast to the variance of pretest scores,

\[
h = \frac{\mathrm{var}\left[(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})\right]}{\sigma^2}. \tag{8}
\]

For the pattern of variance heterogeneity described above, writing $\sigma_T^2$ for the treatment-group posttest variance ($\sigma_{T,post}^2$) and $\sigma_C^2$ for the common variance of the other three cells ($\sigma^2$),

\[
h = \frac{\dfrac{\sigma_T^2}{\sigma_C^2} + 1 - 2\rho\,\dfrac{\sigma_T}{\sigma_C}}{n_T} + \frac{2(1 - \rho)}{n_C}. \tag{9}
\]

Let $g_{PPC}$ be the sample effect size estimate without the bias correction,

\[
g_{PPC} = \frac{(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})}{SD_P}. \tag{10}
\]

The sample effect size divided by $\sqrt{h}$ would be

\[
\frac{g_{PPC}}{\sqrt{h}} = \frac{(M_{post,T} - M_{pre,T}) - (M_{post,C} - M_{pre,C})}{SD_P\,\sqrt{h}}, \tag{11}
\]

which is distributed as a noncentral t (Huynh, 1989) with $df = n_T + n_C - 2$ (the degrees of freedom of the pooled pretest standard deviation) and noncentrality parameter

\[
\lambda = \frac{\delta}{\sqrt{h}}. \tag{12}
\]

Therefore, $g_{PPC}$ is distributed as $\sqrt{h}$ times t (Huynh, 1989), and the unbiased estimate, $d_{PPC} = c\,g_{PPC}$, is distributed as $c\sqrt{h}$ times t, where $c$ is the bias factor approximated by Equation 5. The expectation and variance of the noncentral t (Johnson & Kotz, 1970) are given by

\[
E(t) = \frac{\lambda}{c}, \tag{13}
\]

and

\[
\mathrm{var}(t) = \frac{df}{df - 2}\left(1 + \lambda^2\right) - \frac{\lambda^2}{c^2}. \tag{14}
\]

Therefore, the expected value of $d_{PPC}$ is

\[
E(d_{PPC}) = c\sqrt{h}\,\frac{\lambda}{c} = \delta. \tag{15}
\]

The heterogeneity-assumed variance of $d_{PPC}$ is $c^2 h$ times the variance of t. Therefore,

\[
\sigma^2_{HET}(d_{PPC}) = c^2\left(\frac{n_T + n_C - 2}{n_T + n_C - 4}\right)\left(h + \delta^2\right) - \delta^2, \tag{16}
\]

or,

\[
\sigma^2_{HET}(d_{PPC}) = c^2\left(\frac{n_T + n_C - 2}{n_T + n_C - 4}\right)\left[\frac{\dfrac{\sigma_T^2}{\sigma_C^2} + 1 - 2\rho\,\dfrac{\sigma_T}{\sigma_C}}{n_T} + \frac{2(1 - \rho)}{n_C} + \delta^2\right] - \delta^2. \tag{17}
\]

Most current methods for meta-analysis assume homogeneity of variance (e.g., Hedges & Olkin, 1985), in which case the homogeneity-assumed variance of the effect size would be

\[
\sigma^2_{HOM}(d_{PPC}) = c^2\left(\frac{n_T + n_C - 2}{n_T + n_C - 4}\right)\left[\frac{2(1 - \rho)(n_T + n_C)}{n_T\,n_C} + \delta^2\right] - \delta^2. \tag{18}
\]

In cases where heterogeneity of variance exists, the use of Equation 18 can be quite inaccurate. Table 1 illustrates the difference between the homogeneity-assumed variance (Equation 18) and the true variance (Equation 17) when $\delta$ = 1 and $\rho$ = .5, and under different levels of heterogeneity and sample size. The results clearly show that assuming homogeneity can result in substantial bias in variance estimates. Specifically, when the treatment group has the larger variance, the homogeneity-assumed variance tends to underestimate the true variance. When the treatment group variance was four times larger than the control group variance, the homogeneity-assumed variance underestimated the true variance by as much as 56%.
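As a numerical check on the variance formulas above, the sketch below implements Equations 5, 9, 17 and 18 in Python and evaluates them for a few of the conditions in Table 1. The function and parameter names (for example `var_ratio` for the ratio of treatment-group posttest variance to the common variance) are hypothetical, and the bias correction uses the approximation in Equation 5 rather than an exact expression.

```python
import math


def bias_correction(n_t: int, n_c: int) -> float:
    """Equation 5: approximate Hedges (1981) bias correction."""
    return 1.0 - 3.0 / (4.0 * (n_t + n_c - 2) - 1.0)


def h_ratio(n_t: int, n_c: int, rho: float, var_ratio: float = 1.0) -> float:
    """Equation 9. var_ratio is sigma_T^2 / sigma_C^2 (1.0 means homogeneity)."""
    sd_ratio = math.sqrt(var_ratio)
    return (var_ratio + 1.0 - 2.0 * rho * sd_ratio) / n_t + 2.0 * (1.0 - rho) / n_c


def var_d_ppc(delta: float, rho: float, n_t: int, n_c: int,
              var_ratio: float = 1.0) -> float:
    """Equation 17; with var_ratio = 1.0 it reduces to Equation 18."""
    c = bias_correction(n_t, n_c)
    df = n_t + n_c - 2
    h = h_ratio(n_t, n_c, rho, var_ratio)
    return c ** 2 * (df / (df - 2)) * (h + delta ** 2) - delta ** 2


# Conditions taken from Table 1 (delta = 1, rho = .5, variance ratio = 4).
for n_t, n_c in [(10, 10), (10, 25), (25, 10), (25, 25)]:
    hom = var_d_ppc(1.0, 0.5, n_t, n_c)                  # Equation 18
    het = var_d_ppc(1.0, 0.5, n_t, n_c, var_ratio=4.0)   # Equation 17
    print(n_t, n_c, round(hom, 3), round(het, 3))
```

For $n_T = n_C = 10$ this gives approximately 0.238 under homogeneity and 0.445 with a variance ratio of 4, in line with the corresponding entries of Table 1. Treating Equation 18 as the special case `var_ratio = 1.0` of Equation 17 is a design choice of this sketch; it keeps the two formulas from drifting apart.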
Misestimating the sampling variance could have serious implications for the conclusions drawn from a meta-analysis. Estimates of sampling variance are needed to compute the weighted mean effect size and test for homogeneity of effect size across studies. The following section discusses the impact of using the incorrect variance formula on these procedures.

Meta-Analytic Procedures

Mean Effect Size

In a meta-analysis, the researcher is generally interested in estimating the mean effect size and testing for homogeneity of effect size. The most precise estimate of the mean effect size is obtained by weighting the individual effect sizes by the reciprocal of the variance (Hedges & Olkin, 1985),

\[
\bar{d} = \frac{\sum_{j=1}^{k} w_j d_j}{\sum_{j=1}^{k} w_j}, \tag{19}
\]

where $k$ is the number of studies, and

\[
w_j = \frac{1}{\sigma^2(d_j)}. \tag{20}
\]

In general, more accurate estimates of the variance should lead to a more precise weighted mean. Therefore, when variances are unequal, using $\sigma^2_{HET}(d)$ should result in a more precise estimate of the mean than using $\sigma^2_{HOM}(d)$. However, the benefit of using the more accurate variance estimate is complicated by the fact that the weights must be based on sample estimates of the population parameters.

Using sample statistics to define the weights can create bias in the weighted mean. Because both the effect size estimate and the weight are based on the same data, they will tend to be correlated across samples, and this correlation creates bias in the weighted mean. Previous research has shown that this bias tends to be very small when variances are homogeneous (Van Den Noortgate & Onghena, 2003). When variances are homogeneous, the weight depends only on the sample size, the estimates of the effect size, and the estimate of the pre-post correlation. Furthermore, the bias can be avoided simply by computing the weights using the mean effect size and mean correlation rather than the sample statistics.
However, for $\sigma^2_{HET}(d)$, the weight also depends on the ratio of treatment group to control group variance. Because the sample effect size and the sample variance ratio both depend on the standard deviation of the control group, the correlation and the resulting bias may be non-trivial.

Homogeneity of Effect Size

Researchers are also interested in determining whether the treatment effect is homogeneous across a pool of studies. Hedges' (1981) Q-test is commonly used to test whether the observed variance in effect sizes is larger than expected due to sampling error,

\[
Q = \sum_{j=1}^{k} \frac{(d_j - \bar{d})^2}{\sigma^2(d_j)}. \tag{21}
\]

Under the null hypothesis of homogeneity, Q has a chi-square distribution with $k - 1$ df. When the variance of the treatment group is larger than the variance of the control group, using $\sigma^2_{HOM}(d)$ in the Q-test would lead to underestimation of true variance, and correspondingly exaggerated values of Q. The potential bias in Q could be remedied by using the correct variance formula, $\sigma^2_{HET}(d)$.

As with the weighted mean, it is unknown to what extent sampling error in the estimate of the variance ratio will affect the results of the Q-test using $\sigma^2_{HET}(d)$. Thus, while the more accurate formula is correct asymptotically, it is not clear how well procedures based on the improved variance estimate will perform in small samples. To explore the viability of the proposed method, a Monte Carlo simulation was conducted to examine the accuracy of meta-analytic results using the modified variance formula (Equation 17).

Monte Carlo Simulation

A Monte Carlo simulation explored the effectiveness of alternate variance estimates under conditions of both homogeneous and heterogeneous variance. The simulation was repeated under a variety of conditions likely to influence the accuracy of meta-analytic results, such as the effect size, the pre-post correlation, the sample size, and the number of studies.
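The two meta-analytic quantities that the simulation evaluates, the inverse-variance weighted mean (Equations 19 and 20) and the Q statistic (Equation 21), can be sketched in a few lines of Python. The function names and the example numbers are hypothetical; the sampling variances passed in could come from either the homogeneity-assumed or the heterogeneity-assumed formula. The constant 16.919 is the standard tabled .05 critical value of chi-square with 9 df, which applies when k = 10.

```python
from typing import Sequence


def weighted_mean(d: Sequence[float], variances: Sequence[float]) -> float:
    """Equations 19-20: mean effect size weighted by reciprocal variances."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, d)) / sum(weights)


def q_statistic(d: Sequence[float], variances: Sequence[float]) -> float:
    """Equation 21: squared deviations from the weighted mean, each divided
    by the sampling variance of that study's effect size."""
    d_bar = weighted_mean(d, variances)
    return sum((x - d_bar) ** 2 / v for x, v in zip(d, variances))


CHI2_CRIT_DF9 = 16.919  # .05 critical value for k - 1 = 9 df

# Hypothetical effect sizes and sampling variances for three studies.
effects = [0.2, 0.5, 0.8]
variances = [0.1, 0.1, 0.1]
d_bar = weighted_mean(effects, variances)
q = q_statistic(effects, variances)
print(round(d_bar, 3), round(q, 3))
```

Because every variance appears in a denominator of Q, a formula that understates the sampling variances inflates Q, which is the mechanism behind the Type I error inflation examined in the simulation.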
For the simulation, the set of studies in a meta-analysis either all had homogeneous variance or all had an equal degree of heterogeneous variance. For the heterogeneous conditions, the variance of the treatment group posttest was 4.0, while the variances of the treatment group pretest, control group pretest and control group posttest were all 1.0. This represents a large difference in variance. For the homogeneous variance conditions, all variances were 1.0.

The number of studies in a meta-analysis ($k$) was set at 10 or 25. The effect size was constant across all studies within a meta-analysis. The population effect size ($\delta$) was set at 0.0, 0.5, and 1.0, corresponding to no effect, a moderate effect, and a large effect. The population pre-post correlation ($\rho$) was equal for treatment and control groups, and was constant across studies within a meta-analysis. The values for the pre-post correlation were 0.0, 0.4, and 0.8.

Sample size was allowed to vary across studies within a meta-analysis. The sample sizes for treatment and control conditions were randomly sampled from four levels (5, 10, 20, 30) based on a specified probability distribution. The distribution of sample sizes varied across conditions, so that the average sample size was 10, 15 or 25. The probability distributions are shown in Table 2. Average sample size was manipulated separately for treatment and control groups. When the average sample size was equal across groups, a high proportion of the studies had similar sample sizes across groups. When the average sample size was different across groups, a high proportion of the studies had substantial differences in sample sizes across groups.

Under each combination of the parameters, results were averaged across 10,000 meta-analyses. Each meta-analysis consisted of $k$ studies.
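The sampling of study sample sizes described above might be sketched as follows. The probabilities are the ones reported in Table 2; the names `SIZE_PROBS`, `expected_n` and `draw_sample_sizes` are hypothetical, and Python's standard `random` module is used here purely for illustration (the original simulation used IMSL routines).

```python
import random

SIZES = (5, 10, 20, 30)

# Proportion of studies at each sample size, by condition (Table 2).
SIZE_PROBS = {
    "small": (0.6, 0.2, 0.1, 0.1),
    "medium": (0.2, 0.4, 0.2, 0.2),
    "large": (0.0, 0.1, 0.3, 0.6),
}


def expected_n(condition: str) -> float:
    """Average sample size implied by a condition's probabilities."""
    return sum(n * p for n, p in zip(SIZES, SIZE_PROBS[condition]))


def draw_sample_sizes(condition: str, k: int, rng: random.Random) -> list:
    """Draw one group's sample size for each of k studies."""
    return rng.choices(SIZES, weights=SIZE_PROBS[condition], k=k)


rng = random.Random(2005)  # arbitrary seed, for reproducibility only
n_treatment = draw_sample_sizes("small", k=10, rng=rng)
n_control = draw_sample_sizes("large", k=10, rng=rng)
print([round(expected_n(c), 1) for c in SIZE_PROBS])
print(n_treatment, n_control)
```

The implied averages are 10, 15 and 25, matching the last row of Table 2. Drawing treatment and control sizes from different conditions, as in the usage lines, corresponds to the cells in which the average sample size differs across groups.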
For each study, $n_C$ scores in the control group and $n_T$ scores in the treatment group were randomly generated from a multivariate normal distribution using the IMSL DRNMVN routine. At both pretest and posttest, the control group had a mean of 0 and a standard deviation of 1. Pretest scores in the treatment group also had a mean of 0 and a standard deviation of 1. A linear transformation was used to create treatment group posttest scores with a mean of $\delta$ and a variance of 1 or 4, depending on the condition. Based on these scores, an effect size was computed for each study using Equation 4.

For each meta-analysis, the weighted mean effect size was computed two ways: first with weights defined using the inverse of the homogeneity-assumed variance (Equation 18), and second with weights defined using the inverse of the heterogeneity-assumed variance (Equation 17). For both approaches, the unweighted average values across studies were used as estimates of $\delta$ and $\rho$ in the variance formula. For the heterogeneity-assumed variance, the variance ratio was estimated using the treatment group posttest variance divided by the treatment group pretest variance. The resulting mean effect size was averaged across iterations of the simulation to obtain the expected value.

The homogeneity of effect size test was computed twice within each meta-analysis: first using the homogeneity-assumed variance and second using the heterogeneity-assumed variance. The resulting Q-value was compared to a chi-square distribution with $k - 1$ df. Type I error rate was defined as the proportion of meta-analyses within a condition where Q exceeded the critical chi-square value at $\alpha$ = .05.

Results

Both methods of conducting the meta-analysis produced a mean effect size that was nearly unbiased. The results are summarized in Tables 3 and 4.
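A rough sketch of how one cell of Tables 3 and 4 might be approximated is given below. It is not the original simulation code and rests on several assumptions: it uses Python's standard `random` module in place of the IMSL DRNMVN routine, far fewer replications than the 10,000 used in the study, only the homogeneity-assumed weights (Equation 18), the "small" sample size distribution from Table 2 for both groups, and the population $\rho$ rather than the average sample correlation in the weights. All function names are hypothetical.

```python
import math
import random
import statistics

SIZES, PROBS = (5, 10, 20, 30), (0.6, 0.2, 0.1, 0.1)  # "small" column, Table 2


def draw_scores(n, mean_post, sd_post, rho, rng):
    """Bivariate-normal pre/post scores; pretest is N(0, 1), correlation rho."""
    pre, post = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pre.append(z1)
        post.append(mean_post + sd_post * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pre, post


def d_ppc(t_pre, t_post, c_pre, c_post):
    """Equations 3-5: bias-corrected effect size using the pooled pretest SD."""
    n_t, n_c = len(t_pre), len(c_pre)
    pooled_var = ((n_t - 1) * statistics.variance(t_pre)
                  + (n_c - 1) * statistics.variance(c_pre)) / (n_t + n_c - 2)
    c = 1.0 - 3.0 / (4.0 * (n_t + n_c - 2) - 1.0)
    contrast = ((statistics.fmean(t_post) - statistics.fmean(t_pre))
                - (statistics.fmean(c_post) - statistics.fmean(c_pre)))
    return c * contrast / math.sqrt(pooled_var)


def var_hom(delta, rho, n_t, n_c):
    """Equation 18: homogeneity-assumed sampling variance."""
    c = 1.0 - 3.0 / (4.0 * (n_t + n_c - 2) - 1.0)
    df = n_t + n_c - 2
    h = 2.0 * (1.0 - rho) * (n_t + n_c) / (n_t * n_c)
    return c ** 2 * df / (df - 2) * (h + delta ** 2) - delta ** 2


def one_meta_analysis(k, delta, rho, sd_post_t, rng):
    """Simulate k studies and return the HOM-weighted mean effect size."""
    ds, ns = [], []
    for _ in range(k):
        n_t, n_c = rng.choices(SIZES, weights=PROBS, k=2)
        t_pre, t_post = draw_scores(n_t, delta, sd_post_t, rho, rng)
        c_pre, c_post = draw_scores(n_c, 0.0, 1.0, rho, rng)
        ds.append(d_ppc(t_pre, t_post, c_pre, c_post))
        ns.append((n_t, n_c))
    d_unweighted = statistics.fmean(ds)  # plug-in estimate of delta for weights
    weights = [1.0 / var_hom(d_unweighted, rho, n_t, n_c) for n_t, n_c in ns]
    return sum(w * d for w, d in zip(weights, ds)) / sum(weights)


def estimated_bias(reps, k, delta, rho, sd_post_t, seed=0):
    """Average weighted mean across simulated meta-analyses, minus delta."""
    rng = random.Random(seed)
    means = [one_meta_analysis(k, delta, rho, sd_post_t, rng) for _ in range(reps)]
    return statistics.fmean(means) - delta


# Heterogeneous condition: treatment posttest variance 4 (SD 2), delta = 1.
print(estimated_bias(reps=300, k=10, delta=1.0, rho=0.4, sd_post_t=2.0, seed=1))
```

With only a few hundred replications the estimate is noisy, so it should be read as an order-of-magnitude check rather than a reproduction of any tabled value. Extending the sketch to the heterogeneity-assumed weights would additionally require a per-study estimate of the variance ratio, as described in the simulation procedure.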
Consistent with past research (Morris, 2003), the weighted mean effect size using the homogeneity-assumed variance formula was essentially unbiased under all conditions examined in the study. For meta-analysis based on the heterogeneity-assumed variance formula, there was little or no bias when the population effect size was zero. For $\delta$ > 0, there was a slight negative bias. The degree of bias was greatest when sample size was small in both groups, and when variances were unequal. For example, in the condition with unequal variance, $n_T = n_C = 10$, $\rho$ = 0 and 25 studies in the meta-analysis, an effect size of 1.0 was underestimated by 3% ($\bar{d}$ = 0.97). In many cases, the bias was much smaller. When the population effect size was 1.0, the average bias was -.01.

The accuracy of the Q-test is summarized in Tables 5 and 6. When the assumption of homogeneity of variance was met, the Q-test based on the homogeneity-assumed variance had reasonably accurate Type I error rates, ranging from .05 to .08. Type I error rates for the simulations with k=25 are shown in Figure 1. Similar results were obtained with k=10.

When variance was homogeneous, the Q-test based on the heterogeneity-assumed variance produced inflated Type I error rates under some conditions (see Figure 1). When the population effect size was 0, the test was slightly conservative, with Type I error rates ranging from .02 to .05 (M = .04). When the population effect size was 1.0, the Type I error rates were overly liberal, ranging from .06 to .24 (M = .10). Type I error rate inflation was particularly high when the pre-post correlation was large, as indicated in Figure 1.

As expected, when the homogeneity of variance assumption was violated, the Q-test based on the homogeneous variance formula was not accurate. The magnitude of bias was considerable, and increased with the number of studies. For k=10, Type I error rates ranged from .19 to .75 (M = .43), while for k=25, Type I error rates ranged from .33 to .97 (M = .68).
Type I error rates were highest when the pre-post correlation was large, and when the treatment-group sample size was smaller than the control-group sample size (see Figure 2).

When treatment and control group variances were unequal, Type I error rates for the Q-test based on the heterogeneity-assumed variance were considerably more accurate than the traditional Q-test (see Figure 2). Results are summarized here for k=25. Similar results were found for k=10. When sample sizes were equal, Type I error rates were close to the nominal level on average (M = .06), although they ranged from .03 to .12. When the sample size for the treatment group was larger than the sample size for the control group, the test was reasonably accurate. For example, for $n_T$ = 25 and $n_C$ = 10, Type I error rates ranged from .03 to .07 (M = .05). However, when the sample size for the treatment group was smaller than the sample size for the control group, the test was overly liberal. When $n_C$ = 25 and $n_T$ = 10, Type I error rates ranged from .11 to .15 (M = .12).

Conclusion

When subgroup variances are unequal, common meta-analytic methods may be inaccurate unless appropriate modifications are made. The common recommendation to standardize the effect size using only the pretest standard deviations (Becker, 1988; Carlson & Schmidt, 1999; Morris & DeShon, 2002) is successful at producing an estimate of effect size that is unbiased. However, additional adjustments are needed to obtain accurate estimates of the sampling variance of effect sizes. Standard formulas for the sampling variance, which assume variance homogeneity, can lead to substantial errors regarding the variability of effect sizes across studies.

A likely pattern of variance heterogeneity occurs when an experimental treatment does not affect all individuals equally, resulting in larger variance for posttest scores in the treatment group than for pretest or control group scores.
In this situation, failure to adjust for variance heterogeneity can severely bias the Q-test for homogeneity of effect size across studies. A simulated meta-analysis yielded Type I error rates as high as 97%. Under some conditions, the traditional Q-test was almost guaranteed to find significant variance across studies, when in fact the true effect size was constant. Using this method could lead researchers to falsely conclude that random differences across studies are due to substantive moderator variables.

Some support was found for an alternate estimate of sampling variance, which takes into account the degree of heterogeneity. Use of the proposed estimate avoided the excessive inflation of Type I error found with the homogeneity-assumed formula. However, the results for the new estimate were somewhat mixed. Use of the heterogeneity-assumed variance to compute the weighted mean resulted in a very small downward bias in the mean effect size, particularly when sample size was small. In addition, Type I error rates on the Q-test differed from the nominal alpha level under some conditions. Specifically, when variances were homogeneous, Type I error rates were more consistent using the homogeneity-assumed formula. When both $\delta$ and $\rho$ were large, the heterogeneity-assumed formula produced Type I error rates as high as .24.

When variances differed across conditions, tests based on the heterogeneity-assumed formula were consistently more accurate than the test currently used by meta-analysts. However, the new procedure still yielded Type I error rates as high as .15. Thus, although the proposed variance formula produced some improvement in the accuracy of the Q-test, additional work is needed to refine the procedure.

When it is reasonable to assume that variances are equal across groups and across time, it is recommended that researchers use existing procedures based on the homogeneity-assumed variance formula.
When the assumption was met, this approach better controlled for Type I error, and produced a weighted mean that was unbiased. However, when variances are unequal, the heterogeneity-assumed variance formula is recommended for procedures that rely on estimates of sampling variance, such as testing homogeneity of effect size across studies, and estimating the true variance of effect size.

Several limitations of the study should be noted. The simulations only examined conditions where the degree of variance heterogeneity was constant across studies. In practice, a meta-analysis is likely to include studies with varying degrees of heterogeneity. When only a few studies have heterogeneous variance, the impact on the results will be minimized. Similarly, the simulation modeled the situation where the population effect size and the pre-post correlation were constant across studies. Future research should also consider the impact of the alternate formulas under a wider range of conditions.

Another limitation is that the study only examined one pattern of heterogeneity: inflated variance in treatment-group posttest scores with homogeneity across the other conditions. Although this form of heterogeneity is often a concern, other patterns are possible. The equations derived in this paper should apply to situations where the posttest variance in the treatment group is either larger or smaller than the other conditions. The same approach can be used to derive appropriate formulas for other patterns of heterogeneity, such as situations where posttest scores are inflated, perhaps differentially, in both treatment and control groups, or when the pre-post correlation differs across groups.

References

Becker, B. J. (1988). Synthesizing standardized mean-change measures. British Journal of Mathematical and Statistical Psychology, 41, 257-278.
Carlson, K. D., & Schmidt, F. L. (1999). Impact of experimental design on effect size: Findings from the research literature on training. Journal of Applied Psychology, 84, 851-862.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in appropriate conceptualization of effect size. Psychological Methods, 6, 135-146.
Harwell, M. (1997). An empirical study of Hedges' Homogeneity Test. Psychological Methods, 2, 219-231.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107-128.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.
Huynh, C. L. (1989). A unified approach to the estimation of effect size in meta-analysis. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco (ERIC Document Reproduction Service No. ED 306 248).
Johnson, N. L., & Kotz, S. (1970). Continuous univariate distributions. NY: John Wiley & Sons.
Morris, S. B. (2003, April). Estimating Effect Size from the Pretest-Posttest-Control Design. Paper presented at the 18th annual conference of the Society for Industrial and Organizational Psychology, Orlando, FL.
Morris, S. B. (2004). Effect Size Estimation from Two Independent Groups with Heterogeneous Variances. Paper Presented at the 19th Annual Conference of the Society for Industrial and Organizational Psychology, Chicago, IL, April 2004.
Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7, 105-125.
Van Den Noortgate, W., & Onghena, P. (2003).
Estimating the Mean Effect Size in Meta-Analysis: Bias, Precision, and Mean Squared Error of Different Weighting Methods. Behavior Research Methods, Instruments and Computers, 35, 504-511.

Table 1
Inaccuracy of Traditional Estimate of Effect Size Variance (Assuming Homogeneity) when δ = 1 and ρ = .5.

                             Variance of Effect Size
σ²(Post,T)   nT   nC   Homogeneity   Homogeneity   Difference   % Difference
                       Assumed       Not Assumed
0.25         10   10   0.238         0.213           0.026        12
0.25         10   25   0.159         0.133           0.025        19
0.25         25   10   0.159         0.148           0.010         7
0.25         25   25   0.092         0.082           0.010        12
0.5          10   10   0.238         0.217           0.021        10
0.5          10   25   0.159         0.138           0.021        15
0.5          25   10   0.159         0.150           0.008         6
0.5          25   25   0.092         0.083           0.008        10
2            10   10   0.238         0.299          -0.060       -20
2            10   25   0.159         0.218          -0.060       -27
2            25   10   0.159         0.182          -0.024       -13
2            25   25   0.092         0.116          -0.024       -21
4            10   10   0.238         0.445          -0.206       -46
4            10   25   0.159         0.362          -0.203       -56
4            25   10   0.159         0.240          -0.081       -34
4            25   25   0.092         0.173          -0.081       -47

Note: σ² = 1 for pretest scores in both groups and posttest scores in control group.

Table 2
Proportion of Studies at Each Sample Size in Simulated Meta-Analyses.

              Sample Size Condition
N             Small   Medium   Large
5             0.6     0.2      0
10            0.2     0.4      0.1
20            0.1     0.2      0.3
30            0.1     0.2      0.6
Average N     10      15       25

Table 3
Bias in Weighted Mean Effect Size Using Homogeneity-Assumed (HOM) and Heterogeneity-Assumed (HET) Variance Estimates as Weights (k=10).

                     σ²T = σ²C         σ²T > σ²C
δ    ρ    nT  nC      HOM      HET      HOM      HET
0    0    10  10   0.0046   0.0050   0.0034   0.0030
0    0    15  10   0.0050   0.0047   0.0014   0.0010
0    0    25  10   0.0023   0.0022   0.0033   0.0034
0    0    10  15   0.0013   0.0012   0.0026   0.0019
0    0    15  15  -0.0002  -0.0004  -0.0017  -0.0018
0    0    25  15  -0.0018  -0.0015   0.0026   0.0020
0    0    10  25   0.0022   0.0019  -0.0040  -0.0037
0    0    15  25   0.0001  -0.0002   0.0002   0.0002
0    0    25  25  -0.0025  -0.0024  -0.0008  -0.0013
0    0.4  10  10   0.0044   0.0039   0.0025   0.0016
0    0.4  15  10   0.0000   0.0003   0.0004  -0.0007
0    0.4  25  10   0.0047   0.0048   0.0011   0.0008
0    0.4  10  15  -0.0015  -0.0008   0.0000  -0.0005
0    0.4  15  15   0.0003   0.0001  -0.0012  -0.0014
0    0.4  25  15  -0.0008  -0.0007  -0.0017  -0.0008
0    0.4  10  25  -0.0003   0.0001   0.0017   0.0012
0    0.4  15  25   0.0006   0.0006  -0.0011  -0.0015
0    0.4  25  25  -0.0013  -0.0015  -0.0014  -0.0017
0    0.8  10  10  -0.0010  -0.0007   0.0009   0.0019
0    0.8  15  10   0.0010   0.0009   0.0040   0.0029
0    0.8  25  10   0.0021   0.0019   0.0014   0.0014
0    0.8  10  15   0.0017   0.0019   0.0018   0.0021
0    0.8  15  15   0.0001   0.0002   0.0033   0.0032
0    0.8  25  15   0.0001   0.0000   0.0007   0.0011
0    0.8  10  25   0.0001   0.0000  -0.0003  -0.0007
0    0.8  15  25  -0.0004  -0.0003  -0.0015  -0.0013
0    0.8  25  25   0.0002   0.0002   0.0008   0.0008
1    0    10  10   0.0027  -0.0138   0.0025  -0.0255
1    0    15  10   0.0024  -0.0068  -0.0002  -0.0199
1    0    25  10   0.0043  -0.0003   0.0026  -0.0087
1    0    10  15   0.0032  -0.0084   0.0034  -0.0186
1    0    15  15  -0.0027  -0.0119   0.0025  -0.0143
1    0    25  15   0.0011  -0.0035   0.0010  -0.0096
1    0    10  25   0.0028  -0.0070   0.0016  -0.0150
1    0    15  25  -0.0009  -0.0086   0.0000  -0.0140
1    0    25  25  -0.0005  -0.0054  -0.0003  -0.0105
1    0.4  10  10   0.0060  -0.0084   0.0003  -0.0303
1    0.4  15  10   0.0025  -0.0061   0.0033  -0.0163
1    0.4  25  10   0.0068   0.0027   0.0021  -0.0093
1    0.4  10  15   0.0015  -0.0103  -0.0011  -0.0254
1    0.4  15  15  -0.0005  -0.0078   0.0015  -0.0163
1    0.4  25  15   0.0008  -0.0034   0.0024  -0.0085
1    0.4  10  25  -0.0008  -0.0091   0.0013  -0.0112
1    0.4  15  25   0.0002  -0.0062  -0.0023  -0.0152
1    0.4  25  25   0.0001  -0.0043  -0.0040  -0.0140
1    0.8  10  10   0.0021  -0.0068   0.0025  -0.0160
1    0.8  15  10   0.0015  -0.0040  -0.0003  -0.0134
1    0.8  25  10  -0.0001  -0.0023   0.0027  -0.0054
1    0.8  10  15   0.0036  -0.0033   0.0035  -0.0095
1    0.8  15  15   0.0017  -0.0025   0.0030  -0.0082
1    0.8  25  15   0.0005  -0.0018   0.0017  -0.0058
1    0.8  10  25  -0.0002  -0.0050  -0.0004  -0.0092
1    0.8  15  25  -0.0003  -0.0036  -0.0009  -0.0091
1    0.8  25  25   0.0000  -0.0021  -0.0004  -0.0072

Note: nT and nC represent average sample size.

Table 4
Bias in Weighted Mean Effect Size Using Homogeneity-Assumed (HOM) and Heterogeneity-Assumed (HET) Variance Estimates as Weights (k=25).

                     σ²T = σ²C         σ²T > σ²C
δ    ρ    nT  nC      HOM      HET      HOM      HET
0    0    10  10  -0.0035  -0.0034  -0.0005  -0.0003
0    0    15  10  -0.0014  -0.0013  -0.0021  -0.0028
0    0    25  10  -0.0004  -0.0004  -0.0021  -0.0019
0    0    10  15  -0.0006  -0.0006  -0.0012  -0.0028
0    0    15  15  -0.0009  -0.0010  -0.0021  -0.0016
0    0    25  15  -0.0010  -0.0011   0.0014   0.0016
0    0    10  25  -0.0006  -0.0001   0.0014   0.0008
0    0    15  25  -0.0023  -0.0021  -0.0019  -0.0020
0    0    25  25   0.0007   0.0008   0.0007   0.0008
0    0.4  10  10   0.0009   0.0013  -0.0007   0.0003
0    0.4  15  10   0.0001   0.0005  -0.0018  -0.0016
0    0.4  25  10  -0.0013  -0.0012  -0.0006  -0.0007
0    0.4  10  15   0.0018   0.0017   0.0006   0.0013
0    0.4  15  15   0.0019   0.0021   0.0002   0.0001
0    0.4  25  15   0.0006   0.0006  -0.0006  -0.0007
0    0.4  10  25   0.0006   0.0004   0.0001  -0.0005
0    0.4  15  25  -0.0003  -0.0001  -0.0008  -0.0001
0    0.4  25  25   0.0005   0.0004   0.0009   0.0015
0    0.8  10  10  -0.0010  -0.0009   0.0010   0.0011
0    0.8  15  10  -0.0003  -0.0003  -0.0002  -0.0003
0    0.8  25  10   0.0004   0.0004  -0.0004  -0.0003
0    0.8  10  15  -0.0006  -0.0007   0.0004   0.0002
0    0.8  15  15   0.0007   0.0007   0.0009   0.0007
0    0.8  25  15   0.0000   0.0000   0.0013   0.0011
0    0.8  10  25   0.0008   0.0007   0.0006   0.0003
0    0.8  15  25   0.0004   0.0005   0.0007   0.0005
0    0.8  25  25   0.0002   0.0003   0.0004   0.0005
1    0    10  10  -0.0019  -0.0188  -0.0033  -0.0334
1    0    15  10  -0.0008  -0.0114  -0.0006  -0.0221
1    0    25  10  -0.0004  -0.0053  -0.0014  -0.0130
1    0    10  15   0.0006  -0.0128  -0.0005  -0.0234
1    0    15  15   0.0001  -0.0091   0.0004  -0.0167
1    0    25  15  -0.0010  -0.0060   0.0000  -0.0114
1    0    10  25   0.0006  -0.0090   0.0010  -0.0161
1    0    15  25  -0.0009  -0.0084   0.0000  -0.0142
1    0    25  25   0.0000  -0.0051   0.0012  -0.0095
1    0.4  10  10   0.0015  -0.0136   0.0004  -0.0295
1    0.4  15  10  -0.0008  -0.0103  -0.0016  -0.0229
1    0.4  25  10  -0.0017  -0.0058  -0.0001  -0.0120
1    0.4  10  15  -0.0002  -0.0118  -0.0007  -0.0236
1    0.4  15  15   0.0005  -0.0076   0.0005  -0.0173
1    0.4  25  15   0.0000  -0.0046   0.0010  -0.0110
1    0.4  10  25  -0.0002  -0.0083   0.0000  -0.0168
1    0.4  15  25   0.0008  -0.0056  -0.0004  -0.0149
1    0.4  25  25   0.0022  -0.0023   0.0001  -0.0110
1    0.8  10  10  -0.0002  -0.0096   0.0009  -0.0175
1    0.8  15  10   0.0005  -0.0049  -0.0004  -0.0136
1    0.8  25  10  -0.0010  -0.0032  -0.0008  -0.0088
1    0.8  10  15  -0.0003  -0.0071   0.0015  -0.0131
1    0.8  15  15  -0.0001  -0.0045  -0.0012  -0.0127
1    0.8  25  15   0.0002  -0.0020   0.0005  -0.0075
1    0.8  10  25   0.0003  -0.0044   0.0012  -0.0076
1    0.8  15  25  -0.0007  -0.0043  -0.0001  -0.0085
1    0.8  25  25   0.0003  -0.0019   0.0006  -0.0065

Note: nT and nC represent average sample size.

Table 5
Type I Error Rate for Homogeneity of Effect Size Test Using Homogeneity-Assumed (HOM) and Heterogeneity-Assumed (HET) Variance Estimates (k=10).

                    σ²T = σ²C       σ²T > σ²C
δ    ρ    nT  nC     HOM     HET     HOM     HET
0    0    10  10  0.0680  0.0441  0.3581  0.0592
0    0    15  10  0.0678  0.0524  0.3007  0.0539
0    0    25  10  0.0570  0.0507  0.2029  0.0464
0    0    10  15  0.0668  0.0483  0.4308  0.0746
0    0    15  15  0.0602  0.0474  0.3653  0.0591
0    0    25  15  0.0572  0.0517  0.2738  0.0504
0    0    10  25  0.0629  0.0546  0.5173  0.0912
0    0    15  25  0.0542  0.0480  0.4614  0.0675
0    0    25  25  0.0589  0.0536  0.3753  0.0523
0    0.4  10  10  0.0609  0.0404  0.4045  0.0552
0    0.4  15  10  0.0585  0.0414  0.3346  0.0495
0    0.4  25  10  0.0541  0.0489  0.2299  0.0433
0    0.4  10  15  0.0619  0.0421  0.4718  0.0715
0    0.4  15  15  0.0536  0.0417  0.3986  0.0577
0    0.4  25  15  0.0549  0.0490  0.2953  0.0406
0    0.4  10  25  0.0599  0.0458  0.5790  0.0886
0    0.4  15  25  0.0560  0.0432  0.5198  0.0651
0    0.4  25  25  0.0526  0.0474  0.4239  0.0482
0    0.8  10  10  0.0583  0.0342  0.5577  0.0611
0    0.8  15  10  0.0516  0.0382  0.4739  0.0492
0    0.8  25  10  0.0516  0.0463  0.3433  0.0344
0    0.8  10  15  0.0510  0.0315  0.6374  0.0702
0    0.8  15  15  0.0528  0.0406  0.5827  0.0543
0    0.8  25  15  0.0523  0.0477  0.4449  0.0348
0    0.8  10  25  0.0529  0.0309  0.7511  0.0908
0    0.8  15  25  0.0514  0.0377  0.6952  0.0593
0    0.8  25  25  0.0514  0.0448  0.5842  0.0359
0.5  0    10  10  0.0663  0.0479  0.3481  0.0572
0.5  0    15  10  0.0670  0.0547  0.2951  0.0544
0.5  0    25  10  0.0583  0.0553  0.1996  0.0519
0.5  0    10  15  0.0703  0.0566  0.4273  0.0774
0.5  0    15  15  0.0632  0.0511  0.3585  0.0614
0.5  0    25  15  0.0570  0.0538  0.2562  0.0483
0.5  0    10  25  0.0568  0.0522  0.5094  0.0972
0.5  0    15  25  0.0583  0.0561  0.4658  0.0721
0.5  0    25  25  0.0557  0.0537  0.3764  0.0551
0.5  0.4  10  10  0.0606  0.0422  0.3920  0.0611
0.5  0.4  15  10  0.0589  0.0497  0.3198  0.0504
0.5  0.4  25  10  0.0528  0.0521  0.2213  0.0458
0.5  0.4  10  15  0.0586  0.0465  0.4590  0.0730
0.5  0.4  15  15  0.0577  0.0511  0.4030  0.0617
0.5  0.4  25  15  0.0541  0.0542  0.2917  0.0487
0.5  0.4  10  25  0.0570  0.0457  0.5664  0.0975
0.5  0.4  15  25  0.0573  0.0502  0.5133  0.0731
0.5  0.4  25  25  0.0496  0.0502  0.4083  0.0494
0.5  0.8  10  10  0.0529  0.0500  0.5252  0.0727
0.5  0.8  15  10  0.0528  0.0559  0.4486  0.0541
0.5  0.8  25  10  0.0524  0.0644  0.3326  0.0450
0.5  0.8  10  15  0.0561  0.0493  0.6194  0.0759
0.5  0.8  15  15  0.0518  0.0558  0.5599  0.0624
0.5  0.8  25  15  0.0544  0.0679  0.4203  0.0424
0.5  0.8  10  25  0.0523  0.0444  0.7375  0.0941
0.5  0.8  15  25  0.0518  0.0549  0.6745  0.0687
0.5  0.8  25  25  0.0501  0.0643  0.5635  0.0445
1    0    10  10  0.0676  0.0635  0.3349  0.0689
1    0    15  10  0.0643  0.0673  0.2727  0.0618
1    0    25  10  0.0600  0.0653  0.1939  0.0563
1    0    10  15  0.0642  0.0623  0.4099  0.0824
1    0    15  15  0.0636  0.0665  0.3432  0.0703
1    0    25  15  0.0545  0.0647  0.2632  0.0611
1    0    10  25  0.0570  0.0605  0.4978  0.0981
1    0    15  25  0.0564  0.0611  0.4523  0.0805
1    0    25  25  0.0605  0.0724  0.3541  0.0631
1    0.4  10  10  0.0620  0.0640  0.3615  0.0683
1    0.4  15  10  0.0617  0.0715  0.3056  0.0638
1    0.4  25  10  0.0547  0.0727  0.2135  0.0569
1    0.4  10  15  0.0607  0.0659  0.4360  0.0804
1    0.4  15  15  0.0573  0.0705  0.3765  0.0740
1    0.4  25  15  0.0546  0.0746  0.2784  0.0563
1    0.4  10  25  0.0574  0.0622  0.5506  0.1039
1    0.4  15  25  0.0572  0.0728  0.4824  0.0804
1    0.4  25  25  0.0545  0.0778  0.3829  0.0597
1    0.8  10  10  0.0536  0.1213  0.4405  0.0940
1    0.8  15  10  0.0561  0.1297  0.3830  0.0788
1    0.8  25  10  0.0521  0.1228  0.2852  0.0637
1    0.8  10  15  0.0586  0.1147  0.5514  0.0999
1    0.8  15  15  0.0517  0.1266  0.4774  0.0813
1    0.8  25  15  0.0538  0.1402  0.3642  0.0667
1    0.8  10  25  0.0479  0.0883  0.6855  0.1115
1    0.8  15  25  0.0526  0.1170  0.6020  0.0854
1    0.8  25  25  0.0508  0.1484  0.4830  0.0604

Note: nT and nC represent average sample size.

Table 6
Type I Error Rate for Homogeneity of Effect Size Test Using Homogeneity-Assumed (HOM) and Heterogeneity-Assumed (HET) Variance Estimates (k=25).

                    σ²T = σ²C       σ²T > σ²C
δ    ρ    nT  nC     HOM     HET     HOM     HET
0    0    10  10  0.0745  0.0396  0.6055  0.0548
0    0    15  10  0.0718  0.0474  0.5209  0.0511
0    0    25  10  0.0598  0.0503  0.3582  0.0500
0    0    10  15  0.0645  0.0397  0.7119  0.0830
0    0    15  15  0.0647  0.0451  0.6284  0.0630
0    0    25  15  0.0574  0.0486  0.4724  0.0469
0    0    10  25  0.0609  0.0502  0.8169  0.1101
0    0    15  25  0.0542  0.0432  0.7646  0.0761
0    0    25  25  0.0559  0.0448  0.6459  0.0522
0    0.4  10  10  0.0665  0.0334  0.6607  0.0561
0    0.4  15  10  0.0632  0.0376  0.5601  0.0431
0    0.4  25  10  0.0592  0.0489  0.3985  0.0397
0    0.4  10  15  0.0682  0.0345  0.7612  0.0782
0    0.4  15  15  0.0599  0.0376  0.6750  0.0522
0    0.4  25  15  0.0518  0.0417  0.5220  0.0390
0    0.4  10  25  0.0524  0.0318  0.8686  0.1068
0    0.4  15  25  0.0563  0.0380  0.8193  0.0772
0    0.4  25  25  0.0551  0.0443  0.7106  0.0468
0    0.8  10  10  0.0589  0.0265  0.8581  0.0637
0    0.8  15  10  0.0561  0.0342  0.7711  0.0454
0    0.8  25  10  0.0508  0.0443  0.5956  0.0281
0    0.8  10  15  0.0570  0.0245  0.9182  0.0818
0    0.8  15  15  0.0504  0.0304  0.8666  0.0534
0    0.8  25  15  0.0527  0.0439  0.7404  0.0277
0    0.8  10  25  0.0476  0.0193  0.9724  0.1097
0    0.8  15  25  0.0512  0.0307  0.9490  0.0632
0    0.8  25  25  0.0535  0.0426  0.8706  0.0277
0.5  0    10  10  0.0732  0.0429  0.5953  0.0610
0.5  0    15  10  0.0676  0.0486  0.4987  0.0558
0.5  0    25  10  0.0567  0.0512  0.3473  0.0487
0.5  0    10  15  0.0749  0.0499  0.6944  0.0768
0.5  0    15  15  0.0615  0.0494  0.6043  0.0634
0.5  0    25  15  0.0657  0.0609  0.4640  0.0543
0.5  0    10  25  0.0598  0.0516  0.8149  0.1127
0.5  0    15  25  0.0615  0.0529  0.7600  0.0827
0.5  0    25  25  0.0579  0.0538  0.6272  0.0549
0.5  0.4  10  10  0.0665  0.0384  0.6434  0.0596
0.5  0.4  15  10  0.0578  0.0442  0.5539  0.0507
0.5  0.4  25  10  0.0589  0.0556  0.3860  0.0412
0.5  0.4  10  15  0.0650  0.0403  0.7473  0.0828
0.5  0.4  15  15  0.0611  0.0492  0.6829  0.0589
0.5  0.4  25  15  0.0500  0.0488  0.5160  0.0464
0.5  0.4  10  25  0.0530  0.0388  0.8632  0.1103
0.5  0.4  15  25  0.0521  0.0453  0.8071  0.0803
0.5  0.4  25  25  0.0489  0.0497  0.6903  0.0460
0.5  0.8  10  10  0.0588  0.0473  0.8152  0.0737
0.5  0.8  15  10  0.0569  0.0620  0.7433  0.0494
0.5  0.8  25  10  0.0547  0.0713  0.5640  0.0361
0.5  0.8  10  15  0.0529  0.0426  0.9107  0.0922
0.5  0.8  15  15  0.0501  0.0541  0.8484  0.0664
0.5  0.8  25  15  0.0478  0.0688  0.7034  0.0343
0.5  0.8  10  25  0.0516  0.0365  0.9645  0.1251
0.5  0.8  15  25  0.0534  0.0554  0.9335  0.0756
0.5  0.8  25  25  0.0524  0.0777  0.8520  0.0339
1    0    10  10  0.0779  0.0679  0.5690  0.0761
1    0    15  10  0.0734  0.0739  0.4794  0.0688
1    0    25  10  0.0621  0.0746  0.3294  0.0564
1    0    10  15  0.0702  0.0657  0.6793  0.0990
1    0    15  15  0.0662  0.0694  0.5979  0.0737
1    0    25  15  0.0592  0.0744  0.4423  0.0624
1    0    10  25  0.0626  0.0658  0.7964  0.1173
1    0    15  25  0.0591  0.0690  0.7353  0.0938
1    0    25  25  0.0583  0.0761  0.6072  0.0700
1    0.4  10  10  0.0723  0.0728  0.6147  0.0797
1    0.4  15  10  0.0627  0.0757  0.4988  0.0654
1    0.4  25  10  0.0574  0.0834  0.3585  0.0553
1    0.4  10  15  0.0633  0.0695  0.7104  0.1010
1    0.4  15  15  0.0639  0.0839  0.6314  0.0733
1    0.4  25  15  0.0556  0.0894  0.4806  0.0622
1    0.4  10  25  0.0526  0.0612  0.8419  0.1289
1    0.4  15  25  0.0572  0.0762  0.7718  0.0969
1    0.4  25  25  0.0572  0.0920  0.6382  0.0642
1    0.8  10  10  0.0642  0.1605  0.7396  0.1194
1    0.8  15  10  0.0578  0.1854  0.6459  0.0902
1    0.8  25  10  0.0523  0.1806  0.4879  0.0704
1    0.8  10  15  0.0618  0.1473  0.8485  0.1297
1    0.8  15  15  0.0554  0.1769  0.7735  0.0960
1    0.8  25  15  0.0504  0.2079  0.6216  0.0695
1    0.8  10  25  0.0522  0.1114  0.9403  0.1485
1    0.8  15  25  0.0540  0.1635  0.8931  0.0985
1    0.8  25  25  0.0507  0.2358  0.7805  0.0679

Note: nT and nC represent average sample size.

Figure 1
Type I Error Rate for Q-test When Variance is Homogeneous Across Groups (k=25).

[Figure: two panels, δ = 0 and δ = 1. Vertical axis: Type I Error Rate (0 to 0.25). Horizontal axis: Pre-Post Correlation (0, 0.4, 0.8). Series: Homogeneity-Assumed and Heterogeneity-Assumed.]

Note: The nominal Type I error rate was .05.

Figure 2
Type I Error Rate for Q-test When Variance is Heterogeneous Across Groups (σ²T > σ²C).

[Figure: Type I Error Rate (vertical axis, 0 to 1) plotted against nT/nC (horizontal axis, 0.0 to 3.0). Separate lines by Pre-Post Correlation (r = 0, r = .4, r = .8) for the Homogeneity-Assumed and Heterogeneity-Assumed estimates.]

Note: k = 25. The dashed line indicates the nominal Type I error rate (.05).
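
As a rough arithmetic check on Tables 1 and 2, the sketch below recomputes the Homogeneity Assumed column of Table 1 and the Average N row of Table 2. The variance expression is an assumption: it is the form commonly given for the pretest-posttest-control effect size under homogeneous variances, and it is not printed in this section. The function and variable names are illustrative only.

```python
# Sketch only: the variance expression below is an ASSUMPTION (it is not
# printed in this section); names such as var_d_homogeneous are illustrative.

def var_d_homogeneous(delta, rho, n_t, n_c):
    """Sampling variance of d for a PPC design, assuming equal variances."""
    df = n_t + n_c - 2
    c = 1 - 3 / (4 * df - 1)                       # small-sample bias correction
    a = 2 * (1 - rho) * (n_t + n_c) / (n_t * n_c)  # 2(1 - rho)(1/n_t + 1/n_c)
    return c**2 * a * (df / (df - 2)) * (1 + delta**2 / a) - delta**2


def average_n(proportions):
    """Expected study sample size given a {N: proportion of studies} mapping."""
    return sum(n * p for n, p in proportions.items())


# Table 1 conditions: delta = 1, rho = .5
for n_t, n_c in [(10, 10), (10, 25), (25, 10), (25, 25)]:
    print(n_t, n_c, round(var_d_homogeneous(1.0, 0.5, n_t, n_c), 3))

# Table 2 sample-size conditions (proportion of studies at each N)
conditions = {
    "Small":  {5: 0.6, 10: 0.2, 20: 0.1, 30: 0.1},
    "Medium": {5: 0.2, 10: 0.4, 20: 0.2, 30: 0.2},
    "Large":  {5: 0.0, 10: 0.1, 20: 0.3, 30: 0.6},
}
for name, props in conditions.items():
    print(name, round(average_n(props), 6))
```

Under these assumptions, the rounded variances agree with the Homogeneity Assumed column of Table 1 (0.238, 0.159, 0.159, 0.092), and the three averages agree with the Average N row of Table 2 (10, 15, 25).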