Wim Van den Noortgate
Katholieke Universiteit Leuven, Belgium
Belgian Campbell Group
Wim.VandenNoortgate@kuleuven-kortrijk.be

Workshop systematic reviews
Leuven, June 4-6, 2012

1. Modelling heterogeneity
2. Publication bias

Growing popularity of evidence-based thinking: decisions in practice and policy should be based on scientific research about the effects of these decisions/interventions.
But: conflicting results (failures to replicate), especially in the social sciences!

Possible explanations for conflicting results:
1. The role of chance
   - in measuring variables
   - in sampling study participants
2. Study results may be systematically biased due to
   - the way variables are measured
   - the way the study is set up
3. Studies differ from each other (e.g., in the kind of treatment, the duration of treatment, the dependent variable, the characteristics of the investigated population, …)

FEM, without moderator:
- Differences between observed effect sizes are due to chance only
- Population effect sizes are all equal ($\delta_1 = \delta_2 = \dots = \delta_k$)

Heterogeneity test:
$H_0: \delta_1 = \delta_2 = \dots = \delta_k$
$H_a$: at least one $\delta_j$ differs from another
$Q = \sum_{j=1}^{k} w_j (g_j - \hat{\delta})^2$
Under $H_0$: $Q \sim \chi^2_{k-1}$

$I^2 = \frac{Q - df}{Q} \times 100\%$
= percentage of variability in effect estimates that is due to heterogeneity rather than chance
Rough guidelines:
- 0% to 40%: might not be important
- 30% to 60%: may represent moderate heterogeneity
- 50% to 90%: may represent substantial heterogeneity
- 75% to 100%: considerable heterogeneity
Interpretation should be based on both I² and the heterogeneity test!

Example: teacher expectancy data
(Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of Educational Psychology, 76, 85-97.)

Study                              Weeks prior contact    g_j     SE(g_j)
 1. Rosenthal et al. (1974)                 2             0.03     0.13
 2. Conn et al. (1968)                      3             0.12     0.15
 3. Jose & Cody (1971)                      3            -0.14     0.17
 4. Pellegrini & Hicks (1972)               0             1.18     0.37
 5. Pellegrini & Hicks (1972)               0             0.26     0.37
 6. Evans & Rosenthal (1969)                3            -0.06     0.10
 7. Fielder et al. (1971)                   3            -0.02     0.10
 8. Claiborn (1969)                         3            -0.32     0.22
 9. Kester & Letchworth (1972)              0             0.27     0.16
10. Maxwell (1970)                          1             0.80     0.25
11. Carter (1970)                           0             0.54     0.30
12. Flowers (1966)                          0             0.18     0.22
13. Keshock (1970)                          1            -0.02     0.29
14. Henrickson (1970)                       2             0.23     0.29
15. Fine (1972)                             3            -0.18     0.16
16. Greiger (1970)                          3            -0.06     0.17
17. Rosenthal & Jacobson (1968)             1             0.30     0.14
18. Fleming & Anttonen (1971)               2             0.07     0.09
19. Ginsburg (1970)                         3            -0.07     0.17

Result for the teacher expectancy data: Q = 35.83, df = 18, p = .007, I² = 50%

Combining different kinds of studies:
- Not always wise: make the set of studies more homogeneous!
- Can help to say something about 'fruit'
- Can help to draw detailed conclusions: does the effect depend on the kind of fruit?

FEM, categorical moderator:
- Population effect size possibly depends on study category
- Differences between observed effect sizes within the same category are due to chance only

(Example: the teacher expectancy data tabulated above, with studies grouped by weeks of prior contact.)

$Q = \sum_{j=1}^{k} w_j (g_j - \hat{\delta})^2$
Total variability in observed ES's ($Q_T$) = variability between groups ($Q_B$) + variability within groups ($Q_W$)
- $Q_T$: homogeneity test. Under $H_0$: $Q_T \sim \chi^2_{k-1}$
- $Q_B$: moderator test. Under $H_0$: $Q_B \sim \chi^2_{J-1}$
- $Q_W$: test for within-group homogeneity. Under $H_0$: $Q_W \sim \chi^2_{k-J}$

        Q total  =  Q between  +  Q within
χ²      35.83       20.38         15.45
df      18          3             15
p       0.007       0.0001        0.42

[Figure: Observed effect sizes for the 3 tasks (semantic categorization, lexical decision, naming), with the mean ES under the REM marked. Vertical axis: ES.]

FEM, continuous moderator:
- Population effect size possibly depends on a continuous study characteristic, e.g., $\delta_j = \gamma_0 + \gamma_1 x_{1j} + \dots + \gamma_p x_{pj}$
- After taking this study characteristic into account, differences between observed effect sizes are due to chance only

Teacher expectancy data: the initial effect (no prior contact) is moderate (0.41, p < .001), but it decreases with increasing prior contact (by 0.16 per week, p < .001).

REM, without moderator:
- Population effect size possibly varies randomly over studies
- Differences between observed effect sizes are due to
  - chance
  - 'true' differences

REM, categorical moderator:
- Population effect size possibly depends on study category
- Differences between observed effect sizes within the same category are due to
  - chance
  - 'true' differences

REM, continuous moderator:
- Population effect size possibly depends on a continuous study characteristic, e.g., $\delta_j = \gamma_0 + \gamma_1 x_{1j} + \dots + \gamma_p x_{pj} + u_j$
- After taking this study characteristic into account, differences between observed effect sizes are due to
  - chance
  - 'true' differences

Random effects model with moderators:
◦ The least restrictive model: allows moderator variables & random variation
◦ Also called a 'Mixed effects model' (MEM)

Overview of models: FEM or REM, each without moderator, with a categorical moderator, or with a continuous moderator.

Questions a meta-analysis can address:
1. Is there an overall effect?
2. How large is this effect?
3. Is the effect the same in all studies?
4. How large is the variation over studies?
5. Is this variation related to study characteristics?
6. Is there variation that remains unexplained?
7. What is the effect in the specific studies?

Teacher expectancy data (table above): estimates, with standard errors in parentheses

Parameter                                REM              MEM
Fixed
  Intercept ($\gamma_0$)                 0.084 (0.052)    0.41 (0.087)
  Weeks ($\gamma_1$)                     -                -0.16 (0.036)
Between-study variance ($\sigma^2_u$)    0.019 (0.023)    0.00 (-)

1. Models can include multiple moderators
2. REM assumes randomly sampled studies
3. REM requires enough studies
4. Association (over studies) ≠ causation! Be aware of potential confounding moderators (studies are not 'RCT participants'!)

Dependencies between studies
◦ E.g., research group, country, …
Multiple effect sizes per study
◦ Several samples
◦ Same sample but, e.g., several indicator variables

Ignoring dependence? NO!
Avoiding dependence
◦ (Randomly choosing one ES for each study)
◦ Averaging ES's within a study
◦ Performing separate meta-analyses for each kind of treatment or indicator
Modelling dependence
◦ Performing a multivariate meta-analysis, accounting for sampling covariance
◦ Performing a three-level analysis

(Egger, M., & Davey Smith, G. (1998). Meta-analysis: Bias in location and selection of studies. British Medical Journal, 316, 61-66. http://www.bmj.com/cgi/content/full/316/7124/61)

Proportion of trials published within 5 years after the conference:
- 81% (of 233 trials) for significant results
- 68% (of 287 trials) for nonsignificant results
(Krzyzanowska, M. K., Pintilie, M., & Tannock, I. F. (2003). Factors associated with failure to publish large randomized trials presented at an oncology meeting. Journal of the American Medical Association, 290, 495-501.)
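As a small worked illustration of the publication rates quoted above, the two percentages can be expressed as a difference in publication rate and as an odds ratio. This is only a sketch: it starts from the rounded percentages on the slide rather than from the underlying trial counts, and the variable and function names are made up for the example.

```python
# Publication rates within 5 years, as quoted on the slide (rounded percentages).
p_significant = 0.81      # of 233 trials with significant results
p_nonsignificant = 0.68   # of 287 trials with nonsignificant results


def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)


# Difference in publication rate (in proportion units).
rate_difference = p_significant - p_nonsignificant

# Odds of publication for significant relative to nonsignificant results.
odds_ratio = odds(p_significant) / odds(p_nonsignificant)

print(f"Difference in publication rate: {rate_difference:.2f}")
print(f"Odds ratio (significant vs. nonsignificant): {odds_ratio:.2f}")
```

With these rounded inputs, the odds of publication within 5 years come out roughly twice as high for trials with significant results.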
[Figures: two funnel plots of sample size (vertical axis, 0 to 500) against observed effect sizes (horizontal axis, -1 to 1.5).]

Thorough search for all relevant published and unpublished study results
a) Articles
b) Books
c) Conference papers
d) Dissertations
e) (Un)finished research reports
f) …

- outliers
  - detection using graphs (or tests)
  - conduct the analysis with and without outliers
- calculation of effect sizes: several analyses
- publication bias: analysis with and without unpublished results
- design & quality: compare results from studies with a strong design or good quality with those of all studies
- researcher: literature search, effect size calculation, quality coding, …, done by two researchers
- …

[Figure: Observed effect sizes (ES, vertical axis) per experiment (horizontal axis, experiments 1 to 135).]

Software:
- Spreadsheets (e.g., MS Excel, …)
- Some general statistical software (note: often not possible to fix the sampling variance): SAS Proc Mixed, Splus, R (metafor package), …
- Software for meta-analysis (note: often not MEM; often only one moderator!): CMA (http://www.meta-analysis.com/), RevMan, …
- Software for multilevel/mixed models: HLM, MLwiN, …

Software   Calculation of   Number of    Funnel   Trim & Fill   Forest   Max. nr     Flexibility   Price                                 Complexity
           effect sizes     moderators                                   of levels
Excel      X                X            X        X             X        2           √                                                   X
SAS        X                ∞            X        X             X        ∞           √√            Expensive (but student version)       X
R          √                ∞            √        √             √        2           √√            Free                                  √
CMA        √√               1            √        √             √        2           X             Expensive (but limited trial vers.)   √√
RevMan     X                1 (cat.)     √        X             √        2           X             Free                                  √√

Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.) (2009). The handbook of research synthesis and meta-analysis. New York: The Russell Sage Foundation.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Van den Noortgate, W., & Onghena, P. (2005). Meta-analysis. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science (Vol. 3, pp. 1206-1217). Chichester, UK: John Wiley & Sons.

Site of David Wilson: http://mason.gmu.edu/~dwilsonb/ma.html
Site of William Shadish: faculty.ucmerced.edu/wshadish/
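
Worked example: the heterogeneity statistics reported earlier for the teacher expectancy data (Q and I², the split of Q into between-group and within-group parts by weeks of prior contact, and the fixed-effects regression on weeks) can be reproduced approximately from the tabulated values. The following is a minimal sketch, assuming the usual inverse-variance weights w_j = 1 / SE(g_j)², which the slides do not spell out. All function and variable names are illustrative. Because the tabulated values are rounded to two decimals, the results agree with the reported ones only approximately.

```python
import math

# Teacher expectancy data as tabulated in the slides (Raudenbush, 1984).
weeks = [2, 3, 3, 0, 0, 3, 3, 3, 0, 1, 0, 0, 1, 2, 3, 3, 1, 2, 3]
g = [0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27, 0.80,
     0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07]
se = [0.13, 0.15, 0.17, 0.37, 0.37, 0.10, 0.10, 0.22, 0.16, 0.25,
      0.30, 0.22, 0.29, 0.29, 0.16, 0.17, 0.14, 0.09, 0.17]

# Assumption: inverse-variance weights, w_j = 1 / SE(g_j)^2.
w = [1.0 / s ** 2 for s in se]


def weighted_mean(values, weights):
    """Weighted mean of the values."""
    return sum(wi * vi for wi, vi in zip(weights, values)) / sum(weights)


def q_statistic(values, weights):
    """Q = sum_j w_j * (g_j - weighted mean)^2."""
    m = weighted_mean(values, weights)
    return sum(wi * (vi - m) ** 2 for wi, vi in zip(weights, values))


# Overall homogeneity statistic and I^2 (truncated at zero).
k = len(g)
q_total = q_statistic(g, w)
df_total = k - 1
i_squared = max(0.0, (q_total - df_total) / q_total) * 100.0

# Categorical moderator: Q within is the sum of the group-wise Q values,
# and Q between is what remains of Q total.
levels = sorted(set(weeks))
q_within = 0.0
for level in levels:
    idx = [j for j in range(k) if weeks[j] == level]
    q_within += q_statistic([g[j] for j in idx], [w[j] for j in idx])
q_between = q_total - q_within
df_between = len(levels) - 1
df_within = k - len(levels)

# Continuous moderator: weighted least squares with a single predictor,
# treating the sampling variances as known (fixed-effects meta-regression).
x_bar = weighted_mean(weeks, w)
g_bar = weighted_mean(g, w)
sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, weeks))
sxg = sum(wi * (xi - x_bar) * (gi - g_bar) for wi, xi, gi in zip(w, weeks, g))
slope = sxg / sxx
intercept = g_bar - slope * x_bar
se_slope = math.sqrt(1.0 / sxx)
se_intercept = math.sqrt(1.0 / sum(w) + x_bar ** 2 / sxx)

print(f"Q total   = {q_total:.2f} (df = {df_total}), I^2 = {i_squared:.0f}%")
print(f"Q between = {q_between:.2f} (df = {df_between})")
print(f"Q within  = {q_within:.2f} (df = {df_within})")
print(f"Intercept = {intercept:.2f} (SE {se_intercept:.3f})")
print(f"Weeks     = {slope:.2f} (SE {se_slope:.3f})")
```

The sketch stops at the test statistics. Turning them into p-values needs the chi-square distribution function, which is available in statistical software such as the packages listed above.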