Funded through the ESRC’s Researcher Development Initiative

Session 1.2: Introduction
Prof. Herb Marsh, Ms. Alison O’Mara, Dr. Lars-Erik Malmberg
Department of Education, University of Oxford

Why meta-analysis matters:
- Meta-analysis is an increasingly popular tool for summarising research findings.
- It is cited extensively in the research literature and relied upon by policymakers.
- It is important that we understand the method, whether we conduct or simply consume meta-analytic research.
- It should be one of the topics covered in all introductory research methodology courses.

Key terms:
- Meta-analysis: a statistical analysis of a set of estimates of an effect (the effect sizes), with the goal of producing an overall (summary) estimate of the effect. Often combined with analysis of variables that moderate or predict this effect.
- Systematic review: a comprehensive, critical, structured review of studies dealing with a certain topic, characterised by a scientific, transparent approach to study retrieval and analysis. Most meta-analyses start with a systematic review.
- Coding: the process of extracting information from the literature included in the meta-analysis; involves noting the characteristics of the studies in relation to a priori variables of interest (qualitative).
- Effect size: the numerical outcome to be analysed in a meta-analysis; a summary statistic of the data in each study included in the meta-analysis (quantitative).
- Summarising effect sizes: central tendency, variability, and relations to study characteristics (quantitative).

Stages of a meta-analysis:
1. Establish research question
2. Define relevant studies
3. Locate and collate studies
4. Develop code materials
5. Pilot coding; coding
6. Data entry and effect size calculation
7. Main analyses
8. Supplementary analyses

Stage 1: Establish research question

Typical forms of meta-analytic research questions:
- Comparison of treatment and control groups, e.g. "What is the effectiveness of a reading skills program for a treatment group compared to an inactive control group?"
- Pretest-posttest differences, e.g. "Is there a change in motivation over time?"
- Correlation between two variables, e.g. "What is the relation between teaching effectiveness and research productivity?"
- Moderators of an outcome, e.g. "Does gender moderate the effect of a peer-tutoring program on academic achievement?"

Two further questions shape the analysis:
- Do you wish to generalise your findings to other studies not in the sample?
- Do you have multiple outcomes per study (e.g. achievement in different school subjects; five different personality scales; multiple criteria of success)?
Such questions determine the choice of meta-analytic model: fixed effects, random effects, or multilevel.

Example: Brown, S. A. (1990). Studies of educational interventions and outcomes in diabetic adults: A meta-analysis revisited. Patient Education and Counseling, 16, 189-215.

Stage 2: Define relevant studies
- You need explicit inclusion and exclusion criteria; the broader the research domain, the more detailed they tend to become.
- Refine the criteria as you interact with the literature.
- Components of a detailed set of criteria: distinguishing features; research respondents; key variables; research methods; cultural and linguistic range; time frame; publication types (Brown, Upchurch, & Acton, 2003).
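One way to keep such criteria explicit and auditable is to encode them as data and apply them mechanically during screening. The sketch below is a minimal, hypothetical illustration in Python; the criterion names, study fields, and values are invented for the example and are not taken from the course materials.

```python
# Hypothetical illustration: explicit inclusion criteria encoded as data,
# applied mechanically to candidate study records. All names are invented.

INCLUSION_CRITERIA = {
    "time_frame": (1990, 2006),                        # publication years
    "publication_types": {"journal article", "dissertation"},
    "languages": {"English"},                          # linguistic range
    "designs": {"randomised", "quasi-experimental"},   # research methods
}

def meets_criteria(study: dict, criteria: dict = INCLUSION_CRITERIA) -> bool:
    """Return True if a candidate study satisfies every inclusion criterion."""
    lo, hi = criteria["time_frame"]
    return (
        lo <= study["year"] <= hi
        and study["publication_type"] in criteria["publication_types"]
        and study["language"] in criteria["languages"]
        and study["design"] in criteria["designs"]
    )

candidates = [
    {"id": 1, "year": 1998, "publication_type": "journal article",
     "language": "English", "design": "randomised"},
    {"id": 2, "year": 1985, "publication_type": "conference paper",
     "language": "English", "design": "randomised"},
]
included = [s for s in candidates if meets_criteria(s)]  # keeps study 1 only
```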
Stage 3: Locate and collate studies
- Search electronic databases (e.g., ISI, Psychological Abstracts, Expanded Academic ASAP, Social Sciences Index, PsycINFO, and ERIC).
- Examine the reference lists of included studies to find other relevant studies.
- If including unpublished data, email researchers in your discipline, take advantage of Listservs, and search Dissertation Abstracts International.

Boolean operators in database searches:
- "motivation" OR "job satisfaction" returns ALL articles that contain EITHER motivation OR job satisfaction anywhere in the text (inclusive; larger yield).
- "motivation" AND "job satisfaction" captures only the subset that contains BOTH motivation AND job satisfaction anywhere in the text (restrictive; smaller yield).

Screening studies for inclusion:
1. Check the abstract and title: if clearly irrelevant, discard; otherwise continue.
2. Check the participants and results sections: if the study does not meet the criteria, discard; if it does, collect it.
The inclusion process usually requires several steps to cull inappropriate studies. Example from Bazzano, L. A., Reynolds, K., Holder, K. N., & He, J. (2006). Effect of folic acid supplementation on risk of cardiovascular diseases: A meta-analysis of randomized controlled trials. JAMA, 296, 2720-2726.

Stage 4: Develop code materials

The researcher must have a thorough knowledge of the literature. The process typically involves (Brown et al., 2003):
a) reviewing a random subset of the studies to be synthesised,
b) listing all relevant coding variables as they appear during the review,
c) including these variables in the coding sheet, and
d) pilot testing the coding sheet on a separate subset of studies.

Coded data usually fall into four basic categories:
1. Methodological features: study identification code; type of publication; year of publication; country; participant characteristics; study design (e.g., random assignment, representative sampling).
2. Substantive features: variables of interest (e.g., theoretical framework).
3. Study quality: a "total" measure of quality and study design.
4. Outcome measures: effect size information.

The code book guides the coding process, almost like a dictionary or manual: "...each variable is theoretically and operationally defined to facilitate intercoder and intracoder agreement during the coding process. The operational definition of each category should be mutually exclusive and collectively exhaustive" (Brown et al., 2003, p. 208).

Example code sheet entries (Brown et al., 2003, Table 1):
  _1__  Study ID
  _99_  Year of publication
  _2__  Publication type (1-5)
  _1__  Geographical region (1-7)
  _87_  Total sample size
  _41_  Total number of males
  _46_  Total number of females

Matching code book entry (Brown et al., 2003, Table 4), for Publication type (1-5):
  1 = Journal article
  2 = Book/book chapter
  3 = Thesis or doctoral dissertation
  4 = Technical report
  5 = Conference paper
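A code sheet of this kind maps naturally onto a structured record, with the code book serving as the lookup that keeps coding and interpretation consistent. Below is a minimal sketch; the field names are assumptions chosen to mirror the example sheet above, and the values are those shown on it.

```python
# Minimal sketch of one coding record mirroring the example code sheet.
# Field names are invented for illustration; the publication-type codes
# follow the code book entry shown above.

PUBLICATION_TYPE = {            # code book: Publication type (1-5)
    1: "Journal article",
    2: "Book/book chapter",
    3: "Thesis or doctoral dissertation",
    4: "Technical report",
    5: "Conference paper",
}

record = {
    "study_id": 1,
    "year_of_publication": 99,  # coded as on the sheet
    "publication_type": 2,
    "geographical_region": 1,
    "total_sample_size": 87,    # note: 41 males + 46 females = 87
    "n_males": 41,
    "n_females": 46,
}

# The code book translates numeric codes back into their definitions:
print(PUBLICATION_TYPE[record["publication_type"]])  # "Book/book chapter"
```

Two coders filling in such records independently for the same paper can then be compared field by field, which is the basis of the agreement checks in the next stage.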
Stage 5: Pilot coding; coding
- A random selection of papers is coded by both coders.
- Meet to compare code sheets; where there is a discrepancy, discuss to reach agreement.
- Amend the code materials/definitions in the code book if necessary.
- Several rounds of piloting may be needed, each time using different papers.
- Coding should ideally be done independently by two or more researchers to minimise errors and subjective judgements.
- Ways of assessing the amount of agreement between raters: percent agreement; Cohen's kappa coefficient; correlation between different raters; intraclass correlation.

Stage 6: Data entry and effect size calculation

Lipsey & Wilson (2001) present many formulae for calculating effect sizes from different reported information: means and standard deviations, correlations, p-values, F-statistics, t-statistics, d, SE (see Brown et al., 2003, Table 3). However, all effect sizes need to be converted into a common metric, typically based on the "natural" metric given research in the area:
- Standardized mean difference: for group contrasts (treatment groups or naturally occurring groups) on an inherently continuous construct, e.g.
  $ES = \dfrac{\bar{X}_{\text{Males}} - \bar{X}_{\text{Females}}}{SD_{\text{pooled}}}$
- Odds ratio: for group contrasts (treatment groups or naturally occurring groups) on an inherently dichotomous construct, from a 2x2 frequency table with cells a, b, c, d:
  $ES = \dfrac{ad}{bc}$
- Correlation coefficient: for the association between variables:
  $ES = r$

Stage 7: Main analyses

Fixed effects model:
- Assumes the sample includes the entire population of studies to be considered; you do not want to generalise to other studies not included (e.g., future studies).
- All of the variability between effect sizes is due to sampling error alone; thus, effect sizes are weighted only by the within-study variance.
- Effect sizes are independent.
- There are two general ways of conducting a fixed effects meta-analysis: ANOVA and multiple regression.
  - The analogue-to-the-ANOVA homogeneity analysis is appropriate for categorical variables; it looks for systematic differences between groups of responses within a variable.
  - Multiple regression homogeneity analysis is more appropriate for continuous variables and/or when there are multiple variables to be analysed; it tests the ability of groups within each variable to predict the effect size. Categorical variables can be included as dummy variables (ANOVA is a special case of multiple regression).

Random effects model:
- Assumes the sample is only a sample of studies from the entire population of studies to be considered; you want to generalise to other studies not included (including future studies).
- Variability between effect sizes is due to sampling error plus variability in the population of effects.
- Effect sizes are independent.
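The contrast between the two models is easiest to see in the pooling arithmetic. The sketch below uses the standard inverse-variance approach (of the kind presented in Lipsey & Wilson, 2001) with the DerSimonian-Laird estimator for the between-study variance; the effect sizes and variances are invented for illustration, and this is a sketch of the general technique rather than a worked example from the course.

```python
# Minimal sketch: fixed- vs random-effects pooling with invented data.

d = [0.10, 0.35, 0.60, 0.20]      # one effect size (e.g., d) per study
v = [0.010, 0.020, 0.015, 0.025]  # within-study (sampling) variances
k = len(d)

# Fixed effects: weight each effect size by its within-study variance only.
w = [1.0 / vi for vi in v]
d_fixed = sum(wi * di for wi, di in zip(w, d)) / sum(w)

# Homogeneity test: Q is chi-square with k-1 df under the null of one
# common population effect.
Q = sum(wi * (di - d_fixed) ** 2 for wi, di in zip(w, d))

# DerSimonian-Laird estimate of the between-study variance tau^2.
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random effects: weight by within-study PLUS between-study variance,
# reflecting the extra variability in the population of effects.
w_re = [1.0 / (vi + tau2) for vi in v]
d_random = sum(wi * di for wi, di in zip(w_re, d)) / sum(w_re)

print(f"fixed = {d_fixed:.3f}, Q = {Q:.2f} (df = {k - 1}), "
      f"tau^2 = {tau2:.4f}, random = {d_random:.3f}")
```

With these made-up numbers Q exceeds its degrees of freedom, tau^2 is positive, and the random-effects mean differs from the fixed-effects mean while its weights are more nearly equal across studies, which is exactly the behaviour the heterogeneity discussion below describes.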
Heterogeneity:
- Variations in sampling schemes can introduce heterogeneity into the result: in effect, more than one intercept (underlying mean effect) is present in the solution.
- Heterogeneity means that the between-study variation in effect estimates is greater than the random (sampling) variance. It could be due to differences in study design, the measurement instruments used, the researcher, etc.
- Random effects models attempt to account for such between-study differences.
- If the homogeneity test is rejected (it almost always will be), this suggests that there are larger differences than can be explained by chance variation at the individual participant level: there is more than one "population" in the set of studies.
- The random effects model helps to determine how much of the between-study variation can be explained by the study characteristics we have coded.
- The total variance associated with the effect sizes then has two components: one associated with differences within each study (participant-level variation) and one with between-study variance.

Multilevel model:
- Meta-analytic data are inherently hierarchical (effect sizes nested within studies) and have random error that must be accounted for; effect sizes are not necessarily independent.
- Allows for multiple effect sizes per study: Level 1 is the outcome-level component (effect sizes); Level 2 is the study component (publications).
- Similar to a multiple regression equation, but accounts for error at both the outcome (effect size) level and the study level.
- Start with the intercept-only model, which incorporates both the outcome-level and the study-level components (analogous to the random effects multiple regression model); then expand the model to include predictor variables that explain systematic variance between the study effect sizes.

Fixed, random, or multilevel?
- Generally, if more than one effect size per study is included in the sample, a multilevel model should be used.
- However, if there is little variation at the study level and/or there are no predictors in the model, the results of multilevel meta-analyses are similar to those of random effects models.
- Do you wish to generalise your findings to other studies not in the sample? Yes: random effects or multilevel. No: fixed effects.
- Do you have multiple outcomes per study? Yes: multilevel. No: random effects or fixed effects.

Stage 8: Supplementary analyses
- Publication bias: fail-safe N (Rosenthal, 1991); trim and fill procedure (Duval & Tweedie, 2000a, 2000b).
- Sensitivity analysis: e.g., Vevea & Woods (2005).
- Power analysis: e.g., Muncer, Craigie, & Holmes (2003).
- Study quality: quality weighting (Rosenthal, 1991); use of the kappa statistic in determining the validity of quality filtering for meta-analysis (Sands & Murphy, 1996); regression with "quality" as a predictor of effect size (see Valentine & Cooper, 2008).
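The fail-safe N asks how many additional unretrieved studies averaging a null result would be needed to bring a significant combined result down to non-significance. The sketch below follows the standard z-score formulation of Rosenthal's fail-safe N (not a worked example from the course); the per-study z-values are invented for illustration.

```python
# Minimal sketch of Rosenthal's fail-safe N. Solves
#   sum(z) / sqrt(k + N) = z_alpha   for N,
# i.e. the number of hidden null-result studies that would make the
# Stouffer combined z just non-significant. z-values are invented.
import math

z = [2.1, 1.8, 2.5, 1.2, 2.9]   # one z-score per included study
z_alpha = 1.645                  # critical z, one-tailed alpha = .05

k = len(z)
z_sum = sum(z)
z_combined = z_sum / math.sqrt(k)          # Stouffer's combined z

n_failsafe = (z_sum / z_alpha) ** 2 - k    # additional null studies needed

print(f"combined z = {z_combined:.2f}, fail-safe N = {n_failsafe:.1f}")
# Rosenthal's rule of thumb treats the result as robust to publication
# bias if the fail-safe N exceeds 5k + 10.
```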
References

Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis. Western Journal of Nursing Research, 25, 205-222.
Duval, S., & Tweedie, R. (2000a). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455-463.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
Muncer, S. J., Craigie, M., & Holmes, J. (2003). Meta-analysis and power: Some suggestions for the use of power in research synthesis. Understanding Statistics, 2, 1-12.
Rosenthal, R. (1991). Quality-weighting of studies in meta-analytic research. Psychotherapy Research, 1, 25-28.
Sands, M. L., & Murphy, J. R. (1996). Use of kappa statistic in determining validity of quality filtering for meta-analysis: A case study of the health effects of electromagnetic radiation. Journal of Clinical Epidemiology, 49, 1045-1051.
Valentine, J. C., & Cooper, H. M. (2008). A systematic and transparent approach for assessing the methodological quality of intervention effectiveness research: The Study Design and Implementation Assessment Device (Study DIAD). Psychological Methods, 13, 130-149.
Vevea, J. L., & Woods, C. M. (2005). Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychological Methods, 10, 428-443.