What software is available for calculating effect sizes?

Funded through the ESRC's Researcher Development Initiative

Session 2.1 – Revision of Day 1

Prof. Herb Marsh
Ms. Alison O'Mara
Dr. Lars-Erik Malmberg
Department of Education, University of Oxford

Review questions:
• What are the 3 primary types of effect sizes?
• What sort of information can be used to calculate effect sizes?
• What software is available for calculating effect sizes?

The 3 primary types of effect sizes:
• Standardized mean difference
   - Group contrasts (treatment groups, naturally occurring groups)
   - Inherently continuous construct
• Odds-ratio
   - Group contrasts (treatment groups, naturally occurring groups)
   - Inherently dichotomous construct
• Correlation coefficient
   - Association between variables

Standardized mean difference: $ES = \frac{\bar{X}_{Males} - \bar{X}_{Females}}{SD_{pooled}}$
Odds-ratio: $ES = \frac{ad}{bc}$
Correlation coefficient: $ES = r$

• Standardised mean difference effect sizes indicate the amount of improvement of the treatment group over the control group, or the difference between 2 groups.
• Odds ratio effect sizes indicate the likelihood of something occurring, e.g., not catching an illness after inoculation.
• Correlation effect sizes indicate the strength of the relationship between 2 variables.

Effect size as the proportion of the Treatment group doing better than the average Control group person:

[Figure: overlapping Control and Treatment distributions for d = .20, d = .50 and d = .80. 58%, 69% and 79% of the Treatment group, respectively, lie above the Control group mean.]

Effect sizes can be thought of as the average percentile standing of the average treated participant relative to the average untreated participant.

Cohen's (1988)   Effect   Percentile   Percent of
Standard         Size     Standing     Nonoverlap
                 2.0      97.7         81.1%
                 1.9      97.1         79.4%
                 1.8      96.4         77.4%
                 1.7      95.5         75.4%
                 1.6      94.5         73.1%
                 1.5      93.3         70.7%
                 1.4      91.9         68.1%
                 1.3      90           65.3%
                 1.2      88           62.2%
                 1.1      86           58.9%
                 1.0      84           55.4%
                 0.9      82           51.6%
LARGE            0.8      79           47.4%
                 0.7      76           43.0%
                 0.6      73           38.2%
MEDIUM           0.5      69           33.0%
                 0.4      66           27.4%
                 0.3      62           21.3%
SMALL            0.2      58           14.7%
                 0.1      54           7.7%
                 0.0      50           0%

Review question:
• What are the key statistical assumptions of the 3 meta-analytic methods?
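As a brief aside on the effect sizes reviewed on Day 1, the three formulas and the percentile-standing interpretation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input numbers are hypothetical, the pooled standard deviation uses the usual degrees-of-freedom weighting (the slides do not spell it out), and percentile standing is taken as the standard normal cumulative probability at d.

```python
from math import sqrt
from statistics import NormalDist


def pooled_sd(sd1, n1, sd2, n2):
    # Assumed pooling formula: variances weighted by their degrees of freedom.
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def standardized_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    # ES = (mean of group 1 - mean of group 2) / SD_pooled
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def odds_ratio(a, b, c, d):
    # ES = ad / bc, where a, b, c, d are the cell counts of a 2 x 2 table.
    return (a * d) / (b * c)


def percentile_standing(d):
    # Percentage of the control distribution lying below the average
    # treated participant, assuming normal distributions with equal SDs.
    return 100 * NormalDist().cdf(d)


# Hypothetical inputs, for illustration only.
d = standardized_mean_difference(105, 10, 50, 100, 10, 50)
print(round(d, 2), round(percentile_standing(d)))
print(odds_ratio(20, 10, 10, 20))
```

With these made-up inputs the two groups differ by half a pooled standard deviation, which corresponds to the MEDIUM row of Cohen's table.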
Fixed effects model:
• Includes the entire population of studies to be considered; we do not want to generalise to other studies not included (including future studies).
• All of the variability between effect sizes is due to sampling error alone. Thus, the effect sizes are weighted only by the within-study variance.
• Assumes that the collected studies all represent random samples from the same population.
• Effect sizes are independent.

In this and the following formulae, we use the symbols d and δ to refer to any measure of the observed and the true effect size, which is not necessarily the standardized mean difference.

$d_j = \delta + e_j$

where
• $d_j$ is the observed effect size in study j,
• $\delta$ is the 'true' population effect, and
• $e_j$ is the residual due to sampling variance in study j.

Random effects model:
• Is only a sample of studies from the entire population of studies to be considered. As a result, we do want to generalise to other studies not included in the sample (e.g., future studies).
• Variability between effect sizes is due to sampling error plus variability in the population of effects. In contrast to fixed effects models, there are 2 sources of variance.
• Assumes that the studies are random samples of some population in which the underlying (infinite-sample) effect sizes have a distribution rather than a single value.
• Effect sizes are independent.
$d_j = \delta + u_j + e_j$

where
• $d_j$ is the observed effect size in study j,
• $\delta$ is the mean 'true' population effect size,
• $u_j$ is the deviation of the true study effect size from the mean true effect size, and
• $e_j$ is the residual due to sampling variance in study j.

Multilevel model:
• Meta-analytic data is inherently hierarchical (i.e., effect sizes nested within studies) and has random error that must be accounted for.
• Effect sizes are not necessarily independent.
• Allows for multiple effect sizes per study.
• The model combines fixed and random effects (often called a mixed effects model).

$d_j = \gamma_0 + u_j + e_j$

where
• $d_j$ is the observed effect size in study j,
• $\gamma_0$ is the mean 'true' population effect size,
• $u_j$ is the deviation of the true study effect size from the mean true effect size, and
• $e_j$ is the residual due to sampling variance in study j.

With predictor variables $X_{sj}$ the model becomes:

$d_j = \gamma_0 + \sum_{s=1}^{S} \gamma_s X_{sj} + u_j + e_j$

• If between-study variance = 0, the multilevel model simplifies to the fixed effects regression model:
  $d_j = \gamma_0 + \sum_{s=1}^{S} \gamma_s X_{sj} + e_j$
• If no predictors are included, the model simplifies to the random effects model:
  $d_j = \delta + u_j + e_j$
• If the level 2 variance = 0, the model simplifies to the fixed effects model:
  $d_j = \delta + e_j$

• Many meta-analysts use an adaptive (or "conditional") approach:
   IF between-study variance is found in the homogeneity test
   THEN use random effects model
   OTHERWISE use fixed effects model
• Fixed effects models are very common, even though the assumption of homogeneity is "implausible" (Van den Noortgate & Onghena, 2003).
• There is a considerable lag in the uptake of new methods by applied meta-analysts.
• Meta-analysts need to stay on top of these developments by:
   - Attending courses
   - Reading widely across disciplines

Review questions:
• What is the first step in the analysis of meta-analytic data in fixed or random effects models?
• What 2 common statistical techniques have been adapted for use in fixed and random effects meta-analytic modelling?
• What common statistical technique is multilevel modelling analogous to?
Usually start with a Q-test to determine the overall mean effect size and the homogeneity of the effect sizes (MeanES.sps macro). If there is significant heterogeneity, then:
• 1) should probably conduct random effects analyses instead
• 2) model moderators of the effect sizes (determine the source/s of variance)

The homogeneity (Q) test asks whether the different effect sizes are likely to have all come from the same population (an assumption of the fixed effects model). Are the differences among the effect sizes no bigger than might be expected by chance?

$Q = \sum w_i \left( ES_i - \overline{ES} \right)^2$

where
• $ES_i$ = effect size for each study (i = 1 to k),
• $\overline{ES}$ = mean effect size, and
• $w_i$ = a weight for each study based on the sample size.

However, this (chi-square) test is heavily dependent on sample size. It is almost always significant unless the numbers (studies and people in each study) are VERY small. This means that the fixed effects model will almost always be rejected in favour of a random effects model.

• The Q-test is easy to conduct using the MeanES.sps macro from David Wilson's website:
   MeanES ES=d /W=weight.

Result: significant heterogeneity in the effect sizes, so a random effects model is more appropriate and/or moderators need to be modelled.

The analogue to the ANOVA homogeneity analysis is appropriate for categorical variables.
• Looks for systematic differences between groups of responses within a variable
• Easy to implement using the MetaF.sps macro:
   MetaF ES = d /W = Weight /GROUP = TXTYPE /MODEL = FE.

Multiple regression homogeneity analysis is more appropriate for continuous variables and/or when there are multiple variables to be analysed.
• Tests the ability of groups within each variable to predict the effect size
• Can include categorical variables in multiple regression as dummy variables
• Easy to implement using the MetaReg.sps macro:
   MetaReg ES = d /W = Weight /IVS = IV1 IV2 /MODEL = FE.
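For readers without SPSS, the arithmetic behind the weighted mean effect size and the Q statistic can be sketched in Python. This is not the MeanES.sps macro, only a minimal illustration: the function name and the three-study data set are hypothetical, and the weights are assumed to be inverse within-study variances, which is one common way of basing the weight on sample size.

```python
from math import sqrt


def fixed_effects_summary(effect_sizes, variances):
    # Assumed weighting: w_i = 1 / v_i (inverse within-study variance).
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Weighted mean effect size and its standard error.
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    se_mean = sqrt(1.0 / sum_w)

    # Q = sum of w_i * (ES_i - mean ES)^2, compared against a
    # chi-square distribution with k - 1 degrees of freedom.
    q = sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effect_sizes))
    df = len(effect_sizes) - 1
    return mean_es, se_mean, q, df


# Hypothetical data: three studies with equal sampling variance.
mean_es, se_mean, q, df = fixed_effects_summary([0.2, 0.5, 0.8],
                                                [0.04, 0.04, 0.04])
print(round(mean_es, 3), round(se_mean, 3), round(q, 3), df)
```

In this made-up example Q is 4.5 on 2 degrees of freedom, below the .05 chi-square critical value of 5.99, so homogeneity would not be rejected. With only three small studies this is exactly the situation in which the test tends not to reach significance.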
• If the homogeneity test is rejected (it almost always will be), it suggests that there are larger differences than can be explained by chance variation (at the individual participant level). There is more than one "population" in the set of different studies.
• The random effects model determines how much of this between-study variation can be explained by study characteristics that we have coded.

The total variance associated with the effect sizes has two components, one associated with differences within each study (participant-level variation) and one for between-study variance:

$v_{T_i} = v_\theta + v_i$

The weighting for each effect size consists of the within-study variance ($v_i$) and the between-study variance ($v_\theta$). The new weighting for the random effects model ($w_{i_{RE}}$) is given by the formula:

$w_{i_{RE}} = \frac{1}{v_i + v_\theta}$

Thus, larger studies receive proportionally less weight in the RE model than in the FE model. This is because a constant is added to the denominator, so the relative effect of sample size will be smaller in the RE model.

• Like the FE model, RE uses ANOVA and multiple regression to model potential moderators/predictors of the effect sizes, if the Q-test reveals significant heterogeneity.
• Easy to implement using the MetaF.sps macro (ANOVA) or MetaReg.sps (multiple regression):
   MetaF ES = d /W = Weight /GROUP = TXTYPE /MODEL = ML.
   MetaReg ES = d /W = Weight /IVS = IV1 IV2 /MODEL = ML.
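The effect of adding the between-study variance to each weight can be seen in a small Python sketch. It is an illustration only, not the macros' implementation: the function names and the data are hypothetical, fixed effects weights are assumed to be inverse within-study variances, and the between-study variance is estimated with the method-of-moments formula (Q - (k - 1)) / (Σw - Σw²/Σw), truncated at zero (the truncation is an assumption made here).

```python
def between_study_variance(effect_sizes, variances):
    # Method-of-moments estimate of the between-study variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mean_es = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum_w
    q = sum(wi * (es - mean_es) ** 2 for wi, es in zip(w, effect_sizes))
    k = len(effect_sizes)
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Assumption: negative estimates are truncated to zero.
    return max(0.0, (q - (k - 1)) / denom)


def weight_shares(variances, v_theta=0.0):
    # Each study's share of the total weight, with w_i = 1 / (v_i + v_theta).
    # v_theta = 0 gives the fixed effects weights.
    w = [1.0 / (v + v_theta) for v in variances]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical data: the first study is the largest (smallest variance).
es = [0.1, 0.5, 0.9]
var = [0.01, 0.04, 0.09]

v_theta = between_study_variance(es, var)
fe_shares = weight_shares(var)
re_shares = weight_shares(var, v_theta)

print("v_theta:", round(v_theta, 4))
print("FE shares:", [round(s, 3) for s in fe_shares])
print("RE shares:", [round(s, 3) for s in re_shares])
```

In this made-up data set the largest study carries most of the weight under fixed effects, and noticeably less once the between-study variance is added to every denominator.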
Result: significant heterogeneity in the effect sizes, so moderators need to be modelled.

The between-study variance is estimated as:

$v_\theta = \frac{Q - (k - 1)}{\sum w_i - \frac{\sum w_i^2}{\sum w_i}}$

Multilevel modelling:
• Similar to multiple regression, but corrects the standard errors for the nesting of the data
• Start with an intercept-only (no predictors) model, which incorporates both the outcome-level and the study-level components
   - This tells us the overall mean effect size
   - Is similar to a random effects model
• Then expand the model to include predictor variables, to explain systematic variance between the study effect sizes

Intercept-only model:

$d_j = \gamma_0 + u_j + e_j$

[Figure: MLwiN screenshot]

Model with predictors:

$d_j = \gamma_0 + \sum_{s=1}^{S} \gamma_s X_{sj} + u_j + e_j$

• Using the same simulated data set with n = 15

• Multilevel models:
   - build on the fixed and random effects models
   - account for between-study variance (like random effects)
   - are similar to multiple regression, but correct the standard errors for the nesting of the data. Improved modelling of the nesting of levels within studies increases the accuracy of the estimation of standard errors on parameter estimates and the assessment of the significance of explanatory variables (Bateman and Jones, 2003).
• Multilevel modelling is more precise when there is greater between-study heterogeneity.
• Also allows flexibility in modelling the data when one has multiple moderator variables (Raudenbush & Bryk, 2002).

References:
• Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
• Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
• Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). Thousand Oaks, CA: Sage Publications.
• Van den Noortgate, W., & Onghena, P. (2003). Multilevel meta-analysis: A comparison with traditional meta-analytical procedures. Educational and Psychological Measurement, 63, 765-790.
• Wilson's "meta-analysis stuff" website: http://mason.gmu.edu/~dwilsonb/ma.html
