Funded through the ESRC’s Researcher Development Initiative
Session 3.2: Multivariate meta-analysis
Prof. Herb Marsh, Ms. Alison O’Mara, Dr. Lars-Erik Malmberg
Department of Education, University of Oxford

Establish research question
Define relevant studies
Develop code materials
Data entry and effect size calculation
Pilot coding; coding
Locate and collate studies
Main analyses
Supplementary analyses

Multivariate meta-analysis involves the analysis of multiple outcomes simultaneously.
Multiple outcomes could be due to:
    Different outcomes (e.g., math achievement and verbal achievement)
    Correlations with multiple variables (e.g., age with achievement and age with aspirations)
    Evaluation of different treatments in the same publication
    More than one control/comparison group

Violations of independence occur when studies produce multiple effect sizes due to the presence of multiple treatment groups or multiple outcome measures.
Effect sizes from the same study are likely to be more highly correlated than effect sizes from different studies.
This raises the issue of within-study versus between-study variation.

Choose one outcome of interest
Separate analyses on each outcome
Averaging the effect sizes (one outcome per study)
Shifting unit of analysis (Cooper, 1998)
Multivariate multilevel modelling

Select the outcome that is of most interest.
This is appropriate for many research questions.
However, it does not allow contrasts between outcomes, thereby restricting the questions you can ask.

For each analysis, only one outcome (effect size) per study is contributed to the analysis.
E.g., run separate analyses on maths achievement effect sizes, and a different set of analyses on the verbal achievement effect sizes.
The effect sizes are independent within the particular analysis, but this does not allow direct comparison between the outcomes.
Therefore, this may not always make sense for the research question under consideration (Rosenthal & Rubin, 1986).

Establish an independent set of effect sizes by
calculating the average of the effect sizes in the study.
E.g., achievement, intelligence, satisfaction, personality, obesity
However, the dependent variables need to be almost perfectly correlated for this method to work, because the mean effect size gives an estimate that is lower than expected (Rosenthal & Rubin, 1986).
To make the results meaningful, outcomes should be conceptually similar.

The outcomes are aggregated depending on the level of analysis of interest: the study level or the outcome level.
At the study level, all effect sizes from within a study are aggregated to produce one outcome per study.
For each moderator analysis, effect sizes are aggregated based upon the particular moderator variable, such that each study only includes one effect size per outcome on that particular variable.

The effect sizes for two self-concept domains (e.g., physical and academic self-concept) from the same primary study would initially be averaged to produce a single effect size for calculations involving the overall effect size for the sample (study level).
For the moderator analyses, the two self-concept domains would be considered separately if the type of domain was of interest, but would be aggregated if the moderator variable of interest was, say, the type of control group.
This means that the n of effect sizes contributing to the analysis will change depending on the variables being examined.

Total n of effect sizes = 14
4 publications
2 outcomes: effect size = difference between treatment & control group
2 interventions: 1 = math, 2 = verbal

Publication   Math achieve   Verbal achieve   Intervention
1             .7             .3               1
1             .5             .7               2
2             .3             .6               2
3             .8             .2               1
3             .3             .9               2
4             .4             .8               2
4             .3             .9               2

For study 4, effect size = (.4 + .3 + .8 + .9)/4 = .6

Publication   Achieve   Intervention
1             .5        1
1             .6        2
2             .45       2
3             .5        1
3             .6        2
4             .6        2

Only publication 4 has more than one of the same type of intervention
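The aggregation in the worked example above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original session materials: the names ROWS and aggregate are hypothetical, and it assumes each unit's effect size is the plain unweighted mean, as in the study 4 calculation.

```python
from collections import defaultdict
from statistics import mean

# Rows of the worked example: (publication, math d, verbal d, intervention),
# where intervention 1 = math and 2 = verbal. 7 rows x 2 outcomes = 14 effect sizes.
ROWS = [
    (1, 0.7, 0.3, 1),
    (1, 0.5, 0.7, 2),
    (2, 0.3, 0.6, 2),
    (3, 0.8, 0.2, 1),
    (3, 0.3, 0.9, 2),
    (4, 0.4, 0.8, 2),
    (4, 0.3, 0.9, 2),
]


def aggregate(rows, key):
    """Average every effect size that falls in the same unit of analysis.

    `key` maps (publication, intervention) to the unit the effect sizes
    are pooled over; this unweighted mean is an illustrative assumption.
    """
    groups = defaultdict(list)
    for pub, math_d, verbal_d, intervention in rows:
        groups[key(pub, intervention)].extend([math_d, verbal_d])
    return {unit: round(mean(values), 2) for unit, values in sorted(groups.items())}


# Study level: one effect size per publication.
by_study = aggregate(ROWS, key=lambda pub, intervention: pub)

# Moderator = intervention: one effect size per publication and intervention type.
by_intervention = aggregate(ROWS, key=lambda pub, intervention: (pub, intervention))

print(by_study)
print(by_intervention)
```

Changing only the key function changes the unit of analysis, which is why the n of effect sizes differs from one analysis to the next.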
Total n of effect sizes = 6
One effect size per study for maths interventions, one per study for verbal interventions

Publication   Math achieve   Verbal achieve   Intervention
1             .7             .3               1
1             .5             .7               2
2             .3             .6               2
3             .8             .2               1
3             .3             .9               2
4             .4             .8               2
4             .3             .9               2

Total n of effect sizes = 8
One math effect size per study, one verbal effect size per study
Calculate the average for math for study 1: (.7 + .5)/2 = .6
Calculate the average for verbal for study 1: (.3 + .7)/2 = .5

Publication   Math achieve   Verbal achieve
1             .6             .5
2             .3             .6
3             .55            .55
4             .35            .85

Although this strategic compromise does not eliminate the problem of non-independence, this approach minimizes violations of assumptions about the independence of effect sizes, whilst preserving as much of the data as possible (Cooper, 1998).
Probably the most popular way of dealing with multiple outcomes in fixed and random effects models when explicitly interested in comparing different outcomes.

Multilevel modelling accounts for dependencies in the data because its nested structure allows for correct estimation of standard errors on parameter estimates and therefore accurate assessment of the significance of predictor variables (Bateman & Jones, 2003; Hox & de Leeuw, 2003; Raudenbush & Bryk, 2002).
Meta-analytic data is inherently hierarchical (i.e., effect sizes nested within studies) and has random error that must be accounted for.
Effect sizes are not necessarily independent.
Allows for multiple effect sizes per study.
Also provides more precise and less biased estimates of between-study variance than traditional techniques.

Scholastic Aptitude Test (SAT) coaching effectiveness data reported in Kalaian and Raudenbush (1996), and Kalaian & Kasim (in press).
The effect sizes are the differences between the coached and uncoached groups on SAT scores across the collection of SAT coaching effectiveness studies.
SAT tests are widely claimed to be so broad and generic (almost IQ-like) that they could not be affected by a short-term training program.
Others suggest that a limited amount of "familiarisation" is useful but not much beyond this (i.e., non-linear effect of hours).

Meta-analytic data information:
    Study ID
    Constant (cons)
    Effect size for verbal SAT scores (dv)
    Effect size for maths SAT scores (dm)
    Sampling variance (SE) and covariance (cov_VM) of the effect sizes
    Explanatory variables (study and sample characteristics) (hours, logHR, year)

1. Click on "responses" at the bottom of the screen
2. Select "dv" and "dm" (the effect sizes for verbal and maths achievement, respectively)

1. Click on the equation
2. Indicate a two-level model with L2 = study, L1 = resp_indicator
3. Click the “done” button

1. Click "add term"
2. Select variable “cons”
3. Click the “add separate coefficients” button
4. Click “Estimates”

1. Right-click on cons.dv in the equation
2. Select j
3. Click on Done
4. Right-click on cons.dm in the equation
5. Select j
6. Click on Done

Click on “Estimates”
Your screen should look like this

1. Click "add term"
2. Select variable “SE_V”
3. Click the “add separate coefficients” button

1. Click on “estimates” to reveal numbers
2. Right-click on SE_V.dm
3. Click on “Delete Term”

1. Click "add term"
2. Select variable “SE_M”
3. Click the “add separate coefficients” button

1. Right-click on SE_M.dv
2. Click on “Delete Term”

1. Right-click on SE_V.dv in the equation
2. Select j, unselect Fixed parameter
3. Click on Done
4. Right-click on SE_M.dm in the equation
5. Select j, unselect Fixed parameter
6. Click on Done

Your model now looks like this.
Some of the parameters in the random part of the model (the u’s) do not make sense to be estimated.
The only random parameters that we want are those on the diagonal in the variance-covariance matrix.
You can delete the unnecessary random parameters by clicking on them.
For example, click on u02
The following screen will pop up.
Click on Yes
Delete all of the off-diagonal random parameters for the SEs, until your variance-covariance matrix looks like this

Now we need to add the covariance value
Click on “add term”
Then select the covariance term, cov_VM, and click on “add Common coefficient”
The following window will pop up
Select dv and dm
Click Done

The covariance term needs to be manually calculated (see Kalaian & Kasim, in press)
The formula is

\[
\operatorname{cov}(d_{ip}, d_{ip'}) = \frac{n_{1ip} + n_{2ip}}{n_{1ip}\, n_{2ip}}\, r_{ip,ip'} + \frac{d_{ip}\, d_{ip'}\, r_{ip,ip'}^{2}}{2\,(n_{1ip} + n_{2ip})}
\]

where
    n_{1ip} and n_{2ip} are the sample sizes for the 2 groups
    r_{ip,ip'} is the correlation between the two outcomes
    d_{ip} and d_{ip'} are the two effect size outcomes

1. Click β4cov_VM.12j
2. Select the options as below

Your equation window should now look like this.
Delete the off-diagonal covariance components for u4j by clicking on them
Your equation window should now look like this.

Under “Model” in the menu bar, click on “Constrain Parameters”
The following window should pop up
Click on the “random” radio button

1. Select the SE and cov_VM variances to be constrained by entering a ‘1’ in the boxes
2. Set them “to equal” 1
3. Choose a free column to store the constraint matrix in. In this case, we used C20
4. Click on attach random constraints
5. Go back to the “equations” window

Select “Estimation” from the menu bar, then RIGLS
Click Done when the window pops up
Your model should look something like this...
(you may need to click on “Estimates” to show the numbers in blue font)
Click on START when you are ready to run the model

This is the mean effect size for the Verbal SAT scores
This is the mean effect size for the Maths SAT scores
This is the between-study random effect for Verbal SAT scores
This is the between-study random effect for Maths SAT scores

On average, students scored higher on maths SAT than verbal SAT.
However, variance was larger for maths.
There was no significant between-study variation for maths (.012) or verbal (.004) SAT scores.
Given that there is no significant between-study variation, we would not normally fit the model with predictors.

Figure 4. Box Plot of SAT-Verbal and SAT-Math Effect Sizes

Let’s look at a predictor anyway for demonstration purposes!
Test whether a coaching intervention improves maths and verbal SAT scores.
Will the effects (size, direction, & significance) of the coaching be the same for the two outcomes?

1. Add Term
2. Select LogEHR (not LogHR)
3. Click on grand mean (mean = 19 hours)
4. Click on add separate coefficients
5. Run the model (“start”)

Studies with Log coaching hours >2.75 (which is the study with 15 non-logged, ‘raw’ hours) will have a very nearly significant positive effect on Verbal SAT scores (β = .102).
Studies with Log coaching hours >2.75 will have a significant positive effect on Maths SAT scores (β = .290).

Calculating the covariance (in this case, ‘cov_VM’) requires knowing the correlation between the outcomes.
Often, primary studies do not report the correlations between the outcomes.
Some methods are being developed that bypass this problem.
Riley, Thompson, & Abrams (2008): An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown.
However, these are confined to bivariate studies.

What to do when more than 2 outcomes?
Model will become very complex
Currently under development

The multivariate results account for the covariance between the verbal SAT and Maths SAT effect sizes.

Kalaian, S. A., & Kasim, R. M. (in press). Applications of Multilevel Models for Meta-Analysis. In O’Connell, A., & McCoach, B. D. (Eds.), Multilevel Analysis of Educational Data. Information Age Publishing.
Kalaian, H. A., & Raudenbush, S. W. (1996). A Multivariate Mixed-Effects Linear Model for Meta-Analysis. Psychological Methods, 1(3), 227-235.
Riley, R. D., Thompson, J. R., & Abrams, K. R. (2008). An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics, 9, 172-186.