Funded through the ESRC’s Researcher Development Initiative

Session 2.4: 3-level meta-analyses

Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education, University of Oxford

Establish research question
Define relevant studies
Develop code materials
Locate and collate studies
Pilot coding; coding
Data entry and effect size calculation
Main analyses
Supplementary analyses

The odds-ratio is based on a 2 by 2 contingency table.
The odds-ratio is the odds of success in the treatment group relative to the odds of success in the control group (in the present application, males and females).

Frequencies        Success   Failure
Treatment Group       a         b
Control Group         c         d

ES_OR = ad / bc
ES_OR = (265 × 75) / (32 × 204) = 3.044
log_e(3.044) = 1.113
1.113 / 1.83 = 0.61

Gender Differences in Peer Review

** Bornmann, L. (2007). Bias cut. Women, it seems, often get a raw deal in science—So how can discrimination be tackled? Nature, 445(7127), 566.
** Bornmann, L., Mutz, R. & Daniel, H. D. (2007). Gender differences in grant peer review: A meta-analysis. Journal of Informetrics, 1, 226–238.

Abstract: Narrative reviews of peer review research have concluded that there is negligible evidence of gender bias in the awarding of grants based on peer review. Here, we report the findings of a meta-analysis of 21 studies providing, to the contrary, evidence of robust gender differences in grant award procedures. Even though the estimates of the gender effect vary substantially from study to study, the model estimation shows that all in all, among grant applicants men have statistically significant greater odds of receiving grants than women by about 7%.

Gender Differences in Peer Review

Bornmann et al. conducted a multilevel meta-analysis of peer reviews (grant applications & fellowship applications): 66 effect sizes from 21 studies and a total of 353,725 applications.
They found a statistically significant but small effect in favour of men (an effective odds-ratio of 1.07).
However, there was systematic variation in the effect sizes beyond random sampling error, suggesting that their results were not generalizable.
Typically the next step would be to consider moderators: type of application (grants vs. pre- & post-doctoral fellowships), discipline, or country.
However, they noted that: “Unfortunately, the inclusion of one or more of these characteristics into the calculation of the meta-analysis resulted in models that did not converge in the estimation process. This finding indicated that the model estimation became too complex by considering specific interaction effects or the included characteristics had no influence on the outcome, respectively.”

Gender Differences in Peer Review (Meta-analysis data from Bornmann, Mutz, & Daniel, 2007)

Are women disadvantaged in peer reviews?
Based on 66 outcomes from 21 studies, we evaluate whether there are systematic gender differences in success.
Important moderator variables include type of peer review (grants, fellowships), discipline, country, and year.

Peer Review Mean: Unweighted (Study Level)

Descriptive Statistics

Variable        N     Minimum     Maximum        Mean   Std. Deviation
id             66           1          66       33.50           19.196
study          66           1          21       10.53            6.991
Year           66        1979        2004     1996.28            3.855
Disc           66           1           5        2.44            1.125
type           66           0           1         .39             .492
NMale          66          33       33300     4237.35         8533.083
NFem           66           8        8427     1122.12         2077.768
NMfail         66           8       25377     2903.50         6050.820
NMSucc         66          12        8495     1333.85         2560.022
NFFail         66           2        6309      799.05         1479.742
NFSucc         66           1        2118      323.08          626.641
NTot           66    45.00000    41727.00    5359.470      10552.82326
oddrat         66      .22564     4.50591    .9614770        .56391112
SE             66      .02822     1.05201    .3065590        .25499937
logOR          66    -1.48881     1.50539   -.1520434        .46878270
cntry          66     1.00000     8.00000   4.3787879       2.55863636
Valid N (listwise)   66

Peer Review Mean: Weighted by NTot

Descriptive Statistics

Variable        N     Minimum     Maximum        Mean   Std. Deviation
id         353725           1          66       38.78            9.556
study      353725           1          21       11.94            3.414
Year       353725        1979        2004     1999.54            3.706
Disc       353725           1           5        3.53             .943
type       353725           0           1         .13             .333
NMale      353725          33       33300    20762.62        10970.531
NFem       353725           8        8427     5060.59         2551.219
NMfail     353725           8       25377    14606.89         8136.598
NMSucc     353725          12        8495     6155.72         2998.461
NFFail     353725           2        6309     3536.99         1852.320
NFSucc     353725           1        2118     1523.60          761.491
NTot       353725    45.00000    41727.00    25823.21      13456.88077
oddrat     353725      .22564     4.50591   1.0376342        .16703955
SE         353725      .02822     1.05201    .0566314        .07326371
logOR      353725    -1.48881     1.50539    .0256373        .15239431
cntry      353725     1.00000     8.00000   7.0438985       1.98192573
Valid N (listwise)   353725

Preliminary Box Plots: Total Sample
[Figure: weighted and unweighted box plots of the effect sizes, marking the 10th, 25th, 50th (median), 75th and 90th percentiles, potential outliers, and a reference line for no gender difference. Annotations: median effect size slightly in favour of men; median effect size slightly in favour of women.]

Preliminary Box Plots: Type
[Figure: box plots by type, with a reference line for no gender difference.]

Preliminary Box Plots: Country
[Figure: box plots by country, with a reference line for no gender difference.]

Preliminary Box Plots: Discipline
[Figure: box plots by discipline, with a reference line for no gender difference.]

Preliminary Box Plots: Year
[Figure: box plots by year, with a reference line for no gender difference.]

In a random effects model:
The included studies are considered to be only a sample from the entire population of studies.
As a result, we do want to generalise to other studies not included in the sample (e.g., future studies).
Variability between effect sizes is due to sampling error plus variability in the population of effects.
In contrast to fixed effects models, there are 2 sources of variance.
Effect sizes are independent.
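To make the two sources of variance concrete, here is a minimal Python sketch that pools a set of effect sizes under both models. It is an illustration only: the function name `pool_effects` and the toy numbers are hypothetical, and the between-study variance is estimated with a simple method-of-moments (DerSimonian-Laird) formula chosen for brevity, which is not necessarily what any particular macro or package does.

```python
import math

def pool_effects(effects, ses):
    """Pool effect sizes with known standard errors (hypothetical helper).

    Returns fixed- and random-effects means and SEs, the homogeneity
    statistic Q, and a method-of-moments estimate of the between-study
    variance (tau2).
    """
    # Fixed effects: sampling error is the only source of variance,
    # so each study is weighted by 1 / SE^2.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum_w
    fixed_se = math.sqrt(1.0 / sum_w)

    # Q: weighted squared deviations from the fixed-effects mean.
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of the between-study variance, floored at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects: second source of variance (tau2) is added to each
    # study's sampling variance before weighting.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    random_mean = sum(wi * d for wi, d in zip(w_re, effects)) / sum_w_re
    random_se = math.sqrt(1.0 / sum_w_re)

    return {
        "fixed_mean": fixed_mean, "fixed_se": fixed_se,
        "random_mean": random_mean, "random_se": random_se,
        "q": q, "df": df, "tau2": tau2,
    }

# Toy data (made up for illustration, not taken from the peer review data set).
result = pool_effects([0.0, 2.0], [1.0, 1.0])
print(result)
```

When the effect sizes are homogeneous the estimated between-study variance is zero and the two models coincide; when they are not, the random-effects standard error is larger than the fixed-effects one.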
d_j = δ + u_j + e_j

where
d_j is the observed effect size in study j,
δ is the mean ‘true’ population effect size,
u_j is the deviation of the true study effect size from the mean true effect size, and
e_j is the residual due to sampling variance in study j.

Like the fixed effects model, there are 2 general ways of conducting a random effects meta-analysis: ANOVA & multiple regression.
The analogue to the ANOVA homogeneity analysis is appropriate for categorical variables.
It looks for systematic differences between groups of responses within a variable.
Multiple regression homogeneity analysis is more appropriate for continuous variables and/or when there are multiple variables to be analysed.
It tests the ability of groups within each variable to predict the effect size.
Categorical variables can be included in multiple regression as dummy variables.

SPSS Commands

GET FILE=
COMPUTE NTot = NMale + NFem.
COMPUTE oddrat = (NFSucc/NFFail)/(NMSucc/NMFail).
COMPUTE SE = SQRT(1/NFSucc + 1/NFFail + 1/NMSucc + 1/NMfail).
COMPUTE logOR = LN(oddrat).
COMPUTE w = 1/SE**2.
COMPUTE WTNTOT = NTOT/5359.47.

MeanES MACRO

MeanES ES=logor /W=w.

------- Distribution Description ---------------------------------
       N     Min ES     Max ES   Wghtd SD
  66.000     -1.489      1.505       .140

------- Fixed & Random Effects Model -----------------------------
          Mean ES    -95%CI    +95%CI        SE         Z         P
Fixed       .0218     .0033     .0403     .0094    2.3105     .0209
Random     -.0624    -.1122    -.0126     .0254   -2.4558     .0141

------- Random Effects Variance Component ------------------------
v = .015089

------- Homogeneity Analysis -------------------------------------
        Q         df          p
 221.2850    65.0000      .0000

Random effects v estimated via noniterative method of moments.

Conclusions:
Small effect size based on both Fixed & Random models.
Slightly in favour of females for fixed effects, slightly in favour of males for random effects.
Significant study-to-study variation, so random effects and a search for moderators are appropriate.

MetaF MACRO (ANOVA)
Type (0=grants, 1=fellowships)

METAF ES=logor /W=w /group = type /MODEL=ML .
------- Analog ANOVA table (Homogeneity Q) -------
                  Q         df          p
Between     27.1997     1.0000      .0000
Within      82.4019    64.0000      .0606
Total      109.6017    65.0000      .0005

------- Q by Group -------
Group          Qw         df          p
 .0000    35.6143    39.0000      .6251
1.0000    46.7876    25.0000      .0052

------- Effect Size Results Total -------
         Mean ES        SE    -95%CI    +95%CI         Z         P          k
Total     -.0473     .0197    -.0859    -.0088   -2.4073     .0161    66.0000

------- Effect Size Results by Group -------
Group    Mean ES        SE    -95%CI    +95%CI         Z         P          k
 .0000     .0224     .0238    -.0242     .0690     .9438     .3453    40.0000
1.0000    -.1980     .0349    -.2665    -.1295   -5.6661     .0000    26.0000

------- Maximum Likelihood Random Effects Variance Component -------
v     = .00653
se(v) = .00295

Conclusions:
Grants: NS effect in favour of women.
Fellowships: significant effect in favour of men (but varies from study to study).
Within variance NS but variance component significant; some study-to-study variation remains (particularly in fellowship applications).

MetaF MACRO (ANOVA)
Discipline: 1=Phys 2=Biomed 3=SocSc 4=Mult 5=human

METAF ES=logor /W=w /group = Disc /MODEL=ML .
------- Analog ANOVA table (Homogeneity Q) -------
                  Q         df          p
Between     21.8346     4.0000      .0002
Within      78.7070    61.0000      .0631
Total      100.5417    65.0000      .0031

------- Q by Group -------
Group          Qw         df          p
1.0000    32.8674    13.0000      .0018
2.0000    27.2003    25.0000      .3460
3.0000     6.5260    10.0000      .7693
4.0000    11.6156    12.0000      .4770
5.0000      .4977     1.0000      .4805

------- Effect Size Results Total -------
         Mean ES        SE    -95%CI    +95%CI         Z         P          k
Total     -.0527     .0212    -.0943    -.0110   -2.4799     .0131    66.0000

------- Effect Size Results by Group -------
Group    Mean ES        SE    -95%CI    +95%CI         Z         P          k
1.0000    -.0382     .0647    -.1650     .0886    -.5908     .5546    14.0000
2.0000    -.1403     .0344    -.2077    -.0730   -4.0833     .0000    26.0000
3.0000    -.3285     .1070    -.5381    -.1188   -3.0710     .0021    11.0000
4.0000     .0383     .0312    -.0228     .0994    1.2281     .2194    13.0000
5.0000     .0404     .2694    -.4876     .5685     .1500     .8807     2.0000

------- Maximum Likelihood Random Effects Variance Component -------
v     = .00850
se(v) = .00358

Conclusions: Significant differences in favour of men for Biomedical (2) and Social Sciences (3); other disciplines NS.

MetaF MACRO (ANOVA)
Country: 1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA

METAF ES=logor /W=w /group = cntry /MODEL=ML

------- Analog ANOVA table (Homogeneity Q) -------
                  Q         df          p
Between     35.4511     7.0000      .0000
Within      74.3872    58.0000      .0723
Total      109.8383    65.0000      .0004

------- Q by Group -------
Group          Qw         df          p
1.0000     6.5832    12.0000      .8839
2.0000     1.3786     2.0000      .5019
3.0000    14.2938    14.0000      .4281
4.0000    12.1417     6.0000      .0589
5.0000    12.8904     4.0000      .0118
6.0000      .0000      .0000    -9.0000
7.0000    12.0012     9.0000    -9.0000
8.0000    15.0982    11.0000    -9.0000

------- Effect Size Results Total -------
         Mean ES        SE    -95%CI    +95%CI         Z         P          k
Total     -.0472     .0196    -.0856    -.0087   -2.4047     .0162    66.0000

------- Effect Size Results by Group -------
Group    Mean ES        SE    -95%CI    +95%CI         Z         P          k
1.0000    -.0245     .0877    -.1965     .1474    -.2795     .7799    13.0000
2.0000    -.1390     .0848    -.3052     .0271   -1.6400     .1010     3.0000
3.0000    -.1843     .0419    -.2665    -.1022   -4.3988     .0000    15.0000
4.0000    -.1767     .0572    -.2888    -.0646   -3.0890     .0020     7.0000
5.0000    -.1762     .2029    -.5738     .2214    -.8686     .3851     5.0000
6.0000   -1.4289     .6013   -2.6074    -.2503   -2.3763     .0175     1.0000   (Sweden)
7.0000     .0539     .0719    -.0870     .1948     .7502     .4531    10.0000
8.0000     .0460     .0285    -.0098     .1018    1.6158     .1061    12.0000

------- Maximum Likelihood Random Effects Variance Component -------
v     = .00649
se(v) = .00294

Conclusions: BIG difference in favour of men in Sweden; smaller differences in favour of men in Germany and Europe; NS differences for other countries.

Website Address to get MLwiN
Harvey Goldstein developed the MLwiN statistical package used here and has made many contributions to multilevel modelling, including meta-analysis.

Setting Up Meta-analysis
1. Click on the equation.
2. Make logOR the “y” variable.
3. Indicate a three-level model with L3 = study, L2 = id, L1 = logOR.
4. Click the “done” button.

Setting Up Meta-analysis
1. Click “Cons” in the equation.
2. Tick “Fixed Parameter”, “study” and “id”, but not “logOR”.
3. Click the “done” button.

Setting Up Meta-analysis
1. Now click the “add term” button.
2. This will bring up the “X-Variable” window; select SE (the standard error computed earlier).
3. Tick only the “logOR” box.
4. Click “done”.

Setting Up Meta-analysis
Now we want to constrain the variance at level 1 to be fixed at 1.0. Under “model” select “constrain parameters”; this will bring up the “parameter constraint” window.

Setting Up Meta-analysis
In the parameter constraint window:
1. Click the “random” button.
2. Change “logOR: SE/SE” to 1.
3. Change “to equal” to 1.
4. “Store” the constraints in the first empty column (here “C27”).
5. Click the “attach random constraints” button.
6. Close the “Parameter Constraint” window.

“Null” model with no predictors

Conclusion: The mean effect size (-.101/.040) is significant. The chi-square (389.88) is significant; there is study-to-study variation.
Explore moderator variables.

After closing the “parameter constraint” window (last slide), click on the “start” button in the “equation” window (you may have to click the estimates button to get values). Compute the chi-square value in the command interface window:

->pred c50
->calc c51=(('logOR'-c50)/'se')**2
->sum c51 to b1
   = 389.88
->cprob b1 65
   = 5.6052e-045

Add “Type” (0=grant, 1=fellow)

->pred c50
->calc c51=(('logOR'-c50)/'se')**2
->sum c51 to b1
   = 171.34
->cprob b1 64
   1.4376e-011

Conclusion: The effect of type (-.196/.052) is highly significant. The mean effect size (-.007/.034) is NS for Type = grant (intercept). The chi-square (171.34) is significant; study-to-study variation remains.

Add “DISC”: 1=Phys 2=Biomed 3=SocSc *4=Mult 5=human (* marks the reference category)

->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   = 188.59
->cprob b1 61
   = .1875e-014

Conclusion: The effect of DISC is highly significant (change in chi-sq = 389.88 - 188.59 = 201.29, df = 4). Men are significantly more successful than women in SocSc, relative to multidisciplinary (the reference category, which is NS).

Add “CNTRY”: 1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA

->pred c50
->calc c51 = (('logor'-c50)/'se')**2
->sum c51 to b1
   = 158.76
->cprob b1 58
   = 1.7144e-011

Conclusion: The effect of CNTRY is highly significant (change in chi-sq = 389.88 - 158.76 = 231.12, df = 7). Men are significantly more successful in Sweden (but note the large SE) and Germany, relative to the US (the reference category, which is NS).

Add “Year”

->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   = 344.02
->cprob b1 64
   = 6.6568e-038

Conclusion: The linear effect of YEAR is NS. Note that the intercept was changed to be the year 2000 (rather than year “0”, which is completely outside the observed range).
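The command sequence used on these slides (predicted values, squared standardised residuals, their sum, and the upper-tail chi-square probability) can be sketched in Python as follows. This is a rough illustration under stated assumptions rather than a reproduction of MLwiN: the function names `chi2_sf` and `residual_chi_square` and the toy inputs are hypothetical, the degrees of freedom are taken as the number of effect sizes minus the number of fixed parameters (which matches the df values used on the slides), and the tail probability is computed with a standard recurrence so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def residual_chi_square(observed, predicted, ses, n_fixed):
    """Sum of squared standardised residuals, its df, and its p-value.

    observed  : effect sizes (e.g. log odds-ratios)
    predicted : fitted values from the fixed part of the model
    ses       : standard errors of the effect sizes
    n_fixed   : number of fixed parameters in the model (assumption: df = k - n_fixed)
    """
    chi2 = sum(((y - yhat) / se) ** 2
               for y, yhat, se in zip(observed, predicted, ses))
    df = len(observed) - n_fixed
    return chi2, df, chi2_sf(chi2, df)

# Toy example (made-up numbers): two effect sizes, intercept-only model.
chi2, df, p = residual_chi_square([0.1, -0.2], [0.0, 0.0], [0.1, 0.1], n_fixed=1)
print(chi2, df, p)
```

Comparing the statistic from a model with a moderator against the statistic from the null model gives the change in chi-square used in the conclusions, with df equal to the number of added fixed parameters.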
Add “Type” & “DISC”: 1=Phys 2=Biomed 3=SocSc *4=Mult 5=human
Note that the solution is technically improper (study-level variance constrained to be non-negative).

->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   = 105.47
->cprob b1 60
   = 0.00026315

Conclusion: The general pattern of results for each variable considered separately is still evident. The reference category (Type = grants, Disc = Multi) is still NS. Results should be interpreted cautiously because the solution is improper.

Add “Type” x “DISC” Interaction: 1=Phys 2=Biomed 3=SocSc *4=Mult 5=human
Note that the solution is technically improper (study-level variance constrained to be non-negative).

->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   = 103.80
->cprob b1 56
   = 0.00010859

Conclusion: The change in chi-sq is NS, suggesting that there is no interaction. Results should be interpreted cautiously because the solution is improper.

Add “Type” & “CNTRY”: 1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA
Improper 3-level solution, as the study-level variance component is negative (constrained to be non-negative).

->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   = 99.068
->cprob b1 57
   = 0.00046801

Conclusion: The general pattern of results is similar. Men are significantly more successful in Sweden (but note the large SE) and Germany, relative to the reference category (US grants).

Type x CNTRY Interaction: 1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA

->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   = 96.890
->cprob b1 50
   = 7.8690e-005

Conclusion: The change in chi-sq is NS, suggesting that there is no interaction. Results should be interpreted cautiously because the solution is improper.

Main Effects of Type, Disc & Country

->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
   92.164
->cprob b1 53
   0.00069082

Conclusion: When all main effects are included, the Type effect is nearly unaffected. However, none of the discipline effects are significant, although the Sweden and (marginally) Germany effects are still significant.
Results should be interpreted cautiously because the solution is improper.

Graphs: Caterpillar Plots

Caterpillar plot based on L1 residuals.
1. Go to the “model” menu and select the “residuals” option. This will bring up the “settings” window.
2. Set “SD (comparative)” to 1.96.
3. Set “level” to “1logOR”.
4. Click the “Calc” button.
5. Click on the “plot” button to bring up the next window.
In the “plot” window select “residual +/- 1.96 SD x rank”. This brings up the original graph. Clicking on the graph brings up a window to modify the graph (a bit).

The mean effect size was very small, but significantly in favour of men.
However, the results did not generalise across studies (there was study-to-study variation).
The effect size was significantly moderated by the type; it was almost exactly 0 for grants and in favour of men for fellowship applications. This difference was not moderated or mediated by other moderators.
There appeared to be some discipline effects (bias in favour of men in social sciences) and country effects (large bias in favour of men for Sweden). However, when all “main” effects were included, discipline effects disappeared.
For grant proposals there was no evidence of any effect of gender on outcome.

Purpose-built
Comprehensive Meta-analysis (commercial)
Schwarzer (free, http://userpage.fu-berlin.de/~health/meta_e.htm)

Extensions to standard statistics packages
SPSS, Stata and SAS macros, downloadable from http://mason.gmu.edu/~dwilsonb/ma.html
Stata add-ons, downloadable from http://www.stata.com/support/faqs/stat/meta.html
HLM – V-known routine
MLwiN
MPlus

Bornmann, L. (2007). Bias cut. Women, it seems, often get a raw deal in science—So how can discrimination be tackled? Nature, 445(7127), 566.
Bornmann, L., Mutz, R. & Daniel, H. D. (2007). Gender differences in grant peer review: A meta-analysis. Journal of Informetrics, 1, 226–238.
Cooper, H., & Hedges, L. V. (Eds.) (1994). The handbook of research synthesis (pp. 521–529). New York: Russell Sage Foundation.
Hox, J. (2003). Applied multilevel analysis. Amsterdam: TT Publishers.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park: Sage Publications.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.