Gender Differences in Peer Review

Funded through the ESRC’s Researcher
Development Initiative
Session 2.4: 3-level meta-analyses
Prof. Herb Marsh
Ms. Alison O’Mara
Dr. Lars-Erik Malmberg
Department of Education,
University of Oxford
Steps in a meta-analysis:
Establish research question
-> Define relevant studies
-> Develop code materials
-> Locate and collate studies
-> Pilot coding; coding
-> Data entry and effect size calculation
-> Main analyses
-> Supplementary analyses
• The odds-ratio is based on a 2 by 2 contingency table.
• The odds-ratio is the odds of success in the treatment group relative to the odds of success in the control group (in the present application, males and females).
Frequencies          Success   Failure
Treatment Group      a         b
Control Group        c         d

$ES_{OR} = \frac{ad}{bc}$

$ES_{OR} = \frac{265 \times 75}{32 \times 204} = 3.044$

$\log_e(3.044) = 1.113$

$1.113 / 1.83 = 0.61$
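The worked example above can be retraced in a few lines of Python. This is only a sketch: the slide gives the products 265 × 75 and 32 × 204, so the assignment of those counts to the cells a, b, c and d is an assumption made here to match the formula, and the divisor 1.83 is taken directly from the slide.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = treatment-group successes and failures
    c, d = control-group successes and failures
    """
    return (a * d) / (b * c)

# Assumed cell assignment that reproduces (265 * 75) / (32 * 204).
es_or = odds_ratio(a=265, b=32, c=204, d=75)

# Natural log of the odds ratio, the scale used in the analyses.
log_or = math.log(es_or)

# The slide divides the log odds ratio by 1.83 to get a d-like value.
d_like = log_or / 1.83

print(es_or, log_or, d_like)
```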
Gender Differences in Peer Review
** Bornmann, L. (2007). Bias cut. Women, it seems, often get a raw deal in science—So how can
discrimination be tackled? Nature, 445(7127), 566.
** Bornmann, L., Mutz, R. & Daniel, H. D. (2007). Gender differences in grant peer review: A meta-analysis. Journal of Informetrics, 1, 226–238.
Abstract: Narrative reviews of peer review research have
concluded that there is negligible evidence of gender bias
in the awarding of grants based on peer review. Here, we
report the findings of a meta-analysis of 21 studies
providing, to the contrary, evidence of robust gender
differences in grant award procedures. Even though the
estimates of the gender effect vary substantially from
study to study, the model estimation shows that all in all,
among grant applicants men have statistically significant
greater odds of receiving grants than women by about 7%.
Gender Differences in Peer Review
Bornmann et al. conducted a multilevel meta-analysis of peer
reviews (grant applications & fellowship applications): 66
effect sizes from 21 studies and a total of 353,725 applications.
They found a statistically significant but small effect in favour of
men (an effective odds-ratio of 1.07). However, there was
systematic variation in the effect sizes beyond random sampling
error, suggesting that their results were not generalizable.
Typically the next step would be to consider moderators: type of
application (grants vs. pre- & post-doctoral fellowships), discipline,
or country. However, they noted that:
“Unfortunately, the inclusion of one or more of these
characteristics into the calculation of the meta-analysis resulted
in models that did not converge in the estimation process. This
finding indicated that the model estimation became too complex
by considering specific interaction effects or the included
characteristics had no influence on the outcome, respectively.”
Gender Differences in Peer Review
(Meta-analysis data from Bornmann, Mutz, & Daniel, 2007)
Are women disadvantaged in Peer Reviews?
Based on 66 outcomes from 21 studies, we evaluate whether there
are systematic gender differences in success. Important moderator
variables include type of peer review (grants, fellowships),
discipline, country, and year.
Peer Review Mean: Unweighted Study Level
Descriptive Statistics
Variable             N    Minimum     Maximum     Mean         Std. Deviation
id                   66   1           66          33.50        19.196
study                66   1           21          10.53        6.991
Year                 66   1979        2004        1996.28      3.855
Disc                 66   1           5           2.44         1.125
type                 66   0           1           .39          .492
NMale                66   33          33300       4237.35      8533.083
NFem                 66   8           8427        1122.12      2077.768
NMfail               66   8           25377       2903.50      6050.820
NMSucc               66   12          8495        1333.85      2560.022
NFFail               66   2           6309        799.05       1479.742
NFSucc               66   1           2118        323.08       626.641
NTot                 66   45.00000    41727.00    5359.470     10552.82326
oddrat               66   .22564      4.50591     .9614770     .56391112
SE                   66   .02822      1.05201     .3065590     .25499937
logOR                66   -1.48881    1.50539     -.1520434    .46878270
cntry                66   1.00000     8.00000     4.3787879    2.55863636
Valid N (listwise)   66
Peer Review Mean: Weighted by NTot
Descriptive Statistics
Variable             N        Minimum     Maximum     Mean         Std. Deviation
id                   353725   1           66          38.78        9.556
study                353725   1           21          11.94        3.414
Year                 353725   1979        2004        1999.54      3.706
Disc                 353725   1           5           3.53         .943
type                 353725   0           1           .13          .333
NMale                353725   33          33300       20762.62     10970.531
NFem                 353725   8           8427        5060.59      2551.219
NMfail               353725   8           25377       14606.89     8136.598
NMSucc               353725   12          8495        6155.72      2998.461
NFFail               353725   2           6309        3536.99      1852.320
NFSucc               353725   1           2118        1523.60      761.491
NTot                 353725   45.00000    41727.00    25823.21     13456.88077
oddrat               353725   .22564      4.50591     1.0376342    .16703955
SE                   353725   .02822      1.05201     .0566314     .07326371
logOR                353725   -1.48881    1.50539     .0256373     .15239431
cntry                353725   1.00000     8.00000     7.0438985    1.98192573
Valid N (listwise)   353725
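Preliminary Box Plots: Total Sample
[Figure: box plots of the effect sizes for the total sample, weighted and unweighted. Each box marks the 10th, 25th, 50th (median), 75th and 90th percentiles, with potential outliers beyond the whiskers and a reference line for "No Gender Difference". Annotations note that the median effect size is slightly in favour of men in one panel and slightly in favour of women in the other.]
Preliminary Box Plots: Type
[Figure: box plots by type of application, with a "No Gender Difference" reference line.]
Preliminary Box Plots: Country
[Figure: box plots by country, with a "No Gender Difference" reference line.]
Preliminary Box Plots: Discipline
[Figure: box plots by discipline, with a "No Gender Difference" reference line.]
Preliminary Box Plots: Year
[Figure: box plots by year, with a "No Gender Difference" reference line.]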
Random effects models
The studies analysed are only a sample from the entire
population of studies that could be considered. As a result,
we do want to generalise to other studies not included
in the sample (e.g., future studies).
Variability between effect sizes is due to sampling
error plus variability in the population of effects.
In contrast to fixed effects models, there are 2 sources of
variance.
Effect sizes are independent.
$d_j = \delta + u_j + e_j$
Where
$d_j$ is the observed effect size in study j,
$\delta$ is the mean ‘true’ population effect size,
$u_j$ is the deviation of the true study effect size from the
mean true effect size,
and $e_j$ is the residual due to sampling variance in study j.
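To make the two sources of variance concrete, the following sketch simulates effect sizes from the model above. All numeric values (the mean effect, the between-study SD `tau`, and a common sampling SD `sigma`) are hypothetical, and giving every study the same sampling variance is a simplification for illustration; real studies have different standard errors.

```python
import random
import statistics

def simulate_effects(k, delta, tau, sigma, seed=0):
    """Draw k observed effect sizes d_j = delta + u_j + e_j."""
    rng = random.Random(seed)
    effects = []
    for _ in range(k):
        u_j = rng.gauss(0.0, tau)    # deviation of the true study effect
        e_j = rng.gauss(0.0, sigma)  # sampling error within the study
        effects.append(delta + u_j + e_j)
    return effects

# Hypothetical parameters, chosen only for illustration.
observed = simulate_effects(k=20000, delta=-0.05, tau=0.1, sigma=0.2)

# The observed variance should be close to tau^2 + sigma^2.
total_var = statistics.variance(observed)
expected_var = 0.1 ** 2 + 0.2 ** 2
print(total_var, expected_var)
```

With many simulated studies, the variance of the observed effects approaches the sum of the between-study and sampling variances, which is the sense in which a random effects model has two sources of variance.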
Like the fixed effects model, there are 2 general
ways of conducting a random effects meta-analysis:
ANOVA & multiple regression
The analogue to the ANOVA homogeneity analysis
is appropriate for categorical variables
• Looks for systematic differences between groups of
responses within a variable
Multiple regression homogeneity analysis is more
appropriate for continuous variables and/or when
there are multiple variables to be analysed
• Tests the ability of groups within each variable to predict
the effect size
• Can include categorical variables in multiple regression
as dummy variables
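SPSS Commands
GET FILE=
* Total number of applicants for each effect size.
COMPUTE NTot = NMale + NFem.
* Odds ratio: female odds of success relative to male odds of success.
COMPUTE oddrat = (NFSucc/NFFail)/(NMSucc/NMFail).
* Standard error of the log odds ratio.
COMPUTE SE = SQRT(1/NFSucc + 1/NFFail + 1/NMSucc + 1/NMFail).
COMPUTE logOR = LN(oddrat).
* Inverse-variance weight.
COMPUTE w = 1/SE**2.
* Sample-size weight, scaled by the mean NTot (5359.47).
COMPUTE WTNTOT = NTot/5359.47.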
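For readers without SPSS, the same effect size calculation can be sketched in Python. The function name and the example counts are hypothetical; only the formulas mirror the COMPUTE statements above. Because the odds ratio is female odds over male odds, a negative log odds ratio indicates an effect in favour of men.

```python
import math

def log_odds_ratio(nf_succ, nf_fail, nm_succ, nm_fail):
    """Return (odds ratio, log odds ratio, SE, inverse-variance weight)."""
    # Female odds of success relative to male odds of success.
    odds_ratio = (nf_succ / nf_fail) / (nm_succ / nm_fail)
    log_or = math.log(odds_ratio)
    # Standard error of the log odds ratio from the four cell counts.
    se = math.sqrt(1 / nf_succ + 1 / nf_fail + 1 / nm_succ + 1 / nm_fail)
    # Inverse-variance weight used by the meta-analysis macros.
    weight = 1 / se ** 2
    return odds_ratio, log_or, se, weight

# Hypothetical counts: 20 of 100 women and 30 of 100 men succeed.
odds_ratio, log_or, se, weight = log_odds_ratio(20, 80, 30, 70)
print(odds_ratio, log_or, se, weight)
```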
MeanES MACRO
MeanES ES=logor /W=w.
------- Distribution Description -------
       N    Min ES    Max ES    Wghtd SD
  66.000    -1.489     1.505        .140
------- Fixed & Random Effects Model -------
          Mean ES    -95%CI    +95%CI       SE         Z        P
Fixed       .0218     .0033     .0403    .0094    2.3105    .0209
Random     -.0624    -.1122    -.0126    .0254   -2.4558    .0141
------- Random Effects Variance Component -------
v = .015089
------- Homogeneity Analysis -------
        Q         df        p
 221.2850    65.0000    .0000
Random effects v estimated via noniterative method of moments.
Conclusions:
Small effect size based on both Fixed & Random models. Slightly
in favour of females for Fixed effects, slightly in favour of males for random
effects.
Significant study-to-study variation, so random effects and a search for
moderators are appropriate.
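The fixed and random effects means reported above can be illustrated with a short Python sketch. It assumes the standard moment estimator of the between-study variance (often called DerSimonian-Laird); the macro's internals are not shown in the slides, so this is a plausible reading of "noniterative method of moments" rather than a copy of the macro. The two-study input is hypothetical.

```python
import math

def pooled_means(effects, std_errors):
    """Fixed and random effects means with a method-of-moments variance.

    Assumes at least two studies with unequal information (c > 0).
    """
    weights = [1 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed_mean = sum(w * es for w, es in zip(weights, effects)) / sum_w
    # Homogeneity statistic Q around the fixed effects mean.
    q = sum(w * (es - fixed_mean) ** 2 for w, es in zip(weights, effects))
    df = len(effects) - 1
    # Moment estimate of the random effects variance, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    v = max(0.0, (q - df) / c)
    # Random effects weights add v to each sampling variance.
    re_weights = [1 / (se ** 2 + v) for se in std_errors]
    sum_rw = sum(re_weights)
    random_mean = sum(w * es for w, es in zip(re_weights, effects)) / sum_rw
    return {
        "fixed_mean": fixed_mean,
        "fixed_se": math.sqrt(1 / sum_w),
        "q": q,
        "df": df,
        "v": v,
        "random_mean": random_mean,
        "random_se": math.sqrt(1 / sum_rw),
    }

# Hypothetical input: two studies with very different effects.
result = pooled_means([0.0, 1.0], [0.1, 0.1])
print(result)
```

Because the random effects weights are more equal than the fixed effects weights, large studies dominate less, which is one way the two means can differ in sign as they do in the output above.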
MetaF MACRO (ANOVA)
Type (0=grants, 1=fellowships)
METAF ES=logor /W=w /group = type /MODEL=ML .
------- Analog ANOVA table (Homogeneity Q) -------
                  Q         df        p
Between     27.1997     1.0000    .0000
Within      82.4019    64.0000    .0606
Total      109.6017    65.0000    .0005
------- Q by Group -------
Group          Qw         df        p
 .0000    35.6143    39.0000    .6251
1.0000    46.7876    25.0000    .0052
------- Effect Size Results Total -------
          Mean ES       SE    -95%CI    +95%CI         Z        P          k
Total      -.0473    .0197    -.0859    -.0088   -2.4073    .0161    66.0000
------- Effect Size Results by Group -------
Group     Mean ES       SE    -95%CI    +95%CI         Z        P          k
 .0000      .0224    .0238    -.0242     .0690     .9438    .3453    40.0000
1.0000     -.1980    .0349    -.2665    -.1295   -5.6661    .0000    26.0000
------- Maximum Likelihood Random Effects Variance Component -------
v     = .00653
se(v) = .00295
Conclusions: Grants: NS effect in favour of women; Fellowships: significant
effect in favour of men (but varies from study to study).
Within variance NS but Var Comp significant; some study-to-study variation
remains (particularly in fellowship applications).
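The between-groups Q for type can be approximately recovered from the group means and standard errors reported above. The sketch below assumes each group's weight is the inverse of its squared standard error; the function name is hypothetical, and because the inputs are rounded to four decimals the result only approximates the macro's value of 27.1997.

```python
def q_between(group_means, group_ses):
    """Weighted grand mean and between-groups Q from group summaries."""
    # Each group mean is weighted by the inverse of its squared SE.
    weights = [1 / se ** 2 for se in group_ses]
    grand_mean = (
        sum(w * m for w, m in zip(weights, group_means)) / sum(weights)
    )
    # Q-between: weighted squared deviations of group means from the grand mean.
    q_b = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, group_means))
    return grand_mean, q_b

# Group means and SEs for grants (0) and fellowships (1) from the output above.
grand_mean, q_b = q_between([0.0224, -0.1980], [0.0238, 0.0349])
print(grand_mean, q_b)
```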
MetaF MACRO (ANOVA)
Discipline: 1=Phys 2=Biomed 3=SocSc 4=Mult 5=human
METAF ES=logor /W=w /group = Disc /MODEL=ML .
------- Analog ANOVA table (Homogeneity Q) -------
                  Q         df        p
Between     21.8346     4.0000    .0002
Within      78.7070    61.0000    .0631
Total      100.5417    65.0000    .0031
------- Q by Group -------
Group          Qw         df        p
1.0000    32.8674    13.0000    .0018
2.0000    27.2003    25.0000    .3460
3.0000     6.5260    10.0000    .7693
4.0000    11.6156    12.0000    .4770
5.0000      .4977     1.0000    .4805
------- Effect Size Results Total -------
          Mean ES       SE    -95%CI    +95%CI         Z        P          k
Total      -.0527    .0212    -.0943    -.0110   -2.4799    .0131    66.0000
------- Effect Size Results by Group -------
Group     Mean ES       SE    -95%CI    +95%CI         Z        P          k
1.0000     -.0382    .0647    -.1650     .0886    -.5908    .5546    14.0000
2.0000     -.1403    .0344    -.2077    -.0730   -4.0833    .0000    26.0000
3.0000     -.3285    .1070    -.5381    -.1188   -3.0710    .0021    11.0000
4.0000      .0383    .0312    -.0228     .0994    1.2281    .2194    13.0000
5.0000      .0404    .2694    -.4876     .5685     .1500    .8807     2.0000
------- Maximum Likelihood Random Effects Variance Component -------
v     = .00850
se(v) = .00358
Conclusions: Significant differences in favour of men for Biomedical (2) and
Social Sciences (3); other disciplines NS.
MetaF MACRO (ANOVA): Country:
1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA
METAF ES=logor /W=w /group = cntry /MODEL=ML
------- Analog ANOVA table (Homogeneity Q) -------
                  Q         df        p
Between     35.4511     7.0000    .0000
Within      74.3872    58.0000    .0723
Total      109.8383    65.0000    .0004
------- Q by Group -------
Group          Qw         df          p
1.0000     6.5832    12.0000      .8839
2.0000     1.3786     2.0000      .5019
3.0000    14.2938    14.0000      .4281
4.0000    12.1417     6.0000      .0589
5.0000    12.8904     4.0000      .0118
6.0000      .0000      .0000    -9.0000
7.0000    12.0012     9.0000    -9.0000
8.0000    15.0982    11.0000    -9.0000
------- Effect Size Results Total -------
          Mean ES       SE    -95%CI    +95%CI         Z        P          k
Total      -.0472    .0196    -.0856    -.0087   -2.4047    .0162    66.0000
------- Effect Size Results by Group -------
Group     Mean ES       SE    -95%CI    +95%CI         Z        P          k
1.0000     -.0245    .0877    -.1965     .1474    -.2795    .7799    13.0000
2.0000     -.1390    .0848    -.3052     .0271   -1.6400    .1010     3.0000
3.0000     -.1843    .0419    -.2665    -.1022   -4.3988    .0000    15.0000
4.0000     -.1767    .0572    -.2888    -.0646   -3.0890    .0020     7.0000
5.0000     -.1762    .2029    -.5738     .2214    -.8686    .3851     5.0000
6.0000    -1.4289    .6013   -2.6074    -.2503   -2.3763    .0175     1.0000   (Sweden)
7.0000      .0539    .0719    -.0870     .1948     .7502    .4531    10.0000
8.0000      .0460    .0285    -.0098     .1018    1.6158    .1061    12.0000
------- Maximum Likelihood Random Effects Variance Component -------
v     = .00649
se(v) = .00294
Conclusions: BIG difference in favour of men in Sweden; smaller differences
in favour of men in Germany and Europe; NS differences for other countries.
Website Address to get MLwiN
Harvey Goldstein developed the MLwiN statistical package used here and has made
many contributions to multilevel modeling, including meta-analysis.
Setting Up Meta-analysis
1. Click on the equation
2. Make logOR the “y” variable
3. Indicate a three-level model with L3=study, L2=id, L1=logOR
4. Click the “done” button
Setting Up Meta-analysis
1. Click “Cons” in the equation
2. Tick “Fixed Parameter”, “study” & “id”, but not “logOR”
3. Click the “done” button
Setting Up Meta-analysis
1. Now click the “add term” button
2. This will bring up the “X-Variable” selection; select SE
(the standard error computed earlier)
3. Tick only the “logOR” box
4. Click “done”
Setting Up Meta-analysis
Now we want to constrain the variance at
level 1 to be fixed at 1.0. Under “model”
select “constrain parameters”; this will bring
up the “parameter constraint” window.
Setting Up Meta-analysis
In the parameter constraint window:
1. Click the “random” button
2. Change “logOR: SE/SE” to 1
3. Change “to equal” to 1
4. “Store” the constraints in the first empty column (here “C27”)
5. Click the “attach random constraints” button
6. Close the “Parameter Constraint” window
“Null” model with no predictors
After closing the “parameter constraint” window (last slide),
click on the “start” button in the “equation” window (you may have to click the
“estimates” button to get values). Compute the chi-square value in the
command interface window:
->pred c50
->calc c51=(('logOR'-c50)/'se')**2
->sum c51 to b1
= 389.88
->cprob b1 65
= 5.6052e-045
Conclusion:
The mean effect size (estimate/SE = -.101/.040) is significant.
The chi-square (389.88) is significant; there is study-to-study variation,
so explore moderator variables.
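The chi-square computed in the MLwiN command interface is the sum of squared standardised residuals: each observed log odds ratio minus its model prediction, divided by its standard error. A minimal Python analogue is sketched below; the three data points are hypothetical and are not taken from the Bornmann et al. data.

```python
def heterogeneity_chi_square(observed, predicted, std_errors):
    """Sum of squared standardised residuals, as in the MLwiN commands."""
    return sum(
        ((y - p) / se) ** 2
        for y, p, se in zip(observed, predicted, std_errors)
    )

# Hypothetical values for illustration only.
log_or = [0.10, -0.20, 0.05]
pred = [0.0, 0.0, 0.0]   # fixed-part predictions (as stored in c50)
se = [0.10, 0.10, 0.05]

chi_sq = heterogeneity_chi_square(log_or, pred, se)
# For the null model the slides use k - 1 degrees of freedom (65 for 66 effects).
df = len(log_or) - 1
print(chi_sq, df)
```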
Add “Type” (0=grant, 1=fellow)
->pred c50
->calc c51=(('logOR'-c50)/'se')**2
->sum c51 to b1
= 171.34
->cprob b1 64
= 1.4376e-011
Conclusion:
The effect of type (-.196/.052) is highly significant.
The mean effect size (-.007/.034) is NS for Type = grant (intercept).
The chi-sq (171.34) is significant; study-to-study variation remains.
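The estimate/SE pairs quoted in these conclusions can be turned into approximate z tests. The sketch below assumes a two-sided test against the standard normal distribution, which is a common reading of such ratios but is not stated explicitly on the slides; the function name is hypothetical.

```python
import math

def wald_z(estimate, std_error):
    """Return the z ratio and its two-sided normal p-value."""
    z = estimate / std_error
    # Two-sided p-value: P(|Z| > |z|) for a standard normal Z.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Effect of type (fellowships vs. grants) and the intercept (grants).
z_type, p_type = wald_z(-0.196, 0.052)
z_int, p_int = wald_z(-0.007, 0.034)
print(z_type, p_type, z_int, p_int)
```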
Add “DISC”:
1=Phys 2=Biomed 3=SocSc *4=Mult 5=human
->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 188.59
->cprob b1 61
= .1875e-014
Conclusion: The effect of DISC is highly significant (change
in chi-sq = 389.88 - 188.59 = 201.29, df = 4). Men are
significantly more successful than women in SocSc (relative to
multidisciplinary, the reference category, which is NS).
Add “CNTRY”:
1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA
->pred c50
->calc c51 = (('logor'-c50)/'se')**2
->sum c51 to b1
= 158.76
->cprob b1 58
= 1.7144e-011
Conclusion: The effect of CNTRY is highly significant (change in
chi-sq = 389.88 - 158.76 = 231.12, df = 7). Men are significantly more
successful in Sweden (but note the large SE) and Germany relative to the
US (the reference category, which is NS).
Add “Year”
->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 344.02
->cprob b1 64
= 6.6568e-038
Conclusion: The linear effect of YEAR is NS. Notice that I changed
the intercept to be 2000 (rather than “0”, which is completely
out of the range).
Add “Type” & “DISC”:
1=Phys 2=Biomed 3=SocSc *4=Mult 5=human
Note that the solution is technically improper (study-level variance constrained to be non-negative).
->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 105.47
->cprob b1 60
= 0.00026315
Conclusion: The general pattern of results for each variable
considered separately is still evident. The reference category
(Type = grants, Disc = Multi) is still NS. Results should be
interpreted cautiously because of the improper solution.
Add “Type” x “DISC” Interaction
1=Phys 2=Biomed 3=SocSc *4=Mult 5=human
Note that the solution is technically improper (study-level variance constrained to be non-negative).
->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 103.80
->cprob b1 56
= 0.00010859
Conclusion: The change in chi-sq is NS, suggesting that
there is no interaction. Results should be interpreted
cautiously because of the improper solution.
Add “Type” & “CNTRY”:
1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA
Improper 3-level solution, as the study-level variance component is negative (constrained non-negative).
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 99.068
->cprob b1 57
= 0.00046801
Conclusion: The general pattern of results is similar. Men are significantly more
successful in Sweden (but note the large SE) and Germany relative to the
reference category (US Grants).
Type x CNTRY Interaction
1=Australia 2=Canada 3=Germany 4=Europe 5=Netherlands 6=Sweden 7=UK 8=USA
->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 96.890
->cprob b1 50
= 7.8690e-005
Conclusion: The change in chi-sq is NS, suggesting that
there is no interaction. Results should be interpreted
cautiously because of the improper solution.
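The "change in chi-sq" comparisons for the two interaction models can be retraced with a small sketch. It follows the slides' approach of differencing the chi-square values of the main-effects and interaction models and comparing the change against a chi-square critical value on the difference in degrees of freedom. The function name is hypothetical; the critical values are the standard upper 5% points for 4 and 7 df.

```python
# Upper 5% critical values of the chi-square distribution.
CHI2_CRIT_05 = {4: 9.488, 7: 14.067}

def chi_square_change(chi_reduced, df_reduced, chi_full, df_full):
    """Change in chi-square between nested models and whether it is significant."""
    delta = chi_reduced - chi_full
    delta_df = df_reduced - df_full
    significant = delta > CHI2_CRIT_05[delta_df]
    return delta, delta_df, significant

# Type & DISC main effects vs. Type x DISC interaction.
type_disc = chi_square_change(105.47, 60, 103.80, 56)
# Type & CNTRY main effects vs. Type x CNTRY interaction.
type_cntry = chi_square_change(99.068, 57, 96.890, 50)
print(type_disc, type_cntry)
```

Both changes fall well below their critical values, which matches the conclusions that neither interaction is significant.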
Main Effects of Type, Disc & Country
->pred c50
->calc c51 = (('logor' - c50)/'se')**2
->sum c51 to b1
= 92.164
->cprob b1 53
= 0.00069082
Conclusion: When all main effects are included, the Type
effect is nearly unaffected. However, none of the Disc effects
are significant, although the Sweden and (marginally)
Germany effects are still significant. Results should be
interpreted cautiously because of the improper solution.
Graphs: Caterpillar Plots
Caterpillar plot based on L1 residuals.
1. Go to the “model” menu and select the “residuals” option. This will bring up
the “settings” window.
2. Set “SD (comparative)” to 1.96.
3. Set “level” to “1logOR”.
4. Click the “Calc” button.
5. Click on the “plot” button to bring up the next window.
6. In the “plot” window select “residual +/- 1.96 SD x rank”. This brings up the
original graph. Clicking on the graph brings up a window to modify the graph (a bit).
• The mean effect size was very small, but significantly
in favour of men. However, the results did not
generalise across studies (there was study-to-study
variation).
• The effect size was significantly moderated by the
type; it was almost exactly 0 for grants and in favour
of men for fellowship applications. This difference was
not moderated or mediated by other moderators.
• There appeared to be some discipline effects (bias in
favour of men in social sciences) and country effects
(large bias in favour of men for Sweden). However,
when all “main” effects were included, discipline effects
disappeared.
• For Grant Proposals there was no evidence of any
effect of gender on outcome.
Purpose-built
• Comprehensive Meta-analysis (commercial)
• Schwarzer (free, http://userpage.fu-berlin.de/~health/meta_e.htm)
Extensions to standard statistics packages
• SPSS, Stata and SAS macros, downloadable from
http://mason.gmu.edu/~dwilsonb/ma.html
• Stata add-ons, downloadable from
http://www.stata.com/support/faqs/stat/meta.html
• HLM – V-known routine
• MLwiN
• MPlus
References
• Bornmann, L. (2007). Bias cut. Women, it seems, often get a raw
deal in science—So how can discrimination be tackled? Nature,
445(7127), 566.
• Bornmann, L., Mutz, R. & Daniel, H. D. (2007). Gender differences
in grant peer review: A meta-analysis. Journal of Informetrics, 1,
226–238.
• Cooper, H., & Hedges, L. V. (Eds.) (1994). The handbook of
research synthesis (pp. 521–529). New York: Russell Sage
Foundation.
• Hox, J. (2003). Applied multilevel analysis. Amsterdam: TT
Publishers.
• Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis:
Correcting error and bias in research findings. Newbury Park:
Sage Publications.
• Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis.
Thousand Oaks, CA: Sage Publications.