International Initiative for Impact Evaluation
Publication bias in impact evaluation: evidence from a systematic review of farmer field schools
Hugh Waddington, 3ie
Hugh Waddington
www.3ieimpact.org
Acknowledgements
• Jorge Hombrados, J-PAL Latin America (co-author)
• Birte Snilstveit, co-PI on farmer field school review
• FFS co-authors: Martina Vojtkova, Daniel Phillips
• Presentation based on training on publication bias provided by Emily Tanner-Smith, Campbell Collaboration
“The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable... It is important we study fragility in a much more systematic way”
Edward Leamer: Let’s take the con out of econometrics, AER 1983
What is publication bias?
• Publication bias refers to bias that occurs when research found in the published literature is systematically unrepresentative of the population of studies (Rothstein et al., 2005)
• On average, published studies have a larger mean effect size than unpublished studies, providing evidence for publication bias (Lipsey and Wilson, 1993)
• Also referred to as the ‘file drawer’ problem:
“…journals are filled with the 5% of studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show nonsignificant (e.g. p > 0.05) results” (Rosenthal, 1979)
• Well documented in other fields of research (biomedicine, public health, education, crime & justice, social welfare); entertaining overviews in Ben Goldacre’s Bad Science and Bad Pharma
Types of reporting biases
Type of bias | Definition
Publication bias | The publication or non-publication of research findings, depending on the nature and direction of results
Time lag bias | The rapid or delayed publication of research findings, depending on the nature and direction of results
Multiple publication bias | The multiple or singular publication of research findings, depending on the nature and direction of results
Location bias | The publication of research findings in journals with different ease of access or levels of indexing in standard databases, depending on the nature and direction of results
Citation bias | The citation or non-citation of research findings, depending on the nature and direction of results
Language bias | The publication of research findings in a particular language, depending on the nature and direction of results
Outcome reporting bias | The selective reporting of some outcomes but not others, depending on the nature and direction of results
Source: Sterne et al. (Eds.) (2008: 298)
How much of a problem is it likely to be in international development research?
• The ‘exploratory’ research tradition in the social sciences suggests potentially severe file drawer problems
• Publication bias may be partly mitigated by the tradition of publishing ‘working papers’ and by modern electronic dissemination
• File drawer effects are arguably more problematic for observational data (and small sample intervention studies)
• Testing for publication bias usually relies on testing for ‘small study effects’, but small study effects may also result from factors other than publication bias
=> But we need more evidence, since very little development research has addressed this topic
Farmer field schools
• FFS originally associated with FAO and Integrated Pest Management (IPM)
• Originated in response to the overuse of pesticides in irrigated rice systems in Asia
• Belief that farmers need confidence to reduce dependence on pesticides, gained through ‘discovery learning’
• Aim to promote the use of good practices and to improve agricultural and other outcomes
• Now applied globally in 90+ countries, with millions of beneficiaries and a range of crops and curricula
A ‘best practice’ FFS (photo: (c) JM Micaud for FAO)
• Group of 25 farmers, meeting once a week in a designated field during the growing season
• Exploratory: facilitator encourages farmers to ask questions and to seek answers, rather than lecturing or giving recommendations
• Experimentation: group manages two plots
• Participatory: emphasis on social learning, with exercises to build group dynamics
• Field days and follow-up activities may be provided to diffuse the message to neighbours
3ie review motivated by polarised debate
• “Studies reported substantial and consistent reductions in pesticide use attributable to the effect of training. In a number of cases, there was also a convincing increase in yield due to training.... Results demonstrated remarkable, widespread and lasting developmental impacts” (Van den Berg 2004, FAO)
• “The analysis, employing a modified ‘difference-in-differences’ model, indicates that the program did not have significant impacts on the performance of graduates and their neighbors” (Feder et al. 2004)
• But how good are they really? What does a systematic review of the evidence say?
3ie’s review objectives and background
• Produce a high-quality review of relevance to decision makers
• Mixed methods review of effects on outcomes along the causal chain, and of barriers to and facilitators of change
• Peer review managed by the Campbell Collaboration
• Discussion with FAO led to a wide range of impact evaluation research being included in the effectiveness review
Large body of evidence found
• 3ie systematic review found 93 separate ‘impact evaluations’ in LMICs
• Experimental and quasi-experimental studies with a controlled comparison (no treatment, pipeline, other intervention) were included
• Wide variation in attribution methods used: no RCTs, and quasi-experiments of varying quality
• Small samples: 400 farmers on average (sample size ranges from 24 to 3,000), often in only a handful of villages, and short follow-up periods (usually less than 2 years)
• Studies measured outcomes across the causal chain:
– Knowledge
– Adoption
– Agriculture outcomes (yields, net revenues)
– Health, environment, empowerment outcomes
• Analysis today focuses on impacts on yields for FFS participants: usually self-reported weight of production per unit of area
Study characteristics
Study | Region (country) | Crop | Yield outcome measure
Ali and Sharif, 2011 | SA (Pakistan) | Cotton | Yield (kg per ha)
Birthal et al., 2000 | SA (India) | Cotton | Value of yield (value per ha)
Carlberg et al., 2012 | SSA (Ghana) | Other staple/veg. | Yield (50 kg bags per acre, 2010)
Cavatassi et al., 2011 | LAC (Ecuador) | Other staple/veg. | Yield (kg per ha)
Davis et al., 2012 | SSA (Kenya, Tanzania) | Other staple/veg. | Value of yield (growth rate in value, local currency per acre)
Dinpanah et al., 2010 | MENA (Iran) | Rice | Yield (ton per ha)
Feder et al., 2004 | EAP (Indonesia) | Rice | Yield (growth rate in yield, kg per ha)
Gockowski et al., 2010 | SSA (Ghana) | Tree crop | Sales (quantity of produce sold in 2004/05 season)
Hiller et al., 2009 | SSA (Kenya) | Tree crop | Yield (growth rate in yield, kg per acre)
Huan et al., 1999 | EAP (Vietnam) | Rice | Yield (ton per ha)
Khan et al., n.d. | SA (Pakistan) | Cotton | Yield (growth rate in yield, kg per ha)
Labarta, 2005 | LAC (Nicaragua) | Other staple/veg. | Yield (per ha)
Mutandwa & Mpangwa, 2004 | SSA (Zimbabwe) | Cotton | Yield (number of bales)
Naik et al., 2008 | SA (India) | Other staple/veg. | Yield (quintals of produce)
Orozco Cirilo et al., 2008 | LAC (Mexico) | Other staple/veg. | Yield (growth rate in ton per ha)
Palis, 1998 | EAP (Philippines) | Rice | Yield (growth rate in ton per ha)
Pananurak, 2010 | EAP (China); SA (India, Pakistan) | Cotton | Yield (growth rate in kg per ha)
Pande et al., 2009 | SA (Nepal) | Rice | Yields (ton/ha)
Rejesus et al., 2010 | EAP (Vietnam) | Rice | Yields (growth rate in tonnes per ha)
Todo & Takahashi, 2011 | SSA (Ethiopia) | Other staple/veg. | Value of production (growth rate, in Eth birr)
Van den Berg et al., 2002 | SA (Sri Lanka) | Rice | Yield (kg per ha)
Van Rijn, 2010 | LAC (Peru) | Tree crop | Yield (kg per ha, 2007)
Wandji et al., 2007 | SSA (Cameroon) | Tree crop | Sales (kg of cocoa sold in the 2004-05 season)
Wu Lifeng, 2010 | EAP (China) | Cotton | Yield (growth rate in kg per ha)
Yang et al., 2005 | EAP (China) | Cotton | Yield (kg per ha)
Yamazaki and Resosudarmo, 2007 | EAP (Indonesia) | Rice | Yield (growth rate in kg per ha)
Zuger, 2004 | LAC (Peru) | Other staple/veg. | Yield (ton per ha)
Regions: EAP = East Asia & Pacific; LAC = Latin America & Caribbean; MENA = Middle East & North Africa; SA = South Asia; SSA = sub-Saharan Africa
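Unit of analysis is the study-level ‘effect size’
• ‘Response ratio’ effect size calculated for each study:
$$RR = \frac{\bar{Y}_t}{\bar{Y}_c} \qquad \text{or} \qquad RR = \frac{\bar{Y}_c + \beta}{\bar{Y}_c}$$
where \bar{Y}_t and \bar{Y}_c are the mean outcomes in the treatment and comparison groups, and β is the estimated treatment effect (in outcome units) where only a regression coefficient is reported
• RR standard error calculations (on the log scale):
$$SE_{\ln(RR)} = S_p \sqrt{\frac{1}{n_t \bar{Y}_t^{2}} + \frac{1}{n_c \bar{Y}_c^{2}}} \qquad \text{or} \qquad SE_{\ln(RR)} = \frac{\ln(RR)}{t}$$
where S_p is the pooled standard deviation, n_t and n_c are the treatment and comparison group sample sizes, and t is the reported t-statistic of the treatment effect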
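The two effect-size formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the review's own code: the function names are hypothetical, and the t-statistic route assumes the reported t-statistic can be applied to ln(RR), which is an approximation.

```python
import math


def response_ratio(mean_c, mean_t=None, beta=None):
    """RR = mean_t / mean_c, or (mean_c + beta) / mean_c when only a
    treatment-effect coefficient beta (in outcome units) is reported."""
    if mean_c <= 0:
        raise ValueError("comparison-group mean must be positive")
    if mean_t is not None:
        return mean_t / mean_c
    if beta is not None:
        return (mean_c + beta) / mean_c
    raise ValueError("supply either mean_t or beta")


def se_log_rr_from_sd(sd_pooled, n_t, n_c, mean_t, mean_c):
    """SE of ln(RR) from the pooled SD, group sizes and group means."""
    return sd_pooled * math.sqrt(1.0 / (n_t * mean_t ** 2) + 1.0 / (n_c * mean_c ** 2))


def se_log_rr_from_t(rr, t_stat):
    """SE of ln(RR) backed out from a reported t-statistic: |ln(RR) / t|."""
    return abs(math.log(rr) / t_stat)


def ci95(rr, se_log):
    """95% confidence interval on the ratio scale: exp(ln RR +/- 1.96 SE)."""
    log_rr = math.log(rr)
    return math.exp(log_rr - 1.96 * se_log), math.exp(log_rr + 1.96 * se_log)


# Hypothetical example: 3.6 t/ha for FFS farmers vs 3.0 t/ha for the comparison group.
rr = response_ratio(mean_c=3.0, mean_t=3.6)
se = se_log_rr_from_sd(sd_pooled=1.2, n_t=200, n_c=200, mean_t=3.6, mean_c=3.0)
print(round(rr, 2), ci95(rr, se))
```

Pooling is then done on ln(RR) with its SE, and pooled results are exponentiated back to ratios, which is why the forest plots that follow report ratios with asymmetric confidence intervals.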
Before we turn to examining publication bias, here are some summary results from the meta-analysis of outcomes along the causal chain
Farmer field school stylised causal chain
Figure: inputs (1) training of trainers and (2) field school lead to knowledge (FFS participants; FFS neighbours), then adoption (FFS participants; FFS neighbours). Measured impacts: yield, income/net revenue, empowerment, environment & health outcomes.
Knowledge of ‘improved’ farming practices (forest plot; positive effect sizes favour the intervention)
Study | ES (95% CI)
FFS participants
Huan et al., 1999 (Vietnam) | 0.02 (-0.06, 0.10)
Endalew, 2009 (Ethiopia) | 0.27 (-0.06, 0.60)
Price et al., 2001 (Philippines) | 0.42 (-0.17, 1.01)
Rao et al., 2012 (India) | 0.43 (-0.02, 0.87)
Reddy & Suryamani, 2005 (India) | 0.45 (-0.04, 0.94)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 0.59 (0.25, 0.92)
Dinpanah et al., 2010 (Iran) | 0.67 (0.41, 0.92)
Khan et al., 2007 (Pakistan) | 0.79 (0.29, 1.29)
Bunyatta et al., 2006 (Kenya) | 1.03 (0.65, 1.41)
Erbaugh, 2010 (Uganda) | 1.14 (0.93, 1.34)
Rebaudo & Dangles, 2011 (Ecuador) | 1.79 (1.17, 2.41)
Subtotal (I-squared = 93.9%, p = 0.000) | 0.67 (0.33, 1.02)
FFS neighbours
Khan et al., 2007 (Pakistan) | -0.13 (-0.68, 0.42)
Reddy & Suryamani, 2005 (India) | 0.05 (-0.45, 0.56)
Ricker-Gilbert et al., 2008 (Bangladesh) | 0.17 (-0.25, 0.59)
Rebaudo & Dangles, 2011 (Ecuador) | 0.38 (-0.15, 0.91)
Subtotal (I-squared = 0.0%, p = 0.610) | 0.13 (-0.12, 0.37)
NOTE: Weights are from random effects analysis
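The subtotals above are random-effects estimates with I-squared heterogeneity statistics. The slides do not name the estimator, so the sketch below assumes the common DerSimonian-Laird method; `dersimonian_laird` and `se_from_ci` are hypothetical helper names. It recovers each study's SE from its rounded 95% CI, so results are approximate.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error recovered from a reported symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(y, se):
    """Random-effects pooling (DerSimonian-Laird) with Cochran's Q and I^2.

    y: effect sizes on an additive scale (e.g. SMD, or ln RR); se: their SEs.
    Returns (pooled, pooled_se, tau2, q, i2_percent)."""
    k = len(y)
    w = [1.0 / s ** 2 for s in se]                          # fixed-effect weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0    # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in se]              # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, pooled_se, tau2, q, i2


# Knowledge, FFS neighbours subgroup: (estimate, lower, upper) as reported above.
rows = [(-0.13, -0.68, 0.42), (0.05, -0.45, 0.56), (0.17, -0.25, 0.59), (0.38, -0.15, 0.91)]
y = [r[0] for r in rows]
se = [se_from_ci(r[1], r[2]) for r in rows]
pooled, pooled_se, tau2, q, i2 = dersimonian_laird(y, se)
print(f"{pooled:.2f} ({pooled - 1.96 * pooled_se:.2f}, {pooled + 1.96 * pooled_se:.2f}), I2 = {i2:.1f}%")
```

For these inputs the sketch should land close to the reported neighbour subtotal of 0.13 (-0.12, 0.37) with I-squared = 0%. For response ratios, pass ln(RR) and SEs computed from the log CI limits, then exponentiate the pooled result.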
Pesticide demand (forest plot; response ratios, values below 1 indicate lower pesticide use among FFS farmers)
Study | ES (95% CI)
FFS participants
Yamazaki & Resosudarmo, 2007 (Indonesia) | 0.20 (0.01, 3.23)
Birthal et al., 2000 (India) | 0.21 (0.17, 0.26)
Yang et al., 2005 (China) | 0.32 (0.21, 0.48)
Yorobe & Rejesus, 2011 (Philippines) | 0.37 (0.18, 0.78)
Yang et al., 2005 (China) | 0.41 (0.36, 0.46)
Khan et al., 2007 (Pakistan) | 0.46 (0.39, 0.54)
Khalid, n.d. (Sudan) | 0.48 (0.31, 0.75)
Rejesus et al., 2010 (Vietnam) | 0.52 (0.24, 1.12)
Pananurak, 2010 (India) | 0.52 (0.30, 0.92)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 0.57 (0.36, 0.89)
Pananurak, 2010 (Pakistan) | 0.59 (0.41, 0.87)
Amera, 2008 (Kenya) | 0.61 (0.52, 0.71)
Pananurak, 2010 (China) | 0.65 (0.50, 0.84)
Mancini et al., 2008 (India) | 0.67 (0.46, 0.97)
Wu Lifeng, 2010 (China) | 0.71 (0.64, 0.80)
Huan et al., 1999 (Vietnam) | 0.72 (0.62, 0.84)
Van den Berg et al., 2002 (Sri Lanka) | 0.82 (0.74, 0.90)
Praneetvatakul & Waibel, 2006 (Thailand) | 0.82 (0.68, 0.98)
Murphy et al., 2002 (Vietnam) | 0.83 (0.75, 0.93)
Cole et al., 2007 (Ecuador) | 0.88 (0.68, 1.13)
Ali & Sharif, 2011 (Pakistan) | 0.90 (0.75, 1.09)
Khan et al., 2007 (Pakistan) | 0.91 (0.28, 2.94)
Labarta, 2005 (Nicaragua) | 0.95 (0.39, 2.34)
Feder et al., 2004 (Indonesia) | 1.30 (1.08, 1.57)
Cavatassi et al., 2011 (Ecuador) | 1.34 (0.99, 1.80)
Friis-Hansen et al., 2004 (Uganda) | 1.42 (1.09, 1.86)
Subtotal (I-squared = 93.2%, p = 0.000) | 0.66 (0.56, 0.78)
FFS neighbours
Pananurak, 2010 (India) | 0.54 (0.25, 1.15)
Khan et al., 2007 (Pakistan) | 0.61 (0.51, 0.74)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 0.67 (0.12, 3.88)
Wu Lifeng, 2010 (China) | 0.68 (0.62, 0.76)
Pananurak, 2010 (Pakistan) | 0.78 (0.40, 1.49)
Labarta, 2005 (Nicaragua) | 0.99 (0.42, 2.33)
Pananurak, 2010 (China) | 1.11 (0.69, 1.79)
Praneetvatakul & Waibel, 2006 (Thailand) | 1.15 (0.92, 1.43)
Khan et al., 2007 (Pakistan) | 1.20 (0.40, 3.53)
Feder et al., 2004 (Indonesia) | 1.30 (1.09, 1.55)
Subtotal (I-squared = 84.6%, p = 0.000) | 0.88 (0.68, 1.14)
NOTE: Weights are from random effects analysis
Yields (forest plot; response ratios, values above 1 favour the intervention)
Study | ES (95% CI)
FFS participants
Pananurak, 2010 (India) | 0.80 (0.61, 1.05)
Van Rijn, 2010 (Peru) | 0.86 (0.63, 1.18)
Naik et al., 2008 (India) | 0.89 (0.83, 0.96)
Huan et al., 1999 (Vietnam) | 0.95 (0.92, 0.98)
Labarta, 2005 (Nicaragua) | 0.97 (0.92, 1.02)
Rejesus et al., 2010 (Vietnam) | 0.97 (0.72, 1.31)
Feder et al., 2004 (Indonesia) | 0.98 (0.96, 1.01)
Wu Lifeng, 2010 (China) | 1.08 (1.03, 1.14)
Ali & Sharif, 2011 (Pakistan) | 1.09 (1.03, 1.15)
Pananurak, 2010 (China) | 1.09 (1.04, 1.14)
Gockowski et al., 2010 (Ghana) | 1.14 (1.03, 1.25)
Yang et al., 2005 (China) | 1.15 (0.94, 1.41)
Hiller et al., 2009 (Kenya) | 1.17 (0.53, 2.56)
Khan et al., 2007 (Pakistan) | 1.17 (0.97, 1.42)
Gockowski et al., 2010 (Ghana) | 1.18 (1.07, 1.30)
Cavatassi et al., 2011 (Ecuador) | 1.22 (0.97, 1.53)
Davis et al., 2012 (Tanzania) | 1.23 (1.00, 1.51)
Birthal et al., 2000 (India) | 1.24 (1.13, 1.36)
Pananurak, 2010 (Pakistan) | 1.24 (1.01, 1.54)
Dinpanah et al., 2010 (Iran) | 1.32 (1.22, 1.42)
Wandji et al., 2007 (Cameroon) | 1.32 (1.07, 1.63)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.36 (1.06, 1.73)
Palis, 1998 (Philippines) | 1.36 (0.97, 1.92)
Zuger 2004 (Peru) | 1.44 (1.09, 1.92)
Carlberg et al., 2012 (Ghana) | 1.58 (1.19, 2.10)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.67 (1.23, 2.26)
Van den Berg et al., 2002 (Sri Lanka) | 1.68 (1.30, 2.18)
Davis et al., 2012 (Kenya) | 1.81 (1.15, 2.84)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Gockowski et al., 2010 (Ghana) | 2.16 (0.99, 4.69)
Dinpanah et al., 2010 (Iran) | 2.52 (2.05, 3.11)
Orozco Cirilo et al., 2008 b) (Mexico) | 2.62 (2.23, 3.08)
Todo & Takahashi, 2011 (Ethiopia) | 2.71 (1.11, 6.60)
Subtotal (I-squared = 92.7%, p = 0.000) | 1.24 (1.16, 1.32)
FFS neighbours
Pananurak, 2010 (India) | 0.79 (0.63, 1.00)
Khan et al., 2007 (Pakistan) | 0.97 (0.74, 1.26)
Feder et al., 2004 (Indonesia) | 0.99 (0.97, 1.01)
Labarta, 2005 (Nicaragua) | 1.00 (0.99, 1.01)
Pananurak, 2010 (China) | 1.02 (0.98, 1.07)
Wu Lifeng, 2010 (China) | 1.03 (0.99, 1.08)
Pananurak, 2010 (Pakistan) | 1.03 (0.86, 1.25)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.43 (1.05, 1.96)
Subtotal (I-squared = 49.5%, p = 0.054) | 1.01 (0.98, 1.03)
NOTE: Weights are from random effects analysis
Net revenues (income less costs) (forest plot; response ratios, values above 1 favour the intervention)
Study | ES (95% CI)
FFS participants
Labarta, 2005 (Nicaragua) | 0.28 (0.02, 3.48)
Pananurak, 2010 (India) | 1.06 (0.68, 1.66)
Waarts et al., 2012 (Kenya) | 1.14 (0.92, 1.41)
Pananurak, 2010 (China) | 1.17 (1.08, 1.27)
Pananurak, 2010 (Pakistan) | 1.23 (1.09, 1.40)
Naik et al., 2008 (India) | 1.25 (1.09, 1.42)
Van de Fliert 2000 (Indonesia) | 1.31 (1.11, 1.55)
Van den Berg et al., 2002 (Sri Lanka) | 1.41 (1.19, 1.67)
Yang et al., 2005 (China) | 1.53 (1.10, 2.15)
Khan et al., 2007 (Pakistan) | 3.40 (1.94, 5.97)
Subtotal (I-squared = 57.1%, p = 0.013) | 1.28 (1.17, 1.41)
FFS training + input/marketing support
Birthal et al., 2000 (India) | 1.43 (1.19, 1.72)
Van Rijn, 2010 (Peru) | 2.00 (1.02, 3.94)
Cavatassi et al., 2011 (Ecuador) | 3.34 (1.56, 7.15)
Palis, 1998 (Philippines) | 4.61 (3.83, 5.56)
Subtotal (I-squared = 96.2%, p = 0.000) | 2.57 (1.18, 5.58)
FFS neighbours
Pananurak, 2010 (India) | 0.93 (0.66, 1.32)
Pananurak, 2010 (China) | 1.07 (1.00, 1.14)
Pananurak, 2010 (Pakistan) | 1.13 (1.01, 1.26)
Labarta, 2005 (Nicaragua) | 1.39 (0.66, 2.92)
Khan et al., 2007 (Pakistan) | 1.51 (0.51, 4.45)
Subtotal (I-squared = 0.0%, p = 0.706) | 1.08 (1.03, 1.15)
NOTE: Weights are from random effects analysis
Detecting publication bias
• The only direct evidence for publication bias comes from comparing published and unpublished study results
• But there are also ways of assessing the likelihood of publication bias, directly and indirectly:
– Assess reporting biases in each study
– Statistical analysis based on sample size
“An ounce of prevention is worth a pound of cure”
Sources of grey literature:
(1) Multidisciplinary: Google, Google Scholar
(2) International development specific: JOLIS, BLDS and ELDIS (Institute of Development Studies)
(3) Good sources for impact evaluations: J-PAL/IPA databases, 3ie’s database of impact evaluations
(4) Subject-specific, e.g. IDEAS/RePEc for economics, ERIC for education, LILACS for Latin American health publications, ALNAP for humanitarian action
(5) Conference proceedings, technical reports (research, governmental agencies), organization websites, dissertations, theses, contact with primary researchers
Meta-analysis of studies by publication status: journal vs. other (FFS participant yields; response ratios, values above 1 favour the intervention)
Study | ES (95% CI)
Not published in journal
Pananurak, 2010 (India) | 0.80 (0.61, 1.05)
Van Rijn, 2010 (Peru) | 0.86 (0.63, 1.18)
Naik et al., 2008 (India) | 0.89 (0.83, 0.96)
Labarta, 2005 (Nicaragua) | 0.97 (0.92, 1.02)
Wu Lifeng, 2010 (China) | 1.08 (1.03, 1.14)
Pananurak, 2010 (China) | 1.09 (1.04, 1.14)
Hiller et al., 2009 (Kenya) | 1.17 (0.53, 2.56)
Pananurak, 2010 (Pakistan) | 1.24 (1.01, 1.54)
Wandji et al., 2007 (Cameroon) | 1.32 (1.07, 1.63)
Zuger 2004 (Peru) | 1.44 (1.09, 1.92)
Carlberg et al., 2012 (Ghana) | 1.58 (1.19, 2.10)
Van den Berg et al., 2002 (Sri Lanka) | 1.68 (1.30, 2.18)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Todo & Takahashi, 2011 (Ethiopia) | 2.71 (1.11, 6.60)
Subtotal (I-squared = 84.2%, p = 0.000) | 1.14 (1.04, 1.24)
Published in journal
Huan et al., 1999 (Vietnam) | 0.95 (0.92, 0.98)
Rejesus et al., 2010 (Vietnam) | 0.97 (0.72, 1.31)
Feder et al., 2004 (Indonesia) | 0.98 (0.96, 1.01)
Ali & Sharif, 2011 (Pakistan) | 1.09 (1.03, 1.15)
Gockowski et al., 2010 (Ghana) | 1.14 (1.03, 1.25)
Yang et al., 2005 (China) | 1.15 (0.94, 1.41)
Khan et al., 2007 (Pakistan) | 1.17 (0.97, 1.42)
Gockowski et al., 2010 (Ghana) | 1.18 (1.07, 1.30)
Cavatassi et al., 2011 (Ecuador) | 1.22 (0.97, 1.53)
Davis et al., 2012 (Tanzania) | 1.23 (1.00, 1.51)
Birthal et al., 2000 (India) | 1.24 (1.13, 1.36)
Dinpanah et al., 2010 (Iran) | 1.32 (1.22, 1.42)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.36 (1.06, 1.73)
Palis, 1998 (Philippines) | 1.36 (0.97, 1.92)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.67 (1.23, 2.26)
Davis et al., 2012 (Kenya) | 1.81 (1.15, 2.84)
Gockowski et al., 2010 (Ghana) | 2.16 (0.99, 4.69)
Dinpanah et al., 2010 (Iran) | 2.52 (2.05, 3.11)
Orozco Cirilo et al., 2008 b) (Mexico) | 2.62 (2.23, 3.08)
Subtotal (I-squared = 95.0%, p = 0.000) | 1.30 (1.19, 1.42)
Overall (I-squared = 92.7%, p = 0.000) | 1.24 (1.16, 1.32)
NOTE: Weights are from random effects analysis
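One way to check whether the journal and non-journal subtotals above differ is a z-test on the difference between their log pooled estimates, treating the two subgroups as independent. This is a minimal sketch, not necessarily the test used in the review: the helper names are hypothetical, and each subtotal's SE is recovered from its rounded 95% CI, so the result is approximate.

```python
import math
from statistics import NormalDist


def log_estimate_and_se(ratio, lower, upper):
    """ln(ratio) and its SE, recovered from a ratio and its 95% CI."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * 1.96)


def subgroup_z_test(a, b):
    """Two-sided z-test for the difference between two independent
    subgroup estimates, each given as (log_estimate, se)."""
    diff = b[0] - a[0]
    se_diff = math.sqrt(a[1] ** 2 + b[1] ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Subtotals reported above: not published in journal vs published in journal.
not_journal = log_estimate_and_se(1.14, 1.04, 1.24)
journal = log_estimate_and_se(1.30, 1.19, 1.42)
diff, z, p = subgroup_z_test(not_journal, journal)
print(f"ratio of ratios = {math.exp(diff):.2f}, z = {z:.2f}, p = {p:.3f}")
```

exp(diff) is the ratio of the two pooled response ratios; a value above 1 means journal-published studies report larger yield effects on average.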
Assess likelihood of file-drawer effects in each study
• Is there evidence that results have been reported selectively:
– outcomes not reported despite data being collected (or indicated in the methods section, or reported in the study protocol if available)?
– existence of studies reporting other outcomes?
• Have outcomes been constructed in an uncommon way that might suggest biased exploratory research?
Risk of bias (including file drawer effects) assessment for studies included in meta-analysis
Additional evidence for file-drawer effects
• 34% (14/41) of studies that report data on yields could not be included in the meta-analysis because they do not provide standard errors or the information needed to calculate them
• 30% (27/91) of all studies do not provide information on yields or other agricultural outcomes (net revenues), despite collecting data on knowledge/adoption
Detecting publication bias statistically
Methods for detecting publication bias assume:
• Large n studies are likely to get published regardless of results, due to the time and money invested in them
• Medium n studies will have some modest, significant effects that are reported; others may never be published
• Small n studies with the largest effects are the most likely to be reported, but many will never be published or will be difficult to locate
Funnel Plots
• Exploratory tool used to visually assess the possibility of
publication bias in a meta-analysis
• Scatter plot of effect size (x-axis) against some measure
of study size (y-axis)
• Precision of estimates increases as the sample size of a
study increases
– Estimates from small n studies (i.e., less precise, larger standard
errors) will show more variability in the effect size estimates, thus
a wider scatter on the plot
– Estimates from larger n studies will show less variability in effect
size estimates, thus have a narrower scatter on the plot
• If publication bias is present, we would expect null or
‘negative’ findings from small n studies to be suppressed
(i.e., missing from the plot)
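The funnel plot logic above reduces to a few lines. Without bias, roughly 95% of studies should fall within pooled ± 1.96·SE at their own SE. The sketch assumes a fixed-effect pooled line for the pseudo confidence limits, uses hypothetical data, and the function names are illustrative.

```python
def fixed_effect(y, se):
    """Inverse-variance (fixed-effect) pooled estimate."""
    w = [1.0 / s ** 2 for s in se]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def funnel_limits(pooled, se_grid, z=1.96):
    """Pseudo 95% limits of a funnel plot: pooled +/- z * SE at each SE value."""
    return [(s, pooled - z * s, pooled + z * s) for s in se_grid]


def outside_funnel(y, se, z=1.96):
    """Indices of studies lying outside the pseudo 95% limits."""
    theta = fixed_effect(y, se)
    return [i for i, (yi, si) in enumerate(zip(y, se)) if abs(yi - theta) > z * si]


# Hypothetical ln RR estimates: three precise studies near zero and one
# small study with a large positive effect.
y = [0.0, 0.01, -0.01, 0.9]
se = [0.05, 0.05, 0.05, 0.2]
theta = fixed_effect(y, se)
limits = funnel_limits(theta, [0.1, 0.2, 0.3, 0.4, 0.5])
print(outside_funnel(y, se))
```

To draw the plot itself, scatter (y, se) with the SE axis inverted, so the most precise studies sit at the top, and overlay the two limit lines from `funnel_limits`.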
Farmer field schools – FFS participant yields
Figure: funnel plot with pseudo 95% confidence limits (x-axis: ln RR, -1 to 1; y-axis: standard error, 0 to 0.5). Markers distinguish studies not published in a journal from studies published in a journal; lines show the pooled estimate and the pseudo 95% confidence limits.
Tests for Funnel Plot Asymmetry
• Several regression tests are available to test for funnel plot asymmetry; they attempt to overcome the subjectivity of visual funnel plot inspection
• Framed as tests for “small study effects”, or the tendency for smaller n studies to show greater effects than larger n studies; i.e., such effects aren’t necessarily a result of bias
• Egger test, Peters test (modified Egger test for use with log odds ratio effect sizes), Begg’s test, selection modeling (Hedges & Vevea, 2005), failsafe n (not recommended) (Becker, 2005)
Egger Test
• Weighted regression of the effect size on its standard error (weights = inverse variance):
$$ES_i = \beta_1 + \beta_0\, se_i + \varepsilon_i, \qquad H_0: \beta_0 = 0$$
– β0 = 0 indicates a symmetric funnel plot
– β0 > 0 shows that less precise (i.e., smaller n) studies yield bigger effects
– Can be extended to include p predictors hypothesized to potentially explain funnel plot asymmetry (Sterne et al., 2001) (see analysis below)
• Limitations:
– Low power unless there is severe bias and a large number of studies
– Inflated Type I error with large treatment effects, rare event data, or equal sample sizes across studies
– Inflated Type I error with log odds ratio effect sizes
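Because the weights are inverse variances, the weighted regression above can be fitted as an ordinary least squares regression of the standardized effect ES_i/se_i on precision 1/se_i. Dividing the model through by se_i shows that the intercept of that fit is β0 and the slope is β1. The sketch below uses only the standard library; `egger_test` is a hypothetical name and the data are synthetic.

```python
import math


def egger_test(y, se):
    """Egger's regression test for funnel-plot asymmetry.

    OLS of the standardized effect y_i/se_i on precision 1/se_i. This is the
    same fit as the inverse-variance-weighted regression y_i = b1 + b0*se_i:
    the intercept here is b0 (small-study effect) and the slope is b1.
    Returns (b0, se_b0, t_b0, b1)."""
    k = len(y)
    if k < 3:
        raise ValueError("need at least 3 studies")
    x = [1.0 / s for s in se]                       # precision
    z = [yi / s for yi, s in zip(y, se)]            # standardized effect
    mx, mz = sum(x) / k, sum(z) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b1 = sxz / sxx                                  # slope: pooled effect
    b0 = mz - b1 * mx                               # intercept: asymmetry
    resid = [zi - (b0 + b1 * xi) for xi, zi in zip(x, z)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)       # residual variance, k - 2 df
    se_b0 = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    return b0, se_b0, b0 / se_b0, b1


# Synthetic ln RR data with a built-in small-study effect (b0 = 2.0, b1 = 0.1).
se = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
noise = [0.001, -0.001, 0.001, -0.001, 0.001, -0.001]
y = [0.1 + 2.0 * s + e for s, e in zip(se, noise)]
b0, se_b0, t_b0, b1 = egger_test(y, se)
print(f"b0 = {b0:.2f} (t = {t_b0:.1f}), b1 = {b1:.3f}")
```

For a p-value, compare t_b0 with a t distribution on k - 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t_b0), k - 2)` if SciPy is available.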
Egger test for FFS-participant yields
Figure: Egger's publication bias plot (y-axis: standardized effect; x-axis: precision), with regression estimates:
Term | Coef. | t | P>t
const | -0.047 | -1.70 | 0.100
slope | 3.085 | 4.14 | 0.000
‘Trim and fill’ analysis (Duval & Tweedie, 2000)
• Iteratively trims (removes) the smaller studies causing asymmetry
• Uses the trimmed plot to re-estimate the mean effect size
• Fills (replaces) the omitted studies and adds their mirror images
• Provides an estimate of the number of missing (filled) studies and a new estimate of the mean effect size
• Major limitations include: misinterpretation of results, the assumption of a symmetric funnel plot, and poor performance in the presence of heterogeneity
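Here is a minimal sketch of the trim-and-fill steps listed above. It assumes Duval and Tweedie's L0 estimator, fixed-effect pooling at every step, and suppressed studies on the left-hand side of the funnel (where small null or negative studies would sit). The slides report results from ‘meta-trim’, whose settings may differ, and all names here are hypothetical.

```python
import math


def fixed_effect(y, se):
    """Inverse-variance (fixed-effect) pooled estimate."""
    w = [1.0 / s ** 2 for s in se]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def _average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def trim_and_fill(y, se, max_iter=100):
    """L0 trim-and-fill, assuming suppressed studies lie on the LEFT of the funnel.

    Returns (k0, filled_estimate, filled_y, filled_se)."""
    n = len(y)
    by_effect = sorted(range(n), key=lambda i: y[i])      # ascending effect size
    k0 = 0
    for _ in range(max_iter):
        kept = by_effect[: n - k0]                         # trim the k0 largest effects
        theta = fixed_effect([y[i] for i in kept], [se[i] for i in kept])
        dev = [yi - theta for yi in y]                     # deviations of ALL studies
        ranks = _average_ranks([abs(d) for d in dev])
        t_n = sum(r for r, d in zip(ranks, dev) if d > 0)  # Wilcoxon rank statistic
        l0 = (4.0 * t_n - n * (n + 1)) / (2.0 * n - 1.0)
        new_k0 = min(n - 1, max(0, math.floor(l0 + 0.5)))  # round half up
        if new_k0 == k0:
            break
        k0 = new_k0
    kept = by_effect[: n - k0]                             # centre for the final k0
    theta = fixed_effect([y[i] for i in kept], [se[i] for i in kept])
    mirrored = by_effect[n - k0:]                          # the k0 trimmed studies
    filled_y = list(y) + [2.0 * theta - y[i] for i in mirrored]
    filled_se = list(se) + [se[i] for i in mirrored]
    return k0, fixed_effect(filled_y, filled_se), filled_y, filled_se


# Hypothetical ln RR data: precise studies near zero, small studies only on the right.
y = [0.0, 0.02, -0.01, 0.3, 0.5, 0.6, 0.8]
se = [0.02, 0.03, 0.03, 0.2, 0.25, 0.3, 0.35]
k0, filled, _, _ = trim_and_fill(y, se)
print(k0, round(fixed_effect(y, se), 4), round(filled, 4))
```

The filled studies are imputed mirror images, not real data. As the limitations above note, the adjusted estimate is a sensitivity check and not a corrected effect size.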
‘Trim & fill’ for FFS-participant yields
Results of meta-trim
Analysis | 95% lower | Effect size | 95% upper | Num. studies
Meta-analysis | 1.16 | 1.23 | 1.32 | 31
Filled meta-analysis | 1.03 | 1.10 | 1.17 | 40
Cumulative meta-analysis
• Typically used to update the pooled effect size estimate cumulatively over time, as each new study is added
• Can alternatively be used to update the pooled effect size estimate with each study, in order from largest to smallest sample size
• If the pooled effect size does not shift with the addition of small n studies, this provides some evidence against publication bias
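The sample-size ordering described above can be sketched as a running inverse-variance sum. For brevity this assumes fixed-effect pooling. The slide's final row, 1.24 (1.16, 1.32), matches the random-effects overall estimate reported earlier, so a faithful replication would re-estimate the between-study variance at each step. `cumulative_meta` and the data are hypothetical.

```python
import math


def cumulative_meta(studies):
    """Cumulative fixed-effect meta-analysis, adding studies from largest to smallest n.

    studies: iterable of (label, n, y, se), with y on an additive scale (e.g. ln RR).
    Yields (label, pooled, lower95, upper95) after each study is added."""
    sw = swy = 0.0
    for label, n, y, se in sorted(studies, key=lambda s: s[1], reverse=True):
        w = 1.0 / se ** 2
        sw += w
        swy += w * y
        pooled = swy / sw
        half_width = 1.96 * math.sqrt(1.0 / sw)
        yield label, pooled, pooled - half_width, pooled + half_width


# Hypothetical ln RR estimates: (label, sample size, ln RR, SE).
studies = [("A", 1000, 0.0, 0.05), ("B", 50, 0.5, 0.3), ("C", 200, 0.1, 0.1)]
for label, pooled, lo, hi in cumulative_meta(studies):
    print(f"+{label}: RR = {math.exp(pooled):.2f} ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
```

Read the output top to bottom: if the pooled ratio drifts upward only once the smallest studies enter, that pattern is consistent with small-study effects.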
Cumulative meta-analysis for FFS-participant yields: studies ordered by sample size from largest to smallest (each row gives the pooled estimate after adding that study)
Study | ES (95% CI)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Huan et al., 1999 (Vietnam) | 1.35 (0.62, 2.95)
Danida, 2011 (Bangladesh) | 1.35 (0.62, 2.95)
Wu Lifeng, 2010 (China) | 1.06 (0.92, 1.23)
Cavatassi et al., 2011 (Ecuador) | 1.09 (0.96, 1.25)
Islam et al., 2006 (Bangladesh) | 1.09 (0.96, 1.25)
Pananurak, 2010 (China) | 1.08 (0.98, 1.20)
Van den Berg et al., 2002 (Sri Lanka) | 1.14 (1.02, 1.27)
Kabir & Uphoff, 2007 (Myanmar) | 1.14 (1.02, 1.27)
Labarta, 2005 (Nicaragua) | 1.09 (1.00, 1.19)
Dinpanah et al., 2010 (Iran) | 1.23 (1.10, 1.38)
Dinpanah et al., 2010 (Iran) | 1.25 (1.12, 1.40)
Davis et al., 2012 (Kenya) | 1.27 (1.14, 1.42)
Davis et al., 2012 (Uganda) | 1.27 (1.14, 1.42)
Davis et al., 2012 (Tanzania) | 1.27 (1.14, 1.41)
Carlberg et al., 2012 (Ghana) | 1.29 (1.16, 1.43)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.31 (1.18, 1.45)
Feder et al., 2004 (Indonesia) | 1.24 (1.15, 1.35)
Gockowski et al., 2010 (Ghana) | 1.23 (1.14, 1.33)
Gockowski et al., 2010 (Ghana) | 1.24 (1.15, 1.34)
Gockowski et al., 2010 (Ghana) | 1.23 (1.15, 1.33)
Ali & Sharif, 2011 (Pakistan) | 1.22 (1.14, 1.30)
Todo & Takahashi, 2011 (Ethiopia) | 1.22 (1.14, 1.31)
Gockowski et al., 2005 (Nigeria) | 1.22 (1.14, 1.31)
Zuger 2004 (Peru) | 1.23 (1.15, 1.32)
Waarts et al., 2012 (Kenya) | 1.23 (1.15, 1.32)
Van Rijn, 2010 (Peru) | 1.22 (1.14, 1.30)
Pananurak, 2010 (Pakistan) | 1.22 (1.14, 1.30)
Wandji et al., 2007 (Cameroon) | 1.22 (1.15, 1.30)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.23 (1.15, 1.31)
Mancini et al., 2008 (India) | 1.23 (1.15, 1.31)
Khan et al., 2007 (Pakistan) | 1.23 (1.15, 1.30)
Van de Fliert 2000 (Indonesia) | 1.23 (1.15, 1.30)
Hiller et al., 2009 (Kenya) | 1.23 (1.15, 1.30)
Naik et al., 2008 (India) | 1.20 (1.13, 1.28)
Pananurak, 2010 (India) | 1.19 (1.12, 1.26)
Birthal et al., 2000 (India) | 1.19 (1.13, 1.27)
Birthal et al., 2000 (India) | 1.19 (1.13, 1.27)
Orozco Cirilo et al., 2008 b) (Mexico) | 1.24 (1.17, 1.33)
Rejesus et al., 2010 (Vietnam) | 1.24 (1.16, 1.32)
Palis, 1998 (Philippines) | 1.24 (1.16, 1.32)
Torrez et al., 1999 (Bolivia) | 1.24 (1.16, 1.32)
Yang et al., 2005 (China) | 1.24 (1.16, 1.32)
Williamson et al., 2003 (Kenya) | 1.24 (1.16, 1.32)
Williamson et al., 2003 (India) | 1.24 (1.16, 1.32)
Jones, n.d. (Sri Lanka) | 1.24 (1.16, 1.32)
• The evidence for ‘small study effects’ seems strong, but is this due to publication bias?
• Asymmetry could be due to factors other than publication bias, e.g.:
– Methodological quality (smaller, lower-quality studies may have exaggerated treatment effects)
– Artefactual variation (e.g. outcome measurement)
– Chance
– True heterogeneity due to intervention characteristics (FFS type, region, crop, follow-up length)
• Assessing funnel plot symmetry relies entirely on subjective visual judgment
Analysis by study quality
Figure: funnel plot with pseudo 95% confidence limits (x-axis: ln RR, -1 to 1; y-axis: standard error, 0 to 0.5). Markers distinguish high risk of bias studies from medium risk of bias studies; lines show the pooled estimate and the pseudo 95% confidence limits.
Contour Enhanced Funnel Plots
• Based on the premise that statistical significance is the most important factor determining publication
• Funnel plot with additional contour lines associated with ‘milestones’ of statistical significance: p = .01, .05, .1
– If studies are missing in areas of statistical non-significance, publication bias may be present
– If studies are missing in areas of statistical significance, asymmetry may be due to factors other than publication bias
– If there are no studies in areas of statistical significance, publication bias may be present
• Can help distinguish funnel plot asymmetry due to publication bias from asymmetry due to other factors (Peters et al., 2008)
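Because the contours sit at null ± z·SE, a study's significance band follows directly from its two-sided p-value. The sketch assumes the null is 0 on the ln RR scale (RR = 1) and uses the same cut-offs as the plot legend. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def two_sided_p(y, se, null=0.0):
    """Two-sided p-value of a study estimate against the null (0 for ln RR)."""
    z = (y - null) / se
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def contour_band(y, se, null=0.0):
    """Significance band used to shade a contour-enhanced funnel plot."""
    p = two_sided_p(y, se, null)
    if p < 0.01:
        return "p < 1%"
    if p < 0.05:
        return "1% < p < 5%"
    if p < 0.10:
        return "5% < p < 10%"
    return "p > 10%"


def contour_lines(se, null=0.0, levels=(0.01, 0.05, 0.10)):
    """x-positions of the significance contours at a given SE, per two-sided level."""
    out = {}
    for a in levels:
        z = STD_NORMAL.inv_cdf(1.0 - a / 2.0)
        out[a] = (null - z * se, null + z * se)
    return out


# Hypothetical studies (ln RR, SE) and the band each would fall into.
for y, se in [(0.5, 0.1), (0.2, 0.1), (0.18, 0.1), (0.1, 0.1)]:
    print(y, se, contour_band(y, se))
```

If the gaps in a funnel plot fall mainly in the "p > 10%" region, that pattern fits significance-driven publication; gaps inside the significant regions point to other causes of asymmetry.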
Figure: contour-enhanced funnel plot (x-axis: effect estimate, -1 to 1; y-axis: standard error, 0 to 0.5). Studies are plotted over shaded significance contours: p < 1%, 1% < p < 5%, 5% < p < 10%, p > 10%.
Meta-regression analysis (t-stats reported)
Specification | 1 | 2 | 3 | 4 | 5 | 6 | 7
STANDARD ERROR (LN_SE) | 4.37*** | 4.33*** | 3.90*** | 3.81*** | 3.21*** | 3.53*** | 4.60***
ADJ. R-SQU | 0.36 | 0.34 | 0.33 | 0.45 | 0.42 | 0.29 | 0.39
N.OBS | 33 | 33 | 33 | 33 | 33 | 33 | 33
Other reported terms (t-stats): HIGH QUALITY 0.52, 0.61, 0.15, 1.37, 0.07, 1.64; INTERACTION (HIGH QUALITY*LN_SE) -1.83*; FFS+ 0.51; YIELD MEASURE DUMMIES -1.01, 0.73, 0.35; Yes; REGION DUMMIES Yes; CROP-TYPE DUMMIES Yes
Specification 7 suggests heterogeneity from small study effects due to study quality
Meta-analysis also suggests bias due to study quality (FFS participant yields; response ratios, values above 1 favour the intervention)
Study | ES (95% CI)
High risk of bias
Naik et al., 2008 (India) | 0.89 (0.83, 0.96)
Huan et al., 1999 (Vietnam) | 0.95 (0.92, 0.98)
Labarta, 2005 (Nicaragua) | 0.97 (0.92, 1.02)
Gockowski et al., 2010 (Ghana) | 1.14 (1.03, 1.25)
Yang et al., 2005 (China) | 1.15 (0.94, 1.41)
Hiller et al., 2009 (Kenya) | 1.17 (0.53, 2.56)
Khan et al., 2007 (Pakistan) | 1.17 (0.97, 1.42)
Gockowski et al., 2010 (Ghana) | 1.18 (1.07, 1.30)
Birthal et al., 2000 (India) | 1.24 (1.13, 1.36)
Dinpanah et al., 2010 (Iran) | 1.32 (1.22, 1.42)
Wandji et al., 2007 (Cameroon) | 1.32 (1.07, 1.63)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.36 (1.06, 1.73)
Palis, 1998 (Philippines) | 1.36 (0.97, 1.92)
Zuger 2004 (Peru) | 1.44 (1.09, 1.92)
Carlberg et al., 2012 (Ghana) | 1.58 (1.19, 2.10)
Van den Berg et al., 2002 (Sri Lanka) | 1.68 (1.30, 2.18)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Gockowski et al., 2010 (Ghana) | 2.16 (0.99, 4.69)
Dinpanah et al., 2010 (Iran) | 2.52 (2.05, 3.11)
Orozco Cirilo et al., 2008 b) (Mexico) | 2.62 (2.23, 3.08)
Subtotal (I-squared = 94.9%, p = 0.000) | 1.35 (1.20, 1.51)
Medium risk of bias
Pananurak, 2010 (India) | 0.80 (0.61, 1.05)
Van Rijn, 2010 (Peru) | 0.86 (0.63, 1.18)
Rejesus et al., 2010 (Vietnam) | 0.97 (0.72, 1.31)
Feder et al., 2004 (Indonesia) | 0.98 (0.96, 1.01)
Wu Lifeng, 2010 (China) | 1.08 (1.03, 1.14)
Ali & Sharif, 2011 (Pakistan) | 1.09 (1.03, 1.15)
Pananurak, 2010 (China) | 1.09 (1.04, 1.14)
Cavatassi et al., 2011 (Ecuador) | 1.22 (0.97, 1.53)
Davis et al., 2012 (Tanzania) | 1.23 (1.00, 1.51)
Pananurak, 2010 (Pakistan) | 1.24 (1.01, 1.54)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.67 (1.23, 2.26)
Davis et al., 2012 (Kenya) | 1.81 (1.15, 2.84)
Todo & Takahashi, 2011 (Ethiopia) | 2.71 (1.11, 6.60)
Subtotal (I-squared = 81.0%, p = 0.000) | 1.10 (1.03, 1.17)
Overall (I-squared = 92.7%, p = 0.000) | 1.24 (1.16, 1.32)
NOTE: Weights are from random effects analysis
Final thoughts
• Evidence of upward bias in low quality vs. higher quality quasi-experiments
=> Where the ‘relevance’ of a review is important for users, careful risk of bias assessment and sensitivity analysis are required
• Study quality appears more important than publication bias in explaining small study effects, but we also find evidence of file drawer effects in the literature
• The available statistical tests are sensitive to the number of effect sizes available and are of limited validity where sample sizes are homogeneous
Recommended Reading
• Duval, S. J., & Tweedie, R. L. (2000). A non-parametric ‘trim and fill’ method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.
• Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.
• Hammerstrøm, K., Wade, A., & Jørgensen, A. K. (2010). Searching for studies: A guide to information retrieval for Campbell systematic reviews. Campbell Systematic Review, Supplement 1.
• Harbord, R. M., Egger, M., & Sterne, J. A. C. (2006). A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Statistics in Medicine, 25, 3443-3457.
• Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2008). Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. Journal of Clinical Epidemiology, 61, 991-996.
• Rosenthal, R. (1979). The ‘file-drawer problem’ and tolerance for null results. Psychological Bulletin, 86, 638-641.
• Rothstein, H. R., Sutton, A. J., & Borenstein, M. L. (Eds.). (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Hoboken, NJ: Wiley.
• Rücker, G., Schwarzer, G., & Carpenter, J. (2008). Arcsine test for publication bias in meta-analyses with binary outcomes. Statistics in Medicine, 27, 746-763.
• Sterne, J. A., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046-1055.
• Sterne, J. A. C., Egger, M., & Moher, D. (Eds.) (2008). Chapter 10: Addressing reporting biases. In J. P. T. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions (pp. 297-333). Chichester, UK: Wiley.
• Sterne, J. A. C., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.
• Waddington, H., White, H., Snilstveit, B., Hombrados, J., Vojtkova, M. (2012). How to do a good systematic review of effects in international development: a tool-kit. Journal of Development Effectiveness, 4(3).