International Initiative for Impact Evaluation
Publication bias in impact evaluation: evidence from a systematic review of farmer field schools
Hugh Waddington, 3ie
Hugh Waddington www.3ieimpact.org

Acknowledgements
• Jorge Hombrados, J-PAL Latin America (co-author)
• Birte Snilstveit, co-PI on farmer field school review
• FFS co-authors: Martina Vojtkova, Daniel Phillips
• Presentation based on training on publication bias provided by Emily Tanner-Smith, Campbell Collaboration

“The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable... It is important we study fragility in a much more systematic way”
Edward Leamer: Let’s take the con out of econometrics, AER 1983

What is publication bias?
• Publication bias occurs when research found in the published literature is systematically unrepresentative of the population of studies (Rothstein et al., 2005)
• On average, published studies have a larger mean effect size than unpublished studies, providing evidence of publication bias (Lipsey and Wilson, 1993)
• Also referred to as the ‘file drawer’ problem: “…journals are filled with the 5% of studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show nonsignificant (e.g. p > 0.05) results” (Rosenthal, 1979)
• Well documented in other fields of research (biomedicine, public health, education, crime and justice, social welfare); entertaining overviews in Ben Goldacre’s Bad Science and Bad Pharma

Types of reporting biases
Type | Definition
Publication bias | The publication or non-publication of research findings, depending on the nature and direction of results
Time lag bias | The rapid or delayed publication of research findings, depending on the nature and direction of results
Multiple publication bias | The multiple or singular publication of research findings, depending on the nature and direction of results
Location bias | The publication of research findings in journals with different ease of access or levels of indexing in standard databases, depending on the nature and direction of results
Citation bias | The citation or non-citation of research findings, depending on the nature and direction of results
Language bias | The publication of research findings in a particular language, depending on the nature and direction of results
Outcome reporting bias | The selective reporting of some outcomes but not others, depending on the nature and direction of results
Source: Sterne et al. (Eds.) (2008: 298)

How much of a problem is it likely to be in international development research?
• The ‘exploratory’ research tradition in the social sciences suggests potentially severe file drawer effects
• Publication bias may be partly mitigated by the tradition of publishing ‘working papers’ and by modern electronic dissemination
• File drawer effects are arguably more problematic for observational data (and small-sample intervention studies)
• Testing for publication bias usually relies on testing for ‘small study effects’, but small study effects may also result from other factors
=> But we need more evidence, since very little development research has addressed this topic

Farmer field schools
• FFS originally associated with FAO and Integrated Pest Management (IPM)
• Originated in response to the overuse of pesticides in irrigated rice systems in Asia
• Belief that farmers need confidence to reduce dependence on pesticides, gained through ‘discovery learning’
• Aim to promote the use of good practices and improve agricultural and other outcomes
• Now applied globally in 90+ countries, with millions of beneficiaries and a range of crops and curricula

A ‘best practice’ FFS (photo: (c) JM Micaud for FAO)
• Group of 25 farmers, meeting once a week in a designated field during the growing season
• Exploratory: the facilitator encourages farmers to ask questions and to seek answers, rather than lecturing or giving recommendations
• Experimentation: the group manages two plots
• Participatory: emphasis on social learning, with exercises to build group dynamics
• Field days and follow-up activities may be provided to diffuse the message to neighbours

3ie review motivated by polarised debate
• “Studies reported substantial and consistent reductions in pesticide use attributable to the effect of training. In a number of cases, there was also a convincing increase in yield due to training.... Results demonstrated remarkable, widespread and lasting developmental impacts” (Van den Berg 2004, FAO)
• “The analysis, employing a modified ‘difference-in-differences’ model, indicates that the program did not have significant impacts on the performance of graduates and their neighbors” (Feder et al. 2004)
• But how good are FFS really? What does a systematic review of the evidence say?

3ie’s review objectives and background
• Produce a high-quality review of relevance to decision makers
• Mixed-methods review of effects on outcomes along the causal chain, and of barriers to and facilitators of change
• Peer review managed by the Campbell Collaboration
• Discussion with FAO led to a wide range of impact evaluation research being included in the effectiveness review

Large body of evidence found
• 3ie systematic review found 93 separate ‘impact evaluations’ in LMICs
• Experimental and quasi-experimental studies with a controlled comparison (no treatment, pipeline, other intervention) were included
• Wide variation in attribution methods used: no RCTs, quasi-experiments of varying quality
• Small samples: 400 farmers on average (sample size ranges from 24 to 3,000), often in only a handful of villages, and short follow-up periods (usually less than 2 years)
• Studies measured outcomes along the causal chain:
– Knowledge
– Adoption
– Agriculture outcomes (yields, net revenues)
– Health, environment, empowerment outcomes
• Today’s analysis focuses on impacts on yields for FFS participants: usually self-reported weight of production per unit of area

Study characteristics
Study | Region (country) | Crop | Yield outcome measure
Ali and Sharif, 2011 | SA (Pakistan) | Cotton | Yield (kg per ha)
Birthal et al., 2000 | SA (India) | Cotton | Value of yield (value per ha)
Carlberg et al., 2012 | SSA (Ghana) | Other staple/veg. | Yield (50 kg bags per acre, 2010)
Cavatassi et al., 2011 | LAC (Ecuador) | Other staple/veg. | Yield (kg per ha)
Davis et al., 2012 | SSA (Kenya, Tanzania) | Other staple/veg. | Value of yield (growth rate in value, local currency per acre)
Dinpanah et al., 2010 | MENA (Iran) | Rice | Yield (ton per ha)
Feder et al., 2004 | EAP (Indonesia) | Rice | Yield (growth rate in yield, kg per ha)
Gockowski et al., 2010 | SSA (Ghana) | Tree crop | Sales (quantity of produce sold in 2004/05 season)
Hiller et al., 2009 | SSA (Kenya) | Tree crop | Yield (growth rate in yield, kg per acre)
Huan et al., 1999 | EAP (Vietnam) | Rice | Yield (ton per ha)
Khan et al., n.d. | SA (Pakistan) | Cotton | Yield (growth rate in yield, kg per ha)
Labarta, 2005 | LAC (Nicaragua) | Other staple/veg. | Yield (per ha)
Mutandwa & Mpangwa, 2004 | SSA (Zimbabwe) | Cotton | Yield (number of bales)
Naik et al., 2008 | SA (India) | Other staple/veg. | Yield (quintals of produce)
Orozco Cirilo et al., 2008 | LAC (Mexico) | Other staple/veg. | Yield (growth rate in ton per ha)
Palis, 1998 | EAP (Philippines) | Rice | Yield (growth rate in ton per ha)
Pananurak, 2010 | EAP (China), SA (India, Pakistan) | Cotton | Yield (growth rate in kg per ha)
Pande et al., 2009 | SA (Nepal) | Rice | Yield (ton per ha)
Rejesus et al., 2010 | EAP (Vietnam) | Rice | Yield (growth rate in tonnes per ha)
Todo & Takahashi, 2011 | SSA (Ethiopia) | Other staple/veg. | Value of production (growth rate, in Eth birr)
Van den Berg et al., 2002 | SA (Sri Lanka) | Rice | Yield (kg per ha)
Van Rijn, 2010 | LAC (Peru) | Tree crop | Yield (kg per ha, 2007)
Wandji et al., 2007 | SSA (Cameroon) | Tree crop | Sales (kg of cocoa sold in the 2004-05 season)
Wu Lifeng, 2010 | EAP (China) | Cotton | Yield (growth rate in kg per ha)
Yang et al., 2005 | EAP (China) | Cotton | Yield (kg per ha)
Yamazaki and Resosudarmo, 2007 | EAP (Indonesia) | Rice | Yield (growth rate in kg per ha)
Zuger, 2004 | LAC (Peru) | Other staple/veg. | Yield (ton per ha)

Unit of analysis is the study-level ‘effect size’
• ‘Response ratio’ effect size calculated for each study from treatment (t) and comparison (c) group mean yields, or from the estimated impact $\hat{\beta}$:
$$RR = \frac{\bar{Y}_t}{\bar{Y}_c} \quad \text{or} \quad RR = \frac{\bar{Y}_c + \hat{\beta}}{\bar{Y}_c} \quad \text{or} \quad RR = \exp(\hat{\beta}) \text{ where } \hat{\beta} \text{ is estimated on } \ln(Y)$$
• RR standard error calculations, with pooled standard deviation $S_p$ and group sample sizes $n_t$, $n_c$:
$$SE_{\ln(RR)} = S_p \sqrt{\frac{1}{n_t \bar{Y}_t^{2}} + \frac{1}{n_c \bar{Y}_c^{2}}}$$

Before we turn to the examination of publication bias, here are some summary results from the meta-analysis of outcomes along the causal chain.

Farmer field school stylised causal chain
Input 1: training of trainers → Input 2: field school → Knowledge (FFS participants; FFS neighbours) → Adoption (FFS participants; FFS neighbours) → Measured impacts: yield, income/net revenue, empowerment, environment and health outcomes

Knowledge of ‘improved’ farming practices
Study ID | ES (95% CI)
FFS participants
Huan et al., 1999 (Vietnam) | 0.02 (-0.06, 0.10)
Endalew, 2009 (Ethiopia) | 0.27 (-0.06, 0.60)
Price et al., 2001 (Philippines) | 0.42 (-0.17, 1.01)
Rao et al., 2012 (India) | 0.43 (-0.02, 0.87)
Reddy & Suryamani, 2005 (India) | 0.45 (-0.04, 0.94)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 0.59 (0.25, 0.92)
Dinpanah et al., 2010 (Iran) | 0.67 (0.41, 0.92)
Khan et al., 2007 (Pakistan) | 0.79 (0.29, 1.29)
Bunyatta et al., 2006 (Kenya) | 1.03 (0.65, 1.41)
Erbaugh, 2010 (Uganda) | 1.14 (0.93, 1.34)
Rebaudo & Dangles, 2011 (Ecuador) | 1.79 (1.17, 2.41)
Subtotal (I-squared = 93.9%, p = 0.000) | 0.67 (0.33, 1.02)
FFS neighbours
Khan et al., 2007 (Pakistan) | -0.13 (-0.68, 0.42)
Reddy & Suryamani, 2005 (India) | 0.05 (-0.45, 0.56)
Ricker-Gilbert et al., 2008 (Bangladesh) | 0.17 (-0.25, 0.59)
Rebaudo & Dangles, 2011 (Ecuador) | 0.38 (-0.15, 0.91)
Subtotal (I-squared = 0.0%, p = 0.610) | 0.13 (-0.12, 0.37)
NOTE: Weights are from random effects analysis

Pesticide demand
Study ID | ES (95% CI)
FFS participants
Yamazaki & Resosudarmo, 2007 (Indonesia) | 0.20 (0.01, 3.23)
Birthal et al., 2000 (India) | 0.21 (0.17, 0.26)
Yang et al., 2005 (China) | 0.32 (0.21, 0.48)
Yorobe & Rejesus, 2011 (Philippines) | 0.37 (0.18, 0.78)
Yang et al., 2005 (China) | 0.41 (0.36, 0.46)
Khan et al., 2007 (Pakistan) | 0.46 (0.39, 0.54)
Khalid, n.d. (Sudan) | 0.48 (0.31, 0.75)
Rejesus et al., 2010 (Vietnam) | 0.52 (0.24, 1.12)
Pananurak, 2010 (India) | 0.52 (0.30, 0.92)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 0.57 (0.36, 0.89)
Pananurak, 2010 (Pakistan) | 0.59 (0.41, 0.87)
Amera, 2008 (Kenya) | 0.61 (0.52, 0.71)
Pananurak, 2010 (China) | 0.65 (0.50, 0.84)
Mancini et al., 2008 (India) | 0.67 (0.46, 0.97)
Wu Lifeng, 2010 (China) | 0.71 (0.64, 0.80)
Huan et al., 1999 (Vietnam) | 0.72 (0.62, 0.84)
Van den Berg et al., 2002 (Sri Lanka) | 0.82 (0.74, 0.90)
Praneetvatakul & Waibel, 2006 (Thailand) | 0.82 (0.68, 0.98)
Murphy et al., 2002 (Vietnam) | 0.83 (0.75, 0.93)
Cole et al., 2007 (Ecuador) | 0.88 (0.68, 1.13)
Ali & Sharif, 2011 (Pakistan) | 0.90 (0.75, 1.09)
Khan et al., 2007 (Pakistan) | 0.91 (0.28, 2.94)
Labarta, 2005 (Nicaragua) | 0.95 (0.39, 2.34)
Feder et al., 2004 (Indonesia) | 1.30 (1.08, 1.57)
Cavatassi et al., 2011 (Ecuador) | 1.34 (0.99, 1.80)
Friis-Hansen et al., 2004 (Uganda) | 1.42 (1.09, 1.86)
Subtotal (I-squared = 93.2%, p = 0.000) | 0.66 (0.56, 0.78)
FFS neighbours
Pananurak, 2010 (India) | 0.54 (0.25, 1.15)
Khan et al., 2007 (Pakistan) | 0.61 (0.51, 0.74)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 0.67 (0.12, 3.88)
Wu Lifeng, 2010 (China) | 0.68 (0.62, 0.76)
Pananurak, 2010 (Pakistan) | 0.78 (0.40, 1.49)
Labarta, 2005 (Nicaragua) | 0.99 (0.42, 2.33)
Pananurak, 2010 (China) | 1.11 (0.69, 1.79)
Praneetvatakul & Waibel, 2006 (Thailand) | 1.15 (0.92, 1.43)
Khan et al., 2007 (Pakistan) | 1.20 (0.40, 3.53)
Feder et al., 2004 (Indonesia) | 1.30 (1.09, 1.55)
Subtotal (I-squared = 84.6%, p = 0.000) | 0.88 (0.68, 1.14)
NOTE: Weights are from random effects analysis

Yields
Study ID | ES (95% CI)
FFS participants
Pananurak, 2010 (India) | 0.80 (0.61, 1.05)
Van Rijn, 2010 (Peru) | 0.86 (0.63, 1.18)
Naik et al., 2008 (India) | 0.89 (0.83, 0.96)
Huan et al., 1999 (Vietnam) | 0.95 (0.92, 0.98)
Labarta, 2005 (Nicaragua) | 0.97 (0.92, 1.02)
Rejesus et al., 2010 (Vietnam) | 0.97 (0.72, 1.31)
Feder et al., 2004 (Indonesia) | 0.98 (0.96, 1.01)
Wu Lifeng, 2010 (China) | 1.08 (1.03, 1.14)
Ali & Sharif, 2011 (Pakistan) | 1.09 (1.03, 1.15)
Pananurak, 2010 (China) | 1.09 (1.04, 1.14)
Gockowski et al., 2010 (Ghana) | 1.14 (1.03, 1.25)
Yang et al., 2005 (China) | 1.15 (0.94, 1.41)
Hiller et al., 2009 (Kenya) | 1.17 (0.53, 2.56)
Khan et al., 2007 (Pakistan) | 1.17 (0.97, 1.42)
Gockowski et al., 2010 (Ghana) | 1.18 (1.07, 1.30)
Cavatassi et al., 2011 (Ecuador) | 1.22 (0.97, 1.53)
Davis et al., 2012 (Tanzania) | 1.23 (1.00, 1.51)
Birthal et al., 2000 (India) | 1.24 (1.13, 1.36)
Pananurak, 2010 (Pakistan) | 1.24 (1.01, 1.54)
Dinpanah et al., 2010 (Iran) | 1.32 (1.22, 1.42)
Wandji et al., 2007 (Cameroon) | 1.32 (1.07, 1.63)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.36 (1.06, 1.73)
Palis, 1998 (Philippines) | 1.36 (0.97, 1.92)
Zuger, 2004 (Peru) | 1.44 (1.09, 1.92)
Carlberg et al., 2012 (Ghana) | 1.58 (1.19, 2.10)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.67 (1.23, 2.26)
Van den Berg et al., 2002 (Sri Lanka) | 1.68 (1.30, 2.18)
Davis et al., 2012 (Kenya) | 1.81 (1.15, 2.84)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Gockowski et al., 2010 (Ghana) | 2.16 (0.99, 4.69)
Dinpanah et al., 2010 (Iran) | 2.52 (2.05, 3.11)
Orozco Cirilo et al., 2008b (Mexico) | 2.62 (2.23, 3.08)
Todo & Takahashi, 2011 (Ethiopia) | 2.71 (1.11, 6.60)
Subtotal (I-squared = 92.7%, p = 0.000) | 1.24 (1.16, 1.32)
FFS neighbours
Pananurak, 2010 (India) | 0.79 (0.63, 1.00)
Khan et al., 2007 (Pakistan) | 0.97 (0.74, 1.26)
Feder et al., 2004 (Indonesia) | 0.99 (0.97, 1.01)
Labarta, 2005 (Nicaragua) | 1.00 (0.99, 1.01)
Pananurak, 2010 (China) | 1.02 (0.98, 1.07)
Wu Lifeng, 2010 (China) | 1.03 (0.99, 1.08)
Pananurak, 2010 (Pakistan) | 1.03 (0.86, 1.25)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.43 (1.05, 1.96)
Subtotal (I-squared = 49.5%, p = 0.054) | 1.01 (0.98, 1.03)
NOTE: Weights are from random effects analysis

Net revenues (income less costs)
Study ID | ES (95% CI)
FFS participants
Labarta, 2005 (Nicaragua) | 0.28 (0.02, 3.48)
Pananurak, 2010 (India) | 1.06 (0.68, 1.66)
Waarts et al., 2012 (Kenya) | 1.14 (0.92, 1.41)
Pananurak, 2010 (China) | 1.17 (1.08, 1.27)
Pananurak, 2010 (Pakistan) | 1.23 (1.09, 1.40)
Naik et al., 2008 (India) | 1.25 (1.09, 1.42)
Van de Fliert, 2000 (Indonesia) | 1.31 (1.11, 1.55)
Van den Berg et al., 2002 (Sri Lanka) | 1.41 (1.19, 1.67)
Yang et al., 2005 (China) | 1.53 (1.10, 2.15)
Khan et al., 2007 (Pakistan) | 3.40 (1.94, 5.97)
Subtotal (I-squared = 57.1%, p = 0.013) | 1.28 (1.17, 1.41)
FFS training + input/marketing support
Birthal et al., 2000 (India) | 1.43 (1.19, 1.72)
Van Rijn, 2010 (Peru) | 2.00 (1.02, 3.94)
Cavatassi et al., 2011 (Ecuador) | 3.34 (1.56, 7.15)
Palis, 1998 (Philippines) | 4.61 (3.83, 5.56)
Subtotal (I-squared = 96.2%, p = 0.000) | 2.57 (1.18, 5.58)
FFS neighbours
Pananurak, 2010 (India) | 0.93 (0.66, 1.32)
Pananurak, 2010 (China) | 1.07 (1.00, 1.14)
Pananurak, 2010 (Pakistan) | 1.13 (1.01, 1.26)
Labarta, 2005 (Nicaragua) | 1.39 (0.66, 2.92)
Khan et al., 2007 (Pakistan) | 1.51 (0.51, 4.45)
Subtotal (I-squared = 0.0%, p = 0.706) | 1.08 (1.03, 1.15)
NOTE: Weights are from random effects analysis

Detecting publication bias
• The only direct evidence for publication bias comes from comparing published and unpublished study results
• But there are also ways of assessing the likelihood of publication bias directly and indirectly:
– Assess reporting biases in each study
– Statistical analysis based on sample size

“An ounce of prevention is worth a pound of cure”
Sources of grey literature:
(1) Multidisciplinary: Google, Google Scholar
(2) International development specific: JOLIS, BLDS and ELDIS (Institute of Development Studies)
(3) Good sources for impact evaluations: J-PAL/IPA databases, 3ie’s database of impact evaluations
(4) Subject-specific, e.g. IDEAS/RePEc for economics, ERIC for education, LILACS for Latin American health publications, ALNAP for humanitarian action
(5) Conference proceedings, technical reports (research, governmental agencies), organisation websites, dissertations, theses, contact with primary researchers

Meta-analysis of studies by publication status: journal v other
Study ID | ES (95% CI)
Not published in journal
Pananurak, 2010 (India) | 0.80 (0.61, 1.05)
Van Rijn, 2010 (Peru) | 0.86 (0.63, 1.18)
Naik et al., 2008 (India) | 0.89 (0.83, 0.96)
Labarta, 2005 (Nicaragua) | 0.97 (0.92, 1.02)
Wu Lifeng, 2010 (China) | 1.08 (1.03, 1.14)
Pananurak, 2010 (China) | 1.09 (1.04, 1.14)
Hiller et al., 2009 (Kenya) | 1.17 (0.53, 2.56)
Pananurak, 2010 (Pakistan) | 1.24 (1.01, 1.54)
Wandji et al., 2007 (Cameroon) | 1.32 (1.07, 1.63)
Zuger, 2004 (Peru) | 1.44 (1.09, 1.92)
Carlberg et al., 2012 (Ghana) | 1.58 (1.19, 2.10)
Van den Berg et al., 2002 (Sri Lanka) | 1.68 (1.30, 2.18)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Todo & Takahashi, 2011 (Ethiopia) | 2.71 (1.11, 6.60)
Subtotal (I-squared = 84.2%, p = 0.000) | 1.14 (1.04, 1.24)
Published in journal
Huan et al., 1999 (Vietnam) | 0.95 (0.92, 0.98)
Rejesus et al., 2010 (Vietnam) | 0.97 (0.72, 1.31)
Feder et al., 2004 (Indonesia) | 0.98 (0.96, 1.01)
Ali & Sharif, 2011 (Pakistan) | 1.09 (1.03, 1.15)
Gockowski et al., 2010 (Ghana) | 1.14 (1.03, 1.25)
Yang et al., 2005 (China) | 1.15 (0.94, 1.41)
Khan et al., 2007 (Pakistan) | 1.17 (0.97, 1.42)
Gockowski et al., 2010 (Ghana) | 1.18 (1.07, 1.30)
Cavatassi et al., 2011 (Ecuador) | 1.22 (0.97, 1.53)
Davis et al., 2012 (Tanzania) | 1.23 (1.00, 1.51)
Birthal et al., 2000 (India) | 1.24 (1.13, 1.36)
Dinpanah et al., 2010 (Iran) | 1.32 (1.22, 1.42)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.36 (1.06, 1.73)
Palis, 1998 (Philippines) | 1.36 (0.97, 1.92)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.67 (1.23, 2.26)
Davis et al., 2012 (Kenya) | 1.81 (1.15, 2.84)
Gockowski et al., 2010 (Ghana) | 2.16 (0.99, 4.69)
Dinpanah et al., 2010 (Iran) | 2.52 (2.05, 3.11)
Orozco Cirilo et al., 2008b (Mexico) | 2.62 (2.23, 3.08)
Subtotal (I-squared = 95.0%, p = 0.000) | 1.30 (1.19, 1.42)
Overall (I-squared = 92.7%, p = 0.000) | 1.24 (1.16, 1.32)
NOTE: Weights are from random effects analysis

Assess likelihood of file-drawer effects in each study
• Is there evidence that results have been reported selectively?
– Outcomes not reported despite data having been collected (or indicated in the methods section, or reported in the study protocol if available)?
– Existence of studies reporting other outcomes?
• Have outcomes been constructed in an uncommon way, which might suggest biased exploratory research?
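The subgroup and overall estimates in the publication-status forest plot above are random-effects pools of log response ratios. As a rough illustration only, the Python sketch below recovers ln RR and its standard error from a reported RR and 95% CI, then pools with the DerSimonian–Laird estimator and reports I-squared. DerSimonian–Laird is assumed here because it is the usual default for random-effects weights; the function names `ln_rr_and_se` and `random_effects` are hypothetical, and standard errors recovered from CIs rounded to two decimals are only approximate.

```python
import math
from typing import NamedTuple, Sequence, Tuple

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


class Pooled(NamedTuple):
    rr: float     # pooled response ratio, exp(pooled ln RR)
    lower: float  # lower 95% confidence bound, RR scale
    upper: float  # upper 95% confidence bound, RR scale
    tau2: float   # between-study variance (DerSimonian-Laird)
    i2: float     # I-squared, in percent


def ln_rr_and_se(rr: float, lower: float, upper: float) -> Tuple[float, float]:
    """Recover ln RR and its standard error from a reported RR and 95% CI.

    Assumes the CI was computed symmetrically on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(rr), se


def random_effects(y: Sequence[float], se: Sequence[float]) -> Pooled:
    """DerSimonian-Laird random-effects pooling of log response ratios."""
    w = [1.0 / s**2 for s in se]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    if df > 0:
        c = sum_w - sum(wi**2 for wi in w) / sum_w
        tau2 = max(0.0, (q - df) / c)
    else:
        tau2 = 0.0
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1.0 / (s**2 + tau2) for s in se]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    return Pooled(math.exp(mu), math.exp(mu - Z95 * se_mu),
                  math.exp(mu + Z95 * se_mu), tau2, i2)


if __name__ == "__main__":
    # Three 'not published in journal' yield estimates from the table above,
    # entered as (RR, lower 95% bound, upper 95% bound).
    reported = [(0.89, 0.83, 0.96), (1.08, 1.03, 1.14), (1.68, 1.30, 2.18)]
    pairs = [ln_rr_and_se(*row) for row in reported]
    result = random_effects([p[0] for p in pairs], [p[1] for p in pairs])
    print(f"RR = {result.rr:.2f} ({result.lower:.2f}, {result.upper:.2f}), "
          f"I-squared = {result.i2:.1f}%")
```

Pooling each publication-status subgroup separately and comparing the two pooled RRs mirrors the journal v other comparison; the three-study example is illustrative and is not meant to reproduce the subtotals shown on the slide.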
Risk of bias (including file drawer effects) assessment for studies included in meta-analysis

Additional evidence for file-drawer effects
• 34% (14/41) of studies that report data on yields could not be included in the meta-analysis because they do not provide standard errors or the information needed to calculate them
• 30% (27/91) of all studies do not provide information on yields or other agriculture outcomes (net revenues), despite collecting data on knowledge/adoption

Detecting publication bias statistically
Methods for detecting publication bias assume:
• Large n studies are likely to get published regardless of results, due to time and money investments
• Medium n studies will have some modest significant effects that are reported; others may never be published
• Small n studies with the largest effects are most likely to be reported, but many will never be published or will be difficult to locate

Funnel plots
• Exploratory tool used to visually assess the possibility of publication bias in a meta-analysis
• Scatter plot of effect size (x-axis) against some measure of study size (y-axis)
• Precision of estimates increases as the sample size of a study increases
– Estimates from small n studies (i.e., less precise, larger standard errors) will show more variability in the effect size estimates, thus a wider scatter on the plot
– Estimates from larger n studies will show less variability in effect size estimates, thus a narrower scatter on the plot
• If publication bias is present, we would expect null or ‘negative’ findings from small n studies to be suppressed (i.e., missing from the plot)

Farmer field schools – FFS participant yields
[Figure: funnel plot with pseudo 95% confidence limits; x-axis: ln RR; studies marked as published in journal or not published in journal; pooled estimate shown]

Tests for funnel plot asymmetry
• Several regression tests are available to test for funnel plot asymmetry; they attempt to overcome the subjectivity of visual funnel plot inspection
• Framed as tests for “small study effects”, i.e. the tendency for smaller n studies to show greater effects than larger n studies; such effects are not necessarily a result of bias
• Egger test, Peters test (modified Egger test for use with log odds ratio effect sizes), Begg’s test, selection modelling (Hedges & Vevea, 2005), failsafe n (not recommended) (Becker, 2005)

Egger test
• Weighted regression of the effect size on its standard error (weights $w_i = 1/se_i^{2}$, i.e. inverse variance):
$$ES_i = \beta_1 + \beta_0\, se_i + \varepsilon_i, \qquad H_0: \beta_0 = 0$$
– β0 = 0 indicates a symmetric funnel plot
– β0 > 0 shows that less precise (i.e., smaller n) studies yield bigger effects
– Can be extended to include p predictors hypothesised to explain funnel plot asymmetry (Sterne et al., 2001) (see analysis below)
• Limitations:
– Low power unless there is severe bias and large n
– Inflated Type I error with large treatment effects, rare event data, or equal sample sizes across studies
– Inflated Type I error with log odds ratio effect sizes

Egger test for FFS-participant yields
Egger’s publication bias plot: standardized effect (y-axis) against precision (x-axis)
Term | Coef. | t | P>|t|
slope | -0.047 | -1.70 | 0.100
const (bias) | 3.085 | 4.14 | 0.000

‘Trim and fill’ analysis (Duval & Tweedie, 2000)
• Iteratively trims (removes) the smaller studies causing asymmetry
• Uses the trimmed plot to re-estimate the mean effect size
• Fills (replaces) the omitted studies together with their mirror images
• Provides an estimate of the number of missing (filled) studies and a new estimate of the mean effect size
• Major limitations include misinterpretation of results, the assumption of a symmetric funnel plot, and poor performance in the presence of heterogeneity

‘Trim & fill’ for FFS-participant yields

Results of meta-trim
Analysis | 95% lower | Effect size | 95% upper | Num. studies
Meta-analysis | 1.16 | 1.23 | 1.32 | 31
Filled meta-analysis | 1.03 | 1.10 | 1.17 | 40

Cumulative meta-analysis
• Typically used to update the pooled effect size estimate cumulatively over time, as each new study is added
• Alternatively, can be used to update the pooled effect size estimate with each study in order from largest to smallest sample size
• If the pooled effect size does not shift with the addition of small n studies, this provides some evidence against publication bias

Cumulative meta-analysis for FFS-participant yields: studies ordered by sample size from largest to smallest
Study ID | Cumulative ES (95% CI)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Huan et al., 1999 (Vietnam) | 1.35 (0.62, 2.95)
Danida, 2011 (Bangladesh) | 1.35 (0.62, 2.95)
Wu Lifeng, 2010 (China) | 1.06 (0.92, 1.23)
Cavatassi et al., 2011 (Ecuador) | 1.09 (0.96, 1.25)
Islam et al., 2006 (Bangladesh) | 1.09 (0.96, 1.25)
Pananurak, 2010 (China) | 1.08 (0.98, 1.20)
Van den Berg et al., 2002 (Sri Lanka) | 1.14 (1.02, 1.27)
Kabir & Uphoff, 2007 (Myanmar) | 1.14 (1.02, 1.27)
Labarta, 2005 (Nicaragua) | 1.09 (1.00, 1.19)
Dinpanah et al., 2010 (Iran) | 1.23 (1.10, 1.38)
Dinpanah et al., 2010 (Iran) | 1.25 (1.12, 1.40)
Davis et al., 2012 (Kenya) | 1.27 (1.14, 1.42)
Davis et al., 2012 (Uganda) | 1.27 (1.14, 1.42)
Davis et al., 2012 (Tanzania) | 1.27 (1.14, 1.41)
Carlberg et al., 2012 (Ghana) | 1.29 (1.16, 1.43)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.31 (1.18, 1.45)
Feder et al., 2004 (Indonesia) | 1.24 (1.15, 1.35)
Gockowski et al., 2010 (Ghana) | 1.23 (1.14, 1.33)
Gockowski et al., 2010 (Ghana) | 1.24 (1.15, 1.34)
Gockowski et al., 2010 (Ghana) | 1.23 (1.15, 1.33)
Ali & Sharif, 2011 (Pakistan) | 1.22 (1.14, 1.30)
Todo & Takahashi, 2011 (Ethiopia) | 1.22 (1.14, 1.31)
Gockowski et al., 2005 (Nigeria) | 1.22 (1.14, 1.31)
Zuger, 2004 (Peru) | 1.23 (1.15, 1.32)
Waarts et al., 2012 (Kenya) | 1.23 (1.15, 1.32)
Van Rijn, 2010 (Peru) | 1.22 (1.14, 1.30)
Pananurak, 2010 (Pakistan) | 1.22 (1.14, 1.30)
Wandji et al., 2007 (Cameroon) | 1.22 (1.15, 1.30)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.23 (1.15, 1.31)
Mancini et al., 2008 (India) | 1.23 (1.15, 1.31)
Khan et al., 2007 (Pakistan) | 1.23 (1.15, 1.30)
Van de Fliert, 2000 (Indonesia) | 1.23 (1.15, 1.30)
Hiller et al., 2009 (Kenya) | 1.23 (1.15, 1.30)
Naik et al., 2008 (India) | 1.20 (1.13, 1.28)
Pananurak, 2010 (India) | 1.19 (1.12, 1.26)
Birthal et al., 2000 (India) | 1.19 (1.13, 1.27)
Birthal et al., 2000 (India) | 1.19 (1.13, 1.27)
Orozco Cirilo et al., 2008b (Mexico) | 1.24 (1.17, 1.33)
Rejesus et al., 2010 (Vietnam) | 1.24 (1.16, 1.32)
Palis, 1998 (Philippines) | 1.24 (1.16, 1.32)
Torrez et al., 1999 (Bolivia) | 1.24 (1.16, 1.32)
Yang et al., 2005 (China) | 1.24 (1.16, 1.32)
Williamson et al., 2003 (Kenya) | 1.24 (1.16, 1.32)
Williamson et al., 2003 (India) | 1.24 (1.16, 1.32)
Jones, n.d. (Sri Lanka) | 1.24 (1.16, 1.32)

• The evidence for ‘small study effects’ seems strong, but is this due to publication bias?
• Asymmetry could be due to factors other than publication bias, e.g.:
– Methodological quality (smaller studies with lower quality may have exaggerated treatment effects)
– Artefactual variation (e.g. outcome measurement)
– Chance
– True heterogeneity due to intervention characteristics (FFS type, region, crop, follow-up length)
• Visual assessment of funnel plot symmetry is inherently subjective

Analysis by study quality
[Figure: funnel plot with pseudo 95% confidence limits; x-axis: ln RR; studies marked as high or medium risk of bias; pooled estimate shown]

Contour-enhanced funnel plots
• Based on the premise that statistical significance is the most important factor determining publication
• Funnel plot with additional contour lines associated with ‘milestones’ of statistical significance: p = .01, .05, .1
– If studies are missing in areas of statistical nonsignificance, publication bias may be present
– If studies are missing in areas of statistical significance, asymmetry may be due to factors other than publication bias
– If there are no studies in areas of statistical significance, publication bias may be present
• Can help distinguish funnel plot asymmetry due to publication bias from asymmetry due to other factors (Peters et al., 2008)
[Figure: contour-enhanced funnel plot for FFS participant yields; x-axis: effect estimate; contour regions: p < 1%, 1% < p < 5%, 5% < p < 10%, p > 10%]

Meta-regression analysis (t-statistics reported; specifications 1 to 7, 33 observations each)
Standard error (LN_SE) | 4.37*** | 4.33*** | 3.90*** | 3.81*** | 3.21*** | 3.53*** | 4.60***
Adj. R-squared | 0.36 | 0.34 | 0.33 | 0.45 | 0.42 | 0.29 | 0.39
Further covariates (t-statistics in the order reported): HIGH QUALITY 0.52, 0.61, 0.15, 1.37, 0.07, 1.64; FFS+ 0.51, -1.01, 0.73, 0.35; yield measure dummies (Yes); region dummies (Yes); crop-type dummies (Yes); interaction HIGH QUALITY*LN_SE -1.83* (specification 7)
Specification 7 suggests that small study effects are partly explained by study quality: the negative interaction indicates weaker small study effects among high-quality studies

Meta-analysis also suggests bias due to study quality
Study ID | ES (95% CI)
High risk of bias
Naik et al., 2008 (India) | 0.89 (0.83, 0.96)
Huan et al., 1999 (Vietnam) | 0.95 (0.92, 0.98)
Labarta, 2005 (Nicaragua) | 0.97 (0.92, 1.02)
Gockowski et al., 2010 (Ghana) | 1.14 (1.03, 1.25)
Yang et al., 2005 (China) | 1.15 (0.94, 1.41)
Hiller et al., 2009 (Kenya) | 1.17 (0.53, 2.56)
Khan et al., 2007 (Pakistan) | 1.17 (0.97, 1.42)
Gockowski et al., 2010 (Ghana) | 1.18 (1.07, 1.30)
Birthal et al., 2000 (India) | 1.24 (1.13, 1.36)
Dinpanah et al., 2010 (Iran) | 1.32 (1.22, 1.42)
Wandji et al., 2007 (Cameroon) | 1.32 (1.07, 1.63)
Mutandwa & Mpangwa, 2004 (Zimbabwe) | 1.36 (1.06, 1.73)
Palis, 1998 (Philippines) | 1.36 (0.97, 1.92)
Zuger, 2004 (Peru) | 1.44 (1.09, 1.92)
Carlberg et al., 2012 (Ghana) | 1.58 (1.19, 2.10)
Van den Berg et al., 2002 (Sri Lanka) | 1.68 (1.30, 2.18)
Pande et al., 2009 (Nepal) | 2.11 (1.25, 3.56)
Gockowski et al., 2010 (Ghana) | 2.16 (0.99, 4.69)
Dinpanah et al., 2010 (Iran) | 2.52 (2.05, 3.11)
Orozco Cirilo et al., 2008b (Mexico) | 2.62 (2.23, 3.08)
Subtotal (I-squared = 94.9%, p = 0.000) | 1.35 (1.20, 1.51)
Medium risk of bias
Pananurak, 2010 (India) | 0.80 (0.61, 1.05)
Van Rijn, 2010 (Peru) | 0.86 (0.63, 1.18)
Rejesus et al., 2010 (Vietnam) | 0.97 (0.72, 1.31)
Feder et al., 2004 (Indonesia) | 0.98 (0.96, 1.01)
Wu Lifeng, 2010 (China) | 1.08 (1.03, 1.14)
Ali & Sharif, 2011 (Pakistan) | 1.09 (1.03, 1.15)
Pananurak, 2010 (China) | 1.09 (1.04, 1.14)
Cavatassi et al., 2011 (Ecuador) | 1.22 (0.97, 1.53)
Davis et al., 2012 (Tanzania) | 1.23 (1.00, 1.51)
Pananurak, 2010 (Pakistan) | 1.24 (1.01, 1.54)
Yamazaki & Resosudarmo, 2007 (Indonesia) | 1.67 (1.23, 2.26)
Davis et al., 2012 (Kenya) | 1.81 (1.15, 2.84)
Todo & Takahashi, 2011 (Ethiopia) | 2.71 (1.11, 6.60)
Subtotal (I-squared = 81.0%, p = 0.000) | 1.10 (1.03, 1.17)
Overall (I-squared = 92.7%, p = 0.000) | 1.24 (1.16, 1.32)
NOTE: Weights are from random effects analysis

Final thoughts
• Evidence of upward bias in low-quality compared with higher-quality quasi-experiments
=> Where the ‘relevance’ of a review is important for users, careful risk of bias assessment and sensitivity analysis are required
• Study quality appears more important than publication bias in explaining small study effects, but we also find evidence of file drawer effects in the literature
• The available statistical tests are sensitive to the number of effect sizes available and are of limited validity where sample sizes are homogeneous

Recommended Reading
• Duval, S. J., & Tweedie, R. L. (2000). A non-parametric ‘trim and fill’ method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.
• Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.
• Hammerstrøm, K., Wade, A., & Jørgensen, A. K. (2010). Searching for studies: A guide to information retrieval for Campbell systematic reviews. Campbell Systematic Reviews, Supplement 1.
• Harbord, R. M., Egger, M., & Sterne, J. A. C. (2006). A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Statistics in Medicine, 25, 3443-3457.
• Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2008). Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. Journal of Clinical Epidemiology, 61, 991-996.
• Rosenthal, R. (1979). The ‘file-drawer problem’ and tolerance for null results. Psychological Bulletin, 86, 638-641.
• Rothstein, H. R., Sutton, A. J., & Borenstein, M. L. (Eds.) (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Hoboken, NJ: Wiley.
• Rücker, G., Schwarzer, G., & Carpenter, J. (2008). Arcsine test for publication bias in meta-analyses with binary outcomes. Statistics in Medicine, 27, 746-763.
• Sterne, J. A., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046-1055.
• Sterne, J. A. C., Egger, M., & Moher, D. (Eds.) (2008). Chapter 10: Addressing reporting biases. In J. P. T. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions (pp. 297-333). Chichester, UK: Wiley.
• Sterne, J. A. C., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.
• Waddington, H., White, H., Snilstveit, B., Hombrados, J., & Vojtkova, M. (2012). How to do a good systematic review of effects in international development: a tool-kit. Journal of Development Effectiveness, 4(3).