Systematic Reviews in International Development: 10-Year Review

Journal of Development Effectiveness
What have we learned after ten years of
systematic reviews in international development?
Hugh Waddington, Edoardo Masset & Emmanuel Jimenez
VOL. 10, NO. 1, 1–16
Hugh Waddington (a), Edoardo Masset (b) and Emmanuel Jimenez (c)
(a) International Initiative for Impact Evaluation (3ie), London International Development Centre, London School of
Hygiene and Tropical Medicine (LSHTM), London, UK; (b) Centre for Development, Impact and Learning, LSHTM,
London, UK; (c) 3ie, New Delhi, India
The paper discusses the role of systematic evidence in helping make
better decisions to reach global development targets. Coming at the end
of the first decade of serious funding and support for systematic evidence generation in development economics and development studies,
the paper presents opportunities and challenges for the continued
development of systematic review methodologies. It concludes by introducing the papers collected in the issue, which make and demonstrate
the case for theory-based approaches to evidence synthesis.
Keywords: Systematic review; impact evaluation; sustainable development goals; evidence-based policy
“The intellectual climate has changed quite dramatically over the last few decades. . . One set of prejudices has
given way to another – opposite – set of preconceptions. Yesterday’s unexamined faith has become today’s
heresy, and yesterday’s heresy is now the new superstition. . . The need for critical scrutiny of standard
preconceptions and political-economic attitudes has never been stronger.”
Amartya Sen, Development as Freedom (1999, 111-12).
“In my view a systematic review is gold dust. A systematic review is a global public good and I would like to
see these commissioned much more from communities of practice.”
Richard Manning, former Chair of the OECD Development Assistance Committee and 3ie Board.
1. The central role of systematic evidence in evidence-based policy
Reliable evidence is essential for making good choices in international development. Decision
makers are demanding better data, partly in recognition of the need to ensure coverage and
quality of programmes to meet ambitious Sustainable Development Goal (SDG) targets by 2030
(United Nations 2015a). Some Millennium Development Goals (MDGs), such as maternal and child
mortality and sanitation targets, were missed by a wide margin (United Nations 2015b). Others,
such as the halving of global dollar-a-day poverty and lack of access to drinking water, were
reached but substantial progress will be needed to attain universal coverage and ensure ‘no one is
left behind’. This implies that big improvements in resource allocation are needed over a short
period of time, and a recognition of the crucial role that rigorous and relevant evidence can play in
facilitating that change.
Rigorous primary studies that can answer questions about ‘what difference’ development
interventions make to people’s lives and ‘why’ have expanded rapidly since 2000 (Cameron,
Mishra and Brown 2016). They can help improve decisions around scaling up, scaling down or
redesigning projects in the contexts in which they are carried out. However, results from these
primary impact studies are often not communicated in relevant formats for decision-making. Even
when efforts are made to ensure results are available, the reliability and generalisability of that
evidence to other contexts is uncertain. A key additional value of systematic reviews is in the work
of systematically collecting, extracting and synthesising policy-important findings from these
studies and presenting implications for policy and practice.
Systematic reviews are not literature reviews. Literature reviews enable academics to communicate among themselves the latest developments in a field. Systematic reviews, on the other hand,
are undertaken by teams of researchers primarily to help those outside research take decisions –
what we may refer to as the ‘5 Ps’ (Rader et al. 2014): programme participants (and their families
and communities), practitioners, policy makers, the press, and members of the public. Systematic
reviewing is rather more like primary research, in the approach taken to formulating questions,
collecting data, critical appraisal and analysis (Cooper 1982). This is clear, for example, from the
efforts that should be made for internal-study quality assurance (double coding and checking), or
to extract and to transform data reported in primary studies into policy-relevant information, and
the transparency in reporting results based on analysis and policy implications based on results. For
this reason, 'systematic reviews and meta-analysis... embody a scientific approach' to the synthesis
of existing evidence (Littell, Corcoran, and Pillai 2008, 1).
In 2012, a special issue of the Journal of Development Effectiveness was devoted to systematic
reviews (White and Waddington 2012). Papers in that issue made the case for rigorous, theory-based systematic reviews that answered relevant questions using mixed-methods approaches.
Now, nearing the end of this first decade of global production of systematic evidence in development studies and development economics,1 we return to the topic to provide an assessment of the
progress made, drawing on examples of recent reviews.
In the next sections of this introduction, we present opportunities and challenges to ensure
rigour and relevance in reviewing: using programme theory; and keeping reviews up to date. The
final section introduces the papers contained in this issue.
2. Fostering learning by using programme theory and telling a good story
The production of systematic reviews on development topics has expanded dramatically in the
past decade. The International Initiative for Impact Evaluation’s (3ie’s) Systematic Reviews
Repository now contains over 600 completed reviews.2 Table 1 provides a list of completed reviews
in international development, commissioned largely by bilateral and multilateral donors and
produced by 3ie in partnership with the Campbell Collaboration International Development
Coordinating Group (IDCG).
There have been a number of calls for the incorporation of programme theory into systematic
reviews over the years (for example, Pawson 2002; Davies 2006; van der Knaap et al. 2008;
Waddington et al. 2009; Anderson et al. 2011; Waddington et al. 2012; Snilstveit 2012; Kneale,
Thomas, and Harris 2015; Maden et al. 2017; White 2018), as well as for multi-disciplinary working
(for example, Thomas et al. 2004; Snilstveit 2012; Oliver et al. 2017; White 2018). Programme theory
is usually incorporated into systematic reviews through logic models (flow diagrams which present
the intervention causal chain from inputs through to final outcomes) or theories of change (which
articulate the assumptions underlying the causal chain and the contexts and stakeholders for
whom the intervention is relevant), and sometimes through economic, social or psychological
theory to help articulate programme mechanisms.
As indicated in Table 1, the importance of using theory to develop relevant review questions,
structure evidence collection, and present findings is well-recognised by reviewers working in
international development.
One often hears the argument that reviews (and primary studies) drawing solely on quantitative
causal evidence from impact evaluations are unable to answer questions about why interventions
are successful or not. This is not true since reviews drawing on programme theory that collect
evidence on outcomes along the causal chain can explain heterogeneity in findings – usually
variation in quality of life outcomes due to differences in rates of programme adherence (for
example, Waddington and Snilstveit 2009; Welch et al. 2017). However, analysis of the rest of the
causal chain usually requires turning to evidence in studies which are often excluded from
systematic reviews on the grounds of study design. These mixed-methods reviews are able to
answer some of the most pressing development questions for policymakers and implementers –
reasons for successful implementation and participation drawing on participant or implementer
views, the effectiveness of targeting, unintended or adverse outcomes for vulnerable groups, or
questions about cost-effectiveness. We are learning important things from these reviews. For
example, systematic reviews in agriculture show:
● Land tenure reform tends to increase agricultural productivity and incomes in Asia and Latin
America, but not in sub-Saharan Africa where customary tenure may already provide tenure
security, or farmers are too poor to invest without additional support (Lawry et al. 2017). Land
reform may also have negative consequences, such as conflict, displacement, or reduced
property rights for women, as the qualitative evidence in this review indicated.
● Top-down agricultural extension does not appear to be effective in improving harvests for
African smallholders (Stewart et al. 2016). On the other hand, farmer field schools (FFS), a
bottom-up learning approach, tend to improve outcomes along the causal chain (knowledge,
adoption, yields, income) for project participants. But evidence suggests that these FFS do not
work at scale due to problems in recruiting, training and supporting FFS facilitators, and they
may not be cost-effective as there are no spillovers to non-participants (Waddington and
White 2014).
● Certification schemes are effective in raising prices and income from agriculture, but do not
usually improve household income and wages (Oya et al. 2017). Costs of implementing
standards can prevent poor farmers joining the schemes, and training is often not oriented
to the needs of smallholders and workers.
● Contract farming may increase farmer income substantially, between 40 and 87 per cent on
average, but poorer farmers are not usually part of the schemes and biases in the primary
research studies mean that impacts are likely overestimated (Ton et al. 2017).
● Payments for environmental services are effective in reducing deforestation and increasing
forest cover, and in improving household incomes (Samii et al. 2014b). But the effects are
small and unlikely to justify the costs of the schemes, and may not benefit poor people.
Reviews could still do more to engage with theory and evidence on second order outcomes (see also
Brown 2016). For example, in evaluating effects on net employment, wages and prices, reviews need
to draw on appropriate economic theory and incorporate evidence of spillover effects
collected from non-participants more consistently (assuming that the primary studies with appropriate clustering are available), or studies that measure or simulate outcomes in general equilibrium
(for a forthcoming example see Dorward et al. 2014).
While there has been much progress in how reviews incorporate theory to ensure relevance, there are legitimate concerns about the ways in which findings from reviews are
communicated, often in formats impenetrable to decision makers or insufficiently nuanced
to apply to complex interventions and contexts. Review findings need to be written in easily
accessible language and available to a wide audience of policymakers, international development professionals, and other users of evidence. Campbell produces short, plain language
summaries,3 and Cochrane’s summary of findings tables can effectively communicate technical information about the evidence base. These methods are useful but not sufficient to
communicate the nuanced findings typical of reviews of (often, multiple) complex interventions implemented throughout the developing world. 3ie publishes a series of Systematic
Review Summary reports, which present review evidence structured around the theory of
change, and focus more clearly on how the interventions themselves work and the reasons
underlying heterogeneity in findings. Examples of these summaries are in Table 1 (De Buck
et al. 2017; Kristjansson et al. 2016; Oya et al. 2017; Snilstveit et al. 2016; Stewart et al. 2016;
Waddington and White 2014).
3. Keeping reviews up to date
Systematic reviews need to be kept up to date. The field of impact evaluation is fast expanding,
with new studies produced at a rate of around 250 per year in health, nutrition and population, 100
new studies per year in education, and 50 new studies each in agriculture and social protection
sectors (Cameron, Mishra, and Brown 2016). In contrast, an average systematic review in international development includes 10 to 20 studies, and often fewer. Reviews may become outdated
quickly, requiring regular updates to ensure currency of searches.
The evolution of the MDGs to SDGs is also likely to have caused shifts in priorities, scope,
questions, outcomes and interventions that are of relevance for policy, programming and practice.
The shift towards promoting universal coverage is likely to generate a change in the outcomes of
interest of a review – for example greater interest in the equity of the distribution of outcomes and
gender and equity-responsive sub-group analysis (for example, by sex, gender, age, ethnicity and
so on). In addition, the focus of primary research is likely to shift from first generation evaluation
questions (does the intervention work compared to doing nothing or ‘standard practice’?) to second
generation comparative questions about different ways of reaching a particular goal (is intervention A relatively more effective than intervention B?) and implementation (how to most effectively
deliver the intervention?). As well as updating searches, answering these questions may require
new theories of change, eligible evidence, data collection (for example, population sub-groups and
moderator variables) and so on, hence updating the scope of the review.
In addition, we have limited confidence in the findings of many international development
systematic reviews. 3ie’s Repository of Systematic Reviews includes an assessment of the
methodological quality of each review included in the database.4 Only 31 percent of reviews
with information publicly available about methodological quality are rated as having minor
limitations, while 69 percent have some important or major limitations. Reviews are often
based on inadequate searches (for example, limited to published studies, omitting important
international development databases, only searching English language sources), quality assessment (for example, lack of risk of bias analysis or inappropriate approach to critical appraisal)
and synthesis methods (for example, use of statistical significance vote-counting).
Finally, most systematic reviews are undertaken without planned, effective stakeholder engagement. Failure to ensure the demand for and likely usefulness of a review can consign the work to
irrelevancy and limited uptake. Engaging with a range of users from the outset can ensure the
review asks the right questions, and fosters ownership of the evidence in key communities and
decision makers (3ie 2015). There are different models of stakeholder participation in systematic
reviews, depending on the degree of control over the review, the type of engagement (individual or
group), and the type of dialogue (Rees and Oliver 2012).
In sum, reviews may need to be updated against multiple criteria. We identify four main types of
systematic review update:
● Update search to incorporate new evidence. Outdated reviews or reviews based on ‘old
evidence’, perhaps from policy contexts which are no longer relevant, may provide biased
advice to policy and practice.
● Update scope to ensure reviews answer relevant questions. For example, there may be
greater demand from policy-makers for reviews which answer different evaluation questions
or which pay attention to outcomes or experiences of particular groups (for example,
disadvantaged people). In these cases it would be more efficient to expand existing reviews
on similar topics rather than producing new ones from scratch.
● Update quality to ensure reviews use up-to-date methods. Reviews often omit unpublished
literature, fail to conduct comprehensive critical appraisal of included studies, or use inappropriate methods of synthesis. In addition, methodological developments allow analyses
that were not previously possible, or incorporate new tools or new software.
● Update engagement to improve the user experience of existing reviews and hence uptake by
decision-makers. The objective of systematic reviews is to inform policy and practice and one
component of that is to ensure that reviews are informed by appropriate stakeholder
engagement processes and that findings are presented in user-friendly formats.
There is, currently, no consensus in the literature regarding when and how a systematic review
should be updated. Cochrane recommends updating systematic reviews every two years. However,
the rationale for this time cut-off is not clear, and it is not normally respected – only 38 per cent of
Cochrane reviews are updated within two years of publication, and only 3 per cent of reviews
published in peer-review journals are updated within two years of publication (Moher et al. 2008).
The Campbell Collaboration (2016) allows authors of reviews up to five years after publication of an
original review to undertake an update. 3ie, a major source of systematic reviews, has proposed a
policy in which development reviews (funded by 3ie or others) are ranked according to priority for
the need of update (search, scope, quality or engagement). 3ie plans to publish the list and
advocate for updating reviews with funders of systematic reviews in coordination with other
institutions producing reviews. 3ie also incorporates updates as a means of building research
capacity in its reviews programme (for example, Waddington and Snilstveit 2009).
Technical developments in updates have mostly consisted of methods for incorporating new
evidence from clinical trials into existing meta-analyses. Efforts have focused on producing statistical tools that assess the need for updating and its benefit on existing meta-analyses (Sutton et al.
2009). This type of research is entirely focused on the incorporation of additional effect sizes as new
evidence from primary studies becomes available.
In sectors where primary evidence generation is rapid, reviews of clusters of interventions that
were previously ‘lumped’ together may be ‘split’ by intervention in the update, to incorporate broader
mixed-methods evidence to answer questions about implementation, for example. Researchers have
developed models that update quantitative as well as qualitative components of the review. These
approaches take into account the policy relevance of the review and changes in the policy environments in addition to changes in review methodology. This approach has led to multicomponent
decision tools based on a mix of qualitative and quantitative judgments (Takwoingi et al. 2013).
More radical approaches have also been suggested. These include the production of ‘living
systematic reviews’ whereby reviews are an online document which is persistent, dynamic and
constantly updated as more evidence becomes available (Elliott et al. 2014). Others have proposed
an automated process for updating reviews, whereby the systematic review, or part of it, such as
the search and the meta-analysis, is entirely produced by machines at the press of a button (Tsafnat
et al. 2013).
The first generation of international development systematic reviews had relatively high fixed costs
for various reasons, including the need to build capacity in a community of practice. In theory, the costs
of second generation reviews, especially updates, will be lower because that capacity has already been
built. In the case of updates, the scope of the review is also already well defined by answerable questions,
which makes it easier to build further capacity.
A standard systematic review is completed within 12–24 months. The process is demanding and
reviews can take a long time to produce findings, quickly becoming outdated in such a way that
they often fail the task of informing policies in a timely manner (Whitty 2015). 3ie’s experience is
that updates can be completed at a fraction of the cost and time (typically between 3 and 9 months)
depending on the scope of the update. One way to speed up the process of knowledge translation
from systematic searches is the evidence gap map (EGM) (Snilstveit et al. 2013). EGMs summarise
the density and paucity of evidence in an interactive format and have proven incredibly popular
with researchers and development organisations (Phillips et al. 2017). However, EGMs are not a
substitute for systematic reviews since they are not designed to critically appraise or extract policy-relevant findings from primary studies. Rather, they are a way of scoping future review topics, and
provide a more efficient way of communicating primary research gaps than ‘empty reviews’.5
Systematic reviews are produced by large teams of researchers that scan all the relevant
literature and filter and quality appraise the evidence through a process of search, selection and
data collection. Much of this mechanical work can take several months to complete. The process of
producing systematic reviews is becoming more and more demanding as more evidence is
produced and more databases that require searching become available. Hence, much of the
time spent in conducting a systematic review is absorbed by the process of searching, screening
and evaluating the available literature, often using word-recognition devices, with little time left for
activities requiring higher order cognitive functions, such as evaluating and synthesising the evidence.
Much research and a number of projects are underway that employ machine learning algorithms
to assist researchers in conducting systematic reviews (O’Mara-Eves et al. 2015; Tsafnat et al. 2014; see
also Snilstveit et al. 2018). In these trials, researchers screen a subset of the population of studies. The
result of the screening process is fed into a machine which develops a rule to include or exclude a
given study based on the information provided by the researchers. This is normally performed by a
logistic regression where the dependent variable is the inclusion-exclusion of the study and the
explanatory variables are words and combinations of words in the studies reviewed. The inclusion rule
is then applied to a new subset of the data and the selection performed by the computer algorithm is
returned to the researchers. The researchers at this point can perform an additional screening on the
results of the search conducted by the computer, which can then be fed back to the machine to
improve and refine the inclusion process at successive trials. In this way, the machine iteratively learns
to include the studies using the criteria followed by the researchers. We hope and expect that the
findings of these studies will suggest substantial savings from these approaches for updates and new reviews.
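The screening loop described above can be illustrated with a minimal sketch. It assumes word presence as the only explanatory variables and fits the logistic regression by simple stochastic gradient ascent in pure Python. The record titles, labels, function names and tuning values are all hypothetical and chosen for illustration; they are not taken from the projects cited, which may use different features and estimation methods.

```python
import math

def tokenize(text):
    """Lower-case a title or abstract and return its set of words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_screener(records, labels, epochs=200, learning_rate=0.1):
    """Fit a logistic regression by stochastic gradient ascent.

    Dependent variable: 1 = include, 0 = exclude.
    Explanatory variables: presence of each word in the record.
    """
    docs = [tokenize(r) for r in records]
    weights = {w: 0.0 for doc in docs for w in doc}
    bias = 0.0
    for _ in range(epochs):
        for words, label in zip(docs, labels):
            error = label - sigmoid(bias + sum(weights[w] for w in words))
            bias += learning_rate * error
            for w in words:
                weights[w] += learning_rate * error
    return weights, bias

def inclusion_probability(model, record):
    """Predicted probability that a reviewer would include the record."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(w, 0.0) for w in tokenize(record)))

def rank_unscreened(model, records):
    """Order unscreened records from most to least likely to be included."""
    return sorted(records, key=lambda r: inclusion_probability(model, r), reverse=True)

# Step 1: researchers screen a subset by hand (hypothetical records and decisions).
screened_records = [
    "Randomised trial of cash transfers and school enrolment",
    "Impact evaluation of farmer field schools on yields",
    "Quasi-experimental impact evaluation of sanitation programme",
    "Editorial commentary on aid policy debates",
    "Opinion piece on development policy history",
    "Commentary on the history of aid agencies",
]
screened_labels = [1, 1, 1, 0, 0, 0]

# Step 2: the machine learns an inclusion rule and applies it to new records.
model = train_screener(screened_records, screened_labels)
unscreened = [
    "Randomised impact evaluation of microfinance",
    "Editorial on the history of development",
]
ranked = rank_unscreened(model, unscreened)
first_pass = [inclusion_probability(model, r) for r in unscreened]

# Step 3: researchers check the machine's selection; their decisions are fed
# back as additional training data and the rule is refitted.
reviewer_decisions = [1, 0]  # hypothetical human judgements on `unscreened`
screened_records += unscreened
screened_labels += reviewer_decisions
model = train_screener(screened_records, screened_labels)
```

In practice the ranked list would let reviewers look at the most promising records first, and steps 2 and 3 would be repeated until few new eligible studies are being found. A production system would more likely rely on an established statistical library than on hand-written estimation code.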
4. Overview of the papers collected in this issue
The papers in this collection serve to discuss new approaches to systematic reviews in international
development as well as to present examples of the state of the art in conducting
reviews on development topics. White (2018) argues that the gold standard in systematic reviewing
is not just to follow the conduct and reporting requirements of the Campbell Collaboration and
Cochrane, but also to incorporate programme theory and analysis of evidence along the causal
chain. White argues for the centrality of theory of change analysis in systematic reviews, and
presents approaches to building different types of evidence into mixed-methods effectiveness
reviews including project portfolio information and qualitative studies, and different methods of
presentation including the ‘funnel of attrition’.
Skalidou and Oya (2018) discuss their experiences in incorporating a large amount of qualitative
evidence into an effectiveness review of agricultural certification schemes. Unlike many other
mixed-methods systematic reviews, the study adopted different inclusion criteria and critical
appraisal approaches for different types of evidence, enabling the incorporation of ethnographic
research into the review that otherwise would have been omitted on the grounds of methods
reporting. The authors report on their approach to searching, critical appraisal, data extraction and
synthesis across 136 included qualitative studies. They also comment on the challenges in integrating qualitative and quantitative evidence in mixed-methods reviews, concluding that
even if evidence is not directly linked it can still illuminate issues of implementation and context
which can help understand heterogeneity in impacts.
The paper by Carr-Hill et al. (2018) serves as an example of a state of the art mixed-methods
review on interventions to decentralise education decision-making. The review combines theory of
change analysis, quantitative meta-analysis and qualitative framework synthesis. The authors
conduct full causal chain evidence synthesis and are able to go some way to explaining the
sources of heterogeneity in findings across low- and middle-income contexts. Parents are less
able to hold schools accountable in contexts of low income and low levels of education, where
they have low status relative to teachers and school managers.
Very few systematic reviews on development topics incorporate evidence on the cost of the
interventions being assessed. Masset et al. (2018) present a systematic review of systematic reviews
of cost-effectiveness studies in low- and middle-income countries. The review examines the
characteristics of the studies and the methods employed and discusses their relevance for decision-making. The paper notes the challenges in aggregating data across economic evaluations, due
to great heterogeneity in outcomes, costs and methods, and argues for greater standardisation in
data collection and reporting.
The contribution of Doocy and Tappis (2018) is to provide, alongside a review of effects,
systematic evidence on costs, cost efficiency, cost-effectiveness and cost benefit for cash transfer
programmes in emergency settings. The study is the first published systematic review on a
humanitarian topic to provide evidence together on effects and costs, to our knowledge. It
provides a model for future studies incorporating evidence on value for money.
Lombardini and McCollum (2018) give an example of the use of systematic review methods to
inform decision-making in a programme organisation, Oxfam GB. The study is a critical appraisal
and meta-analysis of impact evidence on women’s empowerment projects, and presents a fine
example of the value of meta-analysis in combining results from underpowered studies. One can
imagine other examples where it may be difficult to undertake large enough individual studies to
detect statistically significant changes in outcomes, where meta-analysis might also be of great
value (for example, detecting changes in maternal mortality).
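The value of pooling underpowered studies can be shown with a small worked example. The sketch below assumes a fixed-effect, inverse-variance meta-analysis, which is one common approach and not necessarily the one used in the paper described above. The effect sizes and standard errors are invented for illustration.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance weighted average of study effects and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def is_significant(effect, std_error, z_critical=1.96):
    """Two-sided test at the 5 per cent level using a normal approximation."""
    return abs(effect / std_error) > z_critical

# Hypothetical standardised effects from five small studies.
effects = [0.20, 0.15, 0.25, 0.18, 0.22]
std_errors = [0.12, 0.10, 0.15, 0.11, 0.13]

# Vote-counting: how many studies are individually significant?
individually_significant = [
    is_significant(e, se) for e, se in zip(effects, std_errors)
]
votes = sum(individually_significant)

# Meta-analysis: pool the same studies and test the combined estimate.
pooled, pooled_se = pool_fixed_effect(effects, std_errors)
pooled_significant = is_significant(pooled, pooled_se)

print(f"Studies individually significant: {votes} of {len(effects)}")
print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f}), "
      f"significant: {pooled_significant}")
```

With these invented numbers no single study reaches significance, so a vote count would wrongly suggest no effect, while the pooled estimate is more precise than any individual study. Where effects are expected to vary across contexts, a random-effects model would usually be the more appropriate choice.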
Finally, the paper by Stewart (2018), based on her opening address at the 2017 Cochrane
Colloquium in Cape Town, presents the African Evidence Network (AEN), which was established in
2012 and is building a critical mass of researchers, knowledge brokers and decision makers across
Africa. Drawing on the experiences of the AEN, the article presents the case for how networks bridge
divides, build capacity and create readiness for change. A powerful African proverb exemplifies the
work of the group and the coordinating apparatus provided by Cochrane and the Campbell
Collaboration: if you want to go fast, go alone; if you want to go far, go together.
In 2012, an earlier review stated that ‘efforts were being made to conduct more policy-relevant
reviews drawing on a fuller range of evidence’ (White and Waddington 2012, 357). Papers in this
collection should indicate to readers the developments made in systematic reviewing in international
development over the past five years, and the broad range of questions that rigorous systematic
reviews can answer (see also Hansen and Trifković 2015).
Notes
1. Meta-analyses in economics have been around since the 1980s (Ioannides, Stanley, and Doucouliagos 2017).
Systematic reviews on public health topics in low- and middle-income countries have been published since at
least the 1990s (for example, Esrey et al. 1991). However, the common use of systematic reviewing in development economics and development studies dates to the 2000s. 3ie and the Department for International
Development launched their first systematic review research funding programmes in 2008 and 2010, respectively.
2. The Repository was updated using systematic search methods in 2017. See: http://www.3ieimpact.org/en/
evidence/systematic-reviews (accessed 1 February 2018).
3. https://www.campbellcollaboration.org/better-evidence/plain-language-summaries.html (accessed 1 February
4. Studies are assessed using a checklist designed to evaluate the methods employed in reviews against a set of
common standards for conducting systematic reviews (3ie n.d.). Based on this, reviews are given a rating
of overall confidence in conclusions about effects: low confidence reviews are those in which there are major
methodological limitations; medium confidence reviews are those with important limitations; and high confidence reviews are those with minor limitations.
5. The caveat is that the standards of searching undertaken in EGMs are usually not as exhaustive as those for
systematic reviews. For example, sources may be limited to English language or by date; reference snowballing
(citation tracing and bibliographic back-referencing) may not be undertaken.
