Co-ordination of multi-site
evaluations: design, support for
execution, QA and synthesis in the
Paris Declaration Evaluation
Bernard Wood and Julia Betts,
Core Evaluation Team PDE
February 2012
Design (1)
Core Team roles: To move from approach paper and other preparatory stages
to an operational design, framework and matrix for the overall
evaluation and country studies.
What worked well?
a. International governance arrangements, ‘culture’, participation and
support reflected vast experience & global best practice in joint
evaluation.
b. Phase 1 lessons could be applied to the Phase 2 framework & country
studies.
c. Identifying the self-defined ‘intended outcomes’ of the Paris Declaration,
the implicit ‘programme theory’ and the centrality of context.
d. Introducing the 3-question sequence and contribution analysis as the best
available way to handle the difficult link to development results.
e. Support of the International Management Group and IRG for these
design steps.
f. Wide participation in fleshing out the evaluation framework and matrix
strengthened ownership and trust by stakeholders.
g. Designing the synthesis approach from the start and gearing all tools,
analytical processes etc. to the same framework.
Design (2)
Challenges (what did not work well?) and responses
a. Much progress was needed from the approach paper and earlier
theoretical exploration. All (esp. IMG) tacitly accepted the need for a
fresh start.
b. Too many evaluation questions: the 11 intended outcomes had
compelling legitimacy, but were still very wide. Participation added yet
more questions. No full solution: using some big questions for
conclusions helped. The ability to add country-specific questions also
helped contain this problem but did not fully overcome it. This resulted in
some spotty coverage & fewer “hard” quantifiable findings. But the parallel
monitoring survey fed the appetite for “hard” indicators, while also
exposing their limits.
c. Most teams did not focus enough on context chapters to get full value,
esp. in the Busan era (e.g. on forces beyond ‘aid’, non-traditional
providers). The synthesis pushed contextual discussion to the limits of
the evidence, but there were no resources to supplement it fully.
d. Too much information and candour were expected from country
studies on the performance of traditional and non-traditional donors.
The synthesis pushed to the limits of the evidence, including from other
solid sources.
Support to execution
by country and donor teams
Core Team roles: to support teams in applying the framework, solving issues, and
helping safeguard professional independence
What worked well?
a. Regional workshops (good but expensive) and individual video and other
support (mainly request driven). Intranet tool for managing guidance and
information was vital, but not used by all.
b. Standard framework of questions, sub-questions and suggested indicators
and sources
c. Flexible, multi-lingual core team resources
d. Tracking progress at milestones and following up
Challenges (what did not work well?) and responses
a. Staggered starting points (esp. in contracting teams) made support more
expensive and less effective. Extra support resources, sessions and follow-up
were added – of some help.
b. Donor studies were contracted before the framework was set, and support
provisions were unclear. Not overcome: only limited support was possible to
donor studies, and more was needed.
c. Different understandings of independence and QA roles. International
scrutiny, some interventions & clarifications of independence (with the
Secretariat) helped.
Quality assurance
Core Team roles: As part of overall QA strategy, to assess the quality of draft country
and donor reports & suggest strengthening, then validate and gauge the
reliability of evidence from each final report for the synthesis.
What worked well?
a. Systematic check by at least two CET members of each main finding for
strength of evidence and conclusions and recommendations for clarity of
argument.
b. The Emerging Findings workshop, as a forum for transparent focus on quality,
examples of good practice, constructive peer pressure and support
opportunities for lagging cases.
Challenges (what did not work well?) and responses
a. The number of late and incomplete drafts limited the scope for a solid
emerging findings report and for rigorous overall checks at the Bali workshop.
Used the solid evidence on hand, but intensive extra work was needed post-Bali
to extract evidence from final drafts / reports etc.
b. Some workshop participants focused on their own opinions and experiences
rather than evaluation findings. Listened and reported back faithfully, but
filtered to keep solid evaluation evidence as the base for the synthesis.
c. Double checks of both drafts and final reports imposed heavy demands in a
very short time. Worked harder and didn’t compromise on rigour.
Synthesis process
Core Team roles: Systematically assemble and reflect key findings & conclusions
from the body of evaluations and studies and distil policy-relevant overall
findings, conclusions and recommendations, calibrated to the strength of the
synthesized evidence.
What worked well?
a. Assembling validated evidence and following the evaluation framework.
b. Finding the balance - enough detail to reflect key evidence, but focusing on
strategic findings and conclusions.
c. Making the leap to policy-relevant findings, conclusions and
recommendations – requires policy grasp as well as evaluation rigour. Level
and language geared to dissemination and use.
d. Validation process (and the rules applied) for the first draft synthesis, steered
by the Management Group.
Challenges (what did not work well?) and responses
a. Some different expectations for the synthesis product: accessibility and
policy-relevance vs. detail and methodology. Opted for accessibility with
essential details and a well-signposted technical annex.
b. Uneven engagement by IRG: thus fuller discussion was needed at the final
validation workshop, while protecting the agreed process.
c. Time pressures, as throughout: work harder.
Some key lessons
1. Campaign for evaluable frameworks of intended outcomes in the up-front
design of programmes and policies, but remain wary of crude and oversimplified indicators. Accept and embrace the need for rigorous
qualitative evaluations of complex realities.
2. Aim for governance arrangements, ‘culture’, participation and support
that reflect experience and best practice in joint evaluation.
3. Keep the working language as clear and non-technocratic as possible,
minimizing jargon – especially, but not only, in multi-lingual and multicultural evaluation processes. Carry this through to reports to maximize
ultimate dissemination and use.
4. Recognize genuinely participatory design and validation as not just
desirable but integral to the necessary ownership of the process and the
ultimate quality and utility of the evaluation. Build in the “careful
planning, structure, execution, and facilitation” implied (MQP).
5. Recruit highly competent teams early to play a major role in design,
together with evaluation managers and stakeholders.
6. (Perhaps) be prepared to impose selectivity, even among vital questions,
in order to have a manageable challenge across the body of cases.
Some key lessons (2)
7. Prepare for complex and uneven processes in multi-site evaluations but
set and keep the deadlines necessary to maintain momentum and
deliver timely results.
8. While working to strengthen them, expect uneven capacities and
delivery among varied teams. Be ready to reinforce, but if necessary
abandon or sideline results where they are found weak against
transparent standards. Ensure in advance that an adequate base will
remain after ‘dropouts’ for reasonable overall validity.
9. Recognize that written component and synthesis reports are only part
of the contribution of the evaluation, alongside benefits from the
process and building a community of shared understanding and trust.
10. Set and consistently apply rules to protect teams’ independence within
agreed evaluation frameworks and arrangements for quality assurance
and validation.
11. Be realistic about the candour to be expected in assessments of other
actors’ performance as well as self-assessments.
12. Calibrate the strength of particular synthesis findings, conclusions and
recommendations according to the relative strength of evidence in the
body of cases.