RWE PowerPoint - RealWorld Evaluation

Promoting a RealWorld
and Holistic approach to
Impact Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Workshop for staff of MFA of Finland
Helsinki, 27-28 September 2012
Facilitated by
Oumoul Ba Tall and Jim Rugh
Note: This PowerPoint presentation and the summary
(condensed) chapter of the book are available at:
www.RealWorldEvaluation.org
Introduction to
Impact Evaluation
(quoting NONIE Guidelines)


“In international development, impact evaluation is principally
concerned with final results of interventions (programs,
projects, policy measures, reforms) on the welfare of
communities, households and individuals.” [p. 3]
“No single method is best for addressing the variety of
questions and aspects that might be part of impact evaluations.
However, depending on the specific questions or objectives of a
given impact evaluation, some methods have a comparative
advantage over others in analyzing a particular question or
objective. Particular methods or perspectives complement
each other in providing a more complete ‘picture’ of impact.” [p.
xxii]
Introduction to
Impact Evaluation
(quoting NONIE Guidelines)

Three related problems that quantitative impact
evaluation methods attempt to address are the
following:
• The establishment of a counterfactual: What would have
happened in the absence of the intervention(s)?
• The elimination of selection effects, leading to differences
between the intervention (treatment) group and the control
group.
• A solution for the problem of unobservables: The omission
of one or more unobserved variables, leading to biased
estimates. [p. 23]
Introduction to
Impact Evaluation
(quoting NONIE Guidelines)

“The safest way to avoid selection effects is a
randomized selection of the intervention and control
groups before the experiment starts. In a well-designed and correctly implemented Randomized
Controlled Trial (RCT) a simple comparison of
average outcomes in the two groups can adequately
resolve the attribution problem and yield accurate
estimates of the impact of the intervention on a
variable of interest. By design, the only difference
between the two groups was the intervention.” [p.24]
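To make the "simple comparison of average outcomes" concrete, here is a minimal sketch in Python. It is an illustration only: the function names, the simulated outcome scale and the true effect of 5.0 are assumptions invented for this example, not figures from the NONIE Guidelines.

```python
import random
from statistics import mean


def difference_in_means(treated, control):
    """Estimate impact as the gap between the two group averages."""
    return mean(treated) - mean(control)


def simulate_rct(n=20000, true_effect=5.0, seed=42):
    """Simulate a randomized trial with a known (hypothetical) effect."""
    rng = random.Random(seed)
    ids = list(range(n))
    rng.shuffle(ids)                     # random assignment before the experiment
    treated_ids = set(ids[: n // 2])
    treated, control = [], []
    for unit in range(n):
        # Outcome the unit would have had without the intervention
        outcome_without = rng.gauss(50.0, 10.0)
        if unit in treated_ids:
            treated.append(outcome_without + true_effect)
        else:
            control.append(outcome_without)
    return difference_in_means(treated, control)


if __name__ == "__main__":
    print(round(simulate_rct(), 2))      # close to the assumed effect of 5.0
```

Because assignment is random, the two groups differ only by chance before the intervention, so the gap in averages recovers the assumed effect up to sampling noise.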
OK, that’s enough of an initial introduction to
the theory of Impact Evaluation.
More references are provided at the end of this
presentation for further research on IE.
Let’s begin to consider some implications of
trying to conduct Impact Evaluations in the
RealWorld.
Applying Impact Evaluation to
this Workshop
All of you in this room have chosen to participate in this
workshop. We’ll call the subject matter we cover during this
workshop the intervention. We presume that you are all
adequately qualified to take this course. So here is what we
propose to do: We will issue a pre-test to determine how well
you know this subject matter. Then we will randomly choose
50% of you whom we will identify as our control group. We will
ask you to leave this room and do whatever else you want to do
during the next two days. Then we will ask you to come back at
4:30 pm tomorrow to take the post-test, along with those who
stayed and participated in the rest of this workshop (our
intervention group). Thus we will have a counterfactual against
which to measure any ‘impact’ of what we taught
in the workshop.
Applying Impact Evaluation to
this Workshop
OK, just kidding, to make a point!
Applying Impact Evaluation to
this Workshop
What are reasons why attempting to use a
Randomized Control Trial to evaluate the
impact of this workshop might not be a good
idea? …
Workshop Objectives
1. The basics of the RealWorld Evaluation approach
for addressing common issues and constraints faced
by evaluators such as: when the evaluator is not
called in until the project is nearly completed and
there was no baseline nor comparison group; or
where the evaluation must be conducted with
inadequate budget and insufficient time; and
where there are political pressures and
expectations for how the evaluation should be
conducted and what the conclusions should say;
Workshop Objectives
2. Defining what impact evaluation should be;
3. Identifying and assessing various design options
that could be used in a particular evaluation setting;
4. Ways to reconstruct baseline data when the
evaluation does not begin until the project is well
advanced or completed;
5. How to account for what would have happened
without the project’s interventions: alternative
counterfactuals;
6. The advantages of using mixed-methods designs.
Workshop Objectives
Note: In this workshop we focus on project-level impact evaluations. There are, of
course, many other purposes, scopes,
evaluands and types of evaluations. Some
of these methods may apply to them, but our
examples will be based on project impact
evaluations, most of them in the context of
developing countries.
Workshop agenda Day 1
1. Introduction [10 minutes]
2. Brief overview of the RealWorld Evaluation (RWE) approach [30 minutes]
3. Rapid review of evaluation designs, logic models, tools and techniques for
RealWorld impact evaluations, with an emphasis on impact evaluation (IE);
overview of range of evaluation models for IE compared to other types of
evaluations [75 minutes]
--- short break [20 minutes] ---
4. Small group introductions and sharing of RWE types of constraints you have
faced in your own practice, and how they were addressed. [45 minutes]
5. What to do if there had been no baseline survey: reconstructing a baseline
[30 minutes]
--- lunch [60 minutes] ---
6. Small group exercise Part I: read case studies and begin discussions [45
minutes]
7. Small group exercise Part II: ‘Clients’ and ‘Consultants’ re-negotiate the case
study evaluation ToR [45 minutes]
8. Feedback from exercise [20 minutes]
9. Wrap-up discussion, evaluation of Day 1, plans for Day 2 [40 minutes]
Workshop agenda Day 2
1. Review and reflections on topics addressed in Day 1 [25 minutes]
2. Quantitative, qualitative and mixed methods [30 minutes]
3. How to account for what would have happened without the project:
alternative counterfactuals [30 minutes]
--- short break [20 minutes] ---
4. Considerations for more holistic approaches to impact evaluation [45
minutes]
5. Check lists for assessing and addressing threats to validity [20 minutes]
6. Plenary discussion: Practical realities of applying RWE approaches:
challenges and strategies [30 minutes]
--- lunch [60 minutes] ---
7. Small-group work applying RealWorld Evaluation methodologies for the
Norad Nepal evaluation case study [120 minutes]
8. Next steps: how to apply what we’ve learned in the real worlds of our
work [20 minutes]
9. Workshop evaluation and wrap-up [20 minutes]
RealWorld Evaluation
Designing Evaluations under Budget,
Time, Data and Political Constraints
OVERVIEW OF THE
RWE APPROACH
RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near
end of project
For political, technical or budget reasons:
• There was no life-of-project evaluation plan
• There was no baseline survey
• Project implementers did not collect
adequate data on project participants at the
beginning nor during the life of the project
• It is difficult to collect data on comparable
control groups
RealWorld Evaluation Scenarios
Scenario 2: The evaluation team is called in
early in the life of the project
But for budget, political or methodological
reasons:
• The ‘baseline’ was a needs assessment,
not comparable to eventual evaluation
• It was not possible to collect baseline data
on a comparison group
Reality Check – Real-World
Challenges to Evaluation
• All too often, project designers do not think
evaluatively – evaluation not designed until the
end
• There was no baseline – at least not one with data
comparable to evaluation
• There was/can be no control/comparison group.
• Limited time and resources for evaluation
• Clients have prior expectations for what they want
evaluation findings to say
• Many stakeholders do not understand evaluation;
distrust the process; or even see it as a threat
(dislike of being judged)
RealWorld Evaluation
Quality Control Goals
• Achieve maximum possible evaluation rigor
within the limitations of a given context
• Identify and control for methodological
weaknesses in the evaluation design
• Negotiate with clients trade-offs between
desired rigor and available resources
• Presentation of findings must acknowledge
methodological weaknesses and how they
affect generalization to broader populations
The Need for the RealWorld
Evaluation Approach
As a result of these kinds of constraints, many of
the basic principles of rigorous impact
evaluation design (comparable pre-test/post-test
design, control group, adequate instrument
development and testing, random sample
selection, control for researcher bias, thorough
documentation of the evaluation methodology
etc.) are often sacrificed.
The RealWorld Evaluation Approach
An integrated approach to
ensure acceptable standards
of methodological rigor while
operating under real-world
budget, time, data and
political constraints.
See the RealWorld Evaluation book
or at least the condensed summary for more details

RealWorld Evaluation: Working Under Budget, Time,
Data, and Political Constraints (Second Edition)
Michael Bamberger, Jim Rugh, Linda Mabry
This book addresses the challenges of conducting program evaluations in real-world contexts where
evaluators and their clients face budget and time constraints and where critical data may be missing.
The book is organized around a seven-step model developed by the authors, which has been tested and
refined in workshops and in practice. Vignettes and case studies (representing evaluations from a
variety of geographic regions and sectors) demonstrate adaptive possibilities for small projects with
budgets of a few thousand dollars to large-scale, long-term evaluations of complex programs. The
text incorporates quantitative, qualitative, and mixed-method designs and this Second Edition reflects
important developments in the field over the last five years.
New to the Second Edition:
• Adds two new chapters on organizing and managing evaluations, including how to
strengthen capacity and promote the institutionalization of evaluation systems
• Includes a new chapter on the evaluation of complex development interventions, with a
number of promising new approaches presented
• Incorporates new material, including on ethical standards, debates over the “best”
evaluation designs and how to assess their validity, and the importance of understanding settings
• Expands the discussion of program theory, incorporating theory of change, contextual and
process analysis, multi-level logic models, using competing theories, and trajectory analysis
• Provides case studies of each of the 19 evaluation designs, showing how they have
been applied in the field
“This book represents a significant achievement. The authors have succeeded in creating a book that
can be used in a wide variety of locations and by a large community of evaluation practitioners.”
-- Michael D. Niles, Missouri Western State University
“This book is exceptional and unique in the way that it combines foundational knowledge from
social sciences with theory and methods that are specific to evaluation.”
-- Gary Miron, Western Michigan University
“The book represents a very good and timely contribution worth having on an evaluator’s shelf,
especially if you work in the international development arena.”
-- Thomaz Chianca, independent evaluation consultant, Rio de Janeiro, Brazil
The RealWorld Evaluation
approach
• Developed to help evaluation practitioners
and clients
  • managers, funding agencies and external
  consultants
• Still a work in progress (we continue to learn
more through workshops like this)
• Originally designed for developing countries,
but equally applicable in industrialized
nations
Special Evaluation Challenges in
Developing Countries
• Unavailability of needed secondary data
• Scarce local evaluation resources
• Limited budgets for evaluations
• Institutional and political constraints
• Lack of an evaluation culture (though
evaluation associations are addressing this)
• Many evaluations are designed by and for
external funding agencies and seldom reflect
local and national stakeholder priorities
Expectations for “rigorous”
evaluations
Despite these challenges, there is a
growing demand for methodologically
sound evaluations which assess the
impacts, sustainability and replicability of
development projects and programs.
(We’ll be talking more about that later.)
Most RealWorld Evaluation tools are not
new— but promote a holistic, integrated
approach
• Most of the RealWorld Evaluation data
collection and analysis tools will be familiar to
experienced evaluators.
• What we emphasize is an integrated
approach which combines a wide range of
tools adapted to produce the best quality
evaluation under RealWorld constraints.
What is Special About the
RealWorld Evaluation Approach?
• There is a series of steps, each with
checklists for identifying constraints and
determining how to address them
• These steps are summarized on the following
slide and then the more detailed flow-chart
…
The Steps of the RealWorld
Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and
weaknesses of the evaluation design
Step 7: Helping clients use the evaluation
The Real-World Evaluation Approach
Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints
Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods
Step 3: Addressing time constraints
All Step 2 tools plus:
F. Commissioning preparatory studies
G. Hire more resource persons
H. Revising format of project records to include critical data for impact analysis.
I. Modern data collection and analysis technology
Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult to reach groups
E. Multiple methods
Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design.
B. Addressing stakeholder methodological preferences.
C. Recognizing influence of professional research paradigms.
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
An integrated checklist for multi-method designs
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness
Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action
We will not have time in this
workshop to cover all those steps
We will focus on:
Scoping the evaluation
Evaluation designs
Logic models
Reconstructing baselines
Mixed methods
Alternative counterfactuals
Realistic, holistic impact evaluation
Challenges and Strategies
Planning and Scoping the Evaluation
• Understanding client information needs
• Defining the program theory model
• Preliminary identification of constraints to
be addressed by the RealWorld
Evaluation
Understanding client information
needs
Typical questions clients want answered:
• Is the project achieving its objectives?
• Is it having desired impact?
• Are all sectors of the target population
benefiting?
• Will the results be sustainable?
• Which contextual factors determine the
degree of success or failure?
Understanding client information
needs
A full understanding of client information
needs can often reduce the types of
information collected and the level of
detail and rigor necessary.
However, this understanding could also
increase the amount of information
required!
Still part of scoping: Other questions
to answer as you customize an
evaluation Terms of Reference (ToR):
1. Who asked for the evaluation? (Who are
the key stakeholders?)
2. What are the key questions to be
answered?
3. Will this be mostly a developmental,
formative or summative evaluation?
4. Will there be a next phase, or other
projects designed based on the findings of
this evaluation?
Other questions to answer as
you customize an evaluation
ToR:
5. What decisions will be made in response
to the findings of this evaluation?
6. What is the appropriate level of rigor?
7. What is the scope / scale of the
evaluation / evaluand (thing to be
evaluated)?
8. How much time will be needed /
available?
9. What financial resources are needed /
available?
Other questions to answer as
you customize an evaluation
ToR:
10. Should the evaluation rely mainly on
quantitative or qualitative methods?
11. Should participatory methods be used?
12. Can / should there be a household
survey?
13. Who should be interviewed?
14. Who should be involved in planning /
implementing the evaluation?
15. What are the most appropriate forms of
communicating the findings to different
stakeholder audiences?
Evaluation (research) design?
Key questions?
Evaluand (what to evaluate)?
Qualitative?
Quantitative?
Scope?
Appropriate level of rigor?
Resources available?
Time available?
Skills available?
Participatory?
Extractive?
Evaluation FOR whom?
Does this help, or just confuse things more? Who
said evaluations (like life) would be easy?!!
Before we return to
the RealWorld steps,
let’s gain a
perspective on levels
of rigor, and what a
life-of-project
evaluation plan could
look like
Different levels of rigor
depend on source of evidence; level of confidence; use of information
Objective, high precision – but requiring more time & expense
Level 5: A very thorough research project is undertaken to conduct in-depth analysis of situation; P = +/- 1%; book published!
Level 4: Good sampling and data collection methods used to gather data that is representative of target population; P = +/- 5%; decision maker reads full report
Level 3: A rapid survey is conducted on a convenient sample of participants; P = +/- 10%; decision maker reads 10-page summary of report
Level 2: A fairly good mix of people are asked their perspectives about project; P = +/- 25%; decision maker reads at least executive summary of report
Level 1: A few people are asked their perspectives about project; P = +/- 40%; decision made in a few minutes
Level 0: Decision-maker’s impressions based on anecdotes and sound bites heard during brief encounters (hallway gossip), mostly intuition; level of confidence +/- 50%; decision made in a few seconds
Quick & cheap – but subjective, sloppy
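To give a feel for why higher precision costs more time and money, the sketch below uses the standard sample-size formula for estimating a proportion. It rests on assumptions the slide does not state: that "P" means a margin of error at 95% confidence (z = 1.96), that the sample is a simple random sample, and that the worst-case proportion of 0.5 applies. Convenience samples, as at the lower levels, do not meet these conditions, so treat the numbers as rough orders of magnitude only. The function name is hypothetical.

```python
import math


def required_sample_size(margin_of_error, z=1.96, p=0.5):
    """Sample size for estimating a proportion under simple random sampling.

    margin_of_error is a fraction (0.05 means +/- 5%). z and p are
    assumptions for this illustration, not values given on the slide.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Round first to absorb floating-point noise before taking the ceiling
    return math.ceil(round(n, 6))


if __name__ == "__main__":
    for label, moe in [("+/- 1%", 0.01), ("+/- 5%", 0.05), ("+/- 10%", 0.10)]:
        print(label, required_sample_size(moe))
```

Under these assumptions, halving the margin of error roughly quadruples the number of respondents needed.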
CONDUCTING AN EVALUATION IS
LIKE LAYING A PIPELINE
QUALITY OF INFORMATION GENERATED BY AN EVALUATION
DEPENDS UPON LEVEL OF RIGOR OF ALL COMPONENTS
AMOUNT OF “FLOW” (QUALITY) OF INFORMATION IS LIMITED TO
THE SMALLEST COMPONENT OF THE SURVEY “PIPELINE”
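The pipeline analogy can be sketched in a few lines of Python: the overall quality of an evaluation is capped by its weakest component. The component names and the 0-5 rigor scores below are hypothetical, chosen only to illustrate the idea.

```python
def pipeline_quality(component_rigor):
    """Return the weakest component and its rigor score.

    component_rigor maps a component name to a rigor score; the overall
    "flow" of information is limited by the smallest score.
    """
    if not component_rigor:
        raise ValueError("at least one component is required")
    weakest = min(component_rigor, key=component_rigor.get)
    return weakest, component_rigor[weakest]


if __name__ == "__main__":
    # Hypothetical scores on a 0-5 scale
    survey = {
        "sampling": 4,
        "questionnaire design": 4,
        "interviewer training": 2,
        "data entry": 3,
        "analysis": 4,
    }
    print(pipeline_quality(survey))
```

In this made-up example, strengthening sampling or analysis further would not improve the result until interviewer training is brought up to the same level.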
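Determining appropriate levels of precision for
events in a life-of-project evaluation plan
[Figure: level of rigor (vertical axis, from low rigor to high rigor, with levels 2, 3 and 4 marked) against time during project life cycle (horizontal axis). Events plotted: needs assessment, baseline study, annual self-evaluation, special study, mid-term evaluation and final evaluation. An annotation reads “Same level of rigor”.]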
EVALUATION
DESIGNS
So what should be included in a
“rigorous impact evaluation”?
1. Direct cause-effect relationship between one output (or a
very limited number of outputs) and an outcome that can
be measured by the end of the research project? → Pretty
clear attribution.
… OR …
2. Changes in higher-level indicators of sustainable
improvement in the quality of life of people, e.g. the MDGs
(Millennium Development Goals)? → More significant but
much more difficult to assess direct attribution.
So what should be included in a
“rigorous impact evaluation”?
OECD-DAC (2002: 24) defines impact as “the positive and
negative, primary and secondary long-term effects
produced by a development intervention, directly or
indirectly, intended or unintended. These effects can be
economic, sociocultural, institutional, environmental,
technological or of other types”.
Does it mention or imply direct attribution? Or point to the
need for counterfactuals or Randomized Control Trials
(RCTs)?
Some of the purposes for program evaluation
• Formative: learning and improvement including early
identification of possible problems
• Knowledge generation: identify cause-effect correlations
and generic principles about effectiveness.
• Accountability: to demonstrate that resources are used
efficiently to attain desired results
• Summative judgment: to determine value and future of
program
• Developmental evaluation: adaptation in complex,
emergent and dynamic conditions
-- Michael Quinn Patton, Utilization-Focused Evaluation, 4th edition, pages 139-140
Determining appropriate (and
feasible) evaluation design
Based on the main purpose for
conducting an evaluation, an
understanding of client information
needs, required level of rigor, and what
is possible given the constraints, the
evaluator and client need to determine
what evaluation design is required and
possible under the circumstances.
Some of the considerations
pertaining to evaluation design
1. When evaluation events take place
(baseline, midterm, endline)
2. Review different evaluation designs
(experimental, quasi-experimental, other)
3. Levels of rigor
4. Qualitative & quantitative methods
5. A life-of-project evaluation design
perspective.
An introduction to various evaluation designs
Illustrating the need for quasi-experimental
longitudinal time series evaluation design
[Figure: scale of major impact indicator (vertical axis) over time for project participants and a comparison group, with observation points at baseline, end of project evaluation and post project evaluation.]
OK, let’s stop the action to
identify each of the major
types of evaluation (research)
design …
… one at a time, beginning with the
most rigorous design.
First of all: the key to the traditional symbols:
• X = Intervention (treatment), i.e. what the
project does in a community
• O = Observation event (e.g. baseline, mid-term
evaluation, end-of-project evaluation)
• P (top row): Project participants
• C (bottom row): Comparison (control) group
Note: the 7 RWE evaluation designs are laid out on page 8 of the
Condensed Overview of the RealWorld Evaluation book
Design #1: Longitudinal Quasi-experimental
Project participants:  P1  X  P2  X  P3  P4
Comparison group:      C1     C2     C3  C4
Observation events: baseline (P1, C1), midterm (P2, C2), end of project evaluation (P3, C3), post project evaluation (P4, C4)
Design #2: Quasi-experimental (pre+post, with comparison)
Project participants:  P1  X  P2
Comparison group:      C1     C2
Observation events: baseline (P1, C1), end of project evaluation (P2, C2)
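One common way to analyze a pre+post design with a comparison group is a "double difference": the change among project participants minus the change in the comparison group. The slide itself does not prescribe this estimator, so take the sketch below as one possible reading of Design #2. The function name and the indicator values are made up for illustration.

```python
from statistics import mean


def double_difference(p1, p2, c1, c2):
    """(P2 - P1) - (C2 - C1), using group averages at each observation."""
    project_change = mean(p2) - mean(p1)
    comparison_change = mean(c2) - mean(c1)
    return project_change - comparison_change


if __name__ == "__main__":
    # Hypothetical indicator values at baseline (1) and end of project (2)
    p1, p2 = [40, 42, 38], [55, 57, 53]
    c1, c2 = [41, 39, 40], [46, 44, 45]
    print(double_difference(p1, p2, c1, c2))  # project gain beyond the comparison group's gain
```

The estimate is only as credible as the assumption that, without the project, participants would have changed by the same amount as the comparison group.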
Design #2+: Randomized Control Trial
Research subjects randomly assigned either to project or control group.
Project participants:  P1  X  P2
Control group:         C1     C2
Observation events: baseline (P1, C1), end of project evaluation (P2, C2)
Design #3: Truncated Longitudinal
Project participants:  X  P1  X  P2
Comparison group:         C1     C2
Observation events: midterm (P1, C1), end of project evaluation (P2, C2)
Design #4: Pre+post of project; post-only comparison
Project participants:  P1  X  P2
Comparison group:             C
Observation events: baseline (P1), end of project evaluation (P2, C)
Design #5: Post-test only of project and comparison
Project participants:  X  P
Comparison group:         C
Observation events: end of project evaluation (P, C)
Design #6: Pre+post of project; no comparison
Project participants:  P1  X  P2
Observation events: baseline (P1), end of project evaluation (P2)
Design #7: Post-test only of project participants
Project participants:  X  P
Observation events: end of project evaluation (P)
See Table 2.2 on page 8 of Condensed Overview of RWE
Design  T1          X               T2         X               T3         T4
        (baseline)  (intervention)  (midterm)  (intervention,  (endline)  (ex-post)
                                               cont.)
1       P1 C1       X               P2 C2      X               P3 C3      P4 C4
2       P1 C1       X                          X               P2 C2
3                   X               P1 C1      X               P2 C2
4       P1          X                          X               P2 C2
5                   X                          X               P1 C1
6       P1          X                          X               P2
7                   X                          X               P1
“Non-Experimental” Designs
[NEDs]
• NEDs are impact evaluation designs that
do not include a matched comparison
group
• Outcomes and impacts assessed without
a conventional statistical counterfactual
to address the question
  • “what would have been the situation of the
  target population if the project had not taken
  place?”
Situations in which an NED may
be the best design option
• Not possible to define a comparison group
• When the project involves complex processes of
behavioral change
• Complex, evolving contexts
• Outcomes not known in advance
• Many outcomes are qualitative
• Projects operate in different local settings
• When it is important to study implementation
• Project evolves slowly over a long period of time
Some potentially strong NEDs
A. Interrupted time series
B. Single case evaluation designs
C. Longitudinal designs
D. Mixed method case study designs
E. Analysis of causality through program
theory models
F. Concept mapping
G. Contribution Analysis
Any questions?
LOGIC
MODELS
Defining the program theory
model
All programs are based on a set of assumptions
(hypotheses) about how the project’s
interventions should lead to desired outcomes.
• Sometimes this is clearly spelled out in project
documents.
• Sometimes it is only implicit and the evaluator
needs to help stakeholders articulate the
hypothesis through a logic model.
Defining the program theory
model
• Defining and testing critical assumptions
are essential (but often ignored)
elements of program theory models.
• The following is an example of a model
to assess the impacts of microcredit on
women’s social and economic
empowerment
Critical logic chain hypothesis for a
Gender-Inclusive Micro-Credit Program
• Sustainability
  • Structural changes will lead to long-term impacts.
• Medium/long-term impacts
  • Increased women’s economic and social empowerment.
  • Economic and social welfare of women and their families will
  improve.
• Short-term outcomes
  • If women obtain loans they will start income-generating activities.
  • Women will be able to control the use of loans and reimburse them.
• Outputs
  • If credit is available women will be willing and able to obtain loans
  and technical assistance.
Example of threat to internal
validity: The assumed causal model
Women join the village bank where they receive loans, learn skills and gain self-confidence
WHICH …
Increases women’s income
WHICH …
Increases women’s control over household resources
An alternative causal model
Some women had previously taken literacy training which increased their self-confidence and work skills
→ Women who had taken literacy training are more likely to join the village bank. Their literacy and self-confidence make them more effective entrepreneurs
→ Women’s income and control over household resources increased as a combined result of literacy, self-confidence and loans
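The threat can be illustrated with a small simulation. All numbers below are invented for the example (share of literate women, probabilities of joining, size of the loan and literacy effects); none come from the case described on the slide. Women with prior literacy training are made more likely to join the village bank, so a naive comparison of members with non-members credits the bank with gains that are partly due to literacy. Comparing within literacy groups removes that part.

```python
import random
from statistics import mean


def simulate_village_bank(n=20000, loan_effect=10.0, literacy_effect=15.0, seed=1):
    """Return (naive_estimate, stratified_estimate) of the loan effect."""
    rng = random.Random(seed)
    incomes = {(lit, joined): [] for lit in (False, True) for joined in (False, True)}
    for _ in range(n):
        literate = rng.random() < 0.4                  # hypothetical share
        joined = rng.random() < (0.7 if literate else 0.2)
        income = 100.0 + rng.gauss(0.0, 5.0)
        if literate:
            income += literacy_effect
        if joined:
            income += loan_effect
        incomes[(literate, joined)].append(income)

    members = incomes[(False, True)] + incomes[(True, True)]
    non_members = incomes[(False, False)] + incomes[(True, False)]
    naive = mean(members) - mean(non_members)

    # Compare members with non-members separately within each literacy group,
    # then average the two gaps weighted by group size
    gaps, weights = [], []
    for lit in (False, True):
        gaps.append(mean(incomes[(lit, True)]) - mean(incomes[(lit, False)]))
        weights.append(len(incomes[(lit, True)]) + len(incomes[(lit, False)]))
    stratified = sum(w * g for w, g in zip(weights, gaps)) / sum(weights)
    return naive, stratified


if __name__ == "__main__":
    naive, stratified = simulate_village_bank()
    print(round(naive, 1), round(stratified, 1))
```

In this sketch the naive estimate overstates the assumed loan effect, while the stratified estimate comes close to it. In a real evaluation this only works if literacy training was actually measured, which is one reason to spell out alternative causal models early.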
[Figure: problem tree. Consequences (three boxes) arise from the PROBLEM, which has PRIMARY CAUSE 1, PRIMARY CAUSE 2 and PRIMARY CAUSE 3. Primary cause 2 has Secondary causes 2.1, 2.2 and 2.3; Secondary cause 2.2 has Tertiary causes 2.2.1, 2.2.2 and 2.2.3.]
[Figure: corresponding objectives tree. Consequences (three boxes) follow from the DESIRED IMPACT, which depends on OUTCOME 1, OUTCOME 2 and OUTCOME 3. Outcome 2 depends on OUTPUT 2.1, OUTPUT 2.2 and OUTPUT 2.3; Output 2.2 depends on Interventions 2.2.1, 2.2.2 and 2.2.3.]
[Figure: example results tree. Boxes: Reduction in poverty; Women empowered; Women in leadership roles; Economic opportunities for women; Young women educated; Female enrollment rates increase; Improved educational policies; Parents persuaded to send girls to school; Curriculum improved; Schools built; School system hires and pays teachers.]
To have synergy and achieve impact all of these need to address
the same target population.
Program Goal: Young women educated
(Program goal at impact level)
• Advocacy Project Goal: Improved educational policies enacted
• Construction Project Goal: More classrooms built
• Teacher Education Project Goal: Improve quality of curriculum
Labels on the diagram: ASSUMPTION (that others will do this); OUR project; PARTNER will do this
What does it take to measure
indicators at each level?
Impact: Population-based survey
(baseline, endline evaluation)
Outcome: Change in behavior of participants
(can be surveyed annually)
Output: Measured and reported by project staff (annually)
Activities: On-going (monitoring of interventions)
Inputs: On-going (financial accounts)
We need to recognize which evaluative
process is most appropriate for
measurement at various levels
• Impact
• Outcomes
• Output
• Activities
• Inputs
IMPACT EVALUATION
PROJECT EVALUATION
PERFORMANCE MONITORING
One form of Program Theory (Logic) Model
Economic context
in which the
project operates
Design
Inputs
Institutional and
operational
context
Political context in
which the project
operates
Implementation
Process
Outputs
Outcomes
Impacts
Sustainability
Socio-economic and cultural characteristics
of the affected populations
Note: The orange boxes are included in conventional Program Theory Models. The
addition of the blue boxes provides the recommended more complete analysis.
Education Intervention Logic
[Figure: education intervention logic diagram. Column headings as they appear: Output Clusters; Outcomes; Specific Impact; Intermediate Impacts; Global Impacts. Boxes include: Institutional Management; Curricula & Teaching Materials; Teacher Recruitment & Training; Education Facilities; Better Allocation of Educational Resources; Increased Affordability of Education; Quality of Education; Equitable Access to Education; Skills and Learning Enhancement; Improved Participation in Society; Improved Family Planning & Health Awareness; Greater Income Opportunities; Optimal Employment; Economic Growth; Social Development; Health; Poverty Reduction; MDG 1; MDG 2; MDG 3.]
Source: OECD/DAC Network on Development Evaluation
Expanding the results chain for multi-donor, multi-component program
Impacts: Increased rural H/H income; Increased political participation; Improved education performance; Improved health
Intermediate outcomes: Increased production; Access to off-farm employment; Increased school enrolment; Increased use of health services
Outputs: Credit for small farmers; Rural roads; Schools; Health services
Inputs: Donor; Government; Other donors
Attribution gets very difficult! Consider plausible contributions each makes.
TIME FOR SMALL
GROUP DISCUSSION
1. Self-introductions
2. What constraints of
these types have you
faced in your evaluation
practice?
3. How did you cope with
them?
Where there was
no baseline
Ways to reconstruct baseline
conditions
A. Secondary data
B. Project records
C. Recall
D. Key informants
E. Participatory methods
Ways to reconstruct baseline
conditions
E. PRA (Participatory Rapid Appraisal)
and PLA (Participatory Learning and
Action) and other participatory
techniques such as timelines and
critical incidents to help establish the
chronology of important changes in the
community
Assessing the utility of potential
secondary data
• Reference period
• Population coverage
• Inclusion of required indicators
• Completeness
• Accuracy
• Free from bias
Examples of secondary data to
reconstruct baselines
• Census
• Other surveys by government agencies
• Special studies by NGOs, donors
• University research studies
• Mass media (newspapers, radio, TV)
• External trend data that might have been
monitored by implementing agency
Using internal project records
Types of data
• Feasibility/planning studies
• Application/registration forms
• Supervision reports
• Management Information System (MIS) data
• Meeting reports
• Community and agency meeting minutes
• Progress reports
• Construction, training and other
implementation records, including costs
Assessing the reliability of project records
• Who collected the data and for what purpose?
• Were they collected for record-keeping or to influence policymakers or other groups?
• Do monitoring data only refer to project activities or do they also cover changes in outcomes?
• Were the data intended exclusively for internal use? For use by a restricted group? Or for public use?
88
Typical kinds of information for which we try to reconstruct baseline data
• School attendance and time/cost of travel
• Sickness/use of health facilities
• Income and expenditures
• Community/individual knowledge and skills
• Social cohesion/conflict
• Water usage/quality/cost
• Periods of stress
• Travel patterns
89
Limitations of recall
• Generally not reliable for precise quantitative data
• Sample selection bias
• Deliberate or unintentional distortion
• Few empirical studies (except on expenditure) to help adjust estimates
90
Sources of bias in recall
• Who provides the information
• Under-estimation of small and routine expenditures
• “Telescoping” of recall concerning major expenditures
• Distortion to conform to accepted behavior:
    • Intentional or unconscious
    • Romanticizing the past
    • Exaggerating (e.g. “We had nothing before this project came!”)
• Contextual factors:
    • Time intervals used in question
    • Respondents’ expectations of what interviewer wants to know
• Implications for the interview protocol
91
Improving the validity of recall
• Conduct small studies to compare recall with survey or other findings.
• Ensure all relevant groups interviewed
• Triangulation
• Link recall to important reference events
    • Elections
    • Drought/flood/tsunami/war/displacement
    • Construction of road, school etc.
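Comparing recall with survey or other findings can be as simple as measuring how far recalled values drift from recorded ones. The following is a minimal sketch in Python, not a method prescribed by the workshop: the function name `recall_bias` and the household figures are hypothetical and only illustrate the calculation.

```python
from statistics import mean

def recall_bias(recalled, recorded):
    """Return (mean difference, mean ratio) of recalled vs. recorded values.

    A negative mean difference, or a mean ratio below 1, suggests that
    respondents tend to under-report when asked to recall.
    Recorded values must be non-zero for the ratio to be defined.
    """
    if len(recalled) != len(recorded) or not recalled:
        raise ValueError("need two equal-length, non-empty lists")
    diffs = [r - s for r, s in zip(recalled, recorded)]
    ratios = [r / s for r, s in zip(recalled, recorded)]
    return mean(diffs), mean(ratios)

# Hypothetical monthly expenditure for five households:
# what they recalled later vs. what an earlier survey recorded.
recalled = [90, 110, 80, 150, 70]
recorded = [100, 120, 100, 150, 80]

mean_diff, mean_ratio = recall_bias(recalled, recorded)
print(f"mean difference: {mean_diff}, mean ratio: {mean_ratio:.2f}")
```

A ratio estimated this way on a small sub-sample could inform how much weight to give recall data elsewhere, always with the caveat that the sub-sample may not be representative.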
92
Key informants
• Not just officials and high status people
• Everyone can be a key informant on their own situation:
    • Single mothers
    • Factory workers
    • Users of public transport
    • Sex workers
    • Street children
93
Guidelines for key-informant analysis
• Triangulation greatly enhances validity and understanding
• Include informants with different experiences and perspectives
• Understand how each informant fits into the picture
• Employ multiple rounds if necessary
• Carefully manage ethical issues
94
PRA and related participatory techniques
• PRA (Participatory Rapid Appraisal) and PLA (Participatory Learning and Action) techniques collect data at the group or community [rather than individual] level
• Can either seek to identify consensus or identify different perspectives
• Risk of bias:
    • If only certain sectors of the community participate
    • If certain people dominate the discussion
95
Summary of issues in baseline reconstruction
• Variations in reliability of recall
• Memory distortion
• Secondary data not easy to use
• Secondary data incomplete or unreliable
• Key informants may distort the past
96
Enough of our
presentations: it’s time
for you (THE
RealWorld PEOPLE!)
to get involved
yourselves.
Time for small-group work. Read
your case studies
and begin your
discussions.
99
Small group case study work
1. Some of you are playing the role of evaluation consultants, others are clients commissioning the evaluation.
2. Decide what your group will propose to do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group (later this afternoon).
The purpose of this exercise is to gain some practical ‘feel’ for applying what
we’ve learned about RealWorld Evaluation.
Group A (consultants): the evaluation team will consider how to propose a
revised evaluation design and plan that reduces the budget by 25% to 50%,
and yet meets the needs of both clients (City Housing Department and the
international donor).
Group B (clients) will also review the initial proposal in light of what they have
learned about RealWorld Evaluation, and prepare to re-negotiate the plans
with the consultancy group. Note: there are two types of clients: the Housing
Department (project implementers) and the international donor (foundation).
Groups are given 45 minutes to prepare their cases.
Later the Consultants’ group will meet with Clients’ groups to negotiate their
proposed revisions in the plans for this evaluation. 45 minutes will be available
for those negotiation sessions.
Time for consultancy
teams to meet with
clients to negotiate the
revised ToRs for the
evaluation of the
housing project.
103
What did you learn about
RealWorld Evaluation by
getting involved in that
case study and role-playing exercise?
104
Wrap-up of Day One
1. How are you feeling about this workshop at this point? Tired? Energized?!!
2. What would you want us to do differently tomorrow?
3. Please do read the Nepal case study as we will be using it as our practical exercise tomorrow.
107
Reflections on
Day One
Mixed-method
evaluations
Quantitative data collection methods
• Structured surveys (household, farm, transport usage, etc.)
• Structured observation
• Anthropometric methods
• Aptitude and behavioral tests
• Indicators that can be counted
110
Quantitative data collection methods
Strengths and weaknesses
Strengths
• Generalization; statistically representative
• Estimate magnitude and distribution of impacts
• Clear documentation of methods
• Standardized approach
• Statistical control of bias and external factors
Weaknesses
• Surveys cannot capture many types of information
• Do not work for difficult to reach groups
• No analysis of context
• Survey situation may alienate
• Long delay in obtaining results
• Data reduction loses information
111
Qualitative data collection methods
Interviewing
• Structured
• Semi-structured
• Unstructured
• Focus groups
• Community interviews
• PRA
• Audio recording
Observation
• Participant observation
• Structured observation
• Unstructured observation
• Photography and video recording
Analysis of Documents and Artifacts
• Project documents
• Published reports
• E-mail
• Legal documents:
    • birth and death certificates
    • property transfer documents
    • marriage certificates
• Posters
• Decorations in the house
• Clothing and gang insignia
112
Qualitative data collection methods
Characteristics
• The researcher’s perspective is an integral part of what is recorded about the social world
• Scientific detachment is not possible
• Meanings given to social phenomena and situations must be understood
• Programs cannot be studied independently of their context
• Difficult to define clear cause and effect
• Change must be studied holistically
113
Using Qualitative methods to improve the Evaluation design and results
• Use recall to reconstruct the pre-test situation
• Interview key informants to identify other changes in the community or in gender relations
• Conduct interviews or focus groups with women and men to
    • assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making
    • identify other important results or unintended consequences:
        • increase in women’s work load,
        • increase in incidence of gender-based or domestic violence
114
It should NOT be a fight between pure
QUALITATIVE
(verbiage alone)
Quantoid!
OR
QUANTITATIVE
(numbers alone)
Qualoid!
115
“Your numbers look impressive, but let me tell you the human interest story.”
“Your human interest story sounds nice, but let me show you the statistics.”
116
What’s needed is the right combination of
BOTH QUALITATIVE methods
AND QUANTITATIVE methods
117
Mixed method evaluation designs
• Combine the strengths of both QUANT and QUAL approaches
• One approach (QUANT or QUAL) is often dominant and the other complements it
• Can give both approaches equal weight, but this is harder to design and manage
• Can be used sequentially or concurrently
118
Mixed method evaluation designs
How quantitative and qualitative methods complement each other
A. Broaden the conceptual framework
• Combining theories from different disciplines
• Exploratory QUAL studies can help define framework
B. Combine generalizability with depth and context
• Random [QUANT] subject selection ensures representativity and generalizability
• [QUAL] Case studies, focus groups etc. can help understand the characteristics of the different groups selected in the sample
C. Permit access to difficult to reach groups [QUAL]
• PRA, focus groups, case studies etc. can be effective ways to reach women, ethnic minorities and other vulnerable groups
• Direct observation can provide information on groups difficult to interview, for example those in informal sector and illegal economic activities
D. Enable Process analysis [QUAL]
• Observation, focus groups and informal conversations are more effective for understanding group processes or interaction between people and public agencies, and studying the organization
119
Mixed method evaluation designs
How quantitative and qualitative methods complement each other (cont.)
E. Analysis and control for underlying structural factors [QUANT]
• Sampling and statistical analysis can avoid misleading conclusions
• Propensity scores and multivariate analysis can statistically control for differences between project and control groups
Example:
• Meetings with women may suggest gender biases in local firms’ hiring practices; however,
• Using statistical analysis to control for years of education or experience may show there are no differences in hiring policies for workers with comparable qualifications
Example:
• Participants who volunteer to attend a focus group may be strongly in favor of or opposed to a certain project, but
• A rapid sample survey may show that most community residents have different views
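The hiring example can be made concrete with a small stratified comparison. This is an illustrative sketch only: the applicant counts are invented, and `hiring_rates` and `make_records` are hypothetical helper names. It shows how a pooled gap between women and men can disappear once applicants are compared within the same education level.

```python
from collections import defaultdict

def hiring_rates(records, by_education=False):
    """Hiring rate per gender, optionally within each education level.

    Each record is a (gender, education, was_hired) tuple.
    """
    hired = defaultdict(int)
    total = defaultdict(int)
    for gender, education, was_hired in records:
        key = (education, gender) if by_education else gender
        total[key] += 1
        hired[key] += int(was_hired)
    return {key: hired[key] / total[key] for key in total}

def make_records(gender, education, n_applicants, n_hired):
    """Build hypothetical applicant records: the first n_hired are hired."""
    return [(gender, education, i < n_hired) for i in range(n_applicants)]

# Invented data: women applicants mostly have primary education,
# men applicants mostly have secondary education.
records = (
    make_records("women", "primary", 80, 16)
    + make_records("women", "secondary", 20, 12)
    + make_records("men", "primary", 20, 4)
    + make_records("men", "secondary", 80, 48)
)

pooled = hiring_rates(records)
stratified = hiring_rates(records, by_education=True)
print(pooled)       # pooled rates differ sharply between women and men
print(stratified)   # rates within each education level are identical
```

In this invented case the pooled difference reflects who applied with which qualifications rather than how firms treated comparable applicants. Real data would rarely be this clean, and a difference in access to education may itself be the gender issue worth investigating.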
120
Mixed method evaluation designs
How quantitative and qualitative methods complement each other (cont.)
F. Triangulation and consistency checks
• Direct observation may identify inconsistencies in interview responses
• Examples:
    • A family may say they are poor but observation shows they have new furniture, good clothes etc.
    • A woman may say she has no source of income, but an early morning visit may show she operates an illegal beer brewing business
G. Broadening the interpretation of findings:
• Combining personal experience with “social facts”
• Statistical analysis frequently includes unexpected or interesting findings which cannot be explained through the statistics. Rapid follow-up visits may help explain the findings
121
Mixed method evaluation designs
How quantitative and qualitative methods
complement each other (cont.)
H. Interpreting findings
Example:
• A QUANT survey of community water management in
Indonesia found that with only one exception all village water
supply was managed by women
• Follow-up [QUAL] visits found that in the one exceptional
village women managed a very profitable dairy farming
business – so men were willing to manage water to allow
women time to produce and sell dairy produce
Source: Brown (2000)
122
Determining appropriate precision and mix of multiple methods
[Figure: data collection methods (nutritional measurements, HH surveys, focus groups, key informant interviews, large group) placed along two dimensions: from “Low rigor, questionable quality, quick and cheap” to “High rigor, high quality, more time & expense”, and from “Participatory --- Qualitative” to “Extractive --- Quantitative”]
Participatory approaches should be
used as much as possible
but even they should be used with appropriate
rigor: how many (and which) people’s
perspectives contributed to the story?
124
Determining
Counterfactuals
Attribution and counterfactuals
How do we know if the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.
126
The Counterfactual

What change would have occurred in
the relevant condition of the target
population if there had been no
intervention by this project?
127
Where is the counterfactual?
After families had been living
in a new housing project for
3 years, a study found
average household income
had increased by 50%
Does this show that housing is
an effective way to raise
income?
128
Comparing the project with two possible comparison groups
[Figure: income (vertical axis, ticks at 250, 500 and 750) plotted from 2004 to 2009 for the project group and two possible comparison groups]
• Project group: 50% increase
• Scenario 1: No increase in comparison group income. Potential evidence of project impact
• Scenario 2: 50% increase in comparison group income. No evidence of project impact
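The logic of the two scenarios can be worked through as a simple difference-in-differences calculation. This is a minimal sketch under stated assumptions: the project group is taken to rise from 500 to 750 (consistent with the 50% increase shown), and both comparison groups are assumed to start at 500, which the figure does not confirm. The function names are hypothetical.

```python
def percent_change(before, after):
    """Percentage change from the 'before' value to the 'after' value."""
    return 100.0 * (after - before) / before

def difference_in_differences(project, comparison):
    """Change in the project group minus change in the comparison group.

    Each argument is a (before, after) pair of average incomes.
    """
    project_change = project[1] - project[0]
    comparison_change = comparison[1] - comparison[0]
    return project_change - comparison_change

# Assumed values, chosen to match the 50% increase in the figure.
project = (500, 750)
scenario_1 = (500, 500)   # comparison group income did not change
scenario_2 = (500, 750)   # comparison group income also rose by 50%

print(percent_change(*project))                        # 50.0
print(difference_in_differences(project, scenario_1))  # 250
print(difference_in_differences(project, scenario_2))  # 0
```

The before-and-after change in the project group is identical in both cases. Only the comparison group tells us whether any of that change can plausibly be attributed to the project, and even then only if the two groups were similar to begin with.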
Control group and comparison group
• Control group = randomized allocation of subjects to project and non-treatment group
• Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)
130
IE Designs: Experimental Designs
Randomized Control Trials
Eligible individuals, communities, schools
etc are randomly assigned to either:
• The project group (that receives the services)
or
• The control group (that does not have access
to the project services)
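Random assignment itself is mechanically simple, as the following sketch suggests. It assumes a list of eligible units and an even split between the two groups; the function name `randomize`, the community labels and the seed value are all hypothetical. A fixed seed is used only so that the allocation can be reproduced and audited.

```python
import random

def randomize(units, seed=None):
    """Randomly split eligible units into a project group and a control group."""
    rng = random.Random(seed)   # local generator, so the result is reproducible
    shuffled = list(units)      # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Twenty hypothetical eligible communities.
communities = [f"community_{i:02d}" for i in range(1, 21)]
project_group, control_group = randomize(communities, seed=2012)

print(len(project_group), len(control_group))  # 10 10
```

The hard part in the RealWorld is rarely the draw itself but getting agreement that eligible units may be left without services, and then keeping the two groups intact over the life of the project.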
131
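A graphical illustration of an ‘ideal’ counterfactual using pre-project trend line then RCT
[Figure: primary outcome (vertical axis) over time (horizontal axis). Subjects are randomly assigned either to the treatment group or to the control group at the point of intervention; the gap that opens between the two groups’ outcomes is the IMPACT]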
132
There are other methods for assessing the counterfactual
• Reliable secondary data that depicts relevant trends in the population
• Longitudinal monitoring data (if it includes non-reached population)
• Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc.
133
Ways to reconstruct comparison groups
• Judgmental matching of communities
• When there is phased introduction of project services, beneficiaries entering in later phases can be used as “pipeline” comparison groups
• Internal controls when different subjects receive different combinations and levels of services
134
Using propensity scores and other methods to strengthen comparison groups
• Propensity score matching
• Rapid assessment studies can compare characteristics of project and comparison groups using:
    • Observation
    • Key informants
    • Focus groups
    • Secondary data
    • Aerial photos and GIS data
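For readers unfamiliar with propensity score matching, the matching step can be sketched as follows. A propensity score is the estimated probability that a unit participates in the project given its observed characteristics, typically obtained from a model such as logistic regression. The sketch below assumes the scores have already been estimated; the scores, unit labels, the `caliper` value and the function name `match_on_propensity` are all hypothetical, and greedy one-to-one matching is only one of several possible matching rules.

```python
def match_on_propensity(project_scores, comparison_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Both arguments map a unit id to its estimated propensity score.
    A project unit stays unmatched if no remaining comparison unit
    lies within `caliper` of its score.
    Returns a list of (project_id, comparison_id) pairs.
    """
    available = dict(comparison_scores)
    pairs = []
    for pid, p_score in sorted(project_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining comparison unit by absolute score distance.
        cid = min(available, key=lambda c: abs(available[c] - p_score))
        if abs(available[cid] - p_score) <= caliper:
            pairs.append((pid, cid))
            del available[cid]
    return pairs

# Hypothetical scores for three project and four comparison households.
project_scores = {"P1": 0.62, "P2": 0.35, "P3": 0.90}
comparison_scores = {"C1": 0.60, "C2": 0.33, "C3": 0.50, "C4": 0.10}

pairs = match_on_propensity(project_scores, comparison_scores)
print(pairs)  # [('P2', 'C2'), ('P1', 'C1')]; P3 has no close match
```

Unit P3 is left unmatched because no comparison household resembles it closely enough. Matching of this kind balances the groups only on the characteristics that went into the score, so it cannot remove differences that were never measured.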
135
Issues in reconstructing comparison groups
• Project areas often selected purposively and difficult to match
• Differences between project and comparison groups make it difficult to assess whether outcomes were due to project interventions or to these initial differences
• Lack of good data to select comparison groups
• Contamination (good ideas tend to spread!)
• Econometric methods cannot fully adjust for initial differences between the groups [unobservables]
136
What experiences have you had with
identifying counterfactual data?
137
More holistic
approaches to
Impact Evaluation
139
Let’s talk about the
challenges of conducting
impact evaluations in the
real world.
140
Some recent developments in
impact evaluation in development
2003
2006
J-PAL is best understood as a network of
affiliated researchers … united by their use of
the randomized trial methodology…
2008
Impact Evaluation for
Improving Development –
3ie and AfrEA conference in
Cairo March 2009
2012 Stern, E., N. Stame, J. Mayne, K.
Forss, R. Davies and B. Befani.
Broadening the Range of Designs
and Methods for Impact Evaluations
DFID Working Paper 38
2009
141
So, are we saying that Randomized Control Trials
(RCTs) are the Gold Standard and should be used
in most if not all program impact evaluations?
Yes or no?
Why or why not?
If so, under what circumstances
should they be used?
If not, under what circumstances
would they not be appropriate?
142
Different lenses needed for different situations in the RealWorld
Simple: Following a recipe
• Recipes are tested to assure easy replication
• The best recipes give good results every time
Complicated: Sending a rocket to the moon
• Sending one rocket to the moon increases assurance that the next will also be a success
• There is a high degree of certainty of outcome
Complex: Raising a child
• Raising one child provides experience but is no guarantee of success with the next
• Uncertainty of outcome remains
Sources: Westley et al (2006) and Stacey (2007), cited in Patton 2008;
also presented by Patricia Rogers at Cairo impact conference 2009.
143
Evidence-based policy for simple interventions (or simple aspects): when RCTs may be appropriate
• Question needed for evidence-based policy: What works?
• What interventions look like: Discrete, standardized intervention
• Logic model: Simple, direct cause → effect
• How interventions work: Pretty much the same everywhere
• Process needed for evidence uptake: Knowledge transfer
Adapted from Patricia Rogers, RMIT University
144
When might rigorous evaluations of higher-level “impact” indicators not be needed?
• Complicated, complex programs where there are multiple interventions by multiple actors
• Projects working in evolving contexts (e.g. conflicts, natural disasters)
• Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher level “vision statements” (as is often the case in the RealWorld of international development projects)
145
When might rigorous evaluations of higher-level “impact” indicators not be needed?
• An approach evaluators might take is that if the correlation between intermediary effects (outcomes) and higher-level impact has been adequately established through research and previous evaluations, then assessing intermediary outcome-level indicators might suffice, as long as the contexts (internal and external conditions) can be shown to be sufficiently similar to where such cause-effect correlations have been tested.
146
Examples of cause-effect correlations
that are generally accepted
• Vaccinating young children with a standard
set of vaccinations at prescribed ages leads to
reduction of childhood diseases (means of
verification involves viewing children’s health charts,
not just total quantity of vaccines delivered to clinic)
• Other examples … ?
147
But look at examples of what kinds of interventions
have been “rigorously tested” using RCTs
• Conditional cash transfers
• The use of visual aids in Kenyan schools
• Deworming children (as if that’s all that’s needed
to enable them to get a good education)
• Note that this kind of research is based on
the quest for Silver Bullets – simple, most
cost-effective solutions to complex problems.
148
“Far better an approximate answer to
the right question, which is often vague,
than an exact answer to the wrong
question, which can always be made
precise.”
J. W. Tukey (1962, page 13), "The future of data analysis".
Annals of Mathematical Statistics 33(1), pp. 1-67.
Quoted by Patricia Rogers, RMIT University
149
“An expert is someone who knows
more and more about less and less
until he knows absolutely everything
about nothing at all.”*
*Quoted by a friend; also available at www.murphys-laws.com
150
Is that what we call “scientific method”?
There is much more to impact, to rigor,
and to “the scientific method” than
RCTs. Serious impact evaluations
require a more holistic approach.
151
[Diagram: a results hierarchy contrasting “A Simple RCT” with “A more comprehensive design”]
DESIRED IMPACT
OUTCOME 1, OUTCOME 2, OUTCOME 3
OUTPUT 2.1, OUTPUT 2.2, OUTPUT 2.3
Intervention 2.2.1, Intervention 2.2.2, Intervention 2.2.3
Consequences (shown three times alongside the hierarchy)
There can be validity problems with RCTs
• Internal validity
    Quality issues: poor measurement, poor adherence to randomisation, inadequate statistical power, ignored differential effects, inappropriate comparisons, fishing for statistical significance, differential attrition between control and treatment groups, treatment leakage, unplanned cross-over, unidentified poor quality implementation
    Other issues: random error, contamination from other sources, need for a complete causal package, lack of blinding.
• External validity
    Effectiveness in real world practice, transferability to new situations
Patricia Rogers, RMIT University
153
The limited use of strong evaluation designs
In the RealWorld (at least of international development programs) we estimate that:
• fewer than 5%-10% of project impact evaluations use strong experimental or even quasi-experimental designs
• significantly fewer than 5% use randomized control trials (‘pure’ experimental design)
154
What kinds of evaluation designs are actually used in the real world of international development? Findings from meta-evaluations of 336 evaluation reports of an INGO.
• Post-test only: 59%
• Before-and-after: 25%
• With-and-without: 15%
• Other counterfactual: 1%
Rigorous impact evaluation should
include (but is not limited to):
1) thorough consultation with and
involvement by a variety of stakeholders,
2) articulating a comprehensive logic model
that includes relevant external influences,
3) getting agreement on desirable ‘impact
level’ goals and indicators,
4) adapting evaluation design as well as data
collection and analysis methodologies to
respond to the questions being asked, …
Rigorous impact evaluation should
include (but is not limited to):
5) adequately monitoring and
documenting the process throughout the
life of the program being evaluated,
6) using an appropriate combination of
methods to triangulate evidence being
collected,
7) being sufficiently flexible to account
for evolving contexts, …
Rigorous impact evaluation should
include (but is not limited to):
8) using a variety of ways to determine
the counterfactual,
9) estimating the potential sustainability
of whatever changes have been
observed,
10) communicating the findings to
different audiences in useful ways,
11) etc. …
The point is that the list of
what’s required for ‘rigorous’
impact evaluation goes way
beyond initial randomization
into treatment and ‘control’
groups.
To attempt to conduct an impact evaluation of
a program using only one pre-determined tool
is to suffer from myopia, which is unfortunate.
On the other hand, to prescribe to donors and
senior managers of major agencies that there
is a single preferred design and method for
conducting all impact evaluations can have, and has
had, unfortunate consequences for all of those
who are involved in the design,
implementation and evaluation of international
development programs.
We must be careful that in using the
“Gold Standard”
we do not violate the “Golden Rule”:
“Judge not that you not be judged!”
In other words:
“Evaluate others as you would have
them evaluate you.”
Caution: Too often what is called Impact Evaluation is based
on a “we will examine and judge you” paradigm. When we
want our own programs evaluated we prefer a more holistic
approach.
How much more helpful it is when the approach to
evaluation is more like holding up a mirror to help people
reflect on their own reality: facilitated self-evaluation.
To use the language of the OECD/DAC, let’s be sure our
evaluations are consistent with these criteria:
RELEVANCE: The extent to which the aid activity is suited to
the priorities and policies of the target group, recipient and
donor.
EFFECTIVENESS: The extent to which an aid activity attains
its objectives.
EFFICIENCY: Efficiency measures the outputs – qualitative
and quantitative – in relation to the inputs.
IMPACT: The positive and negative changes produced by a
development intervention, directly or indirectly, intended or
unintended.
SUSTAINABILITY is concerned with measuring whether the
benefits of an activity are likely to continue after donor
funding has been withdrawn. Projects need to be
environmentally as well as financially sustainable.
The bottom line is defined by this
question:
Are our programs making plausible
contributions towards positive
impact on the quality of life of our
intended beneficiaries?
Let’s not forget them!
ANY QUESTIONS?
166
Threats to Validity
Checklists
1. What is validity
and why does it
matter?
Defining validity
The degree to which the evaluation findings and recommendations are supported by:
• The conceptual framework describing how the project is supposed to achieve its objectives
• Statistical techniques (including sample design)
• How the project and the evaluation were implemented
• The similarities between the project population and the wider population to which findings are generalized
169
Why validity is important
Evaluations provide recommendations for future decisions and action. If the findings and interpretation are not valid:
• Programs which do not work may continue or even be expanded
• Good programs may be discontinued
• Priority target groups may not have access or benefit
170
RWE quality control goals
• The evaluator must achieve the greatest possible methodological rigor within the limitations of a given context
• Standards must be appropriate for different types of evaluation
• The evaluator must identify and control for methodological weaknesses in the evaluation design.
• The evaluation report must identify methodological weaknesses and how these affect generalization to broader populations.
171
2. General guidelines for assessing the validity of all evaluation designs
A. Confirmability
B. Reliability
C. Credibility
D. Transferability
E. Utilization
172
A. Confirmability
Are the conclusions drawn from the available evidence
and is the research relatively free of researcher
bias?
Examples:
A-1: Inadequate documentation of methods and
procedures
A-2: Is data presented to support the conclusions and
are the conclusions consistent with the findings?
[Compare the executive summary with the data in
the main report]
173
B. Reliability
Is the process of the study consistent, reasonably
stable over time and across researchers and
methods?
Examples:
B-1: Data was only collected from people who
attended focus groups or community meetings
B-2: Were coding and quality checks made and
did they show agreement?
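One common way to check whether coding “showed agreement” is Cohen’s kappa, which compares the agreement observed between two coders with the agreement expected by chance. The workshop materials do not specify a statistic, so this is offered only as one option; the coded responses below are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's share of every category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical coding of ten interview excerpts as positive or negative.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
coder_b = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))  # 0.6
```

Here the coders agree on 8 of 10 excerpts (80%), but half of that agreement would be expected by chance alone, giving a kappa of 0.6. What counts as acceptable agreement is a judgment call that depends on the stakes of the evaluation.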
174
C. Credibility
Are the findings credible to the people studied and to
readers? Is there an authentic picture of what is
being studied?
Examples:
C-1: Is there sufficient information to provide a credible
description of the subjects or situations studied?
C-2: Was triangulation among methods and data
sources systematically applied? Were findings
generally consistent? What happened if they were
not?
175
D. Transferability
Do the conclusions fit other contexts and how
widely can they be generalized?
Examples:
D-1: Are the characteristics of the sample
described in enough detail to permit
comparisons with other samples?
D-2: Does the report present enough detail for
readers to assess potential transferability?
176
E. Utilization
Were findings useful to clients,
researchers and communities studied?
Examples:
E-1: Were findings intellectually and
physically accessible to potential
users?
E-2: Do the findings provide guidance for
future action?
177
3. Additional threats to validity for Quasi-Experimental Designs [QED]
F. Threats to statistical conclusion validity: why inferences about statistical association between two variables (for example project intervention and outcome) may not be valid
G. Threats to internal validity: why assumptions that project interventions have caused observed outcomes may not be valid
H. Threats to construct validity: why selected indicators may not adequately describe the constructs and causal linkages in the evaluation model
I. Threats to external validity: why assumptions about the potential replicability of a project in other locations or with other groups may not be valid
178
Threats to Validity Checklists
[See Appendices A-E of RWE book,
pp. 490-555]
• Appendix A: Worksheet for quantitative designs
• Appendix B: Worksheet for qualitative designs
• Appendix C: Worksheet for standard mixed-method
designs
• Appendix D: Example of completed Threats to Validity
worksheet
• Appendix E: Worksheet for advanced mixed-method
designs
185
1. Let’s talk about what
we’re learning.
2. What are some of the
practical realities of
applying RWE
approaches?
192
Nepal case study exercise
1. Imagine that you have been contracted as a consultant to the Norad Evaluation Department to serve in the role of expert advisor to help guide the planning for and reporting of this evaluation.
2. As you read the ToR (Terms of Reference) [extracted from 205-page full report] what suggestions would you make for improving the ToR (making it more realistic), based on what you’ve learned about RealWorld Evaluation designs and methodologies?
3. Subsequently, upon reading summary descriptions of the evaluation report (Part II), what feedback would you provide to those who conducted the evaluation?
4. From what you have learned from this evaluation, what advice would you give to the Norad Evaluation Department (or your own agency) about designing future evaluations of this kind?
193
Next steps: How are we going to apply these
things as we go forth and practice evaluation in
our own Real Worlds?!
194
Workshop wrap-up and
evaluation.
195
Main workshop messages
1. Evaluators must be prepared for RealWorld evaluation challenges.
2. There is considerable experience to learn from.
3. A toolkit of practical “RealWorld” evaluation techniques is available (see www.RealWorldEvaluation.org).
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology.
5. A “threats to validity” checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis.
196
197
Additional References for IE
• DFID “Broadening the range of designs and methods for impact evaluations”
http://www.oecd.org/dataoecd/0/16/50399683.pdf
• Robert Picciotto “Experimentalism and development evaluation: Will the bubble burst?” in Evaluation journal (EES) April 2012:
http://evi.sagepub.com/
• Martin Ravallion “Should the Randomistas Rule?” (Economists’ Voice 2009)
http://ideas.repec.org/a/bpj/evoice/v6y2009i2n6.html
• William Easterly “Measuring How and Why Aid Works—or Doesn't”
http://aidwatchers.com/2011/05/controlled-experiments-and-uncontrollablehumans/
• Control freaks: Are “randomised evaluations” a better way of doing aid and development policy? The Economist June 12, 2008
http://www.economist.com/node/11535592
• Series of guidance notes (and webinars) on impact evaluation produced by InterAction (consortium of US-based INGOs):
http://www.interaction.org/impact-evaluation-notes
• Evaluation (EES journal) Special Issue on Contribution Analysis, Vol. 18, No. 3, July 2012. www.evi.sagepub.com
• Impact Evaluation In Practice www.worldbank.org/ieinpractice
198