Evaluation Designs - Carolina Population Center

Monitoring and Evaluation:
Evaluation Designs
Objectives of the Session
By the end of this session, participants will be able
to:
• Understand the purpose, strengths, and
shortcomings of different study designs
• Distinguish between study designs that enable us
to causally link program activities to observed
changes and study designs that do not
• Link evaluation designs to the types of decisions
that need to be made
Causality Requirements
• A precedes B.
• B is present only when A is present.
• We can rule out all other possible causes of B.
The Basic Experimental Principle
• The intervention is the only difference between two
groups
• This is achieved by random assignment
Class Activity
Can you name situations in which random
assignment can be used in evaluation?
An Experimental Design
RA   Experimental group:   O1   X   O2
RA   Control group:        O3        O4

(RA = random assignment; O = observation/measurement; X = intervention)
An Experimental Design-Cont’d.
• In this design, there are two groups, an experimental group and a control group. Both are formed by random assignment, and both complete the pre-test. Only the experimental group gets the intervention; then both groups complete the post-test.
An Experimental Design-Cont’d.
Steps
1. Identify people or groups, some of whom could receive the intervention.
2. Pre-test everyone.
3. Randomly assign participants to either the control group or the experimental group.
4. Deliver the intervention to the experimental group. The
control group may receive an alternative intervention or
nothing at all.
5. Post-test both groups with the same instrument under the same conditions. (A simple illustration of these steps follows.)
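A minimal sketch of these five steps in Python, using invented numbers (the 200 participants, the +5-point intervention effect, and the noise levels are all assumptions for demonstration, not part of the original slides):

```python
import random
import statistics

# Steps 1-2: hypothetical participants, each with a pre-test score.
participants = [{"id": i, "pre": random.gauss(50, 10)} for i in range(200)]

# Step 3: random assignment, so the intervention is the only systematic
# difference between the two groups.
random.shuffle(participants)
experimental, control = participants[:100], participants[100:]

# Step 4: simulate the intervention as an assumed +5-point effect for the
# experimental group; the control group receives nothing.
for p in experimental:
    p["post"] = p["pre"] + 5 + random.gauss(0, 3)
for p in control:
    p["post"] = p["pre"] + random.gauss(0, 3)

# Step 5: post-test both groups and compare mean pre-to-post change;
# under random assignment this difference estimates the program effect.
def mean_gain(group):
    return statistics.mean(p["post"] - p["pre"] for p in group)

print(f"Estimated program effect: {mean_gain(experimental) - mean_gain(control):.2f}")
```

Because assignment is random, pre-existing group differences are ruled out on average; dropout, instrumentation, and testing effects (next slide) remain the main threats.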
Factors that May Lead Us to Make
Invalid Conclusions
• Dropout: There may be loss to follow-up.
• Instrumentation effects: Occur when a
questionnaire is changed between pre-test and
post-test.
• Testing effects: Occur because study participants
remember questions that were asked of them at
pre-test and perform better at post-test because
they are familiar with the questions.
A Second Experimental Design
RA   Experimental group:   X   O2
RA   Control group:            O4
A Second Experimental Design-Cont’d
• In this design, experimental and control groups are
formed; however, there is no pre-test. Instead, the
experimental group gets the intervention and then
both groups are measured at the end of the
program.
A Non-Experimental Design
Time →
Experimental group:   O1   X   O2
A Non-Experimental Design-Cont’d
• In this method of evaluation, only people who are participating in the program get the pre- and post-test.
Steps
1. Pre-test everyone in the program.
2. Deliver the intervention.
3. Post-test the same individuals.
This design does not provide any information about what
kinds of results might have occurred without the
program and is the weakest in terms of scientific rigor.
Another Factor that May Lead to
Invalid Conclusions
• History effects: These occur when extraneous
events (events that occur outside the study)
influence study-measured outcomes.
A Second Non-Experimental Design
Time →
Experimental group:   O1   O2   O3   X   O4   O5   O6
A Second Non-Experimental Design-Cont’d
• For this design, a survey is administered multiple times: before, during, and after the program.
A Second Non-Experimental Design-Cont’d
Steps
1. Select a program-outcome measure that can be used
repeatedly.
2. Decide who will be in the experimental group. Will it be
the same group of people measured many times, or will
it be successive groups of different people?
3. Collect at least three measurements at regular intervals prior to the intervention.
4. Check the implementation of the intervention.
5. Continue to collect measurements, at least through the duration of the program. (A simple pre/post summary of such a series is sketched below.)
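One simple way to read the resulting series is to compare the average level before and after the intervention; the sketch below uses invented values for O1-O6 purely for illustration. A fuller analysis would model the pre-intervention trend (e.g., an interrupted time-series regression).

```python
import statistics

# Hypothetical outcome measurements at regular intervals; the values
# are invented for illustration only.
pre_program = [42.0, 43.5, 41.8]    # O1, O2, O3 (before the intervention)
post_program = [49.2, 50.1, 51.3]   # O4, O5, O6 (after the intervention)

# Simplest summary: compare average levels before and after. Without a
# comparison group, any change may still reflect history effects
# (outside events) rather than the program itself.
change = statistics.mean(post_program) - statistics.mean(pre_program)
print(f"Observed pre-to-post change: {change:.1f} points")
```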
A Quasi-Experimental Design
Time →
Experimental group:   O1   X   O2
---------------------------------
Comparison group:     O3        O4

(The dashed line indicates that the groups were not formed by random assignment.)
A Quasi-Experimental Design-Cont’d.
• In this design, two groups which are similar, but
which were not formed by random assignment, are
measured both before and after one of the groups
gets the program intervention.
A Quasi-Experimental Design-Cont’d.
Steps
1. Identify people who will be getting the program.
2. Identify people who are not getting the program, but who are in other ways very similar.
3. Pre-test both groups.
4. Deliver the intervention to the experimental group. The comparison group may receive an alternative intervention or nothing at all.
5. Post-test both groups. (A worked sketch of one way to analyze this design follows.)
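A common way to summarize this design is a difference-in-differences comparison, sketched below with invented group means for O1-O4 (the numbers are illustrative assumptions, not from the slides):

```python
# Hypothetical pre/post means for the two non-randomized groups.
exp_pre, exp_post = 40.0, 55.0     # O1, O2: experimental group
comp_pre, comp_post = 41.0, 46.0   # O3, O4: comparison group

# Each group's change over time: the experimental group's change mixes
# the program effect with outside trends; the comparison group's change
# reflects outside trends alone.
exp_change = exp_post - exp_pre
comp_change = comp_post - comp_pre

# Subtracting the two changes nets out shared history effects, though not
# selection effects, since the groups were not formed by random assignment.
program_effect = exp_change - comp_change
print(f"Difference-in-differences estimate: {program_effect:.1f}")
```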
Threat to Validity
• Selection effects: Occur when people selected for
a comparison group differ from the experimental
group.
Summary Features of Different Study Designs
Feature         True experiment        Quasi-experiment         Non-experimental
Applicability   Partial coverage/      Partial coverage/        Full coverage
                new programs           new programs             programs
Comparison      Control group          Comparison group         --
Rigor           Strongest design       Weaker than              Weakest design
                                       experimental design
Cost            Most expensive         Moderately expensive     Least expensive
Summary Features of Different Study Designs-Cont’d.
I.  Non-experimental (One-Group, Post-Only):
    IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM
II. Non-experimental (One-Group, Pre- and Post-Program):
    ASSESS TARGET GROUP BEFORE PROGRAM → IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM
Summary Features of Different Study Designs-Cont’d.
III. Experimental (Pre- and Post-Program with Control Group):
    RANDOMLY ASSIGN PEOPLE FROM THE SAME TARGET POPULATION TO GROUP A OR GROUP B
    Target group A:   ASSESS GROUP A BEFORE PROGRAM → IMPLEMENT PROGRAM WITH TARGET GROUP A → ASSESS GROUP A AFTER PROGRAM
    Control group B:  ASSESS GROUP B BEFORE PROGRAM → ASSESS GROUP B AFTER PROGRAM
Summary Features of Different Study Designs
IV. Quasi-Experimental (Pre- and Post-Program with Non-Randomized Comparison Group):
    Target group:      ASSESS TARGET GROUP BEFORE PROGRAM → IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM
    Comparison group:  ASSESS COMPARISON GROUP BEFORE PROGRAM → ASSESS COMPARISON GROUP AFTER PROGRAM
Summary Features of Different Study Designs-Cont’d.
• The different designs vary in their capacity to produce information that allows you to link program outcomes to program activities.
• The more confident you want to be about making these connections, the more rigorous the design and the more costly the evaluation.
• Your evaluator will help determine which design will maximize your program’s resources and answer your team’s evaluation questions with the greatest degree of certainty.
Important Issues to Consider When Choosing
a Design
• Complex evaluation designs are more costly, but allow for greater confidence in a study’s findings.
• Complex evaluation designs are more difficult to implement, and so require higher levels of expertise in research methods and analysis.
• Be prepared to encounter stakeholder resistance to the use of comparison or control groups, such as a parent wondering why his or her child will not receive a potentially beneficial intervention.
• No evaluation design is immune to threats to its validity; there is a long list of possible complications associated with any evaluation study. However, your evaluator will help you maximize the quality of your evaluation study.
Exercise
• A maternity hospital wishes to determine if the offer of postpartum family-planning methods will increase contraceptive use among women who deliver at the hospital.
• What study design would you recommend to test the hypothesis that women who are offered postpartum family-planning services are more likely to use family planning than women who are not offered services?
Exercise
• You have been asked to evaluate the impact of a
national mass-media AIDS-prevention campaign
on condom use.
• What study design would you choose and why?
Linking Evaluation Design to
Decision-Making
Deciding Upon An Appropriate
Evaluation Design
• Indicators: What do you want to measure?
  – Provision
  – Utilization
  – Coverage
  – Impact
• Type of inference: How sure do you want to be?
  – Adequacy
  – Plausibility
  – Probability
• Other factors
Source: Habicht, Victora, and Vaughan (1999)
Clarification of Terms
Type of evaluation    Indicator     Question answered
Performance or        Provision     Are the services available? Are they
process evaluation                  accessible? Is their quality adequate?
                      Utilization   Are the services being used?
                      Coverage      Is the target population being reached?
Impact evaluation     Impact        Were there improvements in disease patterns
                                    or health-related behaviors?
Clarification of Terms
Adequacy assessment
  • Did the expected changes occur? Are objectives being met? Were activities performed as planned?
  • May or may not require a before/after comparison; does not require controls.

Plausibility assessment
  • Did the program seem to have an effect above and beyond other external influences?
  • Requires a before-and-after comparison with controls and treatment of confounding factors.

Probability assessment
  • Did the program have an effect (P < x%)?
  • Determines the statistical probability that the intervention caused the effect.
  • Requires a before/after comparison with randomized controls.
Adequacy Assessment
• Adequacy studies only describe whether a condition is met
  – They typically address provision, utilization, or coverage; no controls or pre/post data are needed in such cases
    • Hypothesis tested: Are expected levels achieved?
  – They can also answer questions of impact (magnitude of change), provided pre/post data are available
    • Hypothesis tested: The difference is equal to or greater than expected
Features of Adequacy Assessment
• Simplest (and cheapest) of evaluation models, as it does
not try to control for external effects. Data are needed
only for outcomes.
• If only input or output results are needed, then the lack of
controls is not a problem.
• When measuring impact, however, it is not possible to attribute the change to the program, because there are no controls.
• Also, if there is no change, it will not be possible to say whether this is due to program inefficiency or whether the program has prevented further deterioration.
Class Activity
For each of the following outcomes of interest,
provide indicators that would be useful in the
evaluation of a program for control of diarrheal
diseases aimed at young children with emphasis on
the promotion of oral rehydration salts (ORS):
- Provision:    Are the services available? Are services accessible? Is their quality adequate?
- Utilization:  Are the services being used?
- Coverage:     Is the target population being reached?
- Impact:       Were there improvements in disease patterns or health behaviors?
Adequacy Assessment Inferences
• Are objectives being met?
– Compares program performance with previously established adequacy criteria, e.g., an 80% ORT-use rate (a small check of this kind is sketched after this slide)
– No control group
– 2+ measurements to assess adequacy of change over time
• Provision, utilization, coverage
– Are activities being performed as planned?
• Impact
– Are observed changes in health or behavior of expected
direction and magnitude?
• Cross-sectional or longitudinal
Source: Habicht, Victora and Vaughan (1999)
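The adequacy comparison itself is simple; the sketch below checks an invented observed value against the 80% ORT-use criterion mentioned above (both numbers are illustrative assumptions):

```python
# Previously established adequacy criterion and a hypothetical survey result.
target_ort_use = 0.80
observed_ort_use = 0.72   # illustrative value from a coverage survey

# Adequacy assessment: does performance meet the expected level?
if observed_ort_use >= target_ort_use:
    print("Objective met: ORT use is adequate")
else:
    shortfall = target_ort_use - observed_ort_use
    print(f"Objective not met: ORT use is {shortfall:.0%} below the 80% criterion")
```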
Class Activity
• What are the advantages of adequacy
evaluations?
• What are the limitations of adequacy
evaluations?
• If an adequacy evaluation shows a lack of
change in indicators, how can this be
interpreted?
• Which of the study designs discussed earlier
can be used for adequacy evaluations?
Plausibility Assessment Inferences (1)
• Program appears to have effect above and beyond impact
of non-program influences
• Includes control group
– Historical control group
• Compares changes in community before & after program and attempts to
rule out external factors
• Same target population
– Internal control group
• Compares groups/individuals with different intensities of exposure to
program (dose-response)
• Compares previous exposure to program between individuals with and
without the disease (case-control)
– External control group
• Compares communities/geographic areas with and without the program
• Populations that were never targeted by the intervention, but that share key characteristics with the beneficiaries
Source: Habicht, Victora and Vaughan (1999)
Plausibility Assessment Inferences (2)
• Provision, utilization, coverage
– Intervention group appears to have better performance
than control
– Cross-sectional, longitudinal, longitudinal-control
• Impact
– Changes in health/behavior appear to be more
beneficial in intervention than control group
– Cross-sectional, longitudinal, longitudinal-control, case-control
Source: Habicht, Victora and Vaughan (1999)
Controls and Confounding Factors
• For all types of controls, the groups being compared should be similar in all respects except their exposure to the intervention.
• That is almost never fully achievable, however; some factor usually influences one group more than the other (a confounding factor). E.g., a decline in diarrheal mortality may be due to better access to drinking water, not to the ORS program.
• To address this problem, confounding factors must be measured and statistically treated via matching, standardization, or multivariate analysis, as in the sketch below.
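As a sketch of one such treatment, the example below standardizes across a single measured confounder: it compares program and non-program children within each water-access stratum, then averages the stratum-specific differences. All records are invented; a real analysis would weight strata by size and usually handle several confounders at once.

```python
import statistics

# Hypothetical records: ORS program exposure, access to safe drinking
# water (the confounder), and diarrhea episodes per child-year.
records = [
    {"ors": 1, "water": 1, "episodes": 1.2}, {"ors": 1, "water": 1, "episodes": 1.0},
    {"ors": 1, "water": 0, "episodes": 2.1}, {"ors": 1, "water": 0, "episodes": 1.9},
    {"ors": 0, "water": 1, "episodes": 1.6}, {"ors": 0, "water": 1, "episodes": 1.8},
    {"ors": 0, "water": 0, "episodes": 2.8}, {"ors": 0, "water": 0, "episodes": 2.6},
]

def mean_episodes(ors, water):
    return statistics.mean(
        r["episodes"] for r in records if r["ors"] == ors and r["water"] == water
    )

# Compare ORS vs. no-ORS within each water-access stratum, then average.
# With equal-sized strata a simple average suffices; otherwise weight by size.
diffs = [mean_episodes(1, w) - mean_episodes(0, w) for w in (0, 1)]
print(f"Water-adjusted ORS effect: {statistics.mean(diffs):+.2f} episodes")
```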
Probability Assessment Inferences
• There is only a small probability that the differences between program and control areas were due to chance (P < .05); a sketch of such a test follows this slide
• Requires control group
• Requires randomization
• Often not feasible for assessing program effectiveness
– Randomization needed before program starts
– Political factors
– Scale-up
– Inability to generalize results
– Known efficacy of intervention
Source: Habicht, Victora and Vaughan (1999)
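When randomization is feasible, the probability statement can be made concrete with a significance test. The sketch below uses a simple permutation test on invented outcomes from program and control areas; both the data and the choice of test are assumptions for illustration.

```python
import random
import statistics

# Hypothetical post-program outcomes in randomized program and control areas.
program = [12.1, 9.8, 11.5, 13.0, 10.9, 12.4]
control = [9.0, 8.5, 10.2, 9.9, 8.8, 9.4]

observed = statistics.mean(program) - statistics.mean(control)

# Permutation test: if assignment were purely random and the program had
# no effect, how often would a difference this large arise by chance?
pooled = program + control
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:len(program)]) - statistics.mean(pooled[len(program):])
    if abs(diff) >= abs(observed):
        extreme += 1

print(f"Observed difference: {observed:.2f}, permutation P = {extreme / trials:.3f}")
```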
Summary
Assessment                   Objective                What it says            Data needs
Adequacy                     Assess whether           Indicates whether       Outcome data
(assessment of change        impact was reached       resources were well     collected among
in outcome)                                           spent or not            beneficiaries
Plausibility                 Understand what          Helps understand the    Outcome data plus
(before/after comparison     affects the outcomes     determinants of         confounders collected
controlling for                                       success/failure of      among beneficiaries
confounding factors)                                  the program             and controls
Probability                  Determine the causal     Establishes precise     Outcome data
(causal analysis of          effect of an             causation between       collected among
before/after differences)    intervention on the      action and effect       beneficiaries and
                             outcome                                          controls
Discuss with Decision-Makers Before
Choosing Evaluation Design
Possible Areas of Concern to Different Decision-Makers

Type of         Provision         Utilization        Coverage          Impact
evaluation
Adequacy        Health center     District health    International     --
                manager           managers           Agencies
Plausibility    International     International      Donor Agencies    Donor agencies
                Agencies          Agencies           & Scientists
Probability     --                --                 --                Scientists

Source: Habicht, Victora, and Vaughan (1999)
Evaluation Flow from Simpler to
More Complex Designs
Type of         Provision    Utilization    Coverage    Impact
Evaluation
Adequacy        1st          2nd            3rd         --
Plausibility    --           --             4th (b)     4th (a)
Probability     --           --             --          5th

Source: Habicht, Victora, and Vaughan (1999)
Key Issues to Discuss with Decision
Makers Before Choosing a Design
• Is there a need for collecting new data? If so, at
what level?
• Does the design include an intervention-control or a before-after comparison?
• How rare is the event to be measured?
• How small is the difference to be detected?
• How complex will the data analysis be?
• How much will alternative designs cost?
Source: Habicht, Victora and Vaughan (1999)
References
• Adamchak S et al. (2000). A Guide to Monitoring and
Evaluating Adolescent Reproductive Health Programs. Focus
on Young Adults, Tool Series 5. Washington, D.C.: Focus on
Young Adults.
• Fisher A et al. (2002). Designing HIV/AIDS Intervention
Studies. An Operations Research Handbook. New York: The
Population Council.
• Habicht JP et al. (1999). Evaluation Designs for Adequacy,
Plausibility, and Probability of Public Health Programme
Performance and Impact. International Journal of
Epidemiology, 28: 10-18.
• Rossi P et al. (1999). Evaluation. A Systematic Approach.
Thousand Oaks: Sage Publications.