
Defining, Conceptualizing, and Measuring
Fidelity of Implementation and Its Relationship
to Outcomes in K–12 Curriculum Intervention
Research
Prepared by
Carol O’Donnell
Institute of Education Sciences
The opinions expressed are those of the presenter and do not
necessarily represent the views of IES or the U.S. Department of
Education. This webinar is based on a paper published in the
Review of Educational Research (O'Donnell, 2008) and findings
reported in O'Donnell, 2007, which were published before the
presenter joined IES.
A little about my background…
• I was a classroom teacher for 10 years (grades 4-8) and now teach
undergraduates part-time.
– “I know it’s not always easy to teach with fidelity.”
• I was a curriculum developer for 11 years, creating curriculum
materials for science teachers.
– “I believed in the materials I developed. They were field tested. I knew
they could be effective if taught with fidelity.”
• I was a researcher at GWU for 5 years managing a large scale-up
study on the effectiveness of middle school science curriculum units.
– “How can we be certain the Treatment is being implemented as planned?”
• This led to my role as a program officer and to my Review of Educational
Research article (O’Donnell, 2008) on fidelity of implementation.
– How can I help other researchers define, conceptualize and measure
fidelity of implementation in their efficacy and effectiveness studies?
Motivation: What problems exist?
• Teachers have difficulty teaching with fidelity
when creativity, variability, and local
adaptations are encouraged.
• Developers often fail to identify the critical
components of an intervention.
• Researchers often fail to measure whether
components are delivered as intended.
What problems exist?
• If we want to determine effectiveness of an
intervention, we need to define the treatment and its
counterfactual. If “it” works, what is “it” and how do we
distinguish “it” from what is happening in the
comparison classroom?
• Most fidelity studies are correlational and do not
involve impact analysis.
• Implementation under ideal conditions (efficacy
studies) may yield higher effects than those under
routine conditions (effectiveness studies).
Sources: Lipsey, 1999; Petrosino & Soydan, 2005; Weisz, Weiss, & Donenberg, 1992
Points I Will Address In This Webinar
A. How do teachers, developers, and
researchers define fidelity of
implementation?
B. How is fidelity of implementation
conceptualized within efficacy and
effectiveness studies?
Points I Will Address In This Webinar
C. How do we measure fidelity of
implementation?
D. How do we analyze fidelity data to
determine how it impacts program
effectiveness?
E. An example from my own research (if
time).
A. How do teachers, developers, and
researchers define fidelity of
implementation?
Teachers
What does fidelity of implementation mean
to a teacher?
As a teacher, I would ask:
• “Can I modify the program to meet the needs of my
diverse students (SPED, ESOL, etc.)?”
• “How do I meet state indicators (e.g., vocabulary)
not covered by the new program?”
• “Can I use instructional practices that I typically use
in the classroom (e.g., exit cards, warm-ups) even if
they aren’t promoted by the intervention?”
• “Can I add supplemental readings?”
Source: O’Donnell, Lynch, & Merchlinsky, 2004
To a teacher, fidelity is…
• Adhering to program purpose, goals, and
objectives.
• Applying the program’s pedagogical
approaches.
• Following the program’s sequence.
• Using the recommended equipment or
materials.
• Making an adaptation to the program that
does NOT change its nature or intent.
Source: O’Donnell, Lynch, & Merchlinsky, 2004
To a teacher, fidelity is NOT…
• Reducing or modifying program objectives.
• Gradually replacing parts of the new program
with previous practices.
• Varying grouping strategies outlined in the
program.
• Changing the program’s organizational patterns.
• Substituting other curriculum materials or
lessons for those described by the program.
Source: O’Donnell, Lynch, & Merchlinsky, 2004
Developers
What does fidelity of implementation mean
to a developer?
As a developer, I would ask:
• What are the critical components of the program? If
the teacher skips part of the program, why does that
happen, and what effect will it have on outcomes?
• Is the program feasible (practical) for a teacher to
use? Is it usable (are the program goals clear)? If not,
what changes should I make to the program? What
programmatic support must be added?
• What ancillary components are part of the program
(e.g., professional development) and must be scaled up with it?
Why should developers collect fidelity of
implementation data?
• To distinguish between the effects of pre-existing
good teaching practices and those prompted by
the instructional program. If the program doesn’t
add value, why spend money on it?
• To understand why certain aspects of instructional
delivery are consistently absent, despite curricular
support (e.g., skipping lesson closure).
Source: O’Donnell, Lynch, & Merchlinsky, 2004
Researchers
What does fidelity of implementation mean to a
researcher?
• Determination of how well a program is
implemented in comparison with the original
program design during an efficacy and/or
effectiveness study (Mihalic, 2002).
• Extent to which the delivery of an intervention
adheres to the program model originally
developed; confirms that the independent
variable in outcome research occurred as
planned (Mowbray et al., 2003).
Why do researchers study fidelity of
implementation?
• To explore how effective programs might be
scaled up across many sites (i.e., if
implementation is a moving target,
generalizability of research may be imperiled).
• To gain confidence that the observed student
outcomes can be attributed to the program.
• To gauge the wide range of fidelity with which
an intervention might be implemented.
Source: Lynch, O’Donnell, Ruiz-Primo, Lee, & Songer, 2004.
Definitions: Summary
• Fidelity of implementation is:
• the extent to which a program (including its
content and process) is implemented as
designed;
• how it is implemented (by the teacher);
• how it is received (by the students);
• how long it takes to implement (duration); and,
• what it looks like when it is implemented (quality).
Questions?
B. How is fidelity of implementation
conceptualized within efficacy and
effectiveness studies?
Definition: Efficacy Study
• Efficacy is the first stage of program research
following development. Efficacy is defined as “the
ability of an intervention to produce the desired
beneficial effect in expert hands and under ideal
circumstances” (RCTs) (Dorland’s Illustrated
Medical Dictionary, 1994, p. 531).
• Failure to achieve desired outcomes in an efficacy
study "give[s] evidence of theory failure, not
implementation failure" (Raudenbush, 2003, p. 4).
Fidelity in Efficacy Studies
• Internal validity - determines that the program will result in
successful achievement of the instructional objectives,
provided the program is “delivered effectively as
designed” (Gagne et al., 2005, p. 354).
• Efficacy entails continuously monitoring and improving
implementation to ensure the program is implemented
with fidelity (Resnick et al., 2005).
• Explains why innovations succeed or fail (Dusenbury et
al., 2003);
• Helps determine which features of program are essential
and require high fidelity, and which may be adapted or
deleted (Mowbray et al., 2003).
Definition: Effectiveness Study
• Interventions with demonstrated benefit in efficacy
studies are then transferred into effectiveness studies.
• Effectiveness study is not simply a replication of an
efficacy study with more subjects and more diverse
outcome measures conducted in a naturalistic setting
(Hohmann & Shear, 2002).
• Effectiveness is defined as “the ability of an intervention
to produce the desired beneficial effect in actual use”
under routine conditions (Dorland, 1994, p. 531) where
mediating and moderating factors can be identified
(Aron et al., 1997; Mihalic, 2002; Raudenbush, 2003;
Summerfelt & Meltzer, 1998).
Fidelity in Effectiveness Studies
• External validity – fidelity in effectiveness studies
helps to generalize results and provides
“adequate documentation and guidelines for
replication projects adopting a given model”
(Mowbray et al., 2003; Bybee, 2003; Raudenbush,
2003).
• Role of developer and researcher is minimized.
• Focus is not on monitoring and controlling levels
of fidelity; instead, variations in fidelity are
measured in a natural setting and accounted for in
outcomes.
Questions?
C. How do we measure fidelity of
implementation?
Multiple Dimensions
• Adherence – Strict adherence to structural components
and methods that conform to theoretical guidelines.
• Dose (Duration) – Completeness and amount of
program delivered.
• Quality of Delivery – The way by which a program is
implemented.
• Participant Responsiveness – The degree to which
participants are engaged.
• Program Differentiation – The degree to which elements
that distinguish one type of program from another are
present or absent.
Adapted from: Dane & Schneider (1998); Dusenbury, Brannigan, Falco,
& Hansen (2003)
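One way to make these five dimensions concrete is to record them together for each observed classroom. The following is a minimal Python sketch, not drawn from the webinar; the field names and rating scales are illustrative assumptions.

```python
# Illustrative sketch only: one classroom observation scored on the five
# fidelity dimensions above. Field names and scales are assumptions.
from dataclasses import dataclass

@dataclass
class FidelityObservation:
    classroom_id: str
    adherence: float          # proportion of critical structural components delivered (0-1)
    dose: float               # amount of the program delivered vs. planned (0-1)
    quality_of_delivery: int  # e.g., holistic 0-3 rating by a trained observer
    responsiveness: int       # e.g., student engagement rating, 0-3
    differentiation: bool     # are the program's distinguishing elements present?

obs = FidelityObservation("T-101", adherence=0.85, dose=0.90,
                          quality_of_delivery=2, responsiveness=3,
                          differentiation=True)
print(obs)
```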
O’Donnell (2008): Steps in Measuring Fidelity
1. Start with curriculum profile or analysis; review
program materials and consult with developer.
Determine the intervention’s program theory.
What does it mean to teach it with fidelity?
2. Using developer’s and past implementers’ input,
outline critical components of intervention
divided by structure (adherence, duration) and
process (quality of delivery, program
differentiation, participant responsiveness) and
outline range of variations for acceptable use.
O’Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of
implementation and its relationship to outcomes in K–12 curriculum intervention
research. Review of Educational Research, 78, 33–84.
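As a hypothetical illustration of step 2, the critical components can be written out by structure and process, each with a range of acceptable variation agreed on with the developer. The component names, ranges, and units below are invented for the sketch.

```python
# Hypothetical example of a critical-components specification (step 2):
# components grouped by structure vs. process, each with an acceptable range.
critical_components = {
    "structure": {
        "adherence": {
            "lessons_taught": {"acceptable_range": (10, 12), "unit": "lessons"},
            "uses_program_materials": {"acceptable_range": (1, 1), "unit": "yes/no"},
        },
        "duration": {
            "minutes_per_lesson": {"acceptable_range": (40, 60), "unit": "minutes"},
        },
    },
    "process": {
        "quality_of_delivery": {
            "introduces_terms_meaningfully": {"acceptable_range": (2, 3), "unit": "0-3 rating"},
        },
        "participant_responsiveness": {
            "students_engaged": {"acceptable_range": (2, 3), "unit": "0-3 rating"},
        },
        "program_differentiation": {
            "distinguishing_elements_present": {"acceptable_range": (1, 1), "unit": "yes/no"},
        },
    },
}
print(critical_components["structure"]["duration"])
```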
O’Donnell (2008): Steps in Measuring Fidelity
3. Develop checklists and other instruments to
measure implementation of components (in most
cases, the unit of analysis is the classroom).
4. Collect multi-dimensional data in both treatment
and comparison conditions: questionnaires,
classroom observations, self-report, student
artifacts, interviews. Self-report data typically yields
higher levels of fidelity than observed in the field.
5. Adjust outcomes if fidelity falls outside acceptable
range.
O’Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of
implementation and its relationship to outcomes in K–12 curriculum intervention
research. Review of Educational Research, 78, 33–84.
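A minimal sketch of the bookkeeping behind steps 3-5 appears below. The checklist items, classrooms, and the 60% acceptability threshold are all assumptions made for the example; in practice the acceptable range comes from the developer and the curriculum profile.

```python
# Sketch only: score a classroom-level fidelity checklist (step 3), keep data
# from both conditions (step 4), and flag classrooms outside an assumed
# acceptable range (step 5). All data and thresholds are hypothetical.
import pandas as pd

checklist = pd.DataFrame({
    "classroom": ["T1", "T2", "C1", "C2"],
    "condition": ["treatment", "treatment", "comparison", "comparison"],
    "item_1": [1, 1, 0, 1],        # dichotomous adherence item (0/1)
    "item_2": [1, 0, 0, 0],        # dichotomous adherence item (0/1)
    "quality_1": [3, 2, 1, 1],     # polytomous quality-of-delivery item (0-3)
})

item_cols = ["item_1", "item_2", "quality_1"]
max_score = 1 + 1 + 3              # maximum possible checklist total

# Percentage-of-maximum fidelity per classroom (a common descriptive summary).
checklist["fidelity_pct"] = checklist[item_cols].sum(axis=1) / max_score

ACCEPTABLE = 0.60                  # assumed a-priori threshold
checklist["below_acceptable"] = checklist["fidelity_pct"] < ACCEPTABLE
print(checklist[["classroom", "condition", "fidelity_pct", "below_acceptable"]])
```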
Measuring Fidelity of Implementation
• Psychometricians should be involved in the development
of fidelity measures to establish validity and reliability.
Fidelity to structure (adherence) is easy to measure. Fidelity
to process (quality) is less reliable but has higher predictive utility.
• “Global classroom quality” should be considered
separately from implementation fidelity, unless the global
items are promoted by the program.
• Measure adaptation separately from fidelity. Adapting a
program is different from supplementing the program,
which has been shown to enhance outcomes as long as
fidelity is high (Blakely et al., 1987). Fidelity measures are
not universal. They are program-specific. As a field, we
need to standardize the methods—not the measures.
See the Hulleman et al. SREE 2010 papers.
Questions?
D. How do we analyze fidelity data to determine
how it impacts program effectiveness?
Analyzing the Impact of Fidelity on Outcomes
• Descriptive - frequency or percentage of fidelity.
• Associative – simple correlation; relationship between
percentage of fidelity and outcomes.
• Predictive - fidelity explains percentage of variance in
outcomes in the treatment group.
• Causal - requires randomization of teachers to high and
low fidelity groups; fidelity causes outcomes; rarely done
in research (Penuel).
• Impact – fidelity as 3rd variable: e.g., fidelity moderates
relationship between intervention and outcomes; effects
of intervention on outcomes mediated by level of fidelity.
• Adjusting Outcomes – achieved relative strength; fidelity
vs. infidelity (Hulleman & Cordray, 2009).
Source: O’Donnell, 2008
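To illustrate the first three levels in the list above, the sketch below runs a descriptive summary, a simple correlation, and a single-predictor regression on invented classroom-level data; none of the numbers come from the studies cited here.

```python
# Sketch with invented toy data: descriptive, associative, and predictive
# looks at a classroom-level fidelity score and a classroom mean outcome.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "fidelity": [0.55, 0.70, 0.80, 0.90, 0.65, 0.85],
    "posttest": [48.0, 55.0, 61.0, 66.0, 52.0, 63.0],
})

# Descriptive: distribution of fidelity scores.
print(df["fidelity"].describe())

# Associative: simple correlation between fidelity and outcomes.
print(df["fidelity"].corr(df["posttest"]))

# Predictive: share of outcome variance explained by fidelity (R^2)
# within the treatment condition.
model = smf.ols("posttest ~ fidelity", data=df).fit()
print(model.rsquared)
```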
Analyzing the Impact of Fidelity on Outcomes
• Correlational studies provide a nice foundation
for impact analysis, but it is impact analysis
that asks if implementation fidelity changes the
program’s effects.
• Multiple correlations between fidelity items and
outcomes are often disparate—what does it all
mean? We need a more complete fidelity
assessment to better understand construct
validity and generalizability.
Questions?
Okay. So, how do you go from identifying
the program theory (using a logic model
to define and conceptualize fidelity in your
own study), to measuring fidelity, to
analyzing its effects on outcomes?
An Example
O’Donnell, C. L. (2007). Fidelity of implementation to instructional
strategies as a moderator of curriculum unit effectiveness in a large-scale middle school science experiment. Dissertation Abstracts
International, 68(08). (UMI No. AAT 3276564)
Step 1: Identify the critical components
• First, we worked with the curriculum developers to
identify the program’s critical components, which
weren’t always explicit to users. We then separated
the components into structure (adherence,
duration) and process (quality of implementation,
program differentiation).
• We hired a third-party evaluator to conduct a
curriculum analysis to determine whether the components
were present and, if so, to what degree.
Sharon Lynch will talk more about this work,
which was part of the larger SCALE-uP study.
Curriculum Profile

Instructional Category                           | Chemistry That Applies* | ARIES** Motion & Forces | McMillian/McGraw-Hill Science*
III. Engaging Students with Relevant Phenomena   |                         |                         |
  Providing a variety of phenomena               | ●                       | ●                       | X
  Providing vivid experiences                    | ●                       | ●                       | X
  Introducing terms meaningfully                 | ●                       | ◒                       | X
  Representing ideas effectively                 | ◒                       | ◒                       | X
  Demonstrating use of knowledge                 | ◕                       | X                       | X
IV. Developing and Using Scientific Ideas        |                         |                         |
  Providing practice                             | ●                       | X                       | X

● = Excellent, ◕ = Very Good, ◒ = Satisfactory, X = Poor
*Source: www.project2061.org/tools/textbook/mgsci/crit-used.htm
**Available from www.gwu.edu/~scale-up under Reports
Step 2: Define the intervention a priori using
a logic model
• I then created a logic model of implementation to illustrate
the theory of change. I used the model to theorize a priori
what should happen in the classroom relative to outcomes.
• I kept the counterfactual in mind as I conceptualized the
logic model because I hypothesized that teachers’ fidelity
to the instructional practices identified by the curriculum
analysis was moderating outcomes, and I knew I would
have to collect fidelity data (structure and process) in both
the comparison and treatment classrooms.
O’Donnell, C. L. (2007).
I hypothesized that the presence
of the curriculum materials in
the teachers’ hands relative to
the comparison condition
(business as usual) would have
a direct effect on students’
understanding of motion and
forces, but that this relationship
would be moderated by a
teacher’s use of the
instructional practices identified
by the curriculum analysis. In
other words, I hypothesized that
the causal relationship between
the IV and DV would vary as a
function of fidelity as a
moderator (Cohen et al., 2003).
My Logic Model
O’Donnell, C. L. (2007).
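One conventional way to write this moderation hypothesis as an equation (a sketch, not reproduced from the dissertation) is shown below, where, for classroom j, Y is mean achievement, T is the treatment indicator, F is the mean-centered fidelity score, and the betas are regression coefficients.

```latex
Y_j = \beta_0 + \beta_1 T_j + \beta_2 F_j + \beta_3 (T_j \times F_j) + \varepsilon_j
```

A nonzero beta_3 would mean the effect of the curriculum materials on achievement varies with the level of fidelity, which is what the analysis in Step 4 tests.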
Step 3: Measure fidelity
• We developed the Instructional Strategies Classroom
Observation Protocol (O’Donnell, Lynch, & Merchlinsky,
2007) using the critical components identified by the
curriculum analysis as our foundation.
• 24 items were developed, some dichotomous (Yes/No),
some polytomous (Likert-like scale, 0–3) to measure
the degree of fidelity to that item. The problem was, the
items were not on an interval scale and were not
additive. Subjects receiving the same fidelity score had
different implementation profiles.
O’Donnell, C. L. (2007).
(O’Donnell, Lynch, & Merchlinsky, 2007)
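The sketch below illustrates, with invented items, why the raw item scores were not additive: two classrooms can earn the same total while implementing different parts of the program.

```python
# Illustrative sketch (hypothetical items): identical raw totals can hide very
# different implementation profiles, one reason raw sums are not additive here.
import pandas as pd

items = pd.DataFrame(
    {
        "uses_phenomena": [1, 0],        # dichotomous item (0/1)
        "introduces_terms": [0, 1],      # dichotomous item (0/1)
        "quality_of_closure": [2, 2],    # polytomous item (0-3)
    },
    index=["Classroom A", "Classroom B"],
)

items["raw_total"] = items.sum(axis=1)
print(items)  # both classrooms total 3, but their profiles differ
# This is one motivation for the Rasch analysis used in Step 4, which places
# mixed-format items and classrooms on a common scale instead of summing.
```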
Step 4: Analyze fidelity data
• I knew that analyzing 24 items separately would complicate
the model conceptually and structurally, because multiple
measures often inflate the standard errors of parameter
estimates. I needed parsimony, so I computed a
unidimensional fidelity score for each classroom using
Rasch analysis, mean-centered the fidelity score, and
entered it into my model.
• I avoided the dangers of removing low-fidelity implementers
from the sample or of creating a median split between
high- and low-fidelity users (which discards continuous information).
O’Donnell, C. L. (2007).
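The sketch below mimics this analysis with simulated data: a unidimensional fidelity score (standing in for the Rasch-derived measure) is mean-centered and entered into a regression with a treatment-by-fidelity interaction, and the treatment effect is then probed at low, average, and high fidelity. Variable names, data, and effect sizes are invented; this is not the study's actual code or results.

```python
# Sketch with simulated data: mean-center a unidimensional fidelity score and
# test whether fidelity moderates the treatment effect on the outcome.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 60                                               # classrooms
df = pd.DataFrame({
    "treat": np.repeat([0, 1], n // 2),              # 0 = comparison, 1 = treatment
    "fidelity": rng.normal(0.0, 1.0, n),             # Rasch-style score (logits)
})
# Simulated outcome in which the treatment effect grows with fidelity.
df["posttest"] = (50 + 3 * df["treat"] + 2 * df["fidelity"]
                  + 5 * df["treat"] * df["fidelity"] + rng.normal(0, 4, n))

# Mean-center fidelity so main effects are interpreted at average fidelity.
df["fidelity_c"] = df["fidelity"] - df["fidelity"].mean()

# Treatment x fidelity interaction tests moderation.
model = smf.ols("posttest ~ treat * fidelity_c", data=df).fit()
print(model.summary())

# Probe the simple slope of treatment at low, average, and high fidelity.
for level in (-1.0, 0.0, 1.0):
    effect = model.params["treat"] + model.params["treat:fidelity_c"] * level
    print(f"predicted treatment effect at fidelity_c = {level:+.1f}: {effect:.2f}")
```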
R² = .616
There were no statistically significant differences between treatment and comparison classroom means for the observed instructional strategies, except for Assisting teachers in identifying own students’ ideas. However, 5 of the 8 criteria rated highly in the program were significantly positively correlated with classroom mean achievement in the treatment classrooms; there were no positive correlations in the comparison classrooms.
Regression analysis testing for interaction effects showed that treatment classrooms
were predicted to score 17.406 points higher on the final assessment than comparison
classrooms when their fidelity measure was high (2.40), t = -3.999, p < .05. There was
no statistically significant difference in classroom mean achievement when classrooms'
fidelity measures were low (-.85) or medium (.78). (O’Donnell, 2007)
Item maps in Rasch analysis showed that it was
harder for teachers to teach the more reform-oriented practices with fidelity (items at the top of the
map = accurate representations, justifying ideas); it
was easier to teach the more traditional practices
with fidelity (items at the bottom of the map = using
terms appropriately).
O’Donnell, C. L. (2007).
Questions?
Conclusions
Know when & how to use fidelity data
• Development - Use fidelity results to inform revisions. Decide
now what components are required to deliver the intervention
as intended when implemented at scale.
• Efficacy - Monitor fidelity and relate it to outcomes to gain
confidence that outcomes are due to the program (internal
validity).
• Replication - Determine if levels of fidelity and program results
under a specific structure replicate under other organizational
structures.
• Scale-up - Understand implementation conditions, tools, and
processes needed to reproduce positive effects under routine
practice on a large scale (external validity). Are methods for
establishing high fidelity financially feasible?
Bottom Line: If the intervention can be implemented
with adequate fidelity under conditions of routine
practice and yield positive results, scale it up.
Source: O’Donnell, 2008
Questions & Discussion
Please feel free to contact me at any time:
Dr. Carol O’Donnell
Research Scientist
National Center for Education Research
Institute of Education Sciences
U.S. Department of Education
555 New Jersey Ave., NW, Room 610c
Washington, DC 20208-5521
Voice: 202-208-3749
Fax: 202-219-2030
Web: http://ies.ed.gov
Email: Carol.ODonnell@ed.gov