Session 2:
Specifying the Conceptual and
Operational Models and the
Research Questions that Follow
Mark W. Lipsey
Vanderbilt University
IES/NCER Summer Research Training Institute, 2008
Workshop on randomized
controlled trials
• Purpose: Increasing capacity to develop and
conduct rigorous evaluations of the effectiveness
of education interventions
• Caveat: “Rigorous evaluations” are not
appropriate for every intervention or every
research project involving an intervention
– They require special resources (funding, amenable
circumstances, expertise, time)
– They can produce misleading or uninformative results
if not done well
– The preconditions for making them meaningful may
not be met.
Critical preconditions for rigorous
evaluation
• A well-specified, fully developed intervention
with useful scope
– basis in theory and prior research
– identified target population
– specification of intended outcomes/effects
– “theory of change” explication of what it does and why
it should have the intended effects for the intended
population
– operators’ manual: complete instructions for
implementing
– ready-to-go materials, training procedures, software,
etc.
Critical preconditions for rigorous
evaluation (continued)
• A plausible rationale that the intervention is
needed; reason to believe it has advantages
over what’s currently proven and available
• Clarity about the relevant counterfactual– what it
is supposed to be better than
• Demonstrated “implementability”– can be
implemented well enough in practice to plausibly
have effects
• Some evidence that it can produce the intended
effects albeit short of standards for rigorous
evaluation
Critical preconditions for rigorous
evaluation (continued)
• Amenable research sites and
circumstances:
– cooperative schools, teachers, parents, and
administrators willing to participate
– student sample appropriate in terms of
representativeness and size for showing
educationally meaningful effects
– access to students (e.g., for testing), records,
classrooms (e.g., for observations)
IES funding categories
• Goal 2 (intervention development) for advancing
intervention concepts to the point where rigorous
evaluation of their effects may be justified
• Goal 3 (efficacy studies) for determining whether
an intervention can produce worthwhile effects;
RCT evaluations preferred.
• Goal 4 (effectiveness studies) for investigating
the effects of an intervention implemented under
realistic conditions at scale; RCT evaluations
preferred.
Specifying the theory of change
embodied in the intervention
1. Nature of the need addressed
– what and for whom (e.g., 2nd grade students
who don’t read well)
– why (e.g., poor decoding skills, limited
vocabulary)
– where the issues addressed fit in the
developmental progression (e.g.,
prerequisites to fluency and comprehension,
assumes concepts of print)
– rationale/evidence supporting these specific
intervention targets at this particular time
Specifying the theory of change
2. How the intervention addresses the need and
why it should work
– content: what the student should know or be able to
do; why this meets the need
– pedagogy: instructional techniques and methods to
be used; why appropriate
– delivery system: how the intervention will arrange to
deliver the instruction
Most important: What aspects of the above are
different from the counterfactual condition?
What are the key factors or core ingredients most
essential and distinctive to the intervention?
Logic models as theory schematics
[Figure: logic model]
Target Population: 4 year old pre-K children
Intervention: Exposed to intervention
Proximal Outcomes: Positive attitudes to school; Improved pre-literacy skills; Learn appropriate school behavior
Distal Outcomes: Increased school readiness; Greater cognitive gains in K
Mapping variables onto the intervention
theory: Sample characteristics
[Figure: the logic model, annotated with the variables below]
Sample descriptors:
– basic demographics
– diagnostic, need/eligibility identification
– nuisance factors (for variance control)
Potential moderators:
– setting, context
– personal and family characteristics
– prior experience
Mapping variables onto the intervention
theory: Intervention characteristics
[Figure: the logic model, annotated with the variables below]
Independent variable:
– T vs. C experimental condition
Generic fidelity:
– T and C exposure to the generic aspects of the intervention (type, amount, quality)
Specific fidelity:
– T and C(?) exposure to distinctive aspects of the intervention (type, amount, quality)
Potential moderators:
– characteristics of personnel
– intervention setting, context, e.g., class size
Mapping variables onto the intervention
theory: Intervention outcomes
[Figure: the logic model, annotated with the variables below]
Focal dependent variables:
– pretests (pre-intervention)
– posttests (at end of intervention)
– follow-ups (lagged after end of intervention)
Other dependent variables:
– construct controls: related DVs not expected to be affected
– side effects: unplanned positive or negative outcomes
– mediators: DVs on causal pathways from intervention to other DVs
Main relationships of (possible)
interest
• Causal relationship between IV and DVs (effects of
causes); tested as T-C differences
• Duration of effects post-intervention; growth
trajectories
• Moderator relationships; ATIs (aptitude-Tx
interactions): differential T effects for different
subgroups; tested as T x M interactions or T-C
differences between subgroups
• Mediator relationships: stepwise causal relationship
with effect on one DV causing effect on another;
tested via Baron & Kenny (1986) or SEM-type
techniques.
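The three kinds of tests listed above can be sketched on simulated data. This is a minimal illustration, not a prescribed analysis: the data, the effect values, and the names `ols`, `T`, `M`, `med`, `y`, and `y2` are all hypothetical, and only the regression steps associated with Baron & Kenny (1986) are shown, without significance tests.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
ones = np.ones(n)


def ols(X, y):
    """OLS coefficients for a design matrix X that already has an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# --- Hypothetical data: treatment indicator and a binary moderator ---
T = rng.integers(0, 2, size=n)   # 1 = treatment, 0 = control
M = rng.integers(0, 2, size=n)   # subgroup flag (assumed for illustration)
y = 0.30 * T + 0.10 * M + 0.40 * T * M + rng.normal(0, 1, size=n)

# Causal relationship between IV and DV: simple T-C difference in means
tc_diff = y[T == 1].mean() - y[T == 0].mean()

# Moderator relationship: coefficient on the T x M product term
_, b_t, b_m, b_tm = ols(np.column_stack([ones, T, M, T * M]), y)

# The same moderator effect viewed as T-C differences between subgroups
diff_m0 = y[(T == 1) & (M == 0)].mean() - y[(T == 0) & (M == 0)].mean()
diff_m1 = y[(T == 1) & (M == 1)].mean() - y[(T == 0) & (M == 1)].mean()

# --- Hypothetical data: mediator on the pathway T -> med -> y2 ---
med = 0.50 * T + rng.normal(0, 1, size=n)
y2 = 0.10 * T + 0.40 * med + rng.normal(0, 1, size=n)

_, a = ols(np.column_stack([ones, T]), med)                # effect of T on mediator
_, c_total = ols(np.column_stack([ones, T]), y2)           # total effect of T on outcome
_, c_direct, b = ols(np.column_stack([ones, T, med]), y2)  # direct effect, mediator path
indirect = a * b                                           # effect carried by the mediator

print(f"T-C difference:        {tc_diff:.3f}")
print(f"T x M interaction:     {b_tm:.3f}")
print(f"Subgroup differences:  {diff_m0:.3f} (M=0), {diff_m1:.3f} (M=1)")
print(f"Total = direct + indirect: {c_total:.3f} = {c_direct:.3f} + {indirect:.3f}")
```

With a binary moderator and a fully interacted model, the interaction coefficient equals the difference between the two subgroup T-C differences, which is why the slide lists both as equivalent tests.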
Formulation of the research
questions
• Organized around key variables and
relationships
• Specific with regard to the nature of the
variables and relationships
• Supported with a rationale for why the
question is important to answer
• Connected to real-world education issues
• What works, for whom, under what
circumstances, how, and why?
Session 3:
Describing and Quantifying
Outcomes
Mark W. Lipsey
Vanderbilt University
IES/NCER Summer Research Training Institute, 2008
Outcome constructs to measure
Identifying the relevant outcome constructs
follows from the theory development and
other considerations covered earlier in
Session 2
– What: proximal/mediating and distal outcomes
– When: temporal status– baseline, immediate
outcome, longer term outcomes
– What else:
• possible positive or negative side effects
• construct control outcomes not targeted for change
Aligning the outcome constructs and measures
with the intervention and policy objectives
[Figure: Instruction, Assessment, and Policy relevant outcomes (e.g., state achievement standards)]
Alignment of instructional tasks
with the assessment tasks
[Figure: assessment tasks in relation to instructional tasks, activities, content]
– Identical
– Analogous (near transfer)
– Generalized (far transfer)
Basic psychometric issues
Validity (typically correlation with established
measures or subgroup differences)
Reliability (typically internal consistency or
test-retest correlation)
– standardized measures of established validity
and reliability
– researcher developed measures with validity
and reliability demonstrated in prior research
– new measures with validity and/or reliability to
be investigated in present study
Special issue for intervention
studies: sensitivity to change
Achievement effect sizes from 97
randomized education studies

Type of Outcome Measure           Mean Effect Size   Number of Measures
Standardized test, broad          .09                29
Standardized test, narrow         .32                127
Focal topic test, mastery test    .50                263
Data from which measurement
sensitivity can be inferred
• Observed effects from other intervention studies
using the measure
• Mean effect sizes and their standard deviations
from meta-analysis
• Longitudinal research and descriptive research
showing change over time or differences
between relevant criterion groups
• Archival data allowing ad hoc analysis of, e.g.,
change over time, differences between groups
• Pilot data on change over time or group
differences with the measure
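Group differences from pilot or archival data are typically summarized as a standardized mean difference so they can be compared with effect sizes from other studies. The sketch below assumes the pooled-standard-deviation form of that index; the function name and the pilot scores are hypothetical.

```python
import numpy as np


def standardized_mean_difference(treatment, control):
    """(Mean_T - Mean_C) divided by the pooled within-group standard deviation."""
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    pooled_var = (
        (len(t) - 1) * t.var(ddof=1) + (len(c) - 1) * c.var(ddof=1)
    ) / (len(t) + len(c) - 2)
    return (t.mean() - c.mean()) / np.sqrt(pooled_var)


# Hypothetical pilot scores on a candidate outcome measure
pilot_treatment = [24, 27, 31, 22, 29, 33, 26]
pilot_control = [21, 25, 28, 20, 26, 30, 23]
print(f"d = {standardized_mean_difference(pilot_treatment, pilot_control):.2f}")
```

A measure that shows little difference between groups known to differ, or little change over a period when change is expected, is unlikely to be sensitive to intervention effects.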
Variance control and
measurement sensitivity
Variance control via procedural consistency and statistical control using
covariates for e.g., pre-intervention individual differences and differences
in testing procedures or conditions
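The gain from statistical control can be seen by estimating the same treatment effect with and without a pretest covariate. This is a sketch on simulated data: the effect values, sample size, and names (`treatment_estimate`, `pretest`, `posttest`) are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Hypothetical data: posttest depends strongly on pretest, weakly on treatment
pretest = rng.normal(0, 1, size=n)
T = rng.integers(0, 2, size=n)   # 1 = treatment, 0 = control
posttest = 0.20 * T + 0.80 * pretest + rng.normal(0, 0.6, size=n)


def treatment_estimate(X, y):
    """Return (coefficient, standard error) for column 1 of X under OLS."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                # coefficient covariance
    return coef[1], np.sqrt(cov[1, 1])


ones = np.ones(n)
effect_raw, se_raw = treatment_estimate(np.column_stack([ones, T]), posttest)
effect_adj, se_adj = treatment_estimate(np.column_stack([ones, T, pretest]), posttest)

print(f"Without covariate: effect = {effect_raw:.3f}, SE = {se_raw:.3f}")
print(f"With pretest:      effect = {effect_adj:.3f}, SE = {se_adj:.3f}")
```

The covariate removes pre-intervention individual differences from the error term, so the same T-C difference is estimated with a smaller standard error.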
Issues related to multiple
outcome measures
Correlated measures:
overlap and efficiency
Factor Analysis of Preschool Outcome Variables
Factor Loadings

Subtest                       Pre-K Pretest   Pre-K Posttest   Kindergarten Follow-up
Letter Word Identification    .60             .69              .73
Quantitative Concepts         .82             .82              .78
Applied Problems              .82             .80              .75
Picture Vocabulary            .75             .76              .67
Oral Comprehension            .82             .79              .74
Story Recall                  .53             .55              .61
Correlated change may be even
more relevant
Factor Analysis of Gain Scores for Pre-K Outcomes
Factor Loadings (F1 = first factor, F2 = second factor)

                              Pre to Post     Post to Follow-up   Pre to Follow-up
Subtest                       F1      F2      F1      F2          F1      F2
Basic School Skills
Letter Word Identification    .74    -.19     .73    -.06         .79    -.15
Quantitative Concepts         .66     .14     .70     .06         .74     .13
Applied Problems              .54     .08     .47     .16         .40     .41
Complex Language
Picture Vocabulary            .09     .77     .14     .48        -.04     .74
Oral Comprehension            .16     .75     .17     .72         .13     .69
Story Recall                 -.08     .37    -.16     .68        -.01     .37
Handling multiple correlated
outcome measures
• Pruning– try to avoid measures that have high
conceptual overlap and are likely to have
relatively large intercorrelations
• Procedural– organize assessment and data
collection to combine where possible for
efficiency
• Analytic
– create composite variables to use in the analysis
– use multivariate techniques like MANOVA to examine
omnibus effects as context for univariate effects
– use latent variable analysis, e.g., in SEM
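The first analytic option, a composite variable, is often built by standardizing each measure and averaging the results. The sketch below assumes that unit-weighted approach, which is one common choice among several; the function name and the scores are hypothetical.

```python
import numpy as np


def composite_score(measures):
    """Unit-weighted composite: mean of z-scored columns (rows = students)."""
    x = np.asarray(measures, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # put measures on one scale
    return z.mean(axis=1)


# Hypothetical scores for 5 students on 3 correlated measures with different scales
scores = np.array([
    [410, 12, 31],
    [455, 15, 40],
    [390,  9, 28],
    [470, 16, 44],
    [430, 13, 35],
])
composite = composite_score(scores)
print(np.round(composite, 2))
```

Standardizing first keeps a measure with a large raw-score range from dominating the composite.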
Practicality and appropriateness to
the circumstances
• Feasibility– time and resources required
• Respondent burden– minimize demands,
provide incentives/compensation
• Developmental appropriateness– consider not
only age but performance level, possible ceiling
and floor effects
• For follow-up beyond one school year, may need
measures designed for a broad age span to
maintain comparability
• May need to tailor measures or assessment
procedures for special populations (disabilities,
English language learners)