
Designs to Estimate Impacts of MSP Projects with Confidence
Ellen Bobronnikov
March 29, 2010
According to the 2007 Report of the Academic Competitiveness Council:
“Successful, large-scale interventions to improve STEM education are unlikely to arise without serious study and trial and error. There is a critical pathway for the development of successful educational interventions and activities, starting generally with small-scale studies to test new ideas and generate hypotheses, leading to increasingly larger and more rigorous studies to test the effect of a given intervention or activity on a variety of students in a variety of settings. Different research methodologies are used along the development pathway, and corresponding evaluation strategies must be used to assess their progress.”
Hierarchy of Study Designs for Evaluating Effectiveness
Overview
• Criteria for Classifying Designs of MSP
Evaluations (“the rubric”) created through the
Data Quality Initiative (DQI)
• Rubric’s key criteria for a rigorous design
• Common issues with evaluation reports
• Recommendations for better reporting
• Discussion of your evaluation challenges and
solutions
Evaluations Reviewed Using the Rubric
• All final-year evaluations that report using an experimental or quasi-experimental design are considered for review
• Evaluations need to include a comparison group
to ultimately be reviewed with the rubric
• Within each project, we review evaluations of
teacher content knowledge, classroom practices,
and student achievement
Six Criteria Used in the Rubric
The rubric comprises six criteria:
1. Equivalence of groups at baseline
2. Adequate sample size
3. Use of valid & reliable measurement instruments
4. Use of consistent data collection methods
5. Sufficient response and retention rates
6. Reporting of relevant statistics
Criterion 1 – Baseline Equivalence
Requirement
• Study demonstrates no significant differences in
key characteristics between treatment and
comparison groups at baseline (for the analytic
sample) OR
• Adequate steps were taken to address the lack
of baseline equivalence in the statistical analysis
Purpose – Helps rule out alternative explanations
for differences between groups
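Baseline equivalence is typically demonstrated by comparing groups on key characteristics such as pretest scores. The sketch below assumes hypothetical pretest data and the scipy library, and illustrates one way such a check might look; adjusting for baseline differences as covariates in a regression model is the alternative the criterion allows.

    # Hypothetical baseline-equivalence check on pretest scores
    # (illustrative data only, not from any MSP evaluation).
    from scipy import stats

    treatment_pretest  = [52.1, 48.3, 55.0, 49.8, 51.2, 47.6, 53.4, 50.9]
    comparison_pretest = [50.4, 49.1, 54.2, 48.7, 52.3, 46.9, 51.8, 49.5]

    # Two-sample t-test: a non-significant result (p > 0.05) is consistent
    # with baseline equivalence on this characteristic.
    t_stat, p_value = stats.ttest_ind(treatment_pretest, comparison_pretest)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")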
Criterion 2 – Sample Size
Requirement
• Sample size is adequate to detect a difference, based on a power analysis (sketched below) using:
  – Significance level = 0.05
  – Power = 0.8
  – Minimum detectable effect informed by the literature or otherwise justified
• Alternatively, meets or exceeds “rule of thumb” sample sizes:
  – School/district-level interventions: 12 schools
  – Teacher-level interventions: 60 teachers (teacher outcomes) or 18 teachers (student outcomes)
Purpose – Increases the likelihood of finding an impact
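The rubric's power-analysis parameters can be plugged into standard software. A minimal sketch, assuming the statsmodels library and a hypothetical minimum detectable effect of 0.4 standard deviations for a simple two-group, individual-level comparison; clustered designs (schools or classrooms as the unit of assignment) require a more involved calculation.

    # Sketch of a power analysis for a two-group comparison.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.4,   # hypothetical minimum detectable effect (SD units)
        alpha=0.05,        # significance level from the rubric
        power=0.8,         # statistical power from the rubric
        alternative="two-sided",
    )
    print(f"Required sample size per group: {n_per_group:.0f}")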
Criterion 3 – Measurement Instruments
Requirement – Data collection instruments used were shown to be valid and reliable measures of key outcomes
• Use existing instruments that have already been deemed valid and reliable
  – Refer to the TCK instrument database developed by the MSP Knowledge Management and Dissemination Project at http://mspkmd.net/ OR
• Create new instruments that have either been:
  – Sufficiently tested with subjects comparable to the study sample and found to be valid and reliable, OR
  – Created using scales and items from pre-existing data collection instruments that have been validated and found to be reliable
    ▪ The resulting instrument needs to include at least 10 items, and at least 70 percent of the items must come from the validated and reliable instrument(s)
Purpose – By testing for validity and reliability, you ensure that the instruments used accurately capture the intended outcomes
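For a locally developed instrument, internal-consistency reliability is often documented with Cronbach's alpha. A minimal sketch using numpy on hypothetical item-level scores; this addresses only reliability, and validity evidence (e.g., expert review, piloting with comparable subjects) would still be needed.

    # Cronbach's alpha for a hypothetical instrument
    # (rows = respondents, columns = items).
    import numpy as np

    scores = np.array([
        [3, 4, 3, 5, 4],
        [2, 3, 2, 3, 3],
        [4, 4, 5, 5, 4],
        [3, 3, 3, 4, 3],
        [5, 4, 4, 5, 5],
    ])

    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(f"Cronbach's alpha = {alpha:.2f}")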
Criterion 4 – Data Collection Methods
• Requirement – Methods, procedures, and
timeframes used to collect the key outcome data
from treatment and comparison groups are
comparable
• Purpose – Limits possibility that observed
differences can be attributed to factors besides
the program, such as passage of time and
differences in testing conditions
Criterion 5 – Attrition
Requirement
• Key outcomes need to be measured for at least 70% of the original sample (both treatment and control groups), or evidence must show that attrition is unrelated to treatment
• If the attrition rates for the two groups differ by 15 percentage points or more, the difference should be accounted for in the statistical analysis
Purpose – Helps ensure that sample attrition does not bias results, as participants or control group members who drop out may systematically differ from those who remain
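The attrition arithmetic behind this criterion is straightforward. A minimal sketch with hypothetical counts, checking overall retention of at least 70 percent of the original sample and flagging a between-group difference in attrition of 15 percentage points or more for adjustment in the analysis.

    # Hypothetical attrition calculation for a teacher-level evaluation.
    treatment_baseline, treatment_followup = 60, 48
    comparison_baseline, comparison_followup = 60, 39

    def attrition(baseline, followup):
        """Share of the original sample lost by follow-up."""
        return 1 - followup / baseline

    t_attr = attrition(treatment_baseline, treatment_followup)    # 0.20
    c_attr = attrition(comparison_baseline, comparison_followup)  # 0.35
    retention = (treatment_followup + comparison_followup) / (
        treatment_baseline + comparison_baseline)                 # 0.725

    print(f"Differential attrition: {abs(t_attr - c_attr) * 100:.0f} percentage points")
    print(f"Overall retention: {retention:.1%}")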
Criterion 6 – Relevant Statistics Reported
Requirement
• Include treatment and comparison group post-test means and tests of significance for key outcomes, OR
• Provide sufficient information for calculation of statistical significance (e.g., means, sample sizes, standard deviations/standard errors)
Purpose – Provides context for interpreting results, indicating whether observed differences between groups are likely larger than chance alone would produce
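When a report provides only summary statistics, significance can still be computed. A minimal sketch assuming the scipy library and hypothetical post-test values for each group.

    # Test of significance computed from reported summary statistics
    # (hypothetical post-test values, not from any MSP evaluation).
    from scipy import stats

    result = stats.ttest_ind_from_stats(
        mean1=74.2, std1=8.1, nobs1=55,   # treatment group post-test
        mean2=70.5, std2=8.6, nobs2=52,   # comparison group post-test
    )
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")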
Common Issues Found in Evaluation Reports
• Information critical for a complete assessment of all criteria is often not reported, reported inconsistently, or reported only for the treatment group
  – Pre- and post-test sample sizes for both groups, means, and standard deviations/errors are frequently missing; these are needed for statistical testing and to calculate attrition rates
  – Sample sizes vary throughout a report without explanation of the changes
  – Validity and reliability testing is not reported for locally developed instruments or cited for pre-existing instruments
  – Data collection methods are not discussed
Key Recommendation – Report the Details
• Report pre- and post-test sample sizes for both groups and explain changes in sample sizes; if reporting sub-groups, indicate their sample sizes as well
• Report key characteristics associated with outcomes at baseline (e.g., pretest scores, teaching experience)
• Document and describe the data collection procedures
• Report means and standard deviations/errors for both groups on key outcomes; if using a regression model, describe it
• Report results from appropriate significance testing of differences observed between groups (e.g., t-statistics or p-values)
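One way to assemble these details is a simple summary table of sample sizes, means, and standard deviations by group and time point. A minimal sketch assuming individual-level records in a pandas DataFrame with hypothetical column names.

    # Hypothetical reporting table: Ns, means, and standard deviations
    # by group for pretest and posttest.
    import pandas as pd

    df = pd.DataFrame({
        "group":    ["treatment"] * 4 + ["comparison"] * 4,
        "pretest":  [51, 48, 55, 50, 49, 52, 47, 53],
        "posttest": [60, 57, 66, 58, 52, 55, 50, 56],
    })

    summary = df.groupby("group")[["pretest", "posttest"]].agg(
        ["count", "mean", "std"])
    print(summary.round(2))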
Discussion Questions
• What challenges have you encountered in
your efforts to evaluate the MSP project?
How have you/might you overcome these
obstacles?
• What has enabled you to increase the rigor
of your evaluations?
• If you could start your evaluation anew,
what would you do differently?
Mathematics and Science Partnership (MSP) Programs
U.S. Department of Education
New Orleans Regional Meeting
March 29, 2010