Designs to Estimate Impacts of MSP Projects with Confidence
Ellen Bobronnikov
Mathematics and Science Partnership (MSP) Programs, U.S. Department of Education
New Orleans Regional Meeting, March 29, 2010

According to the 2007 Report of the Academic Competitiveness Council:
"Successful, large-scale interventions to improve STEM education are unlikely to arise without serious study and trial and error. There is a critical pathway for the development of successful educational interventions and activities, starting generally with small-scale studies to test new ideas and generate hypotheses, leading to increasingly larger and more rigorous studies to test the effect of a given intervention or activity on a variety of students in a variety of settings. Different research methodologies are used along the development pathway, and corresponding evaluation strategies must be used to assess their progress."

Hierarchy of Study Designs for Evaluating Effectiveness
[Figure: hierarchy of study designs for evaluating effectiveness (not reproduced)]

Overview
• Criteria for Classifying Designs of MSP Evaluations ("the rubric"), created through the Data Quality Initiative (DQI)
• The rubric's key criteria for a rigorous design
• Common issues with evaluation reports
• Recommendations for better reporting
• Discussion of your evaluation challenges and solutions

Evaluations Reviewed Using the Rubric
• All final-year evaluations that report using an experimental or quasi-experimental design are considered for review
• Evaluations must include a comparison group to be reviewed with the rubric
• Within each project, we review evaluations of teacher content knowledge, classroom practices, and student achievement

Six Criteria Used in the Rubric
The rubric comprises six criteria:
1. Equivalence of groups at baseline
2. Adequate sample size
3. Use of valid and reliable measurement instruments
4. Use of consistent data collection methods
5. Sufficient response and retention rates
6. Reporting of relevant statistics

Criterion 1 – Baseline Equivalence
Requirement
• The study demonstrates no significant differences in key characteristics between treatment and comparison groups at baseline (for the analytic sample), OR
• Adequate steps were taken to address the lack of baseline equivalence in the statistical analysis
Purpose – Helps rule out alternative explanations for differences between groups

Criterion 2 – Sample Size
Requirement
• Sample size is adequate to detect a difference, based on a power analysis using:
– Significance level = 0.05
– Power = 0.8
– Minimum detectable effect informed by the literature or otherwise justified
• Alternatively, the sample meets or exceeds "rule of thumb" sample sizes:
– School/district-level interventions: 12 schools
– Teacher-level interventions: 60 teachers (teacher outcomes) or 18 teachers (student outcomes)
Purpose – Increases the likelihood of detecting an impact if one exists

Criterion 3 – Measurement Instruments
Requirement – Data collection instruments were shown to be valid and reliable measures of the key outcomes
• Use existing instruments that have already been deemed valid and reliable
– Refer to the TCK instrument database developed by the MSP Knowledge Management and Dissemination Project at http://mspkmd.net/
OR
• Create new instruments that have either been:
– Sufficiently tested with subjects comparable to the study sample and found to be valid and reliable, OR
– Created using scales and items from pre-existing data collection instruments that have been validated and found to be reliable; the resulting instrument must include at least 10 items, and at least 70 percent of the items must come from the validated and reliable instrument(s)
Purpose – Testing for validity and reliability ensures that the instruments accurately capture the intended outcomes

Criterion 4 – Data Collection Methods
• Requirement – The methods, procedures, and timeframes used to collect key outcome data from the treatment and comparison groups are comparable
• Purpose – Limits the possibility that observed differences can be attributed to factors other than the program, such as the passage of time or differences in testing conditions

Criterion 5 – Attrition
Requirement
• Key outcomes are measured for at least 70% of the original sample (in both the treatment and control groups), or there is evidence that attrition is unrelated to treatment
• If the attrition rates of the two groups differ by 15 percentage points or more, the difference should be accounted for in the statistical analysis
Purpose – Helps ensure that sample attrition does not bias results, since participants or control group members who drop out may differ systematically from those who remain

Criterion 6 – Relevant Statistics Reported
Requirement
• Include treatment and comparison group post-test means and tests of significance for key outcomes, OR
• Provide sufficient information for calculating statistical significance (e.g., means, sample sizes, standard deviations/standard errors)
Purpose – Provides context for interpreting results by indicating whether observed differences between groups are likely larger than what chance alone would produce

Common Issues Found in Evaluation Reports
• Information critical for a complete assessment of all criteria is often not reported, inconsistently reported, or reported only for the treatment group
– Pre- and post-test sample sizes for both groups, means, and standard deviations/errors are frequently missing; these are needed for statistical testing and for calculating attrition rates
– Sample sizes vary throughout the report with no explanation of the changes
– Validity and reliability testing is not reported for locally developed instruments, or not cited for pre-existing instruments
– Data collection methods are not discussed

Key Recommendation – Report the Details
• Report pre- and post-test sample sizes for both groups and explain any changes in sample sizes; if reporting subgroups, indicate their sample sizes as well
• Report key characteristics associated with outcomes at baseline (e.g., pretest scores, teaching experience)
• Document and describe the data collection procedures
• Report means and standard deviations/errors for both groups on key outcomes; if using a regression model, describe it
• Report results from appropriate significance testing of differences observed between groups (e.g., t-statistics or p-values)

Discussion Questions
• What challenges have you encountered in your efforts to evaluate the MSP project? How have you overcome these obstacles, or how might you?
• What has enabled you to increase the rigor of your evaluations?
• If you could start your evaluation anew, what would you do differently?
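The power analysis in Criterion 2 (significance level 0.05, power 0.8, a justified minimum detectable effect) can be sketched as shown below. This is a minimal sketch that assumes a two-sided comparison of two group means, uses the normal approximation, and expresses the minimum detectable effect in standard-deviation units. The design_effect helper is an added assumption and is not part of the rubric. It applies the usual inflation factor 1 + (m − 1)·ICC for students nested in classrooms or schools, which is why clustered designs need far more students than the unclustered formula suggests.

```python
import math
from statistics import NormalDist


def n_per_group(mde: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d (mde) is the minimum detectable effect in SD units.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for power = 0.8
    return math.ceil(2 * ((z_alpha + z_power) / mde) ** 2)


def design_effect(cluster_size: int, icc: float) -> float:
    """Assumed inflation factor for clustered samples (e.g. students within teachers)."""
    return 1 + (cluster_size - 1) * icc


if __name__ == "__main__":
    n = n_per_group(0.5)  # hypothetical MDE of 0.5 SD
    print(n)  # 63
    # Hypothetical clustering: 20 students per teacher, ICC = 0.15
    print(math.ceil(n * design_effect(20, 0.15)))
```

The exact t-based calculation gives a slightly larger number (about 64 per group for d = 0.5). The normal approximation is used here only because it needs nothing beyond the standard library.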
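The composition rule in Criterion 3 (at least 10 items, at least 70 percent of them from validated, reliable instruments) can be checked with exact integer arithmetic, as in the sketch below. The rubric does not name a reliability statistic. Cronbach's alpha is included only as one common internal-consistency measure, and both function names are hypothetical.

```python
from statistics import variance


def meets_item_rule(total_items: int, validated_items: int,
                    min_items: int = 10, min_percent: int = 70) -> bool:
    """Criterion 3 composition rule, using integers to avoid float rounding."""
    if not 0 <= validated_items <= total_items:
        raise ValueError("validated_items must be between 0 and total_items")
    return (total_items >= min_items
            and validated_items * 100 >= min_percent * total_items)


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Assumed reliability measure; responses[i][j] = score of respondent i on item j."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_vars = [variance([r[j] for r in responses]) for j in range(k)]
    total_var = variance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    print(meets_item_rule(12, 9))   # True: 12 items, 75% validated
    print(meets_item_rule(20, 13))  # False: only 65% validated
```
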
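The two attrition checks in Criterion 5 can be computed from baseline and analytic sample sizes, which is one reason the reporting recommendations ask for pre and post sample sizes for both groups. The sketch below assumes that the 70% retention threshold applies to each group separately. It uses Fraction so that a gap of exactly 15 percentage points is detected without float rounding. GroupCounts and check_attrition are hypothetical names. The alternative route, showing that attrition is unrelated to treatment, is not computed here.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class GroupCounts:
    """Hypothetical container: units at baseline vs. units with outcome data."""
    baseline_n: int
    analyzed_n: int

    def retention(self) -> Fraction:
        return Fraction(self.analyzed_n, self.baseline_n)

    def attrition(self) -> Fraction:
        return 1 - self.retention()


def check_attrition(treatment: GroupCounts, comparison: GroupCounts,
                    min_retention: Fraction = Fraction(70, 100),
                    max_gap_pp: int = 15) -> dict:
    # Assumption: each group must retain at least 70% of its original sample.
    meets_retention = all(g.retention() >= min_retention
                          for g in (treatment, comparison))
    # Differential attrition in percentage points, computed exactly.
    gap = abs(treatment.attrition() - comparison.attrition()) * 100
    return {
        "treatment_attrition_pct": float(treatment.attrition() * 100),
        "comparison_attrition_pct": float(comparison.attrition() * 100),
        "meets_retention": meets_retention,
        "gap_pp": float(gap),
        "needs_adjustment": gap >= max_gap_pp,
    }


if __name__ == "__main__":
    # Hypothetical counts: 18% vs. 3% attrition, a 15-point gap
    print(check_attrition(GroupCounts(100, 82), GroupCounts(100, 97)))
```
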
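Criterion 6 allows reporting "sufficient information for calculation of statistical significance." The sketch below shows why means, standard deviations, and sample sizes for both groups are enough. It uses Welch's unequal-variance t-test, which is one reasonable choice; the rubric does not prescribe a specific test. Applied to pretest means, the same calculation serves as a simple check for Criterion 1. The standard library has no t-distribution CDF, so the two-sided p-value is obtained by integrating the t density with Simpson's rule. This is an implementation assumption. Where SciPy is available, scipy.stats.ttest_ind_from_stats(..., equal_var=False) performs the same test.

```python
import math


def welch_t(m1: float, s1: float, n1: int,
            m2: float, s2: float, n2: int) -> tuple[float, float]:
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x: float, df: float) -> float:
    """Student t density, computed in log space for numerical stability."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """p = 1 - 2 * integral of the t density from 0 to |t| (Simpson's rule, even steps)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


if __name__ == "__main__":
    # Hypothetical post-test summary statistics for treatment vs. comparison
    t, df = welch_t(75, 10, 50, 70, 10, 50)
    print(t, round(df, 1), round(two_sided_p(t, df), 4))
```
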