Findings from the Indiana Acuity Efficacy Study: Under What Conditions Do Benchmark Assessments Impact Teaching and Learning?
Terry Spradlin
June 29, 2012
2012 National Conference on Student Assessment

About the Center for Evaluation & Education Policy
• The Center for Evaluation & Education Policy (CEEP) is a client-focused, self-funded research center associated with the School of Education at Indiana University.
• CEEP promotes and supports rigorous program evaluation and nonpartisan policy research primarily, but not exclusively, for education, human service, and non-profit organizations.
• In the area of P-20 education policy, CEEP's mission is to help inform, influence, and shape sound policy through effective, nonpartisan research and analysis.
www.ceep.indiana.edu

Contents
I. Overview of Indiana's Comprehensive Assessment Plan
II. 2011-12 Testing Schedule (Interim and Summative)
III. Key Terms
IV. Objectives of the Indiana Acuity Efficacy Study
V. Study Design and Methods
VI. Preliminary Findings of the Comparison-Group Quasi-Experimental Study
VII. Findings of the Qualitative Analysis of High-Improvement Schools

I. Indiana's Comprehensive Assessment Plan
• Adopted by the Indiana State Board of Education on November 1, 2006.
• The Assessment Plan called for moving the state's summative assessment, ISTEP+ (Indiana Statewide Testing for Educational Progress-Plus), from the fall to the spring and for the creation and implementation of "formative/diagnostic" assessments.
• These changes and the new components were implemented during the 2008-09 school year and included:
  o Wireless Generation's mCLASS Reading 3D and Math (Grades K-2)
  o CTB/McGraw-Hill's Acuity Assessment System (Grades 3-8)
  o Phase-out of the Graduation Qualifying Exam (GQE): the Class of 2011 was the last required to pass the GQE; it was replaced with end-of-course assessments in core subject areas
  o Moving ISTEP+ from fall to spring (students in grades 3-10 were tested twice during the 2008-09 school year; testing now occurs in grades 3-8)

II. 2011-12 Fall Testing Schedule (Interim and Summative)
Assessment                                                          Testing Window   Grade
mCLASS: Reading 3D                                                  8/22-9/19        K-2
mCLASS: Math                                                        9/12-10/7        K-2
Acuity Predictive A ELA/Math                                        9/26-10/7        3-8
Acuity Diagnostic 1                                                 10/12-11/2       3-8
End-of-Course Assessments, Fall (English 10, Biology, Algebra 1)    10/17-11/11      Typically grades 9, 10
Acuity Predictive B ELA/Math                                        11/28-12/9       3-8
Acuity Predictive B Science (4 & 6)                                 12/5-12/16       4, 6
Acuity Predictive B Social Studies (5 & 7)                          12/5-12/16       5, 7
ECA (Early Winter)                                                  12/8-12/21       Typically grades 9, 10

II. 2011-12 Spring Testing Schedule (Interim and Summative)
Assessment                              Testing Window   Grade
Acuity Diagnostic 2                     1/9-1/30         3-8
mCLASS: Reading 3D                      1/9-1/30         K-2
mCLASS: Math                            1/30-2/24        K-2
Acuity Predictive C Science             2/1-2/15         4, 6
Acuity Predictive C Social Studies      2/1-2/15         5, 7
Acuity Predictive C ELA/Math            2/8-2/23         3-8
ISTEP+ Applied Skills                   3/5-3/14         3-8
Acuity Diagnostic 3                     3/14-4/4         3-8
ISTEP+ Multiple Choice                  4/30-5/9         3-8
mCLASS: Reading 3D                      4/16-5/11        K-2
mCLASS: Math                            4/30-5/25        K-2
Acuity Diagnostic 4                     5/9-5/30         3-8

III. Key Terms
Assessment Definitions:
Formative Assessments – short-cycle assessments that are administered regularly and often informally; they allow continual collection of evidence of student knowledge, are flexible and child-specific, and can be customized to each child.
Interim (Benchmark) Assessments – administered multiple times during a school year (typically quarterly), usually outside of instruction, to evaluate students' knowledge and skills relative to a specific set of academic goals.
They inform educator decisions at multiple levels; results may be reported in a manner allowing aggregation across students, occasions, or concepts; and they can help with evaluating the effectiveness of remediation and intervention programs. *Approximately 14 states include interim assessments as an optional component of their assessment programs.
Summative Assessments – rigid in content and administration, least frequently given, and often carrying the highest stakes; used to inform policy and for accountability purposes; not designed to provide diagnostic information about individual students or to inform instructional decisions in the short term.

III. Key Terms (cont.)
• Acuity Predictive Assessments – used to measure student growth and progress toward academic standards; designed to mirror the ISTEP+ blueprint (the state summative test); predict the likelihood of a student passing ISTEP+.
• Acuity Diagnostic Assessments – designed to reflect the curriculum anticipated to be taught in the classroom prior to administration; intended to provide educators with detailed information for targeting and personalizing instruction.

IV. Objectives of the Acuity Efficacy Study
Objectives of the CEEP Study:
How can the system be improved?
• Information intended to inform CTB and the IDOE about the changes and support needed to make the implementation and use of Acuity most effective during subsequent school years.
To what degree does Acuity impact teaching and learning?
• Evaluate the effects of the Acuity Assessment System on instructional practice and student achievement, particularly achievement on ISTEP+.
Under what conditions do benchmark assessments impact teaching and learning most?
• Examined success in high-improvement Acuity schools.

V. Study Design and Methods
A Mixed-Methods (Qualitative and Quantitative) Research Study

1. Spring Statewide Online Survey of Acuity Schools – 2009-11
• About 4,000 respondents in total over three survey administrations.
• Examined educator opinions regarding Acuity Assessment Program content, technology/user experience, professional development, and customer support after use of the system for a full school year or more.
• The primary objectives of the survey were to obtain suggestions for improving the program and to gauge views regarding the impact of the program on: 1) classroom instruction; 2) general student achievement; and 3) student achievement on ISTEP+.

2. Intensive Case Study in 12/15 Schools – 2008-2010
• Determined what factors make a difference in the effective implementation of Acuity and the use of Acuity data.
• Examined the extent to which schools had implemented the Acuity Assessment Program and identified obstacles and challenges encountered.
• Examined the extent to which Acuity had altered or informed classroom instruction and impacted general student achievement as well as ISTEP+ performance.
• Included one-on-one, face-to-face interviews with 34 principals, testing coordinators, and Acuity trainers, as well as focus groups with 6-10 teachers in all 15 schools (109 teachers total); conducted 2/03/10 through 4/06/10.
3. Completion of a Quasi-Experimental Comparison-Group Study
• Methodology
  o Control schools were matched as closely as possible to Acuity schools on the basis of demographic variables and baseline ISTEP+ scores, within each grade level.
  o Demographics included: school information (locale, size, expenditures, etc.); student information (SES, race, grade, gender, etc.); teacher credentials and experience; and baseline ISTEP+ scores.
  o Treatment schools were grouped based on years using Acuity: 2008-2011 (three years), 2008-10 (two years), and 2009-10 (one year).
  o Matched groups were separated by Math and English/Language Arts.

Treatment and Control Matches Per Cohort (Number of Schools Receiving Matches)
             Treatment   Unique Controls   Total Controls
Cohort 08    286         140               286
Cohort 09    195         133               195
Cohort 10    186         137               186
Total        667         410               667

• Methodology, cont.
  o A Linear Mixed Model (LMM) was fit to ISTEP+ scores for ELA and Math separately and by years in the program, yielding six separate analyses.
  o Type III tests for fixed effects were conducted to determine whether statistically significant differences between Acuity schools/students and control schools/students could be detected for several demographic factors and Acuity test types.
  o Estimates of comparative gain scores for the statistically significant interactions were then prepared from the LMM and fixed-effects regression results.
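The slides describe the matching only at a high level and do not name the algorithm used. As a minimal sketch of one plausible approach, the Python fragment below matches each Acuity (treatment) school to its nearest control school on standardized demographic covariates and baseline ISTEP+ scores within grade level; the column names, the standardization step, and the nearest-neighbor rule are illustrative assumptions, not the study's actual procedure.

```python
# Illustrative sketch only: the report does not specify CEEP's matching
# procedure. Shown here is nearest-neighbor matching of control schools to
# Acuity (treatment) schools on standardized covariates and baseline ISTEP+
# scores, within grade level. Column names are assumptions.
import pandas as pd
from sklearn.neighbors import NearestNeighbors

COVARIATES = ["enrollment", "pct_free_lunch", "pct_minority",
              "per_pupil_expenditure", "baseline_istep_mean"]

def match_controls(schools: pd.DataFrame) -> pd.DataFrame:
    """Return one matched control school for each treatment school, per grade."""
    matches = []
    for grade, grp in schools.groupby("grade"):
        treated = grp[grp["uses_acuity"] == 1]
        controls = grp[grp["uses_acuity"] == 0]
        if treated.empty or controls.empty:
            continue
        # Standardize covariates so no single variable dominates the distance.
        mu, sd = grp[COVARIATES].mean(), grp[COVARIATES].std(ddof=0)
        t_z = (treated[COVARIATES] - mu) / sd
        c_z = (controls[COVARIATES] - mu) / sd
        # Each treatment school gets its nearest control; a control may be
        # reused, which would be consistent with the "unique controls" vs.
        # "total controls" counts in the cohort table above.
        nn = NearestNeighbors(n_neighbors=1).fit(c_z.values)
        _, idx = nn.kneighbors(t_z.values)
        matches.append(pd.DataFrame({
            "grade": grade,
            "treatment_school": treated["school_id"].values,
            "control_school": controls["school_id"].values[idx[:, 0]],
        }))
    return pd.concat(matches, ignore_index=True)
```

Matching with replacement (the same control serving several treatment schools) is only one way to reconcile the unique-control and total-control counts reported above; the study itself does not say how ties or reuse were handled.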
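Likewise, the exact LMM specification is not reported in the slides. A minimal sketch, assuming a school-level random intercept and a treatment-by-occasion interaction as the comparative-gain term, might look like the following; the variable names and formula are assumptions, not the study's model.

```python
# Illustrative sketch only: the slides describe a Linear Mixed Model (LMM) on
# ISTEP+ scores but not its exact specification. One plausible form models the
# scale score on treatment group, test occasion (baseline vs. follow-up), and
# their interaction, with a random intercept for school; the interaction term
# stands in for the treatment group's extra gain relative to controls.
# Column names ("scale_score", "acuity", "occasion", "school_id") are assumed.
import pandas as pd
import statsmodels.formula.api as smf

def fit_gain_model(scores: pd.DataFrame):
    """Fit one LMM (e.g., ELA for one cohort); run separately per subject/cohort."""
    model = smf.mixedlm(
        "scale_score ~ acuity * occasion + C(grade)",  # fixed effects
        data=scores,
        groups=scores["school_id"],                    # random intercept per school
    )
    result = model.fit()
    # The 'acuity:occasion' coefficient estimates the comparative gain
    # (Acuity minus control) under this illustrative specification.
    print(result.summary())
    return result
```

Under a specification like this, six separate fits (ELA and Math for the one-, two-, and three-year cohorts) would mirror the six analyses described above.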
4. Supplemental Focus Group Work – Spring 2011 and 2012
• 28 educators participated in focus groups on March 10-11, 2011, to further gauge opinion on the Acuity Assessment Program and its impact.
• 10 educators from high-improvement Acuity schools (8 principals, 1 Acuity coordinator, and 1 curriculum coordinator) participated in focus groups on May 29 and June 5, 2012, to examine "Under what conditions do benchmark assessments impact teaching and learning most?"
  o High-improvement Acuity schools were those that used Acuity during all three school years of the program and whose students demonstrated the highest scale-score gains on ISTEP+ over the same period.

VI. Findings of the Comparison-Group Quasi-Experimental Study

Findings: ELA, One Year – Estimated Average Gain in ISTEP+ Scale Scores (and SE)
             Grade 3     Grade 4     Grade 5     Grade 6     Grade 7     Grade 8
Both         9.60***     11.55***    16.51***    9.56***     -2.34       4.65
Diagnostic   11.69***    10.21***    19.48***    5.65***     2.17        -2.77*
Predictive   9.38***     8.29***     15.59***    9.16***     2.91**      -1.15*
Control      10.94***    7.76***     17.01***    10.41***    2.86***     0.65*

Findings: Math, One Year – Estimated Average Gain in ISTEP+ Scale Scores (and SE)
             Grade 3     Grade 4     Grade 5     Grade 6     Grade 7     Grade 8
Both         23.40***    31.28***    36.98***    25.04***    31.66***    27.14***
Diagnostic   23.48***    26.39***    34.21***    13.18***    16.05***    15.84***
Predictive   29.99***    33.58***    34.54***    22.01***    25.80***    24.12***
Control      28.07***    31.58***    32.29***    26.23***    28.59***    22.74***

Findings: ELA, Two Year – Estimated Average Gain in ISTEP+ Scale Scores (and SE)
             Grade 3     Grade 4     Grade 5     Grade 6     Grade 7
Both         44.46***    41.63***    48.35***    29.74***    19.97***
Diagnostic   33.86***    39.17***    32.96***    29.63***    11.71***
Predictive   35.65***    34.48***    39.03***    31.72***    22.42***
Control      32.46***    35.02***    40.16***    28.58***    18.32***

Findings: Math, Two Year – Estimated Average Gain in ISTEP+ Scale Scores (and SE)
             Grade 3     Grade 4     Grade 5     Grade 6     Grade 7
Both         78.79***    75.91***    64.39***    44.14***    49.41***
Diagnostic   62.99***    70.28***    62.04***    67.58***    66.81***
Predictive   68.68***    65.06***    57.96***    51.14***    53.92***
Control      63.34***    63.36***    61.05***    56.13***    53.82***

Findings: ELA, Three Year – Estimated Average Gain in ISTEP+ Scale Scores (and SE)
             Grade 3     Grade 4     Grade 5     Grade 6
Diagnostic   61.61***    56.19***    62.19***    27.34*
Predictive   50.90***    62.99***    53.20***    43.61***
Control      49.09***    63.58***    56.28***    39.17***

Findings: Math, Three Year – Estimated Average Gain in ISTEP+ Scale Scores (and SE)
             Grade 3     Grade 4     Grade 5     Grade 6
Diagnostic   110.71***   88.76***    74.70***    47.02**
Predictive   94.31***    82.92***    89.26***    88.80***
Control      93.58***    87.04***    89.02***    80.28***

Analysis
• Effect sizes were very small across grade levels and subjects, suggesting the effects may not have practical significance.
• In many instances the ELA and math means are higher in the Acuity group; however, the size of the difference is small, as indicated by the negligible effect sizes and by the figures using ISTEP+ scale scores.
• Increased teacher usage of Acuity Assessment Program components (reports, Instructional Resources, etc.) does have a statistically significant positive effect on ISTEP+ scale-score gains.
• The effect of the benchmark assessment system is not definitive in either direction.
• Qualitative data indicate Acuity may have more effect in subsequent years. It takes time, but how much longer?
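For reference, since the slides report "negligible effect sizes" without stating the statistic used, a common standardized-mean-difference formulation (Cohen's d with a pooled standard deviation) is shown below; whether the study computed this exact quantity is an assumption.

```latex
% Cohen's d for the Acuity-vs-control gain comparison (illustrative; the
% study's actual effect-size formula is not stated in the slides)
d = \frac{\bar{g}_{\mathrm{Acuity}} - \bar{g}_{\mathrm{Control}}}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_A - 1)\,s_A^{2} + (n_C - 1)\,s_C^{2}}{n_A + n_C - 2}}
```

Here the g-bar terms are each group's mean ISTEP+ scale-score gain, and n and s are the group sizes and standard deviations; by the usual conventions, values of |d| below roughly 0.2 are read as small.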
VII. Findings from Qualitative Analysis
Findings illustrate that Acuity schools are progressing through the "Stages of Assessment Enlightenment":
1) Concern: "test fatigue," the validity of results, and the utility of the information
2) Confusion: interpreting and making meaning of the data – "How do we do it, and what do we do with the information to provide instructional focus?"
3) Consciousness: evidence that reflective conversation and shared learning are occurring; data-driven decisions; informing and altering instruction – at this stage we should begin to see a larger impact on achievement

A. Spring Survey Results: Findings, Opinions, and Suggestions
What impact did Acuity have on classroom instruction? (3 years)
What impact did Acuity have on student achievement?
Anticipated impact of Acuity on ISTEP+ performance

Discussion of the Third Research Objective
Under what conditions do benchmark assessments impact teaching and learning the most?

High-Improvement Acuity Schools: Focus Group Findings
Acuity has an impact on instruction
• The accuracy of Acuity data has prompted teachers to make increasing use of data to alter instruction – the accuracy was termed "eye-opening."
• Acuity helps teachers determine where students are performing and underperforming – it "helps us get on the front end with intervention efforts."
• The data often lead teachers to establish groups by ability or mastery and to identify needed remediation; administrators are shifting resources and personnel to assist with these efforts, including remediation periods and extra time for reading/ELA; the data could be more user-friendly, especially the student reports.
• Re-teaching and remediation plans based on Acuity help target students at all ability levels; intervention-tier grouping.
• Schools are aligning curriculum to state curriculum maps and paying closer attention to the teaching of the Indiana Academic Standards to ensure instruction is standards-driven.
• One principal described the benefit of Acuity as "small races instead of one long race," in that it provides clarity of focus on teaching and learning in shorter sequences; teachers can focus on improving one or two things and then move forward.
• Some of the high-improvement Acuity schools use instructional or data coaches to guide professional development and PLC activities; student literacy coaches are helpful, too.

Acuity has a perceived impact on student achievement
• Acuity is perceived to be very influential in raising students' ISTEP+ scores.
• Instructional Resources are popular, but many ISTEP+ sub-standards lack sufficient correlated Instructional Resource items; make them more interactive.
• Students are taking ownership of Acuity results to improve their performance on ISTEP+; they like seeing the Student Growth Reports.
• Individual conferences with teachers regarding individual student performance lead to positive student attitudes toward Acuity.
• "…[Our students] get really excited about [the Student Growth Report] because they see a [progression] line, and they see it going up or down, so that's been good for us and with the kids, knowing where they are, where they're going."
• "…what gets measured gets done, and if we measure something and we think it's important as a staff, the students are going to take it seriously."

Professional Development Critical for Success
• Professional development is essential not only to cover "assessment literacy" issues of relevance, validity, content, administration, and access to reports (the "technical usage" issues), but also to help with the interpretation and use of data to inform and alter instruction, along with ongoing PD on modifications and new features of the system.
• High-improvement Acuity schools are setting aside or building time into the school day for data analysis and discussion in professional learning communities.
• Where present, these PD practices lead to greater teacher buy-in and support for the system.

Suggestions: Next Steps/Needs
• More expansive Instructional Resources are needed.
• Move Diagnostic 1 earlier in the school calendar.
• More time is needed between tests, and Predictive C needs to be pushed back.
• An auto-reassignment feature is wanted for Instructional Resources activities.
• Adequate computer access in schools is still an issue and limits more regular use of Instructional Resources activities – teachers want remediation tools that do not require a computer.
• Sharing of Acuity data with parents is not widespread, and a parent-friendly report is desired.
• Utilize open-ended/constructed-response questions and improve the scoring process for this component.
• Schools would be interested in seeing state-level Acuity data.
• Some educators expressed the sentiment that the Acuity Assessment Program should replace ISTEP+ altogether.

Concerns/Observations
• Open-ended, constructed-response questions are little used on the Predictive Assessments and are not included in the Diagnostic Assessments; teachers must score them using a rubric – a disincentive because of the time required.
• Educators ask, "Are the Acuity assessments as rigorous as the interim assessments being developed by the SMARTER Balanced Assessment Consortium and PARCC?"
• Will results from Acuity be used to judge teacher quality under the new teacher evaluation system required by Indiana, and if so, how?
• Narrowing of the curriculum and not educating the "whole child."

Indiana Acuity Schools' Evolution Reflective of Experiences Elsewhere
• From a study of the School District of Philadelphia's (SDP) use of interim assessments, Blanc et al. (2010) write that interim assessments "are most likely to contribute to improved student learning if there are also concomitant attention to developing strong school leaders who promote data-driven decision making within a school culture focused on strengthening instruction, professional learning, and collective responsibility for student success."

CEEP Contact Information
Terry E. Spradlin, MPA – Director for Education Policy/Project Manager
tspradli@indiana.edu
Dingjing Shi & Rod Whiteman – Graduate Research Assistants
Stephanie Dickinson & Lijiang Guo – Senior Statistician/Statistician, Indiana Statistical Consulting Center
1900 East Tenth Street
Bloomington, Indiana 47406-7512
812-855-4438
Fax: 812-856-5890
http://ceep.indiana.edu