Learning from Interim Assessments: District Implementation to Classroom Practice
James H. McMillan
Lisa M. Abrams
Virginia Commonwealth University
MARCES Annual Conference, University of Maryland, College Park
October 20, 2011
(PowerPoint available at http://www.soe.vcu.edu/merc)

Flight Plan
Why we need interim assessment
What the research says about impact
Qualitative study summary
Quantitative study summary
Recommendations for practice

Need for Interim Assessments
Increased pressure to understand student achievement:
Are students making progress toward meeting the requirements of the state test?
Are students on track to pass the state test?
Are subgroups of students on track to meet AYP targets?
Greater information needs:
Measure student progress relative to a set of specific content standards/skills
Identify content areas of strength and areas for improvement
Shape instructional decisions
Serve as an "early warning" system
Inform strategies to support the learning of individual students
Results that can be aggregated at the student, classroom, grade/team, school, and district levels

Offer a Range of Instructional Uses (see Supovitz & Klein, 2003)
Planning:
Decide on content, pacing, and instructional strategies or approaches (i.e., mastery orientation)
Delivery:
Targeted instruction: whole class or small groups depending on mastery of content/skills
Provide feedback and/or re-teach selected content and/or skills
Selection and use of supplemental or additional resources
Remediation:
Identify low-performing students
Design plans for providing additional supports/assistance
Evaluation:
Monitor/track student progress
Examine effectiveness of interventions
Determine instructional effectiveness

What We Know About Interim Testing
Widespread use across districts in Virginia and nationally (Marsh, Pane & Hamilton, 2006).
Mixed views on usefulness of interim test results:
Compared to teachers' own classroom assessments, interim results are seen as less useful and as providing redundant information.
Compared to the state test, interim results are seen as more useful for helping teachers "identify and correct gaps in their teaching."
Factors that influence teachers' views: quick turnaround of results, alignment with curriculum, capacity and support, instructional leadership, perceived validity, reporting, added value

Impact on Teachers
Informs instructional adjustments (Brunner et al., 2005; Marsh, Pane & Hamilton, 2006; Oláh, Lawrence & Riggan, 2010; Yeh, 2006)
Increased collaboration and problem solving (Lachat & Smith, 2005; Wayman & Cho, 2009; Yeh, 2006)
Enhanced self-efficacy, increased reflection (Brunner et al., 2005; Yeh, 2006)
Increased emphasis on testing; test preparation and primary influence of colleagues and standards on practice (Loeb, Knapp & Elfers, 2008)
Variability within schools – some teachers use the information, others do not; 80% of the variability in teacher survey responses was within rather than between schools (Marsh, Pane & Hamilton, 2006).

Impact on Students
Achievement – although limited, research suggests impact may be mixed:
Targeted instruction led to improvements in student test scores (Lachat & Smith, 2005; Nelson & Eddy, 2008; Trimble, Gay & Matthews, 2005; Yeh, 2006) and proficiency in reading and mathematics (Peterson, 2007).
Large-scale studies have failed to find significant differences in student achievement between treatment and comparison schools (Henderson, Petrosino & Guckenburg, 2008; May & Robinson, 2007; Quint, Sepanik & Smith, 2008).
Increased engagement and motivation (Yeh, 2006)
Increased access to learning opportunities – tutoring and remedial services (Marsh, Pane & Hamilton, 2006)
Targeted instruction toward the "bubble kids"

MERC Research on Interim Assessments
Qualitative study: explored the extent to which teachers used interim test results to support learning.
Quantitative study: designed to examine teachers' self-reports about using interim test results and the influence of results on instruction.
What conditions are necessary to promote use of test results?
How do teachers analyze and use interim test results to inform instruction? To inform decisions about students?
What most influences teachers' use of test results?

Qualitative Study: Research Design and Methods
Qualitative double-layer category focus-group design (Krueger & Casey, 2009)
Layers: school type & district (N=6)
Protocol covered:
the general nature of interim testing policies and the type of data teachers receive
expectations for using interim test results
instructional uses of interim test results
general views on interim testing policies, practices, and procedures
Focus group sessions

Participants
Selection: two-stage convenience sampling process (district → school → principal → teachers)
Data collection: Spring 2009/Fall 2010; 15 focus groups with 67 core-content-area teachers
Demographic profile: the majority were white (82%), female (88%), and taught at the elementary level (80%)
Average of 11.5 years of classroom experience (range of 1-34 yrs.)
33% were beginning teachers with 1-3 years of teaching experience and 20% had been teaching for over 20 years
20% were middle school teachers in the areas of civics, science, mathematics, and language/reading

Data Analysis
A transcript-based approach using a constant-comparative analytic framework was used to identify emergent patterns or trends (Krueger & Casey, 2009).
Analysis focused on the frequency and extensiveness of viewpoints or ideas.
Codes were created in 9 key areas and applied to the text: "alignment", "test quality", "individualized instruction", "testing time"
High inter-coder agreement

Findings: District Policies and Expectations
Theme 1: Interim testing policies related to test construction and administration were similar among school divisions. Inconsistencies were evident across content areas and grade levels within districts.

They are graded, but they are not part of their grade.
So they will [interim test results] show up on their report card as a separate category just so parents know and the students know what the grade is, but it doesn't have any effect on their class grade.

Theme 2: There are clear and consistent district- and building-level expectations for teachers' analysis and use of interim test results to make instructional adjustments in an effort to increase student achievement.

Our principal expects when you have a grade level meeting to be able to say, this is what I'm doing about these results, because it is an unwritten expectation but it is clearly passed on… by sitting down with them the first time they are giving the test and describing how you do data analysis and literally walking them through it and showing them patterns to look for.

Findings: Access to Results and Analysis
Theme 3: Timely access to test results and use of a software program supported data analysis and reporting.

That if we are supposed to be using this information to guide instruction we need immediate feedback, like the day of, so we can plan to adjust instruction for the following day.

Theme 4: It was important for teachers to have time with colleagues to discuss results.

We have achievement team meetings where we look at every single teacher, every single class, everything, and look at the data really in depth to try to figure out what's going on. What is the problem with this class? Why is this one doing better?

Findings: Informing Instruction
Theme 5: Teachers analyze interim test results at the class and individual student level to inform review, re-teaching, and remediation or enrichment.

If I see a large number of my students missing in this area, I am going to try to re-teach it to the whole class using a different method. If it is only a couple of [students], I will pull them aside and instruct one-on-one. It makes a difference in my instruction.
I mean, I think I'm able to help students more that are having difficulty based on it. I am able to hone in on exactly where the problem is. I don't have to fish around.

Theme 6: A variety of factors related to data quality and validity impact teachers' use of interim test data.

We really need to focus on the tests being valid. It is hard to take it seriously when you don't feel like it is valid. When you look at it and you see mistakes or passages you know your students aren't going to be able to read because it is way above their reading level.

Findings: Testing Time vs. Learning Time
Theme 7: Teachers expressed significant concerns about the amount of instructional time that is devoted to testing and the implications for the quality of their instruction.

I think it has definitely made us change the way we teach because you are looking for how can I teach this the most effectively and the fastest… that is the truth, you have got to hurry up and get through it [curriculum] so that you can get to the next thing so that they get everything [before the test]. I do feel like sometimes I don't teach things as well as I used to because of the time constraints.

You are sacrificing learning time for testing time… we leave very little time to actually teaching.

These kids are losing four weeks out of the year of instructional time.

Conclusions From Qualitative Study
In the main, consistent with other research.
Importance of conversations among teachers.
Relatively little emphasis on instructional correctives.
Alignment and high-quality items are essential.

Quantitative Study: Research Design and Methods
o Survey design
o Conducted Spring 2010
o Administered online in 4 school districts
o Target population: elementary (4th and 5th grades) and middle school teachers (core content areas)
o 460 teachers responded; 390 with useable responses
o Response rates ranged from 25.4% to 85.1% across the districts¹
o Survey items adapted from the Urban Data Study survey, American Institutes for Research
o Analyses: frequency and measures of central tendency; factor analysis and regression procedures
1. Response rate reported for 3 of the 4 participating districts due to differences in recruitment procedures.

Demographic Information: Race and Gender
Gender (one missing value): Males 61 (15.7%); Females 328 (84.3%)
Race: White 362 (93%); Black/African American 21 (5.4%); Other 7 (1.8%)
Note: Total sample size N = 390.

Demographic Information: Grade Level and Years of Experience
Grade level: Elementary 169 (43.3%); Middle 221 (56.7%)
Teaching experience: 5 or fewer years 56 (14.9%); 6-10 years 93 (24.8%); 11+ years 226 (60.3%)
Note: Total sample size N = 390.

Demographic Information: Subjects and Grade Level
Subject (multiple selections allowed): All (Elementary) 102 (26.2%); Reading/English Language Arts 119 (30.5%); Mathematics 111 (28.5%)
Grade level: Elementary 169 (43.3%); Middle 221 (56.7%)
Note: Total sample size N = 390.

Demographic Information: Degrees
Educational qualification (multiple selections; frequencies will not add up to N = 390): Bachelors degree 382 (97.9%); Masters degree 197 (65%); Educational Specialist/Professional Diploma 36 (18.8%); Certificate of Advanced Graduate Studies 21 (11.7%)
Note: Total sample size N = 390.

Interim Assessment Survey
Survey topics: policies and procedures; accessing test data; analyzing results; instructional uses; attitudes; demographics
Variables for analysis: six conditions for use; instructional adjustments; authentic strategies; use of scores; traditional strategies

Condition 1: Alignment (% agree or strongly agree)
1. Well-aligned with state and division standards: 77%
2. Well-aligned with the state assessment: 69%
3. Well-aligned with the pacing guides: 75%
4. Well-aligned with what I teach in the classroom: 77%
5. Appropriately challenging for my students: 69%
Note: Scale: Strongly Disagree = 1; Disagree = 2; Agree = 3; Strongly Agree = 4. Reliability estimate for the scale, Cronbach's α = .901 (n = 300).

Condition 2: Division (District) Policy (% agree or strongly agree)
1. The division sets clear, consistent goals for schools to use data for school improvement: 63%
2. Division staff provide information and expertise that support the data use efforts at my school: 48%
3. The division's data use policies help us address student needs at our school: 45%
4. The division has designated adequate resources (e.g., time, staff, money) to facilitate teachers' use of data: 28%
Note: Scale: Strongly Disagree = 1; Disagree = 2; Agree = 3; Strongly Agree = 4. Reliability estimate for the scale, Cronbach's α = .864 (n = 267).

Condition 3: School Environment (% agree or strongly agree)
1. Teachers in this school are continually learning and seeking new ideas: 86%
2. Teachers are engaged in systematic analysis of student performance data: 72%
3. Teachers in this school approach their work with inquiry and reflection: 83%
4. Assessment of student performance leads to changes in the curriculum: 54%
5. Teachers in this school regularly examine school performance on assessments: 78%
Note: Scale: Strongly Disagree = 1; Disagree = 2; Agree = 3; Strongly Agree = 4. Reliability estimate for the scale, Cronbach's α = .856 (n = 283).

Condition 4: Time Spent Analyzing and Reviewing Interim Data (% spending 1-2 or more hours)
Independently: 70%
Analyzing with other teachers: 46%
Analyzing with principal/assistant principal: 11%
With students: 45%
With parents: 8%
Note: Frequency scale: 0; <1 hour; 1-2 hours; 2-3 hours; more than 3 hours. Reliability estimate for the scale, Cronbach's α = .718 (n = 358).
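The reliability estimates reported for each scale (Cronbach's α) can be reproduced from an item-level response matrix. A minimal sketch in Python — the function follows the standard formula, but the response matrix below is hypothetical illustration, not the study's data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of
    scale responses (e.g., 1-4 agreement ratings)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 teachers rating a 3-item agreement scale (1-4)
responses = np.array([
    [4, 4, 3],
    [3, 3, 3],
    [2, 2, 1],
    [4, 3, 4],
    [1, 2, 2],
    [3, 4, 3],
])
alpha = cronbach_alpha(responses)  # internally consistent items -> high alpha
```

Values near the .718-.901 range reported for the study's scales indicate acceptable-to-high internal consistency for survey subscales of this length.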
Condition 5: Frequency of Analysis and Review (% 1-2 times a month or more)
Department chair/grade-level chair: 15%
Grade-level lead teacher: 20%
Other teachers: 34%
Instructional coaches: 10%
School administrators: 12%
Central office staff: 2%
Parents/guardians: 11%
Students: 26%
Note: Frequency scale: Never = 1; 1-2 times a quarter = 2; 1-2 times a month = 3; 1-2 times a week = 4. Reliability estimate for the scale, Cronbach's α = .815 (n = 200).

Condition 6: Teachers' Interactions (% moderate or major extent)
Grade-level teams to review data: 56%
Share ideas to improve teaching: 56%
Share and discuss student work: 66%
Discuss unsuccessful lessons: 56%
Note: Extent scale: Not at all = 1; Slight Extent = 2; Moderate Extent = 3; Major Extent = 4. Reliability estimate for the scale, Cronbach's α = .867 (n = 361).

Conditions: Some Additional Individual Items (% hindering to a moderate or major extent)
Lack of time to study and think about data: 51%
Lack of time to collaborate with others: 52%
Insufficient professional development: 27%
Data provided too late for use: 7%
Curriculum pacing pressures: 60%
Of little use in my instruction: 37% (agree or strongly agree)
Note: Scale: not at all = 1; minor = 2; moderate = 3; major = 4.

Instructional Adjustments: Component 1 (n = 303; α = .90)
Scale range = 1-4: 1 = no influence or change on instruction; 4 = major influence or change on instruction
Item (Loading, M, SD):
Adjusting goals for student learning (.758, 2.40, .954)
Determining a student's grouping for instruction (.735, 2.19, .993)
The types of assessments I use to evaluate students (.710, 2.24, .940)
The instructional strategies I employ (.698, 2.40, .911)
Adjusting pacing in areas where students encountered problems (.689, 2.56, .957)
Adjusting use of textbooks and instructional materials (.672, 2.12, .953)
Changed teaching method (e.g., lecture, cooperative learning, student inquiry) (.650, 2.57, .956)
The curriculum content I teach (.650, 1.89, .911)
Use same-level achievement groupings (.619, 2.19, 1.00)
Changed the sequence of instruction (.617, 2.30, 1.00)
Used mixed-level achievement groupings (.557, 2.27, 1.02)
Added, deleted, or changed skills taught (.541, 2.57, .917)
Component mean = 2.31

Instructional Adjustments: Some Individual Items
85% of teachers reported making some kind of change in instructional strategies
67% of teachers reported some level of change in student expectations
84% of teachers reported some level of influence in adjusting goals for student learning
35% of teachers indicated that reviewing results with their principal or assistant principal was somewhat or very useful

Instructional Adjustments: Some Individual Items (% reporting an increase)
Problem-solving activities: 58%
Cooperative/group learning: 49%
Inquiry/investigation: 47%
Peer tutoring: 31%
Collaboration/team teaching: 29%
Worksheets: 8%
Textbook-based assignments: 8%
Lecturing: 7%

Authentic Instructional Strategies: Component 2 (n = 310; α = .82)
Scale range = 1-5: 1 = large decrease in use of strategy; 5 = large increase in use of strategy
Item (Loading, M, SD):
Inquiry/investigation (.767, 3.54, .798)
Problem-solving activities (.732, 3.67, .790)
Project-based assessments (.697, 3.31, .816)
Use of student response journals (.659, 3.55, 1.05)
Collaborative/team teaching (.630, 3.64, .998)
Peer or cross-age tutoring (.616, 3.61, .948)
Use of portfolios (.606, 3.61, 1.21)
Cooperative learning/group work (.602, 3.55, .739)
Component mean = 3.56

Use of Scores: Component 3 (n = 283; α = .79)
Scale range = 1-4: 1 = no use; 4 = extensive use
Item (Loading, M, SD):
Results for subgroups of students (e.g., SWD, ELL/LEP, gender, race/ethnicity) (.766, 2.36, 1.05)
Scale scores or other scores that show how close students are to performance levels (.736, 2.43, 1.03)
Results for each grade level (.724, 2.23, 1.07)
Results for specific reporting categories (.698, 2.77, 1.03)
Percent of students scoring at or above the proficient level (.662, 2.85, .971)
Component mean = 2.53

Traditional Instructional Strategies: Component 4 (n = 320; α = .61)
Scale range = 1-5: 1 = large decrease in use of strategy; 5 = large increase in use of strategy
Item (Loading, M, SD):
Lecturing (.687, 2.94, .732)
Worksheets (.635, 2.97, .687)
Textbook-based assignments (.563, 3.14, 1.09)
Component mean = 3.02

Bivariate Correlations Between Conditions and Use
Condition (Instructional Adjustments / Authentic Instructional Strategies / Use of Scores / Traditional Instructional Strategies):
Alignment: .229* / .101 / .266* / .007
Division Policy: .336** / .087 / .373** / -.022
School Environment: .218** / .084 / .189* / .088
Frequency of Analysis and Review: .425** / .249** / .400** / .036
Teachers' Interactions: .381** / .113* / .398** / -.039
Time Spent Analyzing: .425** / .249** / .400** / .036
*Correlations significant at .05; **correlations significant at .01.

Stepwise Regression: Conditions With Instructional Adjustments
Model 1: Time Spent Analyzing (R = .488, R² = .239, β = .300, p = .000)
Model 2: + Division Policy (R = .530, R² = .281, β = .153, p = .000)
Model 3: + Frequency Reviewing Data (R = .549, R² = .301, β = .154, p = .006)
Model 4: + Teachers' Interactions (R = .559, R² = .312, β = .129, p = .039)
Note: Total sample size N = 390.

Stepwise Regression: Conditions With Use of Specific Scores
Model 1: Frequency Reviewing Data (R = .406, R² = .165, β = .175, p = .000)
Model 2: + Division Policy (R = .487, R² = .237, β = .219, p = .000)
Model 3: + Time Spent Analyzing (R = .515, R² = .265, β = .163, p = .006)
Model 4: + Time Interacting with Others (R = .530, R² = .281, β = .156, p = .039)
Note: Total sample size N = 390.

Conclusions From Quantitative Study
Interim testing may serve a meaningful formative purpose and affect instruction.
District policy and school leadership that encourage and support the use of data, together with time for teachers to review and analyze data (especially with other teachers), are positively related to teachers' instructional adjustments and use of specific report scores.
Teachers report extensive use of interim test data across many different instructional adjustments; no single type of adjustment was used most often.
Only 37% of teachers agree or strongly agree that interim testing is of little use in instruction.
Elementary school teachers' use of interim data was only slightly greater than middle school teachers' use.
The greatest barriers to using interim data are lack of time for review and analysis of data and pacing guide pressures.

Recommendations for Effective Practice
Each recommendation below is supported by some combination of the VCU qualitative study, the VCU quantitative study, researcher experience with districts, and the literature cited.

Clarify purpose – focus on instructional adjustments (Riggan & Olah, 2011; Olah, Lawrence & Riggan, 2010; Blanc, Christman, Liu, Mitchell, Travers & Bulkley, 2010; Bulkley, Christman, Goertz & Lawrence, 2010; Christman, Neild, Bulkley, Blanc, Liu, Mitchell & Travers, 2009; Yeh, 2006)
Establish alignment evidence – content (Blanc et al., 2010; Bulkley et al., 2010; Goertz, Olah & Riggan, 2009; Hintze & Silberglitt, 2005)
Establish alignment evidence – cognitive level (Bulkley et al., 2010)
Provide clear guidelines for use (Blanc et al., 2010; Bulkley et al., 2010)
Establish district and school environments that support data-driven decision making (Blanc et al., 2010; Bulkley et al., 2010; Christman et al., 2009; Goertz, Olah & Riggan, 2009; Yeh, 2006)
Use high-quality items (Bulkley et al., 2010; Yeh, 2006)
Provide structured time for review and analysis (Blanc et al., 2010)
Use teams of teachers for review and analysis (Blanc et al., 2010; Goertz, Olah & Riggan, 2009; Yeh, 2006)
Include estimates of error (Bulkley et al., 2010; Yeh, 2006)
Distribute questions along with results, with numbers of students selecting each alternative (Olah, Lawrence & Riggan, 2010; Bulkley et al., 2010; Yeh, 2006)
Monitor unintended consequences (Bulkley et al., 2010)
Document costs – How much instructional time is being replaced by testing, test prep, and review and analysis of results? How much does the process cost in terms of software and personnel?
Evaluate use of results – What evidence exists that teachers are using results to modify instruction and that students are learning more? (Bulkley et al., 2010)
Provide adequate professional development
Address the effect of the pacing guide
Keep items secure until after the test is administered
Standardize administrative procedures for all schools within a district – no longer than one hour for each test
Ensure fairness
Verify results with other evidence (Bulkley et al., 2010; Christman et al., 2009)

Questions?