CSSS Large Scale Assessment Webinar
Adaptive Testing in Science
Kevin King (WestEd)
Roy Beven (NWEA)

Agenda
1. Experience with CATs
2. CAT Overview: What and Why
3. Longitudinal Scale
4. Nature of CATs and their Item Banks
5. Using different item types in CATs
6. Discussion: Implications for NGSS?

Kevin King:
HS Biology, Integrated Science, Research Methods Teacher (9 years)
Science Assessment Specialist for UT State (2003-2010)
Assessment Development Coordinator for UT State (2010-2012)
Senior Assessment Manager for WestEd (2012-present)

Roy Beven:
HS Physics, Math, Geology, Tech-Ed Teacher (23 years)
Lead Science Assessment Specialist for WA State (2001-2008)
Senior Science Content Specialist for NWEA (2008-present)

Presenters’ Experience with CAT
Utah:
peer review acceptance of Utah Adaptive Assessment System
Smarter Balanced:
state co-chair work group member for item development
program management liaison for multiple work groups
MAP® for Science:
an interim adaptive test designed to measure growth
administered last year to over 1.7 million students, mostly in grades 3-8, across the nation

CAT Overview: What
Tests are designed to assess the performance of students by locating them on a scale with a high degree of accuracy and precision.
A computer algorithm selects items according to where the student was last on the scale or some other criteria.
When the student answers an item correctly, the computer selects an item higher on the scale and vice versa.
The computer selects items until all the criteria are met.
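
The selection loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm used by any of the tests named in this webinar: the Rasch (1PL) model, the shrinking up/down step, the starting point of 0.0, and the names `p_correct`, `sem_at`, `run_cat`, `target_sem`, and `step` are all hypothetical choices made for the example. Operational CATs typically estimate ability with maximum-likelihood or Bayesian methods and apply many more blueprint criteria.

```python
import math


def p_correct(theta, difficulty):
    """Rasch (1PL) probability of a correct answer (assumed model, not from the slides)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def sem_at(theta, difficulties):
    """Standard error of measurement from Rasch item information at theta."""
    info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)


def run_cat(item_bank, answer_fn, max_items=30, target_sem=0.3, step=1.0):
    """Administer items adaptively.

    item_bank: dict mapping item id -> difficulty on the scale.
    answer_fn: callable (item_id, difficulty) -> True if answered correctly.
    Stops when max_items is reached, the bank is empty, or the SEM target is met.
    """
    theta = 0.0                      # assumed starting location on the scale
    remaining = dict(item_bank)      # each item is used at most once per student
    history = []
    while remaining and len(history) < max_items:
        # Select the unused item closest to where the student was last on the scale.
        item_id = min(remaining, key=lambda i: abs(remaining[i] - theta))
        difficulty = remaining.pop(item_id)
        correct = answer_fn(item_id, difficulty)
        # Move up after a correct answer, down after an incorrect one;
        # the step shrinks as more items are administered.
        delta = step / (len(history) + 1)
        theta += delta if correct else -delta
        history.append((item_id, difficulty, correct))
        if sem_at(theta, [d for _, d, _ in history]) <= target_sem:
            break
    return theta, history


if __name__ == "__main__":
    # Hypothetical bank of 25 items with difficulties from -3.0 to 3.0.
    bank = {f"item{k}": -3.0 + 0.25 * k for k in range(25)}
    # Hypothetical student who answers correctly whenever difficulty <= 1.5.
    theta, history = run_cat(bank, lambda item_id, d: d <= 1.5, max_items=10)
    print(f"estimate={theta:.2f} after {len(history)} items")
```

The simulated student here is deterministic only to keep the example reproducible; a real response is probabilistic, which is why the stopping rule looks at the SEM rather than at a single answer.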

CAT Overview: What
[Figure: branching diagram. Students 1-4 all start with Spell “School”. Students 1 and 2 answer correctly and move to Spell “Encyclopedia” (Student 1 correct, Student 2 incorrect). Students 3 and 4 answer incorrectly and move to Spell “Red” (Student 3 correct, Student 4 incorrect).]

CAT Overview: What (continued)
Possible criteria for item selection (aka, CAT blueprint):
• Student grade range (i.e., blueprinted standards)
• Number of items (i.e., operational and field test)
• Claims being reported (e.g., 3 or 4 disciplines)
• Standard Error of Measurement (SEM)
• Adequate coverage of standards
• Adequate cognitive complexity (DoK)
• Adequate types of items

CAT Overview: What (continued)
Constraining a CAT

CAT Overview: What (continued)
Sample Test Design with only 3 Criteria
- 3 reporting goals (e.g., life, earth/space, physical)
- 30 operational items (10 per goal)
- SEM
Items 1-10 to establish preliminary score
Items 11-25 to balance the number of items per goal
Items 26-30 to establish the SEM

CAT Overview: Why (continued)
Tests present an individually tailored set of questions to each student.
Tests can quickly identify which skills students have mastered.
Tests provide accurate scores for all students across the full range of the achievement continuum.
SBAC http://www.smarterbalanced.org/smarter-balanced-assessments/computer-adaptive-testing/

CATs have been found to be as accurate as fixed-form tests that are twice as long.
CATs drawing from large item pools can provide much more information, and more precise information, than fixed-form tests.
CATs provide immediate feedback to students and teachers.
ASCD http://www.ascd.org/publications/educational-leadership/mar14/vol71/num06/The-Potential-of-Adaptive-Assessment.aspx

Longitudinal Scale
Many existing LSAs develop a new scale for each grade-level test each year, then equate these new scales back to the scale established when the tests were first administered.
CATs establish one scale. Items are calibrated onto this one scale for the life of the test.
The scale could be reestablished, but this would affect all items in the item bank.

Nature of CATs and their Item Banks
Larger than static test item banks (typically 4-10 times larger).
Last longer than static test item banks, as individual item exposure is limited.
Need to cover the “range of the algorithm” criteria (e.g., DoK, standards) at a range of item difficulty.
Do not fully know the range of difficulty until after items are field tested.
A challenge in building a bank at the onset of a new test.

Using different item types in CATs
1. Multiple Choice dichotomously scored items
2. Technology Enhanced Items (TEIs)
3. Polytomously scored items
4. Constructed response items
5. Common Stimulus Item Sets (CSIS)
6. Simulations with scoring by path (PhET, SimSci, NAEP)
7. Others?

Discussion: Implications for LSA of NGSS?
• What part of a state’s NGSS assessment system could (might) be a CAT?
• Can (should) a single CAT measure all grade ranges? K-2, 3-5, 6-8, 9-12
• Can (should) a CAT report on the 3 dimensions of the NGSS (DCIs, SEPs, and CCs)?
• Can a CAT report on the 4 disciplines of the NGSS?
• Can a CAT report on an adequate range of NGSS PEs?