Assessment Accommodations: What Have We Learned From Research?
Stephen G. Sireci
Center for Educational Assessment
University of Massachusetts Amherst
Mary J. Pitoniak
Educational Testing Service
© copyright 2006 Stephen G. Sireci

In this presentation we will
• Discuss validity issues in test accommodations
• List the most common test accommodations used to promote valid score interpretation
• Discuss research conducted on test accommodations
• Suggest areas for future research on test accommodations

Defining “Accommodation”
• The Standards for Educational and Psychological Testing
  – use the terms “modification” and “accommodation” almost interchangeably,
  – use accommodation “as the general term for any action taken in response to a determination that an individual’s disability requires a departure from standard testing protocol” (p. 101).

Current State Testing Programs
• “Accommodation” is used to refer to test or test administration changes that are not considered to alter the construct measured.
• “Modification” is used to refer to changes that are thought to alter the construct.

Validity Issues in Accommodations
• To support valid test score interpretations for students with disabilities, it is important to remove construct-irrelevant barriers to these students’ test performance, but it is also important to maintain “construct representation.”
• In situations where individuals who take accommodated versions of tests are compared with those who take the standard version, an additional validity issue is the comparability of scores across the different test formats.

The Psychometric Oxymoron
• Accommodated Standardized Test
  – Promotes fairness in testing? Or
  – Provides an unfair advantage to some examinees?
What do the Standards for Educational and Psychological Testing say on this issue?

Standards for Educational and Psychological Testing
• Standard 10.1: “In testing individuals with disabilities, test developers, test administrators, and test users should take steps to ensure that the test score inferences accurately reflect the intended construct rather than any disabilities and their associated characteristics extraneous to the intent of the measurement” (AERA et al., p. 106).

Standards for Educational and Psychological Testing
• Standard 10.4: “If modifications are made or recommended by test developers. . . (unless) evidence of validity for a given inference has been established for individuals with the specific disabilities, test developers should issue cautionary statements in manuals or supplementary materials regarding confidence in interpretations based on such test scores” (AERA et al., p. 106).

“Cautionary statements”
• Flagging of test scores is controversial; most research in this area has focused on postsecondary and postgraduate admissions tests (Sireci, 2005).
• How do states handle score reporting issues for accommodated and alternate assessments?

Accommodated Tests and Accommodated Test Administrations Have the Potential to Undermine Validity in at Least 2 Ways:
1. Construct underrepresentation
2. Construct-irrelevant variance
As stated by Messick (1989): “Tests are imperfect measures of constructs because they either leave out something that should be included…or else include something that should be left out, or both” (p. 34)

• When standardized tests are NOT accommodated for students with disabilities (SWD)
  – Construct-irrelevant variance can interfere with test performance
    • e.g., the ability to see, hear, or focus interferes with measurement of math or reading proficiency
• When standardized tests ARE accommodated
  – Construct underrepresentation may occur
    • e.g., read-aloud for a reading assessment

What methods do states use to minimize construct-irrelevant variance, while maintaining construct representation?

Categories of Accommodations
• Presentation
• Timing
• Response
• Setting
Source: Thompson, Blount, and Thurlow (2002)

Presentation Accommodations
• Oral (read-aloud, audiocassette)
• Paraphrasing
• Technological
• Braille/large print
• Sign language interpreter
• Encouragement (redirecting)
• Cueing
• Spelling assistance
• Use of manipulatives

Timing Accommodations
• Extended time
• Multiple days/sessions
• Separate sessions
Timing accommodations are less of an issue on state standards-based assessments because most have generous time limits.

Response Accommodations
• Scribe
• Booklet versus answer sheet
• Marking booklet to maintain place
• Transcription

Setting Accommodations
• Individual administration
• Administration in a separate room

Other Accommodations
• Alternate assessment
• Others?

Psychometric Research on Test Accommodations Has Focused On
• Has the accommodation changed the construct measured?
  – Speed
  – Different skill
• Do accommodations help only those who need them?
  – Interaction hypothesis
• Do test scores from accommodated and non-accommodated administrations have the same meaning?

Research on test accommodations for individuals with disabilities:
• Little empirical study
• Some literature reviews
  – Willingham et al. (1988)
  – Chiu & Pearson (1999)
  – Tindal & Fuchs (2000)
  – Pitoniak & Royer (2001)
  – Thompson et al. (2002)
  – Bolt & Thurlow (2004)
  – Sireci, Scarpati, & Li (2005)
• Psychometric issues (Geisinger, 1994)
• Legal issues (Phillips, 1994)
• Also: Keeping Score for All (Koenig & Bachman, 2004)

Sireci, Scarpati, & Li (2005) Research Questions
• Do test accommodations improve the scores of students with disabilities (SWD)?
• If so, do such score gains reflect increased validity or unfair advantage?
  – Interaction hypothesis
• What specific types of accommodations are best for specific types of students?

Interaction Hypothesis
[Figure 1: Illustration of Interaction Hypothesis. Mean score (y-axis) by accommodation condition (x-axis: No ACC, ACC), plotted separately for the GEN and SWD/ELL groups.]

MacArthur & Cavalier (2004)
“Differential impact on students with and without disabilities provides evidence that the accommodation removes a barrier based on disability” (p. 55).

Fletcher et al. (2006)
“Because the source of variance is fundamentally irrelevant to the measurement of the construct, a valid accommodation will improve performance only for students with a disability” (p. 138).

Are there any general conclusions regarding effects?
• Extended time seems to help, and it helps SWD more than non-SWD.
• Oral accommodations show promise (math), but with less uniformity across studies. Effects are considered unclear.

Review Process
• ERIC and PsycINFO searches
• E-mails to researchers in this area

Structure of review
• Dimension 1: SWD or ELL
• Dimension 2: Type of accommodation
• Dimension 3: Experimental or non-experimental study
Note that the review was primarily conducted in 2003, so the results are somewhat dated. We have, however, reviewed additional research since then.

Characteristics of Studies

| Research Design | SWD | ELL | Total |
|---|---|---|---|
| Experimental | 13 | 8 | 21 |
| Quasi-experimental | 2 | 4 | 6 |
| Non-experimental | 10 | 1 | 11 |
| Total | 25 | 13 | 38 |

Studies pertaining exclusively to ELL will not be discussed in this presentation.

Types of Accommodations

| Category | Type of Accommodation | # of Studies |
|---|---|---|
| Presentation | Oral* | 23 |
| Presentation | Paraphrase | 2 |
| Presentation | Technological | 2 |
| Presentation | Braille/Large Print | 1 |
| Presentation | Sign Language | 1 |
| Presentation | Encouragement | 1 |
| Presentation | Cueing | 1 |
| Presentation | Spelling assistance | 1 |
| Presentation | Manipulatives | 1 |
| Timing | Extended time | 14 |
| Timing | Multi day/sessions | 1 |
| Timing | Separate sessions | 1 |
| Response | Scribes | 2 |
| Response | In booklet vs. answer sheet | 1 |
| Response | Mark task book to maintain place | 1 |
| Response | Transcription | 1 |
| Setting | Separate room | 1 |

*Includes read aloud, audiotape or videotape, and screen-reading software.
Note: Literature reviews and issues papers are not included in this table.

Characteristics of Studies
• Most of the studies focused on elementary school (2/3 between grades 3 and 8).
• Only 41% were published in peer-reviewed journals.

Results: Extended Time
• Most common findings were gains for both SWD and non-SWD.
  – Contrast Camara et al. (1998) with Bridgeman et al. (in press)
• Most studies of extended time (6 of 8) looked at students with learning disabilities (SWLD)

Summary of Studies on Extended Time (1)

| Study | Subject(s) | Design | Results | H1? |
|---|---|---|---|---|
| Elliott & Marquart (2004) | Math | Experimental | All student groups gained | No |
| Runyan (1991) | Reading | Experimental | Greater gains for SWD | Yes |
| Zurcher & Bryant (2001) | Analogy test | Quasi-experimental | No gains for either group | No |
| Huesman & Frisbie (2000) | Reading | Quasi-experimental | Gains for LD but not for non-LD groups | Yes |
| Alster (1997) | Math | Quasi-experimental | Greater gains for SWD | Yes |

Summary of Studies on Extended Time (2)

| Study | Subject(s) | Design | Results | H1? |
|---|---|---|---|---|
| Camara, Copeland, & Rothchild (1998) | SAT | Ex post facto | Gains for LD retesters 3x greater than those of standard retesters | Yes |
| Ziomek & Andrews (1998) | ACT | Ex post facto | Gains for LD retesters 4x greater than gains of standard retesters | Yes |
| Zuriff (2000) | Reading, ACT, GRE | 5 experimental | Gains for both SWD and non-SWD | No |

Results: Oral
• Results depend on subject
  – Gains for SWD only in Math
  – No differential gain in other subject areas
  – Tends to support oral accommodation for math tests

| Study | Subject | Design | Results | H1? |
|---|---|---|---|---|
| Weston (2002) | Math | Experimental (between and within groups) | Greater gains for SWD | Yes |
| Tindal, Heath, et al. (1998) | Math | Experimental (between and within groups) | Significant gain for SWD only | Yes |
| Calhoon, Fuchs, & Hamlett (2000) | Math | Experimental (within group) | Significant gains for oral accommodation; no differences between teacher and computer | Yes |
| Johnson (2000) | Math | Experimental (between group) | Greater gains for SWD | Yes |
| Huynh, Meyer, & Gallant (2004) | Math | Ex post facto | Accommodated SWD > matched non-accommodated SWD | Yes |
| Helwig & Tindal (2003) | Math | Quasi-experimental | Teachers not accurate in predicting benefit; no gains for either group | No |
| Meloy, Deville, & Frisbie (2000) | Science, Math, Reading | Experimental (between and within groups) | Similar gains for SWD and non-SWD | No |

Oral (continued)

| Study | Subject | Design | Results | H1? |
|---|---|---|---|---|
| Brown & Augustine (2001) | Science, Social Studies | Experimental (between and within groups) | No gain | No |
| Kosciolek & Ysseldyke (2000) | Reading | Quasi-experimental | SWD had greater gains, but not statistically significant | No |
| McKevitt & Elliott (2003) | Reading | Experimental (between and within groups) | No significant effect size differences between accommodated and standard conditions for either group | No |

More Recent Research
• Extended time
  – Cohen, Gregg, & Deng (2005)
  – Wainer, Bridgeman, Najarian, & Trapani (2004)
• Oral
  – Fletcher, Francis, Boudousquie, Copeland, Young, Kalinowski, & Vaughn (2006)
• Dictation software
  – MacArthur & Cavalier (2004)

Cohen, Gregg, & Deng (2005)
• Looked at groups of students with and without accommodations and their performance on specific types of math items using differential item functioning (DIF) methods
  – Accommodation status “only marginally related to the pattern of accommodation-related DIF”
  – Different types of students benefited from the extra time
  – DIF not due to accommodations, but to differences in students’ performance across different types of math items

Cohen, Gregg, & Deng (2005)
“Accommodations are more appropriately viewed as leveling the playing field; they do not supply the knowledge necessary to pass tests” (p. 231).

Wainer et al. (2004)
• Reanalysis of Bridgeman, Trapani, & Curley (2004) data
• Evaluated extended time by shortening experimental sections of SAT
• Little difference for verbal (about 5-point gain)
• Big difference for quantitative
  – About 10-30 points, with larger gain associated with larger time extension
  – Largest gains for highest-scoring students

Wainer et al. (2004)
• Looked at correlations between scores from standard and extended time and students’ HS math grades
  – Claimed no relationship, but results (correlations and sample sizes) were not reported!
  – Important idea to look at external validity criterion

Wainer et al. (2004)
• Claim that results support not flagging verbal scores, but flagging quantitative scores
  – Don’t acknowledge presence of undesired speededness
  – SWD not included in study
• Hard to agree with conclusions
• Supports increasing time limit on SAT-Q

Fletcher et al. (2006)
• Experimental study involving Grade 3 students with (n=91) and without (n=91) decoding difficulties associated with dyslexia
• Oral accommodation vs. standard administration of a reading test (Texas)

Fletcher et al. (2006)
• Accommodation targeted for specific disability
  – Oral reading of proper nouns, comprehension stems, & answer choices
  – Designed to reduce the impact of word recognition difficulties

Fletcher et al. (2006)
• Results
  – Significant group/accommodation interaction
  – Only SWD benefited from the accommodation
  – Seven times greater likelihood of passing the test with the accommodation

MacArthur & Cavalier (2004)
• Looked at accommodations for writing assessments
  – Experimental study: SWD (n=21), students w/o documented disability (n=10)
  – Three accommodation conditions:
    • hand-written
    • dictation to scribe
    • dictation to speech recognition software
  – 48 states allow dictation accommodation (17 exclude scores)

MacArthur & Cavalier (2004)
• Results:
  – Dictation improved writing scores for SWD, with scribe > speech recognition software > hand-written
  – Dictation did not improve scores for students w/o disability
  – No difference between student groups with respect to preference (hand vs. dictation)

MacArthur & Cavalier (2004)
• Caveat
  – Small n (21, 10)
• Construct issue
  – Dictation okay if construct = “composing”
  – Not okay if construct = “writing”

Research on Equivalence of Test Structure
• One aspect of “construct equivalence”
  – Rock, Bennett, Kaplan, & Jirele (1988)
  – Tippets & Michaels (1997)
  – Huynh, Meyer, & Gallant (2004)
  – Huynh & Barton (2006)
  – Cook, Eignor, Sawaki, Steinberg, & Cline (2006)

Research on Equivalence of Test Structure
Results tend to support similarity of test structure across accommodated and standard test administrations (oral, extended time, various).

Discussion (1)
• Do accommodations hurt or promote valid score interpretations for students with disabilities?
  – Accommodations are designed to promote validity by removing barriers (irrelevant variance)
  – In general, the research suggests the accommodations being used are sensible and defensible.

Discussion (2)
• Extended time seems to be a valid accommodation.
  – Unintended test speededness could explain results for students w/o disabilities
  – Results support the revised interaction hypothesis, or “differential boost.”

Interaction Hypothesis: Typical
[Figure: Illustration of Interaction Hypothesis. Mean score (y-axis) by accommodation condition (x-axis: No ACC, ACC), plotted separately for the GEN and SWD/ELL groups.]

Interaction Hypothesis: Revised “Differential Boost” (Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000)
[Figure: Illustration of Revised Interaction Hypothesis. Mean score (y-axis) by accommodation condition (x-axis: No ACC, ACC), plotted separately for the GEN and SWD/ELL groups.]

Discussion (3)
• Other accommodations have less consistent and convincing results, but no evidence of “harm” or “unfairness.”
• A good deal of solid and ingenious experimental research has been done in this area.
  – Small n, but intense with respect to data collection

Discussion (4)
• Oral accommodation for math seems valid.
• Oral accommodation for reading involves consideration of specific construct changes
  – Fletcher et al. (2006) results indicate that matching disability and accommodation to one aspect of the construct promotes validity

Discussion (5)
• Looking across various studies and accommodation conditions
  – Lots of variability across studies with respect to
    • Accommodation conditions and how they were implemented
    • Student groups (within and between)
    • Results

Future Directions for Test Design
• Test Development: Universal test design
  – Build tests that are “accessible to all” (i.e., that do not need to be accommodated).
  – CBT could be particularly helpful in this regard.
  – 19th & 20th century: Standardization
  – 21st century: Adaptivity? (can’t be oxymoronic)

Future Directions for Research (1)
• Meta-analysis based on practice
  – Unpublished test accommodation research being conducted in states
  – Establish a data warehouse for teachers and test administrators to record results and make comments?
  – Would address the small-n issue

Future Directions for Research (2)
• Larger sample sizes due to inclusion, coupled with improved school data management systems, should promote more research on
  – Differential item functioning
  – Structural equivalence
  – Analysis of educational gains

Future Directions for Research (3)
• More needs to be done on potential changes to the construct
  – Most often decided by logical analysis
  – Structural equivalence research is limited
  – Structural equivalence ≠ construct equivalence

Let’s go do it!
Thank you for your attention!
Sireci@acad.umass.edu
Mpitoniak@ets.org
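
Supplementary sketch: interaction hypothesis vs. differential boost
The interaction hypothesis and its revised “differential boost” form, as described in the slides above, both come down to comparing score gains across the two student groups. The following is a minimal Python sketch, assuming only the four cell means (group by accommodation condition) are available. The names `CellMeans`, `differential_boost`, and `classify`, the `tol` threshold, and the example means are all hypothetical illustrations; they are not data or methods from any study cited in this presentation. The sketch is purely descriptive and performs no significance test.

```python
from dataclasses import dataclass


@dataclass
class CellMeans:
    """Mean scores for one student group under each testing condition."""
    standard: float      # mean score without the accommodation
    accommodated: float  # mean score with the accommodation

    @property
    def gain(self) -> float:
        # Score gain attributable to the accommodation condition
        return self.accommodated - self.standard


def differential_boost(swd: CellMeans, gen: CellMeans) -> float:
    """Interaction contrast: SWD gain minus general-education (GEN) gain."""
    return swd.gain - gen.gain


def classify(swd: CellMeans, gen: CellMeans, tol: float = 0.0) -> str:
    """Label a 2x2 pattern of means.

    `tol` is a hypothetical threshold below which a gain is treated as zero.
    """
    boost = differential_boost(swd, gen)
    if swd.gain > tol and abs(gen.gain) <= tol:
        # Original hypothesis: only SWD benefit from the accommodation
        return "strict interaction"
    if swd.gain > tol and gen.gain > tol and boost > tol:
        # Revised hypothesis: both groups gain, but SWD gain more
        return "differential boost"
    return "no support"


if __name__ == "__main__":
    # Made-up means on the 10-60 scale used in the figures
    print(classify(CellMeans(30, 45), CellMeans(50, 50)))
    print(classify(CellMeans(30, 45), CellMeans(50, 55)))
    print(classify(CellMeans(30, 35), CellMeans(50, 55)))
```

In practice, the “H1?” judgments in the summary tables rest on statistical tests of the group-by-condition interaction rather than on raw differences in means, so a real analysis would replace `tol` with an appropriate inferential procedure.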