A Taxonomy of Adaptive Testing
  Robert J. Mislevy
  Measurement, Statistics & Evaluation, University of Maryland
  in collaboration with
  Roy Levy, Arizona State University
  John T. Behrens, Cisco Systems, Inc.
  Presented at the Fifth Annual Technology for Second Language Learning Conference, September 21-22, 2007, Iowa State University, Ames, Iowa, USA
  September 22, 2007 TSLL 07 Slide 1

Terminology & Concepts for Adaptive Testing
  Adaptive testing
    » Most familiar as item response theory-based computer-adaptive testing (IRT-CAT)
  Can take a broader perspective of evidentiary reasoning
  We will look at the interplay among inferences and data gathering
  A taxonomy of configurations
    » IRT-CAT plus many others

Taxonomy based on three dimensions …
  Claim status
  Observation status
  Locus of control

Background for the dimensions
  Glenn Shafer’s “frame of discernment”
  Evidence-centered assessment design

“Frame of discernment”
  From Shafer’s (1976) A Mathematical Theory of Evidence.
  It is all the possible combinations of values of the variables you are working with.
  “Frame” emphasizes how it effectively circumscribes a universe in which inference will take place.
  “Discern” = “detect, recognize, distinguish”
  Property of you as much as property of the world
  Depends on what you know and what your purpose is

“Frame of discernment”
  Frames of discernment can evolve as beliefs, knowledge, and aims unfold over time.
  E.g., dip for the party?
  medical diagnosis
  Move from one frame of discernment to another by
    ascertaining values of some variables, dropping others,
    adding new variables or refining current ones,
    constructing a different frame when observations cause rethinking of assumptions or goals

Evidence-Centered Design
  Mislevy, Steinberg, & Almond (2003), “On the structure of educational assessments.”
  Educational assessment as evidentiary argument: We reason from the things students say, do, or make in a handful of particular settings to what they know, can do in various situations, or have accomplished, as more broadly construed.
  All elements of an assessment, from analysis of the domain, through design, to operation, are based on building and then embodying such an argument in operational procedures.

Toulmin’s Argument Structure
  [Diagram: Data, so Claim, since Warrant (with Backing), unless Alternative explanation.]

An Assessment Design Argument
  [Diagram: Toulmin structure. Student acting in assessment situation; Data concerning performance and Data concerning situation; so (Warrant, Backing) Claim about student in some frame of discernment.]
  Annotations on the diagram:
    Information pertinent to addressing the claims is accumulated in terms of student-model variables (SMVs).
    Formative assessments often have highly specific claims; summative assessments tend to have broader claims.
    Aspects of performance that bear on claims are captured in terms of observable variables (OVs).
    What aspects of the situation are important for the possibility of inference about the examinee?
    What we actually see/hear the student say, do, or make

Adaptive Testing
  [Diagram: the same argument structure (Claim about student in some frame of discernment; Warrant; Backing; Data concerning performance; Data concerning situation; Student acting in assessment situation), annotated with five steps.]
  1. Somebody selects situation for getting information
  2. Examinee acts
  3. Evaluation of performance in light of current targeted claim
  4. Update belief about claim
  5. Somebody has choice about whether to refocus claim

What is an adaptive test?
  At a given time in an assessment system, the set of student-model variables and observable variables constitutes a frame of discernment.
  An adaptive test is one in which the frame of discernment changes over time as a function of the values of observations.
  Ways it might change are the basis of the taxonomy.

Claim Status
  Is the claim part of the frame of discernment, i.e., the SMVs, fixed or evolving?
  I.e., do the SMVs at issue stay the same or change (as opposed to knowledge about the SMVs)?

Observation Status
  Is the data part of the frame of discernment, i.e., the OVs, fixed or evolving?
  I.e., does the choice of OVs that can be made stay the same or change as more information is obtained?

Locus of Control
  If the claim part of the frame is changing as the test proceeds, who decides how it should change: the examiner or the examinee?
  If the data part of the frame is changing as the test proceeds, who decides how it should change: the examiner or the examinee?

[Taxonomy grid, built up over three slides. Rows: Claim status; columns: Observation status. Each dimension takes the values Fixed, Adaptive: Examiner Determined, and Adaptive: Examinee Determined.]
  Fixed claim status: 1. Usual, linear test; 2. IRT-CAT; “User friendly” testing
  Adaptive, examiner-determined claim status: Guided / diagnostic
  Adaptive, examinee-determined claim status: Self-guided / diagnostic

Cell 1: Fixed, examiner-controlled claim; fixed, examiner-controlled observation
  Traditional assessments in which …
    Same kind of claim(s) / inferences / SMVs for everyone; they were decided on by the examiner a priori,
    tasks presented are determined by the examiner a priori,
    the examiner determines the sequence of tasks a priori.
  Neither the frame of discernment nor the gathering of evidence varies in response to values of observable variables or their impact on beliefs about SMVs.

Cell 2: Fixed, examiner-controlled claim; adaptive, examiner-controlled observation
  Same claims space (SMVs) for everyone
    the claims (SMVs) were decided on by the examiner,
    the tasks presented are determined by examiner a priori,
  But in light of the unfolding pattern of responses, the examiner selects items to maximize accuracy
  IRT-CAT (can be multivariate; Segall, 1996)
  Binet’s original individually-administered intelligence test
  Lord’s Flexi-level scheme

Cell 3: Fixed, examiner-controlled claim; adaptive, examinee-controlled observation
  Same claims space (SMVs) for everyone
    the claims (SMVs) were decided on by the examiner.
  But the examinee is able to determine tasks as he/she chooses.
  “User friendly”
  Pole-vaulting competition
  Self-adaptive SAT (Wise et al., 1992): Student chooses items by page or bin, grouped by difficulty. IRT scoring takes difficulty into account. (Also see Wright, 1977.)
  Guard against nonignorable missingness (free throws)

Cell 4: Adaptive, examiner-controlled claim; fixed, examiner-controlled observation
  Same tasks (OVs) for everyone
  Same presentation of tasks, determined a priori by examiner.
  But examiner determines claims (SMVs) for examinee in light of responses.
  E.g., MMPI: same hundreds of items for everyone, but the examiner may compute different scales for different patients.
  Diagnostic “reading record” test in language testing
  Note: Need multidimensional claim space in Cells 4-9.

Cell 5: Adaptive, examiner-controlled claim; adaptive, examiner-controlled observation
  Claims may diverge for different examinees in light of data
  Different tasks for different examinees, to be optimal in light of the claims the examiner wants to make about them as individuals
  E.g., triage in medicine, followed by different diagnostics
  Adaptive MMPI: different items for everyone, adaptively selected for different scales for different patients.
  Differential strategies in math (Tatsuoka)
  Adaptive diagnosis in language testing

Cell 6: Adaptive, examiner-controlled claim; adaptive, examinee-controlled observation
  Examiners can home in on different claims for different examinees in light of data, but
  examinees have at least some control over task selection.
  E.g., self-adaptive tests, but along dimensions controlled by the examiner.
  Multivariate SA-SAT, examiner’s inferences.
  Diagnostic / placement tests, homing in on different remedial needs of students, but allowing for lower-stress choices of groups/pages of tasks as in Cell 3.
  Thus the examiner tailors the claims part of the frame of discernment; the examinee tailors the observations part, given the claims.

Cell 7: Adaptive, examinee-controlled claims; fixed, examiner-controlled observations
  Examinees all take the same examiner-determined items in an examiner-determined way, but …
  examinees can home in on different claims of their choosing in light of data.
  E.g., MMPI, but the examinee determines which scales to compute & analyze.
  Oral reading of a fixed sample with automated parsing: student determines what to work on next (maybe could be done with an Ordinate-like setup?)
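
The contrast between Cell 4 and Cell 7 can be sketched in a few lines of Python: every examinee answers the same fixed item set, and what varies is which scales (claims) get scored and who makes that choice. This is only an illustration under stated assumptions, not how the MMPI is actually scored; the scale names, the item-to-scale assignments, the response vector, and the `score_scales` helper are all hypothetical.

```python
from typing import Dict, List

# Hypothetical scale definitions: each scale (a claim, i.e. an SMV) is scored
# from a subset of one fixed item set. Real inventories use far more items.
SCALES: Dict[str, List[int]] = {
    "scale_A": [0, 1, 2],
    "scale_B": [2, 3, 4],
    "scale_C": [1, 4, 5],
}


def score_scales(responses: List[int], chosen: List[str]) -> Dict[str, int]:
    """Sum the 0/1 responses on the items belonging to each chosen scale."""
    return {name: sum(responses[i] for i in SCALES[name]) for name in chosen}


# The observations are fixed: the same six items for every examinee.
responses = [1, 0, 1, 1, 0, 1]

# Cell 4: the examiner decides which claims to pursue for this examinee.
examiner_view = score_scales(responses, ["scale_A", "scale_B"])

# Cell 7: the examinee decides which claims to pursue from the same data.
examinee_view = score_scales(responses, ["scale_C"])

print(examiner_view)  # {'scale_A': 2, 'scale_B': 2}
print(examinee_view)  # {'scale_C': 1}
```

In both cells the choice of scales would normally be made in light of the responses; that decision step is left out here to keep the sketch short.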
Cell 8: Adaptive, examinee-controlled claims; adaptive, examiner-controlled observations
  Examinee chooses the claim, at the beginning or adaptively; examiner controls task presentation for optimal precision.
  E.g., structured self-diagnosis:
    MMPI, where the examinee determines which scales to focus on and is presented items adaptively for those scales.
    Oral readings with automated parsing: student determines what to work on next, then examiner-selected samples focus on what the examinee wants to follow up on.
    SIGI: Sequential exploration of career interests; examinee chooses categories and system asks adaptive questions.

Cell 9: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations
  Examinees control both the claims and the tasks to yield observations for those claims.
  The examinee selects the claims to focus on and then has input into what data will be observed.
  Feedback from the system helps examinees figure out what they want to know, then offers them choices about directions to go to refine the information they receive
  (continued)

Cell 9, continued: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations
  E.g., guided self-diagnosis:
    Central challenge in retrieval systems in libraries: organize materials and search terms to help patrons find the information they might want
    Amazon: “Customers who looked at these books you selected also looked at…”
    Multivariate SA-SAT practice exploration space
    Language testing self-diagnosis: Start with a common passage or list of areas, do diagnostics, use results to refine testing for areas you are interested in.

[Summary grid. Rows: Claim status; columns: Observation status.]
  Claim status: Fixed
    Observation status Fixed: 1. Usual, linear test
    Observation status Adaptive, Examiner Determined: 2. IRT-CAT
    Observation status Adaptive, Examinee Determined: 3. Self-adapting tests, e.g., SA-SAT (Wise et al., 1992)
  Claim status: Adaptive, Examiner Determined
    Observation status Fixed: 4. MMPI: examiner decides how to pursue analysis
    Observation status Adaptive, Examiner Determined: 5. Examiner chooses target, Multidim CAT
    Observation status Adaptive, Examinee Determined: 6. Examiner chooses target in Multidim SA-SAT
  Claim status: Adaptive, Examinee Determined
    Observation status Fixed: 7. MMPI: examinee decides how to pursue analysis
    Observation status Adaptive, Examiner Determined: 8. Examinee chooses target, Multidim CAT
    Observation status Adaptive, Examinee Determined: 9. Examinee chooses target & tasks in Multidim SA-SAT

Conclusion
  Assessments involving adaptive claims have yet to achieve the prominence of adaptive-observation assessments.
    » History, up-front work, solving known “centralized” problems
  User-controlled assessment not seen as assessment
  User modeling literature will be important
  Cells 8 & 9 good for self-directed learning in a supported environment
    » Like user-modeling strategies for buying cars, choosing movies, finding information in library systems.
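
As a compact restatement of the nine-cell taxonomy, the sketch below encodes the two status dimensions and derives the cell number from them: claim status picks the row and observation status picks the column, matching the cell definitions given in the slides. The `Status` enum, the `cell_number` function, and the `EXAMPLES` mapping are illustrative names introduced here, not part of the presentation; the example labels are taken from the summary grid.

```python
from enum import IntEnum


class Status(IntEnum):
    """Status of one part of the frame of discernment (claims or observations)."""
    FIXED = 0
    ADAPTIVE_EXAMINER = 1  # adaptive, examiner determined
    ADAPTIVE_EXAMINEE = 2  # adaptive, examinee determined


def cell_number(claim: Status, observation: Status) -> int:
    """Return the taxonomy cell (1-9): claim status is the row, observation status the column."""
    return 3 * int(claim) + int(observation) + 1


# Example labels copied from the summary grid.
EXAMPLES = {
    1: "Usual, linear test",
    2: "IRT-CAT",
    3: "Self-adapting tests, e.g., SA-SAT",
    4: "MMPI: examiner decides how to pursue analysis",
    5: "Examiner chooses target, Multidim CAT",
    6: "Examiner chooses target in Multidim SA-SAT",
    7: "MMPI: examinee decides how to pursue analysis",
    8: "Examinee chooses target, Multidim CAT",
    9: "Examinee chooses target & tasks in Multidim SA-SAT",
}

# Fixed claims with examiner-adaptive observations is Cell 2.
cell = cell_number(Status.FIXED, Status.ADAPTIVE_EXAMINER)
print(cell, EXAMPLES[cell])  # 2 IRT-CAT
```

Locus of control is folded into the two adaptive values of `Status` rather than modeled as a separate field, which mirrors how the grid itself is laid out.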