Slide 1: Making Sense of Data from Complex Assessments
Robert J. Mislevy, University of Maryland
Linda S. Steinberg & Russell G. Almond, Educational Testing Service
FERA, November 6, 2001

Slide 2: Buzz Hunt, 1986
"How much can testing gain from modern cognitive psychology? So long as testing is viewed as something that takes place in a few hours, out of the context of instruction, and for the purpose of predicting a vaguely stated criterion, then the gains to be made are minimal."

Slide 3: Opportunities for Impact
- Informal / local use
- Conceptual design frameworks (e.g., Grant Wiggins, CRESST)
- Toolkits & building blocks (e.g., Assessment Wizard, IMMEX)
- Building structures into products (e.g., HYDRIVE, Mavis Beacon)
- Building structures into programs (e.g., AP Studio Art, DISC)

Slide 4: For further information, see www.education.umd.edu/EDMS/mislevy/

Slide 5: Don Melnick, NBME
"It is amazing to me how many complex 'testing' simulation systems have been developed in the last decade, each without a scoring system. The NBME has consistently found the challenges in the development of innovative testing methods to lie primarily in the scoring arena."

Slide 6: The DISC Project
- The Dental Interactive Simulations Corporation (DISC)
- The DISC Simulator
- The DISC Scoring Engine
- Evidence-Centered Assessment Design
- The Cognitive Task Analysis (CTA)

Slide 7: Evidence-centered assessment design: the three basic models
[Diagram: Task Model(s) feed Evidence Model(s), each comprising a statistical model and evidence rules, which in turn connect to the Student Model.]

Slides 8-9: Evidence-centered assessment design
What complex of knowledge, skills, or other attributes should be assessed? (Messick, 1992)
[Builds on the slide 7 diagram, highlighting the student-model variables.]

Slides 10-14: Evidence-centered assessment design
What behaviors or performances should reveal those constructs?
[Builds on the slide 7 diagram, highlighting the work product, the observable variables that the evidence rules derive from it, and the student-model variables they inform through the statistical model.]

Slides 15-17: Evidence-centered assessment design
What tasks or situations should elicit those behaviors?
[Builds on the slide 7 diagram, highlighting the task model's stimulus specifications and work product specifications. A data-structure sketch of the three models follows.]
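To make the three-question structure concrete, here is a minimal data-structure sketch of the three basic models. It is illustrative only: the class and field names are shorthand for the objects named on the slides, not DISC's or ETS's actual implementation.

```python
# Minimal sketch of the three basic ECD models as data structures.
# Class and field names are illustrative shorthand for the objects
# named on the slides, not an actual DISC/ETS implementation.
from dataclasses import dataclass


@dataclass
class StudentModel:
    # What complex of knowledge, skills, or other attributes
    # should be assessed?
    variables: list[str]  # e.g., ["Assessment", "Medical Knowledge"]


@dataclass
class EvidenceModel:
    # What behaviors or performances should reveal those constructs?
    sm_parents: list[str]           # student-model variables this EM informs
    observables: list[str]          # variables scored from the work product
    evidence_rules: dict[str, str]  # observable -> rule for evaluating it
    # The statistical submodel would add P(observable | SM parents) here.


@dataclass
class TaskModel:
    # What tasks or situations should elicit those behaviors?
    stimulus_specifications: dict[str, str]  # features of the situation presented
    work_product_specifications: list[str]   # what the examinee leaves behind
```

In this sketch, "docking" an evidence model (slides 24-25 below) amounts to matching EvidenceModel.sm_parents against StudentModel.variables.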
Slide 18: Implications for the Student Model
Student-model variables should be consistent with...
- The results of the CTA.
- The purpose of the assessment: Which aspects of skill and knowledge should be used to accumulate evidence across tasks, for pass/fail reporting and for finer-grained feedback?

Slide 19: Simplified Version of the DISC Student Model
[Diagram: a simplified version of the DISC student model, with variables for Information Gathering/Usage, Assessment, Evaluation, Treatment Planning, Communality, Medical Knowledge, and Ethics/Legal.]

Slide 20: Implications for Evidence Models
- The CTA produced 'performance features' that characterize recurring patterns of behavior and differentiate levels of expertise.
- These features ground generally defined, reusable 'observable variables' in evidence models.
- We defined reusable evidence models for recurring scenarios, for use with many tasks.

Slide 21: An Evidence Model
[Diagram: a simplified version of the evidence model for Information-Gathering Procedures in the context of Assessment, with observables for Adapting to situational constraints, Addressing the chief complaint, Adequacy of examination procedures, Adequacy of history procedures, and Collection of essential information, plus a Context variable.]

Slide 22: Evidence Models: Statistical Submodel
What's constant across the cases that use the evidence model:
» Student-model parents.
» Identification of the observable variables.
» Structure of the conditional-probability relationships between the student-model parents and their observable children.
What's tailored to particular cases:
» Values of the conditional probabilities.
» Specific meaning of the observables.

Slide 23: Evidence Models: Evaluation Submodel
What's constant across cases:
» Identification and formal definition of the observable variables.
» Generally stated "proto-rules" for evaluating their values.
What's tailored to particular cases:
» Case-specific rules for evaluating the values of the observables: instantiations of the proto-rules, tailored to the specifics of the case. (A sketch of one such instantiation follows.)
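To give the flavor of the proto-rule/instantiation split, here is a minimal sketch. The observable, Collection of essential information, and its All/Some/None values appear on the slides; the function name, the case, and the example findings are hypothetical.

```python
# Sketch of a reusable proto-rule and one case-specific instantiation.
# The observable ("Collection of essential information") and its
# All/Some/None values come from the slides; the function name, the
# case, and the example findings are hypothetical.
from functools import partial


def collection_of_essential_information(work_product, essential_findings):
    """Proto-rule: how much of the case's essential information did the
    examinee actually collect?"""
    found = sum(1 for f in essential_findings if f in work_product)
    if found == len(essential_findings):
        return "All"
    return "Some" if found > 0 else "None"


# Instantiation: bind the proto-rule to one virtual patient's
# case-specific list of essential findings (hypothetical values).
case_rule = partial(
    collection_of_essential_information,
    essential_findings=["medication history", "periodontal probing",
                        "radiographs"],
)

print(case_rule({"radiographs", "medication history"}))  # -> "Some"
```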
Slides 24-25: "Docking" an Evidence Model
[Diagram: the Information-Gathering/Assessment evidence model docks with the student model through its student-model parents, Information Gathering/Usage and Assessment.]

Slide 26: Initial Status
All .33 / Some .33 / None .33
Expert .28 / Competent .43 / Novice .28

Slide 27: Status after four 'good' findings
All 1.00 / Some .00 / None .00
Expert .39 / Competent .51 / Novice .11

Slide 28: Status after one 'good' and three 'bad' findings
All .00 / Some .00 / None 1.00
Expert .15 / Competent .54 / Novice .30
(A sketch of this kind of updating appears after slide 33.)

Slides 29-30: "Docking" another Evidence Model
[Diagram: the Treatment Planning evidence model, with observables for Adequacy of treatment procedures, Individualization of procedures, Effect of treatment on patient, and Performance of extraneous treatment, plus a Context variable, docks with the student model through its parents, Treatment Planning and Medical Knowledge.]

Slide 31: Implications for Task Models
Task models are schemas for phases of cases, constructed around key features that...
- the simulator needs for its virtual-patient database,
- we need in order to evoke specified aspects of skill and knowledge,
- affect task difficulty,
- we need in order to assemble tasks into tests.

Slide 32: Implications for the Simulator
Once we've determined the kind of data we need as evidence about the targeted knowledge, how must the simulator be constructed to provide it?
Nature of the problems:
» Distinguish phases in the patient-interaction cycle.
» Use typical forms of information and control availability.
» Dynamic patient condition, and cases that extend across time.
Nature of the affordances: examinees must be able to
» seek and gather data,
» indicate hypotheses,
» justify hypotheses with respect to cues,
» justify actions with respect to hypotheses.

Slide 33: Payoff
Reusable student model:
» Can project to an overall score for licensing.
» Supports mid-level feedback as well.
Reusable evidence and task models:
» Can write indefinitely many unique cases from the schemas.
» Provide a framework for writing case-specific evaluation rules.
The machinery can generalize to other uses and domains.
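For the flavor of the updating shown in slides 26-28, here is a minimal sketch that treats findings as conditionally independent given proficiency. The .28/.43/.28 prior is taken from slide 26; the likelihoods are invented for illustration, and the real scoring engine propagates evidence through a full Bayes net with calibrated conditional probabilities, so these posteriors will not reproduce the slides' numbers.

```python
# Minimal sketch of the proficiency updating shown in slides 26-28.
# The prior (.28/.43/.28) is from slide 26; the likelihoods below are
# invented for illustration. DISC's engine propagates evidence through
# a full Bayes net with calibrated conditional probabilities, so the
# posteriors here will not match the slides' numbers.

PRIOR = {"Expert": 0.28, "Competent": 0.43, "Novice": 0.28}

# P(a finding is 'good' | proficiency) -- assumed values.
P_GOOD = {"Expert": 0.85, "Competent": 0.65, "Novice": 0.35}


def posterior(prior, findings):
    """Update the proficiency distribution given a sequence of findings
    (True = 'good', False = 'bad'), treated as conditionally independent
    given proficiency."""
    post = dict(prior)
    for good in findings:
        for level in post:
            post[level] *= P_GOOD[level] if good else 1.0 - P_GOOD[level]
    total = sum(post.values())
    return {level: round(p / total, 2) for level, p in post.items()}


print(posterior(PRIOR, [True] * 4))                   # slide 27 (calibrated model) reports .39/.51/.11
print(posterior(PRIOR, [True, False, False, False]))  # slide 28 (calibrated model) reports .15/.54/.30
```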
Part 2: Conclusion

Slide 34: Two ways to "score" complex assessments
THE HARD WAY: Ask "how do you score it?" after you've built the assessment and scripted the tasks or scenarios.
A DIFFERENT WAY, ALSO HARD BUT MORE LIKELY TO WORK: Design the assessment and the tasks or scenarios around what you want to make inferences about, what you need to see in order to ground those inferences, and the structure of their interrelationships.

Slide 35: Grand Conclusion
We can attack new assessment challenges by working from generative principles:
- principles from measurement and evidentiary reasoning, coordinated with...
- inferences framed in terms of current and continually evolving psychology,
- using current and continually evolving technologies to help gather and evaluate data in that light,
- all within a coherent assessment design framework.