Making Sense of Data from
Complex Assessments
Robert J. Mislevy
University of Maryland
Linda S. Steinberg & Russell G. Almond
Educational Testing Service
FERA
November 6, 2001
Buzz Hunt, 1986:
How much can testing gain from modern
cognitive psychology?
So long as testing is viewed as something that
takes place in a few hours, out of the context of
instruction, and for the purpose of predicting a
vaguely stated criterion, then the gains to be
made are minimal.
Opportunities for Impact

• Informal / local use
• Conceptual design frameworks
  » E.g., Grant Wiggins, CRESST
• Toolkits & building blocks
  » E.g., Assessment Wizard, IMMEX
• Building structures into products
  » E.g., HYDRIVE, Mavis Beacon
• Building structures into programs
  » E.g., AP Studio Art, DISC
For further information, see...
www.education.umd.edu/EDMS/mislevy/
Don Melnick, NBME:
“It is amazing to me how many complex
‘testing’ simulation systems have been
developed in the last decade, each without a
scoring system.
“The NBME has consistently found the
challenges in the development of innovative
testing methods to lie primarily in the scoring
arena.”
The DISC Project

• The Dental Interactive Simulations Corporation (DISC)
• The DISC Simulator
• The DISC Scoring Engine
• Evidence-Centered Assessment Design
• The Cognitive Task Analysis (CTA)
Evidence-centered assessment design
The three basic models
[Diagram: the three basic models. Task Model(s); Evidence Model(s), comprising a statistical model and evidence rules; Student Model.]
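The three-model structure can also be pictured as plain data. Here is a minimal sketch in Python; the class and field names are our own illustrative assumptions, not DISC code:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data structures for the three basic models of
# evidence-centered design (names are ours, for illustration only).

@dataclass
class StudentModel:
    """What we want to say about the examinee: variables over which
    evidence accumulates across tasks."""
    variables: List[str]

@dataclass
class EvidenceModel:
    """The bridge from performances to student-model variables."""
    evidence_rules: Dict[str, Callable]  # work product -> observable values
    stat_model: Dict[str, List[str]]     # observable -> its SM parents

@dataclass
class TaskModel:
    """Schema for the situations that elicit the performances."""
    stimulus_specs: Dict[str, str]       # features of the situation
    work_product_specs: List[str]        # what the examinee produces
```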
Evidence-centered assessment design

• What complex of knowledge, skills, or other attributes should be assessed? (Messick, 1992)

[Diagram: the three-model structure, with the Student Model Variables highlighted.]
Evidence-centered assessment design

• What behaviors or performances should reveal those constructs?

[Diagram: the three-model structure, highlighting the chain of evidence: the Work Product is evaluated by evidence rules into Observable Variables, which the statistical model connects to the Student Model Variables.]
Evidence-centered assessment design

• What tasks or situations should elicit those behaviors?

[Diagram: the three-model structure, with the Task Model highlighted: Stimulus Specifications and Work Product Specifications.]
Implications for Student Model

SM variables should be consistent with ...
• The results of the CTA.
• The purpose of assessment: What aspects of skill and knowledge should be used to accumulate evidence across tasks, for pass/fail reporting and finer-grained feedback?
Simplified Version of the DISC Student Model

[Diagram: student-model variables Information gathering/Usage, Communality, Assessment, Evaluation, Treatment Planning, Medical Knowledge, and Ethics/Legal.]
Implications for Evidence Models

• The CTA produced ‘performance features’ that characterize recurring patterns of behavior and differentiate levels of expertise.
• These features ground generally-defined, re-usable ‘observable variables’ in evidence models.
• We defined re-usable evidence models for recurring scenarios, for use with many tasks.
An Evidence Model

[Diagram: a simplified version of the evidence model for Information-Gathering procedures in the context of Assessment. Student-model parents Information gathering/Usage and Assessment, plus a Context variable, feed the observables Adapting to situational constraints, Addressing the chief complaint, Adequacy of examination procedures, Adequacy of history procedures, and Collection of essential information.]
Evidence Models: Statistical Submodel

• What’s constant across cases that use the EM
  » Student-model parents.
  » Identification of observable variables.
  » Structure of conditional probability relationships between SM parents and observable children.

• What’s tailored to particular cases
  » Values of conditional probabilities.
  » Specific meaning of observables.
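A minimal sketch of that constant/tailored split, assuming a hypothetical class of our own and invented numbers (not DISC's actual values):

```python
import numpy as np

class InfoGatheringStatModel:
    # Constant across cases that use this evidence model:
    SM_PARENT = "Information gathering/Usage"           # student-model parent
    OBSERVABLE = "Collection of essential information"  # observable child
    PARENT_STATES = ("Expert", "Competent", "Novice")
    CHILD_STATES = ("All", "Some", "None")

    def __init__(self, cond_probs):
        # Tailored to the particular case: the conditional-probability
        # values P(child state | parent state), one row per parent state.
        self.cond_probs = np.asarray(cond_probs)
        assert np.allclose(self.cond_probs.sum(axis=1), 1.0)

# One case's tailoring (illustrative numbers only):
em = InfoGatheringStatModel([
    [0.80, 0.15, 0.05],   # Expert
    [0.45, 0.40, 0.15],   # Competent
    [0.10, 0.40, 0.50],   # Novice
])
```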
Evidence Models: Evaluation Submodel

• What’s constant across cases
  » Identification and formal definition of observable variables.
  » Generally-stated “proto-rules” for evaluating their values.

• What’s tailored to particular cases
  » Case-specific rules for evaluating the values of observables: instantiations of proto-rules tailored to the specifics of the case.
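One way to picture a proto-rule is as a function factory: the form is constant, the case specifics are the parameters. A sketch with hypothetical item names:

```python
def make_essential_info_rule(essential_items):
    """Proto-rule: 'did the examinee collect the essential information?'
    The form is constant; the set of essential items is case-specific."""
    essential = frozenset(essential_items)

    def evaluate(procedures_performed):
        found = essential & set(procedures_performed)
        if found == essential:
            return "All"
        return "Some" if found else "None"

    return evaluate

# Instantiation for one virtual patient:
rule = make_essential_info_rule(
    {"medical history", "radiographs", "periodontal charting"})
print(rule({"radiographs", "prophylaxis"}))   # -> "Some"
```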
“Docking” an Evidence Model

[Diagram: the Information-Gathering/Assessment evidence model docks onto the student model at its parents, Information gathering/Usage and Assessment. Student-model variables: Information gathering/Usage, Communality, Assessment, Evaluation, Treatment Planning, Medical Knowledge, Ethics/Legal. Evidence-model observables: Adapting to situational constraints, Addressing the chief complaint, Adequacy of examination procedures, Adequacy of history procedures, Collection of essential information; plus a Context variable.]
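In graph terms, docking just joins the evidence-model fragment to the student model at the shared parent variables. A sketch under our own assumed representation:

```python
# Hypothetical representation: an EM fragment lists edges from
# student-model parents to its observables; docking merges the fragments.

STUDENT_MODEL_VARS = {
    "Information gathering/Usage", "Communality", "Assessment",
    "Evaluation", "Treatment Planning", "Medical Knowledge", "Ethics/Legal",
}

def dock(sm_vars, em_edges):
    """Merge an EM fragment into the SM; edges run SM parent -> observable."""
    parents = {p for p, _ in em_edges}
    missing = parents - sm_vars
    if missing:
        raise ValueError(f"EM expects unknown SM variables: {missing}")
    return {"nodes": sm_vars | {c for _, c in em_edges}, "edges": em_edges}

graph = dock(STUDENT_MODEL_VARS, [
    ("Information gathering/Usage", "Adequacy of history procedures"),
    ("Assessment", "Addressing the chief complaint"),
])
```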
Initial Status

[Bayes-net display. Student-model variable: Expert .28, Competent .43, Novice .28. Observable finding: All .33, Some .33, None .33.]

Status after four ‘good’ findings

[Display. Findings: All 1.00, Some .00, None .00. Student-model variable: Expert .39, Competent .51, Novice .11.]

Status after one ‘good’ and three ‘bad’ findings

[Display. Findings: All .00, Some .00, None 1.00. Student-model variable: Expert .15, Competent .54, Novice .30.]
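The mechanics behind these three displays is ordinary Bayesian updating. A sketch: the prior over Expert/Competent/Novice is the one shown above, but the likelihoods are invented for illustration (the slides do not give the DISC network's conditional probabilities), so the posteriors will not reproduce the displayed numbers exactly.

```python
import numpy as np

prior = np.array([0.28, 0.43, 0.28])    # Expert, Competent, Novice (initial)
p_good = np.array([0.90, 0.60, 0.20])   # assumed P('good' finding | level)

def update(prior, findings):
    """Posterior after a sequence of findings (True = 'good', False = 'bad')."""
    post = prior.astype(float)
    for good in findings:
        post = post * (p_good if good else 1.0 - p_good)
        post = post / post.sum()         # renormalize (Bayes rule)
    return post

print(update(prior, [True] * 4))                    # mass shifts toward Expert
print(update(prior, [True, False, False, False]))   # mass shifts toward Novice
```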
“Docking” another Evidence Model

[Diagram: the Treatment Planning evidence model docks onto the student model at Treatment Planning and Medical Knowledge. Observables: Adequacy of treatment procedures, Individualization of procedures, Effect of treatment on patient, Performance of extraneous treatment; plus a Context variable.]
Implications for Task Models

Task models are schemas for phases of cases, constructed around the key features that ...
• the simulator needs for its virtual-patient database,
• evoke specified aspects of skill/knowledge,
• affect task difficulty,
• are needed to assemble tasks into tests.
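A hypothetical sketch of such a schema for one phase of a case, organized around the four kinds of key features listed above (field names and values are ours, for illustration):

```python
task_model = {
    # What the simulator needs for its virtual-patient database:
    "patient_data": ["chart", "radiographs", "medical history"],
    # Features that evoke the targeted aspects of skill/knowledge:
    "evidence_targets": ["Information gathering/Usage", "Assessment"],
    # Features that affect difficulty:
    "difficulty": {"atypical_presentation": True, "conflicting_cues": False},
    # Features used to assemble tasks into tests:
    "assembly": {"phase": "initial assessment", "time_minutes": 20},
}
```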
Implications for Simulator

Once we’ve determined what we need as evidence about the targeted knowledge, how must we construct the simulator to provide those data?

• Nature of problems
  » Distinguish phases in the patient-interaction cycle.
  » Use typical forms of information & control availability.
  » Dynamic patient condition & cross-time cases.

• Nature of affordances: examinees must be able to
  » seek and gather data,
  » indicate hypotheses,
  » justify hypotheses with respect to cues,
  » justify actions with respect to hypotheses.
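Those affordances amount to an interface contract: whatever the simulator looks like, it must expose these actions and log them so the evidence rules have data to work on. A sketch with a hypothetical API (not DISC's actual one):

```python
class SimulatorSession:
    def __init__(self, case_id):
        self.case_id = case_id
        self.log = []                      # raw material for work products

    def seek_data(self, source):           # e.g., order radiographs
        self.log.append(("seek", source))

    def indicate_hypothesis(self, hypothesis):
        self.log.append(("hypothesis", hypothesis))

    def justify_hypothesis(self, hypothesis, cues):
        self.log.append(("justify-hypothesis", hypothesis, tuple(cues)))

    def justify_action(self, action, hypothesis):
        self.log.append(("justify-action", action, hypothesis))
```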
Payoff

• Re-usable student model
  » Can project to an overall score for licensing
  » Supports mid-level feedback as well

• Re-usable evidence and task models
  » Can write indefinitely many unique cases using schemas
  » Framework for writing case-specific evaluation rules

• Machinery can generalize to other uses & domains
Part 2 Conclusion

Two ways to “score” complex assessments

THE HARD WAY:
Ask ‘how do you score it?’ after you’ve built the assessment and scripted the tasks or scenarios.

A DIFFERENT HARD WAY, BUT ONE MORE LIKELY TO WORK:
Design the assessment and the tasks/scenarios around what you want to make inferences about, what you need to see to ground them, and the structure of the interrelationships.
Grand Conclusion

We can attack new assessment challenges by working from generative principles:
• principles from measurement and evidentiary reasoning, coordinated with...
• inferences framed in terms of current and continually evolving psychology,
• using current and continually evolving technologies to help gather and evaluate data in that light,
• in a coherent assessment design framework.