A Taxonomy of Adaptive Testing Robert J. Mislevy Roy Levy John T. Behrens

A Taxonomy of Adaptive Testing
Robert J. Mislevy
Measurement, Statistics & Evaluation
University of Maryland
in collaboration with
Roy Levy, Arizona State University
John T. Behrens, Cisco Systems, Inc.
Presented at the Fifth Annual Technology for Second Language Learning Conference,
September 21-22, 2007, Iowa State University, Ames, Iowa, USA
September 22, 2007
TSLL 07
Slide 1
Terminology & Concepts for Adaptive Testing

Adaptive testing
» Most familiar as item response theory-based
computer-adaptive testing (IRT-CAT)



Can take a broader perspective of evidentiary
reasoning
We will look at the interplay among inferences
and data gathering
A taxonomy of configurations
» IRT-CAT plus many others
Taxonomy based on three dimensions …
Claim status
 Observation status
 Locus of control

Background for the dimensions
Glenn Shafer’s “Frame of discernment”
 Evidence–centered assessment design

“Frame of discernment”



From Shafer’s (1976) A mathematical theory of evidence.
It’s all the possible combinations of values of the variables
you are working with.
“Frame” emphasizes how it effectively circumscribes a
universe in which inference will take place
 “Discern” = “detect, recognize, distinguish”


Property of you as much as property of world
Depends on what you know and what your purpose is
“Frame of discernment”
Frames of discernment can evolve over time,
as beliefs, knowledge, and aims unfold.
E.g., dip for the party? medical diagnosis
Move from one frame of discernment to another by
» ascertaining values of some variables, dropping others,
» adding new variables or refining current ones,
» constructing a different frame when observations cause
rethinking of assumptions or goals
Evidence-Centered Design

Mislevy, Steinberg, & Almond (2003) “On the structure of
educational assessments.”

Educational assessment as evidentiary argument:
We reason from the things students say, do, or make in a
handful of particular settings, to what they know, can do in
various situations, or have accomplished, as more broadly
construed.

All elements of an assessment, from analysis of domain,
through design, to operation, are based on building then
embodying such an argument in operational procedures.
Toulmin’s Argument Structure
[Diagram: Toulmin's argument structure]
Data, so Claim
since Warrant (with Backing)
unless Alternative explanation
An Assessment Design Argument
[Diagram: the assessment design argument, with annotations]
Claim about student in some frame of discernment
» Information pertinent to addressing the claims is accumulated in terms of student-model variables (SMVs).
» Formative assessments often have highly specific claims; summative assessments tend to have broader claims.
so (Warrant, Backing)
Data concerning performance
» Aspects of performance that bear on claims are captured in terms of observable variables (OVs).
Data concerning situation
» What aspects of the situation are important for the possibility of inference about examinee?
Student acting in assessment situation
» What we actually see/hear the student say, do, or make
Adaptive Testing
[Diagram: the assessment argument annotated with the adaptive testing cycle]
Claim about student in some frame of discernment
so (Warrant, Backing)
Data concerning performance; Data concerning situation
Student acting in assessment situation
1. Somebody selects situation for getting information
2. Examinee acts
3. Evaluation of performance in light of current targeted claim
4. Update belief about claim
5. Somebody has choice about whether to refocus claim
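The five-step cycle can be sketched as a loop. This is only an illustration under stated assumptions: the function names, the binary mastery claim, the 0.8/0.2 response probabilities, and the 0.9 refocusing threshold are all hypothetical and do not come from the slides.

```python
def run_adaptive_cycle(select_task, respond, evaluate, update, refocus,
                       claim, belief, n_steps):
    """Run the five-step cycle n_steps times and return the final state."""
    history = []
    for _ in range(n_steps):
        task = select_task(claim, belief, history)   # 1. somebody selects a situation
        work = respond(task)                         # 2. examinee acts
        observation = evaluate(work, task, claim)    # 3. evaluate in light of targeted claim
        belief = update(belief, observation)         # 4. update belief about claim
        history.append((claim, task, observation))
        claim, belief = refocus(claim, belief)       # 5. optionally refocus the claim
    return claim, belief, history


# Toy stand-ins (all hypothetical) so the loop can be run end to end.
def select_task(claim, belief, history):
    return f"{claim}-task-{len(history) + 1}"

def respond(task):
    return "correct"  # a toy examinee who always answers correctly

def evaluate(work, task, claim):
    return work == "correct"

def update(belief, observation):
    # Bayes' rule for a binary mastery claim, with assumed response probabilities
    # P(correct | mastery) = 0.8 and P(correct | non-mastery) = 0.2.
    like_m, like_n = (0.8, 0.2) if observation else (0.2, 0.8)
    return belief * like_m / (belief * like_m + (1 - belief) * like_n)

def refocus(claim, belief):
    # Move to a new claim, with a fresh 0.5 prior, once mastery looks settled.
    if claim == "vocabulary" and belief > 0.9:
        return "grammar", 0.5
    return claim, belief

final_claim, final_belief, history = run_adaptive_cycle(
    select_task, respond, evaluate, update, refocus,
    claim="vocabulary", belief=0.5, n_steps=3)
```

Who supplies `select_task` and `refocus` (examiner or examinee) is left open here on purpose, since that choice is what the locus-of-control dimension describes.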
What is an adaptive test?

At a given time in an assessment system,
the set of student-model variables and
observable variables constitutes a frame of
discernment.

An adaptive test is one in which the frame of
discernment changes over time as a function of
the values of observations.

Ways it might change are the basis of the
taxonomy.
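One way to make this definition concrete is to represent a frame of discernment as a pair of variable sets and check whether it changes between time points. The `Frame` class, the helper names, and the variable labels below are illustrative assumptions. The check only detects change; it does not verify that the change depends on observed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    smvs: frozenset  # student-model variables (claim part of the frame)
    ovs: frozenset   # observable variables (data part of the frame)


def changed_parts(before, after):
    """Return which parts of the frame differ between two time points."""
    parts = set()
    if before.smvs != after.smvs:
        parts.add("claim")
    if before.ovs != after.ovs:
        parts.add("observation")
    return parts


def is_adaptive(frames):
    """True if the frame changes anywhere along the sequence of time points."""
    return any(changed_parts(a, b) for a, b in zip(frames, frames[1:]))


# A linear test keeps the same frame throughout.
linear = [Frame(frozenset({"theta"}), frozenset({"item1", "item2"}))] * 3

# A CAT-like test keeps the claim but adds observables as responses come in.
cat = [
    Frame(frozenset({"theta"}), frozenset({"item1"})),
    Frame(frozenset({"theta"}), frozenset({"item1", "item7"})),
]
```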
Claim Status
Is the claim part of the frame of discernment,
i.e., SMVs, fixed or evolving?

i.e., do the SMVs at issue stay the same or
change (as opposed to knowledge about SMVs)?
Observation status
Is the data part of the frame of discernment,
i.e., OVs, fixed or evolving?

i.e., does the choice of OVs that can be made
stay the same or change as more information is
obtained?
Locus of Control
If the claim part of the frame is changing as the test
proceeds, who decides how it should change:
The examiner or the examinee?
If the data part of the frame is changing as the test
proceeds, who decides how it should change:
The examiner or the examinee?
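The three dimensions combine into nine configurations, numbered Cell 1 through Cell 9 in the slides that follow. A minimal sketch of that numbering, assuming each part of the frame is either fixed, adaptive under examiner control, or adaptive under examinee control (the `Status` enum and `cell_number` function are illustrative names, not part of the presentation):

```python
from enum import Enum


class Status(Enum):
    FIXED = 0
    ADAPTIVE_EXAMINER = 1   # adaptive, examiner determined
    ADAPTIVE_EXAMINEE = 2   # adaptive, examinee determined


def cell_number(claim, observation):
    """Map claim status (row) and observation status (column) to Cells 1-9."""
    return 3 * claim.value + observation.value + 1


# Examples taken from the cell descriptions in the slides.
linear_test = cell_number(Status.FIXED, Status.FIXED)
irt_cat = cell_number(Status.FIXED, Status.ADAPTIVE_EXAMINER)
structured_self_diagnosis = cell_number(Status.ADAPTIVE_EXAMINEE,
                                        Status.ADAPTIVE_EXAMINER)
```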
[Table: claim status by observation status]
Claim status: Fixed
» Observation status Fixed: 1. Usual, linear test
» Observation status Adaptive, Examiner Determined: 2. IRT-CAT
» Observation status Adaptive, Examinee Determined: “User friendly” testing
Claim status: Adaptive, Examiner Determined
» Guided / diagnostic
Claim status: Adaptive, Examinee Determined
» Self-guided / diagnostic
Cell 1: Fixed, examiner-controlled claim;
Fixed, examiner-controlled observation
Traditional assessments in which …

Same kind of claim(s) / inferences / SMVs for everyone

they were decided on by the examiner a priori,

tasks presented are determined by the examiner a priori,

the examiner determines the sequence of tasks a priori
Neither the frame of discernment nor the gathering of
evidence varies in response to values of observable
variables or their impact on beliefs about SMVs.
Cell 2: Fixed, examiner-controlled claim;
Adaptive, examiner-controlled observation

Same claims space (SMVs) for everyone

the claims (SMVs) were decided on by the examiner,

the tasks presented are determined by examiner a priori,
But in light of the unfolding pattern of responses, the examiner
selects items to maximize accuracy.

IRT-CAT (Can be multivariate; Segall, 1996).

Binet’s original individually-administered intelligence test

Lord’s Flexi-level scheme
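As a rough illustration of examiner-controlled adaptive observation, the sketch below picks the next item by maximum Fisher information under a two-parameter logistic (2PL) model. The slide says only that items are selected "to maximize accuracy"; the 2PL model, the information criterion, and the item pool are assumptions made for this example.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, pool, administered):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [name for name in pool if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *pool[name]))


# Hypothetical item pool: name -> (a, b).
pool = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}

first = select_next_item(theta=0.1, pool=pool, administered=set())
second = select_next_item(theta=1.5, pool=pool, administered={"medium"})
```

The claim (the ability variable `theta`) is the same for everyone; only the items presented vary with the provisional estimate.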
Cell 3: Fixed, examiner-controlled claim;
Adaptive, examinee-controlled observation

Same claims space (SMVs) for everyone

the claims (SMVs) were decided on by the examiner.
But the examinee is able to choose tasks as
he/she sees fit. “User friendly”

Pole-vaulting competition

Self-adaptive SAT (Wise et al., 1992): Student chooses
items by page or bin, grouped by difficulty. IRT scoring
takes difficulty into account. (Also see Wright, 1977.)

Guard against nonignorable missingness (free throws)
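To illustrate how IRT scoring can take difficulty into account when examinees choose their own items, here is a minimal sketch of maximum-likelihood ability estimation under a Rasch model. The Rasch model, the Newton-Raphson routine, and the item difficulties are assumptions for illustration, not the scoring procedure used in the studies cited.

```python
import math


def rasch_mle(responses, difficulties, n_iter=25):
    """Maximum-likelihood ability estimate under a Rasch model (Newton-Raphson)."""
    if sum(responses) in (0, len(responses)):
        raise ValueError("MLE is not finite for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(n_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# Two examinees each answer two of three items correctly,
# but one chose an easy bin and the other a hard bin (hypothetical difficulties).
theta_easy_bin = rasch_mle([1, 1, 0], [-1.0, -1.0, -1.0])
theta_hard_bin = rasch_mle([1, 1, 0], [1.0, 1.0, 1.0])
```

The same raw score yields a higher ability estimate for the examinee who chose harder items. This handles difficulty only; it does not address the nonignorable-missingness concern noted above.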
Cell 4: Adaptive, examiner-controlled claim;
fixed, examiner-controlled observation

Same tasks (OVs) for everyone

Same presentation of tasks, determined a priori by
examiner.
But examiner determines claims (SMVs) for
examinee in light of responses. E.g.,


MMPI – the same hundreds of items for everyone, but examiner
may compute different scales for different patients.
Diagnostic “reading record” test in language testing
Note: Need multidimensional claim space in Cells 4-9.
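A small sketch of the Cell 4 idea: every examinee answers the same items, and the examiner decides afterward which scales to compute. The five-item instrument, the scale names, and the scoring keys below are hypothetical and are not actual MMPI keys.

```python
def score_scales(responses, scale_keys, scales_of_interest):
    """Sum the keyed item responses for each requested scale."""
    return {scale: sum(responses[item] for item in scale_keys[scale])
            for scale in scales_of_interest}


# Hypothetical scoring keys: scale name -> indices of the items it uses.
scale_keys = {"scale_A": [0, 1, 2], "scale_B": [2, 3], "scale_C": [1, 4]}

# The observations are fixed: the same five items for everyone.
responses = [1, 0, 1, 1, 0]

# The claims are adaptive: the examiner pursues different scales per patient.
patient_1 = score_scales(responses, scale_keys, ["scale_A"])
patient_2 = score_scales(responses, scale_keys, ["scale_B", "scale_C"])
```

Because items can feed several scales, the claim space has to be multidimensional for this to be meaningful.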
Cell 5: Adaptive, examiner-controlled claim;
adaptive, examiner-controlled observation

Claims may diverge for different examinees in light of data

Different tasks for different examinees, to be optimal in light
of the claims examiner wants to make about them as
individuals
E.g.,




Triage in medicine, followed by different diagnostics
Adaptive MMPI – different items for everyone, adaptively
selected for different scales for different patients.
Differential strategies in math (Tatsuoka)
Adaptive diagnosis in language testing
Cell 6: Adaptive, examiner-controlled claim;
adaptive, examinee-controlled observation

Examiners can home in on different claims for different
examinees in light of data, but

Examinees have at least some control over task selection.
E.g.,


Self-adaptive tests, but along dimensions controlled by
examiner. Multivariate SA-SAT, examiner’s inferences.
Diagnostic / placement tests, homing in on different
remedial needs of students, but allowing for lower-stress
choices of groups/pages of tasks like in Cell 3.
Thus examiner tailors claims part of frame of discernment,
examinee tailors observations part given claims.
Cell 7: Adaptive, examinee-controlled claims; fixed,
examiner-controlled observations

Examinees all take same examiner-determined items in
examiner-determined way, but …

Examinees can home in on different claims of their
choosing in light of data.
E.g.,


MMPI, but examinee determines which scales to compute
& analyze.
Oral reading of a fixed sample, automated parsing—
student determines what to work on next (maybe could be
done with Ordinate-like setup?)
Cell 8: Adaptive, examinee-controlled claims;
adaptive, examiner-controlled observations

Examinee chooses the claim, at beginning or adaptively,

examiner controls task presentation for optimal precision.
E.g., structured self-diagnosis:

MMPI, where examinee determines which scales to focus
on and is presented items adaptively for those scales.

Oral readings with automated parsing: student determines
what to work on next, then examiner-selected samples to
focus on what examinee wants to follow up on.
SIGI: Sequential exploration of career interests -- examinee
chooses categories and system asks adaptive questions.

Cell 9: Adaptive, examinee-controlled claims;
adaptive, examinee-controlled observations

Examinees control both the claims and the tasks to yield
observations for those claims.

The examinee selects the claims to focus on and then has
input into what data will be observed.

Feedback from system to help examinee figure out what
they want to know, then offer them choices about directions
to go to refine information they receive
(continued)
Cell 9, continued: Adaptive, examinee-controlled
claims; adaptive, examinee-controlled observations
E.g., guided self-diagnosis:

Central challenge in retrieval systems in libraries: organize
materials and search terms to help patrons find
the information they might want

Amazon: “Customers who looked at these books you
selected also looked at…”

Multivariate SA-SAT practice exploration space

Language testing self-diagnosis: Start with common
passage or list of areas, do diagnostics, use results to
refine testing for areas you are interested in.
[Table: the nine cells of the taxonomy. Rows are claim status, columns are observation status.]
Claim status \ Observation status | Fixed | Adaptive: Examiner Determined | Adaptive: Examinee Determined
Fixed | 1. Usual, linear test | 2. IRT-CAT | 3. Self-adapting tests, e.g., SA-SAT (Wise et al., 1992)
Adaptive: Examiner Determined | 4. MMPI: examiner decides how to pursue analysis | 5. Examiner chooses target, Multidim CAT | 6. Examiner chooses target in Multidim SA-SAT
Adaptive: Examinee Determined | 7. MMPI: examinee decides how to pursue analysis | 8. Examinee chooses target, Multidim CAT | 9. Examinee chooses target & tasks in Multidim SA-SAT
Conclusion

Assessments involving adaptive claims have yet to
achieve the prominence of adaptive-observation
assessments.
» History, up-front work, solving known “centralized” problems

User-controlled assessment not seen as assessment

User modeling literature will be important

Cells 8 & 9 good for self-directed learning in a
supported environment
» Like user-modeling strategies for buying cars, choosing
movies, finding information in library systems.