Integrating Measurement and Sociocognitive Perspectives in Educational Assessment

Robert J. Mislevy
University of Maryland

Robert L. Linn Distinguished Address, sponsored by AERA Division D.
Presented at the Annual Meeting of the American Educational Research Association, Denver, CO, May 1, 2010.
This work was supported by a grant from the Spencer Foundation.
Messick, 1994

[W]hat complex of knowledge, skills, or
other attribute should be assessed...

Next, what behaviors or performances
should reveal those constructs, and

what tasks or situations should elicit
those behaviors?
Snow & Lohman, 1989

Summary test scores, and factors based
on them, have often been thought of as
“signs” indicating the presence of
underlying, latent traits. …

An alternative interpretation of test scores
as samples of cognitive processes and
contents … is equally justifiable and could
be theoretically more useful.
Roadmap
• Rationale
• Model-based reasoning
• A sociocognitive perspective
• Assessment arguments
• Measurement models & concepts
• Why are these issues important?
• Conclusion

Rationale
Rationale
An articulated way to think about assessment:
• Understand task & use situations in “emic” sociocognitive terms.
• Identify the shift into “etic” terms in task-level assessment arguments.
• Examine the synthesis of evidence across tasks in terms of model-based reasoning.
• Reconceive measurement concepts.
• Draw implications for assessment practice.
Model-Based Reasoning
[Diagram: Model-based reasoning. A real-world situation is mapped, via measurement concepts, into entities and relationships in a model; representational forms (e.g., y=ax+b and its rearrangement (y-b)/a=x) and mappings among representational systems support reasoning within measurement models; the result is mapped back as a reconceived real-world situation.]

[Diagram: The same structure, layered. Sociocognitive concepts map the real-world situation into entities and relationships in a lower-level model; measurement concepts map these into reconceived entities and relationships in a higher-level model, where the measurement models and representational forms operate; the result is again a reconceived real-world situation.]
A Sociocognitive Perspective
Some Foundations
• Themes from, e.g., cognitive psychology, linguistics, neuroscience, anthropology:
  » Connectionist metaphor, associative memory, complex systems (variation, stability, attractors)
• Situated cognition & information processing
  » E.g., Kintsch’s Construction-Integration (CI) theory of comprehension; diSessa’s “knowledge in pieces”
• Intrapersonal & extrapersonal patterns
Some Foundations
• Extrapersonal patterns:
  » Linguistic: grammar, conventions, constructions
  » Cultural models: what ‘being sick’ means, the restaurant script, apology situations
  » Substantive: F = ma, genres, plumbing, etc.
• Intrapersonal resources:
  » Connectionist metaphor for learning
  » Patterns from experience at many levels
[Diagram: Two persons, A and B, in interaction. What happens inside A and inside B is not observable; only their overt interaction is observable.]
[Diagram, continued: à la Kintsch, the interaction carries the propositional content of text / speech, and internal and external aspects of context.]
[Diagram, continued]
The C in CI theory is Construction: activation of both relevant and irrelevant bits from LTM, from past experience. All L/C/S levels are involved. Example: chemistry problems in German.
• If a pattern hasn’t been developed in past experience, it can’t be activated (although it may get constructed in the interaction).
• A relevant pattern from LTM may be activated in some contexts but not others (e.g., physics models).
[Diagram, continued]
The I in CI theory is Integration:
• Situation model: a synthesis of coherent / reinforced activated L/C/S patterns.
[Diagram, continued]
The situation model is also the basis of planning and action.
[Diagram, continued: the interaction is embedded in multiple, layered contexts.]
[Diagram, continued]
Ideally, activation of relevant and compatible intrapersonal patterns…
[Diagram, continued]
• Persons’ capabilities, situations, and performances are intertwined – co-determined, through L/C/S patterns.
• Meaning: the interaction leads to (sufficiently) shared understanding; i.e., co-constructed meaning.
What can we say about individuals?
Use of resources in appropriate contexts in appropriate ways; i.e., attunement to targeted L/C/S patterns:
• Do they recognize markers of externally-viewed patterns?
• Do they construct internal meanings in their light?
• Do they act in ways appropriate to targeted L/C/S patterns?
• What are the range and circumstances of activation? (variation of performance across contexts)
Assessment Arguments
Messick, 1994

[W]hat complex of knowledge, skills, or
other attribute should be assessed...

Next, what behaviors or performances
should reveal those constructs, and

what tasks or situations should elicit
those behaviors?
Toulmin’s Argument Structure
[Diagram: A Claim is supported (“so”) by Data, via a Warrant (“since”) that rests on Backing (“on account of”) – unless an Alternative explanation applies.]
[Diagram: Toulmin structure specialized to assessment. A claim about the student is supported (“so”) by data concerning the student’s performance, data concerning the task situation, and other information concerning the student vis-à-vis the assessment situation, all arising from the student acting in the assessment situation. Warrants concerning assessment, evaluation, and task design license the inference (“since”), resting on backing concerning the assessment situation (“on account of”) – unless alternative explanations apply.]
• Note the move from the emic to the etic!
• The claim is chosen in light of assessment purpose and the conception of capabilities.
• Data concerning the task situation concern features of the (possibly evolving) context as seen from the view of the assessor – in particular, those seen as relevant to targets of inference.
• Evaluation of performance seeks evidence of attunement to features of targeted L/C/S patterns; it depends on contextual features implicitly, since the performance is evaluated in light of the targeted patterns.
[Same assessment-argument diagram.]
“Hidden” aspects of context – not in the test theory model, but essential to the argument: What attunements to linguistic / cultural / substantive patterns can be presumed, or arranged for, among examinees, to condition inference regarding targeted L/C/S patterns? These attunements are fundamental to the situated meaning of student variables in measurement models; they are both critical and implicit.
[Same diagram, now with time running along it: micro features of performance aggregate into macro features, and features of context arise over time as the student acts / interacts.]
• Features of performance are evaluated in light of the emerging context.
• The situated situation unfolds as it evolves.
• Especially important in simulation, game, and extended performance contexts (e.g., Shute).
Design Argument and Use Argument
[Diagram: The assessment (design) argument as before; its claim about the student feeds a use argument (Bachman). In the use argument, a claim about the student in the use situation is supported by data and other information concerning the student vis-à-vis the use situation, licensed by a warrant concerning the use situation resting on its own backing – unless alternative explanations apply.]
[Same design-plus-use-argument diagram.]
• The claim about the student is the output of the assessment argument and the input to the use argument.
• How it is cast depends on the psychological perspective and the intended use.
• When measurement models are used, the claim is an etic synthesis of evidence, expressed as values of student-model variable(s).
[Same design-plus-use-argument diagram.]
• Warrant for the inference: increased likelihood of activation in the use situation if the pattern was activated in task situations.
• Empirical question: degrees of stability, and ranges and conditions of variability (Chalhoub-Deville).
• What features do tasks and use situations share? Implicit in trait arguments; explicit in sociocognitive arguments.
[Same design-plus-use-argument diagram.]
• What features do tasks and use situations not have in common? Issues of validity & generalizability, e.g., “method factors.”
• Use-situation features may call for other L/C/S patterns that weren’t in the task and may or may not be in the examinee’s resources.
• Target patterns may be activated in the task but not the use context, or in the use but not the task context.
• Knowing about the relation of target examinees and use situations strengthens inferences – “bias for the best” (Swain, 1985).
Multiple Tasks
[Diagram: The argument structure replicated across tasks 1…n – data about each performance (Dp), each task situation (Ds), and other information (OI) – all synthesized into one claim about the student.]
• Synthesize evidence from multiple tasks in terms of proficiency variables in a measurement model.
• Snow & Lohman’s sampling interpretation.
• What accumulates? L/C/S patterns – but also variation.
• What is similar from the analyst’s perspective need not be similar from the examinee’s.
Measurement Models & Concepts
AS IF…
• Tendencies for certain kinds of performance in certain kinds of situations are expressed as student-model variables θ.
• Individual performances (X) are modeled as probabilistic functions of θ – variability.
• Probability models permit sophisticated reasoning about evidentiary relationships in complex and subtle situations, BUT they are models, with all the limitations implied!
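A minimal sketch of that “as if” machinery (my notation; the slides name only θ and X): responses to tasks are treated as conditionally independent given θ, so evidence accumulates by Bayes’ theorem:

p(θ | x1, …, xn) ∝ p(θ) · p(x1 | θ) · … · p(xn | θ)

Here θ is a device within the analyst’s model – an etic synthesis – not a directly observed property of the examinee.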
Measurement Models & Concepts
• Xs result from particular persons calling upon resources in particular contexts (or not, or how).
• Mechanically, θs simply accumulate information across situations.
• Our choosing of situations, and of what to observe, drives their situated meaning.
• The situated meaning of θs: tendencies toward these actions, in these situations that call for certain interactional resources, via L/C/S patterns.
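A toy numerical sketch of that mechanical accumulation (my own illustration, not from the talk), assuming a Rasch-type response function and a grid approximation:

import numpy as np

theta_grid = np.linspace(-4, 4, 161)        # candidate values of theta
posterior = np.exp(-0.5 * theta_grid**2)    # standard-normal prior, unnormalized

def update(posterior, x, b):
    """Fold in the likelihood of response x (1 = success) on a task with difficulty b."""
    p_success = 1.0 / (1.0 + np.exp(-(theta_grid - b)))
    likelihood = p_success if x == 1 else 1.0 - p_success
    posterior = posterior * likelihood
    return posterior / posterior.sum()      # renormalize over the grid

# Hypothetical responses to three tasks of increasing difficulty:
for x, b in [(1, -1.0), (1, 0.0), (0, 1.5)]:
    posterior = update(posterior, x, b)

print("posterior mean of theta:", float(np.dot(theta_grid, posterior)))

The update is indifferent to what each situation meant to the examinee; that situated meaning lives outside the model, in the choice of tasks and observations.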
Classical Test Theory
[Diagram: The multiple-task argument structure, with true score τ as the student-model variable and observed score X summarizing the evidence.]
• Probability model: “true score” = stability along an implied dimension; “error” = variation.
• Situated meaning comes from task features & evaluation.
• Can organize around traits, task features, or both, depending on task sets and performance features.
• Profile differences are unaddressed.
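In symbols (the standard CTT decomposition, matching the slide’s reading of stability vs. variation):

X = τ + E,    reliability = Var(τ) / [Var(τ) + Var(E)]

so the “true score” τ carries the stable tendency and E absorbs the unmodeled variation.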
Item Response Theory
[Diagram: The multiple-task argument structure, with θ as the student-model variable and item responses X1 … Xn.]
• θ = propensity to act in the targeted way; bj = typical evocation; the IRT function = typical variation.
• Complex systems concepts: attractors & stability → regularities in response patterns, quantified in parameters; typical variation → probability model.
• Will work best when most nontargeted L/C/S patterns are familiar…
• Situated meaning comes from task features & evaluation.
• Task features are still implicit.
• Item-parameter invariance vs. population dependence.
• Profile differences / misfit highlight where the narrative doesn’t fit – for sociocognitive reasons (Tatsuoka, Linn, Tatsuoka, & Yamamoto, 1988).
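In symbols (the two-parameter logistic form; the discrimination parameter aj is standard IRT machinery, not named on the slide):

P(Xj = 1 | θ) = 1 / (1 + exp(−aj(θ − bj)))

so bj marks the point on θ where acting in the targeted way becomes more likely than not, and the spread of the curve expresses typical variation.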
Multivariate Item Response Theory (MIRT)
• θs = propensities to act in targeted ways in situations with different mixes of L/C/S demands.
• Good for controlled mixes of situations.
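For example (a compensatory MIRT response function, one standard choice rather than one named on the slide):

P(Xj = 1 | θ1, …, θK) = 1 / (1 + exp(−(aj1·θ1 + … + ajK·θK − bj)))

where the loadings ajk express task j’s particular mix of L/C/S demands.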
Structured Item Response Theory
[Diagram: The multiple-task argument structure, with θ, item responses, and task-feature variables vi attached to each task.]
• Explicitly model task situations in terms of L/C/S demands; this links task design with the sociocognitive view.
• Work explicitly with features in controlled and evolved situations (design / agents).
• Can be used with MIRT; cognitive diagnosis models.
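One concrete instance (Fischer’s linear logistic test model, a standard structured-IRT example): item difficulty is decomposed over task features,

bj = Σk qjk·ηk

where qjk codes whether demand feature k is present in task j and ηk is that feature’s contribution to difficulty – an explicit bridge from task design to the measurement model.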
Mixtures of IRT Models
[Diagram: Two parallel multiple-task argument structures, one per unobserved group, each with its own θ and item-response model.]
• Different IRT models for different unobserved groups of people.
• Modeling different attractor states.
• Can be theory-driven or discovered in data.
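In symbols (a standard mixture-IRT marginal likelihood):

p(x) = Σc πc ∫ Πj p(xj | θ, c) p(θ | c) dθ

where the latent class c indexes the unobserved group (e.g., solution strategy or attractor state), πc is its proportion, and the item parameters differ by class.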
Measurement Concepts
• Validity
  » Soundness of the model for local inferences
  » Breadth of scope is an empirical question
  » Construct representation in L/C/S terms
  » Construct-irrelevant sources of variation in L/C/S terms
• Reliability
  » Through the model, strength of evidence for inferences about tendencies, given variabilities … or about characterizations of variability.
Measurement Concepts
• Method effects
  » What accumulates in terms of L/C/S patterns in assessment situations but not in use situations
• Generalizability theory (Cronbach)
  » A watershed in emphasizing evidentiary reasoning rather than simply measurement
  » Focus on external features of context; can be recast in L/C/S terms, attending to correlates of variability
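In symbols (a persons-by-tasks G-study, one common design): observed-score variance decomposes as

σ²(Xpt) = σ²p + σ²t + σ²pt,e

and the generalizability coefficient for a mean over nt tasks (relative decisions) is

Eρ² = σ²p / (σ²p + σ²pt,e / nt)

Recasting the facets in L/C/S terms would let the variance components carry sociocognitive meaning.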
Why are these issues important?
• Connect assessment/measurement with current psychological research
  » Connect assessment with learning
• Appropriate constraints on interpreting large-scale assessments
• Inference in complex assessments
  » Games, simulations, performances
  » Assessment modifications & accommodations
  » Individualized yet comparable assessments
Conclusion
Communication at the interface.
We have work we need to do, together.