Argument Substance and Argument Structure in Educational Assessment Robert J. Mislevy

Argument Substance and Argument Structure
in Educational Assessment
Robert J. Mislevy
Department of Measurement, Statistics, & Evaluation
University of Maryland, College Park, MD
April 29, 2003
Presented at Conference on Inference, Culture, and Ordinary Thinking in Dispute
Resolution, Benjamin N. Cardozo School of Law, Yeshiva University, New York, New York,
April 27-29, 2003. This work builds on research with Linda Steinberg and Russell Almond
at Educational Testing Service on the structure of educational assessments.
April 29, 2003
Inference & Culture
Slide 1
Central Points
• Educational assessment has changed considerably over the last century.
• Why? Strikingly different psychological perspectives on nature of learning and knowledge.
• Can be seen as elaborations of same argument structure.
  » Wigmore, Toulmin
Slide 2
Messick (1994) on assessment design:
[B]egin by asking what complex of knowledge,
skills, or other attributes should be assessed,
presumably because they are tied to explicit or
implicit objectives of instruction or are otherwise
valued by society.
Next, what behaviors or performances should
reveal those constructs, and what tasks or
situations should elicit those behaviors?
Thus, the nature of the construct guides the
selection or construction of relevant tasks as well
as the rational development of construct-based
scoring criteria and rubrics.
Slide 3
Toulmin's (1958) structure for arguments
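[Diagram: data (D) lead, "so," to the claim (C), "since" the warrant (W) holds "on account of" backing (B), "unless" an alternative explanation (A) applies, which rebuttal evidence (R) "supports."]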
Reasoning flows from data (D) to claim (C) by justification of a
warrant (W), which in turn is supported by backing (B). The
inference may need to be qualified by alternative explanations (A),
which may have rebuttal evidence (R) to support them.
Slide 4
Perspectives on learning and knowledge
• Trait/differential (~1900 - )
• Behaviorist (~1950 - 1980)
• Information-processing (~1970 - )
• Sociocultural (~1980 - )
Slide 5
Trait/Differential Perspective



A relatively stable characteristic of a person—
an attribute, enduring process, or
disposition—which is consistently manifested
to some degree when relevant, despite
considerable variation in the range of settings
and circumstances. (Messick, 1989)
Interest in people's differential status on
common traits
Useful in selection, prediction, and
educational decisions—not so much for
instruction
Slide 6
Spearman’s
“Theorem of indifference of the indicator”
This means that, for the purpose of indicating the amount of
g possessed by a person, any test will do just as well as any
other, provided only that its correlation with g is equally
high. ...
Another consequence of the indifference of the indicator
consists in the significance that should be attached to
personal estimates of “intelligence” made by teachers and
others. However unlike may be the kinds of observation
from which these estimates may have been derived, still
insofar as they have a sufficiently broad basis to make the
influence of g dominate over that of the s’s [subjects], they
will tend to measure precisely the same thing.
Slide 7
An Analytical Reasoning Item
Pet Shop Display
Arturo is planning the parakeet display for his pet shop. He has five parakeets, Alice,
Bob, Carla, Diwakar, and Etria. Each is a different color; not necessarily in the same
order, they are white, speckled, green, blue, and yellow. Arturo has two cages. The top
cage holds three birds, and the bottom cage holds two. The display must meet the
following additional conditions:
Alice is in the bottom cage.
Bob is in the top cage and is not speckled.
Carla cannot be in the same cage as the blue parakeet.
Etria is green.
The green parakeet and the speckled parakeet are in the same cage.
If Carla is in the top cage, which of the following must be true?
a) The green parakeet is in the bottom cage.
b) The speckled parakeet is in the bottom cage.
c) Diwakar is in the top cage.
d) Diwakar is in the bottom cage.
e) The blue parakeet is in the top cage.
Slide 8
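The Pet Shop item above can be checked by exhaustive search. The following is a minimal sketch, not part of the original slides: it enumerates every cage and color assignment that satisfies the stated conditions, keeps those with Carla in the top cage, and reports which answer options hold in all of them. The function and variable names are illustrative assumptions.

```python
from itertools import combinations, permutations

BIRDS = ["Alice", "Bob", "Carla", "Diwakar", "Etria"]
COLORS = ["white", "speckled", "green", "blue", "yellow"]

def valid_displays():
    """Yield every (cage, color) assignment that meets the item's conditions."""
    for top in combinations(BIRDS, 3):          # top cage holds three birds
        cage = {b: ("top" if b in top else "bottom") for b in BIRDS}
        for perm in permutations(COLORS):       # each bird a different color
            color = dict(zip(BIRDS, perm))
            owner = {c: b for b, c in color.items()}
            if cage["Alice"] != "bottom":
                continue
            if cage["Bob"] != "top" or color["Bob"] == "speckled":
                continue
            if cage["Carla"] == cage[owner["blue"]]:
                continue
            if color["Etria"] != "green":
                continue
            if cage[owner["green"]] != cage[owner["speckled"]]:
                continue
            yield cage, color

# Answer options expressed as predicates over one assignment.
OPTIONS = {
    "a": lambda cage, owner: cage[owner["green"]] == "bottom",
    "b": lambda cage, owner: cage[owner["speckled"]] == "bottom",
    "c": lambda cage, owner: cage["Diwakar"] == "top",
    "d": lambda cage, owner: cage["Diwakar"] == "bottom",
    "e": lambda cage, owner: cage[owner["blue"]] == "top",
}

# Keep only the scenarios named in the question stem: Carla in the top cage.
scenarios = [(cage, {c: b for b, c in color.items()})
             for cage, color in valid_displays() if cage["Carla"] == "top"]

must_be_true = sorted(key for key, holds in OPTIONS.items()
                      if scenarios and all(holds(cage, owner)
                                           for cage, owner in scenarios))
print(must_be_true)
```

The search mirrors the reasoning the item is meant to elicit: the blue parakeet must be in the bottom cage, which rules out placing the green and speckled birds there, so Etria goes to the top cage and Diwakar to the bottom.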
LSAT on AR Items

LSAT's description of AR takes a trait perspective:
"Analytical reasoning items are designed to measure
the ability to understand a structure of relationships
and to draw conclusions about the structure."

AR items are in the LSAT not because either lawyers
or law students routinely have to solve problems just
like these in their jobs or their studies, but because
there is evidence that students who can solve these
kinds of puzzles tend to perform better in law school
than students who don't.
Slide 9
[Toulmin diagram for the Pet Shop item]
C: Sue has a high value of Analytical Reasoning.
W (since): Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.
B (on account of): Empirical studies show high correlations between AR test scores and college grades, open-ended problem solving tasks, and ratings of employees' reasoning skills on the job.
A (unless): Sue answered correctly as a result of a lucky guess.
R (supports A): Sue spent less than 10 seconds on this item.
D1: Sue answered the Pet Shop item correctly.
D2: Logical structure and contents of Pet Shop item.
1) Note that the warrant requires a conjunction of data about the nature of Sue's performance and the nature of the performance situation.
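The argument structure in the diagram above can be written down as a small data structure. This is only a sketch under assumptions: the class name `ToulminArgument`, its field names, and the `render` method are hypothetical and do not come from the slides; the strings are the labels from the Pet Shop diagram.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ToulminArgument:
    """One Toulmin argument: data, so claim, since warrant, ..."""
    claim: str
    data: List[str]
    warrant: str
    backing: Optional[str] = None
    alternative: Optional[str] = None
    rebuttal_evidence: Optional[str] = None

    def render(self) -> str:
        # Read the diagram in the order data -> claim -> warrant -> qualifiers.
        parts = [" and ".join(self.data),
                 f"so {self.claim}",
                 f"since {self.warrant}"]
        if self.backing:
            parts.append(f"on account of {self.backing}")
        if self.alternative:
            parts.append(f"unless {self.alternative}")
            if self.rebuttal_evidence:
                parts.append(f"(supported by: {self.rebuttal_evidence})")
        return "; ".join(parts)

sue = ToulminArgument(
    claim="Sue has a high value of Analytical Reasoning",
    data=["Sue answered the Pet Shop item correctly",
          "the logical structure and contents of the Pet Shop item"],
    warrant="students high on Analytical Reasoning tend to do well on such puzzles",
    backing="empirical studies of AR test scores",
    alternative="Sue answered correctly as a result of a lucky guess",
    rebuttal_evidence="Sue spent less than 10 seconds on this item",
)
text = sue.render()
print(text)
```

Holding both data elements in one list reflects note 1): the warrant applies only to the conjunction of the performance and the performance situation.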
2) A closer look at the “data”: Must reason from unique work products and item materials, to aspects addressed in the general warrant.
[Toulmin diagram: D1 and D2 are themselves supported by sub-arguments; W, B, A, and R appear by letter only]
C: Sue has a high value of Analytical Reasoning.
D1: Sue answered the Pet Shop item correctly.
  W1 (since): Correspondence of darkest mark and keyed response means correct answer.
  D11: Sue's marks on the answer sheet for Pet Shop item.
  D12: Answer key for the Pet Shop item.
D2: Logical structure and contents of Pet Shop item.
  W2 (since): Elements in schemas for valid AR items.
  D22: Particular content of Pet Shop item.
Multiple pieces of evidence of the same kind
C: Sue has a high value of Analytical Reasoning.
W (since): Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.
B (on account of): ...
A (unless): ...
R (supports A): ...
D11: Sue's answer to Item 1; ...; D1n: Sue's answer to Item n
D21: structure and contents of Item 1; ...; D2n: structure and contents of Item n
Slide 13
Multiple pieces of evidence of different kinds
C: Sue has a high value of Analytical Reasoning.
A0 (unless): ...
W1 (since): [[warrant re logic puzzles]]
  A (unless): [[Alternatives re logic puzzles]]
  D11: Sue's answer to Item 1
  D12: Structure & content of Pet Shop item
...
Wn (since): [[Warrant re recommendations]]
  A (unless): [[Alternatives re recommendations]]
  Dn1: Teacher recommendation about Sue
  Dn2: Conditions of observation for recommendation
Slide 14
Statistical Modeling of Assessment Data
• Claims in terms of values of unobservable variables in student model (SM), which characterize student knowledge.
• Data modeled as depending probabilistically on SM vars.
• Estimate conditional distributions of data given SM vars.
• Bayes theorem to infer SM variables given data.
[Figure: student-model variable θ, with prior p(θ), is the parent of observable variables X1, X2, X3, with conditional distributions p(X1|θ), p(X2|θ), p(X3|θ).]
Slide 15
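The Bayes-theorem step on this slide can be illustrated with a discrete student-model variable. This is a minimal sketch, assuming a two-level variable and three binary observables that are conditionally independent given the level; every probability below is a made-up illustration, not an estimate from any real assessment.

```python
def posterior(prior, p_correct, responses):
    """Posterior over student-model levels given right/wrong responses.

    prior:      {level: p(level)}
    p_correct:  {level: [p(X_i = 1 | level) for each item i]}
    responses:  list of 0/1 observations, one per item
    """
    joint = {}
    for level, p_level in prior.items():
        likelihood = 1.0
        for p, x in zip(p_correct[level], responses):
            likelihood *= p if x == 1 else 1.0 - p
        joint[level] = p_level * likelihood
    total = sum(joint.values())          # p(data), the normalizing constant
    return {level: value / total for level, value in joint.items()}

# Hypothetical numbers, chosen only for illustration.
prior = {"low": 0.5, "high": 0.5}
p_correct = {"low": [0.2, 0.3, 0.1], "high": [0.8, 0.7, 0.6]}

post = posterior(prior, p_correct, responses=[1, 1, 0])
print(post)
```

With these assumed values, two correct answers and one incorrect answer shift most of the probability to the "high" level, which is the sense in which data update a claim about an unobservable variable.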
Behaviorist Perspective
The educational process consists of providing a series of
environments that permit the student to learn new behaviors or
modify or eliminate existing behaviors and to practice these
behaviors to the point that he displays them at some reasonably
satisfactory level of competence and regularity under appropriate
circumstances. …
The evaluation of the success of instruction and of the student’s
learning becomes a matter of placing the student in a sample of
situations in which the different learned behaviors may
appropriately occur and noting the frequency and accuracy with
which they do occur.
D.R. Krathwohl & D.A. Payne, 1971, pp. 17-18.
Slide 16
[Toulmin diagram for a behaviorist assessment argument]
C: Sue's probability of correctly answering a 2-digit subtraction problem with borrowing is p.
W (since): Sampling theory machinery for reasoning from observed proportion of r correct responses in n targeted situations, to true proportion p.
A (unless): [e.g., observational errors, data errors, misclassification of responses or performance situations, distractions, etc.]
D1j: Sue's answer to Item j
D2j: structure and contents of Item j
Notes on the diagram:
• The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.
• The claim addresses the expected value of performance of the targeted kind in the targeted situations.
• The task data address the salient features of the stimulus situations (i.e., tasks).
• The student data address the salient features of the responses.
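The sampling-theory warrant above reasons from r correct responses in n targeted situations to a true proportion p. One simple way to sketch that step is a point estimate with a normal-approximation (Wald) interval. The slides do not say which estimator is intended, so the choice of the Wald interval, the function name, and the example counts are assumptions.

```python
from math import sqrt

def estimate_proportion(r, n, z=1.96):
    """Estimate true proportion p from r correct responses in n items.

    Returns the observed proportion, its standard error, and an
    approximate 95% interval (normal approximation, clipped to [0, 1]).
    """
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    p_hat = r / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    lower = max(0.0, p_hat - z * se)
    upper = min(1.0, p_hat + z * se)
    return p_hat, se, (lower, upper)

# Hypothetical data: 8 of 10 borrowing problems answered correctly.
p_hat, se, interval = estimate_proportion(r=8, n=10)
print(p_hat, se, interval)
```

The width of the interval for so few items shows why the argument needs many observations of the same kind before the claim about p is well supported.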
The Information-Processing Perspective





Epitomized in Newell and Simon’s (1972) Human
Problem Solving
Examines the procedures by which people acquire,
store, and use knowledge to solve problems.
Modeling problem-solving in terms of the
capabilities and the limitations of human thought
and memory.
Importance of knowledge structures, relationships,
procedures in learning domains.
Use of rules, production systems, task
decompositions, and means-ends analyses.
Slide 21
Responses consistent with the "subtract smaller from larger" bug
  821     885      63      17
- 285   - 221    - 15     - 9
-----   -----    ----    ----
  664     664      52      12
Slide 22
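The "subtract smaller from larger" bug can be written as a production rule applied column by column: in each column, take the absolute difference of the two digits and never borrow. The sketch below is an illustration with an assumed function name; it reproduces the four responses shown on the slide.

```python
def buggy_subtract(top, bottom):
    """Column-wise subtraction with the 'smaller from larger' bug.

    Assumes non-negative integers with top having at least as many
    digits as bottom. No borrowing ever takes place.
    """
    top_digits = str(top)
    bottom_digits = str(bottom).rjust(len(top_digits), "0")  # align columns
    digits = [abs(int(a) - int(b)) for a, b in zip(top_digits, bottom_digits)]
    return int("".join(str(d) for d in digits))

for top, bottom in [(821, 285), (885, 221), (63, 15), (17, 9)]:
    print(top, "-", bottom, "=", buggy_subtract(top, bottom),
          "(correct answer:", top - bottom, ")")
```

Only 885 - 221 needs no borrowing, so it is the one item where the buggy rule and the correct procedure give the same answer. A right/wrong score on that item alone could not distinguish the two.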
[Toulmin diagram for an information-processing assessment argument]
C: Sue's configuration of production rules for operating in the domain (knowledge and skill) is K.
W0 (since): Theory about how persons with configurations {K1,...,Km} would be likely to respond to items with different salient features.
Data for this claim are the class-level claims below, one for each of Class 1, ..., Class n:
  C: Sue's probability of answering a Class 1 subtraction problem with borrowing is p1.
    W (since): Sampling theory for items with feature set defining Class 1.
    D11j: Sue's answer to Item j, Class 1
    D21j: structure and contents of Item j, Class 1
  ...
  C: Sue's probability of answering a Class n subtraction problem with borrowing is pn.
    W (since): Sampling theory for items with feature set defining Class n.
    D1nj: Sue's answer to Item j, Class n
    D2nj: structure and contents of Item j, Class n
Notes on the diagram:
• Like behaviorist inference at level of behavior in classes of structurally similar tasks.
• Patterns among behaviorist claims are data for inferences about unobservable production rules that govern behavior.
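The second note above, that patterns across item classes are data about unobservable configurations, can be sketched with a small Bayesian comparison. Everything specific here is an assumption made for illustration: two hypothetical configurations, two item classes, and invented success probabilities for each combination.

```python
from math import comb

def binomial_likelihood(r, n, p):
    """Probability of r correct responses in n items with success rate p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def posterior_over_configurations(prior, p_by_class, observed):
    """Posterior over configurations K given (r, n) counts per item class."""
    joint = {}
    for k, p_k in prior.items():
        likelihood = 1.0
        for item_class, (r, n) in observed.items():
            likelihood *= binomial_likelihood(r, n, p_by_class[k][item_class])
        joint[k] = p_k * likelihood
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}

# Hypothetical configurations and success rates per item class.
prior = {"K_correct": 0.5, "K_bug": 0.5}
p_by_class = {
    "K_correct": {"no_borrow": 0.95, "borrow": 0.90},
    "K_bug":     {"no_borrow": 0.95, "borrow": 0.05},
}
# Hypothetical pattern: all no-borrow items right, all borrow items wrong.
observed = {"no_borrow": (5, 5), "borrow": (0, 5)}

post = posterior_over_configurations(prior, p_by_class, observed)
print(post)
```

Neither class on its own settles the question in this sketch. The no-borrow items are equally likely under both configurations, and it is the contrast between classes that points to the buggy configuration.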
Assessing inquiry processes: Time dependencies in a troubleshooting task. Past behavior & consequences become part of setting for next action.
C: Sue's level of troubleshooting skill is K.
W (since): [theory about strategies and procedures people at various levels of troubleshooting expertise tend to employ when iteratively solving problems in the domain.]
Data, by time step:
D1,t+1: Sue's actions at time t+1
D2,t: Context after time t
D1,t: Sue's actions at time t
D2,t-1: Context after time t-1
D1,t-1: Sue's actions at time t-1
D2,t-2: Context after time t-2
D1,t-2: Sue's actions at time t-2
...
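The time dependency described above, where each action is interpreted against the context left by earlier actions, can be sketched as a loop that pairs every action with the context in force when it was taken. The function name, the record layout, and the toy troubleshooting actions are all hypothetical.

```python
def pair_actions_with_context(initial_context, actions, update):
    """Pair each action with the context that preceded it.

    update(context, action) returns the context after the action, which
    then becomes part of the setting for the next action.
    """
    records = []
    context = initial_context
    for t, action in enumerate(actions):
        records.append({"t": t, "context_before": context, "action": action})
        context = update(context, action)
    return records, context

# Toy example: the context is the set of components already checked.
def update(context, action):
    return context | {action}

actions = ["check fuse", "check pump", "check fuse"]
records, final_context = pair_actions_with_context(frozenset(), actions, update)

# An action can only be judged redundant relative to its preceding context.
redundant = [rec["action"] in rec["context_before"] for rec in records]
print(redundant)
```

The same action, "check fuse", is evaluated differently at time 0 and time 2, which is why the diagram attaches a context node to every action node.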
The Sociocultural Perspective
Stresses how knowledge is conditioned and
constrained by the technologies, information
resources, representation systems, and social
situations ...
Incorporates explanatory concepts that have proved
useful in fields such as ethnography and sociocultural
psychology to study collaborative work, … mutual
understanding in conversation, and other
characteristics of interaction that are relevant to the
functional success of the participants’ activities.
Greeno, Collins, & Resnick, 1997, p. 7.
Slide 27
AP Studio Art Portfolios
[Toulmin diagram for the Concentration section]
C: The level of performance for the Concentration section is K.
W0 (since): [Specification of general rubric to the goals and approach the student describes in the narrative]
B: General rubric
D1: Student's learning in the course of carrying out the concentration.
D2: Conditions under which the work was carried out.
D3j: Art piece j in the concentration.
Narrative ("tailors" W0): Statements in narrative explaining the concentration, its influences, goals, etc.
Notes on the diagram:
• Claim concerns level of performance represented by unique project, in socially-determined general evaluation scheme.
• Data from student are (1) works of art and (2) explanation of project goals, approach, rationale.
• Student text helps assure performance conditions meet the requirements of the warrant.
• Student text contributes to how raters apply general evaluation rubric to this student’s work.
Conversational Competence
[Toulmin diagram with time-indexed data; "I" is the interlocutor]
C: Sue's level of conversational competence is K.
W (since): [theory about how people at various levels of conversational competence will behave in contexts with specified features]
Data, by time step:
D1,t+1: Sue's speech act at time t+1
D3,t+1: I's speech act at time t+1
D2,t: Context after time t
D1,t: Sue's speech act at time t
D3,t: I's speech act at time t
D2,t-1: Context after time t-1
D1,t-1: Sue's speech act at time t-1
D3,t-1: I's speech act at time t-1
D2,t-2: Context after time t-2
D1,t-2: Sue's speech act at time t-2
D3,t-2: I's speech act at time t-2
...
Challenges:
1) Time dependencies.
2) Interlocutor’s behavior affects context; is required by warrant for evidence about certain aspects of competence.
3) How constrained? Naturalistic vs. interviewer.
Conclusion
What changes?
Developments in psychology, technology, and social
factors (e.g., accommodations) continually place
demands on assessment that outstrip familiar forms.
What doesn’t change?
We want to draw inferences about what students
know and can do as seen from some perspective;
that perspective tells us what kinds of things we need
to see them do, in what kinds of situations, to ground
those inferences.
Slide 35
Conclusion
We see elaborations, extensions, and
specializations of enduring principles of
evidentiary reasoning.
We find continued value in tools such as Toulmin
diagrams, Wigmore charts, and Bayesian
inference networks to understand yesterday's
assessments, manage today's, and design the
assessments of tomorrow.
Slide 36