Prospectus for the PADI design framework in language testing

Robert J. Mislevy, Professor of Measurement & Statistics, University of Maryland
Geneva D. Haertel, Assessment Research Area Director, SRI International

ECOLT 2006, October 13, 2006, Washington, D.C.

PADI is supported by the National Science Foundation under grant REC-0129331. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Some Challenges in Language Testing

• Sorting out evidence about interacting aspects of knowledge & proficiency in complex performances
• Understanding the impact of “complexity factors” and “difficulty factors” on inference
• Scaling up efficiently to high-volume tests—task creation, scoring, delivery
• Creating valid & cost-effective low-volume tests
Evidence-Centered Design

Evidence-centered assessment design (ECD) provides language, concepts, knowledge representations, data structures, and supporting tools to help design and deliver educational assessments, all organized around the evidentiary argument an assessment is meant to embody.
The Assessment Argument

• What kinds of claims do we want to make about students?
• What behaviors or performances can provide us with evidence for those claims?
• What tasks or situations should elicit those behaviors?

Generalizing from Messick (1994)
Evidence-Centered Design

• With Linda Steinberg & Russell Almond at ETS
  » The Portal project / TOEFL
  » NetPass with Cisco (computer network design & troubleshooting)
• Principled Assessment Design for Inquiry (PADI)
  » Supported by NSF (co-PI: Geneva Haertel, SRI)
  » Focus on science inquiry—e.g., investigations
  » Models, tools, examples
Some allied work

• Cognitive design for generating tasks (Embretson)
• Model-based assessment (Baker)
• Analyses of task characteristics—test and TLU (Bachman & Palmer)
• Test specifications (Davidson & Lynch)
• Constructing measures (Wilson)
• Understanding by design (Wiggins)
• Integrated Test Design, Development, and Delivery (Luecht)
Layers in the assessment enterprise
(From Mislevy & Riconscente, in press)

• Domain Analysis: What is important about this domain? What work and situations are central in this domain? What KRs (knowledge representations) are central to this domain?
• Domain Modeling: How do we represent key aspects of the domain in terms of the assessment argument?
• Conceptual Assessment Framework: design structures (student, evidence, and task models).
• Assessment Implementation: How do we choose and present tasks, and gather and analyze responses?
• Assessment Delivery: How do students and tasks actually interact? How do we report examinee performance?

Key ideas: explicit relationships, explicit structures, generativity, re-usability, recombinability, interoperability.
[The same layers diagram, annotated at the Domain Analysis layer.]

• Sources for domain analysis: expertise research, task analysis, curriculum, target language use, critical incident analysis, ethnographic studies, etc.
• In language assessment, note the importance of psycholinguistics, sociolinguistics, and target language use.
[The same layers diagram, with the annotation:] Tangible stuff in the testing situation, e.g., what gets made and how it operates.
[The same layers diagram, posing the question:] How do you get from here to here?
We will focus today on two “hidden” layers:

• Domain Modeling, which concerns the assessment argument
• The Conceptual Assessment Framework, which concerns generative & re-combinable design schemas
[The layers diagram again, with the transition:] More on the Assessment Argument
PADI Design Patterns

• Organized around elements of the assessment argument
• Narrative structures for assessing pervasive kinds of knowledge / skill / capabilities
• Based on research & experience, e.g.
  » PADI: design under constraint, inquiry cycles, representations
  » Compliance with Grice’s maxims; cause/effect reasoning; giving spoken directions
• Suggest design choices that apply to different contexts, levels, purposes, formats
  » Capture experience in structured form
  » Organized in terms of the assessment argument
A Design Pattern Motivated by Grice’s Relation Maxims

Name: Grice’s Relation Maxim—Responding to a Request
Summary: In this design pattern, an examinee demonstrates observance of Grice’s Relation Maxim in a given language by producing or selecting a response in a situation that presents a request for information (e.g., a conversation).
Central claims: In contexts/situations with xxx characteristics, can formulate and respond to representations of implicature from referents:
  » semantic implication
  » pragmatic implication
Additional knowledge that may be at issue: Substantive knowledge in the domain; familiarity with cultural models; knowledge of the language
Grice’s Relation Maxims (continued)

Characteristic features: The stimulus situation needs to present a request for relevant information to the examinee, either explicitly or implicitly.

Variable task features:
  » Production or choice as response?
  » If production, is oral or written production required?
  » If oral, a single response to a preconfigured situation or part of an evolving conversation?
  » If an evolving conversation, open or structured interview?
  » Formality of prepackaged products (multiple choice, videotaped conversations, written questions or conversations, one-to-one or more conversations prepared by interviewers)
  » Formality of information and task (concrete or abstract, immediate or remote, information requiring retrieval or transformation, familiar or unfamiliar setting and topic, written or spoken)
  » If a prepackaged speech stimulus: length, content, difficulty of language, explicitness of request, degree of cultural dependence
  » Content of situation (familiar or unfamiliar, degree of difficulty)
  » Time pressure (e.g., time for planning and response)
  » Opportunity to control the conversation
Grice’s Relation Maxims (continued)

Potential performances and work products:
  » Constructed oral response
  » Constructed written or typed-in response
  » Answer to a multiple-choice question where alternatives vary

Potential features of performance to evaluate:
  » Whether a student can formulate representations of implicature as they are required in the given situation
  » Whether a student can make a conversational contribution or express an idea in the accepted direction of the exchange
  » Whether a student provides the relevant information as required
  » Whether the quality of a choice among alternatives offered for a production in a given situation satisfies the Relation Maxim

Potential rubrics: (later slide)
Examples: (in paper)
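To make the structure of a design pattern concrete, here is a minimal sketch of how its attributes could be captured as a reusable data structure. The class and field names are illustrative assumptions, not the PADI object model; the attribute values are abbreviated from the Grice example above.

```python
# Illustrative only: a minimal way to encode a PADI-style design pattern as a
# reusable data structure. Attribute names follow the slides; the class and
# field names here are hypothetical, not the PADI object model itself.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DesignPattern:
    name: str
    summary: str
    central_claims: List[str]
    additional_knowledge: List[str] = field(default_factory=list)
    characteristic_features: List[str] = field(default_factory=list)
    variable_task_features: List[str] = field(default_factory=list)
    potential_work_products: List[str] = field(default_factory=list)
    potential_features_to_evaluate: List[str] = field(default_factory=list)

grice_relation = DesignPattern(
    name="Grice's Relation Maxim -- Responding to a Request",
    summary=("Examinee produces or selects a response in a situation that "
             "presents a request for information."),
    central_claims=["Can formulate and respond to semantic and pragmatic "
                    "implicature in contexts with specified characteristics."],
    additional_knowledge=["Substantive knowledge in the domain",
                          "Familiarity with cultural models",
                          "Knowledge of the language"],
    characteristic_features=["Stimulus presents an explicit or implicit "
                             "request for relevant information."],
    variable_task_features=["Production vs. choice", "Oral vs. written",
                            "Preconfigured vs. evolving conversation"],
    potential_work_products=["Constructed oral response",
                             "Constructed written response",
                             "Multiple-choice answer"],
)

if __name__ == "__main__":
    print(grice_relation.name)
```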
Some Relationships between Design Patterns and Other TD Tools

• Conceptual models for proficiency & task-characteristic frameworks
  » Grist for design choices about KSAs & task features
  » DPs present an integrated design space
• Test specifications
  » DPs for generating the argument and design choices
  » Test specs for documenting and specifying those choices
[The layers diagram again, with the transition:] More on the Conceptual Assessment Framework
Evidence-centered assessment design: the three basic models

[CAF diagram: Student Model; Evidence Model(s), comprising a statistical model and evidence rules; Task Model(s).]

These are the technical specs that embody the elements suggested in the design pattern.
Conceptual Representation

[The three basic models diagram again: Student Model; Evidence Model(s), comprising a statistical model and evidence rules; Task Model(s).]
User-Interface Representation

[Screen shot of the user interface.]
UML Representation (sharable data structures, “behind the screen”)

[High-level UML representation of the PADI object model.]
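The UML diagram itself is not reproduced here. As a rough indication of the kind of sharable, “behind the screen” data structures it describes, the sketch below links the three CAF models in code. All class and attribute names are assumptions for illustration, not the actual PADI schema, and the example values are loosely based on the NetPass material later in the talk.

```python
# A rough sketch (not the actual PADI schema) of how the three CAF models
# might be represented as sharable, linkable data structures.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StudentModel:
    variables: List[str]              # aspects of proficiency (SMVs)

@dataclass
class EvidenceModel:
    evidence_rules: List[str]         # how work products are scored into observables
    stat_model: Dict[str, List[str]]  # observable -> SMVs it depends on (illustrative)

@dataclass
class TaskModel:
    work_products: List[str]          # what the examinee produces
    task_features: Dict[str, str]     # settable features of the task environment

@dataclass
class AssessmentTemplate:
    student_model: StudentModel
    evidence_models: List[EvidenceModel]
    task_models: List[TaskModel]

template = AssessmentTemplate(
    student_model=StudentModel(variables=["Design", "Disciplinary Knowledge"]),
    evidence_models=[EvidenceModel(
        evidence_rules=["Rate correctness of outcome", "Rate quality of rationale"],
        stat_model={"Correctness of Outcome": ["Design", "Disciplinary Knowledge"],
                    "Quality of Rationale": ["Design"]})],
    task_models=[TaskModel(
        work_products=["Network diagram", "Written rationale"],
        task_features={"Setting": "University", "Ethernet Standard": "100BaseT"})],
)
print(len(template.task_models), "task model(s) in template")
```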
[The three basic models diagram, highlighting the Student Model:] What complex of knowledge, skills, or other attributes should be assessed?
The NetPass Student Model

[Diagram of a multidimensional measurement model whose student-model variables are selected aspects of proficiency: Network Proficiency (SM), Networking Disciplinary Knowledge (SM), Network Modeling (SM), Design (SM), Implement/Configure (SM), Troubleshoot (SM).]

The same student model can be used with different tasks.
[The three basic models diagram, highlighting the Evidence Model:] What behaviors or performances should reveal those constructs?
[The three basic models diagram, highlighting the evidence rules:] What behaviors or performances should reveal those constructs? From a unique student work product to evaluations of observable variables—i.e., task-level “scoring.”
Skeletal Rubric for Satisfaction of Quality Maxims

4: Responses and explanations are relevant as required for the current purposes of the exchange and are neither more elaborated than appropriate nor insufficient for the context. They fulfill the demands of the task with at most minor lapses in completeness. They are appropriate for the task and exhibit coherent discourse.

3: Responses and explanations address the task appropriately and are relevant as required for the current purposes of the exchange, but they may either be more elaborated than required or fall short of being fully developed.

2: The responses and explanations are connected to the task, but are either markedly excessive in the information supplied or not very relevant to the current purpose of the exchange. Some relevant information might be missing or inaccurately cast.

1: The responses and explanations are either grossly irrelevant or very limited in content or coherence. In either case they may be only minimally connected to the task.

0: The speaker makes no attempt to respond, or the response is unrelated to the topic. A written response at this level merely copies sentences from the topic, rejects the topic, or is otherwise not connected to the topic. A spoken response is not connected to the direct or implied request for information.
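One way such a skeletal rubric plugs into the CAF is as an evidence rule that turns a rater’s judgment into the value of an observable variable. The sketch below is a hypothetical stub, with level descriptors abbreviated from the rubric above; the observable name and function are invented for illustration.

```python
# Illustrative evidence-rule stub: encode the 0-4 skeletal rubric as data and
# record a rater's judgment as the value of an observable variable.
# Descriptors are abbreviated from the slide; names are hypothetical.
RELEVANCE_RUBRIC = {
    4: "Relevant, appropriately elaborated, coherent; fulfills the task",
    3: "Addresses the task; somewhat over- or under-elaborated",
    2: "Connected to the task but excessive or not very relevant information",
    1: "Largely irrelevant or very limited; minimally connected to the task",
    0: "No attempt, or response unrelated to the request",
}

def score_response(rater_level: int) -> dict:
    """Turn a rater's rubric level into an observable-variable record."""
    if rater_level not in RELEVANCE_RUBRIC:
        raise ValueError(f"Rubric level must be 0-4, got {rater_level}")
    return {"observable": "SatisfiesRelationMaxim",
            "value": rater_level,
            "descriptor": RELEVANCE_RUBRIC[rater_level]}

print(score_response(3))
```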
Notes re Observable Variables

• Re-usable (tailorable) across different tasks & projects
• There can be multiple aspects of performance being rated
• May have a 1-1 relationship with student-model variables, but need not
• That is, multiple aspects of proficiency can be involved in the probability of a high or satisfactory response, or of a certain style of response

[The three basic models diagram, highlighting the statistical model:] What behaviors or performances should reveal those constructs? Values of observable variables are used to update probability distributions for student-model variables via the psychometric model—i.e., test-level scoring.
A NetPass Evidence-Model Fragment for Design

[Bayes net fragment: student-model variables Design (SM) and Networking Disciplinary Knowledge (SM) are linked, through evidence-model variables (DK and Design, Design Context), to the observables Correctness of Outcome (OB) and Quality of Rationale (OB).]

• Measurement models indicate which SMVs, in which combinations, affect which observables.
• Task features influence which ones and how much, in structured measurement models.
• Re-usable conditional-probability fragments and variable names serve different tasks with the same evidentiary structure.
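As a numerical illustration of how an observable updates a student-model variable through a conditional-probability fragment, here is a minimal sketch with one binary SMV and one three-valued observable. The states, categories, and probabilities are invented; an operational evidence model would use calibrated fragments as described above.

```python
# Minimal Bayes-net-style update for one student-model variable (SMV) given one
# observable. The SMV states, observable categories, and all probabilities are
# made up for illustration; a real evidence model uses calibrated fragments.
prior = {"high": 0.5, "low": 0.5}                        # P(Design proficiency)

# P(Quality of Rationale = category | Design proficiency)
likelihood = {
    "high": {"full": 0.6, "partial": 0.3, "poor": 0.1},
    "low":  {"full": 0.1, "partial": 0.4, "poor": 0.5},
}

def update(prior, likelihood, observed_category):
    """Posterior over SMV states after observing one observable value."""
    unnorm = {s: prior[s] * likelihood[s][observed_category] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

posterior = update(prior, likelihood, "full")
print(posterior)   # e.g. {'high': ~0.857, 'low': ~0.143}
```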
[The three basic models diagram, highlighting the Task Model:] What tasks or situations should elicit those behaviors?
Representations to the student, and sources of variation
Task Specification Template: Determining Key Features (Wizards)

• Setting: Corporation, Conference Center, University
• Building Length: Less than 100m, More than 100m
• Ethernet Standard: 10BaseT, 100BaseT
• Subgroup Name: Teacher, Student, Customer
• Bandwidth for a Subgroup Drop: 10Mbps, 100Mbps
• Growth Requirements: Given, NA
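Read concretely, the template’s feature slots and allowed values define a space of task variants that a wizard can enumerate or sample. The sketch below is hypothetical (not PADI code); the feature names and values are taken from the template above.

```python
# Hypothetical sketch of a task-specification "wizard": enumerate task variants
# from the template's feature slots. Feature names/values are from the slide.
from itertools import product

TEMPLATE_FEATURES = {
    "Setting": ["Corporation", "Conference Center", "University"],
    "Building Length": ["Less than 100m", "More than 100m"],
    "Ethernet Standard": ["10BaseT", "100BaseT"],
    "Subgroup Name": ["Teacher", "Student", "Customer"],
    "Bandwidth for a Subgroup Drop": ["10Mbps", "100Mbps"],
    "Growth Requirements": ["Given", "NA"],
}

def task_variants(features):
    """Yield every combination of feature values as a task specification."""
    names = list(features)
    for values in product(*(features[n] for n in names)):
        yield dict(zip(names, values))

variants = list(task_variants(TEMPLATE_FEATURES))
print(len(variants), "possible task specifications")   # 3*2*2*3*2*2 = 144
print(variants[0])
```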
Structured Measurement Models

• Examples of models
  » Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM; Adams, Wilson, & Wang, 1997)
  » Bayes nets (Mislevy, 1996)
  » General Diagnostic Model (von Davier & Yamamoto)
• By relating task characteristics to difficulty with respect to different aspects of proficiency, we can create tasks with known properties.
• Families of tasks can be created around the same evidentiary frameworks; e.g., for “read & write” tasks, we can vary characteristics of texts, directives, audience, and purpose.
Structured Measurement Models (continued)

• Articulated connection between task characteristics and models of proficiency
• Moves beyond “modeling difficulty”
  » Traditional test theory is a bottleneck in a multivariate environment
• Dealing with “complexity factors” and “difficulty factors” (Robinson)
  » Model complexity factors as covariates for difficulty parameters with respect to those aspects of proficiency they impact
  » Model difficulty factors either as SMVs, if they are targets of inference, or as noise, if they are nuisance
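As a toy illustration of modeling complexity factors as covariates for difficulty parameters, the sketch below uses an LLTM-style linear decomposition: a task’s difficulty is a weighted sum of its design features, so tasks generated from a template have predictable measurement properties. The features, weights, and proficiency value are all invented for illustration.

```python
# Toy LLTM-style sketch: task difficulty is a linear function of design
# features (complexity factors), so P(success) for a given proficiency level
# can be projected for any task generated from the template.
# All weights and feature codings are invented for illustration.
import math

FEATURE_WEIGHTS = {          # contribution of each complexity factor to difficulty
    "unfamiliar_topic": 0.8,
    "implicit_request": 0.6,
    "time_pressure":    0.4,
}

def task_difficulty(task_features: dict) -> float:
    """Difficulty b = sum of weights for the complexity factors present."""
    return sum(w for f, w in FEATURE_WEIGHTS.items() if task_features.get(f))

def p_success(theta: float, task_features: dict) -> float:
    """Rasch/LLTM-style probability of a satisfactory response."""
    b = task_difficulty(task_features)
    return 1.0 / (1.0 + math.exp(-(theta - b)))

task = {"unfamiliar_topic": True, "implicit_request": True, "time_pressure": False}
print(round(task_difficulty(task), 2))                       # 1.4
print(round(p_success(theta=1.0, task_features=task), 3))    # ~0.401
```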
Advantages: A framework that…

• Guides task and test construction (wizards)
• Provides high efficiency and scalability
• By relating task characteristics to difficulty, allows creating tasks with targeted properties
• Promotes re-use of conceptual structures (DPs, arguments) in different projects
• Promotes re-use of machinery in different projects
Evidence of effectiveness

• Cisco
  » Certification & training assessment
  » Simulation-based assessment tasks
• IMS/QTI
  » Conceptual model for standards for data structures for computer-based testing
• ETS
  » TOEFL
  » NBPTS
Conclusion

Isn’t this just a bunch of new words for describing what we already do?
An answer (Part 1)
No.
An answer (Part 2)

An explicit, general framework makes similarities and implicit principles explicit:
» To better understand current assessments…
» To design for new kinds of assessment…
  – Tasks that tap multiple aspects of proficiency
  – Technology-based tasks (e.g., simulations)
  – Complex observations, student models, evaluation
» To foster re-use, sharing, & modularity
  – Concepts & arguments
  – Pieces of machinery & processes (QTI)
For more information…

www.education.umd.edu/EDMS/mislevy/ has links to PADI, Cisco, articles, etc. (e.g., the CRESST report on Task-Based Language Assessment).