Prospectus for the PADI design framework in language testing

Robert J. Mislevy, Professor of Measurement & Statistics, University of Maryland
Geneva D. Haertel, Assessment Research Area Director, SRI International

ECOLT 2006, October 13, 2006, Washington, D.C.

PADI is supported by the National Science Foundation under grant REC-0129331. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Some Challenges in Language Testing
» Sorting out evidence about interacting aspects of knowledge and proficiency in complex performances
» Understanding the impact of "complexity factors" and "difficulty factors" on inference
» Scaling up efficiently to high-volume tests—task creation, scoring, delivery
» Creating valid and cost-effective low-volume tests

Evidence-Centered Design
Evidence-centered assessment design (ECD) provides language, concepts, knowledge representations, data structures, and supporting tools to help design and deliver educational assessments, all organized around the evidentiary argument an assessment is meant to embody.

The Assessment Argument
» What kinds of claims do we want to make about students?
» What behaviors or performances can provide us with evidence for those claims?
» What tasks or situations should elicit those behaviors?
Generalizing from Messick (1994).

Evidence-Centered Design
» With Linda Steinberg & Russell Almond at ETS
  – The Portal project / TOEFL
  – NetPASS with Cisco (computer network design & troubleshooting)
» Principled Assessment Design for Inquiry (PADI)
  – Supported by NSF (co-PI: Geneva Haertel, SRI)
  – Focus on science inquiry—e.g., investigations
  – Models, tools, examples

Some allied work
» Cognitive design for generating tasks (Embretson)
» Model-based assessment (Baker)
» Analyses of task characteristics—test and TLU (Bachman & Palmer)
» Test specifications (Davidson & Lynch)
» Constructing measures (Wilson)
» Understanding by design (Wiggins)
» Integrated Test Design, Development, and Delivery (Luecht)

Layers in the assessment enterprise (from Mislevy & Riconscente, in press)
» Domain Analysis: What is important about this domain? What work and situations are central in it? What knowledge representations (KRs) are central to it?
» Domain Modeling: How do we represent key aspects of the domain in terms of an assessment argument?
» Conceptual Assessment Framework (CAF): Design structures (student, evidence, and task models).
» Assessment Implementation: How do we choose and present tasks, and gather and analyze responses?
» Assessment Delivery: How do students and tasks actually interact? How do we report examinee performance?
Key ideas across the layers: explicit relationships, explicit structures, generativity, re-usability, recombinability, interoperability.
Domain Analysis draws on expertise research, task analysis, curriculum, target-use analysis, critical-incident analysis, ethnographic studies, etc.
The same layer diagram, annotated for language assessment:
» In language assessment, note the importance of psycholinguistics, sociolinguistics, and target language use.
» Assessment Implementation and Assessment Delivery are the "tangible stuff": e.g., what gets made and how it operates in the testing situation.
» The motivating question: how do you get from here to here, across the layers?

We will focus today on two "hidden" layers:
» Domain Modeling, which concerns the assessment argument, and
» the Conceptual Assessment Framework, which concerns generative and re-combinable design schemas.
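Before turning to each of those layers, here is a minimal sketch of the claims, evidence, and tasks structure that the assessment argument organizes. It is purely illustrative (not part of the PADI tools), and every class and field name is an invention; the example wording anticipates the Grice design pattern discussed below.

```python
# Illustrative sketch only: the assessment argument as plain data structures.
# None of these classes come from PADI; all names are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class Claim:
    """A statement we want to be able to make about a student."""
    text: str

@dataclass
class Evidence:
    """Behaviors or performances that would provide evidence for the claims."""
    behaviors: List[str]

@dataclass
class TaskSituation:
    """Features of tasks or situations designed to elicit those behaviors."""
    features: List[str]

@dataclass
class AssessmentArgument:
    claims: List[Claim]
    evidence: Evidence
    tasks: List[TaskSituation]

# Example, anticipating the Grice "responding to a request" design pattern.
argument = AssessmentArgument(
    claims=[Claim("Can respond to a request for information in accord with "
                  "Grice's Relation Maxim in the target language.")],
    evidence=Evidence(behaviors=["constructed oral response",
                                 "constructed written or typed-in response"]),
    tasks=[TaskSituation(features=["stimulus presents an explicit or implicit "
                                   "request for relevant information"])],
)
```

Design patterns (next) and the CAF models (after that) can be read as progressively more formal ways of filling in this skeleton.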
More on the Assessment Argument (the Domain Modeling layer)

PADI Design Patterns
» Organized around elements of the assessment argument
» Narrative structures for assessing pervasive kinds of knowledge, skill, and capabilities
» Based on research and experience, e.g.
  – PADI: design under constraint, inquiry cycles, representations
  – Compliance with Grice's maxims; cause/effect reasoning; giving spoken directions
» Suggest design choices that apply across different contexts, levels, purposes, and formats
  – Capture experience in structured form
  – Organized in terms of the assessment argument

A Design Pattern Motivated by Grice's Relation Maxim (attributes and values)

Name: Grice's Relation Maxim—Responding to a Request

Summary: In this design pattern, an examinee demonstrates adherence to Grice's Relation Maxim in a given language by producing or selecting a response in a situation that presents a request for information (e.g., a conversation).

Central claims: In contexts/situations with xxx characteristics, can formulate and respond to representations of implicature from referents: semantic implication, pragmatic implication.

Additional knowledge that may be at issue: Substantive knowledge in the domain; familiarity with cultural models; knowledge of the language.

Characteristic features: The stimulus situation needs to present a request for relevant information to the examinee, either explicitly or implicitly.

Variable task features:
» Production or choice as the response?
» If production, is oral or written production required?
» If oral, a single response to a preconfigured situation, or part of an evolving conversation?
» If an evolving conversation, an open or structured interview?
» Formality of prepackaged products (multiple choice, videotaped conversations, written questions or conversations, one-to-one or multi-party conversations prepared by interviewers)
» Formality of information and task (concrete or abstract, immediate or remote, information requiring retrieval or transformation, familiar or unfamiliar setting and topic, written or spoken)
» If a prepackaged speech stimulus: length, content, difficulty of language, explicitness of the request, degree of cultural dependence
» Content of the situation (familiar or unfamiliar, degree of difficulty)
» Time pressure (e.g., time for planning and response)
» Opportunity to control the conversation

Potential performances and work products:
» Constructed oral response
» Constructed written or typed-in response
» Answer to a multiple-choice question whose alternatives vary

Potential features of performance to evaluate:
» Whether the student can formulate representations of implicature as required in the given situation
» Whether the student can make a conversational contribution, or express an idea, in the accepted direction of the exchange
» Whether the student provides the relevant information as required
» Whether the quality of a choice among alternatives offered for a production in the given situation satisfies the Relation Maxim

Potential rubrics: (see the skeletal rubric below)

Examples: (in the paper)
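Casting design patterns as attribute–value structures makes them storable, searchable, and reusable across projects. As a rough sketch (the class below is hypothetical, not the actual PADI data model), the pattern above might be represented like this:

```python
# Hypothetical rendering of a design pattern as a structured object.
# Field names echo the attribute list above; the class is illustrative,
# not the PADI schema.
from dataclasses import dataclass
from typing import List

@dataclass
class DesignPattern:
    name: str
    summary: str
    central_claims: List[str]
    additional_knowledge: List[str]
    characteristic_features: List[str]
    variable_task_features: List[str]
    potential_work_products: List[str]
    potential_observations: List[str]

grice_relation = DesignPattern(
    name="Grice's Relation Maxim—Responding to a Request",
    summary="Examinee produces or selects a response to a request for information.",
    central_claims=["Can formulate and respond to representations of implicature"],
    additional_knowledge=["Substantive knowledge in the domain",
                          "Familiarity with cultural models",
                          "Knowledge of the language"],
    characteristic_features=["Stimulus presents an explicit or implicit request "
                             "for relevant information"],
    variable_task_features=["Production vs. choice", "Oral vs. written",
                            "Evolving vs. preconfigured situation",
                            "Time pressure"],
    potential_work_products=["Constructed oral response",
                             "Constructed written or typed-in response",
                             "Multiple-choice answer"],
    potential_observations=["Relevance of the information provided",
                            "Appropriateness of elaboration"],
)
```

Because these attributes mirror the elements of the assessment argument, filling in a pattern is a step toward specifying the student, evidence, and task models of the Conceptual Assessment Framework, taken up below.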
Some Relationships between Design Patterns and Other Test-Development Tools
» Conceptual models of proficiency and task-characteristic frameworks
  – Grist for design choices about KSAs and task features
  – Design patterns present an integrated design space
» Test specifications
  – Design patterns for generating the argument and the design choices
  – Test specs for documenting and specifying those choices

More on the Conceptual Assessment Framework

Evidence-centered assessment design: the three basic models
» Student model
» Evidence model(s), comprising evidence rules and a statistical model
» Task model(s)
These are technical specifications that embody the elements suggested in the design pattern.

The CAF can be expressed in several representations: the conceptual representation above, a user-interface representation (a screen shot of the design system), and a high-level UML representation of the PADI object model (sharable data structures, "behind the screen").

The student model answers: What complex of knowledge, skills, or other attributes should be assessed?

The NetPass Student Model
» A multidimensional measurement model with selected aspects of proficiency: Network Proficiency, Networking Disciplinary Knowledge, Network Modeling, Design, Implement/Configure, and Troubleshoot (all student-model variables).
» The same student model can be used with different tasks.
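To show how the three models might hang together as sharable data structures, which is the point of the object-model representation just mentioned, here is a rough sketch. The classes, and the specific work products named, are hypothetical and only loosely patterned on the NetPass example; they are not the actual PADI object model. The evidence-model pieces anticipate the fragment shown below.

```python
# Rough, hypothetical sketch of the three CAF models as data structures.
# Not the PADI object model; work-product names are invented for illustration.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StudentModel:
    variables: List[str]                 # student-model variables (SMVs)

@dataclass
class EvidenceModel:
    observables: List[str]               # observable variables (OBs)
    evidence_rules: Dict[str, str]       # work-product feature -> observable
    stat_model: Dict[str, List[str]]     # observable -> SMVs it depends on

@dataclass
class TaskModel:
    features: Dict[str, List[str]]       # task feature -> allowed values

student = StudentModel(variables=[
    "Network Proficiency", "Networking Disciplinary Knowledge",
    "Network Modeling", "Design", "Implement/Configure", "Troubleshoot"])

evidence = EvidenceModel(
    observables=["Correctness of Outcome", "Quality of Rationale"],
    # Hypothetical work products; the mapping is an illustrative guess.
    evidence_rules={"final network diagram": "Correctness of Outcome",
                    "written design rationale": "Quality of Rationale"},
    # Illustrative dependency structure, not the actual NetPass fragment.
    stat_model={"Correctness of Outcome":
                    ["Design", "Networking Disciplinary Knowledge"],
                "Quality of Rationale": ["Design"]})

task = TaskModel(features={
    "Setting": ["Corporation", "Conference Center", "University Building"],
    "Ethernet Standard": ["10BaseT", "100BaseT"]})
```

A task template, such as the Task Specification Template shown below, would then fill in concrete values for the task-model features.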
The evidence model answers: What behaviors or performances should reveal those constructs?
» Evidence rules take the unique student work product to evaluations of observable variables—i.e., task-level "scoring."

Skeletal Rubric for Satisfaction of Quality Maxims
4: Responses and explanations are relevant as required for the current purposes of the exchange, and are neither more elaborated than appropriate nor insufficient for the context. They fulfill the demands of the task with at most minor lapses in completeness. They are appropriate for the task and exhibit coherent discourse.
3: Responses and explanations address the task appropriately and are relevant as required for the current purposes of the exchange, but they may either be more elaborated than required or fall short of being fully developed.
2: The responses and explanations are connected to the task, but are either markedly excessive in the information supplied or not very relevant to the current purpose of the exchange. Some relevant information may be missing or inaccurately cast.
1: The responses and explanations are either grossly irrelevant or very limited in content or coherence. In either case they may be only minimally connected to the task.
0: The speaker makes no attempt to respond, or the response is unrelated to the topic. A written response at this level merely copies sentences from the topic, rejects the topic, or is otherwise not connected to the topic. A spoken response is not connected to the direct or implied request for information.

Notes re Observable Variables
» Re-usable (and tailorable) across different tasks and projects.
» There can be multiple aspects of performance being rated.
» They may stand in a one-to-one relationship with student-model variables, but need not: multiple aspects of proficiency can be involved in the probability of a high, satisfactory, or certain style of response.

The statistical model: values of observable variables are used to update probability distributions for student-model variables via the psychometric model—i.e., test-level scoring.

A NetPass Evidence Model Fragment for Design
» Measurement models indicate which student-model variables, in which combinations, affect which observables.
» Task features influence which ones, and how much, in structured measurement models.
» The result is re-usable conditional-probability fragments and variable names for different tasks with the same evidentiary structure.
» In this fragment, the student-model variables Design (SM) and Networking Disciplinary Knowledge (SM) bear on the observables Correctness of Outcome (OB) and Quality of Rationale (OB), through evidence-model nodes such as "DK and Design" and "Design Context."

The task model answers: What tasks or situations should elicit those behaviors?
» Task models specify the representations presented to the student, and the sources of variation among tasks.

Task Specification Template: Determining Key Features (Wizards)
» Setting: Corporation | Conference Center | University Building
» Length: Less than 100 m | More than 100 m
» Ethernet Standard: 10BaseT | 100BaseT
» Subgroup Name: Teacher | Student | Customer
» Bandwidth for a Subgroup Drop: 10 Mbps | 100 Mbps
» Growth Requirements: Given | N/A

Structured Measurement Models
» Examples of such models:
  – Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM; Adams, Wilson, & Wang, 1997)
  – Bayes nets (Mislevy, 1996)
  – General Diagnostic Model (von Davier & Yamamoto)
» By relating task characteristics to difficulty with respect to different aspects of proficiency, we can create tasks with known properties.
» We can create families of tasks around the same evidentiary framework; e.g., for "read & write" tasks, we can vary characteristics of the texts, directives, audience, and purpose.
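The idea can be illustrated with a deliberately simplified, unidimensional sketch, in the spirit of the structured models named above: task-feature covariates predict an item's difficulty, so task variants generated from a template arrive with approximately known measurement properties. The feature names and weights below are invented for illustration; the models listed above generalize this to several aspects of proficiency at once.

```python
# Simplified, unidimensional illustration of a structured measurement model:
# item difficulty is a weighted sum of task-feature covariates, so tasks
# assembled from a template have approximately known difficulty.
# Feature names and weights are invented for illustration only.
import math

FEATURE_WEIGHTS = {            # hypothetical effect of each feature, in logits
    "implicit_request": 0.8,   # the request for information is only implied
    "unfamiliar_topic": 0.6,
    "time_pressure": 0.4,
    "evolving_conversation": 0.7,
}

def predicted_difficulty(task_features: dict) -> float:
    """Difficulty as the sum of weights for the features this task variant has."""
    return sum(w for f, w in FEATURE_WEIGHTS.items() if task_features.get(f))

def prob_satisfactory(theta: float, difficulty: float) -> float:
    """Rasch-type probability of a satisfactory response at proficiency theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

variant = {"implicit_request": True, "time_pressure": True}
b = predicted_difficulty(variant)                            # 1.2 logits
print(round(prob_satisfactory(theta=1.0, difficulty=b), 2))  # 0.45
```

The "complexity factors" and "difficulty factors" discussed next would enter such a model either as additional covariates or as student-model variables, depending on whether they are targets of inference.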
Structured Measurement Models (continued)
» An articulated connection between task characteristics and models of proficiency.
» Moves beyond "modeling difficulty"
  – Traditional test theory is a bottleneck in a multivariate environment.
» Dealing with "complexity factors" and "difficulty factors" (Robinson)
  – Model complexity factors as covariates for the difficulty parameters of the aspects of proficiency they impact.
  – Model difficulty factors either as student-model variables, if they are targets of inference, or as noise, if they are nuisances.

Advantages: A framework that…
» Guides task and test construction (Wizards)
» Provides high efficiency and scalability
» By relating task characteristics to difficulty, allows creating tasks with targeted properties
» Promotes re-use of conceptual structures (design patterns, arguments) in different projects
» Promotes re-use of machinery in different projects

Evidence of effectiveness
» Cisco
  – Certification & training assessment
  – Simulation-based assessment tasks
» IMS/QTI
  – Conceptual model for standards for data structures for computer-based testing
» ETS
  – TOEFL
  – NBPTS

Conclusion
Isn't this just a bunch of new words for describing what we already do?

An answer (Part 1)
No.

An answer (Part 2)
An explicit, general framework makes similarities and implicit principles explicit:
» To better understand current assessments…
» To design for new kinds of assessment…
  – Tasks that tap multiple aspects of proficiency
  – Technology-based tasks (e.g., simulations)
  – Complex observations, student models, and evaluation
» To foster re-use, sharing, and modularity
  – Concepts & arguments
  – Pieces of machinery & processes (QTI)

For more information…
www.education.umd.edu/EDMS/mislevy/
Has links to PADI, Cisco, articles, etc. (e.g., the CRESST report on Task-Based Language Assessment).