Slide 1: The Use of a Co-Design Process in ECD to Support the Development of Large-Scale Assessments
Terry Vendlinski and Geneva Haertel, SRI International
CCSSO's National Conference on Student Assessment, Minneapolis, MN, June 29, 2012

Slide 2: Acknowledgements
Research findings and assessment tasks described in this presentation were supported by the following projects: Principled Assessment Design in Inquiry [National Science Foundation, REC-0089122 and REC-0129331]; An Application of Evidence-Centered Design to a State's Large-Scale Science Assessment [National Science Foundation, DRK-12 initiative, DRL-0733172]; Principled Science Assessment Designs for Students with Disabilities [Institute of Education Sciences, US Department of Education, R324A070035]; Applying Evidence-Centered Design to Alternate Assessments in Mathematics for Students with Significant Cognitive Disabilities [US Department of Education, contract to the State of Utah, 09679]; Alternate Assessment Design – Reading (AAD-R): Evidence-Centered Design for Alternate Assessment [US Department of Education, contract to the State of Idaho, S368A090032]. In addition, SRI International provided Strategic Business Thrust (SBT) funds. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Slide 3: Co-Design Process
- Geneva Haertel – SRI International
- Robert Mislevy – ETS
- Britte Cheng – SRI International
- Angela DeBarger – SRI International
- Daisy Rutstein – SRI International
- Terry Vendlinski – SRI International

Slide 4: Evidence-Centered Assessment Design
- Mislevy, Steinberg, & Almond at ETS in the late 1990s: Cisco / ETS / University of Maryland
- Principled Assessment Design in Inquiry (PADI) project: SRI, University of Maryland, UC Berkeley, FOSS, BioKIDS; National Science Foundation
- ECD for Large-Scale State Assessments: SRI, Pearson, University of Maryland, Haney Research & Evaluation, GED assessment developers; National Science Foundation

Slide 5: Evidence-Centered Assessment Design
A formal, multiple-layered framework built from Messick's (1994) guiding questions:
- What complex of knowledge, skills, or other attributes should be assessed?
- What behaviors or performances should reveal those constructs?
- What tasks or situations should elicit those behaviors?

Slide 6: What is an ECD approach?
- A process by which evidence is gathered.
- Uses the framework to document information that supports the validity argument.
- Documents what decisions have been made with regard to the assessment and the justification for those decisions.

Slide 7: Co-Design in the Context of ECD
- What is co-design?
- What sorts of expertise are required?
- What are the processes that might occur?

Slide 8: ECD Layers (from Mislevy & Riconscente, 2006)
- Domain Analysis: What is important about this domain? What work and situations are central in this domain? What knowledge representations (KRs) are central to this domain?
- Domain Modeling: How do we represent key aspects of the domain in terms of an assessment argument? Conceptualization.
- Conceptual Assessment Framework: Design structures: student, evidence, and task models. Generativity.
- Assessment Implementation: Manufacturing "nuts & bolts": authoring tasks, automated scoring details, statistical models. Reusability.
- Assessment Delivery: Students interact with tasks, performances are evaluated, feedback is created. Four-process delivery architecture.

Slide 9: Domain Analysis
- Gather substantive information about the domain of interest that has implications for assessment: how knowledge is constructed, acquired, used, and communicated.
- Domain concepts, terminology, tools, knowledge representations, research findings, situations of use (heads-up display), patterns of interaction.
- Representational forms and symbol systems used in the domain (e.g., algebraic notation, Punnett squares, maps, computer program interfaces, content standards, concept maps).
- Could take days or weeks (in two-hour blocks).

Slide 10: Domain Modeling
- Express the assessment argument in narrative form based on information from Domain Analysis.
- Specifications of the knowledge, skills, or other attributes to be assessed; features of situations that can evoke evidence; kinds of performances that convey evidence.
- Design patterns, "big ideas," Toulmin and Wigmore diagrams for assessment arguments, assessment blueprints, ontologies, generic rubrics.
- Could take from an hour to a day (in one- to two-hour blocks).

Slide 11: Design Pattern
[Figure: example of a design pattern]

Slide 12: Design Pattern Attributes
- Focal Knowledge, Skills & Attributes (KSAs): The primary KSAs targeted by the design pattern; what we want to make inferences about.
- Additional KSAs: Other KSAs that may be required for successful performance on the assessment tasks.
- Potential Observations: Features of the things students say, do, or make.
- Potential Work Products: Some possible things one could see students doing that would give evidence about the KSAs.

Slide 13: Design Pattern Attributes (continued)
- Characteristic Features: Aspects of assessment situations that are likely to evoke the desired evidence.
- Variable Features: Aspects of assessment situations that can be varied in order to shift difficulty or emphasis.
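The two slides above list the attributes that make up a design pattern. As an illustration only, here is a minimal sketch of how those attributes could be recorded as a simple data structure; the Python field names and example values are hypothetical and are not drawn from any actual PADI design pattern.

from dataclasses import dataclass
from typing import List

@dataclass
class DesignPattern:
    """A design pattern recorded with the attributes listed on the two preceding slides."""
    title: str
    focal_ksas: List[str]               # primary KSAs we want to make inferences about
    additional_ksas: List[str]          # other KSAs that may be needed for successful performance
    potential_observations: List[str]   # features of the things students say, do, or make
    potential_work_products: List[str]  # things students could produce that give evidence about the KSAs
    characteristic_features: List[str]  # aspects of situations likely to evoke the desired evidence
    variable_features: List[str]        # aspects that can be varied to shift difficulty or emphasis

# Hypothetical example, for illustration only.
controlled_experiment = DesignPattern(
    title="Designing a controlled experiment",
    focal_ksas=["Ability to identify and control variables"],
    additional_ksas=["Reading comprehension", "Familiarity with data tables"],
    potential_observations=["Whether only one variable is changed across trials"],
    potential_work_products=["A written experimental plan", "A completed data table"],
    characteristic_features=["A scenario with several manipulable variables"],
    variable_features=["Number of variables", "Amount of scaffolding provided"],
)

The sketch only shows how the attributes fit together; it is not how PADI itself stores design patterns.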
Slide 14: Conceptual Assessment Framework
- Express the assessment argument in structures and specifications for tasks and tests, evaluation procedures, and measurement models.
- Student, evidence, and task models; student, observable, and task variables; rubrics; measurement models; test assembly specifications; task templates and task specifications.
- Algebraic and graphical representations of measurement models; task templates and task specifications; item generation models; generic rubrics; algorithms for automated scoring.
- Can take from days to weeks.

Slide 15: Visual CAF
[Diagram: Task Model(s) linked to Evidence Model(s) (statistical model and evidence rules) linked to the Student Model]

Slide 16: Task Model Template
[Figure: example of a task model template]

Slide 17: Assessment Implementation
- Implement the assessment, including presentation-ready tasks and calibrated measurement models.
- Item writing and task materials (including all materials, tools, and affordances); pilot-test data to hone evaluation procedures and fit measurement models.
- Coded algorithms for rendering tasks, interacting with examinees, and evaluating work products; tasks as displayed; IMS/QTI/APIP representations of materials; ASCII files of item parameters.
- Time required varies according to the number and complexity of items and tasks.

Slide 18: Assessment Delivery
- Coordinate the interactions of students and tasks: task- and test-level scoring; reporting.
- Tasks as presented; work products as created; scores as evaluated.
- Renderings of materials; numerical and graphical summaries for individuals and groups; specifications for results files.

Slide 19: Why Co-Design?
- Co-design can improve the assessment at any or all of the ECD layers; not all layers are required.
- Co-design may be most powerful at the top three layers.
- Can be complex … so it requires structure.
- May take more time … and produce better products.

Slide 20: More Information
Visit us: padi.sri.com
Email us: Geneva.Haertel@sri.com, Terry.Vendlinski@sri.com