1st Annual Data Summit: Comprehensive Assessment Systems & Formative Assessment
David Abrams
Sullivan County BOCES
8/20/2013

Introduction
● Validity is a process, not a product.
● Design assessments to function as a lever for good instructional practices.
● Consciously design assessments so that the data produced can be used to: inform instruction; provide evidence of student achievement and/or growth; and facilitate the transition to the Common Core College & Career Ready Standards.
● Document the process.

Assessment System Design: Key Questions/Bright Lines
1. What do I want to learn from this assessment?
2. Who will use the information gathered from this assessment?
3. What action steps will be taken as a result of this assessment?
4. What professional development or support structures should be in place to ensure the action steps are taken appropriately?
5. How will student learning improve as a result of using this assessment and will it improve more than if the assessment were not used?
(Perie, Marion, & Gong, 2009)

Comprehensive Assessment System: Components
Summative: given one time at the end of the semester or school year to evaluate students’ performance against a defined set of content standards. Can be used for accountability or to inform policy and/or can be teacher administered for grading purposes (Perie, Marion, & Gong, 2009).

Comprehensive Assessment System: Components
Interim: Assessments administered during instruction to evaluate students’ knowledge and skills relative to a specific set of academic goals in order to inform policymaker or educator decisions at the classroom, school, or district level. The specific interim assessment designs are driven by the purposes and intended uses, but the results of any interim assessment must be reported in a manner allowing aggregation across students, occasions, or concepts (Perie, Marion, & Gong, 2009).

Comprehensive Assessment System: Components
Interim (cont’d): Key components of interim assessments are: 1) they evaluate students’ knowledge and skills relative to a specific set of academic goals; and 2) they are designed to inform decisions at both the classroom and beyond the classroom level (Perie, Marion, & Gong, 2009).

Comprehensive Assessment System: Components
Formative: used by teachers to diagnose where students are in their learning, where gaps in knowledge and understanding exist, and to help teachers and students improve student learning. The assessment is embedded within the learning activity and linked directly to the current unit of instruction.

Comprehensive Assessment System: Components
Formative assessments are used most frequently and have the smallest scope and shortest cycle, while summative assessments are administered least frequently and have the largest scope and cycle. Interim assessments fall between the two (Perie, Marion, & Gong, 2009).

Formative Assessment: Table Discussion-Information Processing
How is Formative Assessment being used in your District, Schools, &/or Classroom?
What constitutes a “quality” formative assessment, and how do you know?
What are your goals for implementing Formative Assessment in your school?
Does your district have Professional Learning Communities/Whole Faculty Study Groups, vertical & horizontal, in place?
Are you comfortable using data to inform instruction?

Formative Assessment: Perie, Marion, Gong
● Formative assessment is used by classroom teachers to diagnose where students are in their learning, where gaps in knowledge and understanding exist, and how to help teachers and students improve student learning.
● The assessment is embedded within the learning activity and linked directly to the current unit of instruction.
● The tasks presented may vary from one student to another depending on the teacher’s judgment.

Formative Assessment: Perie, Marion, Gong
● Providing corrective feedback, modifying instruction to improve the student’s understanding, or indicating areas of further instruction are essential aspects of a classroom formative assessment.

Formative Assessment
● …true meaning of formative assessment: an activity designed to give meaningful feedback to students and teachers and to improve professional practice and student achievement (Reeves, 2009).
● Three things must occur for an assessment to be formative: the assessment is used to identify students who are experiencing difficulty; those students are provided additional time and support to acquire the intended skill or concept; and the students are given another opportunity to demonstrate what they have learned (DuFour, Eaker, & Karhanek, 2010).

Formative Assessment
● Formative assessment is a systematic process to continuously gather evidence and provide feedback about learning while instruction is underway.
● The feedback identifies the gap between a student’s current level of learning and a desired learning goal.
● Teachers elicit evidence about student learning using a variety of methods and strategies, e.g. observation, questioning, dialogue, demonstration, and written response.
(Heritage et al., 2009)

Formative Assessment
● Teachers must examine the evidence from the perspective of what it shows about student conceptions, misconceptions, skills, and knowledge.
● They need to infer the gap between students’ current learning and desired instructional goals, identifying students’ emerging understanding or skills so they can modify instruction.
● For assessment to be formative, action must be taken to close the gap based on evidence solicited.
(Heritage et al., 2009)

Formative Assessment
What We Know:
● It is not a kind of test.
● Formative Assessment practice, when implemented effectively, can have powerful effects on learning.
● Formative Assessment involves teachers making adjustments to their instruction based on evidence collected, and providing students with feedback that advances learning.
● Students participate in the practice of formative assessment through self- and peer-assessment.
(Heritage, 2011)

Formative Assessment: Teacher’s Role
● Effective when teachers are clear about the intended learning goals for a lesson: this means focusing on what students will learn, as opposed to what they will do.
● Teachers need to share the learning goal with students.
● Teachers need to communicate the indicators of progress toward the learning goal.
● There is no single way to collect formative evidence because formative assessment is not a specific kind of test.
(Heritage, 2011)

Formative Assessment: Student’s Role
● The student’s role begins when they have a clear conception of the learning target.
● In self-assessment, students engage in metacognitive activity, which involves students in thinking about their own learning while they are learning.
● They generate feedback that allows them to make adjustments to their learning strategies.

Formative Assessment: Student’s Role
● It is important to include peer-assessment, where students give feedback to their classmates.
● Students use the feedback; it is important that students both reflect on their learning and use the feedback to advance learning.
(Heritage, 2011)

Formative Assessment: Evidence Collection
● Whatever methods teachers use to elicit evidence of learning, they should yield information that is actionable by them and their students.
● Evidence collection is a systematic process and needs to be planned so that teachers have a constant stream of information tied to indicators of progress.
(Heritage, 2011)

Formative Assessment: Feedback
● Feedback obtained from planned or spontaneous evidence is an essential resource for teachers to shape new learning through adjustments in their instruction.
● Feedback that the teacher provides to students is also an essential resource so that students can take active steps to advance their own learning.
(Heritage, 2011)

Common Formative Assessment
● Common assessment refers to those assessments given by teacher teams who teach the same content or grade level; those with “collective responsibility for the learning of a group of students who are expected to acquire the same knowledge and skills.”
● No teacher can opt out of the process: common assessments use the same instrument or a common process utilizing the same criteria for determining the quality of student work.
(DuFour et al., 2010)

Common Formative Assessment: Benefits
● Promote efficiency for teachers
● Promote equity for students
● Provide an effective strategy for determining whether the guaranteed curriculum is being taught and learned
● Inform practice of individual teachers
● Build a team’s capacity to improve its program
● Facilitate a systematic, collective response to students who are experiencing difficulty
● Tool for changing adult behavior and practice
(Bailey & Jakicic, 2012)

Formative Assessment: Table Discussion-Information Processing
Do these views of Formative Assessment converge with yours? Why/Why Not?
What strikes you as most important given the key points regarding Formative Assessment?
How do you see implementing assessment strategies that effectively utilize the crucial aspects of Formative Assessment?
Are you comfortable designing Common Formative Assessments?

Formative Assessment: Evidence to Action-G-Study
● Heritage et al. determined that there is little research to evaluate teachers’ ability to adapt instruction based on assessment of student knowledge and understanding.
● Research has shown, using math, that moving from evidence to action may not always be the seamless process formative assessment demands.
● G-study results provide data showing teachers do better at drawing inferences of student levels of understanding from assessment evidence, while having difficulties in deciding next instructional steps.

Formative Assessment: Evidence to Action-G-Study
● Heritage et al. conducted a G-Study using 3 mathematical concepts as the instructional learning goal.
● The teachers’ pedagogical knowledge in mathematics was the object of measurement. The study was designed to provide information about potential sources of variation in measuring teachers’ pedagogical knowledge in mathematics.
● The study design implies that there were three sources of score variability: rater, mathematics principle, and type of task.

Formative Assessment: Evidence to Action-G-Study
● Rater: The study used performance tasks; one source of variance was the possibility of score variation between raters due to interpretation and application of the rubric and how stringent/lenient a rater may be.
● Principle: different types of domain-specific principles may cause variance due to a given teacher’s preparation; a teacher may have more knowledge about one principle than others. (Study evaluated 3 principles: distributive property, solving equations, & rational number equivalence.)

Formative Assessment: Evidence to Action-G-Study
Task: potential source of variability. Study focused on 3 types of tasks: identifying key principles; evaluating student understanding; and planning the next instructional step based on the evaluation of student understanding.

Formative Assessment: Evidence to Action-G-Study Results
● Main effects of principle and rater are minimal. Teachers knew the concept and knew how to evaluate the student work to determine learning.
● Important finding: regardless of the math principle, determining next instructional steps based on the examination of student responses tends to be more difficult for teachers.
● If teachers are not clear about what the next steps are to move learning forward, then the promise of Formative Assessment to improve student learning is impacted negatively.

Formative Assessment: Evidence to Action-G-Study Results & Teacher Support
● Teachers need clear conceptions of how learning progresses in a domain; they need to know what the precursor skills and understandings are for a specific instructional goal, what a good performance of the desired goal looks like, and how the skill increases in sophistication from the current level students have reached.
● Learning progressions describe how concepts and skills increase in sophistication in a domain from the most basic to the highest level, showing the trajectory of learning along which students are expected to progress.

Formative Assessment: Evidence to Action-G-Study Results & Teacher Support
● From a learning progression, teachers can access the big picture of what students need to learn; they can grasp what the key building blocks of the domain are, while having sufficient detail for planning instruction to meet short-term goals.
● Teachers are able to connect Formative Assessment opportunities to short-term goals as a means to track student learning.
● Learning progressions alone will not be sufficient.

Formative Assessment: Evidence to Action-G-Study Results & Teacher Support
● Teachers need to know what a good performance of their specific short-term learning goal looks like. They must also know what good performance does not look like.
● Key finding: using assessment information to plan subsequent instruction tends to be the most difficult task for teachers as compared to other tasks. This finding gives rise to the question: can teachers always use formative evidence to effectively “form” action?

Formative Assessment: Instructional Support
Marzano, Pickering, and Pollock (2001): 9 highly effective, research-based strategies
1. Identifying similarities and differences
2. Summarizing and note-taking
3. Reinforcing effort and providing recognition
4. Homework and practice
5. Nonlinguistic representations
6. Cooperative learning
7. Setting objectives & providing feedback
8. Generating & testing hypotheses
9. Cues, questions, and advance organizers

Formative Assessment: Table Discussion-Information Processing
How can you use Professional Learning Communities to support teachers and students when implementing Formative Assessment?
Based on Heritage et al.’s findings, what type of professional development will you need to support Formative Assessment in your District?

Formative Assessment: Validity Framework
Review Nichols, Meyers, & Burling (2009) Validity Framework for a Formative System:
● Figure 1: General Framework for Evaluating Validity
● Figure 2: Framework using the identification of specific procedural errors
● Figure 3: Framework for a system of reteaching
Validity: The degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed use of a test (JS Glossary).

Formative Assessment: Design
● Need to unwrap standards or evaluate State Testing Data to determine demonstrated areas of instructional need
● Define & align to State and District expectations
Review Bailey Figure 4.1: 5 Steps
1. Focus on key words
2. Map it out
3. Analyze the target
4. Determine big ideas
5. Establish guiding questions for instruction

Formative Assessment: Design
● Assessments must provide information about important learning targets that are clear to students and teacher teams.
● Assessments provide timely information for both students and teacher teams.
● Assessments must provide information that tells students and teacher teams what to do next.

Formative Assessment: Design
● Determine assessment types: selected response; constructed response; performance task
● Determine number and balance of items
● Select/design assessment
● Administer, evaluate responses, and redesign instructional strategies

Formative Assessment: Design
● Assess again (in the original assessment design, create enough items to have more than one assessment)
● Evaluate depth and breadth of rigor: Webb’s Depth of Knowledge/Bloom’s Taxonomy
● Utilize PLC to support process and data discussions

Cognitive Response Demands: Reading Load
Reading Load: The amount and complexity of the textual and visual information provided with an item that an examinee must process and understand in order to respond successfully to an item (Ferrara, 2011).

Cognitive Response Demands: Reading Load
Low Reading Load: May include a small amount of text.
Moderate Reading Load: Lower amounts of text and visuals and less complex text.
High Reading Load: May include a large amount of text, much of which is complex linguistically or complex because of the content area concepts and terminology used (Ferrara, 2011).

Cognitive Response Demands: Mathematical Complexity Ranges for NAEP
Low Complexity: Items may require examinees to recall a property or recognize a concept; these are straightforward, single-operation items.
Moderate Complexity: May require examinees to make connections between multiple concepts, multiple operations, and to display flexibility in thinking as they decide how to approach a problem.
High Complexity: May require examinees to analyze assumptions made in a mathematical model or to use reasoning, planning, judgment, and creative thought; it assumes that students are familiar with the mathematics concepts and skills required by an item (Ferrara, 2011).

Common Core Transition: Academic Language
Academic language is used to refer to the form of language expected in contexts such as the exposition of topics in the school curriculum, making arguments, defending propositions, and synthesizing information (Snow, 2010). (See Coxhead’s High-Incidence Academic Word List)

Bailey & Jakicic Sample Assessment Plan
Cognitive-level columns: Knowledge, Application, Analysis, Evaluation
● Learning Target: Identify similes & metaphors from text. Items: Five Matching
● Learning Target: Explain meaning of common similes & metaphors. Items: Four Multiple-Choice
● Learning Target: Develop a narrative paragraph w/both. Items: One Constructed Response

Revised Item Map for Locally Developed Assessments-Formative Assessments
Sample Item Map Template: Prior to Assessment Administration
Quest # | Type (MC, CRQ, ERQ) | Point(s) | Learning Standard | PI | CC Literacy Standard | Reading or Math Load | Lexile or Other Reading Formula | AL* | Webb DOK
Sample Item Map Template: Post Assessment Administration
Quest # | Type (MC, CRQ, ERQ) | Point(s) | Learning Std | PI | CC Literacy Std | Reading or Math Load | Lexile or Other Reading Formula | AL | Webb DOK | Item Data
*AL: Academic Language

Multiple Measures-Joint Standards
13.7: In educational settings, a decision or characterization that will have major impact on a student should not be made on a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision.

Multiple Measures
Multiple measures are intended to improve the quality of high-stakes decision making so decisions are not based on one single measure. The definition of multiple measures, the criteria to evaluate each measure, and how these measures should be combined for use in decision making are not clear.
(Henderson-Montero, Julian, & Yen, 2003)

Multiple Measures: Examples
● Test more than one content area
● Assess a content area using a combination of MC and CRQ formats
● Assess a content area using an on-demand test and a class-based portfolio (writing)
● Assess school performance using a combination of academic tests and other indicators
● Make progressively “higher stakes” decisions about schools using a combination of accountability scores and other reviews
● Use for promotion/graduation processes by meeting certain criteria, even if the student does not pass a State’s on-demand test
(Gong & Hill, 2001)

Multiple Measures: Examples
● Have several assessment instruments that can be used by students of various proficiency or presentation/response needs
● Allow students multiple opportunities to retake the test to determine whether they meet minimum cut scores
● Allow for promotion/graduation
● Double score every constructed response item on tests used for high school student accountability
● Assess school performance using an average of at least two years of data
● Assess school performance using as many grades of students as practical
(Gong & Hill, 2001)

Multiple Measures
● Need to create a framework for combining multiple measures.
Four Categories of Rules:
1. Conjunctive
2. Compensatory
3. Mixed conjunctive-compensatory
4. Confirmatory
(Henderson-Montero, Julian, & Yen, 2003; Chester, 2003)

Multiple Measures
● Conjunctive: attainment of a minimum standard (e.g. meeting a designated cut score on a specific exam).
● Compensatory: weaker performance on one measure can be offset by stronger performance on another. Performance on two or more measures is required before they are combined into a compensatory rule.
● Mixed conjunctive-compensatory: uses a combination of conjunctive and compensatory approaches; e.g. a minimal performance level is required across measures, but beyond that minimal level of performance, poorer performance on one measure can be counterbalanced by better performance on other measures.

Multiple Measures
● Confirmatory: employs information from one measure to validate or compare information from another, independent measure (e.g. statewide reading performance compared to NAEP). Generally applies a conjunctive rule where minimum performance on the independent measure is required (e.g. Regents alternatives, i.e. AP score of Level 3).
(Henderson-Montero, Julian, & Yen, 2003; Chester, 2003)

Multiple Measures: Example
Philadelphia’s approach to combining multiple measures to reach elementary promotion decisions.
Columns: Measures of Different Constructs | Different Measures of the Same Constructs | Multiple Opportunities | Accommodations & Alternate Assessments
● Conjunctive (AND): Test scores in reading & math; satisfactory teacher marks on ALL subject areas; successful completion of multidisciplinary & service learning projects. Satisfactory teacher marks AND test scores.
● Compensatory (+/-): Multiple-choice and open-ended sections of SAT-9.
● Complementary (OR): SAT-9 OR citywide test. Summer school; retest in citywide test. SAT-9 OR Spanish-language Aprenda; citywide test in English OR Spanish.
(Chester, 2003)

Multiple Measures: Challenges
● Technical issues in combining specific assessments and indicators: use of common scale and/or index (e.g. combining test scores from NRT & CRT; state and local (NRT; CRT; other); combining test scores and other indicators; combining nonstandard and/or different combinations of measures).
● Sufficient data for characterizing school(s).
● “Valid” interpretations and uses of accountability for effective and fair school improvement & services to individual students (disaggregation of data) (Gong & Hill, 2003).
● Use of multiple measures does not necessarily improve the reliability and validity of the decisions that are made; it is the logic by which the measures are combined that determines the accuracy and appropriateness of the high-stakes decisions that are made (Chester, 2003).

Appendix Materials

Cognitive Response Item Coding: Bloom’s Taxonomy
Levels | Description
Level 1: Knowledge | Recall & Recognition
Level 2: Comprehension | Translate, Interpret, & Extrapolate
Level 3: Application | Use of generalizations in specific instances
Level 4: Analysis | Determine Relationships
Level 5: Synthesis | Create new relationships
Level 6: Evaluation | Exercise of learned judgment

Cognitive Response Item Coding: Webb’s Depth of Knowledge
Level | Description
Level 1 | Recall: recall of a fact, information, or procedure
Level 2 | Skill/Concept: use of information or conceptual knowledge, two or more steps, etc.
Level 3 | Strategic Thinking: requires reasoning, developing a plan or a sequence of steps, some complexity, more than one possible answer
Level 4 | Extended Thinking: requires an investigation, time to think and process multiple conditions of the problem
(www.wcer.wisc.edu)

Cognitive Response Item Coding: Bloom & Webb: Side-By-Side
Bloom’s Taxonomy | Webb’s DOK
Knowledge: Recall; Comprehension: low level processing | Recall
Application: Use of abstractions in concrete situations | Basic Application of Skill/Concept: use of information, conceptual knowledge, multiple steps
Analysis: breakdown of a situation into component parts | Strategic Thinking: requires reasoning, a plan, or multi-step processes
Synthesis & Evaluation: Putting together elements and parts to form a whole | Extended Thinking: requires investigation, time to think, and to process multiple conditions of task

Common Core Transition: Academic Language
Academic language is designed to be concise, precise, and authoritative.
To achieve these goals, it uses sophisticated words and complex grammatical constructions that can disrupt reading comprehension and block learning. Students need help in learning academic vocabulary and how to process academic language if they are to become independent learners (Snow, 2010). (See Coxhead’s High-Incidence Academic Word List)

Common Core Transition: Academic Language
● Maintaining the impersonal authoritative stance creates a distanced tone that is often puzzling to adolescent readers and is difficult for them to utilize in their writing.
● Students must have access to the all-purpose academic vocabulary that is used to talk about knowledge and that they will need to use in making their own arguments and evaluating others’ arguments (Snow, 2010).

Common Core Transition: Academic Language
Formal Register (Formal Academic Tone): language found in most academic, business, & serious nonfiction writing. It is characterized by:
● Emotional distance between writer and audience;
● Emotional distance between the writer and topic;
● No colloquial language, slang, regionalisms, or dialects;
● Careful attention to conventions of edited American English; and
● Attention to the logical relationships among words and ideas (The St. Martin’s Handbook).

Item Types: Key Term
Item Format: The variety of test item structures or types that can be used to measure examinees’ knowledge, skills, and abilities; these include: multiple-choice or selected response; open-ended or constructed response; essay; and performance task (Perie, 2010).

Item Types: Key Terms
Selected Response Items: Items or tasks in which students choose from among response or answer choices that are presented to them, e.g. Multiple-Choice, True/False, Matching, Cloze (Carr & Harris, 2001).

Item Types: Conventional MC
Three Parts: Stem, Correct Answer, and Distractors (Haladyna, 1999).
● Stem: Stimulus for the response; it should provide a complete idea of the problem to be solved in selecting the right answer. The stem can also be phrased in a partial-sentence format. Whether the stem appears as a question or a partial sentence, it can also present a problem that has several right answers with one option clearly being the best of the right answers.

Item Types: Conventional MC
Three Parts: Stem, Correct Answer, and Distractors
● Correct Answer: the one and only right answer; it can be a word, phrase, or sentence.
● Distractors: Distractors are wrong answers. Each distractor must be plausible to test-takers who have not yet learned the content the item is supposed to measure. To those who have learned the content, the distractors are clearly wrong choices. Distractors should resemble the correct choice in grammatical form, style, and length. Subtle or blatant clues that give away the correct choice should be avoided.

Item Types: Key Terms
Constructed Response Item: An exercise for which examinees must create their own responses or products rather than choose a response from an enumerated set. Short-answer items require a few words or a number as an answer, whereas extended-response items require at least a few sentences (Joint Standards, 1999).

Item Types: Key Terms
Constructed Response Items (cont’d): Constructed response items or tasks are used to assess processes or procedural knowledge or to probe for students’ understanding of knowledge and information. Constructed response tasks are often contrasted with selected response items or tasks (Carr & Harris, 2001).

Item Types: Key Terms
Performance Assessments: Product- and behavior-based measurements based on settings designed to emulate real-life contexts or conditions in which specific knowledge or skills are actually applied (Joint Standards, 1999).

Item Types: Key Terms
Key Components of CR Items:
● Task: A specific item, problem, question, prompt, or assignment
● Response: Any kind of performance to be evaluated, including short/extended answer, essay, presentation, & demonstration
● Rubric: Scoring criteria used to evaluate responses
● Scorers: People who evaluate responses
(ETS, 2005)

Item Types: Key Terms
Task-Specific Rubric: A set of scoring guidelines specific to a particular task. The criteria are addressed and described in terms of specific content or capacities that can be demonstrated in terms of particular, identified content relevant to the task (Carr & Harris, 2001).

Item Types: Key Terms
High-Inference Constructed Response Items: Format requires expert judgment about the trait being observed, and rubrics are used to evaluate. Many abstract qualities are evaluated this way, e.g. writing ability, organization, style, word choice, and complex math problem solving (e.g. proofs, quadratic equations) (Haladyna, 1999).

Item Types: Key Terms
Low-Inference Constructed Response Items: Format simply involves observation because there is some behavior or answer in mind that is either present or absent (e.g. scaffolded questions, writing conventions, measurements) (Haladyna, 1999).

Item Types: High & Low-Inference CR
Attribute | High-Inference | Low-Inference
Type of Behavior Measured | Usually abstract, most valued in education | Usually concrete
Ease of Construction | Design of items is complex, involving command or question, conditions for performance, and rubrics | Design of items is not as involved as high inference, involving command or question, conditions for performance, and a simple mechanism
Cost of Scoring | Involves training; expensive | Scoring not as complex but costs can still be high

Item Types: High & Low-Inference CR
Attribute | High-Inference | Low-Inference
Reliability | Reliability can be a problem due to inter-rater reliability | Results can be very reliable due to concrete nature of items
Objectivity | Can be subjective | More objective
Bias: Systemic Error | Possible threats to validity: over- or underrating | Seldom yields biased observations
(Haladyna, 1999)

Choosing An Item Format (MC v. CR)
● The most efficient and reliable way to measure knowledge is with the MC format.
● The most direct way to measure skill is via performance, but many mental skills can be tested via MC with a high degree of proximity (statistical relation between CR and MC items of an isolated skill). If the skill is critical to ultimate interpretation, CR is preferable to MC.

Choosing An Item Format (MC v. CR)
● When measuring a fluid ability or intelligence, the complexity of such human traits favors CR item formats of a complex nature (high-inference) (Haladyna, 1999).

Choosing An Item Format: Conclusions about Criterion Measurement
Criterion | Conclusion About MC & CR
Declarative Knowledge | Most MC formats provide the same information as essay, short answer, or completion formats.
Critical Thinking | MC formats involving vignettes or scenarios provide a good basis for forms of critical thinking. MC format has good fidelity to the more realistic open-ended behavior elicited by CR.
(Haladyna, 1999)

Choosing An Item Format: Conclusions about Criterion Measurement
Criterion | Conclusion About MC & CR
Problem Solving Ability | MC item sets provide a good basis for testing problem solving. However, more research is still needed.
Creative Thinking Ability | MC format limited.
School Abilities (e.g. writing, reading, & mathematics) | Performance has the highest fidelity to criterion for these school abilities. MC is good for measuring foundational aspects of fluid abilities, such as declarative knowledge or knowledge of skills.
(Haladyna, 1999)

Test Validation: Data Collection & Analysis
● Answer Sheet Design & Scanning Procedures (Work w/BOCES & RIC)
● Utilize Item Banking Software
● Depth of data collection: student demographics; scores; and item level data, if possible
● Evaluate/Revise
● Generate trend data
● Review against other data to verify & audit rigor
● Document & Save Analyses

Test Validation: Data Collection & Analysis
● Parallel NYSTP Test Reporting Approaches
● Report general performance on the test; recommendation: report %s of students at Levels 1-4 (place 3 cuts into the instrument).
● Run data for all student groups and then run Title I disaggregation: Race/Special Populations (Special Education & ELLs).
● If you have multiple Need/Resource Capacity districts/schools, run analysis by category.
● Further analysis: disaggregate by building and class/instructor if possible.
● For shared exams, try to run a comprehensive analysis of all participating CSDs for general trends and Title I disaggregation (allows for possible benchmarking).

Test Validation: Data Collection & Analysis
Psychometric Analysis: See Classical Test Analysis in SED ELA/Math 3-8 Technical Manuals. Use baseline measures to inform your analysis.
Analysis Includes 4 Primary Elements:
1) Item level statistical information (item response patterns, item difficulty, and item discrimination);
2) Test level data (raw score statistics: mean & SD) & test reliability measures (Cronbach’s Alpha & Feldt-Raju Coefficient);
3) Test speededness (omit rates); &
4) Bias through DIF (differential item functioning).
Use p-values to complete Item Maps for use with administrative and instructional staff.

Sources
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, D.C.: American Educational Research Association.
Bailey, Kim & Jakicic, C. (2012). Common Formative Assessment. Bloomington, IN: Solution Tree Press.
Gong, Brian & Hill, Richard. (2001). Some Considerations of Multiple Measures in Assessment and School Accountability. CCSSO & US DOE Accountability Conference, March 23-24, 2001, Washington, D.C.
Haladyna, Thomas M. (1999). Developing and Validating Multiple-Choice Test Items. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Henderson-Montero, Dianne, Julian, Marc C., & Yen, Wendy M. (2003). Multiple Perspectives and Multiple Measures: Alternative Design and Analysis Models. Educational Measurement: Issues and Practice, 22.2, 7-12.
Heritage, Margaret, Kim, J., Vendlinski, T. & Herman, J. (2009). From Evidence to Action: A Seamless Process in Formative Assessment? Educational Measurement: Issues and Practice, 28.3, 24-31.
Heritage, Margaret. (2011). Formative Assessment: An Enabler of Learning. Retrieved from www.cse.ucla.edu.
Herman, Joan L. & Choi, Kilchan (2012). Validation of ELA and Mathematics Assessments: A General Approach. Retrieved from www.cse.ucla.edu/products/states_schools/ValidationELA_FINAL.pdf
Nichols, Paul, Meyers, J. & Burling, K. (2009). A Framework for Evaluating and Planning Assessments Intended to Improve Student Achievement. Educational Measurement: Issues and Practice, 28.3, 14-23.
Perie, Marianne, Marion, S., & Gong, B. (2009). Moving Toward a Comprehensive Assessment System: A Framework for Considering Interim Assessments. Educational Measurement: Issues and Practice, 28.3, 5-13.

David Abrams
dabrams5@nycap.rr.com
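
Worked sketch: combining multiple measures
The four categories of combination rules on the Multiple Measures slides (conjunctive, compensatory, mixed conjunctive-compensatory, confirmatory) can be read as simple decision logic. The Python below is only an illustrative sketch, not a method taken from the cited sources: the `Measure` fields, the cut scores, the floors, and the sample student are all hypothetical, and the compensatory rule here assumes the measures have already been placed on a common scale (one of the technical challenges the slides note).

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    score: float   # the student's score on this measure
    cut: float     # hypothetical passing cut score
    floor: float   # hypothetical minimal level used by the mixed rule


def conjunctive(measures):
    """Every measure must meet its own cut score."""
    return all(m.score >= m.cut for m in measures)


def compensatory(measures):
    """Scores are pooled, so strength on one measure can offset weakness on another.
    Assumes all measures share a common scale."""
    return sum(m.score for m in measures) >= sum(m.cut for m in measures)


def mixed(measures):
    """Each measure must clear a minimal floor; above it, scores may compensate."""
    return all(m.score >= m.floor for m in measures) and compensatory(measures)


def confirmatory(primary, independent):
    """An independent measure must also meet its minimum to confirm the primary result."""
    return primary.score >= primary.cut and independent.score >= independent.cut


# Hypothetical student: strong in reading, slightly below the cut in math.
student = [
    Measure("reading", score=70, cut=65, floor=55),
    Measure("math", score=60, cut=65, floor=55),
]

print("conjunctive:", conjunctive(student))
print("compensatory:", compensatory(student))
print("mixed:", mixed(student))
```

The same student can pass under one rule and fail under another, which is the point Chester (2003) makes on the Challenges slide: the logic of combination, not the number of measures, drives the decision.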
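
Worked sketch: classical item analysis
The psychometric analysis slide lists item difficulty (p-values), item discrimination, and Cronbach’s Alpha among the primary elements. As a rough illustration of what those statistics compute, the Python below works through a tiny scored-response matrix. It is a minimal sketch under stated assumptions: the response data are invented, items are scored 0/1, and discrimination is taken here as the corrected item-total correlation (one common choice; the SED technical manuals may define it differently). Feldt-Raju, omit rates, and DIF are not covered.

```python
from statistics import pvariance

# Hypothetical scored responses: one row per student, one column per item
# (1 = correct, 0 = incorrect). Values are invented for illustration only.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]


def item_difficulty(rows):
    """Classical p-value per item: the proportion of students answering correctly."""
    return [sum(col) / len(col) for col in zip(*rows)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def item_discrimination(rows):
    """Correlation of each item with the total score on the remaining items."""
    result = []
    for i, col in enumerate(zip(*rows)):
        rest = [sum(r) - r[i] for r in rows]
        result.append(pearson(col, rest))
    return result


def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(rows[0])
    item_vars = [pvariance(col) for col in zip(*rows)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


print("p-values:", item_difficulty(responses))
print("discrimination:", [round(d, 3) for d in item_discrimination(responses)])
print("alpha:", round(cronbach_alpha(responses), 3))
```

The p-values from a routine like this are what would be entered in the "Item Data" column of the post-administration item map.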