SECOND SCIENCE ASSESSMENT WEBINAR
Performance Assessments in Science at the State Level
October 2012

WEBINAR AGENDA
Introduction
WestEd and three states will each give a 10-minute overview of their work on performance assessments in science at the state level:
• WestEd (Edys S. Quellmalz and Matt Silberglitt)
• Ohio (Lauren V. Monowar-Jones)
• Vermont (Gail Hall)
• Connecticut (Liz Buttner)
Open discussion

WESTED: PERFORMANCE ASSESSMENTS FOR SCIENCE LEARNING
Edys S. Quellmalz and Matt Silberglitt, WestEd
Presented to the Council of State Science Supervisors, October 24, 2012

GOALS
• Needs for performance assessment
• Limitations of current science assessments
• Advantages of technology for science assessment
• Types of performance assessment
• Design principles for the assessment of iSTEM learning outcomes
• Promising innovative approaches
• Needed research and development

PERFORMANCE ASSESSMENT FEATURES
• Assessment targets are ones that are difficult or impossible to measure well with conventional item formats
• Students construct responses, solutions, or products
• Tasks represent significant, recurring, realistic problems
• Criteria for evaluating performances are specified and communicated to examinees
• Performances represent science and engineering practices both in progress and as culminating solutions or products

LIMITATIONS OF CURRENT ASSESSMENTS
• Emphasis on disconnected, declarative knowledge
• Neglect of integrated knowledge structures in fundamental science systems
• Emphasis on procedural algorithms and skills
• Neglect of strategic inquiry practices in authentic problems

RELEVANCE TO CURRENT ASSESSMENT PROGRAMS
Innovative, technology-enhanced assessments are needed that align with new frameworks focused on fewer, deeper, more integrated core knowledge targets (e.g., the new Framework for Science Education and next-generation national science standards):
• Models as structures for understanding and studying science systems (model-based learning)
• Science practices for using knowledge and inquiry in significant, recurring, authentic tasks

RELEVANCE TO CURRENT ASSESSMENT PROGRAMS
Innovative, technology-enhanced assessments are also needed that:
• Target 21st-century skills within STEM
• Use technology to engage students in use of the "tools of the trade"
• Provide evidence supporting technology-enhanced performance assessment for summative and formative purposes, as called for by the assessment consortia

ADVANTAGES OF TECHNOLOGY FOR SCIENCE ASSESSMENT
• Present authentic, rich, dynamic environments
• Support access to collections of information and expertise
• Present phenomena difficult or impossible to observe and manipulate in classrooms
• Represent temporal, causal, dynamic relationships "in action"
• Allow multiple representations of stimuli and their simultaneous interactions (e.g., data generated during a process)
• Allow overlays of representations and symbols
• Allow student manipulations/investigations and multiple trials
• Allow student control of pacing, replay, and reiteration
• Capture student responses during research, design, and problem solving
• Allow use of simulations of a range of tools (internet, productivity, domain-based)

ADVANTAGES OF TECHNOLOGY FOR SCIENCE ASSESSMENT
• Online search for assessments aligned with standards
• Digital collections of assessments
• Access to innovative technology-based prototypes and collections
• Tools to support online assessment delivery and scoring
• Online guidelines for scoring by teachers and students
• Online guidelines for interpretation of scores and implications for instruction
• Online professional development on assessment literacy
COGNITIVELY PRINCIPLED ASSESSMENT DESIGN
• Learning science research (e.g., How People Learn)
• Measurement theory and research on measuring learning (e.g., Knowing What Students Know)
• An assessment argument linking claims of learning to evidence of learning to tasks eliciting that evidence

EVIDENCE-CENTERED DESIGN
• Student Model: What complex of knowledge, skills, or other attributes should be assessed?
• Evidence Model: What behaviors or performances should reveal the relevant knowledge and skills described in the student model?
• Task Model: What tasks or situations should elicit the behaviors or performances described in the evidence model?
(Messick, 1993; Mislevy, Almond, & Lucas, 2004)

SCIENCE CONSTRUCTS (STUDENT MODEL / ASSESSMENT TARGETS)
From national frameworks and standards for science, mathematics, engineering, and technology:
• Cross-cutting concepts, e.g., systems and system models (Next Generation Science Standards)
• Cross-cutting practices, e.g., problem solving, communication, collaboration (NAEP Framework for Technology and Engineering Literacy)

TASK MODELS
• Integrated applications to the natural and designed world
• Applied, significant, recurring problems in the natural and designed world
• Scenario-based tasks building across a problem-solving/inquiry/design sequence

EVIDENCE MODEL
• What evidence is collected: explicit responses; logged processes for technology-delivered tasks
• How the evidence is evaluated and summarized: scoring rubrics
• How the evidence is reported for the intended purposes and users

LIMITATIONS OF PERFORMANCE ASSESSMENTS
• Little available information or documentation of the measures' descriptive or technical quality
• Lack of attention to alignment of outcome measures to science standards
• Few descriptions of coverage/balance
• Outcome measures tend to emphasize content and declarative knowledge
• Little attention to application of practices

LIMITATIONS OF SCIENCE ASSESSMENTS
• Practices are not measured well by static, conventional formats
• Few measures during learning (processes, formative) vs. at the end (summative)
• Little measurement of collaboration and communication
• Lack of deliberate design to measure for transfer of cross-cutting concepts and practices
• Little attention to establishing and documenting technical quality

CHALLENGES FOR DESIGNING ISTEM ASSESSMENTS
• Specification of desired learning outcomes
• Need for focus and coherence of knowledge and processes, and whether they are situated in a domain and/or in integrated problems
• Coverage: What is the balance of assessment targets? What is the balance and coherence of classroom curriculum-embedded assessments for formative purposes and district and state summative tests?

CHALLENGES FOR DESIGNING SCIENCE ASSESSMENTS
Assessment design must be tailored to assessment purpose, i.e., the intended use of the data:
• Formative/summative: embedded to monitor (use of feedback and coaching) and adjust vs. culminating to report proficiency status
• Duration, scope, time: more extended, spread over multiple classes/periods
• Embedded vs. external
• Documentation of measures: descriptions, technical quality (validity of interpretation, reliability)
PROMISING NEW ASSESSMENT DESIGNS ENABLED BY TECHNOLOGY
• Alignments
• Access to resources and expertise: networks of collaborators and experts; collections
• Delivery: entry of rubrics, ratings, work in progress, final artifacts
• Scoring: automated scoring, online training and scoring, moderated rating sessions
• Reporting: customized to users

PROMISING NEW ASSESSMENT DESIGNS FOR SCIENCE: USING TECHNOLOGY TO SUPPORT HANDS-ON PROJECTS AND ASSESSMENTS
• Blended model of equipment and technology
• Entry of rubric ratings; calibrated training sessions
• Annotated postings of designs, prototypes, and tests
• Embedded tasks to test knowledge and skills during projects
• Electronic science notebooks
• Electronic portfolios
• Juried exhibitions posted, streamed, and archived

PROMISING NEW ASSESSMENT DESIGNS: INTERACTIVE TASK DESIGN FEATURES
• Dynamic presentations of spatial, causal, and temporal phenomena in a system
• Multiple overlapping representations
• Interactivity that supports iterative, active inquiry and design
• Multiple response formats
• Reduced reliance on text
• Rapid, customized interaction, feedback, and reporting

RESEARCH ON LEARNING IN SCIENCE SIMULATIONS
Simulations:
• Facilitate formation of organized mental models of system components, interactions, and emergent behaviors
• Facilitate transfer
• Facilitate use of systematic problem solving and inquiry
• Situate learning in authentic, significant, recurring problems in the natural and designed world
• Are highly engaging

NAEP 2014 FRAMEWORK AND SPECIFICATIONS FOR TECHNOLOGY AND ENGINEERING LITERACY
• SimScientists: Force and Motion (Fire Rescue)
• PISA: Reactor
http://www.nagb.org/publications/frameworks/tech2014-framework/ch_toc/index.html

SIMSCIENTISTS: TEST EFFECTS OF POLLUTION ON CELLS

SIMSCIENTISTS: TEST EFFECTS OF CALORIES ON ACTIVITY LEVEL

RESEARCH NEEDS
• Analysis of extant assessments: large-scale and classroom, formative and summative
• Analyses of performance assessment opportunities
• Review of promising exemplars
• Formulation and testing of different purposes, designs, and evidence collection strategies
• Pilot studies of performance assessment design models for established and new genres of technology-enhanced learning environments
• Documentation of technical quality with alternative psychometric methods

CONTACT INFORMATION
equellm@wested.org
msilberg@wested.org
http://simscientists.org

OHIO
Lauren V. Monowar-Jones, PhD
Project Coordinator, Ohio Performance Assessment Pilot Project
Ohio Department of Education, Office of Assessment
Lauren.Monowar-Jones@education.ohio.gov

A LOOK INTO THE FUTURE OF OHIO'S SCIENCE ASSESSMENTS
October 24, 2012

THE OHIO PERFORMANCE ASSESSMENT PILOT PROJECT
"Always do what you are afraid to do."

THE TASK DYAD LEARNING SYSTEM
• Learning Task (curriculum embedded)
• Assessment Task

OHIO'S TASK DYAD LEARNING SYSTEM

THE DYAD SYSTEM

OHIO'S NEXT GENERATION ASSESSMENTS
PARCC-developed assessments:
• English language arts: grades 3-8 and high school
• Mathematics: grades 3-8 and high school
• Operational school year 2014-15
State-developed assessments:
• Science: grades 5, 8, and high school
• Social studies: grades 4, 6, and high school
• Operational school year 2014-15

A SNEAK PEEK

PILOT: TEACHERS' ROLES
• Coaches for students
• Scorers
• Developers
• Reviewers
OPAPP PARTICIPANTS: TEACHERS
• Cohort 1: Sep 2008 - May 2012; 15 LEAs; HS: ELA, Math, Science
• Cohort 2: Sep 2011 - Dec 2013; 7 LEAs; HS: ELA, Math, Science, SS, Career Tech
• Cohort 3: Jan 2012 - June 2014; 6 LEAs; ES: ELA, Math, Science, SS
• Cohort 4: Nov 2012 - Dec 2013; 15 LEAs; HS: ELA, Math, Science, Social Studies, Career Tech
• Cohort 5: July 2013 - May 2014; recruiting in March; ES: ELA, Math, Science, SS

OPAPP PARTICIPANTS: COACHES
• Cohort 1:
• Cohort 2: Grade 3: 3 coaches; Grade 4: 3 coaches; Grade 5: 2 coaches
• Cohort 3: 2 ELA, 3 Math, 2 Science; 4-5 online coaches
• Cohort 4: 2 ELA, 3 Math, 2 Science, 2 SS
• Cohort 5: 4-5 online coaches

OPAPP PARTICIPANTS: HIGHER ED
• Cohort 1: 3
• Cohort 2: up to 20
• Cohort 3: up to 15
• Cohort 4: none*
• Cohort 5: none*
The purpose of higher education involvement is:
• To influence HE teaching
• To influence teacher preparation
• To provide content expertise

LESSONS LEARNED
Task writing:
• It is hard to write to a non-native delivery system.
• It is hard for assessment contractors to learn to write good curriculum.
• It is hard to develop good rubrics for Learning Tasks.
• It is hard to align Learning and Assessment Tasks well.
Online delivery system:
• Schools are not always "teched up" enough for this model.
• School firewalls can be problematic for learning tasks.
• Internal internet access may be more of a problem than previously thought.

LESSONS LEARNED
Teachers:
• Not all teachers are ready to use technology in their classrooms or labs.
• Professional development needs to be low impact on time and high impact on practice.
Scoring/reporting:
• Need a method for identifying student work for re-scoring that does not put the state in the position of qualifying teachers to score.
• Need more data and information about how to present results to teachers so they make sense (both to psychometricians and to teachers).

VERMONT

VERMONT STATE SCIENCE ASSESSMENT: AN OVERVIEW
Gail Hall and Kathy Renfrew, Science Assessment Coordinators

VERMONT'S JOURNEY
A winding trail…
• State performance assessments since 2000
• Vermont PASS Assessment: Partnership for Assessment of Standards-based Science
• NECAP Assessment: New England Common Assessment Program, a collaboration with RI and NH

THE DETAILS…
Spring assessment: grades 4, 8, and 11
Content domains:
• Life Science: 24%
• Physical Science: 24%
• Earth/Space Science: 24%
• Inquiry Task: 28%
Data for the Inquiry Performance Task investigations are collected by student partners; scored items are answered individually.
Spring in Vermont

NECAP SCIENCE TEST DESIGN: SESSION 3, GRADES 4 & 8
• Estimated time needed: 75 minutes (schedule 120 minutes)
• 7 or 8 Inquiry Task questions: 2-point short-answer and 3-point constructed-response items

NECAP SCIENCE TEST DESIGN: SESSION 3, GRADE 11
• Estimated time needed: 45-60 minutes (schedule 60 minutes)
• The high school task is always a data analysis task.
• 7 or 8 Inquiry Task questions based on a variety of data sets: 2-point short-answer and 3-point constructed-response items

2008 GRADE 8 INQUIRY PERFORMANCE TASK
The Scenario

The Set-up

Prediction (posed jointly by working partners):
• "Using Ethan's experience and your understanding of force and the motion of objects, predict how the mass of a parked car will affect the distance the parked car moves when hit. Explain your answer. Write your prediction and explanation in the box below."
• "Using Ethan's experience and your understanding of force and the motion of objects, predict how the slope of a hill will affect the distance moved by a car that gets hit. Explain your answer. Write your prediction and explanation in the box below."

Next steps: materials for the investigation
• Investigation directions provided
• Varying slope: using a block of wood
• Varying mass: 1-3 washers in a cup

Collect data:
• Measure the distance the cup (with washers) moves

Scorable task items:
• Construct a graph
• Explain, with evidence, the effect of mass on the movement of the parked car
• Explain, with evidence, the effect of slope on the movement of the parked car
• How well do the data support the prediction?
• Predict movement in a different situation (a flat, dry surface)
• Identify and explain variables
• Design a new investigation

LESSONS LEARNED…
• Measure of critical thinking
• Student progress
• Challenging to construct
• Time
• Outstanding PD opportunity
• Collaboration

ADDITIONAL VERMONT INQUIRY PERFORMANCE TASKS
http://education.vermont.gov/new/html/pgm_assessment/necap/resources/released_items.html

Grade 4 and Grade 8 (all are performance tasks):
• Playground Trash
• Magnetism
• Birds, Beaks & Survival
• Natural Selection
• Sled Pull
• Force & Motion
• Sand Movers
• Erosion
• Soil and Water
• Conductors and Insulators

Grade 11 (all are data analysis):
• Rainy Morning (PT)
• Colliding Plates (PT)
• Plate Tectonics
• Pond Weeds (Data Analysis)
• Aquatic Ecology
• Mass and Matter (PT)
• Conservation of Mass
• Fox and Rabbits (PT)
• Predator/Prey
• Ocean Currents (Data Analysis)
• Acid Lakes
• Driver's Education
• Force & Motion
• Location
• Earthquakes
• Cod on Georges Bank
• Human Impact on Ecosystems
• Antifreeze
• Properties of Matter
• Mercury in Fish

Guidelines for the Development of Science Inquiry Tasks:
http://education.vermont.gov/new/pdfdoc/pgm_assessment/necap/other_resources/science/guidelines_inquiry_tasks_021508.pdf

Thank you for attending. For further information:
Gail Hall, Middle and High School Science Assessment Coordinator, Vermont Department of Education: gail.hall@state.vt.us
Kathy Renfrew, Elementary Science Assessment Coordinator, Vermont Department of Education: kathy.renfrew@state.vt.us

CONNECTICUT

CONNECTICUT CURRICULUM-EMBEDDED SCIENCE PERFORMANCE TASKS: TEACHING TOOLS LINKED TO STATE ASSESSMENTS

FAST FACTS
• Extended, open-ended investigations related to a concept within a learning standard (5-7 class sessions)
• Models of activities that engage students in using inquiry standards to learn content in state standards
• Teachers decide when (and how) to embed the task within a standards-based learning unit
• Teacher manuals: pacing guide, materials list, pedagogy notes, safety, resources
• Descriptive inquiry feedback rubrics (grades 3-8 only)
• One task for each grade (grades 3-8)
• Five high school tasks (one for each strand of the standards)

SUPPLIES
• In 2005, the SDE provided "durable" equipment kits to all elementary and middle schools ($250,000): graduated cylinders, hand lenses, droppers, stethoscopes, wires, bulb holders, soil sieves, etc.
• Districts are expected to provide consumable materials and replace equipment.
• Materials kits for the grades 3-8 embedded tasks are available for purchase through SK Boreal Labs.

ANATOMY OF A GR. 3-8 CURRICULUM-EMBEDDED PERFORMANCE TASK
A learning cycle of "mini" instructional units, each including:
• Experiment 1 (guided inquiry)
• Formative feedback to the student (state rubric)
• Research through reading and writing
• Experiment 2 (independent inquiry)
• Summative writing assignment
• Summative feedback to the student (state rubric)

ANATOMY OF A HIGH SCHOOL CURRICULUM-EMBEDDED PERFORMANCE TASK
• Five embedded tasks
• Each includes a laboratory activity and a Science, Technology and Society (STS) research investigation.

BASIC INFORMATION
• Curriculum-embedded performance tasks are suggested models for instruction, not mandated exercises.
• They are instructional materials for use within the classroom during the normal instructional day and within the appropriate instructional context (teacher determined).
• Questions assessing INQUIRY skills on state tests reference the embedded tasks' "scenarios":
- Elementary test: 6 of 18 selected-response questions are related to the tasks for grades 3, 4, and 5.
- Middle grades test: all 3 constructed-response questions are task-related (6 of the 21 inquiry points are related to the tasks for grades 6, 7, and 8).
- High school test: all 5 constructed-response questions are task-related (15 out of …).

MORE BASIC INFO
Science content areas addressed:
• Gr. 3: properties of matter (absorbency)
• Gr. 4: electric circuits, conductors/insulators
• Gr. 5: central nervous system; reaction time
• Gr. 6: soil porosity and permeability
• Gr. 7: cardiovascular system; pulse rate
• Gr. 8: friction
High school tasks…

EVEN MORE BASIC INFO
Strand I: Energy Transformation
- Solar Cooker, laboratory investigation
- Connecticut Energy Use, STS activity
Strand II: Chemical Structures and Properties
- Synthetic Polymers, laboratory investigation
- Plastics Controversy, STS activity
Strand III: Global Interdependence
- Acid Rain, laboratory investigation
- Connecticut Brownfield Sites, STS activity
Strand IV: Cell Chemistry and Biotechnology
- Enzyme, laboratory activity
- Labeling Genetically Altered Foods, STS activity
Strand V: Genetics, Evolution and Biodiversity
- Yeast Population Dynamics, laboratory investigation
- Human Population Dynamics, STS activity

FINAL BASIC INFO
• The tasks assess knowledge and inquiry skills.
• STS research depends on doing internet data searches and working with Excel spreadsheets.
• Student work on embedded tasks is assessed by teachers using state-developed rubrics for inquiry and for lab reports.

LINKS TO EMBEDDED TASKS
• Elementary and middle grades tasks: http://www.sde.ct.gov/sde/cwp/view.asp?a=2618&q=320890
• High school tasks: http://www.sde.ct.gov/sde/cwp/view.asp?a=2618&q=320892

STATE ASSESSMENT SAMPLE ITEM: ELEMENTARY
Some students did an experiment to find out which type of paper holds the most water. They followed these steps:
1. Fill a container with 25 milliliters of water.
2. Dip pieces of paper towel into the water until all the water is absorbed.
3. Count how many pieces of paper towel were used to absorb all the water.
4. Repeat with tissues and napkins.
If another group of students wanted to repeat this experiment, which information would be most important for them to know?
(a) The size of the water container
(b) The size of the paper pieces *
(c) When the experiment was done
(d) How many students were in the group

STATE ASSESSMENT SAMPLE ITEM: HIGH SCHOOL

STATE ASSESSES INQUIRY ABILITIES REFERENCING TASK SCENARIOS
Note that the CMT questions do not assess a correct "outcome" of a performance task or students' recollection of the details of the performance task. Students who have had numerous opportunities to make observations, design experiments, collect data and form evidence-based conclusions are likely to be able to answer the task-related CMT questions correctly, even if they have not done the state-developed performance tasks. However, familiarity with the context referred to in the test question may make it easier for students to answer the question correctly.

INQUIRY STANDARDS: ELEMENTARY
Embedded tasks engage students in using all the inquiry skills defined in the 2004 CT Science Framework. The embedded tasks for grades 3-5 feature the following Expected Performances for Scientific Inquiry, Literacy and Numeracy:
1. Make observations and ask questions about objects, organisms and the environment.
2. Seek relevant information in books, magazines and electronic media.
3. Design and conduct simple investigations.
4. Employ simple equipment and measuring tools to gather data and extend the senses.
5. Use data to construct reasonable explanations.
6. Analyze, critique and communicate investigations using words, graphs and drawings.
7. Read and write a variety of science-related fiction and nonfiction texts.
8. Search the Web and locate relevant science information.
9. Use measurement tools and standard units (e.g., centimeters, meters, grams, kilograms) to describe objects and materials.
10. Use mathematics to analyze, interpret and present data.

INQUIRY STANDARDS: MIDDLE SCHOOL
The embedded tasks for grades 6-8 feature the following Expected Performances for Scientific Inquiry, Literacy and Numeracy:
1. Identify questions that can be answered through scientific investigation.
2. Read, interpret and examine the credibility of scientific claims in different sources of information.
3. Design and conduct appropriate types of scientific investigations to answer different questions.
4. Identify independent and dependent variables, and those variables that are kept constant, when designing an experiment.
5. Use appropriate tools and techniques to make observations and gather data.
6. Use mathematical operations to analyze and interpret data.
7. Identify and present relationships between variables in appropriate graphs.
8. Draw conclusions and identify sources of error.
9. Provide explanations to investigated problems or questions.
10. Communicate about science in different formats, using relevant science vocabulary, supporting evidence and clear logic.

TECHNICALITIES
• The performance tasks are instructional materials, modifiable by teachers to accommodate student needs and interests; they have not been rated for reliability or validity.
• The Inquiry Feedback Rubrics were taken through an inter-rater reliability study led by TERC (2009).
• No data are collected on student work; data come from item statistics on the task-related CMT/CAPT questions.

LESSONS LEARNED
• Teachers appreciate the flexibility but often don't use the tasks effectively (they take short-cuts).
• Teachers don't use the tasks when they are teaching the related content, so the connection between inquiry and content is missed.
• Ongoing PD is needed.
• PD providers need to be trained.
• District administrators don't reinforce the important learning value of the tasks.
• District administrators don't purchase supplies for the tasks.

ELIZABETH BUTTNER
Science Consultant, CT State Department of Education
860-713-6849
elizabeth.buttner@ct.gov