Using Emerging Technologies to Improve Student Achievement: The Potential of Virtual Performance Assessments
Chris Dede, Harvard University
Chris_Dede@harvard.edu
www.gse.harvard.edu/~dedech/

Introduction

Flawed Assessments Undercut Student (and Teacher) Achievement
“Drive-by” high-stakes tests frighten many students into suboptimal performance, which cumulatively leads to disengagement, low self-efficacy, and alienation.
Students are rightly wary of investing in knowledge that is tacitly not valued because it is not measured or rewarded.
Teachers are forced to emphasize test performance rather than domain mastery.

Current Summative Tests Undercut Achievement and Motivation
Paper-and-pencil item-based tests are inexpensive, reliable, and practical, but they are not valid for higher-order thinking skills, such as scientific inquiry, or for 21st-century skills, such as mediated collaboration.
Physical performance assessments are more valid for sophisticated skills, but they are unreliable, impractical, expensive, and limited in the types and number of tasks possible.

The Assessment Triangle
Cognition: a model of how students represent knowledge and develop competence in the domain
Observation: tasks or situations that allow one to observe students’ performance
Interpretation: methods for making sense of the data
Reasoning from evidence connects all three.

Mediated Performances Are an Untapped Resource
Cognition is distributed across human minds, tools/media, groups of people, and space/time; it is dispersed physically, socially, and symbolically.
Event logs of performances and communications provide insights.
Distributed learning is collaborative, mediated, scaffolded, and data-generating.

Types of Rich Datastreams
Multi-User Virtual Environments: immersion in virtual contexts with digital artifacts and avatar-based identities
Wikis and other forms of Web 2.0 media
Asynchronous discussions
Intelligent tutoring systems
Games
Augmented realities

What Is a MUVE?
An “Alice in Wonderland” experience in which users enter a virtual space that has been configured for learning.
Learners represent themselves through graphical avatars, which they use to communicate with other users’ avatars and with computer-based agents, and to interact with digital artifacts and virtual contexts.

River City
Figure 1: Lab equipment inside the university
Figure 2: River water sampling
http://muve.gse.harvard.edu/rivercityproject

Evidence of Student Work
Assessment data: pre/post content measures; pre/post affective measures; embedded (formative) assessments; a (summative) performance assessment
Contextual data: attendance records; demographic data; school data; observations; interviews
Active data: team chat; notebook entries; tracking of in-world activities (data-gathering strategies, pathways, inquiry processes)

Event Logs as Observational Data
Event logs record, with timestamps:
where students went
with whom they communicated and what they said
which artifacts they activated
which databases they viewed
what data they gathered using virtual scientific instruments
what screenshots and notations they placed in team-based virtual notebooks
The result is unobtrusive observational data.

Student’s Role in the River City MUVE
Travel back in time 6 times between 1878 and 1879
Bring 21st-century skills and technology to address 19th-century problems
Help the town understand and solve part of the puzzle of why so many residents are becoming ill
Work as a research team
Keep track of clues that hint at the causes of the illnesses
Form and test hypotheses in a controlled experiment
Make recommendations based on experimental data

Capturing Data on Change over Time
Visit 1: Fall 1878
Visit 2: Winter 1879
Visit 3: Spring 1879
Visit 4: Summer 1879
Students visit the same places and see how things change over time. They spend an entire class period in each season, gathering data.
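To make the event-log idea concrete, here is a minimal Python sketch of how timestamped events might be grouped into per-student observational summaries: where a student went, with whom they chatted and what they said, which artifacts they activated, and what instrument data they gathered. The CSV columns, event names, and sample rows are hypothetical and are not the River City log format.

```python
import csv
import io
from datetime import datetime

# Hypothetical log format: one CSV row per logged action. The column names,
# event types, and sample rows are invented for illustration only.
SAMPLE_LOG = """timestamp,student,event,target,detail
2009-03-02T10:00:05,s01,move,hospital,
2009-03-02T10:01:40,s01,chat,s02,Did you check the water?
2009-03-02T10:02:10,s01,activate,doctor_notes,
2009-03-02T10:03:55,s01,move,river,
2009-03-02T10:04:30,s01,measure,water_sampler,bacteria=high
"""


def summarize(log_text):
    """Group timestamped events by student into observational summaries."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    # Order events by time so that each student's path is chronological.
    rows.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    summary = {}
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        s = summary.setdefault(row["student"], {
            "path": [], "talked_to": set(), "messages": [],
            "artifacts": [], "measurements": [], "first": ts, "last": ts,
        })
        s["last"] = ts  # rows are sorted, so the latest event comes last
        event, target = row["event"], row["target"]
        if event == "move":          # where the student went
            s["path"].append(target)
        elif event == "chat":        # with whom they talked, and what they said
            s["talked_to"].add(target)
            s["messages"].append((target, row["detail"]))
        elif event == "activate":    # which artifacts they activated
            s["artifacts"].append(target)
        elif event == "measure":     # data gathered with virtual instruments
            s["measurements"].append((target, row["detail"]))
    return summary


if __name__ == "__main__":
    for student, s in summarize(SAMPLE_LOG).items():
        minutes = (s["last"] - s["first"]).total_seconds() / 60
        print(student, s["path"], sorted(s["talked_to"]), f"{minutes:.1f} min")
```

Sorting before aggregation matters because logs from several clients may arrive out of order; the path is only meaningful as a time-ordered sequence.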
“Evidence Gathering”: An Important, Generic Inquiry Process
amount: how much evidence is gathered per unit of time spent
range: coverage of, and balance among, all the types of evidence
saliency: importance of the evidence for understanding causality in the situation
clustering: grouping of evidence by its causal affiliation

“Evidence Gathering”
Foundational for other inquiry processes: hypothesis formation, experimental design, and argumentation
Related to student attributes: self-efficacy, metacognition, engagement, and content knowledge

Virtual Performance Assessments
Funded by the Institute of Education Sciences (three-year grant)
Goal: design three virtual performance assessments to assess middle-grade (6th and 7th grade) students’ science inquiry learning in a standardized testing setting
http://virtualassessment.org

NSES Model of Inquiry
Identify questions that can be answered through scientific investigation (not independent of knowledge)
Design and conduct a scientific investigation
Use appropriate tools and techniques to gather, analyze, and interpret data
Develop descriptions, explanations, predictions, and models using evidence
Think critically and logically to make the relationships between evidence and explanations
Recognize and analyze alternative explanations and predictions
Communicate scientific procedures and explanations
Use mathematics in all aspects of scientific inquiry

Authentic Environments
A challenge with which every student has roughly equal familiarity

Assessment Platform
3-D immersive environment for science experimentation
Based on an authentic setting
Highly secure, cross-platform application built in the Unity framework
Realistic, complex causal model for science experimentation

Back End Architecture
Real-time analysis of student paths
All interactions are logged for future research
Data integrity is ensured by encrypting data along the way
Complex student work products are recorded as XML, which can be tokenized

EcoMUVE (www.ecomuve.org)

Formative/Diagnostic
Formative, diagnostic assessment provides more leverage for improvement than summative measures.
Formative, diagnostic assessment is richer and more accurate than summative measures.
Potentially, formative, diagnostic assessment could substitute for summative measures.

Module 1: Pond Ecosystem
Modeled after Black’s Nook Pond in Cambridge, MA
“Submarine” tool

Instruction and Assessment Based on Learning Trajectories
Table 1: Forces as Interactions facet cluster (Krauss & Minstrell, 2002)
Facet  Description
00  All forces are the result of interactions between two objects. Each object in the pair interacts with the other object in the pair. Each influences the other.
01  All interactions involve equal-magnitude, oppositely directed action and reaction forces that act on the two separate interacting bodies.
40  Equal force pairs are identified as action and reaction but are on the same object. For the example of a book at rest on a table, the gravitational force on the book and the force by the table on the book are identified as an action-reaction pair.
50  Effects (such as damage or resulting motion) dictate relative magnitudes of forces during interaction.
51  At rest, therefore interaction forces balance.
52  “Moves”, therefore interacting forces unbalanced.
53  Objects accelerate, therefore interacting forces unbalanced.
60  Force pairs are not identified as having equal magnitude because the objects are somehow different.
61  The “stronger” object exerts a greater force.
62  The moving object, or the one moving faster, exerts a greater force.
63  The more active/energetic object exerts more force.
64  The bigger/heavier object exerts more force.
90  Inanimate objects cannot exert a force.
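The four evidence-gathering dimensions (amount, range, saliency, clustering) could be computed from logged evidence in many ways. The Python sketch below is one hypothetical operationalization, not the project’s scoring method. It assumes each gathered item has already been tagged with its evidence type, whether it is salient to the true causal model, and the causal pathway it bears on. It also splits range into two parts: coverage, and an entropy-based balance score.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence a student gathered (hypothetical schema)."""
    kind: str      # type of evidence, e.g. "water_sample" (illustrative label)
    salient: bool  # True if it bears on the actual cause in the scenario
    cause: str     # causal pathway the evidence is affiliated with


def evidence_profile(items, minutes, all_kinds):
    """Score evidence gathering on four dimensions (illustrative definitions).

    Assumes every item's kind is one of all_kinds.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    counts = Counter(e.kind for e in items)
    n = len(items)
    # Amount: evidence items gathered per minute of time on task.
    amount = n / minutes
    # Range, part 1 (coverage): share of available evidence types sampled at least once.
    coverage = len(counts) / len(all_kinds)
    # Range, part 2 (balance): Shannon entropy of the type distribution,
    # normalized by its maximum over all available types (1.0 = perfectly even).
    if n and len(all_kinds) > 1:
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        balance = entropy / math.log(len(all_kinds))
    else:
        balance = 0.0
    # Saliency: fraction of gathered items that bear on the causal explanation.
    saliency = sum(e.salient for e in items) / n if n else 0.0
    # Clustering: how the gathered evidence groups by causal affiliation.
    clusters = dict(Counter(e.cause for e in items))
    return {"amount": amount, "coverage": coverage, "balance": balance,
            "saliency": saliency, "clusters": clusters}


if __name__ == "__main__":
    gathered = [
        Evidence("water_sample", True, "waterborne"),
        Evidence("interview", False, "airborne"),
        Evidence("insect_trap", True, "insect-borne"),
    ]
    kinds = {"water_sample", "interview", "insect_trap", "hospital_record"}
    print(evidence_profile(gathered, minutes=15, all_kinds=kinds))
```

Normalizing balance by the number of available types, rather than the number sampled, means a student who spreads effort evenly over only a few types still scores below 1.0. This choice is one of several defensible ones.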
Related Initiatives
Cisco-Intel-Microsoft global initiative on assessing 21st-century skills
Advances in European measures, such as PISA
Evolution of US tests, such as NAEP
Numerous other scholars working on games and simulations for learning and assessment

A Breakthrough in the Next Few Years, But Don’t Wait!

“Disruptive” Assessment
Rewarding achievement that is useful in the real world
Students see academic learning as relevant
Quality is measured in sophisticated ways along multiple dimensions
Rote teaching and learning are exposed as tragically inadequate
Learning and formative assessment are richly interwoven in engaging ways

Call for New Measures of Inquiry
Paper-and-pencil tests, such as the National Assessment of Educational Progress (NAEP), the Third International Math and Science Study (TIMSS), and the New Standards Science Reference Exams (NSSRE), don’t measure inquiry well and aren’t aligned with the NSES standards.
NAEP published its framework for a new science assessment in 2009; it calls for multiple modes of assessment, including interactive computer assessments.

“Immersive” Interfaces for Learning
Virtual reality: full sensory immersion via head-mounted displays or CAVEs
Multi-user virtual environments: immersion in virtual contexts with digital artifacts and avatar-based identities
Ubiquitous computing: wearable wireless devices coupled to smart objects for “augmented reality”

Affordances of Immersive Interfaces
The types of behaviors immersive interfaces can enable:
Complex situations with tacit clues
Simulated scientific instruments
Virtual experimentation
Simulated collaboration in a team
Adaptive responses to student choices
All documented in event logs and chat logs

Traditional Evaluation of Quality
Inferential methods: On average, students in the River City treatment scored 0.2 points higher on the general science inquiry self-efficacy section of the post-test affective measure (t = 2.22, p < .05).
On average, students in this sample who showed higher gains in general science inquiry self-efficacy scored higher on the post-test. These gains were higher for students in the River City project (n = 358).
Yet these results tell us nothing about the patterns, behaviors, and processes that lead to inquiry. We are also limited by the number of variables we can build into our inferential models.

Goals of the IES VPA Project
Proof of concept for immersive virtual performance assessments (IVPAs) that measure sophisticated intellectual and social skills
Establish higher validity than physical performance assessments (PPAs)
Establish higher reliability and usability than PPAs, as well as lower cost:
  no challenges of physical materials
  virtual worlds enable performances impossible in classrooms
  detailed tracking of participant behaviors
Respectable psychometrics compared to paper-and-pencil item-based tests
Establish that student engagement leads to every participant working hard to succeed:
  the importance of shifts in identity

Research Questions
Can we construct a virtual assessment that measures scientific inquiry, as defined by the National Science Education Standards (NSES)?
What is the evidence that our assessments are designed to test NSES inquiry abilities?
Are these assessments reliable?

Research Methods
Alignment studies
Cognitive analysis studies (think-alouds with students)
Generalizability study across three instances of the same assessment

Assessment Framework: Evidence-Centered Design
I. Domain Analysis
II. Domain Modeling
III. Conceptual Assessment Framework
IV. Assessment Implementation
V. Assessment Delivery
VI. Refinement

The Design Process Is Not Linear
(Diagram linking Domain Analysis, Domain Modeling, Conceptual Assessment Framework, and Assessment Implementation)

Domain Analysis
We analyzed different models for science inquiry:
NSES Standards (National Research Council, 1996)
Inquiry Cycle (White & Frederiksen, 1998)
Novice-expert models (Chi, Feltovich, & Glaser, 1981)
Scientific Discovery as Dual Search (SDDS) (Klahr, 2000)
Epistemological & Strategic (Kuhn & Pease, 2008)
NAEP Framework (NAEP, 2008)

Inquiry Models
“The whole of science is nothing more than a refinement of everyday thinking.” (Einstein, 1936, quoted in Klahr, 2000)
Inquiry is the way we think. Some people do it better: experts are doing something cognitively different in their heads.

Enhanced Assessment Platform
Use performance palettes to collect student work
Minimize the degree to which language-arts skills predict performance, via audio instruction and visual cues
Enable realistic use of tools anywhere in the world
Map of the context
Can vary the causal model, so the assessment can differ from one student or class to another, as long as each model has an equivalent amount of evidence collectable with equivalent time and effort
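The requirement that every causal-model variant offer an equivalent amount of evidence, collectable with equivalent time and effort, can be expressed as a simple check before variants are assigned. The Python sketch below is illustrative only. The variant structure, the per-clue time estimates, the 10% tolerance, and the hash-based assignment are all assumptions, not details of the VPA platform.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalModelVariant:
    """One version of the scenario's causal model (hypothetical structure)."""
    name: str
    cause: str            # the cause students should uncover in this variant
    clue_minutes: tuple   # estimated minutes to collect each salient clue


def equivalent(a, b, tolerance=0.1):
    """True if a and b offer the same number of salient clues and their total
    collection time differs by at most `tolerance` (fraction of the larger)."""
    if len(a.clue_minutes) != len(b.clue_minutes):
        return False
    ta, tb = sum(a.clue_minutes), sum(b.clue_minutes)
    return abs(ta - tb) <= tolerance * max(ta, tb)


def assign_variant(student_id, variants):
    """Deterministically assign a student to one of several equivalent variants."""
    reference = variants[0]
    for v in variants[1:]:
        if not equivalent(reference, v):
            raise ValueError(f"{v.name} is not equivalent to {reference.name}")
    # SHA-256 gives the same result on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    digest = hashlib.sha256(student_id.encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:4], "big") % len(variants)]


if __name__ == "__main__":
    a = CausalModelVariant("variant_A", "contaminated_water", (4, 6, 5))
    b = CausalModelVariant("variant_B", "insect_breeding", (5, 5, 5))
    print(assign_variant("s01", [a, b]).name)
```

Tolerance-based equivalence is not transitive, so the sketch compares every variant against the first one. A stricter design might require every pair of variants to match.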