Virtual Performance Assessments

Using Emerging Technologies to
Improve Student Achievement:
The Potential of Virtual
Performance Assessments
Chris Dede
Harvard University
Chris_Dede@harvard.edu
www.gse.harvard.edu/~dedech/
River City Introduction
Flawed Assessments Undercut
Student (and Teacher) Achievement
 “Drive-by” high-stakes tests frighten many students into suboptimal performance, which cumulatively leads to disengagement, low self-efficacy, and alienation
 Students are rightly wary of investing in knowledge that is tacitly not valued because it is not measured or rewarded
 Teachers are forced to emphasize test performance rather than domain mastery
Current Summative Tests Undercut
Achievement and Motivation
 Paper-and-pencil item-based tests are inexpensive, reliable, and practical – but not valid for higher-order thinking skills, such as scientific inquiry, or 21st-century skills, such as mediated collaboration
 Physical performance assessments are more valid for sophisticated skills, but unreliable, impractical, expensive, and limited in the types and number of tasks possible
The Assessment Triangle
 Cognition: a model of how students represent knowledge and develop competence in the domain
 Observation: tasks or situations that allow one to observe students’ performance
 Interpretation: methods for making sense of the data
Together, these three corners support reasoning from evidence.
Mediated Performances
are an Untapped Resource
 Cognition is distributed across human minds, tools/media, groups of people, and space/time; dispersed physically, socially, and symbolically
 Event-logs of performances and communications provide insights
 Distributed learning: collaborative, mediated, scaffolded, and data-generating
Types of Rich Datastreams
 Multi-User Virtual Environments: immersion in virtual contexts with digital artifacts and avatar-based identities
 Wikis and other forms of Web 2.0 media
 Asynchronous discussions
 Intelligent tutoring systems
 Games
 Augmented realities
What is a MUVE?
 An “Alice in Wonderland” experience where users enter a virtual space that has been configured for learning
 Learners represent themselves through graphical avatars to communicate with others’ avatars and computer-based agents, as well as to interact with digital artifacts and virtual contexts
River City
Figure 1: Lab Equipment inside the University
Figure 2: River Water Sampling
http://muve.gse.harvard.edu/rivercityproject
Evidence of Student Work


Assessment data:
 Pre-post content
 Pre-post affective
 Embedded assessments (formative)
 Performance assessment (summative)
Contextual data:
 Attendance records
 Demographic data
 School data
 Observations
 Interviews
Active data:
 Team chat
 Notebook entries
 Tracking of in-world activities: data-gathering strategies, pathways, inquiry processes
Event Logs as
Observational Data
Event logs indicate, with timestamps:
 Where students went
 With whom they communicated and what they said
 What artifacts they activated
 What databases they viewed
 What data they gathered using virtual scientific instruments
 What screenshots and notations they placed in team-based virtual notebooks
This yields unobtrusive observational data.
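As a hedged illustration of such an event log, the sketch below writes one timestamped record per student action as a JSON line; the field names and the log_event helper are hypothetical, not the actual River City schema.

```python
# Hypothetical sketch of a timestamped event-log record; the field names
# and JSON-lines format are illustrative, not the River City schema.
import json
import time

def log_event(log_file, student_id, event_type, detail):
    """Append one unobtrusive observational record with a timestamp."""
    record = {
        "timestamp": time.time(),   # when the action occurred
        "student": student_id,      # whose avatar acted
        "event": event_type,        # e.g. "move", "chat", "instrument"
        "detail": detail,           # event-specific payload
    }
    log_file.write(json.dumps(record) + "\n")

with open("events.jsonl", "a") as f:
    log_event(f, "s042", "move", {"room": "bog"})
    log_event(f, "s042", "chat", {"to": "s017", "text": "the water looks murky"})
    log_event(f, "s042", "instrument", {"tool": "water_sampler", "site": "river"})
```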
Student’s Role in the
River City MUVE
 Travel back in time six times between 1878 and 1879
 Bring 21st-century skills and technology to address 19th-century problems
 Help the town understand and solve part of the puzzle of why so many residents are becoming ill
 Work as a research team
 Keep track of clues that hint at causes of illnesses
 Form and test hypotheses in a controlled experiment
 Make recommendations based on experimental data
Capturing Data on
Change over Time
Visit 1: Fall, 1878
Visit 2: Winter, 1879
Visit 3: Spring, 1879
Visit 4: Summer, 1879
Students visit the same places and see how things change over time. They spend an entire class period in an individual season, gathering data.
“Evidence Gathering”

 An important, generic inquiry process, with four measurable dimensions (see the sketch after this list):
   amount (how much evidence per time spent)
   range (coverage/balance among all the types of evidence)
   saliency (importance of the evidence in understanding causality in the situation)
   clustering (grouping of evidence based on its causal affiliation)
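A minimal sketch of how these four dimensions might be computed from an event log, assuming each logged item carries a "type" and "cause" label and an analyst-supplied saliency weight per type; none of these names come from the River City codebase.

```python
# Hedged sketch: one plausible operationalization of the four
# evidence-gathering dimensions. The event structure, saliency
# weights, and causal labels are invented for illustration.
from collections import Counter

def evidence_metrics(events, minutes, all_types, saliency_weights):
    amount = len(events) / minutes                    # evidence per time spent
    types_seen = {e["type"] for e in events}
    coverage = len(types_seen) / len(all_types)       # range across evidence types
    saliency = (sum(saliency_weights.get(e["type"], 0.0) for e in events)
                / max(len(events), 1))                # mean importance of items
    clusters = Counter(e["cause"] for e in events)    # grouping by causal affiliation
    return {"amount": amount, "range": coverage,
            "saliency": saliency, "clustering": dict(clusters)}

events = [
    {"type": "water_sample", "cause": "waterborne"},
    {"type": "chat_clue", "cause": "insect-borne"},
    {"type": "water_sample", "cause": "waterborne"},
]
print(evidence_metrics(events, minutes=45,
                       all_types={"water_sample", "chat_clue", "microscope"},
                       saliency_weights={"water_sample": 1.0, "chat_clue": 0.5}))
```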
“Evidence Gathering”

 Foundational for other inquiry processes: hypothesis formation, experimental design, and argumentation
 Related to student attributes: self-efficacy, metacognition, engagement, and content knowledge
Virtual Performance
Assessments
 Funded by the Institute of Education Sciences
 Three-year grant
 Design three virtual performance assessments to assess middle-grade (6th and 7th) students' science inquiry learning in a standardized testing setting
 http://virtualassessment.org
NSES Model of Inquiry
 Identify questions that can be answered through scientific investigation (not independent of knowledge)
 Design and conduct a scientific investigation
 Use appropriate tools and techniques to gather, analyze, and interpret data
 Develop descriptions, explanations, predictions, and models using evidence
 Think critically and logically to make the relationships between evidence and explanations
 Recognize and analyze alternative explanations and predictions
 Communicate scientific procedures and explanations
 Use mathematics in all aspects of scientific inquiry
Authentic Environments
A Challenge on which Every Student has Roughly Equal Familiarity
Assessment Platform
 3-D Immersive Environment for Science Experimentation Based on an Authentic Setting
 Highly Secure, Cross-Platform Application Built in the Unity Framework
 Realistic, Complex Causal Model for Science Experimentation
Back End Architecture
 Real-Time Analysis of Student Paths
 All Interactions are Logged for Future Research
 Data Integrity Ensured by Encrypting Data Along the Way
 Complex Student Work Product is Recorded as XML, which can be Tokenized (see the sketch below)
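To illustrate the last point, the sketch below parses a logged work product and flattens it into (tag, text) tokens; the XML element names are invented, since the project's actual schema is not shown here.

```python
# Hypothetical sketch of tokenizing a student work product logged as XML.
# Element names are invented; only the record-as-XML idea is from the slide.
import xml.etree.ElementTree as ET

xml_record = """
<workProduct student="s042">
  <observation site="river">high bacterial count near the bog</observation>
  <hypothesis>the illness is waterborne</hypothesis>
</workProduct>
"""

root = ET.fromstring(xml_record)
# Flatten the tree into (tag, text) tokens for downstream analysis.
tokens = [(elem.tag, (elem.text or "").strip()) for elem in root.iter()]
print(tokens)
# [('workProduct', ''), ('observation', 'high bacterial count near the bog'),
#  ('hypothesis', 'the illness is waterborne')]
```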
EcoMUVE (www.ecomuve.org)
Formative/Diagnostic
 Formative, diagnostic assessment provides more leverage for improvement than summative measures
 Formative, diagnostic assessment is richer and more accurate than summative measures
 Potentially, formative, diagnostic assessment could substitute for summative measures
Module 1: Pond Ecosystem
Modeled after Black’s Nook Pond in Cambridge, MA
“Submarine” Tool
Instruction and Assessment
based on Learning Trajectories
Table 1: Forces as Interactions facet cluster (Krauss & Minstrell, 2002)
00  All forces are the result of interactions between two objects. Each object in the pair interacts with the other object in the pair. Each influences the other.
01  All interactions involve equal-magnitude and oppositely directed action and reaction forces that are on the two separate interacting bodies.
40  Equal force pairs are identified as action and reaction but are on the same object. For the example of a book at rest on a table, the gravitational force on the book and the force by the table on the book are identified as an action-reaction pair.
50  Effects (such as damage or resulting motion) dictate relative magnitudes of forces during interaction.
    51  At rest, therefore interaction forces balance.
    52  “Moves,” therefore interacting forces unbalanced.
    53  Objects accelerate, therefore interacting forces unbalanced.
60  Force pairs are not identified as having equal magnitude because the objects are somehow different.
    61  The “stronger” object exerts a greater force.
    62  The moving object, or the one moving faster, exerts a greater force.
    63  The more active/energetic object exerts more force.
    64  The bigger/heavier object exerts more force.
90  Inanimate objects cannot exert a force.
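One way such a facet cluster can drive diagnostic feedback is as a simple lookup from facet code to description, with the convention that codes in the 0x band are goal facets and higher bands are progressively more problematic; the abridged entries and the diagnose helper below are illustrative only.

```python
# Illustrative sketch: the facet cluster as a lookup table for diagnosis.
# Descriptions abridged from Table 1 (Krauss & Minstrell, 2002); the
# diagnose() helper and the banding rule are assumptions for illustration.
FACETS = {
    "00": "All forces result from interactions between two objects (goal facet)",
    "40": "Equal force pairs identified, but placed on the same object",
    "61": "The 'stronger' object exerts a greater force",
    "90": "Inanimate objects cannot exert a force",
}

def diagnose(facet_code):
    """Report the facet a response reflects and whether it is problematic."""
    description = FACETS.get(facet_code, "uncoded response")
    problematic = not facet_code.startswith("0")  # 0x codes are goal facets
    return description, problematic

print(diagnose("61"))  # ("The 'stronger' object exerts a greater force", True)
```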
Related Initiatives
 Cisco-Intel-Microsoft global initiative on assessing 21st century skills
 Advances in European measures, such as PISA
 Evolution of US tests, such as NAEP
 Numerous other scholars working on games and simulations for learning and assessment
A Breakthrough in the Next Few Years – But Don’t Wait!
“Disruptive” Assessment
Rewarding Achievement Useful in the Real World
 Students see academic learning as relevant
 Quality is measured in sophisticated ways along multiple dimensions
 Rote teaching and learning are exposed as tragically inadequate
 Learning and formative assessment are richly interwoven in engaging ways
Call for New Measures
of Inquiry
 Paper-and-pencil tests, such as the National Assessment of Educational Progress (NAEP), the Trends in International Mathematics and Science Study (TIMSS), and the New Standards Science Reference Exams (NSSRE), don’t measure inquiry well and aren’t aligned with the NSES standards
 In 2009, NAEP published a framework for establishing a new science assessment that calls for multiple modes of assessment, including interactive computer assessments
“Immersive” Interfaces
for Learning
 Virtual Reality: full sensory immersion via head-mounted displays or CAVEs
 Multi-User Virtual Environments: immersion in virtual contexts with digital artifacts and avatar-based identities
 Ubiquitous Computing: wearable wireless devices coupled to smart objects for “augmented reality”
Affordances of
Immersive Interfaces
The types of behaviors immersive interfaces can enable:
 Complex situations with tacit clues
 Simulated scientific instruments
 Virtual experimentation
 Simulated collaboration in a team
 Adaptive responses to student choices
All documented in event-logs and chat-logs
Traditional Evaluation of Quality
Inferential methods:
On average, students in the River City treatment scored .2 points higher on the post self-efficacy in general science inquiry section of the affective measure (t = 2.22, p < .05). On average, students in this sample who saw higher gains in self-efficacy in general science inquiry scored higher on the posttest. These gains were higher for students in the River City project (n = 358).
Yet these results tell us nothing about the patterns, behaviors, and processes that lead to inquiry. We are also limited by the number of variables we can build into our inferential models.
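For concreteness, here is the kind of two-sample comparison that produces such a statistic, run on synthetic data; the study's actual scores are not reproduced, and the group sizes and score distributions below are invented.

```python
# Hedged sketch of the inferential comparison reported above, using
# synthetic data; group sizes and distributions are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(3.4, 0.9, 358)  # post self-efficacy, River City group
control = rng.normal(3.2, 0.9, 200)    # comparison condition

t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.3f}")     # cf. the reported t = 2.22, p < .05
```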
Goals of IES VPA Project
Proof of Concept for Immersive Virtual Performance Assessments (IVPAs) that Measure Sophisticated Intellectual/Social Skills
 Establish higher validity than physical performance assessments (PPAs)
   Virtual worlds enable performances impossible in classrooms
 Establish higher reliability and usability than PPAs, as well as lower cost
   No challenges of physical materials
   Detailed tracking of participant behaviors
   Respectable psychometrics compared to paper-and-pencil item-based tests
 Establish that student engagement leads to every participant working hard to succeed
   The importance of shifts in identity
Research Questions
 Can we construct a virtual assessment that measures scientific inquiry, as defined by the National Science Education Standards (NSES)?
 What is the evidence that our assessments are designed to test NSES inquiry abilities?
 Are these assessments reliable?
Research Methods
 Alignment studies
 Cognitive analysis studies (think-alouds with students)
 Generalizability study across three instances of the same assessment
Assessment Framework
Evidence Centered Design
 I. Domain Analysis
 II. Domain Modeling
 III. Conceptual Assessment Framework
 IV. Assessment Implementation
 V. Assessment Delivery
 VI. Refinement
Design Process is Not Linear
The phases (Domain Analysis, Domain Modeling, Conceptual Assessment Framework, Assessment Implementation) iterate rather than proceeding in a fixed sequence.
Domain Analysis
We analyzed different models for science inquiry:
 NSES Standards (National Research Council, 1996)
 Inquiry Cycle (White & Frederiksen, 1998)
 Novice-expert models (Chi, Feltovich, & Glaser, 1981)
 Scientific Discovery as Dual Search (SDDS) (Klahr, 2000)
 Epistemological & Strategic (Kuhn & Pease, 2008)
 NAEP Framework (NAEP, 2008)
Inquiry Models
“The whole of science is nothing more than a refinement of everyday thinking.”
-- Einstein, 1936 (quoted in Klahr, 2000)
Two contrasting views: inquiry is simply the way we all think, and some people do it better; versus experts are doing something cognitively different in their heads.
Enhanced Assessment
Platform
 Use Performance Palettes to Collect Student Work
 Minimize the Influence of Language Arts Skills via Use of Audio Instruction and Visual Cues
 Enable Realistic Use of Tools Anywhere in the World
Map of the Context
The causal model can vary, so the assessment can differ from one student or class to another – as long as each model has an equivalent amount of evidence collectable with equivalent time and effort (see the sketch below).
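A minimal sketch of that equivalence constraint, assuming each causal-model variant lists its collectable evidence items with an effort cost; the data structures and tolerance are hypothetical.

```python
# Hypothetical sketch: verify two causal-model variants expose an
# equivalent amount of evidence for equivalent time and effort.
def equivalent(variant_a, variant_b, tolerance=0.05):
    def totals(variant):
        evidence = variant["evidence"]
        return len(evidence), sum(item["effort"] for item in evidence)
    n_a, effort_a = totals(variant_a)
    n_b, effort_b = totals(variant_b)
    # Same number of evidence items, and total effort within tolerance.
    return n_a == n_b and abs(effort_a - effort_b) <= tolerance * effort_a

model_a = {"evidence": [{"id": "e1", "effort": 2.0}, {"id": "e2", "effort": 3.0}]}
model_b = {"evidence": [{"id": "f1", "effort": 2.5}, {"id": "f2", "effort": 2.5}]}
print(equivalent(model_a, model_b))  # True: same count, same total effort
```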
Back End Architecture (diagram)