Proposal Information

Proposal Number: 0310164
Proposal Title: ROLE: Research on Learning in Virtual Worlds for Education
Received on: 12/02/02
Principal Investigator: Brian Slator
Co-PI(s): Lisa Daniels
Institution: North Dakota State U Fargo
This proposal has been electronically signed by the Authorized Organizational Representative (AOR).

Program Information

NSF Division: DIV OF RSCH, EVALUATION AND COMMUNICATION
Program Name: RESEARCH ON LEARNING & EDUCATION
Program Officer: James S. Dietz
Telephone: (703) 292-5156
E-Mail: jdietz@nsf.gov

Review Information

External peer review began on 01/23/03.

Proposal Status

Status as of 04/21/03: This proposal has been declined by NSF. The Principal Investigator has been sent information about the review process and anonymous verbatim copies of reviewers' comments.

Reviews

All of the reviews of your proposal that have been released to you by your NSF program officer can be viewed below. Please note that the Sponsored Research Office (or equivalent) at your organization is NOT given the capability to view your reviews.

Context Statement: The Review Process for this Competition

The Research on Learning and Education program (ROLE; NSF 02-023) received 86 proposals for its sixth competition date of December 2, 2002. The program expects to fund approximately 15 awards from this competition. Proposals were discussed by panels of 10 reviewers who met at NSF on January 21 to 24, 2003. Written reviews of each proposal were provided by panel members. In some cases, additional ad hoc reviews were provided by outside readers. Each reviewer rated the merit of the proposal as Excellent, Very Good, Good, Fair, or Poor. For each proposal, one panel member wrote a summary of the panel's discussion. The panel as a whole rated each proposal as Highly Competitive, Competitive, or Not Competitive. In the present competition, 12 proposals were rated Highly Competitive, 27 Competitive, and 47 Not Competitive.
Staff Analysis

Intellectual Merit: The panel believed that this proposal, on formative scenario assessments embedded in online environments, represented an exciting idea worthy of demonstration and exploration. Panelists felt that some of the needed expertise was represented on the team. However, there were a number of concerns. First, panelists felt that expertise in assessment and student learning was not well represented on the team. In particular, the panel recommended that the researchers address the literature on assessment and student learning and consider adding expertise in these areas. Second, the panel was not comfortable with the proposed reliance on expert judgment of the face validity of the assessments and with the lack of measurement of opportunity to learn. Third, the data collection and analysis methodology was not well specified, nor were the treatment and control conditions or how the research would be implemented.

Broader Impact: The panel believed that this research has potential for broader impact that was not exploited in the proposal. Panel and staff recommend that the researchers consider measuring possible differential effects/outcomes for women and underrepresented racial and ethnic groups. No dissemination plan was included.

In sum, this proposal generated some excitement among the panelists, but critical areas of the literature and of education research design were missing. The investigators are invited to discuss plans for a future submission with a ROLE program official.

Panel Summary #1

PROPOSAL NO.: 0310164

PANEL SUMMARY: This proposed project concerns learning and assessment in immersive virtual environments. The investigators are from North Dakota State University, and they will be building on funded work on two existing virtual worlds, Geology Explorer and Virtual Cell. A new project, the Virtual Archaeologist, will serve as a testing ground.
The newly proposed work is largely focused on what the investigators call "scenario assessments," which are like interviews around situations akin to those that might be encountered in the virtual worlds. They want to develop intelligent software that can conduct these interactive scenario assessments, and they want to integrate some of this intelligence into a tutoring agent for the virtual environment. In addition to this software development, they also propose to conduct studies using a variety of empirical methods, including an ambitious longitudinal study.

Overall, panelists expressed enthusiasm for the direction of the proposal. In particular, they felt that using technology in the service of creating more authentic assessments was an admirable goal. One panelist stated that the development of the intelligent software agents, both for the interactive scenario assessments and for use in the virtual environment, would constitute an important intellectual contribution. However, all panelists agreed that the investigators did not appear to have the expertise needed to conduct important aspects of the proposed project, and that this was strongly evident in the proposal. Given that the proposal is framed as having a focus on assessment, the lack of any evidence of knowledge of the literature on assessment was seen as a fatal flaw. Similarly, the proposed evaluation plan did not seem to have the relevant grounding. For example, it appeared that assessment instruments would be judged only on face validity by domain experts, and would not be grounded in any understanding of the relevant literature on scientific thinking and learning. One panelist commented that there is no real look at what kids' opportunity to learn is within the learning environment; that is, there is no analysis of the treatment kids received and what outcomes that treatment could realistically be hoped to engender.
Some panelists commented that the cost of the proposed work was not too high, and that it might make a nice demonstration project if scaled back. Panelists also made suggestions for how to create a more competitive proposal in the future. One suggestion was to bring in assessment expertise. Another was for the investigators to stick more closely to issues where they have the most expertise. For an overview of the research on assessment, a panelist recommended the NRC report "Knowing What Students Know."

About broader impact: The excitement generated by virtual worlds is a reason to hope for broad impact. However, given the limitations discussed above, panelists felt that the broad impact of the project proposed here would not be great. Some panelists also made more specific comments about parts of the proposal that discussed impact. One panelist stated that he did not find the plan for dissemination convincing. Another panelist stated that the applicants are not clear about why the groups served are underrepresented.

PANEL RECOMMENDATION: Not Competitive

Review #1

PROPOSAL NO.: 0310164
INSTITUTION: North Dakota State U Fargo
NSF PROGRAM: RESEARCH ON LEARNING & EDUCATION
PRINCIPAL INVESTIGATOR: Slator, Brian M
TITLE: ROLE: Research on Learning in Virtual Worlds for Education
RATING: Good

REVIEW:

What is the intellectual merit of the proposed activity?

The authors propose work concerning learning and assessment in immersive virtual environments. The investigators are from North Dakota State University, and they will be building on funded work on two existing virtual worlds: Geology Explorer and Virtual Cell. A new project, the Virtual Archaeologist, will serve as a testing ground. The newly proposed work is largely focused on what they call "scenario assessments," which are like interviews around small situations, like those that might be encountered in the virtual worlds.
They want to develop intelligent software that can conduct these interactive scenario assessments, and they want to integrate some of this intelligence into a tutoring agent for the virtual environment. In addition to this software development, they also propose to conduct studies using a variety of empirical methods, including an ambitious longitudinal study.

The biggest problem is one of focus. Initially, the proposal suggests that this work will be focused on developing and validating scenario assessments. But, at the end, there's a large longitudinal study that at least seems to have very broad aims. In fact, near the end, there's a large table called the "evaluation plan" that lists many kinds of data collection that are not mentioned anywhere else.

The grounding in theories of learning and pedagogy is a bit weak. They state that research teaches us that learners are not "blank slates" and that learning is an "active process." These statements are so superficial that they can't come anywhere near providing a rationale for immersive virtual worlds or any other aspect of this work. There is a similar problem with the pedagogical rationale. The heuristic they propose is to "recreate real-world experiences in virtual worlds" in such a way that students can have experiences like those of scientists. But this must be overly simplistic. The real-world experiences of scientists span years and hundreds of hours. Even in an immersive virtual world, these experiences must be boiled down into a handful of hours, and the tasks must be radically narrowed. For these reasons, activities in immersive virtual worlds may not differ as fundamentally from more traditional activities as one might hope. There's still the problem of how to radically simplify science so that it can be made tractable in the classroom.

What are the broader impacts of the proposed activity?

The technology here really pushes the envelope of what's possible.
Also, it's a little hard to tell, but it seems that they are proposing to put intelligence into software in a way that is sensible. And they're doing this using, in a sensible manner, some existing software resources such as WordNet. The big question, therefore, has to be whether this will be worth the effort. In response, I want to point to just one thing that I found worrying. They cite data suggesting that they get strong, significant effects when they compare students in their treatments to control subjects. But (and I'm not sure about this) it seems that they get these effects on scenario assessment items that are very similar to tasks that the students in their treatment have been exposed to in the virtual worlds. If that's right, then it's not that important a result that their students do better on their tasks; of course students do better on tasks that are very familiar. So this makes me wonder what the evidence really is that supports more effort here. At the least, I'd like to see more care in the interpretation of outcome measures.

Part of the problem here is calibration of outcomes. Suppose we had in mind that we wanted students to come out doing well on a certain kind of scenario assessment. How hard would it be to achieve this? Could we do it without immersive virtual worlds? Part of the problem is the lack of much of a theory of learning here. They are seeking to improve "scientific reasoning" and "scientific problem solving." But as far as I know, there isn't any consensus about what these things mean. So it's hard to know what counts as an appropriate assessment.

Summary Statement

I believe that this boils down to infusing more money into various aspects of a large, ongoing project, a project that is already quite interesting. As such, this might not be such a bad thing. But we should be clear that there may not be a well-defined slice of the project that is being funded here. This is particularly true of the empirical work.
Review #2

PROPOSAL NO.: 0310164
INSTITUTION: North Dakota State U Fargo
NSF PROGRAM: RESEARCH ON LEARNING & EDUCATION
PRINCIPAL INVESTIGATOR: Slator, Brian M
TITLE: ROLE: Research on Learning in Virtual Worlds for Education
RATING: Poor

REVIEW:

What is the intellectual merit of the proposed activity?

This proposal focuses on the development of scenario-based assessments to support virtual learning experiences in science education. It relies on content expertise to establish the face validity of the scenarios, to ensure that they reflect scenarios that actual scientists might encounter. Included as part of the project are software development for the learning environment and assessments, studies exploring past assessment data collected on the learning environment, and longitudinal studies of the effects on student learning.

The most promising aspect of this proposal is the focus on scenario-based assessment design. As the PIs note, there is a need for assessments that match authentic learning experiences in science and target higher-order thinking skills. Developing such assessments is an expensive undertaking, and new technologies of the sort described in the proposal promise to make these types of assessments easier to develop and scale. Already, the particular technologies that these assessments will draw on have proven successful in the design of scenario-based learning environments. The emphasis on fidelity to scientific practices in the scenarios is also promising, though science standards also emphasize developing students' basic literacy with scientific concepts for informed citizenry, which requires different kinds of assessment scenarios. There are some key issues, however, that would need to be addressed for this proposal to be successful.
There is limited assessment expertise on this team, and evidence of the use of principled assessment design is limited here: there is no assessment blueprint articulating the key student concepts to be mastered; the content experts' scenarios are not to be checked against those concepts or themselves reviewed via think-alouds; and there is no plan for validation of the instruments, despite this being a focus of the proposal. I would direct the researchers to the National Research Council publication, Knowing What Students Know (2000), for a sense of where the field of assessment is headed more broadly, and to the work of Black and Wiliam on formative classroom assessment. In addition, the NRC has recently published a volume on classroom assessment in science that the researchers are likely to find useful.

Also, the proposed studies are not likely to provide adequate means to address the key research questions. First, the exploration of assessment data does not look into the possibility that it is not the form of the assessment, but rather student opportunity to learn, that predicts student performance. Measuring opportunity to learn would be a beneficial addition to the validation of these assessments, providing a check of the instructional sensitivity of the scenarios (do students have the opportunity to learn, through the instructional activities in the online environment, the skills that are tested on the assessments?). Second, the longitudinal study design is not well articulated. It is unclear who the treatment or comparison groups are, and whether both treatment and comparison groups are to be used.

What are the broader impacts of the proposed activity?

Technology does indeed promise to make assessments that are better matched to today's standards for learning in math and science education more scalable and easier to create. However, the process articulated is inadequate to achieve that promise on a number of counts.
First, there are no steps that link the assessment scenarios created back to student learning goals. This is an explicit step in assessment development, even when face validity is to be assured by expert content review. Second, the creation of full-fledged scenarios, rather than templates for scenarios, may not prove as scalable to other contexts, such as K-12 classrooms. Some of the more successful scenario-building tools developed at ILS are in fact templates for building scenarios, which may scale more easily. In producing a scalable system for assessment design, it would be more beneficial to have content reviewers construct templates, and to test whether educators could create their own assessments with help from the system.

Summary Statement

This proposal focuses on the development of assessment and learning activities within a multi-user learning environment in science. The assessment activities will be scenario-based, designed to test students' problem-solving skill in situations that actual scientists might face. While the approach, scenario-based assessment, is promising, the proposal lacks a clearly articulated process for assessment validation and may not scale as well as the PIs believe, since all scenarios rely on experts, rather than educators, for their validity.

Review #3

PROPOSAL NO.: 0310164
INSTITUTION: North Dakota State U Fargo
NSF PROGRAM: RESEARCH ON LEARNING & EDUCATION
PRINCIPAL INVESTIGATOR: Slator, Brian M
TITLE: ROLE: Research on Learning in Virtual Worlds for Education
RATING: Fair

REVIEW:

What is the intellectual merit of the proposed activity?

It is important to consider the integration of assessment and learning environments. But how different is this from traditional design?

What are the broader impacts of the proposed activity?

This proposal focuses on a college-level audience, but could be scaled to other levels.
I can see these learning environments being used in the context of higher education, but not necessarily in pre-college settings, given the constraints imposed by No Child Left Behind.

Summary Statement

An interesting demonstration project that is not exceedingly expensive. I worry about the methodology being too soft and lacking in specificity. Given the relatively small budget (in comparison to other development or demonstration projects), this work might be cost-effective in producing some interesting results at relatively low cost.

Review #4

PROPOSAL NO.: 0310164
INSTITUTION: North Dakota State U Fargo
NSF PROGRAM: RESEARCH ON LEARNING & EDUCATION
PRINCIPAL INVESTIGATOR: Slator, Brian M
TITLE: ROLE: Research on Learning in Virtual Worlds for Education
RATING: Fair

REVIEW:

What is the intellectual merit of the proposed activity?

Studies that address the inappropriate, distorting impact of assessment on instruction in a systematic and positive manner should be applauded. But much has been written about assessment that is not mentioned in this application, and the proposal is much the weaker for it. I am also not convinced about the scenarios described. Do these really grip kids? In what way are they authentic?

What are the broader impacts of the proposed activity?

The study claims to be designed so that it will 'encourage the participation of underrepresented groups.' The applicants are not clear about why the groups are underrepresented, viz. 'Whatever the reason, minority and low socio-economic student groups have historically been out-performed on traditional assessments by their non-minority, higher-income counterparts.' My feeling is that without an explanation for this, it is hard to have faith in the alternative model that they propose. Or at least they should be interested in investigating why their approach might treat everyone more fairly. The proposal also seems weak on dissemination on first reading.
Summary Statement

I have the feeling that the technology is being used because it has potential rather than because it is a proven tool. So the study is much more experimental than one might think. I remain to be convinced that the tasks will motivate kids to learn much useful science.

Review #5

PROPOSAL NO.: 0310164
INSTITUTION: North Dakota State U Fargo
NSF PROGRAM: RESEARCH ON LEARNING & EDUCATION
PRINCIPAL INVESTIGATOR: Slator, Brian M
TITLE: ROLE: Research on Learning in Virtual Worlds for Education
RATING: Good

REVIEW:

What is the intellectual merit of the proposed activity?

The goal of this proposal is to externally validate an intelligent, interactive, interview-based tool for collecting scenario-based assessment data used to assess students' authentic experiences in immersive virtual environments. The researchers also intend to conduct longitudinal studies examining the influence of software tutoring agent strategies on higher-level thinking, transfer, and retention, tracking students over three years. The research protocol presented involves multiple methods, including quantitative and qualitative data gathering, to provide formative assessment information for modifying the instructional materials. In addition, the researchers propose to integrate two artificial intelligence technologies (Ask and WordNet) to provide an intelligent interactive assessment system called the Subjective Learner Assessment Technology (SLATE) tool.

The researchers seem well qualified to conduct this study, although there is some concern about the investigators' available time to commit to this project alongside several other current grant awards. Conducting longitudinal studies that validate the use of alternative, intelligent, authentic assessment systems seems a worthwhile goal. Unfortunately, the proposal details significant formative evaluation methods but does not fully address examining the validity of the authentic assessment activities.
It is not clear whether the proposed plan for validating the assessment method, as reflected in the research questions, is based on sound methodology. Limited information is provided in the proposal related to the theoretical framework for assessing student learning in science through scenarios and intelligent, automated interview systems. The researchers propose to conduct a longitudinal investigation of the effects of the software tutoring agents in the automated SLATE system on higher-level thinking, transfer, and retention outcomes; however, they have not yet determined or created the assessment measures for this effort.

What are the broader impacts of the proposed activity?

The validation of authentic assessment is a complex endeavor, and externally validating an automated authentic interview assessment tool seems quite complex. The broad impact of the proposed study is limited as currently represented. The proposal attempts to address the participation of underrepresented groups but lacks a strong plan for disseminating the potential findings. These are important factors to address.

Summary Statement