Elissa Vaidman, ETEC 551
Research Review: A MUVE Towards PBL Writing

SUMMARY

Introduction

There is no doubt that video games are at the top of the entertainment list. In fact, video game sales are currently neck and neck with Hollywood, and may soon surpass the entertainment giant. While there is an evident decline in student motivation, especially during the transition from elementary school to junior high, there has been a steady increase in video game usage across all ages (114). Young learners' interest in video games is quite apparent, as is their disinterest in the current model of instructor-led learning. Many software developers and instructional designers have seen potential in melding those two worlds in an attempt to create educational games that are both entertaining and effective. Although there have been some successful educational software titles (Oregon Trail, Math Blaster, etc.), there has been little empirical research on the actual effect educational games have on learning. Studies have often shown a correlation between the games and learning, but until recently most of the research consisted of qualitative analysis, anecdote, or case studies.

Subjects

The study used 44 students who were randomly divided between two fourth-grade classrooms. The teachers were selected specifically for their teaching attributes. The treatment teacher was selected because she self-reported discomfort with technology and had not used the technology-enhanced learning environment with her students; student success therefore could not be attributed to past knowledge of or experience with the introduced program. The comparison teacher, who would lead the reading unit through traditional face-to-face instruction, was chosen because she was a consistent user of technology and of problem-based learning assignments. Because her usual curriculum already utilized PBL, the researchers were also able to use it for the study rather than create a control curriculum for her.

Instruments

The study used three categories of assessment classified by Hickey and Pellegrino (2005): close, proximal, and distal. The close measures (also called activity-oriented assessment) were the completed writing assignments the students submitted (virtually for the Anytown students, and as hard copy to the instructor for the face-to-face group). A baseline was established for the Anytown students through a "welcome" activity. Trained evaluators rated the improvement of the writing using rubrics for state standardized tests. They were trained by the lead researcher and a lead teacher-rater, each of whom had more than five years of experience evaluating writing prompts for standardized tests. As a group, the raters (led by the experienced teacher) would discuss their ratings until they reached unanimous agreement. Proximal measures (also called curriculum-oriented assessment) were evaluated by comparing the scores of the students' pretests and posttests; again, trained evaluators rated each student's improvement and compared it with that of the other students. Distal measures (also called standards-oriented assessment) were evaluated with rubrics provided by and validated against the state's standards. Responses were graded on a 4-point scale by the same trained teachers.
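The article describes consensus scoring (raters discussing until unanimous) rather than reporting a formal reliability coefficient. As a rough illustration of how initial agreement among three raters could be quantified before any consensus discussion, here is a minimal Python sketch; the rater labels and scores are invented for illustration only.

```python
from itertools import combinations

# Hypothetical 4-point rubric scores from three trained raters for the
# same set of student writing samples; all values are invented.
ratings = {
    "rater_a": [3, 2, 4, 3, 1, 2, 3, 4],
    "rater_b": [3, 2, 3, 3, 1, 2, 3, 4],
    "rater_c": [3, 1, 4, 3, 2, 2, 3, 4],
}

def exact_agreement(x, y):
    """Proportion of samples on which two raters gave identical scores."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

# Average pairwise exact agreement across the three raters.
pairs = list(combinations(ratings, 2))
mean_agreement = sum(
    exact_agreement(ratings[p], ratings[q]) for p, q in pairs
) / len(pairs)

print(f"Mean pairwise exact agreement: {mean_agreement:.2f}")
```

A low initial agreement would suggest the final consensus scores owe more to the discussion than to the rubric itself, which is why studies often report such a statistic alongside consensus scoring.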
Hypothesis

The researchers developed three hypotheses about the effectiveness of PBL in the virtual environment of Anytown. The first hypothesis stated that the amount of time the teacher in the virtual environment spent answering procedural and directional questions would be less than the amount of time spent by the face-to-face instructor. The second hypothesis stated that the students in the treatment condition would voluntarily complete more non-required assignments than would the students in the face-to-face classroom. The third hypothesis stated that the quality of descriptive writing achievement for the students in the treatment condition would be greater than that of the students receiving face-to-face instruction. Naturally, the researchers expected each hypothesis to be supported at a statistically significant level.

Procedure

Every student was given a pretest and a posttest. Each student drew from a hat which of two tests they would take as the pretest and then took the other as their posttest; the pre- and posttests were therefore counterbalanced. The independent variable for the study was the type of instruction: face-to-face problem-based instruction, referred to in the study as the Reading Curricular Unit, versus multi-user virtual environment problem-based learning, referred to as the Anytown Language Arts Unit. The dependent variables were student achievement on the posttest writing assignment taken from the state standardized exam, students' submitted work for the unit, and the amount of time the teacher spent answering questions that were procedural or directional in nature.

The students in the Anytown unit were to complete their unit solely within Anytown. They completed problem-based writing activities designed to "prompt the practice of descriptive writing, engagement in problem solving and student reflection upon their own personal experiences." Students were given four tasks. The first (Writing Quests) was required of all students; the other three were optional, and students could choose them as "free-choice" activities while they waited for feedback on the required task. In-game tutorials and resources were expected to empower the students to explore the environment on their own with little need for teacher direction. The students in the face-to-face Reading unit were given the instructor's normal curriculum, since it was already comparable to the Anytown instruction. However, the teacher was given the standards addressed in the Anytown environment and its assessment measures, so she created voluntary assignments as well.
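To make the counterbalancing in the procedure concrete, here is a minimal sketch of the drawing-from-a-hat assignment, assuming two interchangeable test forms; the form labels and roster are invented, since the article does not describe the mechanics beyond the hat.

```python
import random

students = [f"student_{i:02d}" for i in range(1, 45)]  # 44 students, as in the study

# The "hat": equal numbers of slips for each test form, so the pretest
# assignment is exactly counterbalanced across the group.
hat = ["Form A", "Form B"] * (len(students) // 2)
random.shuffle(hat)

assignments = {}
for student, pretest in zip(students, hat):
    # Whichever form was not drawn as the pretest becomes the posttest.
    posttest = "Form B" if pretest == "Form A" else "Form A"
    assignments[student] = {"pretest": pretest, "posttest": posttest}

print(assignments["student_01"])
```

The point of the design is that neither form is systematically the "pre" or "post" measure, so a difficulty difference between the forms should wash out across the group rather than bias the pre/post comparison.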
Analysis

For the first hypothesis, the researchers used transcripts of recorded class sessions to distinguish between questions that were procedural and questions that were educational in nature. They then calculated the amount of time spent on procedural questions for both the treatment group and the comparison group. The second hypothesis was evaluated by collecting all completed writing assignments that were offered as optional rather than required; the researchers then compared the number of voluntary tasks completed in the Anytown unit with the number completed in the face-to-face unit. For the third hypothesis, evaluations were conducted at the three measurement levels: close, proximal, and distal. For the close-level scores, researchers collected the student work completed over the course of the Anytown unit, and each piece was scored by three teachers. These scores were produced only for the Anytown unit because "quests" were not a part of the face-to-face unit. Proximal-level scores were based on a state-approved rubric for standardized tests; three teachers graded each pretest and posttest on a 6-point scale, and a repeated-measures analysis was conducted to compare the pre- and posttest scores for students in both units. Distal-level scores were determined on a 4-point scale that compared scores on the first quests with those on the final quests.
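The analysis rests on paired comparisons: two scores from the same student (or the same session) are compared against each other rather than across independent groups. As a minimal sketch of how the simplest such comparison could be computed with SciPy, here are invented 6-point rubric scores for sixteen students; the study's actual data and repeated-measures model are in the article.

```python
from scipy import stats

# Hypothetical 6-point rubric scores for one group of students, scored
# before and after the unit (all values invented for illustration).
pretest  = [2, 3, 2, 4, 3, 2, 3, 1, 2, 3, 4, 2, 3, 2, 3, 2]
posttest = [3, 4, 3, 4, 4, 3, 4, 2, 3, 4, 4, 3, 4, 3, 4, 3]

# Paired-samples t-test: each student serves as their own control, so the
# test is effectively run on the per-student difference scores.
result = stats.ttest_rel(posttest, pretest)
print(f"t({len(pretest) - 1}) = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

With sixteen paired observations the test has fifteen degrees of freedom; the same kind of paired test could, in principle, be applied to the per-session teacher-time data behind the first hypothesis.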
Results

Hypothesis 1: A paired-samples t-test showed a significant difference (t(15) = 5.947, p = .043) between the time spent on procedural questions by the face-to-face instructor and by the Anytown instructor. From the diagram on page 129, we can see that the treatment teacher on the whole spent far less time answering directional questions and, by the fourth day, had a significant reduction in time spent on questions, dropping from 19 minutes to an average of 6 minutes.

Hypothesis 2: Students in the Anytown group completed or worked on 26 voluntary tasks; no students in the face-to-face unit completed any. A firm conclusion cannot be drawn from these results because, while they show the Anytown students were more inclined to attempt optional activities, the reason why cannot be ascertained. It could be the motivating tools in the game (students could collect rewards, etc.) or the novelty of using a computer. Because the cause is unknown, these results are inconclusive.

Hypothesis 3: Close-level scores were produced only for the treatment group, based on the required and non-required assignments completed in the Anytown environment; the study found no significant differences between scores on mandatory writing "quests" and voluntary "quests." At the proximal level of measurement, the study found significant differences between pretest and posttest scores in favor of the treatment group. Finally, at the distal level of measurement, the study found a significant difference between initial and final quest scores across classes.

Conclusions

The study concluded that the MUVE does minimize the time the teacher must spend answering procedural questions. It also concluded that the MUVE improves student writing ability significantly more than a standard face-to-face classroom does, and that it increases voluntary writing practice. These conclusions support the use of the Anytown environment for writing practice, as it keeps students engaged in their learning.

CRITIQUES

Introduction

Although the researchers addressed a problem that had not specifically been studied before, there has been research on educational games in other subjects, such as math and science, that found positive results; it is therefore to be expected that they would find results in their favor. It is also important to note at the outset that one of the researchers involved (Sasha Barab) is involved in the development of Quest Atlantis. This alone should bring some skepticism to the validity of the study, as the researchers may have had a vested interest in the outcome.

Review of Literature

While there have been studies supporting the effectiveness of video games in subjects such as math and science, the authors argue that little research has been conducted on the effectiveness of video games for writing skills. In most situations, technology use for writing involves word processing rather than tools that "enhance writing instruction, provide feedback, or encourage reflection" (117). Interestingly, the researchers do not distinguish between typing practice and writing practice. It is important to distinguish the two, because students may have a more difficult time typing if they have not had significant practice. Also, students in the classroom do not all receive the same amount of computer time and practice, which places some of those students at an individual disadvantage. Lastly, again, Sasha Barab has been involved in other research on Quest Atlantis as an employee and thus has a vested interest in the outcome.

Hypothesis

I found no problems with their hypotheses, although in the introduction the researchers say that one of the factors in determining the value of "problem-driven digital learning" is measuring the amount of time students spend on tasks. They did not address this in their hypotheses, and I assume it is because teachers do not have unlimited time for each lesson and cannot let students work on certain tasks for as long as they want.

Sample

There is a problem with the way the teachers were chosen. The comparison teacher was chosen because she uses the computer frequently with her class and employs face-to-face problem-based learning, while the treatment teacher was chosen because she is uncomfortable with technology and rarely uses the lab with her students. I understand why they made these choices; however, an ideal study would use the same teacher with different class periods. Additionally, since they used only one school (in the Midwest), the study has little generalizability.

Procedure & Instruments

The problem with the Anytown environment is that it was intended "to create a small-town feeling" that would be familiar to most students. Unfortunately, this makes it likely to be unrelatable to big-city students or immigrant students. It also places a particular culture's values inside the game and makes the tool political. It would be interesting to study how relatable the environment is to students in large cities compared to those in small towns.

Secondly, the in-class students do their problem-based learning in groups, interacting with each other, giving feedback, and creating a final paper based on their discussions. In the Anytown environment, the students seem to be on their own. They can interact with other students inside the virtual environment; however, there is no indication that the teacher puts them into groups for discussion. The system has chat, email, and telegram functions that are meant to be used by the students for peer review, discussion, and reflection, but there was no discussion of whether this occurred or was studied. If it wasn't a factor, then the researchers should not be able to call this a study of "problem-based learning" as it was defined in the article.

Analysis

I found no issues with their analysis, the evaluation methods they used, or the tests they conducted to find significant differences.

Results and Conclusions

To me, it was unclear whether the distal measures in the third hypothesis were conducted on both classes or only on the computer class. In the hypothesis, it seemed as though writing scores would be collected for all students; however, the results section says the distal measures compared the scores on the first and last Quest iterations. I believe the Quests were available only in the Anytown unit, but I am not sure whether they also meant the first and last assignments given in the face-to-face classroom. It was very unclear, from my perspective.
Interestingly, students who scored high on the pretest tended to score lower on the posttest, and those who scored low on the pretest tended to score higher on the posttest. I find this to be a serious problem with the pre- and posttest options, because it suggests one of the two tests was significantly harder than the other. (Remember, students drew from a hat which of the two tests they would take as the pretest, and the other became their posttest.)

Overall, I felt the study did address its hypotheses adequately, although I think the hypotheses should have been framed slightly differently, as discussed above. The biggest problem I see with this study is the fact that one of the researchers is the principal researcher for Quest Atlantis.