Running head: TRAINING EFFECTIVENESS OF A COMPUTER GAME

A Formative Evaluation of the Training Effectiveness of a Computer Game

A Proposal Submitted to:
Dr. Harold O'Neil (Chair)
Dr. Richard Clark
Dr. Edward Kazlauskas

by
Hsin-Hui (Claire) Chen
University of Southern California
4325 Renaissance Dr., #307
San Jose, CA 95134
(408) 434-0773; hsinhuic@usc.edu

In Partial Fulfillment for the Ed.D. in Learning and Instruction
November 24, 2003

Table of Contents

ABSTRACT
CHAPTER I: INTRODUCTION
    Background of the Problem
    Purpose of the Study
    Significance of the Study
CHAPTER II: LITERATURE REVIEW
    Relevant Studies
        Games and Simulations
            Theories of Games and Simulations
            Game Selection
            Design of Games and Simulations
            Training Effectiveness of Games
                Promotion of Motivation
                Enhancement of Thinking Skills
                Facilitation of Metacognition
                Improvement of Knowledge
                Building Attitudes
            Summary
        Evaluation
            Models of Evaluation
                Summative Evaluation
                Formative Evaluation
                Kirkpatrick's Four-Level Evaluation
            Game Evaluation
            Summary
        Problem Solving
            Definition of Problem Solving
            Significance of Problem-Solving Skills
            Assessment of Problem Solving
                Measurement of Content Understanding
                Measurement of Problem-Solving Strategies
                Measurement of Self-Regulation
            Summary
        Summary of the Literature
CHAPTER III: METHODOLOGY
    Research Hypotheses
    Research Design
    Pilot Study
        Formative Evaluation
            Participants
            Puzzle-Solving Game
            Knowledge Map
            Feedback
        Measures
            Content Understanding Measure
            Domain-Specific Problem-Solving Strategies Measure
            Self-Regulation Questionnaire
        Procedure
            Time Chart of the Main Study
        Data Analysis
    Main Study
        Method of the Main Study
            Participants
            Game
        Measures
            Knowledge Map
            Domain-Specific Problem-Solving Strategies Measure
            Self-Regulation Questionnaire
        Procedure
            Computer-Based Knowledge Map Training
            Game Playing
            Feedback on Game Playing Strategies
        Data Analysis
REFERENCES
Appendix A: Self-Regulation Questionnaire

ABSTRACT

Despite the potential power of computer games and simulations in instruction and training, research on their training effectiveness for adults is limited and a framework for their evaluation is lacking; therefore, more analysis and study of their evaluation is needed (O'Neil & Fisher, 2002; O'Neil, Baker, & Fisher, 2002; Ruben, 1999). In addition, previous studies suggest that a computer game may be one of the most effective tools for improving problem-solving. Problem-solving is one of the most significant competencies in both job settings and schools; as a result, teaching and assessing problem-solving have become central educational objectives (Mayer, 2002). Therefore, the researcher plans to conduct a formative evaluation of a computer game in terms of its effectiveness in enhancing learners' problem-solving, including content understanding, domain-specific problem-solving strategies, and self-regulation. The first part of this proposal reviews the relevant literature on computer games and simulations, evaluation models, and problem-solving. The second part is devoted to a pilot study and a main study of the formative evaluation of a computer puzzle-solving game in terms of its effectiveness in enhancing players' problem-solving.

CHAPTER I
INTRODUCTION

Background of the Problem

As pointed out by Ruben (1999), researchers such as Abt (1970), Coleman (1969), Boocock and Schild (1968), Gamson (1969), Greenblat and Duke (1975), Pfeiffer and Jones (1969-1977), Ruben (1978), Ruben and Budd (1975), and Tansey and Unwin (1969) began to notice the potential effects of simulations and games in instruction decades ago. The merits of computer games include facilitating learning by doing (e.g., Mayer, Mouton, & Prothero, 2002) and triggering motivation and enjoyment. In addition, computer simulation games engage learners in a simulated experience of the real world, which makes learning practical (Martin, 2000; Stolk, Alexandrian, Gros, & Paggio, 2001). Because of these merits, games and simulations have been applied in various fields and settings, such as business, K-16 education, and military organizations. Furthermore, as pointed out by Stolk et al. (2001), computer games and simulations are helpful for training in settings where practice and exercises in real situations are expensive and dangerous. For example, military organizations have applied computer-based training tools such as war games and simulators for task training. The same situation occurs in the field of environmental crisis management; practicing responses to natural disasters and industrial emergencies is usually very expensive and dangerous, so instructional gaming is a necessary alternative (Stolk et al., 2001).

However, few studies have shown the empirical effects of games and simulations on training and learning (O'Neil & Fisher, 2002). According to O'Neil and Fisher, the effects of computer games and simulations can be generally divided into five categories: promotion of motivation, enhancement of thinking skills, facilitation of metacognition, enhancement of knowledge, and building of attitudes.
They also indicated that, despite the potential power of computer games for instruction and training, research on their training effectiveness is limited, and little of the gaming literature is helpful in designing a formative evaluation of games. As pointed out by Ruben (1999), there is not enough research on the evaluation of games' instructional effectiveness or on the validity and reliability of such evaluation. According to researchers (e.g., O'Neil, Baker, & Fisher, 2002; Quinn, 1996), one of the critical concerns is time and expense. Therefore, more investment should be put into the analysis and study of computer game evaluation (O'Neil et al., 2002; O'Neil & Fisher, 2002; Quinn, 1996; Ruben, 1999).

According to previous research, problem-solving is one of the most critical competencies for lifelong learning and for accomplishing tasks, whether in job settings, academic settings (e.g., Dugdale, 1998), or any other setting. Although substantial previous research reveals the utility of problem solving (e.g., Mayer, 2002), the methods used to assess problem-solving skills still need to be refined. For example, when teachers assess students' problem-solving by giving them a test of separate and unconnected multiple-choice questions, they are not accurately assessing students' problem-solving abilities. Further, traditional standardized tests do not tell teachers or students which problem-solving processes they should emphasize and why. Although the most useful measures of problem-solving competence can be found in the cognitive science literature, these measures (e.g., think-aloud protocols) are inefficient for assessing performance for diagnostic purposes, since their scoring is laborious and time-consuming (O'Neil, 1999). As a result, the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) has developed a problem-solving assessment model to measure content understanding, problem-solving strategies, and self-regulation, the three elements of problem-solving.

Purpose of the Study

Games and simulations have potential uses in teaching and learning and have been applied in business, academic, and military settings, and, as argued by Quinn (1991), computer games may provide effective environments for problem-solving. However, there is little research on games' training effectiveness. This researcher will therefore conduct a study focusing on the evaluation of a computer game with regard to its effectiveness in improving problem-solving.

The evaluation to be conducted in this study will be a formative one, which is applied while a program or a system is forming. A formative evaluation is conducted to judge the worth of a program, or to determine the adjustments needed to attain its objectives, while the program is in progress rather than at its end. The researcher will apply the problem-solving assessment model developed by the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) to measure content understanding, problem-solving strategies, and self-regulation, the three elements of problem-solving ability, because of the validity and reliability reported in previous literature (Herl, O'Neil, Chung, & Schacter, 1999; Mayer, 2002). Therefore, the main purpose of this study is to find out whether game playing helps increase players' problem-solving ability.
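To make the three-component CRESST model concrete, the sketch below shows one way the component scores might be recorded and combined into a composite for summarizing pre- and post-game change. The component names follow the model described above, but the class, the 0-100 scale, the equal weighting, and the example values are illustrative assumptions rather than CRESST's published scoring procedure.

```python
from dataclasses import dataclass


@dataclass
class ProblemSolvingScores:
    """One learner's scores on the three problem-solving components.

    The 0-100 scale and equal weighting are illustrative assumptions,
    not CRESST's actual scoring rules.
    """
    content_understanding: float   # e.g., knowledge-map score
    solving_strategies: float      # e.g., domain-specific strategy measure
    self_regulation: float         # e.g., self-regulation questionnaire

    def composite(self, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
        """Weighted composite used only to summarize overall change."""
        w1, w2, w3 = weights
        return (w1 * self.content_understanding
                + w2 * self.solving_strategies
                + w3 * self.self_regulation)


# Hypothetical pre- and post-game scores for a single player.
pre = ProblemSolvingScores(42.0, 35.0, 58.0)
post = ProblemSolvingScores(55.0, 47.0, 63.0)
print(f"composite gain: {post.composite() - pre.composite():.1f}")
```

In practice each component would be scored with its own instrument (knowledge map, strategy measure, questionnaire) and analyzed separately; the composite here is only a convenience for illustration.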
Significance of the Study

Baker and Mayer (1999) indicate that educational assessment has at least three distinct uses in instructional improvement. First, assessment results keep educational organizations and students alert to academic goals and, at the same time, motivate schools and students to achieve academic standards. Second, assessment outcomes provide useful objective information, helping teachers plan or revise their instruction and assisting administrators in allocating resources to eliminate deficiencies. Third, assessment outcomes promote deeper understanding, consistent with the current view of "learning as an activity in which students seek to make sense out of presented material."

Some educators, such as Amory (2001), have developed instructional games or software that incorporate aspects of human evaluation and evaluation of the learning environment; however, as pointed out by researchers such as O'Neil and Fisher (2002), O'Neil, Baker, and Fisher (2002), and Ruben (1999), the effort devoted to evaluating games' training effectiveness is small compared with the enthusiasm and effort devoted to exploiting games' potential power in training. Furthermore, evaluation is significant for program designers and implementers in determining a program's effectiveness, its value, and the further improvement needed; therefore, more analysis and studies of evaluation need to be conducted on games. This researcher focuses on formative evaluation because it not only documents the computer game's effects on training but also examines the implicit feedback designed into the game in terms of puzzle-solving strategies, which offers significant information for future trainers and developers who select, apply, or design computer games for training. Since a framework for evaluating games' effectiveness in training is lacking, as pointed out by previous researchers, the other purpose of this study is to create a framework for evaluating computer games' effectiveness in improving problem-solving that can be applied in future studies.

CHAPTER II
LITERATURE REVIEW

Relevant Studies

Games and Simulations

As defined by Gredler (1996), "games consist of rules that describe allowable player moves, game constraints and privileges (such as ways of earning extra turns), and penalties for illegal (nonpermissible) actions." In addition, the rules of games do not have to obey those of real life and can be imaginative. On the other hand, Driskell and Dwyer (1984) defined a game as a rule-governed, goal-focused, microworld-driven activity incorporating principles of gaming and computer-assisted instruction, and a simulation game is one that includes real settings, participants' roles, playing rules, and scoring systems (Christopher, 1999). A microworld, as defined by constructivists, is a small but complete subset of reality in which a person can acquire knowledge of a specific domain through exploration; it serves as a knowledge construction tool (Rieber, 1996).

As pointed out by Gredler (1996), games and simulations differ in both surface structure and deep structure; surface structure refers to the observable characteristics of an exercise, while deep structure refers to the psychological mechanisms operating in it.
The surface structure of games, according to Gredler, involves elements such as "drawing cards, moving pieces around a board, and so on," while the surface structure of a simulation is "a scenario or a set of data" to be addressed by the participant (Gredler, 1996, p. 522). Their major differences in deep structure, Gredler points out, are as follows: (1) while a game player intends to win through competition, a participant in a simulation of a specific setting is executing serious responsibilities, deliberating on feasible job procedures and possible consequences; (2) the event sequence of a game is typically "linear," whereas a simulation sequence is "branching," meaning that actions and decisions made earlier influence or produce the situations and problems that follow; and (3) the rules and settings of games are not necessarily realistic or matched to the real world, whereas those of simulations are authentic and closely related to the real world. Finally, games are usually more fun-driven than simulations. The primary characteristics of games and simulations are shown in Table 1.

Table 1
Primary Characteristics of Games and Simulations (adapted and modified from Gredler, 1996)

                                                                        Games    Simulations
Setting
    Students are transported to another world or environment              X          X
Purpose
    Fun                                                                    X
    Competition and winning                                                X
    Fulfilling a professional role                                                    X
    Executing a professional task                                                     X
Event sequence
    Typically linear                                                       X
    Nonlinear or branching                                                            X
Mechanisms that determine consequences
    Sets of rules (may be imaginative)                                     X
    Dynamic set of authentic causal relationships among two or
        more variables                                                                X
Participant is a component of the evolving scenario and executes
    the responsibilities of his or her role                                X          X
Participant interacts with a database or sets of processes to
    discover scientific principles, explain or predict events, and
    confront misconceptions                                                           X

As seen in Table 1, a common feature of games and simulations is that they transport players to another world. For example, a game player may deliberate over strategies for a chess board game, while a simulation participant may diagnose problems and apply plausible policies as the mayor of a simulated city. Another similarity is that the participant is part of the evolving situation and performs the duties of his or her role. The mechanisms of games and simulations, as shown in Table 1, are distinct. The rules of games may be imaginative, while those of simulations are close to real-life situations; that is, simulations are designed with a dynamic set of authentic causal relationships among two or more variables. In addition, participants in simulations may interact with a database or sets of processes to discover scientific principles, explain or predict events, and confront misconceptions (Gredler, 1996).

Although researchers debate the definitions of simulation and game, Martin (2000) summarized the debate in his article as follows:

Games generally have rules and an expectation of competitive behavior toward winning (Jones, 1998a) and often include a large degree of chance (Jones, 1998). They sound appealing to students, although perhaps at the expense of learning (Jones, 1989; Lane, 1995). Simulation typically emphasizes a more academic and thoughtful exercise, often involves a model of a process, and typically supports learning specific content or about decision making. Shubik (1983) considers gaming to be primarily people centered and simulations to be primarily computer oriented.
Lane (1995) agrees, describing games as interactive, whereas simulations are described as models that can be left to run. Klein (1985) neatly describes a simulation as a game without players. (p. 457)

Examples of the application of simulations in the business sector are found in previous studies (e.g., Schank, 1997; Washbush & Gosen, 2001). Washbush and Gosen (2001) studied undergraduate students' learning from an enterprise simulation, MICROMATIC. In their study, students' simulation performance was measured at the end of play using the simulation's scoring procedure, based on net income, return on sales, and return on assets, and their learning was measured with multiple-choice and short-essay questions. The researchers found that although there was no significant relationship between simulation performance and learning, meaningful learning did occur from simulation play, especially when participants perceived their teams to be well organized. The results showed that students began to master the skills and concepts presented and that the simulation is a valid learning methodology. Also, Schank (1997) found that computer simulation games could be used to help adults learn business skills.

Furthermore, examples of simulations used in military settings have been documented by O'Neil and Andrews (2000). As the researchers indicated, simulations have been an important tool in aircrew training and assessment, for both individuals and teams.

"Simulations, games, and other experience-based instructional methods have had a substantial impact on teaching concepts and applications during this period" (Ruben, 1999, p. 498). Researchers have also pointed out that simulations and games are widely accepted as a powerful alternative to traditional ways of teaching and learning, with the merits of facilitating learning by doing (e.g., Mayer et al., 2002; Rosenorn & Kofoed, 1998), triggering motivation and enjoyment, and engaging learners in a simulated experience of the real world (Martin, 2000). For example, O'Neil and Fisher (2002) argue that computer games would be cost-effective for many leader development applications and could offer sufficient practice and feedback opportunities if designed appropriately.

Finally, Gredler (1996) uses a term for the mixture of games' and simulations' features: simulation games, or gaming simulations. In this study, "game" will refer either to Gredler's definition of a game or to a gaming simulation. Amory (2001) argues that simulation games are applied in educational environments more often than other types of games, since when playing simulation games learners can focus on single goals, with decreased competition between learners, and work at their own pace.

Theories of Games and Simulations

One of the most important theories supporting games and simulations is experience-based learning (Ruben, 1999). Experience-based learning is an approach that focuses on increasing the student's control and autonomy, an aspect of constructivism. It is the process whereby students learn something by doing it and connecting it with their prior experience, as in hands-on laboratory experiments, studio performances, and practical training.
Computer games and simulations facilitate experience-based learning, by transporting learners to “another world,” where they are in control of the action, and providing them opportunities for learners to interact with a knowledge domain (Gredler, 1996). As pointed out by Ruben (1999), experience-based instructional methods had the potential to address many of the limitations of the traditional teaching methods (Ruben, 1999). Traditional instructional methods have several limitations such as: learning and teaching are hard to separate; knowledge and skills are not practiced and applied appropriately; learning tends to be individual work while its application occurs outside the classroom is usually social; traditional teaching is lacking in creativity and vividness; finally, the acquisition of problem solving is usually not emphasized. On the other hand, as pointed out by Ruben, experience-based learning instruction is an effective learning approach proved by several empirical studies. It provides more pluralistic and multivariate approaches to learning, and promotes collaboration and interactivity (O’Neil et al., 2002). Most importantly, experiential learning facilitates cognitive thinking and active learning. Further more, as pointed out by Mayer et al. (2002), learning with pictorial aids, such as multimedia and games, students learn by doing, working on realistic tasks instead of learning by solely being told by teachers. Game Evaluation 16 Furthermore, Adams (1998) points out that a game satisfies learners’ visual and auditory sensor and provides flexibility in learning, which makes it an attractive tool for teaching and learning, based on the perspectives of constructivism (Amory, 2001) and dualcoding theory (Mayer & Sims, 1998). The learning of constructivism must be active; teachers should guide the learners in the construction of mental models, and the guidance is based on the individual learner's background knowledge. Based on the concept of constructivism, new knowledge is constructed by a learner with his/her unique background knowledge and beliefs, by making sense of the knowledge, in multiple ways and in various contexts. In addition, constructivism learning is both an active and a reflective process, triggered by social interaction; it is internally controlled and mediated by the learner (Bruning, Schraw, & Ronning, 1999). As pointed out by researchers (e.g. Amory, 2001; Stolk, Alexandrian, Gros, & Paggio, 2001), a game player does not study a particular domain but becomes part of the scenario, therefore promotes active and meaningful learning, and stimulating self-regulation in learning. For instance, Stolk et al. (2001) argued that instructional simulations for teaching environmental crisis management provide alternative of exercises, since it is usually very expensive and dangerous to practice in real situations. Experience-based learning is one practical way of integrating constructivist methods into instruction. Experience-based instruction is assumed to be better than traditional teaching, since learners form a skill or acquire knowledge by doing. The other reason that games provide effective alternatives to traditional lectures is that they can facilitate learning (e.g., Adams, 1998; O’Neil & Fisher, 2002; Ruben, 1999). Simulation based learning is an effective way to learn and apply knowledge and skills quickly. Further more, previous research on emerging technologies have shown that computer-based learning enhances Game Evaluation 17 problem solving (e.g. 
Dugdale, 1998; Mulqueen, 2001; Poedubicky, 2001) and decision making skills (Poedubicky, 2001), and studies on transfer of learning (Ritchie & Dodge, 1992) have revealed that simulations facilitate the transfer more effectively than traditional methods of instruction (e.g., Adams, 1998; Fery & Ponserre, 2001; Mayer et al., 2002). The significant results of the empirical study conducted by Dugdale (1998) are one of the examples of computer-based learning enhancing students’ mathematical problemsolving. The researcher conducted this study through 15-day systematic observation of participants’ use of technology to approach the problems assigned; each use of a computer was recorded, along with the role of each participant in the computer use and a description of how the computer was applied to the problem. For example, the rated computer appropriateness of each day’s problem-solving assignment may be Y/appropriate or N/not appropriate, the comfort level and number of every participant’s applications of computers and problem-solving were recorded. In addition, participants’ maximum role in computer problem solving was rated by “non, passive, active, and central”, four levels. While gaining experience of applying advantage of computer investigation methods, participants also increased learner-initiated applications of technology to problem solving and effective computer use. In Mulqueen’s study on the effects of computer-based training on teachers’ interdisciplinary problem-solving ability, the self-reported level of problem-solving skills were improved in the second year of training. In addition, it was found that teachers became more willing to accept new learning challenges. In the study conducted by Ritchie and Dodge (1992), a computer microworld was used to simulate symbolic, physical phenomena. When high school students interacted with this simulated environment, they were able to Game Evaluation 18 grasp the key principles. Their test performance was improved, and their team work was fostered, and their subject skills across the curriculum were improved. Other researchers such as Alessi (2000a; 2000b) points out four critical elements of an instructional game, which are knowledge attributes, learners’ attributes, simulation attributes, and learners’ encoding, representing and using knowledge. There are two ways to acquire instructional games and simulations for research purposes; one of them is to buy off-the-shelf software, and another way is to develop them (Alisse, 1998; Alessi, 2000b; Amory, 2001; O’Neil & Fisher, 2002). Game Selection Researchers pointed out that play associated with games, is an important construct of learning (Quinn, 1994; Rieber, 1996). Rieber (1996) argued that a game may be a more meaningful way to present microworlds to learners than a simulation, and Amory argued that an effective instructional game should be pertinent to the learning objectives. According to O’Neil and Fisher (2002), a game designer and a trainer characterize a computer game in terms of different models or specifications. The former characterizes a game in terms of: (1) type of platform that supports the game, (2) type of players, (3) the contractor, (4) genre of the game, (5) purpose of the game, and (6) key milestones. However, a researcher or trainer, characterizes a game in terms of different five specifications regarding domain knowledge to be learned, which are (1) learning objectives, (2) training use, (3)learners, (4) practice, and (5) feedback. 
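Those five trainer-side specifications can be treated as a short checklist attached to any candidate game so that candidates are screened consistently. The sketch below records them as a simple data structure; the field names mirror the five specifications listed above, but the class itself and the example entries are illustrative assumptions, not part of O'Neil and Fisher's framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingGameSpec:
    """Trainer-oriented description of a candidate training game.

    Fields follow the five specifications named above; the concrete
    example values below are hypothetical.
    """
    learning_objectives: List[str]  # what the trainee should be able to do
    training_use: str               # how the game fits into the curriculum
    learners: str                   # intended audience
    practice: str                   # what the game lets learners practice
    feedback: str                   # what feedback the game provides


puzzle_game = TrainingGameSpec(
    learning_objectives=["apply problem-solving strategies to novel puzzles"],
    training_use="supplement to classroom instruction",
    learners="adult trainees, single-player",
    practice="repeated puzzle attempts with increasing difficulty",
    feedback="implicit feedback through puzzle outcomes",
)
print(puzzle_game.learning_objectives)
```

A trainer comparing several off-the-shelf games could fill in one such record per game and reject any candidate whose objectives, practice opportunities, or feedback do not match the training need.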
Very often, a training game lacks objectives since most of the games are created for fun. Customers have to generate the training applications from the game they choose according to their objectives and game specifications, and evaluate the effects by themselves. Game Evaluation 19 For example, in the three-phased game feasibility study conducted by Wainess and O’Neil (2003), the researchers selected three appropriate games among more than five hundred onthe-shelf games in order to use one of them as a platform for research on cognitive and affective issues related to games in education. The researchers (Wainess & O’Neil, 2003) managed the selection process based on the research needs and learning objectives of problem-solving. According to the objectives and needs, the games selected for further consideration should have several characteristics, such as adult oriented, single user play, suitable for problem solving research, etc. In some previous studies on games, the researchers have argued that games effects learning only when the appropriate games are selected and tailored for training (e.g., Baird & Silvern, 1999; Dawes & Dumbleton, 2001; Rabbitt, Banerji, & Szymanski, 1989). In Rabbitt et al.’s study about their empirical research on a videogame’s training effects, participants were trained to apply two different instructional games to practice. It is found that participants’ practicing different games resulted in distinct results of the IQ test. Rabbitt et al. concluded that a videogame could be tailored to be an efficient training tool for complex industrial and military tasks. Another example of game selection is the empirical study conducted by Dawes & Dumbleton (2001). The researchers conducted the study on game selection to support some aspects of learning in schools. In this study, eleven computer purchased games were considered based on several factors, such as technical issues, language comprehension, content suitability, teacher’s role, time constraint, and types of feedback. In addition, what types and amount of guidance should be given to learners should be considered when applying multimedia and gaming as teaching aids (e.g., Mayer, et al., 2002). Game Evaluation 20 Design of Games and Simulations It is very difficult to create an instructionally effective game because the comprehensive design paradigms derived from learning principles and well-designed empirical research on instructional simulations and games are lacking. Nevertheless, Quinn developed a game design model supported by educational theories to design a game based on system or users, which encompasses entertaining factors and the procedure to design a game. However he does not collect any data as to its effectiveness. Amory, Naicker, and Vincent (1999) established the Game Object Model for game design, which includes components that promote educational objectives and computer interfaces. Amory (2001) also points out that the development of an instructional game is composed of three perspectives, which are, the research it is based on, the development of resources, and software components. Resource development here includes activities, such as tools/ software selection, story line conception, object placement, image generation, game page creation, game player analysis, and game level testing; software development here means to develop game page editor, playback engine and puzzle creation. 
Based on these, Amory developed Zadarh, an educational adventure game, which addresses misconceptions held by biology students and presents information that could foster discussion and other interactive learning activities. It was found that students who played the game performed approximately the same in multiple choice questions of biology test than those who did not play the game, although the difference was not significant. In addition, Martin (2000) points out that, purpose, reality, timeline, feedback, decisions, participants, role, and close match of the simulation/game with the learning objectives are main elements to consider when designing a game. Game Evaluation 21 To design a game for training or instruction, the first issue is to find out the learning goal and objectives, which is not only is as the guideline to follow, but also as the criteria its feedback and assessment will be based on (Alessi, 2000c; Stolk et al., 2001). Since different goal/objectives require different types of feedback and assessment measures, game developers should design feedback forms and types, and assessment tools to find out if the game really helps learners or trainees achieve the learning goal and objectives, and its efficiency. For example, if the training/learning goal is to increase learners’ problem-solving ability, then each element of problem-solving ability including content understanding, problem-solving strategies, and self-regulation (O’Neil, 1999) should be taught and assessed in the game context (O’Neil, 2003). Furthermore, training goals may also affect self-efficacy, achievement, and the use of self-regulatory strategies in learning (Schunk & Ertmer, 1999). Thus, there are three other essential factors to consider in order to design effective instructional games and simulations: (1) the structure is designed to reinforce learners’ objective knowledge and skills (also, Stolk et al., 2001) (2) learners’ prior knowledge (also, Stolk et al., 2001), and (3) the complexity of problem solving (Gredler, 1996). (4) types and amount of guidance given to learners (e.g., Mayer, et al., 2002). In the study conducted by Stolk et al. (2001), researchers developed a simulation game to support the training of environmental crisis management. The gaming was developed according to learners’ relevant prior knowledge, and the game scenarios and crisis designed in the gaming resemble the those happen in the natural environment. Training Effectiveness of Games The potential power of computer games on training and instruction has been drawn the attention of educators for decades (e.g., Donchin, 1989; Malone, 1981; Quinn, 1991; Ruben, Game Evaluation 22 1999; Thomas & Macredie, 1994). Games have been applied in various subjects such as geography (Mayer et al., 2002; Moreno & Mayer, 2000; Tkacz, 1998), law, business (e.g., King & Morrison, 1998; Shank, 1997), physics (e.g., White & Frederiksen, 1998), and therapeutic situations (Ruben, 1999). As pointed out by O’Neil and Fisher (2002), computer games have been found beneficial for instruction and training due to their four characteristics: “(a) complex and diverse approaches to learning processes and outcomes; (b) interactivity; (c) ability to address cognitive as well as affective learning issues; and perhaps most importantly, (d) motivation for learning.” (p6). Although the main purpose of computer games has been entertainment, there are more and more people who apply computer games and simulations for training and instruction. 
As argued by Quinn (1991; 1996), computer games are effective tools for training problem-solving, since computer adventure games are a part of many learners’ experience, they are motivating, they are enjoyed by a wide range of age group, and provide engaging and familiar environment where problems are embedded; the games are a source of information on the question of what strategies subjects bring to bear to solve the problems. Further, computer games and simulations have been used to develop workers’ financial and banking skills in business settings (e.g., Adams, 1998; Faria, 1998; Lane, 1995; Wabush & Gosen, 2001). In Adams’s empirical study on a computer simulation game’s effectiveness in urban geographic education, students were evaluated by how they performed in the simulation game, essay and multiple choice questions. For example, one of the questions was “What do you think SimCity teaches people about urban processes?” He found that students who used the simulation game became cognitively aware of urban geographic problems, and became more curious about the urban fabric and the complicated repercussions of changes in the urban system in the real world. Furthermore, more than one third of the Game Evaluation 23 students wrote in their project with a new appreciation of urban planning and difficulties of managing urban funds. He points out that an urban geography class at a State University of New York rated the simulation their favorite project of the semester when compared to nine other projects of conventional lecture/exam class, and further, due to the attractive graphics and flexibility of the urban simulation model, the game has become an attractive tool for teaching urban geography and planning concepts. Unfortunately, the researcher did not mention in the report whether the results in the study were statistically significant. The military sector also uses simulation-based games to train flight and combat skills, and even to recruit new members (Chambers, Sherlock, & Kucik, 2002; O’Neil & Andrews, 2000; Rhodenizer, Bowers & Bergondy, 1998), and other researchers (Galimberti, Ignazi, Vercesi, & Riva, 2001) found that a networked videogame enhanced social interaction and cooperation. The effects of instructional games and simulation can be generally divided into five categories: promotion of motivation, enhancement of thinking skills, facilitation of metacognition, enhancement of knowledge, and building of attitude (O’Neil & Fisher, 2002). Promotion of motivation. Motivation is the psychological feature that arouses, directs, and maintains an organism to action (Woolfolk, 2001). Motivation has been found to have positive influence on performance (e.g., Clark, 1998; Emmons, 2000; Ponsford & Lapadat, 2001; Rieber, 1996; Urdan & Midhley, 2001; Ziegler & Heller, 2000). Ricci, Salas and Cannon-Bowers (1996) pointed out that dynamic interaction, competition, and novelty are three characteristics of computer-based gaming that contribute to its motivational appeal, and these three characteristics can produce significant differences in learner attitude. Furthermore, O’Neil and Fisher pointed out that Game Evaluation 24 computer games provide diversity, interactivity, importantly, and motivation for learning, and therefore have been applied in the instruction in different sectors, such as business (e.g., Adams, 1998; Washbush & Gosen, 2001), military (e.g. O’Neil & Andrews, 2000) and academic sectors (e.g. 
Adams, 1998; Amory, 2001; Amory, Naicker, Vincent, & Adams,1999; Barnett, Vitaglione, Harper, Quackenbush, Steadman, & Valdez, 1997; Ricci, et al., 1996; Santos, 2002). Further more, Amory (2001) indicates that games and simulation can not only combine theoretical concepts and practice, but also trigger intrinsic motivation and selfregulated learning. Several previous researchers pointed out the use of computer games increased learners’ intrinsic or extrinsic motivation (e.g., Amory,1999; Quinn, 1996; Rieber, 1996); the former is associated with inner feeling while the latter is triggered by external factors such as rewards and punishments (Woolfolk, 2001). In addition, Malone (1981) pointed out that intrinsic motivation is significant for problem-solving; a task can not be accomplished when the intrinsic motivation is absent, even though a learner works his best. He further pointed that games possess the characteristics of challenges and elements of fantasy, therefore trigger players’ motivated and interests in the game world. For instance, the results of a study conducted on six games’ instructional effects by the British Educational Communications and Technology Agency (Dawes & Dumbleton, 2001), showed that the use of computer games in instruction enhances students’ motivation. Learners in this study were observed to work positively and to persist in their engagement with the software, continuing their work after lesson times. For example, some of the games were found used voluntarily at breaks and lunchtimes, and it was found that learners, who started with easy levels volunteered to move on to more difficult levels, which required extra Game Evaluation 25 time and effort. Also, the research conducted by Amory, Naicker, Vincent, and Adams (1999) showed that participants were intrinsically motivated by playing computer games, especially the simulation and adventure games. In other articles (Amory, 2001; Quinn, 1996; Rieber, 1996), the authors argue that play associated with games triggers intrinsic motivation. For example, Rieber (1996) points out that the challenges, curiosity, fantasy, and controllability of games trigger intrinsically motivating learning. Enhancement of thinking skills Thinking skills are skills of information processing, reasoning, enquiry, creative thinking, learning strategies and evaluation skills (Dawes & Dumbleton, 2001; O’Neil, 1978). According to previous studies, computer games are assumed to enhance thinking skills. For example, in the study conducted by Mayer et al. (2002), the researchers used transfer test to measure the impact of Profile Game, a computer simulation game on geology learning. It was found that the computer game helped improve geology students’ thinking skills and visualspatial thinking in geology, and facilitate learning by doing. The study revealed that the computer game helped students think like geologists. Quinn measured learners’ usage of problem-solving strategies by transcriptions of the verbal protocols, and the computer traces; the number of attempts that each participant made to solve each problem, and the number of times that each character died, and the number of “go’s” that a participant made are recorded by the computers (1991). The transcripts were examined for evidence of the subjects’ strategy use of four categories, which are “recall”, “cause”, “trial and error”, and “other”. 
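Once the protocols have been coded, tallying the strategy codes and the computer-trace counts is straightforward. The sketch below shows one way to summarize them; the four category labels come from Quinn's coding scheme described above, while the data values and field names are hypothetical.

```python
from collections import Counter

# Strategy codes assigned to one participant's verbal-protocol segments.
# The four labels follow Quinn's scheme; the sequence itself is hypothetical.
coded_segments = ["recall", "trial and error", "cause", "trial and error",
                  "other", "cause", "cause"]

strategy_counts = Counter(coded_segments)

# Computer-trace counts of the kind Quinn recorded (hypothetical values):
# attempts per problem, character deaths, and number of "go's".
trace = {"attempts_per_problem": [3, 5, 2], "character_deaths": 4, "gos": 27}

print(strategy_counts)                     # e.g., Counter({'cause': 3, ...})
print(sum(trace["attempts_per_problem"]))  # total solution attempts
```

Summaries of this kind make it possible to relate a player's dominant strategy category to trace-based indicators such as the number of attempts per problem.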
Quinn suggested that game’s problem-solving environment could be used not only to investigate the cognitive skills involved but also as an environment within which to learn and practice these skills. Game Evaluation 26 Dawes and Dumbleton’s (2001) reported a study on six computer games, “Age of Empires”, “Brain Teasing Games”, “Championship Manager”, “City Trader”, “Sim City 3000”, and “The Sims”, that were assigned to different schools respectively. According to the observation results, all of the games were found to enhance thinking skills or problem solving skills, including information processing, reasoning, enquiry, creative thinking and evaluation skills. In the report, the researchers concluded that if the level of challenge and type of game was appropriate for the students, their problem-solving and critical thinking skills could be facilitated. For example, SimCity, a computer simulation game, and Age of Empires, a real time strategy game, were found complex and flexible enough to enable students to apply different strategies, and require them to think about the interaction of a range of variables logically. In Adams’ (1998) empirical study regarding teaching urban geography with a computer simulation game, students were asked to write down the changes of the city they were in charge of and the amount of money in their coffer after playing the game following the hints built into the game (experiment B) and then after playing the game without following those hints (experiment C) to evaluate the results. The students were then assessed with essay and multiple-choice question. It was found that simulations enhanced students’ computer literacy, knowledge of geographical phenomena and processes, and their ability to critique a city’s development from different aspects such as the social, political, philosophical, scientific, and economic situations. However, the researcher did not mention whether this result was significant. Game Evaluation 27 In addition, computer games’ effects on improving reasoning skills, facilitating complex problem-solving, and enhancing transfer of learning were documented in previous articles (e.g. Adams, 1998; Crisafulli & Antonietti, 1993). Also, there is evidence, which shows playing computer games improve cognitive processes since it increases flexibility and variety of the knowledge representations, such as visual and auditory representation. For example, Okagaki and Frensch (1994) conducted a study on Tetris, a video game, using undergraduate students, none of whom had prior experience with the game, and assessed participants’ mental rotation, spatial visualization, and perceptual speed with paper-and-pencil test before and after playing the game. Using pencil-and-paper tasks, the researchers found that spatial-oriented video games have the potential to improve late adolescents’ mental rotation and spatial visualization skills, and bring players cognitive benefits during their playing. However, the positive results were only significant for male but not for female participants. Also, in two experiments conducted by Greenfield, DeWinstanley and Kirkpatrick (1994), participants’ divided attention was measured with choice reaction time in a luminance detection task, using response time to targets of varying probabilities at two locations on a computer screen. 
The significant results of the two experiments showed that video games strengthen strategies of divided attention, which implies that computer games can be applied to train tasks that require monitoring of multiple visual locations.

Facilitation of Metacognition.

Woolfolk (2001) defined metacognition as knowledge about our own thinking processes, which includes three kinds of knowledge: first, declarative knowledge about strategies for learning, memorizing, and performing; second, procedural knowledge about how to use those strategies; and third, conditional knowledge about when and why to apply the former two kinds of knowledge. O'Neil and Abedi (1996), on the other hand, defined metacognition as planning and self-checking, which enables people to use various strategies to accomplish a goal. Previous studies have shown that metacognition facilitates the acquisition of knowledge and skills (e.g., Pirolli & Recker, 1994). Playing computer games has the potential benefit of enhancing metacognitive skills (Baird & Silvern, 1999; Bruning et al., 1999; O'Neil & Fisher, 2002; Pillay, Brownlee, & Wilss, 1999). For instance, Bruning et al. (1999) and Pillay, Brownlee, and Wilss (1999) found in their qualitative studies that game playing offers players an opportunity to apply metacognitive skills. When playing a game, players checked their own actions, activated their schemata, identified relations and connections, and formed hypotheses. The researchers claimed that this frequent monitoring of thinking by game players is an application of a metacognitive approach.

Improvement of knowledge.

Knowledge includes domain-specific knowledge and domain-general knowledge, both of which include declarative and procedural knowledge (Bruning, Schraw, & Ronning, 1999). While domain-specific knowledge is helpful for a specific subject or activity, general knowledge is used for a very wide range of activities. Declarative knowledge is organized factual knowledge, and procedural knowledge is "knowing how" knowledge that facilitates a specific activity. As Bruning, Schraw, and Ronning (1999) pointed out, general knowledge is complementary to domain knowledge; however, their relative importance shifts with the focus of the task. Several studies have provided evidence that computer games can enhance learning and retention of knowledge. For instance, Westbrook and Braithwaite (2001) provided evidence that a health care game was an effective tool for improving learning outcomes such as information-seeking skills and factual knowledge. In their study, the researchers used pre- and post-game self-report questionnaires, covering learner demographics, learners' reactions to the game, learners' knowledge of the health system, and learners' experience with computer games, to evaluate a health care simulation game designed to promote information-seeking skills and interaction with the health system. Comparing the pre- and post-game survey responses, the researchers found that participants' domain knowledge of the health system, Medicare, and private insurance was significantly higher than before the game. In the study conducted by Ricci et al. (1996), military trainees who were presented subject matter on chemical, biological, and radiological defense in computer-game form scored significantly higher on a multiple-choice retention test than those who were presented the subject matter in paper-based form.
The researchers used a trainee reaction questionnaire containing five statements rated on a 5-point Likert scale about the training task and found significant positive correlations between reaction and retention test scores. That is, participants who "(a) perceived their form of study as enjoyable, (b) felt they learned a lot about CBD during their training, and (c) felt confident that they would remember what they learned during training" tended to score significantly higher on the retention test than those who did not. In addition, Betz (1995-1996) found that students who learned by both reading a text and playing a computer simulation about the planning and management of a complex city system scored significantly higher on an examination consisting of multiple-choice and true/false questions than those who learned by reading the text alone, even though the examination questions were based only on the content and application of the text. Also, Fery and Ponserre (2001) found that skills learned by playing a golf video game can transfer significantly to actual golf playing when the video game provides reliable demonstrations of actual putts and when players have the intention to learn golf. Subjects may simply enjoy playing the video game, or they may use it to improve their knowledge and skills for real-world golf by drawing analogies between the knowledge and skills acquired in the video game and the situations of the virtual golf game. Participants' golf knowledge and skills were measured by an experimenter, who indicated the correct posture and gave alignment references. The distances of the actual putts and the direction of the error in the force of the putts during pre- and post-tests were collected and compared. The results showed that participants who played the simulation golf game with the intention of learning golf significantly outperformed both participants who played the video game only for entertainment and participants who did not play the video game. In Adams's (1998) research on a computer simulation game's educational effectiveness in urban geography, the game helped students develop computer literacy and knowledge of geographical phenomena and processes. Amory et al. (1999) used pre- and post-game multiple-choice tests to measure students' knowledge of environmental biology learned from game playing; the difference between students' pre- and post-test results was not significant.

Another example is found in an empirical study of a computer game used to train cadets at an Israeli Air Force flight school (Gopher, Weil, & Bareket, 1994). Transfer effects from game training to actual flying were tested during several flights, from the transition stage to the high-performance jet trainer. The outcomes were analyzed in terms of two types of knowledge: the first is the specific skills involved in performing the game; the second is the general ability of trainees to cope with the high processing and response demands of the flight task and to develop better strategies of attention. For game skills, results showed that the training-with-game group performed significantly better than the training-without-game group in test flights. Gopher et al. (1994) concluded that what was learned from the game remained relevant and generalized more easily when variables were changed or new tasks with a similar context were encountered; the game was therefore integrated into the regular training program of the Air Force.
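Several of the knowledge studies summarized above (e.g., Betz, 1995-1996; Fery & Ponserre, 2001; Gopher et al., 1994) rest on the same basic analysis: comparing an outcome measure between a group trained with the game and a comparison group. A minimal sketch of that comparison is shown below; the scores and the use of an independent-samples t-test are illustrative assumptions for this proposal, not the analyses those authors actually reported.

```python
from scipy import stats

# Hypothetical post-test scores for a game-trained group and a comparison group.
game_group = [78, 85, 69, 91, 74, 88, 80]
control_group = [70, 72, 65, 79, 68, 75, 71]

# Independent-samples t-test on the group means.
t_stat, p_value = stats.ttest_ind(game_group, control_group)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # difference is significant if p < .05
```

In a pre-test/post-test design the same comparison would typically be run on gain scores, or group and pre-test would be entered together in an analysis of covariance.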
Santos (2002) developed a simulation game to help students understand the monetary policy. In the study, the researcher gave a survey to the participants after the completion of the internet-based, interactive teaching aid that introduces undergraduate students to the domestic and international consequences of monetary policy. According to the outcome of the survey, 91 percent of the students believed that their participation in the game improved their understanding of central bank policy and its effects on a global economy, and 90 percent of students felt that need to include the simulation game in the money and banking course. Furthermore, the additional written comments at the end of the survey also fully supported these findings. Building attitudes Game Evaluation 32 Attitudes are commonly viewed as summary evaluations of objects (e.g., oneself, other people, issues, etc.) along a dimension ranging from positive to negative (e.g., Petty, Priester, & Wegener, 1994). For the evaluation on attitude toward computer game, the Computer Game Attitude Scale, CGAS, which evaluates student attitudes toward educational computer games, has been created to assist computer game designers and teachers in the evaluation of educational software games. Chappell and Taylor (1997) further found the evidence of its reliability and validity. In the two studies conducted by Westbrook and Braithwaite (2001) and Amory et al (1999) learner attitudes were measured with questionnaires. Comparing pre and post questionnaires, Westbrook and Braithwaite found learners’ interest in the health care system was significantly enhanced after completing the game. A study by Adams (1998) showed that the most important learning associated with using computer games is not the learning of facts but rather the development of certain attitudes acquired through interaction with software (e.g., becoming aware of the complexity of a task, developing respect for decision makers in the real world, and developing humility toward accomplishing the task). In this study, the participants’ attitude was measured by open-ended questions, and was found changed positively and significantly. For example, students’ answers of the questions revealed that their interest, appreciation and respect for urban planning and planners were promoted. In addition, Wellington and Faria (1996) found that when playing LAPTOP, a marketing simulation game specifically designed for use in introductory marketing courses, participants’ attitudes were changed significantly. In their study, participants’ changes in attitudes were measured along with each of their decision made in the game. Game Evaluation 33 Summary A game is a rule-governed, goal-focused, activity incorporating principles of gaming and computer-assisted instruction; and a simulation game is one that consists of real settings, participants’ roles, playing rules, and scoring systems. Simulations, games, and other experience-based instructional methods have had impact on teaching concepts and applications during this period. Despite games and simulations’ potential power in instruction and training, research on their training effectiveness is limited; therefore, more analysis and studies on their evaluation need to be conducted. Alessi (2000a; 2000b) points out four critical elements of an instructional simulation game, which are, knowledge attributes, learners’ attributes, simulation attributes, and learners’ encoding, representing and using knowledge. 
There are two ways to apply instructional games and simulations; one of them is to buy off-the-shelf software, and another way is to develop them. There are four criteria for media selection: simulation of all necessary conditions of the job setting; sensory-mode information, feedback; and the cost. Amory (2001) points out that the development of an instructional game is composed of three elements, which are the research to be based on, the development of resource, and software components. The effects of computer games on training and instruction have been found beneficial in some cases for instruction and training due to some of their characteristics. These effects can be generally divided into five categories: promotion of motivation, enhancement of thinking skills, facilitation of metacognition, enhancement of knowledge, and building of attitude. Game Evaluation 34 Evaluation Evaluation is the process of determining achievement, significance or value; it usually involves decision-making about performance and about appropriate strategies after prudent appraisal and study (Woolfolk, 2001). Evaluation is the analysis and comparison of current progress or outcome and prior condition, often in order to improve the program or make further decision; it can be conducted on persons or a program; it answers the questions “How well did we do?” “How much did we accomplish?” “Should we go on?” and “What should we improve?” based on the specific goal/objectives or standards (e.g. O’Neil, et al., 2002; Quinn, Alem, & Eklund, 1997). As pointed by Quinn et al. (1997), for example, when learning with technology is designed, there are educational objectives intended to be achieved, so the “learning effectiveness assessment” offers a means of measuring the attainment of the objectives “against a set of both design and acceptance criteria.” Models of Evaluation Based on the timing, content, usage and purpose of the information collected, an evaluation is categorized into two types: the first one is summative evaluation: to verify the value and merits of the program itself. The second type is a formative evaluation: to identify and correct problems and thereby improve the program. However, some researchers argue that all evaluation is formative evaluation since uncovered drawbacks of a program often result in making changes to it (O’Neil, et al., 2002). On the other hand, Peat and Franklin (2002) claimed that it is beneficial and effective to adopt a mix of formative and summative evaluation using on-line computer-based assessment. The following section is devoted to the description of different models of evaluation. Summative Evaluation Game Evaluation 35 The most common method of evaluation is summative evaluation (Scriven, 1967), which judges the value of a program at the end of it, with an emphasis on its final outcome. For example, Peat and Franklin (2002) pointed that many universities have introduced computer-based assessment for summative evaluations, with the purpose to provide immediate outcome of a program. Conducting a summative evaluation, the evaluators compare participants’ performance and the application before and after a program, and analyze its cost and effect. As pointed out by Kirkpatrick (1994) the summative evaluation verifies the worth and merits of the training itself, placing emphasis the overall results of a program in terms of its performance levels, time, and cost-effectiveness. 
For example, in the aspect of education, it may be conducted to investigate the effects and efficiency of a new educational program (Dugard & Todman, 1995). The evaluators may further compare the outcome of that particular program with other alternatives, analyze the results, and then make decisions or changes for the future. Otherwise, they may make comparisons between participants who receive the treatment (program) and those who do not receive the treatment (program) or who receive a different treatment (program); that is, using a comparison group. The drawback of summative evaluation is that it is not helpful for the diagnosis of problems which occur in the process or formation of a program. That is, if the outcome of a summative evaluation found not ideal, summative evaluation won’t offer ways to find out what the problem is and what to do to make the improvement. As pointed out by O’Neil et al (2002): “Given that this state is most common in early stages of development, comparative, summative-type evaluations are usually mis-timed and may create an unduly negative Game Evaluation 36 environment for productivity. Furthermore, because summative evaluation is typically not designed to pinpoint weaknesses and explore potential remedies, it provides almost no help in the development/improvement cycle that characterizes the systematic creation of new methods.” (p.15) Summative evaluation is typically followed by a formal report about the cost and effect of a program. For instance, an evaluation will usually report what activities bring what participants what influence. This information is to decide whether the program is worth carrying on in terms of the costs and effectiveness. For example, a summative evaluation was conducted by Morris (2001). The researcher designed a summative evaluation on a computer-assisted learning program which was designed to be used by students to improve their understanding of psychology. As pointed out by the researcher, the summative evaluation recorded detail data of the students’ interaction with the learning program, such as their actions at the interface, the screens that they had visited, their responses to the learning activities and the specific feedback that they received for each activity. The results of the pre-test and post-test control group design of this summative evaluation were considered in order to find out the effects of the computerassisted learning program and to compare them with the effects of the paper-based instruction. Also, an implication was made that the development of the computer-assisted learning program should involve formative evaluation which should be conducted to ensure easy usage of the program for students and provide information for improvement of the learning materials. Formative Evaluation Game Evaluation 37 A formative evaluation is typically conducted at the outset or during the development of a program, and its purpose is to judge the worth of a program while the program is forming and provide information for the developer to improve that program and its process (Baker & Alkin, 1973; Baker & Herman, 1985; Kirkpatrick, 1994; O’Neil et al., 2002). As O’Neil et al. (2002) pointed out, a formative evaluation focuses on the effectiveness of the development process, refer to which the program developer can decide whether similar approaches may be also feasible and efficient. 
It helps program developer or manager to find out whether the program is working out as planned, and uncover any obstacles, barriers or unexpected opportunities. Therefore, unlike the summative evaluation conducted typically only at the end of the program for the overall results. O’Neil also points out that the purpose of formative evaluation method is to identify the possibility of success and failure of each part and element of the program; “this approach requires that data be developed to permit the isolation of elements for improvement and, ideally, the generation of remedial options to assure that subsequent revisions have a higher probability of success.”(2002, p15). In addition, some researchers (e.g., Baker & Alkin, 1973; Baker & Herman, 1985; Scriven, 1967) argue that formative evaluation is conducted to provide information for the development of a program or internal use by program managers; it is a structured method that provides program staff with additional feedback about their work in order to fine tune the implementation and ensure the success of the program. For example, Barootchi and Keshavarz (2002) conducted formative evaluation on English as a foreign language learners by establishing evaluation portfolio to assess their progress, achievement and reaction, in order to keep the evaluation an ongoing activity for program planning and improvement. Game Evaluation 38 In some cases, the formative evaluation may be conducted before a program is implemented formally, and feedback will be collected from the participants many times during the development of the program in order to revise it as needed. As pointed out by O’Neil et al. (2002), “Interactive formative evaluation would be accomplished during the project, not at the completion.” In others, formative evaluation may be conducted throughout the life of a program as guidance for continuous program improvement. When a formative evaluation is conducted for a training, or instructional program, the evaluator’s role is as the “quasi third-party,” who must be familiar with the objectives, procedures and limitations of that program, so the evaluation is of deeper level for program improvement, not simply an outcome assessment (O’Neil et al., 2002). Kirkpatrick’s Four-Level Evaluation One of the most popular models of evaluation is Kirkpatrick’s (1996) four–level evaluation (Arthur, Tubre, Paul, & Edens, 2003). The Level 1 of the model is to evaluate reaction; that is, to measure users or students’ feelings (e.g. Mehrotra, 2001; Naugle, Naugle & Naugle, 2000; Weller, 2000) about a program, or what we call “customer satisfaction.” According to Kirkpatrick, negative feelings toward a program reduce its effects, so the positive result of this level’s evaluation is the prerequisite of the program. The Level 2 of the four-level model is to evaluate learning; in other words, to evaluate the degree to which participants have obtained the required materials or objective knowledge, and the extent to which participants have changed their attitudes (e.g. Mehrotra, 2001). The third level is to evaluate behavior/application, which is their ability to transfer what they have acquired from the program to the real life or to practical use (e.g., Salas, 2001). The purpose of Level 4 evaluation is to find out the results of the program (e.g., Salas, 2001). 
The results include the impact of the program on the organization, such as improved quality, decreased costs, reduced mistakes, increased profits, higher return on investment (ROI), and so on.

Level 1 evaluation is usually conducted with self-report questionnaires completed by the participants (e.g., Adams, 1998; Ricci et al., 1996; Salas, 2001). As pointed out by Kirkpatrick (1994), there are several reasons to conduct Level 1 evaluation. First, adults feel interested and learn better when they can relate the program to their prior experience; on the contrary, they feel bored or even reluctant to go on when they feel the program is irrelevant. Second, confusion can be discovered using the evaluation. Third, it has potential for pointing out missing content (e.g., Weller, 2000). Fourth, it can find out whether participants feel engaged. Fifth, it can gauge participants' overall feelings about the program (e.g., Weller, 2000). Although learners' or trainees' favorable feelings toward a program do not ensure learning (e.g., Arthur et al., 2003; Forsetlund, Talseth, Bradley, Nordheim, & Bjorndal, 2003), they do influence whether a program will be supported or implemented in the future (Kirkpatrick, 1994). For example, as pointed out by Arthur et al. (2003), students' ratings of instructors' teaching effectiveness have received a great deal of attention in the psychological and educational literature. In addition, Marsh and Roche (1997) found that teaching effectiveness and learners' reactions are positively correlated. Furthermore, in a study conducted by Mehrotra (2001) to evaluate a training program to increase faculty productivity in aging research, it was found that participants' satisfaction with the training program motivated them to conduct research. Another example of Level 1 evaluation is a study conducted by Weller (2000). In his study, student reaction to a computer-mediated communication tutor group for a university distance-learning course was examined with questionnaires. The results not only showed participants' feelings about the program but also revealed the program's benefits and drawbacks, such as the low level of tutor involvement. Motivation was also measured with a questionnaire in the study. Ricci et al. (1996) used a trainee reaction questionnaire containing five statements with a 5-point Likert scale on the training task, and found significant positive correlations between reaction and retention test score; participants who "(a) perceived their form of study as enjoyable, (b) felt they learned a lot about CBD during their training, and (c) felt confident that they would remember what they learned during training" scored significantly higher on the retention test than those who did not. Furthermore, in this study, participants who received the training in computer-based form performed significantly better than those who received the training in paper-and-pencil form, and the computer-based group also reacted more favorably to the training program.

Level 2 evaluation is based on the predefined objectives; evaluators have to make sure there is no confusion between after-program performance and on-the-job performance.
It can be conducted through performance tests such as role-playing, paper-and-pencil check list tests (e.g., Henderson, Klemes, & Eshet, 2000; Mayer, 2002; Mayer & Moreno, 1998; Mayer & Wittrock, 1996), multiple choice, matching and test sheets, and computer-basedknowledge maps (e.g., Baker & Mayer, 1999; Chuang, 2003; Herl et al., 1999; Mayer, 2002; Schacter et al., 1999; Schau & Mattern, 1997). However, developing an effective test with validity and reliability is challenging. Game Evaluation 41 Furthermore, Salas (2001) pointed that assessment of learning via attitude change is the most popular form of assessing learning. For example, Salas (2001) assessed the aviators’ positive attitudes changed toward the aircrew training program with pre and post training self-report questionnaire, and pointed out that learning could also be evaluated by measuring knowledge learned by trainees according to the predefined criteria. The researcher further pointed out that multiple measures provide stronger evidence of learning outcome. Mehrotra’s (2001) research on the research training program is also considered an example of evaluation on changed attitude, one component of level 2 evaluation. In the study, it was found that the training program has energized participants and enhanced their motivation to increase faculty productivity in aging research, Level 3 evaluation is a complicated evaluation designed to ensure that the program or training has a positive influence on job performance (e.g. Harrell, 2001). It can be conducted through one-on-one interviews or questionnaires. The latter one is costly but gives us the most useful information. On the other hand, Salas (2001) pointed out that studies that gathered behavioral data tended to use a combination of various tools, such as behavioral observation forms, behavioral checklists, analysis of crew communication, and peer or selfevaluations/reports. In Salas’s (2001) study, the researcher also assessed if aviators transfer the behaviors that was previously learned to the operation in the Cockpit. The aviators’ behavioral data was collected. As pointed out by Salas, the most common method of assessing behavioral change was measuring behaviors related to the training objectives while participants performed in a simulation environment. Further, researchers (e.g. Naugle, Naugle, & Naugle, 2000) pointed out that educational settings and state departments of education are already addressing the issue about whether students who had completed the Game Evaluation 42 program used this new skills and knowledge on the real-world situation. The researchers (Naugle, Naugle, & Naugle, 2000; Salas, 2001) also pointed out the behavior should be assessed before and after the program, and could be assessed not only by the program implementer, but also by the participants, using self or peer assessment. Level 4 evaluation-results, is the highest level of evaluation in Kirkpatrick’s (1994) evaluation framework, and is the most complicated one, therefore, despite its high value, very few evaluations were conducted at this level (Salas, 2001). Salas pointed out the difficulty of collecting information for program results “in terms of time, resources, identification of a clear criterion, and low occurrences of accidents and mishaps” (p. 651). 
When conducting level 4 evaluation, evaluators are looking for evidence instead of a direct and simple result, since we may find evidence that the program has influence on the organization by comparing pre- and post- test or experimental and control groups, to recognize the possibility that some variables could have contributed to the result. The evidence of positive results may be higher productivity, increased sales, reduced costs, improved quality, etc, but we may not be certain that one is the only cause. For example, Naugle, Naugle, and Naugle pointed out that in educational settings, the desired results are often less explicit or measurable, therefore evaluators need to look for evidence. Also, applying Kirkpatrick’s framework for evaluating an aircrew training program (Salas, 2001), the researcher reviewed 58 published accounts of the training program to determine its cost-effectiveness, and found the results uncertain. However, as pointed out by the researcher, several evidence showing the effectiveness of the program had been found, such as reduction in accident rates. Due to the timing and attributes of the four levels, level 1 and 2 of Kirkpatrick’s evaluation are typically applied as formative evaluation conducted in the process and Game Evaluation 43 happening of a program or a plan, while level 3 and 4 are usually applied as summative evaluation conducted at the end of it to find out the final results. In addition, an evaluation is not definitely conducted at all of the levels of. Evaluators may apply only one or two levels as needed (Blanchard, Thacker, & Way, 2000; Kirkpatrick, 1996). For example, in Harrell’s (2001) report about the evaluation of training effectiveness, he emphasized only on the third level to find out if trainees behaved differently on the job after a training program. Further more, as pointed out previously in the section of games’ training effectiveness in this article, Arthur et al. (2003) applied only reaction and learning evaluation, Level 1 and Level 2 evaluation in their empirical study regarding a simulation game’s training and testing effectiveness of students’ visual attention. Game Evaluation Despite the rush to embrace instructional game, there has been a lack of evaluation, whether summative or formative evaluation; there is limited evidence as to the training effectiveness of games, so the framework of evaluation on the learning results of games needs to be built up (O’Neil & Fisher, 2002; Quinn, 1997; Ruben, 1999). For an instructional game evaluation, a summative evaluation will be conducted to find out whether the game will last for a specific period of time or is intended to be constantly adapted and upgraded as new software options occur, and all of other induced cost, then compare its uncovered outcome and impact, and finally find out its return on investment, ROI. The study done by Parchman, Ellis, Christinaz and Vogel (2000) is among the a few studies which are helpful in designing evaluation of an instructional game. In their study, the researchers conducted a formative evaluation on four alternatives of instructional methods to teach navy electronic technology: Game Evaluation 44 “computer-based instruction”, instruction of “computer-based adventure game”, the traditional “classroom instruction”, and instruction of “computer-based drill and practice”. The evaluation of the effectiveness outcomes was limited to Kirkpatrick's Level two evaluation. 
Participants' subjective knowledge was assessed with a 40-item multiple-choice test, and their motivation was assessed with a motivation questionnaire. However, the evaluation results of the game-based training group were not better than those of the other three groups. The researchers pointed out that game elements such as challenge, fantasy, and curiosity may detract from, rather than enhance, the instruction.

The study conducted by Westbrook and Braithwaite (2001) is also one of the few studies that are helpful for designing a game evaluation. In that study, the researchers administered pre- and post-questionnaires consisting of learner demographics, learners' reactions toward the game, learners' knowledge of the health system, and learners' experience with computer games, to evaluate a health care simulation game designed to promote information-seeking skills and interaction with the health system.

In addition, games with different goals/objectives should be evaluated with different assessment measures, to find out whether the game really helps learners or trainees achieve the learning goals and objectives, and how efficiently it does so (O'Neil et al., 2002; Quinn, 1996). For example, if the training/learning goal is to increase learners' problem-solving ability, measures of problem-solving ability, including content understanding, problem-solving strategies, and self-regulation, can be applied (O'Neil, 1999).

The evaluation of a game can be formative or summative, according to the needs and purposes of the evaluation. If the evaluation is to identify and correct problems and thereby improve the game, a formative evaluation should be conducted. However, if the purpose is to verify the value and benefits of the game, then a summative evaluation should be made. For example, O'Neil et al. (2002) developed a framework for formative evaluation of games. This framework is for evaluation conducted during the process of a project, not after it is implemented to find out the outcome. As seen in Table 2, the procedure involves multiple steps. The formative evaluation starts with examining whether the design of the game is consistent with its specifications and ends with a period of revisions. Activities 4 to 9, as pointed out by O'Neil et al., involve new data collection. The researchers indicate that the framework would be modified based on the need to provide useful and timely information to the developers.

Table 2
Formative Evaluation Activity, adapted from O'Neil et al. (2002)
1. Check the system design against its specifications
2. Check the design of assessments for outcome and diagnostic measurement against specifications. Design and try out measures
3. Check the validity of instructional strategies embedded in the system against research literature
4. Conduct feasibility review with the instructors
   • Are the right tasks being trained?
   • Review to be conducted with instructors
5. Conduct feasibility tests with the students
   • One-on-one testing with protocol analysis
   • Small-group testing
6. Assess instructional effectiveness
   • Cognitive – e.g., does it improve domain knowledge (e.g., information technology skills), transfer of problem-solving skills, self-regulation?
   • Affective – e.g., does it improve self-efficacy?
   • Are there differential effects for identifiable subgroups?
7. Does more game-based training lead to better game performance (e.g., loss/exchange ratio)?
   • Need to track a player's performance across multiple games
8. Do experts and novices differ in performance?
9. Does more training lead to better performance?
10. Revise based on Activities 1-9

Conducting an evaluation of an instructional game is challenging, since it needs to uncover whether the separate components created by developers and course designers interact appropriately and effectively when combined. Furthermore, the evaluators must understand the training objectives and the focuses of the evaluation. As pointed out by O'Neil et al. (2002), "Evaluation is a mix of requirements driven and technology push factors. For example, if it is requirements-driven, then the training objective/assessments are critical; on the other hand, if it is technology driven, then fun/challenge/fantasy issues are critical." In addition, the evaluation of a computer game needs to find out whether all of its facilities work well together and how long they will last, considering the relevant costs, including software, hardware, maintenance, and update fees, in order to establish the relationship between cost and effect.

Summary

Evaluation is the process of determining achievement, significance, or value; it is the analysis and comparison of current progress or outcomes and prior conditions based on specific goals/objectives or standards. While summative evaluation focuses on outcomes and is typically conducted at the end of a program, formative evaluation is normally conducted throughout the program to evaluate the process of program development. Kirkpatrick's (1994) four-level evaluation model is the most commonly used model. The four levels are reaction, learning, behavior/application, and results. There is limited evidence as to the training effectiveness of games for adults, so a framework for evaluating the learning results of games needs to be developed. Since different goals/objectives and games should be evaluated with different assessment measures, game developers should design appropriate assessment tools to find out whether a game really helps learners or trainees achieve the learning goals and objectives, and how efficiently it does so. For example, this study will evaluate a game whose training/learning goal is to increase learners' problem-solving ability, so measures of problem-solving ability, including content understanding, problem-solving strategies, and self-regulation, can be applied in this study (O'Neil, 1999). In the pilot study, we will use O'Neil's framework of formative evaluation and apply the first two levels of Kirkpatrick's (1994) evaluation model. In the main study, we will evaluate the impact of a game on problem solving, as discussed in the following section.

Problem Solving

Definition of Problem Solving

Problem solving is cognitive processing directed at achieving a goal when no solution method is obvious to the problem solver (Mayer & Wittrock, 1996; Simon, 1973). According to Baker and Mayer (1999), it has four characteristics: it is cognitive, process-based, directed, and personal. Baker and Mayer (1999) further indicated that problem solving has four steps. The first step is "problem translation": the problem solver identifies the information available in the situation where the problem occurs and translates it into his/her mental model. The second step is "problem integration": the problem solver puts the pieces of information together into a structure.
The last two steps of problem solving are "solution planning" and "solution execution": developing a feasible plan and then implementing it to solve the problem. The first two components constitute the problem representation phase of problem solving, while the latter two constitute the problem solution phase. Further, Sternberg and Lubart (2003) explained that the analytic part of human intelligence initially recognizes and structures problems and evaluates the ideas that occur during the process of problem solving, while the practical part of intelligence figures out which ideas may work well and which ideas will lead to further good ideas.

The National Center for Research on Evaluation, Standards, and Student Testing (CRESST) concluded that problem-solving ability is composed of three elements: content understanding, problem-solving strategies, and self-regulation. The elements and their hierarchical order can be seen in Figure 1.

Figure 1. National Center for Research on Evaluation, Standards, and Student Testing (CRESST) model of problem solving. In the model, problem solving comprises content understanding, problem-solving strategies (domain-specific and domain-independent), and self-regulation; self-regulation in turn comprises metacognition (planning and self-monitoring) and motivation (effort and self-efficacy).

In addition, problem-solving strategies can be further categorized into two types: domain-independent and domain-specific problem-solving strategies. The former refer to general strategies such as the application of multiple representations, mental simulation, and analogy to problems. As pointed out by researchers (van Merrienboer, Clark, & de Croock, 2002), cognitive schemata enable problem solvers to solve a new problem by serving as an analogy. On the other hand, domain-specific problem-solving strategies are task-dependent strategies (O'Neil, 1999) that guide problem solving in the domain by reflecting the way problems may be solved effectively (van Merrienboer et al., 2002). Examples of task-dependent strategies include using Boolean search strategies in a search task, applying a programming language to write a computer program, or applying equation-solving strategies to solve a math problem (O'Neil, 1999). Baker and Mayer (1999) explain that domain-specific aspects of problem-solving strategies are those that are unique to a specific subject or field, such as geometry, geology, or genealogy, and that they involve the specific content understanding, procedural knowledge, and discourse of the subject domain.

The third element of problem solving is self-regulation, which includes two subcategories, metacognition and motivation; the former comprises self-checking and self-planning, and the latter is composed of effort and self-efficacy. Self-efficacy refers to cognitive judgments of one's capabilities within a specific domain or on a specific task, and it affects achievement in that domain or on that task (Bong & Clark, 1999).

Significance of Individual Problem-Solving Skills

Previous researchers have pointed out the significance of problem solving (e.g., Cheung, 2002; Mayer, 1998; O'Neil, 1999). For example, O'Neil (1999) pointed out that problem-solving skills have been suggested by many researchers to be among the critical competences required of students, college graduates, and employees.
Cheung (2002) pointed out that problem-solving ability is important for a person's psychological and social functioning. Mayer (1998) argues that cognitive, metacognitive, and motivational skills are required for successful problem solving Game Evaluation 51 in academic settings. Further more, due to the rapid technological change, not only schools and colleges but also companies will transform into learning organizations, where individual problem solving skills are critical. For example, according to previous research, there is evidence that problem solving skills impact the bottom line, therefore high-paying jobs are usually those required higher level thinking skills (O’Neil, 1999). Therefore, as pointed out by researchers (e.g., Mayer, 2002; Moreno & Mayer, 2000), promoting problem-solving transfer has become one of the most important educational objectives. In Moreno and Mayer’s study (2000), for example, it was found that personalized messages in a multimedia science lesson produced better performance of problem-solving transfer and retention. Mayer (1998) also points out that although routine problem solving has been promoted successfully, educators need to spend more efforts on teaching non-routine problem solving skills. As a result educators need an assessment program that tests validly and efficiently how much students have leaned (retention) and how well they are able to apply it (transfer) (e.g., Day, Arthur & Gettman, 2001; Moreno & Mayer, 2000). Further, in the article by Dugdale (1998), the author also points out that recent literature about education has emphasized problem solving as a focus of school mathematics. Assessment of Problem Solving There is substantial previous research which reveals the significance of problem solving for all of the studies, institutes, workforces, and tasks and teaching for problemsolving transfer is as a result an important educational objective. However, an assessment framework that is valid and efficient need to be built up, and the methods to assess problemsolving skills still need to be refined (Mayer, 2002; O’Neil & Fisher, 2002; O’Neil, et al., 2002). For example, assessing students by giving them a test of separate and unconnected Game Evaluation 52 multiple choice questions, teachers are not accurately assessing students’ problem-solving abilities, and traditional standardized tests do not report to teachers or students what problemsolving and thinking processes they should provide emphasis on. Although we can find useful measures for problem solving competence in the cognitive science literature such as think-aloud protocols (e.g., Day, Arthur, &Gettman, 2001), those measures, however, are inefficient performance assessments, requiring extensive human scoring and a great amount of time (O’Neil, 1999). According to Baker and Mayer (1999), two aspects of problem-solving ability need to be tested as a whole, which are retention and transfer. Retention involves what learners have retained or remembered from what they have been presented, while transfer involves how much learners can apply the learned knowledge or skills in a brand new situation; retention is tested with routine problems, which are problems that learners have learned to solve, and transfer is tested with non-routine problems, which are problems that learners haven’t solved in the past (Mayer, 1998). 
According to the researchers, the assessment of problem-solving transfer should be the current emphasis of education, since learners need not only memorize the materials, but also to apply them in a novel situation or in the real world. In addition, problem solving ability may be assessed by checking the entire process when a task is being solved or the final outcome, “contrasting expert-novel performance.” Also, Day et al. (2001) pointed out that the more similar novices’ knowledge structures are to experts’ structures, the higher the level of novices’ skills acquisition is. The National Center for Research on Evaluation, Standards, and Student Testing (CRESST) has developed a problem-solving assessment model composed of content Game Evaluation 53 understanding, problem-solving strategies, and self-regulation, the three elements of problem-solving ability. The model is illustrated as the following: Measurement of Content Understanding Mayer and Moreno (1998) assessed content understanding with retention and transfer questions. In their study on the split-attention effect in multimedia learning, they gave participants retention test and matching test, containing questions designed to assess the extent to which participants remembered the knowledge delivered by the multimedia with animation and narration, or animation and on-screen text. An alternative way to measure content understanding is knowledge maps. Knowledge maps have been used as an effective tool to learn complex subjects (Herl et al., 1996) and to facilitate critical thinking and (West, Pomeroy, Park, Gerstenberger, & Dsndoval, 2000). Several studies also revealed that knowledge maps are not only useful for learning, but also a reliable and efficient measurement of content understanding (Herl et al., 1999; Ruiz-Primo, Schultz, & Shavenlson, 1997). For example, Ruiz-Primo et al. (1997) proposed a framework for conceptualizing knowledge maps as a potential assessment tool in science. Students need to learn how to locate, organize, discriminate between concepts, and use information stored in formats to make decisions, solve problems, and continue their learning when formal instruction is no longer provided. A knowledge map is a structural representation that consists of nodes and links. Each node represents a concept in the domain of knowledge. Each link, which connects two nodes, is used to represent the relationship between them; that is, the relationship between the two concepts. As Schau and Mattern (1997) point out, learners should not only be aware of Game Evaluation 54 the concepts but also of the connections among them. A set of two nodes and their link is called a proposition, which is the basic and the smallest unit in a knowledge map. Ruiz-Primo et al. (1997) claimed that as an assessment tool, knowledge maps are identified as a combination of three components: (a) a task that allows a student to perform his or her content understanding in the specific domain (b) a format in regard to the student’s response, and (c) a scoring system by which the student’s knowledge map could be accurately evaluated. Chuang (2003) modified their framework to serve as an assessment specification using a concept map. Table 3 lists the elements and characteristics of the knowledge maps identified in her study. 
Table 3
Domain Specifications Embedded in Chuang's (2003) Study, Adapted from Hsieh's (2001) Study

General Domain Specification: This Software

Scenario: Create a knowledge map on environmental science by exchanging messages in a collaborative environment and by searching relevant information from a simulated World Wide Web environment.

Participants: Student team (two members).
  Leader: the one who does the knowledge mapping.
  Searcher: the one who accesses the simulated World Wide Web environment to find relevant information and to ask for feedback.

Knowledge map terms (Nodes): Predefined – 18 important ideas identified by content experts: atmosphere, bacteria, carbon dioxide, climate, consumer, decomposition, evaporation, food chain, greenhouse gases, nutrients, oceans, oxygen, photosynthesis, producer, respiration, sunlight, waste, and water cycle.

Knowledge map terms (Links): Predefined – 7 important relationships identified by content experts: causes, influences, part of, produces, requires, used for, and uses.

Simulated World Wide Web environment: Contains over 200 Web pages with over 500 images and diagrams about environmental science and other topic areas.

Training: All students went through the same training. The training included the following elements:
  • how to construct the map
  • how to search
  • how to communicate with the other group member

Feedback: Feedback is based on comparing the group's knowledge map performance to the expert map performance.

Adapted knowledge of response feedback: Includes knowledge of response feedback; messages about how much improvement students have accomplished in the current map compared with the previous map are provided, but it does not contain search strategies for electronic information seeking. Representation: graphics plus text.

Task-specific adapted knowledge of response feedback: Includes knowledge of response feedback; messages about how much improvement students have accomplished in the current map compared with the previous map are provided, as well as useful search strategies for electronic information seeking. Representation: graphics plus text.

Timing of feedback: Both types of feedback used in the study can be either immediate or delayed, because feedback access is controlled by the searchers.

Type of learning: Collaborative problem solving.

Problem-solving measures:
  Knowledge map: content understanding and structure – (a) semantic content score, (b) the number of concepts, and (c) the number of links.
  Information seeking: browsing and searching.
  Feedback: the number of times students request feedback for their knowledge maps.
  Self-regulation: planning, self-checking, self-efficacy, and effort.
  Team processes: adaptability, coordination, decision making, interpersonal, leadership, and communication.

Researchers have successfully applied knowledge maps to measure students' content understanding in science for both high school students and adults (e.g., Chuang, 2003; Herl et al., 1999; Schacter et al., 1999; Schau et al., 2001). Schau et al. (2001) used select-and-fill-in knowledge maps to measure secondary and postsecondary students' content understanding of science in two studies, respectively. In the first study, students' performance on the knowledge map correlated significantly with their performance on a multiple-choice test, a traditional measure (r = .77 for eighth grade and r = .74 for seventh grade).
According to the research result, knowledge map is therefore an assessment tool with validity. In the other study, Schau et al compared the results of knowledge maps with both traditional tests of multiple choice and relatedness ratings. Further, the mean of map scores increased significantly, from 30% correct at the beginning of the semester (SD=11%) to 50% correct at the end (SD=19%). At last, the correlation between knowledge map scores and multiple choice test scores, and the correlation between concept scores and relatedness ratings assessment were high. Recently, CRESST has developed a computer-based knowledge mapping system, which measures the deeper understanding of individual students and teams, reflects thinking processes in real-time, and economically reports student thinking process data back to teachers and students (Chung et al., 1999; O’Neil, 1999; Schacter et al., 1999). The computer-based knowledge map has been used in at least four studies (Chuang, in preparation; Chung et al., 1999; Hsieh, 2001; Schacter et al., 1999). In the four studies, the map contained 18 concepts of environmental science, and seven links of relationships, such Game Evaluation 59 as cause, influence, and used for. Students were asked to create a knowledge map in computer-based environment. In the study conducted by Schacter et al. (1999) students were evaluated by creating individual knowledge map, after searching the simulated world wide web. On the other hand, in the studies conducted by Chung et al. (1999), Hsieh (2001), and Chuang (2003) two students constructed a group map cooperatively through the networked computers, and their results showed that using networked computers to measure group processes was feasible. An example of a concept map is shown in Figure 2 and 3. As seen in Figure 2, the screen of computer was divided into three major parts. The numbered buttons located at the lower right part of the screen are message buttons for communication between group members; all predefined messages were numbered and listed on the handouts distributed to participants. When a participant clicked on a button on the computer screen, the corresponding message would be shown instantly on his/her and his/her partner’s computers simultaneously. The lower left part of the screen was where messages were displayed in the order sent by members. As seen in Figure 2, the top-left-hand part was the area where the map was constructed. Figure 2 User Interface for the System Game Evaluation 60 Game Evaluation 61 As seen in Figure 3, in this system (e.g. Chuang, in preparation; Hsieh, 2001), only a leader can add concepts to the knowledge map and make connection among concepts by clicking the icon of “Add Concept” on the menu bar and pressing the “Link” button respectively. There are 18 concepts of environmental science under “Add Concept” such as “atmosphere”, “bacteria”, “carbon dioxide”, and “water cycle”, and seven link labels (i.e., causes, influence, part of, produces, requires, used for, uses). A leader was asked to use these terms and links to construct a concept map using the computer mapping system. In addition, the leader could move concepts and links to make changes to the map. On the contrary, a searcher in each group could seek information from the Simulated World Wide Web environment and access feedback regarding the result of their concept map. Therefore, to construct a concept map successfully, both of the searcher and leader of a group must collaborate well. 
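Because a map of this kind reduces to a set of propositions, the scoring components listed in Table 3 (semantic content score, number of concepts, number of links) can be computed mechanically by comparing a student map with an expert map. The short sketch below is only an illustration of that idea under simple assumptions; it is not the CRESST mapper's actual code, and the function name, the example propositions, and the plain intersection-based semantic content score are hypothetical.

```python
# Minimal illustration (not the CRESST software's actual code) of scoring a
# proposition-based knowledge map against an expert map. All names and example
# data are hypothetical; the semantic content score here is simply the number
# of student propositions that also appear in the expert map.

from typing import Set, Tuple

Proposition = Tuple[str, str, str]  # (concept_1, link_label, concept_2)

def score_map(student: Set[Proposition], expert: Set[Proposition]) -> dict:
    """Return the three map statistics used in Table 3."""
    concepts = {c for (c1, _, c2) in student for c in (c1, c2)}
    return {
        "semantic_content_score": len(student & expert),  # propositions also in the expert map
        "number_of_concepts": len(concepts),
        "number_of_links": len(student),                  # each proposition contributes one link
    }

# Hypothetical maps built from the predefined environmental science terms.
expert_map = {
    ("photosynthesis", "requires", "sunlight"),
    ("photosynthesis", "produces", "oxygen"),
    ("decomposition", "produces", "nutrients"),
}
student_map = {
    ("photosynthesis", "requires", "sunlight"),
    ("oceans", "part of", "water cycle"),
}

print(score_map(student_map, expert_map))
# -> {'semantic_content_score': 1, 'number_of_concepts': 4, 'number_of_links': 2}
```

In practice, semantic content scoring also has to handle synonymous links and partial credit; counting exact matches, as here, is only the simplest possible variant.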
Figure 3. Add Concepts and Links

Measurement of Problem Solving Strategies

Problem-solving strategies can be categorized as domain-independent/general and domain-dependent/specific (Alexander, 1992; Bruning, Schraw, & Ronning, 1999; O'Neil, 1999; Perkins & Salomon, 1989). Domain-specific knowledge is knowledge about a particular field of study or subject, such as the application of equations in a math question, the application of a formula in a chemistry problem, or the specific strategies needed to be successful in a game. On the other hand, domain-general knowledge is a broad array of knowledge that is not linked with a specific domain, such as the application of multiple representations and analogies in a problem-solving task or the use of Boolean search strategies in a search task (e.g., Chuang, in preparation). CRESST has created a simulated Internet Web space to evaluate problem-solving strategies such as information-searching strategies and feedback-inquiring strategies (Herl et al., 1999; Schacter et al., 1999). In the study conducted by Schacter et al., students' problem-solving strategies, such as information browsing, focused searching, and feedback requests, improved significantly from the pretest to the posttest. Mayer and Moreno (1998) conducted a study on the split-attention effect in multimedia learning and the dual processing systems in working memory, and assessed participants' problem-solving strategies with a list of transfer questions. The results of their experiments showed that students who received concurrent narration describing the target pictures performed better on transfer tests than those who received concurrent on-screen text involving the same words, in terms of both their content understanding and their problem-solving strategies.

Measurement of Self-Regulation

According to Bruning, Schraw, and Ronning (1999), some researchers believe that self-regulation includes three core components: metacognitive awareness, strategy use, and motivational control. In an alternative framework, CRESST's model of problem solving (O'Neil, 1999), self-regulation is composed of metacognition and motivation. Metacognition encompasses two subcategories, planning and self-checking/monitoring (Hong & O'Neil, 2001; O'Neil & Herl, 1998; Pintrich & DeGroot, 1990), and motivation is indicated by effort and self-efficacy (Zimmerman, 1994; 2000). O'Neil and Herl (1998) developed a trait self-regulation questionnaire examining the four components of self-regulation. Of the four components, planning is the first step because one must have a plan to achieve the proposed goal. In addition, self-efficacy is one's belief in his/her capability to accomplish a task, and effort is how hard one is willing to work on a task. In the trait self-regulation questionnaire developed by O'Neil and Herl (1998), planning, self-checking, self-efficacy, and effort are assessed using eight questions each. The reliability of this self-regulation inventory has been shown in previous studies (Hong & O'Neil, 2001). For example, in the research conducted by Hong and O'Neil (2001), the reliability estimates (coefficient α) of the four self-regulation subscales (planning, self-checking, effort, and self-efficacy) were .76, .06, .83, and .85, respectively, and the research also provided evidence of construct validity.
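To make the subscale structure and the reported coefficient α values concrete, the sketch below scores a hypothetical four-subscale, eight-items-per-subscale questionnaire and computes Cronbach's alpha for each subscale. The simulated responses and names are invented for illustration; this is not the scoring procedure or the data of Hong and O'Neil (2001).

```python
# Hypothetical illustration of scoring a four-subscale trait self-regulation
# questionnaire (planning, self-checking, effort, self-efficacy; eight items
# each) and estimating internal consistency with Cronbach's alpha.
# The simulated responses below are invented; this is not the actual instrument.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of Likert responses for one subscale."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the subscale totals
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

rng = np.random.default_rng(0)
# Simulated 4-point Likert responses: 30 respondents x 8 items per subscale.
subscales = {name: rng.integers(1, 5, size=(30, 8))
             for name in ("planning", "self_checking", "effort", "self_efficacy")}

for name, responses in subscales.items():
    mean_score = responses.mean()                    # average item response for the subscale
    print(f"{name}: mean = {mean_score:.2f}, alpha = {cronbach_alpha(responses):.2f}")
```

With uncorrelated random responses, as simulated here, alpha will hover near zero; values in the .7-.9 range such as those reported above arise only when items within a subscale are answered consistently.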
While the self-regulation questionnaire is used by CRESST, another measure, think aloud, is also used to assess self-regulation (Winne & Perry, 2000), in that, participants speak out their thinking process when solving a problem. The documented data are then Game Evaluation 64 analyzed psychologically, and the potential underlying thought processes are induced (Manning, Glasner, & Smith, 1996) For example, in the study conducted by O’Neil and Abedi, where metacognitive is considered “conscious and periodic self-checking of whether one’s goal is achieved and, when necessary, selecting and applying different strategies” (p3-4), the researchers developed a framework to assess state metacognition directly and explicitly. In addition, state metacognition assessed in this study is considered situation-specific, and vary rapidly. The framework developed is a set of self-reported, domain-independent, and found the framework reliable and valid. To evaluate problem-solving ability, previous researchers (e.g., Baker & Mayer, 1999; Baker & O’Neil, 2002; Mayer, 2002; O’Neil, 1999) further points out that computerbased assessment has the merit of integrating validity to generate test items and the efficiency of computer technology as a means of presenting and scoring tests. Summary Problem solving is cognitive processing directed at achieving a goal when no solution method is obvious to the problem solver. In addition, problem solving strategies can be further categorized into two types, which are domain independent and domain specific problem solving strategies. Also, self-regulation includes two sub-categories, metacognition and motivation; the former one further composes self-checking and self-planning, and the later one further compose effort and self-efficacy. Knowledge map is not only useful for teaching and learning, but also a reliable and efficient measurement of content understanding, in addition, CRESST has created a simulated Internet Web space to evaluate problem solving strategies such as information Game Evaluation 65 searching strategies and feedback inquiring strategies. Finally, while CRESST’s selfregulation questionnaire assesses self-regulation, another measure, think aloud, is another method assess domain dependent metacognition Computer-based problem-solving assessments are economical, efficient and valid measures that employ contextualized problems that require students to think for extended periods of time and to indicate the problem-solving heuristics that they were using and why; provide students access to information to solve the problem with, and offer detailed feedback to teachers, students and their parents about individual student’s problem-solving processes. Summary of the Literature Simulations, games, and other experience-based instructional methods have had a substantial impact on teaching concepts and applications during this period. Despite games and simulations’ potential power in instruction and training, research on their training effectiveness is limited; therefore, more analysis and studies on their evaluation need to be conducted. There are two ways to apply instructional games and simulations; one of them is to buy off-the-shelf software, and another way is to develop them. There are four criteria for media selection: simulation of all necessary conditions of the job setting; sensory-mode information, feedback; and the cost. 
On the other hand, Amory (2001) points out that the development of an instructional game is composed of three elements, which are the research to be based on, the development of resource, and software components. The effects of computer games and simulations on training and instruction shown in previous studies can be generally divided into five categories: promotion of motivation, enhancement of thinking skills, facilitation of metacognition, enhancement of knowledge, and building of attitude. Game Evaluation 66 Evaluation is the process of determining achievement, significance or value; it is the analysis and comparison of current progress or outcome and prior condition based on the specific goal/objectives or standards. While summative evaluation focuses on outcomes and is typically conducted at the end of a program, formative evaluation is normally conducted throughout the program to evaluate the process of a program development. In addition, Kirkpatrick’s (1994) four–level evaluation model is the most common model applied in different fields such as in business and academic settings. The four levels are reaction, learning, behavior/application and results. There is limited evidence as to the training effectiveness of games for adults, so the framework of evaluation on the learning results of game needs to be found out. Since different goal/objectives and games should be evaluated with different assessment measures, game developers should design appropriate assessment tools to find out if the game really helps learners or trainees achieve the learning goal and objectives, and its efficiency. For example, this study will evaluate a game with training/learning goal to increase learners’ problem-solving ability, so the measures to assess problem-solving ability including content understanding, problem-solving strategies, and self-regulation can be applied (O’Neil, 1999). In this study, we will apply the first two levels of Kirkpatrick’s (1994) four-level evaluation model for formative evaluation Problem solving ability is one of the most critical skills for working or learning and in almost every setting. However, its assessment measures still need to be further refined. Knowledge map is not only useful for teaching and learning, but also a reliable and efficient measurement of content understanding, in addition, CRESST has created a simulated Internet Web space to evaluate problem solving strategies such as information searching strategies Game Evaluation 67 and feedback inquiring strategies. Finally, while CRESST’s self-regulation questionnaire assesses self-regulation, another measure, think aloud (e.g., Day, Arthur, &Gettman, 2001), is another method assessing domain dependent metacognition. Further more, computer-based problem-solving assessments are economical, efficient and valid measures that employ contextualized problems that require students to think for extended periods of time and to indicate the problem-solving heuristics that they were using and why; provide students access to information to solve the problem with, and offer detailed feedback to teachers, students and their parents about individual student’s problem-solving processes. Game Evaluation 68 CHAPTER III METHODOLOGY Research Hypothesis Research Question: Will participants increase their problem-solving ability after playing a game (i.e. SafeCracker)?. Research Design The research will consist of a pilot study and a main study. The pilot study will focus on a formative evaluation. 
The main study will focus on the impact of the game on problem solving. Pilot Study A pilot study is a small-scale trial conducted before a research with a purpose to develop and examine the measures or procedures that will be used in the main study (Gall, Gall, & Borg, 2003). There are several advantages of conducting a pilot. First, a pilot study permits a preliminary testing of the hypotheses that lead to more precise hypotheses in the main study, and bring researchers new ideas or alternative measures unexpected before the pilot study. In addition, it permits a complete examination of the planned research procedures and reduces the number of treatment errors. Also, researchers may obtain feedback from the participants of the pilot study (Isaac & Michael, 1997). In previous studies on games’ effects, researchers conducted an initial trial to examine the utility of the objective software and to determine whether the computer environment/interface is understandable by the subjects (e.g. Amory, 1999; Greenfield, et al., Game Evaluation 69 1994; Quinn, 1991). For example, Quinn (1991) used an adventure game similar to the one to use in the main study in the pilot experiment to find out the applicability of the adventure game and the need for revision. In the preliminary study conducted by Amory et al. (1999), researchers tried to find out appropriate game type and its characteristics appropriate for education, and based on the result, the researchers designed an educational game for further study. There are four purposes of pilot study in this dissertation. First, it will be used to assess the functionality of the computer system. Second, it will be used to determine whether the environment was feasible and understandable for the participants. Third, it will be used to assess if the predicted time is suitable for participants to construct the knowledge map, play game and finish the tests. Finally, the pilot study will be conducted to find out participants’ feeling toward the game and the whole process. (e.g. Amory, 1999; Quinn, 1991). Formative Evaluation For this study, the researchers will apply the framework of formative evaluation as a pilot study (O’Neil, et al., 2002). According to O’Neil et al., to find out the feasibility of a program of educational technology and improve the program by offering information on its implementation and procedure. The study will follow a modified version of the O’Neil methodology to conduct a formative evaluation of a game as seen in Table 4 Table 4 Formative Evaluation Activity (adapted from O’Neil, et al., 2002) 1. Check the game design against its specifications 2. Check the design of assessments for outcome and measurement. Design and try out measures Game Evaluation 3. 70 Check the validity of instructional strategies embedded in the game against research literature 4. 5. Conduct feasibility review with the students. • Review to be conducted with students • Small-group testing (n=3-5) Implement revisions. Participants The participants of the pilot study will be four college students at the University of Southern California, aged from 20-35. The pilot study will be conducted after receiving approval of USC Review of Human Subjects. All participants will be selected to have no experience of playing SafeCracker. 
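The fourth purpose of the pilot, gauging participants' feelings toward the game and the overall procedure, corresponds to a Kirkpatrick Level 1 reaction measure. The sketch below shows one way such a short reaction questionnaire could be administered and summarized, loosely following the five-statement, 5-point Likert format that Ricci et al. (1996) used; the item wording and the ratings are hypothetical and are not the instrument or data of this study.

```python
# Hypothetical Level 1 (reaction) questionnaire for the pilot study, modeled on
# the five-statement, 5-point Likert format of Ricci et al. (1996). The item
# wording and ratings below are invented for illustration only.

ITEMS = [
    "I enjoyed playing the game.",
    "The game's interface was easy to understand.",
    "The instructions for constructing the knowledge map were clear.",
    "The time allowed for the game and the tests was sufficient.",
    "I would recommend this game to other students.",
]

def summarize_reactions(responses):
    """responses: one list of five 1-5 ratings (one per statement) per participant."""
    n = len(responses)
    item_means = [sum(r[i] for r in responses) / n for i in range(len(ITEMS))]
    overall = sum(item_means) / len(ITEMS)
    return {"item_means": [round(m, 2) for m in item_means],
            "overall_mean": round(overall, 2)}

# Invented ratings for four pilot participants.
pilot_responses = [
    [4, 5, 4, 3, 4],
    [5, 4, 4, 4, 5],
    [3, 4, 3, 2, 3],
    [4, 4, 5, 4, 4],
]
print(summarize_reactions(pilot_responses))
```

Item-level means of this kind can point directly at the pilot purposes listed above, for example a low mean on the time-sufficiency item would signal that the predicted session length needs revision.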
Puzzle-Solving Game The required characteristics of the computer game selected for this study needs to be as the following: 1) adult oriented, single user play and suitable for problem solving research, since this research is to find out a game’s effect on an adult’s problem-solving ability. 2) The game should be one that participants can learn how to play in a couple of minutes since this study is an initial trial that won’t be prolonged for a long time. 3) The selected game should be able to be replayed many times, at least in one hour. The selection of SafeCracker was based on a study by Wainess and O’Neil (2003). They conducted an evaluation on the research feasibility of potential 525 video games of three categories: puzzle games, strategy games, and educational games. The appropriate game was than searched among puzzle games, due to their properties and since they provide appropriate Game Evaluation 71 platform for studying games’ effectiveness of enhancing problem-solving ability. A participant in a puzzle-solving game is placed in a specific setting or story background, and tries to reason out possible task procedure and consequences. Failure to solve a puzzle previously encountered may result in future problems in the game. Their criteria include the following: the game selected should not be violent or appeal to one gender more than the other. It should not appeal to people with special interests, either. For example, baseball relevant games or wrestling related games tend to interest male and sports fans more than female and non sports fans. In addition, the game should not favor people with special skills, motor skills, rapid response, or background knowledge. For example, a game which is all about music, specially designed for legal practitioners, or totally biology related, should not be selected. Finally, pacing controlled by players is required, such as Chess and other traditional computer board games. SafeCracker, a puzzle-solving game was the final decision by Wainess and O’Neil (2003) since it facilitates problem solving with right and wrong solutions, and does not require special background knowledge or extraordinary visual-spatial ability. In addition, the pacing of SafeCracker, which is mainly designed for adults, is controlled by players. The other significant reason is that SafeCracker is not as popular as many other potential games. However, according to Wainess and O’Neil (2003), as the most ideal choice of game for this study, SafeCracker has three main drawbacks: (1) It may not be appropriate for testing transfer and retention outside the game per se. (2) Players’ actions within the program can not be tracked. (3) It is impossible to modify the game scenarios. The lack of source code or editor for SafeCracker is a major reason for these drawbacks. Game Evaluation 72 A player in SafeCracker is a candidate for a position as a head of security development at a world famous firm of security systems, therefore needs to accomplish a task given by the boss. The task is to open the safes in a mansion in 12 hours without any help from others. There are 35 safes scattered in about 60 rooms of the mansion. To open all of the safes, the player not only needs to do mathematic calculation, logical reasoning, and trial-and-error guessing, but also has to have good sense of direction and memorization. 
For example, to open the safe in the Kitchen (room 21), a player needs to solve a math/science problem of temperature conversion, and to crack the safe in the Technical Design room (room 27), a player needs to solve an electricity/science problem of circuits and current. However, before solving the problem in the Kitchen (room 21), the player needs to go to the Chief Engineer's room (room 6) to find the temperature conversion diagram; before solving the problem in Technical Design (room 27), the player needs to go to the Constructor's Office (room 5) to find the electric circuit diagram. The player is not offered any tools in advance; by cracking safes one after another, he/she obtains the tools and combinations needed to crack some of the following safes. SafeCracker's specifications, following Wainess and O'Neil's specification of games, are shown in Table 5.

Table 5
Game Evaluation Specifications: SafeCracker

Purpose/domain: Puzzle solving with a focus on logical inference and trial-and-error
Type of game platform: PC/CD-ROM, Mac
Analogous games: Pandora's Box; Jewels of the Oracle
Commercialization intent: Primary
Contractor: Dreamcatcher
Genre(a): Puzzle
Training use
Recreational use
Length of game: Unlimited (except the very beginning part of the game)
Terminal learning objectives: TBD
Players/Learners: Candidate for the position of security leader of a major company
Type of learning(b): Problem solving
Domain knowledge: Math, history, physics, information searching, location/direction, science
Type of play
Time to learn(c): 5 minutes (game's interface, game rules)
Availability of tutorial or other types of training supported: No
Manual: No
What is user perspective?: First person
Is it fun?(d): Primary
Availability of cheats/hints: 8 Internet sites(e)
Time frame: Modern
Plan of instruction: No
Feedback in game: Implicit
After Action Review: No
Nature of practice: One scenario per game play
Single user vs. multiple user: Single user

(a) Action, role playing, adventure, strategy games, goal games, team sports, individual sports (Laird & VanLent, 2001).
(b) Domain knowledge, problem solving, collaboration or teamwork, self-regulation, communication (Baker & Mayer, 1999).
(c) Basic game play, i.e., an educated user, not winning strategies.
(d) Challenge, fantasy, novelty, complexity.
(e) http://www.cheatguide.com/cheats/pc/s/safecracker.shtml
http://www.gamexperts.com/index.php?cheat_id=2178
http://home.planet.nl/~laan0739/adventure/games/safe.html
http://faqs.ign.com/articles/424/424105p1.html
http://fourfatchicks.com/Reviews/Safecracker/Safecracker.shtml
http://www.thecomputershow.com/computershow/walkthroughs/safecrackerwalk.htm
http://www.balmoralsoftware.com/safecrak/safecrak.htm
http://www.uhs-hints.com/uhsweb/safecrkr.php
http://www.justadventure.com/thejave/html/Games/GamesS/Safecracker/JAVE_SafecrackerExtras.shtml

Knowledge Map

A knowledge map is a structural representation that consists of nodes and links. Each node represents a concept in the domain of knowledge. Each link, which connects two nodes, represents the relationship between them. As Schau and Mattern (1997) point out, learners should be aware not only of the concepts but also of the connections among them. A set of two nodes and their link is called a proposition, which is the basic and smallest unit in a knowledge map.
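In software, a knowledge map can therefore be represented as a set of propositions, each a (concept, link, concept) triple. The sketch below is only an illustration of this structure, not the actual knowledge mapping software to be used in the study; the node and link names are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    """The smallest unit of a knowledge map: two concepts joined by one link."""
    concept_a: str
    link: str
    concept_b: str

class KnowledgeMap:
    """A knowledge map as a collection of node-link-node propositions."""

    def __init__(self):
        self.propositions: set[Proposition] = set()

    def add(self, concept_a: str, link: str, concept_b: str) -> None:
        self.propositions.add(Proposition(concept_a, link, concept_b))

    def concepts(self) -> set[str]:
        """All nodes that appear in at least one proposition."""
        nodes = set()
        for p in self.propositions:
            nodes.update({p.concept_a, p.concept_b})
        return nodes

# Illustrative use with placeholder terms (not the study's actual node/link lists):
student_map = KnowledgeMap()
student_map.add("room", "contains", "key")
student_map.add("clue", "causes", "crack")
print(len(student_map.propositions), sorted(student_map.concepts()))
```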
Previous studies indicated that the knowledge map is a reliable and efficient measure of content understanding (Herl et al., 1999; Ruiz-Primo, Schultz, & Shavelson, 1997). Ruiz-Primo et al. (1997) suggested that, as an assessment tool, a knowledge map is a combination of three components: (a) a task that allows a student to demonstrate his or her content understanding in the specific domain, (b) a format for the student's response, and (c) a scoring system by which the student's knowledge map can be accurately evaluated.

In this research, participants will be asked to create a knowledge map in a computer-based environment, and their content understanding will be evaluated before and after playing SafeCracker and receiving feedback on domain-specific strategies. Table 6 lists the concept map specifications that will be used in this study (modified and adapted from Chuang, in preparation).

Table 6
Concept Map Specifications (general domain specification as instantiated for this software)

Scenario: Create a knowledge map on the content understanding of science, individually, and play SafeCracker, a puzzle-solving game.
Participants: College students. Each works on his/her own, doing the knowledge mapping and playing the game.
Knowledge map terms (Nodes): Predefined; 12-15 important ideas identified by content experts
Knowledge map terms (Links): Predefined; 3-5 important relationships identified by content experts
SafeCracker, a puzzle-solving game: Contains over 50 rooms with about 30 puzzles and information about science, mathematics, and other topic areas
Training: All students will go through the same training, which includes the following elements:
   • how to construct the map
   • how to play the selected puzzle-solving game
Type of learning: Problem solving
Problem-solving measures:
   • Knowledge map: content understanding and structure, i.e., (a) semantic content score, (b) the number of concepts, and (c) the number of links
   • Problem-solving strategy questions: includes questions of problem-solving retention and transfer
   • Self-regulation questionnaire: planning, self-checking, self-efficacy, and effort
Feedback: Implicit feedback in the game

Feedback

Feedback provides information following an action or a response and allows a learner to evaluate the adequacy of the action or response (Brunning, Schraw, & Ronning, 1999; Kulhavy & Wager, 1993). In addition, feedback has a significant influence on learning efficiency, motivation, and self-regulation (Bandura, 2001; Nabors, 1999). Based on complexity, feedback can be categorized into three types: knowledge of response feedback, knowledge of correct response feedback, and elaborated feedback (Clariana, Ross, & Morrison, 1991; Dempsey, Driscoll, & Swindell, 1993). While knowledge of response feedback only tells the learner whether his or her performance was correct, knowledge of correct response feedback shows the learner the correct answer (e.g., Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Clark & Dwyer, 1998; Pridemore & Klein, 1995). In addition, feedback can be delivered immediately after the learner's action or delayed for a while (Clariana et al., 1991; Hannafin & Reiber, 1989; Kulhavy & Stock, 1989; Kulik & Kulik, 1988). Feedback can be presented visually in the format of graphics or pictures, verbally in the format of text or words, or simply covertly within a program, which is implicit feedback.
Also, feedback can be provided directly by the instructor/trainer, by other students, or simply implied in a program, where participants have to infer the information conveyed and figure out the solution. Additionally, feedback can be outcome feedback or cognitive feedback (Brunning, Schraw, & Ronning, 1999); the former provides specific information about performance, while the latter emphasizes the relationship between performance and the task. Furthermore, Sales (1993) introduced adapted feedback, personalized feedback in a computer-based or computer-assisted program, which was found to be more effective than non-personalized feedback for learning higher-level cognitive items with educational technology (Albertson, 1986).

For the purpose of this dissertation, the characteristics of the feedback will be as follows: 1) the feedback will be implicit, since the feedback on game-playing strategies will be covert in the game rather than given by the researcher; when a player fails to crack a safe, he/she may infer that a previous step was inappropriate and try another solution, because in SafeCracker players cannot solve a subsequent puzzle unless they proceed in the predefined sequence; 2) the feedback provided in the game will be delayed until a player finds that a subsequent problem cannot be solved; and 3) the feedback of the game is implicit feedback, since a player will find out whether the previous steps were right or wrong only when he/she tries to crack the subsequent safe.

Measures

Content Understanding Measure

Content understanding measures will be computed by comparing the semantic content score of a participant's knowledge map with the semantic scores of a set of two experts' maps. The experts will be Wainess and Chen. An example of the knowledge map for one room, developed by the author, is shown in Figure 4. The following description shows how these outcomes will be scored. First, the semantic score is calculated based on the semantic propositions, two concepts connected by one link, in the experts' knowledge maps. Every proposition in a participant's knowledge map is compared against each proposition in the two experts' maps. A match is scored as one point. The average score across the two experts is the semantic score of that proposition. For example, as seen in Table 7, if a participant makes a proposition such as "room contains key," this proposition is compared with the two experts' propositions. A score of one means the proposition is the same as a proposition in an expert's map; a score of zero means the proposition does not match any of that expert's propositions. Table 7 shows that "room contains key" received a score of one from both experts, so its average score is 1, whereas "crack results from key" matched only the first expert's map, so its average score is 0.5. The sum of the average scores across all propositions is the semantic score of a participant's knowledge map. In the example in Table 7, the total score would be 2.

Figure 4. Sample knowledge map for one room, in which the nodes room, key, crack, map, and clue are connected by links such as contains, results in, results from, and causes.

Table 7
An Example of Scoring a Map

Concept 1   Link           Concept 2   Expert 1   Expert 2   Average
Room        contains       key         1          1          1
Crack       results from   key         1          0          0.5
Clue        causes         crack       0          1          0.5
Total                                                        2.00
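The proposition-matching procedure just described can be summarized in a short sketch. The code below is only an illustration, not the actual scoring software; the expert and participant maps simply reproduce the Table 7 example rather than the maps to be constructed by Wainess and Chen.

```python
# Each map is a set of (concept 1, link, concept 2) propositions.
expert_1 = {("room", "contains", "key"),
            ("crack", "results from", "key")}
expert_2 = {("room", "contains", "key"),
            ("clue", "causes", "crack")}

participant = {("room", "contains", "key"),
               ("crack", "results from", "key"),
               ("clue", "causes", "crack")}

def semantic_score(student_map, expert_maps):
    """Sum, over the student's propositions, of the mean match (0 or 1) across experts."""
    total = 0.0
    for proposition in student_map:
        matches = [1 if proposition in expert else 0 for expert in expert_maps]
        total += sum(matches) / len(expert_maps)
    return total

# Reproduces the Table 7 example: 1 + 0.5 + 0.5 = 2.0
print(semantic_score(participant, [expert_1, expert_2]))
```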
Domain-Specific Problem-Solving Strategies Measure

In this study, the researcher will modify Mayer and Moreno's (1998) problem-solving question list to measure domain-specific problem-solving strategies. In Mayer and Moreno's (1998) research on the split-attention effect in multimedia learning and the dual processing systems in working memory, participants' problem-solving strategies were assessed with a set of retention and transfer questions. Mayer and Moreno judged a participant's retention score by counting the number of predefined major idea units correctly stated by the participant, regardless of wording. Examples of the answer units for retention were "air rises," "water condenses," "water and crystals fall," and "wind is dragged downward." In addition, Mayer and Moreno (1998) scored the transfer questions by counting the number of acceptable answers that the participant produced across all of the transfer problems. For example, the acceptable answers for the first transfer question, about decreasing lightning intensity, included "removing positive ions from the ground," and one of the acceptable answers for question two, about the reason for the presence of clouds without lightning, was that "the tops of the clouds might not be high enough to freeze."

The problem-solving strategy questions designed for this dissertation research will be relevant to the selected safes/problems in SafeCracker, the selected puzzle-solving game. Furthermore, those questions will concern the application of the puzzle-solving/safe-cracking strategies participants may acquire while trying to solve the problems in the rooms pre-selected by the researcher from the 60 rooms of the game. The following problem-solving strategy questions of retention and transfer will be used in this dissertation research:

Retention questions:
– Write an explanation of how you solved the puzzle in the first room.
– Write an explanation of how you solved the puzzle in the second room.

Transfer questions:
– List some ways to improve the play in room 1.
– List some ways to improve the play in room 2.
– List some ways to improve the fun or challenge of playing the game in room 1.
– List some ways to improve the fun or challenge of playing the game in room 2.

Participants' retention scores will be computed by counting the number of predefined major idea units correctly stated by the participant, regardless of wording. Examples of the answer units for retention are "follow map," "find clues," "find key," "differentiate rooms," and "tools are cumulative." In addition, participants' transfer responses will be scored by counting the number of acceptable answers that the participant produced across all of the transfer problems. For example, the acceptable answers for the first transfer question, about ways to improve the play in room 1, include "jot down notes," and one of the acceptable answers for question three, about ways to improve the fun or challenge of playing the game in room 1, is to "increase the clues needed to crack a safe."
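Scoring therefore reduces to counting how many predefined idea units (or acceptable answers) appear in a participant's written response. The sketch below is only an illustration: it uses simple lowercase substring matching as a crude stand-in for the human rater's judgment of "regardless of wording," and the idea units are those listed above.

```python
RETENTION_IDEA_UNITS = ["follow map", "find clues", "find key",
                        "differentiate rooms", "tools are cumulative"]

def retention_score(answer_text, idea_units=RETENTION_IDEA_UNITS):
    """Count how many predefined idea units are mentioned in a free-text answer.

    Substring matching stands in for a human rater, who would credit an idea
    unit however it is worded.
    """
    text = answer_text.lower()
    return sum(1 for unit in idea_units if unit in text)

sample_answer = "I tried to follow map markings, find clues in each drawer, and find key items."
print(retention_score(sample_answer))  # 3 of the 5 idea units are mentioned
```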
Self-Regulation Questionnaire

The trait self-regulation questionnaire designed by O'Neil and Herl (1998) will be applied in this study to assess participants' degree of self-regulation, one of the components of problem-solving ability. Sufficient reliability of the self-regulation questionnaire, ranging from .89 to .94, was reported in a previous study (O'Neil & Herl, 1998). The questionnaire consists of 32 items in total, with eight items for each of four factors: planning, self-checking, self-efficacy, and effort. For example, item 1, "I determine how to solve a task before I begin," is designed to assess participants' planning, and item 2, "I check how well I am doing when I solve a task," is designed to evaluate participants' self-checking. Responses to each item are on a four-point scale ranging from almost never and sometimes to often and almost always.
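One plausible way to summarize the questionnaire is to average the eight items belonging to each factor. The sketch below assumes responses are coded 1 (almost never) through 4 (almost always), as shown in Appendix A; the cyclic item-to-factor assignment is an inference from the item wording in Appendix A, not the instrument's published scoring key.

```python
# Hypothetical scoring sketch; the item-to-factor cycle below is inferred from the
# Appendix A item wording and may not match the instrument's official scoring key.
FACTOR_CYCLE = ["planning", "self-checking", "effort", "self-efficacy"]

def factor_scores(responses):
    """responses: list of 32 integers (1-4), in item order. Returns the mean score per factor."""
    if len(responses) != 32:
        raise ValueError("Expected 32 item responses")
    sums = {factor: 0 for factor in FACTOR_CYCLE}
    for index, value in enumerate(responses):
        sums[FACTOR_CYCLE[index % 4]] += value
    return {factor: total / 8 for factor, total in sums.items()}

# Example: a participant who answers "often" (3) to every item scores 3.0 on each factor.
print(factor_scores([3] * 32))
```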
Procedure

Time Chart of the Pilot Study

Activity: Time
Introduction: 2-3 minutes
Self-regulation questionnaire: 6-8 minutes
Introduction on knowledge mapping: 8 minutes
Game introduction: 5 minutes
Knowledge map (pre): 5 minutes
Problem-solving strategy questions (pre): 2 minutes
Game playing (rooms 1 and 2): 20 minutes
Knowledge map (post): 5 minutes
Problem-solving strategy questions (post): 2 minutes
Debriefing: 2 minutes
Total: 57-60 minutes

Data Analysis

According to the outcomes of the pilot study, some modifications will be made for the main study. For example, the time allotted may be adjusted if the participants in the pilot study feel they did not have enough time to construct the map. Also, the pilot will examine whether the new programming of the knowledge mapper works successfully and whether the instructional lesson is appropriate. Further, the researcher will identify problems with the computer system that may occur during the main study, such as a system crash, and make the necessary adjustments. In addition, the researcher will find out whether the problem-solving task is interesting to the participants and how they feel about it.

Main Study

Method of the Main Study

Participants

Thirty young adults, aged 20-35, will participate in the main study. The main study will be conducted in a lab at USC after receiving the approval of the USC Review of Human Subjects. All participants will be selected to have no experience of playing SafeCracker or other puzzle-solving games.

Game

The same puzzle-solving game, SafeCracker, will be used in the main study; however, some adjustments may be made according to the results of the pilot study. For example, where (in which room) to start the game, the number of rooms, the time allotted for participants to play the game, or the game instruction given before participants play may be adjusted.

Measures

Knowledge Map

A knowledge map is a structural representation that consists of nodes and links. Each node represents a concept in the domain of knowledge. Previous literature has shown the validity and reliability of knowledge maps for assessing content understanding (Herl et al., 1999; Mayer, 2002; Ruiz-Primo, Schultz, & Shavelson, 1997). The same knowledge maps used in the pilot study will be used in the main study; however, the time allowed for participants to draw the maps may be adjusted. In the study, subjects will be required to create a knowledge map in a computer mapping test before and after game playing. The content understanding measures will be computed by comparing the semantic content score of a participant's knowledge map with the semantic scores of a set of two experts' maps (Schacter et al., 1999), as in the pilot study.

Domain-Specific Problem-Solving Strategies Measures

The same problem-solving strategy questions of retention and transfer, adapted by the researcher from Mayer and Moreno's (1998) problem-solving question list and used in the pilot study, will be used in the main study. The problem-solving questions are related to the puzzle-solving strategies for the two selected rooms in SafeCracker. Also, the same scoring system of counting acceptable answers will be used in the main study.

Self-Regulation Questionnaire

In the main study, subjects' self-regulation, one of the components of problem-solving ability, will be assessed using the same thirty-two-item self-regulation questionnaire (O'Neil & Herl, 1998) used in the pilot study.

Procedure

The same procedure used in the pilot study (modified as needed) will be used in the main study.

Computer-Based Knowledge Map Training

Participants will be trained in how to use the computer-based knowledge map, including adding/erasing concepts and creating/deleting links between concepts.

Game Playing

The main study of this research will be conducted with SafeCracker, a computer puzzle-solving game. The participants will be asked to play in two specific rooms in SafeCracker, once after the first drawing of the knowledge map, and again after the second drawing of the knowledge map and the provision of task-specific feedback. Each session of game playing will last for thirty minutes.

Feedback on Game Play Strategies

The same implicit feedback used in the pilot study will be used in the main study.

Data Analysis

The descriptive statistics will be means, standard deviations, and correlation coefficients. A t-test will be used to examine the differences between outcomes before and after game playing. By using t-tests to compare the scores on the knowledge maps and the problem-solving strategy questions before and after playing SafeCracker, the researcher will determine whether playing SafeCracker enhances participants' problem-solving ability in terms of content understanding and problem-solving strategies.

REFERENCES

Adams, P. C. (1998). Teaching and learning with SimCity 2000 [Electronic version]. Journal of Geography, 97(2), 47-55.

Albertson, L. M. (1986). Personalized feedback and cognitive achievement in computer-assisted instruction. Journal of Instructional Psychology, 13(2), 55-57.

Alessi, S. M. (2000a). Building versus using simulations. In J. M. Spector & T. M. Anderson (Eds.), Integrated and holistic perspectives on learning, instruction, and technology: Improving understanding in complex domains (pp. 175-196). Dordrecht, The Netherlands: Kluwer.

Alessi, S. M. (2000b). Simulation design for training and assessment. In H. F. O'Neil, Jr., & D. H. Andrews (Eds.), Aircrew training and assessment (pp. 197-222). Mahwah, NJ: Lawrence Erlbaum Associates.

Alessi, S. M. (2000c). Simulation design for training and assessment. In H. F. O'Neil, Jr., & D. H. Andrews (Eds.), Aircrew training and assessment (pp. 197-222). Mahwah, NJ: Lawrence Erlbaum Associates.

Alexander, P. A. (1992). Domain knowledge: Evolving themes and emerging concerns. Educational Psychologist, 27(1), 33-51.

Amory, A. (2001). Building an educational adventure game: Theory, design, and lessons. Journal of Interactive Learning Research, 12(2/3), 249-263.

Amory, A., Naicker, K., Vincent, J., & Adams, C. (1999). The use of computer games as an educational tool: Identification of appropriate game types and game elements. British Journal of Educational Technology, 30(4), 311-321.

Anderson, C. A., & Bushman, B. J. (2001, September). Effects of violent video games on aggressive behavior, aggressive cognition, aggressive affect, physiological arousal, and prosocial behavior: A meta-analytic review of the scientific literature. Psychological Science, 12(5), 353-358.

Arthur, W., Jr., Strong, M. H., Jordan, J. A., Williamson, J. E., Shebilske, W. L., & Regian, J. W. (1995).
Visual attention: individual differences in training and predicting complex task performance, Acta Psychologica, 88, 3-23. Arthur, W. Jr., Tubre, T., Paul, D. S., & Edens, P S. (2003). Teaching effectiveness: The relationship between reaction and learning evaluation criteria. Educational Psychology, 23(3), 275-285. Baird, W. E., & Silvern, S. B. (1999) Electronic games: Children controlling the cognitive environment. Early Child Development & Care, 61, 43-49. Baker, E. L., & Alkin, M. C. (1973). Formative evaluation of instructional development. AV Communication Review, 21(4), 389-418. (ERIC Document Reproduction Service No. EJ091462) Baker, E. L., & Herman, J. L. (1985). Educational evaluation: Emergent needs for research. Evaluation Comment, 7(2), 1-12. Baker, E. L. & Mayer, R. E. (1999). Computer-based assessment of problem solving. Computers in Human Behavior, 15, 269-282. Baker, E. L., & O’Neil, H. F. Jr. (2002). Measuring problem solving in computer environments: Current and future states. Computers in Human Behavior, 18(6), 609622. Game Evaluation 89 Bandura, A. (2001). Impact of Guided Exploration and Enactive Exploration on SelfRegulatory mechanisms and Information Acquisition Through Electronic Search. Journal of Applied Psychology, 86 (6), 1129-1141. Bangert-Drowns, R. L., & Pyke, C. (2001). A taxonomy of student engagement with educational software: An exploration of literate thinking with electronic text. Journal of Educational Computing Research, 24(3), 213-234. Barnett, M. A., Vitaglione, G. D., Harper, K. K. G., Quackenbush, S. W., Steadman, L. A., & Valdez, B. S. (1997). Late adolescents’ experiences with and attitudes toward videogames. Journal of Applied Social Psychology, 27(15), 1316-1334. Barootchi, N, & Keshavarz, M. H. (2002) Assessment of achievement through portfolio and teacher-made tests. Educational Research, 44(3), 279-288. Betz, J. A. (1995-96). Computer games: Increase learning in an interactive multidisciplinary environment. Journal of Educational Technology Systems, 24, 195-205. Blanchard, P. N., Thacker, J. W., & Way, S A. (2000). Training evaluation: Perspectives and evidence from Canada. International Journal of Training and Development, 4(4), 295-304. Bong, M., & Clark, R. E. (1999). Comparison between self-concept and self-efficacy in academic motivation research. Educational Psychologist, 34(3), 139-153. British Educational Communications and Technology Agency. Computer Games in Education Project. Retrieved from http://www.becta.org.uk Brunning, R. H., Schraw, G. J., & Ronning, R R. (1999). Cognitive psychology and instruction (3rd ed.). Upper Saddle River, NJ: Merrill. Game Evaluation 90 Chambers, C., Sherlock, T. D., & Kucik III, P. (2002). The Army Game Project. Army, 52(6), 59-62. Chappell, K. K., & Taylor, C. S. (1997). Evidence for the reliability and factorial validity of the computer game attitude scale. Journal of Educational Computing Research, 17(1), 67-77. Cheung, S. (2002). Evaluating the psychometric properties of the Chinese version of the Interactional Problem-Solving Inventory. Research on Social Work Practice, 12(4), 490-501. Christopher, E. M. (1999). Simulations and games as subversive activities. Simulation & Gaming, 30(4), 441-455. Chuang, S., (in preparation). The role of search strategies and feedback on a computer-based collaborative problem-solving task. Unpublished doctoral dissertation. University of Southern California. Chung, G. K. W. K., O’Neil H. F., Jr., & Herl, H. E. (1999). 
The use of computer-based collaborative knowledge mapping to measure team processes and team outcomes. Computers in Human Behavior, 15, 463-493. Clariana, R. B., Ross, S. M., & Morrison, G. R. (1991). The effects of different feedback strategies using computer-administered multiple-choice questions as instruction. Educational Technology Research and Development, 39(2), 5-17. Clark, R. E. (1998). Motivating performance: Part 1—diagnosing and solving motivation problems. Performance Improvement, 37(8), 39-47. Game Evaluation 91 Clark, K., & Dwyer, F. M. (1998). Effects of different types of computer-assisted feedback strategies on achievement and response confidence. International Journal of Instructional Media, 25(1), 55-63. Crisafulli, L., & Antonietti, A. (1993). Videogames and transfer: An experiment on analogical problem-solving. Ricerche di Psicologia, 17, 51-63. Day, E. A., Arthur, W, & Gettman, D. (2001). Knowledge structures and the acquisition of a complex skill. Journal of Applied Psychology, 86(5), 1022-1033. Dawes, L., & Dumbleton, T. (2001). Computer games in education. BECT ahttp://www.becta.org.uk/technology/software/curriculum/computergames/docs/repor t.pdf Dempsey, J. V., Driscoll M. P., and Swindell, L. K. (1993). Text-based feedback. In J. V. Dempsey & G.C. Sales (Eds.), Interactive instruction and feedback (pp.21-54). Englewood, NJ: Educational Technology publications. Donchin, E. (1989). The learning strategies project. Acta Psychologica, 71, 1-15 Driskell, J. E., & Dwyer, D. J. (1984). Microcomputer videogame based training. Educational Technology, 11-16. Dugard, P. & Todman, J. (1995). Analysis of pre-test-post-test control group designs in educational research. Educational Psychology, 15(2), 181-198. Dugdale, S. (1998). Mathematical problem solving and computers: a study of learnerinitiated application of technology in a general problem-solving context. Journal of Research on Computing in Education, 30(3), 239-253. Game Evaluation 92 Enman, M., & Lupart, J. (2000). Talented female students’ resistance to science: an exploratory study of post-secondary achievement motivation, persistence, and epistemological characteristics. High Ability Studies, 11(2), 161-178. Faria, A. J. (1998). Business simulation games: current usage levels-an update. Simulation & Gaming, 29, 295-308. Fery, Y. A., & Ponserre S. (2001). Enhancing the control of force in putting by video game training. Ergonomics, 44, 1025-1037. Forsetlund, L., Talseth, K. O., Bradley, P., Nordheim, L, & Bjorndal, A. (2003). Many a slip between cut and lip: Process evaluation of a program to promote and support evidence-based public health practice. Evaluation Review, 27(2), 179-209. Galimberti, C., Ignazi, S., Vercesi, P., & Riva, G. (2001). Communication and cooperation in networked environments: An experimental analysis. Cyber Psychology & Behavior, 4(1), 131-146. Gall, M. D., Gall, J. P., & Borg, W. R. (2003). Educational research. An introduction (7th ed.). New York: Allyn & Bacon. Gopher, D., Weil, M., & Bareket, T. (1994). Transfer of skill from a computer game trainer to flight. Human Factors, 36, 387-405. Gredler, M. E. (1996). Educational games and simulations: A technology in search of a (research) paradigm. In D. Jonassen (Ed.). Handbook of Research for Educational Communications and Technology (pp521-540). New York: Macmillan. Greenfield, P.M., DeWinstanley, P., Kilpatrick H., & Kaye D. (1994). Action video games and informal education: Effects on strategies for dividing visual attention. 
Journal of Applied Developmental Psychology, 15, 105-123. Game Evaluation 93 Hannafin, M. J., & Reiber, L. P. (1989). Psychological foundations of instructional technologies: Part I. Educational Technology Research and Development, 37(2), 91101. Harrell, K. D. (2001). Level III training evaluation: Considerations for today’s organizations. Performance Improvement, 40 (5), 24-27. Henderson, L., Klemes, J., & Eshet, Y. (2000). Just playing a game? Educational simulation software and cognitive outcomes. Journal of Educational Computing Research, 22(1), 105-129. Herl, H. E., Baker, E. L., & Niemi, D. (1996). Construct validation of an approach to modeling cognitive structure of U.S. history knowledge. Journal of Educational Psychology, 89(4), 206-218. Herl, H. E., O’Neil, H. F., Jr., Chung, G., & Schacter, J. (1999) Reliability and validity of a computer-based knowledge mapping system to measure content understanding. Computer in Human Behavior, 15, 315-333. Hong, E., & O’Neil, H. F. Jr. (2001). Construct validation of a trait self-regulation model. International Journal of Psychology, 36(3), 186-194. Hsieh, I. (2001). Types of feedback in a computer-based collaborative problem-solving Group Task. Unpublished doctoral dissertation. University of Southern California. Isaac, S, & Michael, W. B. (1997). Handbook in research and evaluation for education and the behavioral sciences (3rd ed.). San Diego, CA: EdITS. King, K. W., & Morrison M. (1998). A media buying simulation game using the Internet. Journalism & Mass Communication Education,53(3), 28-36. Game Evaluation 94 Kirkpatrick, D. L. (1994). Evaluating training program. The four levels. San Francisco, CA: Berrett-Koehler Publishers. Kirkpatrick, D. L. (1996, January). Great ideas revisited. Training and Development Journal, 54-59. Kulhavy, R. W., & Wager, W. (1993). Feedback in programmed instruction: Historical context and implication for practice. In J. V. Dempsey & G.C. Sales (Eds.), Interactive instruction and feedback (pp.3-20). Englewood, NJ: Educational Technology publications. Kulhavy, R. W., & Stock, W. A. (1989). Feedback in written instruction: The place of response certitude. Educational Psychology of Review, 1, 279-308. Kulik, J. A., & Kulik, C. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58(1), 79-97. Lane, D. C. (1995). On a resurgence of management simulations and games. Journal of the Operational Research Society, 46, 604-625. Malone, T. W. (1981). Toward a theory of intrinsically motivating instruction. Cognitive Science, 4, 333-369. Manning, B. H., Glasner, S. E., & Smith, E. R. (1996). The self-regulated learning aspect of metacognition: A component of gifted education. Roeper Review, 18(3), 217-223. Marsh, H. W., & Roche, L. A. (1997). Making students’ evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52(11), 1187-1197. Martin, A. (2000). The design and evaluation of a simulation/game for teaching information systems development. Simulation & Gaming, 31(4), 445-463. Game Evaluation 95 Mayer, R. E. (1998). Cognitive, metacognitive, and motivational aspects of problem solving. Instructional Science, 26, 49-63. Mayer, R. E. (2001). Multimedia learning. New York: Cambridge University Press. Mayer, R. E. (2002). A taxonomy for computer-based assessment of problem-solving. Computer in Human Behavior, 18, 623-632. Mayer, R. E., & Moreno, R. (1998). 
A split-attention effect in multimedia learning: evidence for dual processing systems in working memory. Journal of Educational Psychology, 90(2), 312-320. Mayer, R. E., Moutone, P., & Prothero, W. (2002). Pictorial aids for learning by doing in a multimedia geology simulation game. Journal of Educational Psychology, 94(1), 171-185. Mayer, R. E., & Sims, V. K. (1994). For whom is a picture worth a thousand words? Extensions of a dual-coding theory of multimedia learning. Journal of Educational Psychology, 86, 389-401. Mayer, R. E., & Wittrock, M. C. (1996). Problem-solving transfer. In D. C. Berliner, & Calfee, R.C. (Ed.), Handbook of educational psychology (pp. 47-62). New York, NJ: Macmillian Library Reference USA, Simon & Schuster Macmillan. Mehrotra, C. M. (2001). Evaluation of a training program to increase faculty productivity in aging research. Gerontology & Geriatrics Education, 22(3), 79-91. Moreno, R., & Mayer, R. E. (2000). Engaging students in active learning: The case for personalized multimedia messages. Journal of Educational Psychology, 92(4), 724733. Game Evaluation 96 Morris, E. (2001). The design and evaluation of Link: A computer-based teaching system for correlation. British Journal of Educational Technology, 32(1), 39-52. Mulqueen, W. E. (2001). Technology in the classroom: lessons learned through professional development. Education, 122(2), 248-256. Nabors, Martha L. (1999). New functions for “old Macs”: providing immediate feedback for student teachers through technology. International Journal of Instructional Media, 26(1) 105-107. Naugle, K. A., Naugle, L. B., & Naugle, R. J. (2000). Kirkpatrick’s evaluation model as a means of evaluating teacher performance. Education, 121(1), 135-144. Novak, J. D. (1990). Knowledge maps and Vee diagrams: Two metacognitive tools to facilitate meaningful learning. Instructional Science, 19(1), 29-52. Okagaki, L. & Frensch, P.A. (1994). Effects of video game playing on measures of spatial performance: Gender effects in late adolescence. Journal of Applied Developmental Psychology, 15, 33-58. O’Neil, H. F., Jr. (Ed.). (1978). Learning strategies. New York: Academic Press. O’Neil, H. F., Jr. (1999). Perspectives on computer-based performance assessment of problem-solving. Computers in Human Behavior, 15, 225-268. O’Neil, H. F., Jr. (2003). What works in distance learning. Los Angeles: University of Southern California; UCLA/National Center for Research on Evaluation, Standards, and Student Testing (CRESST). O’Neil, H. F., Jr., & Abedi, J. (1996). Reliability and validity of a state metacognitive inventory: Potential for alternative assessment. Journal of Educational Research, 89, 234-245. Game Evaluation 97 O’Neil, H. F., Jr., & Andrews, D. (Eds). (2000). Aircrew training and assessment. Mahwah, NJ: Lawrence Erlbaum Associates. O’Neil, H. F., Jr., Baker, E. L., & Fisher, J. Y.-C. (2002). A formative evaluation of ICT games. Los Angeles: University of Southern California; UCLA/National Center for Research on Evaluation, Standards, and Student Testing (CRESST). O’Neil, H. F., Jr., & Fisher, J. Y.-C. (2002). A technology to support leader development: Computer games. In Day, V. D., & Zaccaro, S. J. (Eds.), Leadership development for transforming organization. Mahwah, NJ: Lawrence Erlbaum Associates. O’Neil, H. F., Jr., & Herl, H. E. (1998). Reliability and validity of a trait measure of selfregulation. Los Angeles, University if California, Center for Research on Evaluation, Standards, and Student Testing (CRESST). O’Neil, H. 
F., Jr., Mayer, R. E., Herl, H. E., Niemi, C., Olin, K, & Thurman, R A. (2000). Instructional strategies for virtual aviation training environments. In H. F. O’Neil, Jr., & D. H. Andrews (Eds.), Aircrew training and assessment, (pp. 105-130). Mahwah, NJ: Lawrence Erlbaum Associates. Parchman, S. W., Ellis, J. A., Christinaz, D., & Vogel, M. (2000). An evaluation of three computer-based instructional strategies in basic electricity and electronics training. Military Psychology, 12(1), 73-87. Peat, M., & Franklin, S. (2002). Supporting student learning: The use of computer-based formative assessment modules. British Journal of Educational Technology, 33(5), 515-523. Perkins, D. N., & Salomon, G. (1989). Are cognitive skills context bound? Educational Researcher, 18, 16-25. Game Evaluation 98 Petty, R. E., Priester, J. R., & Wegener, D. T. (1994). Handbook of social cognition. Hillsdale, NJ: Lawrence Erlbaum Associates. Pillay, H. K., Brownlee, J., & Wilss, L. (1999). Cognition and recreational omputer games: implications for educational technology. Journal of Research on Computing in Education, 32, 203-216. Pintrich, P. R., & DeGroot, E. V. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82, 33-40. Pirolli, P, & Recker, M. (1994). Learning strategies and transfer in the domain of programming. Cognition & Instruction, 12(3), 235-275. Poedubicky, V. (2001). Using technology to promote healthy decision making. Learning and Leading with Technology, 28(4), 18-21. Ponsford, K. R., & Lapadat, J. C. (2001). Academically capable students who are failing in high school: Perceptions about achievement. Canadian Journal of Counselling, 35(2), 137-156. Pridemore, D. R., & Klein, J. D. (1995). Control of practice and level of feedback in computer-based instruction. Contemporary Educational Psychology, 20, 444-450. Quinn, C. N. (1991). Computers for cognitive research: A HyperCard adventure game. Behavior Research Methods, Instruments, & Computers, 23(2) 237-246. Quinn, C. N. (1996). Designing an instructional game: Reflections on “Quest for independence.” Education and Information Technologies, 1, 251-269. Quinn, C. N., Alem, L., Eklund, J. (1997). Retrieved August 30, 2003, from http://www. Testingcentre.come/jeklund/interact.htm. Game Evaluation 99 Rabbitt, P., Banerji, N., Szymanski, A. (1989). Space fortress as an IQ test? Predictions of learning and of practiced performance in a complex interactive video-game. ACTA Psychologica Special Issue: Tge Kearbubg Strategues Origran: An Examination of the Strategies in Skill Acquisition, 71(1-3), 243-257. Rhodenizer, L., Bowers, C., & Bergondy, M. (1998). Team practice schedules: What do we know? Perceptual and Motor Skills, 87, 31-34. Ricci, K. E., Salas, E., & Cannon-Bowers, J. A. (1996). Do computer-based games facilitate knowledge acquisition and retention? Military Psychology, 8, 295-307. Rieber, L. P. (1996). Animation as feedback in computer simulation: Representation matters. Educational Technology Research and Development, 44(1), 5-22. Rieber, L.P. (1996). Seriously considering play: Designing interactive learning environments based on the blending of microworlds, simulations, and games. Educational Technology, Research and Development, 44, 43-58. Ritchie, D., & Dodge, B. (1992, March). Integrating technology usage across the curriculum through educational adventure games. (ED 349 955). Rosenorn, T. Kofoed, L. B. (1998). 
Reflection in Learning Processes through simulation/gaming. Simulation & Gaming, 29(4), 432-440. Ross, S. M., & Morrison, G. R. (1993). Using feedback to adapt instruction for individuals. In J. V. Dempsey & G.C. Sales (Eds.), Interactive instruction and feedback (pp.177195). Englewood, NJ: Educational Technology publications. Ruben, B. D. (1999, December). Simulations, Games, and experience-based learning: The quest for a new paradigm for teaching and learning. Simulation & Gaming, 30(4), 498-505. Game Evaluation 100 Ruiz-Primo, M. A., Schultz, S. E., and Shavelson, R. J. (1997). Knowledge map-based assessment in science: Two exploratory studies (CSE Tech. Rep. No. 436). Los Angeles, University if California, Center for Research on Evaluation, Standards, and Student Testing (CRESST). Salas, E. (2001). Team training in the skies: does crew resource management (CRM) training work? Human Factors, 43(4), 641-674. Sales, G. C. (1993). Adapted and adaptive feedback in technology-based instruction. In J. V. Dempsey & G.C. Sales (Eds.), Interactive instruction and feedback (pp.159-176). Englewood, NJ: Educational Technology publications. Santos, J. (2002). Developing and implementing an Internet-based financial system simulation game. Journal of Economic Education, 33(1) 31-40. Schacter, J., Herl, H. E., Chung, G., Dennis, R. A., O’Neil, H. F., Jr. (1999). Computer-based performance assessments: a solution to the narrow mearurement and reporting of problem-solving. Computers in Human Behavior, 13, 403-418. Schank, R. C. (1997). Virtual learning: A revolutionary approach to build a highly skilled workforce. New York: McGraw-Hill Trade. Schau, C. & Mattern, N. (1997). Use of map techniques in teaching applied statistics courses. American statistician, 51, 171-175. Schau, C., Mattern, N., Zeilik, M., Teague, K., & Weber, R. (2001). Select-and-fill-in knowledge map scores as a measure of students' connected understanding of science. Educational & Psychological Measurement, 61(1), 136-158. Game Evaluation 101 Schunk, D. H., & Ertmer, P A. (1999). Self-regulatory processes during computer skill acquisition: Goal and self-evaluative influences. Journal of Educational Psychology, 91(2). Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné, & M. Scriven (Eds.), Perspectives of curriculum evaluation (American Educational Research Association Monograph Series on Curriculum Evaluation, No. 1, pp. 3983). Chicago: Rand McNally. Simon, H. A. (1973). The structure of ill structured problem. Artificial Intelligence, 4, 181201. Sternberg, R. J., & Lubart, T. E. (2003). The role of intelligence in creativity. In M. A. Runco (Ed.), Critical Creative Processes. Perspectives on Creativity Research (pp. 153187). Cresskill, NJ: Hampton Press. Stolk, D., Alesandrian, D., Gros, B., & Paggio, R. (2001). Gaming and multimedia applications for environmental crisis management training. Computers in Human Behavior, 17, 627-642. Thomas, P., & Macredie, R. (1994). Games and the design of human-computer interfaces. Educational Technology, 31, 134-142. Thornburg, D. G. & Pea, R. D. (1991). Synthesizing instructional technologies and educational culture: Esxploring cognition and metacognition in the social studies. Journal of Educational Computing Research, 7(2), 121-164. Tkacz, S. (1998). Learning map interpretation: Skill acquisition and underlying abilities. Journal of Environmental Psychology, 18 (3), 237-249. Game Evaluation 102 Urdan, T., & Midgley, C. (2001). 
Academic self-handicapping: What we know, what more there is to learn. Educational Psychology Review, 13, 115-138.

van Merrienboer, J. J. G., Clark, R. E., & de Croock, M. B. M. (2002). Blueprints for complex learning: The 4C/ID-model. Educational Technology Research & Development, 50(2), 39-64.

Washbush, J., & Gosen, J. (2001). An exploration of game-derived learning in total enterprise simulations. Simulation & Gaming, 32(3), 281-296.

Weller, M. (2000). Implementing a CMC tutor group for an existing distance education course. Journal of Computer Assisted Learning, 16(3), 178-183.

Wellington, W. J., & Faria, A. J. (1996). Team cohesion, player attitude, and performance expectations in simulation. Simulation & Gaming, 27(1).

West, D. C., Pomeroy, J. R., Park, J. K., Gerstenberger, E. A., & Sandoval, J. (2000). Critical thinking in graduate medical education. Journal of the American Medical Association, 284(9), 1105-1110.

Westbrook, J. I., & Braithwaite, J. (2001). The health care game: An evaluation of a heuristic, web-based simulation. Journal of Interactive Learning Research, 12(1), 89-104.

White, B. Y., & Frederiksen, J. R. (1998). Inquiry, modeling, and metacognition: Making science accessible to all students. Cognition and Instruction, 16(1), 3-118.

Winne, P. H., & Perry, N. E. (2000). Measuring self-regulated learning. In M. Boekaerts & P. R. Pintrich (Eds.), Handbook of self-regulation (pp. 531-566). San Diego, CA: Academic Press.

Woolfolk, A. E. (2001). Educational psychology (8th ed.). Needham Heights, MA: Allyn and Bacon.

Ziegler, A., & Heller, K. A. (2000). Approach and avoidance motivation as predictors of achievement behavior in physics instructions among mildly and highly gifted eighth-grade students. Journal for the Education of the Gifted, 23(4), 343-359.

Zimmerman, B. J. (1994). Dimensions of academic self-regulation: A conceptual framework for education. In D. H. Schunk & B. J. Zimmerman (Eds.), Self-regulation of learning and performance (pp. 3-21). Hillsdale, NJ: Erlbaum.

Zimmerman, B. J. (2000). Self-efficacy: An essential motive to learn. Contemporary Educational Psychology, 25(1), 82-91.

Appendix A
Self-Regulation Questionnaire

Name (please print): _________________________________________________

Directions: A number of statements which people have used to describe themselves are given below. Read each statement and indicate how you generally think or feel on learning tasks by marking your answer sheet. There are no right or wrong answers. Do not spend too much time on any one statement. Remember, give the answer that seems to describe how you generally think or feel.

Each statement is rated on a four-point scale: 1 = Almost Never, 2 = Sometimes, 3 = Often, 4 = Almost Always.

1. I determine how to solve a task before I begin.
2. I check how well I am doing when I solve a task.
3. I work hard to do well even if I don't like a task.
4. I believe I will receive an excellent grade in this course.
5. I carefully plan my course of action.
6. I ask myself questions to stay on track as I do a task.
7. I put forth my best effort on tasks.
8. I'm certain I can understand the most difficult material presented in the readings for this course.
9. I try to understand tasks before I attempt to solve them.
10. I check my work while I am doing it.
11. I work as hard as possible on tasks.
12. I'm confident I can understand the basic concepts taught in this course.
13. I try to understand the goal of a task before I attempt to answer.
14. I almost always know how much of a task I have to complete.
15. I am willing to do extra work on tasks to improve my knowledge.
16. I'm confident I can understand the most complex material presented by the teacher in this course.
17. I figure out my goals and what I need to do to accomplish them.
18. I judge the correctness of my work.
19. I concentrate as hard as I can when doing a task.
20. I'm confident I can do an excellent job on the assignments and tests in this course.
21. I imagine the parts of a task I have to complete.
22. I correct my errors.
23. I work hard on a task even if it does not count.
24. I expect to do well in this course.
25. I make sure I understand just what has to be done and how to do it.
26. I check my accuracy as I progress through a task.
27. A task is useful to check my knowledge.
28. I'm certain I can master the skills being taught in this course.
29. I try to determine what the task requires.
30. I ask myself, how well am I doing, as I proceed through tasks.
31. Practice makes perfect.
32. Considering the difficulty of this course, the teacher, and my skills, I think I will do well in this course.

Copyright © 1995, 1997, 1998, 2000 by Harold F. O'Neil, Jr.