THE EFFECTIVENESS OF COMPUTER-ASSISTED INSTRUCTION IN STATISTICS EDUCATION: A META-ANALYSIS

by Yung-chen Hsu

A Dissertation Submitted to the Faculty of the DEPARTMENT OF EDUCATIONAL PSYCHOLOGY In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY In the Graduate College, THE UNIVERSITY OF ARIZONA, 2003

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have read the dissertation prepared by Yung-Chen Hsu, entitled The Effectiveness of Computer-Assisted Instruction in Statistics Education: A Meta-Analysis, and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy. Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College. I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

ACKNOWLEDGEMENTS

I would like to express my sincerest gratitude to the following people, who have taught me knowledge and skills and supported me on the journey of completing this dissertation and my graduate study. Professor Darrell L. Sabers, my advisor and chairman of my dissertation committee, has supported me from my first statistics course through the last moment of my study, has taught me measurement and psychological testing, and has encouraged me not to give up whenever I felt frustrated. Without his strong support and encouragement, the completion of this dissertation would have been impossible. Professor Kenneth J. Smith, the chairman of my minor, guided me into the field of instructional technology. Through the courses I took with Professor Smith, I learned how computer technology is applied in instruction, knowledge that benefited the section on learning theories in this dissertation. Dr. Patricia B. Jones, a member of my dissertation committee, taught me the knowledge and skills of applying SAS and SPSS in statistical analysis. The courses I took with Dr. Jones inspired me to examine the effectiveness of computer-assisted instruction in statistics education, and my knowledge of performing statistical analyses in SAS enabled me to complete the dissertation successfully. Dr. Philip E.
Callahan and Professor Sarah M. Dinham, as members of my dissertation committee, have also taught me technology and statistics and have greatly supported my graduate study. I would also like to thank Ms. Patricia R. Bauerle, a long-term friend since I came to the United States, who read the draft of the dissertation several times and provided precious comments and suggestions. Special thanks go to Mrs. Karoleen P. Wilsey for checking the format and the references of this dissertation. Above all, my parents, my husband, my in-laws, my sister, and many friends gave me full support throughout my graduate study. And my three children, Andy, Alice, and Jasmine, all waited patiently for me to finish doing my endless homework.

DEDICATION

To my parents and parents-in-law

TABLE OF CONTENTS

LIST OF ILLUSTRATIONS ... 9
LIST OF TABLES ... 10
ABSTRACT ... 11

CHAPTER 1. INTRODUCTION ... 12
   Background ... 12
   Statement of the Problem ... 16
   Research Questions ... 18
   Significance of the Study ... 19
   Definitions ... 20

CHAPTER 2. LITERATURE REVIEW ... 22
   Directed Instruction ... 23
      Skinner's Operant Conditioning Theory ... 24
      Information-Processing Theory ... 25
      Gagne's Learning Condition Theory ... 27
   Constructivism ... 29
      Piaget's Cognitive-Development Theory ... 30
      Vygotsky's Cultural-Historical Theory ... 33
      Varieties and Characteristics of Constructivism ... 35
   Computer-Assisted Instruction in Statistics Education ... 37
   Research Synthesis Methods ... 42
      Traditional Review Methods ... 42
      Statistically Correct Vote-Counting Methods ... 44
      Meta-Analysis Methods ... 45
   Meta-Analyses on Computer-Assisted Instruction ... 51

CHAPTER 3. METHOD ... 55
   Research Questions ... 55
   Sampling Criteria and Procedure ... 56
   Study Characteristics ... 58
      Publication Year ... 58
      Publication Source ... 59
      Educational Level of Participants ... 59
      Mode of CAI Program ... 59
      Type of CAI Program ... 59
      Level of Interactivity of CAI Program ... 60
      Instructional Role of CAI Program ... 60
      Sample Size of Participants ... 60
      Dependent Variable ... 61
   Statistical Analysis ... 61
      Conceptualization of Effect Size ... 61
      Effect Size and Statistical Significance ... 63
      Definition and Calculation of Effect Size ... 64
      Combination of Effect Sizes ... 67
      ANOVA Approach to Test the Moderating Effects of Categorical Study Characteristics ... 69
      Comparisons Among Groups ... 72
      Regression Approach to Study Moderating Effects of Continuous Study Characteristics ... 73
      File Drawer Problem ... 73

CHAPTER 4. ANALYSIS AND RESULTS ... 76
   Primary Study Selection ... 76
   Reviewing and Coding the Primary Data ... 78
   Examination for Selection Bias ... 78
   Estimate of Overall Effect Size ... 80
   Dependence of Effect Sizes ... 81
   Fail Safe Number ... 83
   Primary Study Characteristics ... 84
      Publication Year ... 84
      Publication Source ... 87
      Educational Level of Participants ... 88
      Mode of CAI Program ... 89
      Type of CAI Program ... 91
      Level of Interactivity of CAI Program ... 93
      Instructional Role of CAI Program ... 94
      Sample Size of Participants ... 95
   Comparisons Among Groups for Mode of CAI Program ... 96

CHAPTER 5. SUMMARY, DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS ... 99
   Summary ... 99
   Discussion ... 103
   Conclusions and Recommendations ... 106

APPENDIX A. PRIMARY STUDY REVIEW SHEET ... 109
APPENDIX B. TABLES OF DATA ... 110
APPENDIX C. FOREST PLOTS FOR EFFECT SIZES GROUPED BY STUDY CHARACTERISTICS ... 115
REFERENCES ... 124

LIST OF ILLUSTRATIONS

3.1. Graphical representation of effect size ... 62
4.1. Funnel plot ... 79
4.2. Histogram of effect sizes ... 81
4.3. Normal quantile plot ... 82
4.4. Regression of effect sizes on publication year ... 85
C.1. Forest plots for effect sizes grouped by publication year ... 116
C.2. Forest plots for effect sizes grouped by publication source ... 117
C.3. Forest plots for effect sizes grouped by level of education ... 118
C.4. Forest plots for effect sizes grouped by mode ... 119
C.5. Forest plots for effect sizes grouped by type ... 120
C.6. Forest plots for effect sizes grouped by level of interactivity ... 121
C.7. Forest plots for effect sizes grouped by instructional role ... 122
C.8. Forest plots for effect sizes grouped by sample size ... 123

LIST OF TABLES

1.1. Enrollments in Introductory Statistics, in Thousands (CBMS) ... 14
2.1. Relationships Between Learning Phases and Instruction Events ... 28
2.2. Findings of 12 Meta-Analyses on Computer-Based Instruction Published between 1978 and 1991 ... 53
2.3. Findings of 12 Meta-Analyses on Computer-Based Instruction Published between 1993 and 2000 ... 54
4.1. Statistics of Study Effect Sizes by Year ... 86
4.2. Q Statistics by Year ... 87
4.3. Statistics of Study Effect Sizes by Source ... 88
4.4. Q Statistics by Source ... 88
4.5. Statistics of Study Effect Sizes by Educational Level ... 89
4.6. Q Statistics by Educational Level ... 89
4.7. Statistics of Study Effect Sizes by Mode ... 90
4.8. Q Statistics by Mode ... 91
4.9. Statistics of Study Effect Sizes by Type ... 92
4.10. Q Statistics by Type ... 92
4.11. Statistics of Study Effect Sizes by Level of Interactivity ... 94
4.12. Q Statistics by Level of Interactivity ... 94
4.13. Statistics of Study Effect Sizes by Instructional Role ... 95
4.14. Q Statistics by Instructional Role ... 95
4.15. Statistics of Study Effect Sizes by Sample Size ... 96
4.16. Q Statistics by Sample Size ... 96
B.1. Primary Study Data ... 111
B.2. Effect Size Data ... 112
B.3. Primary Study Characteristics ... 113
B.4. Standard Errors and Confidence Intervals ... 114

ABSTRACT

The purpose of this study was to investigate the effectiveness of computer-assisted instruction (CAI) in statistics education at the college level in the United States. This study employed meta-analysis to integrate the findings from 25 primary studies that met a specific set of criteria. The primary studies were selected from journal articles, ERIC documents, and dissertations. The meta-analysis produced an overall effect size estimate of 0.43, indicating a small to medium positive effect of applying CAI in teaching college-level introductory statistics on students' achievement. Several study characteristics were examined for their association with the magnitude of the effect. These characteristics included the publication year, the publication source, the educational level of participants, the mode of the CAI program, the type of the CAI program, the level of interactivity of the CAI program, the instructional role of the CAI program, and the sample size. The results of the analogous analysis of variance showed that different modes of CAI programs produced significantly different effects on students' achievement in learning statistics. Expert systems and drill-and-practice programs were the most effective modes, followed by multimedia, tutorials, and simulations. Computational statistical packages and web-based programs were the least effective modes. Teacher-made CAI programs were significantly more effective than commercially developed CAI programs. The effectiveness of CAI in teaching statistics did not differ significantly according to the publication year, the publication source, the educational level of participants, the level of interactivity of the CAI program, the instructional role of the CAI program, or the sample size.
CHAPTER 1

INTRODUCTION

Background

Many people live in societies that depend heavily on information and technology. Issues regarding politics, economics, education, and science are decided and judged on the basis of data. Statistical reports, such as the results of surveys and observational and experimental studies, are reported regularly in the media. Statistical information has affected many aspects of people's lives. Therefore, the ability to understand, interpret, and evaluate statistical findings has become an essential skill for future citizens and workers in society (Ben-Zvi, 2000). The 19th-century prophet H. G. Wells predicted that "statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write" (cited in Newmark, 1996, p. 6). Employment in all kinds of jobs has increasingly required workers to have analytical, quantitative, and computing skills. These requirements have placed more pressure on educational systems to prepare and equip students with statistical concepts and quantitative knowledge (Moore, 1997). The teaching and learning of statistics has affected the curriculum at all levels of education. In the United States, students from preschool through grade 12 have been taught how to collect and interpret quantitative information (Derry, Levin, Osana, & Jones, 1998; Friel, Corwin, & Rowan, 1990; Hilton, Grimshaw, & Anderson, 2001; Long, 1998). The Center for Statistical Education has been developing curriculum materials and conducting workshops, mostly under the Quantitative Literacy Project (QLP) of the American Statistical Association. The QLP provides instructional materials on probability and statistics that can be used in the pre-college curriculum (Garfield & Ahlgren, 1994). A set of guidelines has been suggested for teaching statistics to K-12 students (Scheaffer, 1990). In addition, the Principles and Standards released by the National Council of Teachers of Mathematics (NCTM) include a content standard that emphasizes statistical reasoning about data analysis and probability (NCTM, 2000). In the past 20 years, the number of statistics courses at the college level has been increasing in most disciplines in the United States (Loftsgaarden & Watkins, 1998; Moore, 1997). More undergraduate and graduate departments require their students to acquire some understanding of statistics. According to Barnet (1999), statistics courses were first included at the college level in the late 1800s. Economics and psychology departments usually offered these statistics courses. In 1898, the Department of Mathematics at the University of Illinois became the first mathematics department to offer a statistics course (Walker, 1929). In 1925, the American Statistical Association found that 84 of 125 colleges surveyed offered statistics courses. Currently, the Bureau of Labor Statistics (2002) reports that "About 80 colleges and universities offered bachelor's degrees in statistics in 2000", and "In 2000, approximately 110 universities offered a master's degree program in statistics, and about 60 offered a doctoral degree program" (p. 179). However, there are about 4100 degree-granting colleges and universities (including two-year and four-year public and private institutions) and about 14 million college students (NCES, 2002). Thus, in contrast, relatively few colleges and universities offer degrees in statistics.
The web site of the American Statistical Association (http://www.amstat.org/education/) provides detailed information about the departments of statistics in the colleges and universities. During the past decades, society and employment have become more quantitative. In Table 1.1, the increasing needs can be seen from the data of the every-five-year census of mathematical sciences departments conducted by the Conference Board of the Mathematical Sciences (CBMS) (Loftsgaarden, Rung, & Watkins, 1997; Scheaffer, 2001).

Table 1.1
Enrollments in Introductory Statistics, in Thousands (CBMS)

Term          Statistics Depts    Mathematics Depts    2-year Colleges    Totals
Fall, 1990          30                   87                  54             170
Fall, 1995          49                  115                  72             236
Fall, 2000          54                  136                  84*            274

*Based on incomplete data

As the enrollments in the mathematics and statistics departments steadily grow, the departments of the nonmathematical disciplines also increasingly provide introductory statistics courses to meet the needs of students without strong mathematical backgrounds. However, due to the mathematical nature of statistics, students often have a fear of formulas and mathematical reasoning, which easily leads to negative attitudes and severe frustration in learning statistics (Hogg, 1992). The question turns to how teachers can help these students learn effectively. A successful introductory statistics course depends on many factors and on efforts from both the teacher and the students. Barnet (1999) and Moore (1997) describe how statistics courses have usually been taught in a traditional way. Many people who have taken statistics courses probably have had the experience of sitting in a class in which the teacher stood in front of the blackboard or used the overhead projector to give a lecture while the students took notes. Then, assignments or projects were given to reinforce students' understanding, and tests were used to evaluate students' learning. There has been criticism that traditional teaching of statistics focuses on computation, formulas, and procedures rather than on statistical reasoning and the ability to interpret, evaluate, and flexibly apply statistical ideas (Ben-Zvi, 2000). As Hogg (1992) pointed out, a problem with these traditional methods is that they do not adequately equip the students to apply statistics in the real world. The lectures usually focus on learning statistical concepts rather than on the process of using statistical concepts to solve problems (Garfield, 1995). Moreover, because the important fundamental concepts are highly abstract and theoretical, beginning students usually have difficulty understanding the lectures (Watts, 1991). Another hypothesis for why traditional methods do not work well is that lectures sometimes contribute to a high level of anxiety, fear, and negative attitudes (Barnet, 1999). In a traditional class, the students have a very passive role in learning and often feel helpless when facing a subject that is difficult and intimidating. Also, teachers are often unimaginative in their methods of delivery and unable to use the wide variety of simulations, experiments, and individual or group projects that are possible (Hogg, 1992). There has been a growing feeling that statistics education needs significant changes (Bisgaard, 1991; Bradstreet, 1996; Hogg, 1991, 1992; Snee, 1993; Watts, 1991). Moore (1997) called for a reform movement in the teaching of statistics to beginners at the university level.
The main suggestion was that "the most effective learning takes place when content, pedagogy, and technology reinforce each other in a balanced manner" (p. 124). Also, the American Statistical Association and the Mathematical Association of America have provided the following recommendations (Moore, 1997, p. 127):

1. Emphasize the elements of statistical thinking: (a) the need for data; (b) the importance of data production; (c) the omnipresence of variability; and (d) the measuring and modeling of variability.

2. Incorporate more data and concepts with fewer recipes and derivations. Whenever possible, automate computations and graphics. An introductory course should: (a) rely heavily on real data; (b) emphasize statistical concepts; (c) rely on computers rather than computational recipes; and (d) treat formal derivations as secondary in importance.

3. Foster active learning through the following alternatives to lecturing: (a) group problem solving and discussion; (b) laboratory exercises; (c) demonstrations based on class-generated data; (d) written and oral presentations; and (e) projects, either group or individual.

There is a great degree of general agreement on the guiding principles for changing beginning statistics instruction (Moore, 1997). That is, the teacher should place more emphasis on concepts, data analysis, inference, and statistical thinking; foster active learning through various alternatives to lecturing; and use technological tools to automate computations and graphics (Barnet, 1999; Ben-Zvi, 2000; Cobb, 1992; Garfield, 1995; Hogg, 1992; Holcomb & Ruffer, 2000; Moore, 1997). Moore (1997) indicated that the chain of influence begins with technology. The continuing revolution in computing has changed the practice of statistics and has subsequently changed tastes about what constitutes interesting research in statistics. Gradually, the combination of technology, professional practices, and research tastes has affected introductory instruction in statistics education. The kinds of technology that are generally used in introductory statistics classes, according to Moore (1997), include television and video, computing software, graphing calculators, statistical software, simulation tools, and multimedia products. For the purposes of this study, the focus will be on computer-based technology.

Statement of the Problem

Computers have increasingly been incorporated in introductory statistics classes at the college and university level. For example, a survey conducted by Castellan (1982) showed that about 50 percent of the respondents reported using computers in courses on statistics and experimental methods in psychology departments in the United States. Couch and Stoloff (1989) documented an increasing use of computers for research methods and statistics at 71 percent and 66 percent, respectively, of a national sample of psychology departments. Mittag (1993) concluded in a study that 49 percent of the instructional time of the non-calculus-based statistics course should be data-based, 28 percent computer-based, 13 percent probability-based, and 10 percent based on other approaches. Bartz (2001) reported that 79 percent of North American undergraduate departments indicated that computers were used in their statistics courses. In most colleges and universities, the amount and quality of technology continues to improve. Students have increased access to computers, graphing programs, Internet resources, and multimedia.
Excellent software programs have been made available for exploring data, presenting statistical concepts, and even tutoring students (Moore, 1997). As advanced computer technology has developed, numerous tutorials, simulations/demonstrations, and computational packages have been created to support statistical instruction and learning (Lee, 1999; Marasinghe & Meeker, 1996; Mills, 2002; West, Ogden, & Rossini, 1998). Becker (1996) found that the most frequent topic in the literature on statistical instruction is teaching approaches, and the second most frequent is using computers in teaching statistics. However, the problem is that most of these studies described the development of the computer programs and their implementation in classes without investigating the effectiveness of these computer tools (e.g., Beins, 1989; Brigg & Sheu, 1998; Britt, Bellinger, & Stillerman, 2002; Butler & Eamon, 1985; Derry, Levin, & Schauble, 1995; Eamon, 1992; Hatchett, Zivian, Zivian, & Okada, 1999; Malloy & Jensen, 2001; Mitchell & Jolley, 1999; Rogers, 1987; Sedlmeier, 1997; Strube & Goldstein, 1995; Walsh, 1993, 1994; Warner & Meehan, 2001). Even though some studies have conducted experiments to test the effects, the results seem to be, at times, inconsistent and contradictory (e.g., Athey, 1987; Gonzales & Birch, 2000; Gratz, Volpe, & Kind, 1993; Hurlburt, 2001; Lane & Aleksic, 2002; Marcoulides, 1990; Porter & Riley, 1996; Ware & Chastain, 1989). As many instructors strive to incorporate computer tools in teaching statistics, an important question that should be asked is whether the tools have positive effects on assisting students' statistical learning. There are a number of meta-analyses that synthesize the research on computer-assisted instruction in colleges. For example, Kulik and Kulik (1986) found that college students receiving computer-assisted instruction scored, on average, higher than 60 percent of students taught with a traditional method (an effect size of 0.26). However, few meta-analyses have been conducted on the effectiveness of computer-assisted instruction in the field of statistics education.

Research Questions

The purpose of this meta-analysis is to examine the following questions:

1. How effective is the use of computer-assisted instruction (CAI) in enhancing the statistical learning of college students as compared with non-computer instructional techniques?

2. Does the effectiveness of CAI differ by the publication year of the study?

3. Does the effectiveness of CAI differ by the source of the study (dissertation, journal article, or ERIC document)?

4. Does the effectiveness of CAI differ by students' level of education (undergraduate or graduate)?

5. Which modes of computer-assisted instruction (CAI) techniques are the most effective for statistical instruction for college students? For example, there are drill-and-practice, tutorials, multimedia, simulations, computational statistical programs, expert systems, and web-based programs.

6. Does the effectiveness of CAI differ by the software type (commercial or teacher-made)?

7. Does the effectiveness of CAI differ by the level of interactivity of the program (interactive-PC, interactive-mainframe, or batch-mainframe)?

8. Does the effectiveness of CAI differ by the role of the program (supplement or substitute)?

9. Does the effectiveness of CAI differ by the sample size of the participants?
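Before turning to the significance of the study, a brief note on interpreting effect sizes such as the 0.26 reported by Kulik and Kulik (1986) above. The sketch below is an editorial illustration rather than part of the original study: under the usual assumption of normally distributed outcomes with equal group variances, a standardized mean difference maps to a percentile of the control-group distribution through the standard normal cumulative distribution function.

    from statistics import NormalDist

    # Illustrative conversion of a standardized mean difference to a
    # percentile; assumes normal outcomes with equal group variances.
    effect_size = 0.26  # figure reported by Kulik and Kulik (1986)

    # Phi(d): the point in the control-group distribution reached by the
    # average student in the experimental (CAI) group.
    percentile = NormalDist().cdf(effect_size) * 100
    print(f"about the {percentile:.0f}th percentile")  # prints about 60

By the same conversion, the overall estimate of 0.43 reported in the abstract corresponds to roughly the 67th percentile.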
Significance of the Study

This meta-analysis synthesized and integrated the results from various experimental studies that investigated the effectiveness of using computers to teach college-level introductory statistics courses during the last twenty years. As computers have increasingly been incorporated into teaching statistics, the investigation of the effect of the computer programs and instruction has become important. While there have been numerous studies and reports regarding a diversity of computer programs in statistics education, relatively few studies have conducted experiments to examine the effect of these computer programs and methods. In addition, the application of computer programs in teaching statistics has produced both positive and negative results. This meta-analysis provides a systematic and quantitative analysis by integrating the results from the selected primary experimental studies and examining how the effectiveness varies with several study characteristics. This meta-analysis identifies the most advantageous modes of computer programs and provides an overall effect size as well as detailed effect sizes according to those characteristics. Moreover, this study contributes to a literature that contains few meta-analytic studies investigating the effectiveness of applying CAI in statistics education.

Definitions

For the purposes of this study, the following terms are defined.

Computer-Assisted Instruction (CAI) can be defined as the use of computers to assist in instructional activities. CAI is synonymous with computer-assisted learning, computer-based instruction, computer-based education, computer-based learning, and computer-enhanced instruction. The students can receive some or all of their course materials or instruction by interfacing with computer programs on a microcomputer, on a mainframe computer, or through the Internet. These instructional methods usually include various modes (Jonassen, 1996; Roblyer & Edwards, 2000; Steinberg, 1991):

1. Drill-and-practice provides the students with problems of increasing complexity to solve. With drill, the students are presented with relatively easy problems that they can answer quickly; with practice, the students answer more complex problems that may require more problem-solving activity. This method allows the students to work problems or answer questions at their own pace and obtain immediate feedback on correctness.

2. Tutorials act like tutors by providing the information and instructional activities students need to master a topic. Tutorials usually present information summaries, explanations, practice routines, feedback, and assessment.

3. Simulations and games model real-world phenomena or artificial environments. These programs require students to respond to computer-driven and changing situations that allow the students to predict outcomes based on their input and to explore or discover new information.

4. Problem-solving programs perform complex numerical, algebraic, or symbolic calculations. These programs assist students in understanding principles and rules through explanation or practice. The steps involved in solving problems can help students acquire problem-solving skills by providing opportunities to solve problems.

5. Expert systems act as advisors and offer suggestions to assist students' decision-making processes. These systems employ artificial intelligence to computerize human expertise in a specific domain.

6.
Multimedia is software that connects elements of a computer system (e.g., texts, movies, pictures, and other graphics). It includes hypermedia, which contains hypertext links. Hypertext consists of text elements, such as keywords, that can be cross-referenced with other occurrences of the same words or with related concepts.

Effect size (ES) refers to an estimate of the magnitude of an effect or, more generally, the size of the relationship between two variables. The most basic form is the standardized mean difference of the outcomes between an experimental group and a control group (Rosenthal, 1991).

Meta-analysis is a statistical analysis of a collection of analysis results from individual studies for the purpose of integrating the findings. It is a quantitative method of research synthesis that uses various measurement techniques and statistical analyses to aggregate and analyze the descriptive and inferential statistics of primary studies focusing on a common topic of interest.

CHAPTER 2

LITERATURE REVIEW

This chapter first reviews the learning theories that have influenced various forms of computer-assisted instruction since the 1970s. During these years, as computer technology has developed dramatically, more educators have supported the appropriate use and the instructional role of computers in education (e.g., Jonassen, Peck, & Wilson, 1999; Lamb, 1992; Newby, 1996; Vogel & Klassen, 2001). The content of higher education has also changed so rapidly that content knowledge sometimes becomes outdated before students even graduate. Thus, educational goals should not be confined to specific knowledge or skills. Rather, the goals should focus on providing students with creative problem-solving skills, teaching students pro-active approaches, and equipping students with abilities to adapt to social needs (Newby, 1996; Vogel & Klassen, 2001). Most educators seem to agree that changes are needed in education. However, disagreements among learning theorists have centered on which strategies will be more effective in achieving today's educational goals (Roblyer & Edwards, 2000). The brief review provides a general overview of the learning theories behind directed instruction and constructivism and the influences of these learning theories on the development of computer-assisted instruction. A review of research synthesis methods is also provided in this chapter. As compared with a single study, research syntheses can increase statistical power as a result of increased sample sizes. Research syntheses can also be effective in identifying interactions between the treatment and the study (Light, 1984). In education, the ability to identify interactions effectively can benefit students and assist policy decision-making regarding various educational programs. These benefits include: (a) helping to match instructional methods with individual student needs, (b) explaining the most effective treatment features, (c) explaining inconsistent or conflicting learning outcomes, (d) determining critical performance outcomes, (e) assessing the stability of treatment effectiveness, and (f) assessing the importance of research design (Light, 1984). Both research synthesis and learning theories provide important information for education policy and educational practice.
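To make the effect size definition above concrete, the following minimal sketch (an editorial illustration; the function names and the summary statistics for the three studies are hypothetical) computes standardized mean differences and combines them with an inverse-variance weighted average, a fixed-effect approach common in meta-analysis. The calculation and combination procedures actually used in this study are described in Chapter 3.

    from math import sqrt

    def cohens_d(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
        # Standardized mean difference: the experimental-control mean
        # difference divided by the pooled standard deviation.
        pooled = sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2)
                      / (n_e + n_c - 2))
        return (mean_e - mean_c) / pooled

    def d_variance(d, n_e, n_c):
        # Large-sample approximation to the sampling variance of d.
        return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))

    def fixed_effect_mean(ds, variances):
        # Inverse-variance weighted average of the study effect sizes.
        weights = [1.0 / v for v in variances]
        return sum(w * d for w, d in zip(weights, ds)) / sum(weights)

    # Hypothetical summaries for three primary studies, each given as
    # (mean_e, mean_c, sd_e, sd_c, n_e, n_c).
    studies = [(78, 72, 10, 12, 30, 32),
               (85, 80, 8, 9, 25, 25),
               (70, 69, 15, 14, 40, 38)]
    ds = [cohens_d(*s) for s in studies]
    vs = [d_variance(d, s[4], s[5]) for d, s in zip(ds, studies)]
    print(round(fixed_effect_mean(ds, vs), 2))  # combined estimate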
The third section reviews the research studies that have applied meta-analysis methods to investigate the effectiveness of computer-assisted instruction in general education and statistics education in colleges and universities.

Directed Instruction

Learning can occur in many situations and might be the result of deliberate efforts or unintended circumstances. Bower and Hilgard (1981) defined learning as "the change in a subject's behavior or behavior potential to a given situation brought about by the subject's repeated experiences in that situation, provided that the behavior change cannot be explained on the basis of the subject's native response tendencies, maturation, or temporary states" (p. 11). Newby (1996) also defined learning as "a change in human performance potential that results from practice or other experience and endures over time" (p. 25). Learning is basically concerned with a change in the possession of knowledge. There are different views on teaching and learning. Roblyer and Edwards (2000) have provided two basic categories. One is called directed instruction, and the other is constructivism. Directed instruction is grounded primarily in behaviorist learning theories and the information-processing branch of the cognitive learning theories. Constructivism evolved from branches of cognitive theory. A few computer applications, such as "drill and practice" and tutorials, are associated with directed instruction. Most others, such as problem-solving, multimedia applications, and telecommunications, can facilitate either directed instruction or constructivist environments, depending on how the teacher implements the applications. Behavioral theories and information-processing theories have contributed to the development of directed instruction (Roblyer & Edwards, 2000). Behaviorists have concentrated on changes in observable behavior and performance as indicators of learning that is not the result of maturation. There have been many important behavioral theories, such as Edward L. Thorndike's connectionism, Ivan Petrovich Pavlov's classical conditioning, Edwin R. Guthrie's contiguous conditioning, Clark Hull's systematic behavior theory, B. F. Skinner's operant conditioning, and William K. Estes's stimulus sampling theory (Bower & Hilgard, 1981).

Skinner's Operant Conditioning Theory

Among these behavioral theorists, B. F. Skinner generated much of the experimental data that laid the basis of behavioral learning theory (Roblyer & Edwards, 2000). In Skinner's view, learning is behavioral change. Learning is defined as "a change in the likelihood or probability of a response" (Gredler, 2001, p. 90). Skinner's operant conditioning model postulated three essential elements of learning: the discriminative stimulus, the response, and the reinforcing stimulus. He distinguished two classes of responses: respondents and operants. Respondents are the reflex actions elicited by a given stimulus, and operants are emitted responses without any obvious stimulus, attributed to internal processes in the brain. Operants act on the environment to produce different kinds of consequences, which affect the person and change future behavior (Gredler, 2001). For example, singing a song may operate on the environment to produce consequences like praise, applause, or money.
Within Skinner's model, the major job of the teacher is to modify students' behavior by setting up situations that reinforce students when they show desired responses and teach the students to exhibit the same response in such situations (Roblyer & Edwards, 2000). Skinner emphasized that teaching occurs when a response is evoked for the first time and is then reinforced. Therefore, the design of effective instruction requires careful attention to the selection of the discriminative stimuli and the use of reinforcement (Gredler, 2001). In 1954, Skinner started to invent a mechanical device to assist in teaching math, reading, spelling, and other subjects. Skinner's devices and other models were called "teaching machines" or "autoinstructional devices", and the materials were called programs (Bower & Hilgard, 1981). The teaching machines provided contingent reinforcement for right answers in the form of (a) confirmation of correct answers, (b) a move forward to new materials, and (c) operation of the equipment by the students. The students moved forward at their own pace. Skinner (1989) described the teaching machine as a mechanical anticipation of the computer. He also considered the computer the ideal teaching machine because computers can bring aspects of real life into the classroom and also expand the range of potential reinforcers. The characteristics of the computer make it especially appropriate for use in tutorial, drill-and-practice, and simulation/gaming instructional modes (Kuchler, 1998).

Information-Processing Theory

Behaviorists have paid attention only to external, directly observable indicators of human learning. However, many people have found this explanation insufficient to guide instruction. During the 1950s and 1960s, some cognitive theorists started to propose theories of internal mental processes. Information-processing theorists hypothesized that processes inside the brain allow people to learn and remember. A model of memory and storage proposed by Atkinson and Shiffrin (cited in Roblyer & Edwards, 2000) suggests that the brain contains certain structures that process information much like a computer does. It hypothesizes that the human brain has three kinds of memory or stores: (a) sensory registers, the part of memory that receives all the information a person senses (sees, hears, feels, tastes, or smells); (b) short-term memory (working memory), the part where new information is held temporarily until it is lost or placed into long-term memory; and (c) long-term memory, the part that has an unlimited capacity and can hold information indefinitely. In this model, learning occurs through a process. First, sensory registers receive information and hold it for a very short time, after which it either enters short-term memory or is lost. If the person does not pay attention, the information may be lost before going to short-term memory. Then, the information stays in short-term memory for 5 to 20 seconds. At this point, the person needs to practice or process the information, which is then stored in long-term memory. Otherwise, the information is lost. The theorists also believed that new information needs to be linked in some way to prior knowledge (existing schema) in long-term memory. The information-processing views provide a basis for instruction that uses a variety of methods to increase the chances that students will pay attention to new information and transfer the information to long-term memory.
Some processing aids, such as advance organizers, instruction-based aids, and learner-generated cues for encoding and recalling, are suggested for improving learning (Gredler, 2001). The analogy of the human learning process to a computer information processing system has also inspired computer-assisted instruction to develop simulations and games that foster problem-solving skills and has guided artificial intelligence (AI) applications that simulate human thinking and learning behaviors. Alternatively, students can use programming languages to instruct the computer to solve complex problems. Computers can thus be used as a mindtool to enhance learning.

Gagne's Learning Condition Theory

Robert M. Gagne built on the behavioral and information-processing learning theories in developing practical instructional guidelines that teachers could implement with directed instruction (Roblyer & Edwards, 2000). He defined learning as "the set of cognitive processes that transforms the stimulation from the environment into the several phases of information processing necessary for acquiring a new capability" (Gagne & Briggs, 1979, p. 43). He believed that learning is an important causal factor in development and is cumulative. Students must have all the prerequisite skills they need to learn a new skill. Low-level skills provide a foundation for higher-level skills. In order to acquire an intellectual skill, a student has to go through the process of a "learning hierarchy". For instance, students must possess the skills of number recognition, number facts, simple addition and subtraction, multiplication, and simple division before they can work long division problems (Roblyer & Edwards, 2000). Gagne identified several varieties of learning outcomes that result when the student acquires knowledge. In general, they can be classified into five types: intellectual skills, cognitive strategies, verbal information, motor skills, and attitudes. The five distinct skills reflect the capabilities that the student acquires as a result of learning. Gagne further subdivided the intellectual skills into four subcategories: concept learning, discrimination learning, higher-order rule learning, and procedure learning. Gagne (1985) identified the internal states required for the student to acquire the new skills. These states are called the internal conditions of learning. However, Gagne thought that learning new skills also depends on interactions with the external environment. These environmental supports are called the external conditions of learning (Gagne, 1985). They are also referred to as the events of instruction.

Table 2.1
Relationships Between Learning Phases and Instruction Events

Description                   Learning Phase                                  Instruction Event
Preparation for learning      1. Attending                                    1. Gaining attention
                              2. Expectancy                                   2. Informing learner of lesson objective
                              3. Retrieval to working memory                  3. Stimulating recall of prior learning
Acquisition and performance   4. Selective perception of stimulus features    4. Presenting distinctive stimulus features
                              5. Semantic encoding                            5. Providing learning guidance
                              6. Retrieval and responding                     6. Eliciting performance
                              7. Reinforcement                                7. Providing informative feedback
Transfer of learning          8. Cueing retrieval                             8. Assessing performance
                              9. Generalizing                                 9. Enhancing retention and learning transfer

Note. Adapted from Gredler, 2001, p. 149.

Gagne (1985) applied the internal processes of the information processing theories to analyzing learning.
He identified nine stages of the learning process that are fundamental to learning and must be executed in sequential order. The nine stages are called phases of learning and can be categorized into three groups: (a) preparation for learning, (b) acquisition and performance, and (c) transfer of learning (see Table 2.1). Gagne believed that learning can occur whether or not instruction is present. However, each phase of the learning process might be influenced in some way by events external to the learner (Gredler, 2001). Gagne also proposed a set of nine "events of instruction" that teachers could follow to arrange optimal conditions of learning. Gagne's principles focus on instruction rather than on simply teaching. His instructional design uses the systems approach, which is characterized by three major features. First, instruction is designed for specific goals and objectives. Second, the development of instruction uses media and other instructional technologies. Third, pilot tryouts, material revisions, and field testing of the materials are an important part of the systems design process (Gredler, 2001). The specific instructional plan should be based on a detailed learning task analysis. Next, the instructor should select the appropriate media, which are compatible with the intended learning outcomes, the students' characteristics, and the capabilities of the different instructional media. The use of media can include low-technology options (e.g., the teacher's voice, printed texts, and real objects) or high-technology options (e.g., computer-assisted instruction, instructional television, videocassette recording, and mechanized delivery systems). Computer-assisted instruction may incorporate the necessary external events of instruction, which promote the corresponding internal processes as outlined in Table 2.1. The most common modes of computer-assisted instruction are drill-and-practice, simulation, gaming, and tutorials (Gagne, Wager, & Rojas, 1981). Drill-and-practice computer programs provide two of the external instructional events, which can help elicit a learner's performance and offer informative feedback. Simulation and gaming computer programs include two additional events: informing the students of lesson objectives and presenting stimuli with distinctive features. The tutorial mode may be the most comprehensive mode, in which all nine external instructional events can be included (Gagne, Wager, & Rojas, 1981).

Constructivism

For the teaching methods in statistics education, Moore (1997) stated that "the central idea of the new pedagogy is the abandonment of an information transfer model in favor of a constructivist view of learning" (p. 124). Although there are different opinions on how introductory statistics should be taught, most still agree with the current thinking in the field of education that relies on the theory of constructivism and the use of active learning situations in the classroom (Barnet, 1999; Dokter & Heimann, 1999). Basically, constructivist strategies are based on principles of learning that are rooted in cognitive theories. The common principle among these theories is that "learners construct knowledge themselves rather than simply receiving it from knowledgeable teachers" (Roblyer & Edwards, 2000, p. 67). Several early educators have contributed some of the fundamental thinking to constructivism.
As early as 1897, John Dewey argued that "education must be conceived as a continuing reconstruction of experience that occurs through the stimulation of the child's powers by the demands of the social situation in which he finds himself" (cited in Newby, 1996, p. 34). Dewey emphasized the need to center instruction around activities that are related and meaningful to students' own experience. Constructivist learning theory has its primary foundations in the work of Jean Piaget, Lev Vygotsky, and others (Howe & Berv, 2000; Maddux & Cummings, 1999). The following sections briefly describe the main ideas of these theorists.

Piaget's Cognitive-Development Theory

Central to Piaget's stage theory is the idea that cognitive development from infancy through adulthood is brought about by the individual's efforts to adapt to the environment with specific goals (Gredler, 2001). Piaget proposed that children pass through four stages of cognitive development: (a) the sensorimotor stage (from birth to about age 2), where children's innate reflexes interact with the environment; (b) the preoperational stage (from about age 2 to about age 7), where children begin basic concept formation; (c) the concrete operational stage (from about age 7 to about age 11), where children use interiorized actions or thoughts to solve problems in their immediate experience; and (d) the formal operations stage (from about age 12 to about age 15), where children can think through complete hypothetical situations (Hergenhahn, 1988). The ages at which children pass through these stages might vary. In the sensorimotor stage, children explore the environment with senses and motor activities. Then, children begin to understand the relation of cause and effect. In the preoperational stage, children develop greater abilities in language and engage in symbolic activities such as drawing objects and imaginative play. Also, children begin to classify things into classes by similarity but still make mistakes. However, one of the most interesting characteristics of the preoperational stage is that children fail to develop the concept of conservation. That is, they do not understand that number, length, substance, or area remains constant when these things are presented in different ways. In the concrete operational stage, children develop some specific mental operations, such as conservation (i.e., the ability to perform reversible mental operations), class inclusion (i.e., the ability to reason about a part and the whole), seriation (i.e., the ability to arrange things according to some quantified dimension), decentration (i.e., the ability to take another's point of view), and relational thinking (i.e., the ability to compare two or more objects simultaneously). In the formal operations stage, children have developed the abilities of reasoning and logical thinking. Children can also form and test hypotheses to organize information, reason scientifically, show abstract thinking, use higher-order operations to solve problems, and think about their own thoughts (Gredler, 2001). Piaget believed that a child's cognition develops from one stage to another through a gradual process of interacting with the environment. When the child confronts new and unfamiliar features of the environment that do not fit his or her current views of the world, a situation of "disequilibration" occurs, and the child then finds ways to resolve these conflicts through one of the two processes of adaptation.
One way is assimilation, when the child attempts to modify or integrate the new experiences into his or her existing view of the world. The other way is accommodation, when the child changes his or her schema to incorporate the new experiences. As the child assimilates or accommodates the new situation, the state of equilibration is gradually established (Roblyer & Edwards, 2000). In Piaget's stage development theory, the goal of the learner is to move successively and successfully through the lower stages of development to the highest stage, the formal operations stage. Piaget indicated that formal operational thinking cannot be acquired through direct transmission of knowledge. He recommended "the use of active methods that require the learner to rediscover or reconstruct the truths to be learned" (Gredler, 2001, p. 256). That is, the changes in the cognitive structures (or schemas) and the development of problem-solving skills cannot be brought about directly by an instructor. The only way for the child to successfully achieve the formal operations stage is through repeated experimentation and reinvention of the related rules. Direct teaching of ideas usually hampers the learner's initiative and motivation in the construction of knowledge. Piaget's theory is strongly oriented to scientific and mathematical thinking. He emphasized that the use of actions and operations is important in mathematics education. The use of activities and self-directed experimentation should be provided as much as possible in science education. Collaboration and interchange among the students themselves are crucial for the development of learning. However, given the restrictions of cost and the availability of experimental equipment, the computer is a convenient and important tool with great potential to provide the students with opportunities to go through a variety of activities and exercises in action (Gredler, 2001). One of Piaget's famous pupils, Seymour Papert, had a profound influence on applying technology in instruction (Roblyer & Edwards, 2000). Papert was a mathematician. After studying with Piaget in Geneva from 1959 to 1964, he joined the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology and conducted experiments with Logo, a programming language. Papert published a book entitled Mindstorms: Children, Computers, and Powerful Ideas, which drew national attention to the potential role of technology in providing alternatives in educational methods. This book also became the first widely recognized constructivist view of educational practice with technology resources. In this book, Papert viewed Logo as a resource for encouraging learning because Logo is graphics-oriented and allows children to see cause-and-effect relationships between the logic of programming commands and the pictures. These Logo activities make possible "microworlds" that incubate knowledge and allow children to pose and test hypotheses (Papert, 1980). Unlike Piaget, who was not concerned with instructional methods or curriculum matters and did not try to accelerate the stages of cognitive development, Papert felt that "children could advance in their intellectual abilities more quickly with the right kind of environment and assistance" (Roblyer & Edwards, 2000, p. 64).

Vygotsky's Cultural-Historical Theory

The work of the Russian philosopher and educational psychologist Lev Vygotsky also contributed great support to constructivist approaches.
He had more influence on the development of educational theory in the United States than in Russia (Gredler, 2001). The primary goal of his work was to "reformulate psychology as a part of a unified social science and to create a comprehensive analysis of psychological function" (Gredler, 2001, p. 277). He felt that cognitive development was directly related to and based on social development. Human mental abilities develop through the interactions of the individual with the world. That is, cognitive development is based on social interaction and experiences. He designated "signs and symbols" as "psychological tools", which direct the mind and change the process of thinking. These psychological tools are different across cultures and throughout human history (Gredler, 2001). His concepts of "scaffolding" and the "zone of proximal development" are important. He explained that the student represents one end of a continuum of understanding and the teacher represents the other. The gap between the two ends is called the "zone of proximal development". Scaffolding is the process of bridging the gap between teacher-supervised work and independent work. Computer technology can be used as a "psychological tool" for students (Dixon-Krauss, 1996). Technological devices and media are designed to facilitate instruction that is developed slightly ahead of the student's development. Social processes can also be facilitated through or imitated by the computer, which acts as the more competent peer to enhance the zone of proximal development and artificially provide a sociocultural means of learning. For example, multimedia, which gathers any combination of heading text, word-processed text, clip art, animation, sound, graphics, movie clips, and control buttons into one format, can be highly interactive with the student (Dixon-Krauss, 1996). The advantage of multimedia is that the user can choose the appropriate path and pace based on the student's preferences and prior knowledge. Also, multimedia allows the student to function at a high level on his or her own, as well as at a higher level in interacting with the tool. That is, the tool acts as a "scaffold between superordinate and subordinate concepts, linking the learner's prior knowledge to new knowledge" (Dixon-Krauss, 1996, p. 180). Vygotsky believed that the goal of education is to develop children's personalities. The human personality has its creative potential, and education should assist in discovering and developing this potential to its fullest. With proper activities, students can master their inner values and cognitive development. Teachers can direct and guide the activities but cannot force their will on the students. The most valuable methods for teaching are those that can meet the individual student's developmental stages and needs. Therefore, these methods cannot be the same for every student (Roblyer & Edwards, 2000). Vygotsky's ideas have shown great influence on constructivist thought.

Varieties and Characteristics of Constructivism

According to Phillips (2000), constructivism refers to at least two different things.
For the first type, constructivism describes "the thesis about the disciplines or bodies of knowledge that have been built up during the course of human history" and holds that "these disciplines are human constructs, and that the form that knowledge has taken in these fields has been determined by such things as politics, ideologies, values, the exertion of power and the preservation of status, religious beliefs, and economic self-interest" (Phillips, 2000, p. 6). Many theorists believe that the origin of human knowledge is to be explicated using sociological tools. This broad area of constructivism is often called "social constructivism" or sometimes "social constructionism". There are different degrees of belief within social constructivism, such as radical, progressive, conservative, reactionary, and so on. The most extreme version was developed by a group known as the "Edinburgh School" of sociologists of knowledge. This school believes that "the form that knowledge takes in a discipline can be fully explained, or entirely accounted for, in sociological terms" (Phillips, 2000, p. 8). Lev Vygotsky's social emphasis on constructing knowledge has had a great influence on social constructivism. Some of the famous contemporary theorists include David Bloor, Barry Barnes, Steve Woolgar, Bruno Latour, and Kenneth Gergen (Phillips, 2000). Among them, Kenneth Gergen is the representative figure of the radical end of social constructivism. Regarding the second type, constructivism refers to a set of views about how individuals learn (and about how those who help them to learn ought to teach). Simply, this type of constructivist view posits that "learners actively construct their own sets of meanings or understandings; knowledge is not mere copy of the external world, nor is knowledge acquired by passive absorption or by simple transference from one person (a teacher) to another (a learner)". In sum, "knowledge is made, not acquired" (Phillips, 2000, p. 7). Phillips (2000) termed this type "psychological constructivism". However, he clarified that not all psychological constructivists are psychologists. The focus of psychological constructivism is on the way that individuals construct their own psychological understanding. Rand Spiro, Ernst von Glasersfeld, and a group of researchers developed a constructivist theory. These constructivists described their radical view of psychological constructivism and labeled their position "radical constructivism" (Roblyer & Edwards, 2000; Phillips, 2000). Among them, Glasersfeld was the representative figure. The main idea of radical constructivism is that "human knowledge cannot consist of accurate representation or faithful copying of an external reality, that is, of a reality which is nonphenomenal, existing apart from the subject's experiences" (McCarty & Schwandt, 2000, p. 43). Also, knowledge is in the heads of persons, who have no alternative but to construct knowledge based on their subjective experiences. There is no way to know if two persons' experiences are exactly the same (Phillips, 2000). Radical constructivism holds the belief that individuals can only really know their own constructions of reality. They can construct truth that needs no corroboration from outside (Howe & Berv, 2000). Piaget had a great influence on radical constructivism. From the constructivist perspective, learning is also a process of assimilation and accommodation. The achievement of equilibration helps to develop complex levels of learning. Learning usually occurs without any formal instruction (McCarty & Schwandt, 2000). Both social and psychological constructivists have had great influence on and implications for education. They stress the importance of students being active learners. Students construct knowledge themselves rather than passively receive knowledge from teachers. Students need to have the abilities to solve real-life practical problems rather than learning "inert knowledge" that they cannot use in authentic situations. Also, students work in cooperative groups rather than individually. However, radical constructivists expect a teacher to know every student's mental constructions. The teacher needs to establish an environment in which experiential and conceptual differences can support learning. For social constructivists, the teacher is expected to be a coordinator, facilitator, or resource advisor in assisting students to adapt to various social environments (McCarty & Schwandt, 2000). Constructivism has had a significant influence in literature, art, social science, religious education, and particularly in contemporary science and mathematics education (Matthews, 2000; Phillips, 2000). There have been a number of special issues of journals, such as Educational Studies in Mathematics, Journal for Research in Mathematics Education, and Educational Research, devoted to constructivism (Matthews, 2000). The curriculum has also been influenced by constructivist theory. For example, the revised 1994 National Science Education Standards illustrated that science is a mental representation constructed by the individual (cited in Matthews, 2000). The influence has also spread to statistics education, as Moore (1997) has advocated.

Computer-Assisted Instruction in Statistics Education

When used in teaching statistics, computers are usually helpful in three broad areas: (a) reducing the need for lengthy manual calculations, (b) facilitating graphical data analysis, and (c) illustrating statistical concepts by means of simulation experiments (Snell & Peterson, 1992). For example, when using computational packages, such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), and Minitab, the students can save time on tedious computational work. However, the package sometimes hides useful information necessary for understanding. Manual computations are still needed to enhance the learning process and to reinforce the statistical concepts and techniques (Khamis, 1991). Statistical graphics, such as histograms, boxplots, stem-and-leaf
Learning usually occurs without any formal instruction (McCarty & Schwandt, 2000). Both social and psychological constructivists have had great influence and implications on education. They stress the importance of students being active learners. Students construct knowledge themselves rather than passively receive knowledge from teachers. Students need the ability to solve real-life practical problems rather than learning "inert knowledge" that they cannot use in authentic situations. Also, students work in cooperative groups rather than individually. Radical constructivists expect a teacher to know every student's mental constructions; the teacher needs to establish an environment in which experiential and conceptual differences can support learning. For social constructivists, the teacher is expected to be a coordinator, facilitator, or resource advisor in assisting students to adapt to various social environments (McCarty & Schwandt, 2000). Constructivism has had a significant influence in literature, art, social science, religious education, and particularly in contemporary science and mathematics education (Matthews, 2000; Phillips, 2000). There have been a number of special issues of journals, such as Educational Studies in Mathematics, Journal for Research in Mathematics Education, and Educational Research, devoted to constructivism (Matthews, 2000). The curriculum has also been influenced by constructivist theory. For example, the revised 1994 National Science Education Standards illustrated that science is a mental representation constructed by the individual (cited in Matthews, 2000). The influence has also spread to statistics education, as Moore (1997) has advocated.

Computer-Assisted Instruction in Statistics Education

When used in teaching statistics, computers are usually helpful in three broad areas: (a) reducing the need for lengthy manual calculations, (b) facilitating graphical data analysis, and (c) illustrating statistical concepts by means of simulation experiments (Snell & Peterson, 1992). For example, when using computational packages, such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), and Minitab, students can save time on tedious computational work. However, a package sometimes hides useful information necessary for understanding; manual computations are still needed to enhance the learning process and to reinforce statistical concepts and techniques (Khamis, 1991). Statistical graphics, such as histograms, boxplots, stem-and-leaf diagrams, sampling distributions, and graphical presentations, are helpful and important for learning statistics (Krieger & James, 1992; Snell & Peterson, 1992). Simulations are excellent tools for presenting abstract concepts in dynamic and interactive ways. For example, students can simulate the central limit theorem and construct their understanding in the process with computer graphics (Krieger & James, 1992). Simulations allow students to investigate phenomena in a simplified and concrete setting (Barnet, 1999). Computers have been used in teaching statistics for decades; the history of computer-assisted instruction is briefly described in the remainder of this section. The use of computers in teaching statistics began as early as the 1960s. Grubb and Selfridge (1964) developed a computer-based teaching machine using the IBM 650 RAMAC System to tutor students in learning statistics.
They mentioned that "a computer seems to be the only all-encompassing efficient tutorial device in the growing teaching machine movement" (p. 20). Johnson (1965) developed a program using the Michigan Algorithm Decoder to generate quasi-normally distributed random numbers for teaching statistics. Cooley (1969) used computers for laboratory exercises, generating random numbers, empirical and theoretical distributions, Monte Carlo studies, and computing means. A report of the Computer Science Conference emphasized that computers would alter the curriculum of some fields, including statistics, in fundamental ways (Lockard, 1967). In the early 1970s, more work was presented using computers in statistics education. Most of these studies applied the techniques of graphical displays, simulations, computational aids, drill-and-practice exercises, and tutorials on mainframe computers (e.g., Duchastel, 1974; Edgar, 1973; Erickson & Jacobson, 1973; Lehman, 1972; Mead, 1974; Moore, 1974; Skavaril, 1974; Tanis, 1973; Thomas, 1971; Wegman, 1974). In 1973, Minitab was developed for an introductory pre-calculus statistics course offered at Pennsylvania State University. At that time, Minitab was command driven and did not provide help or advice (Ryan, Joiner, & Ryan, 1976). SAS, SPSS, and BMDP (Biomedical Computer Program) were also increasingly used in analyzing data (Pollane & Schnittjer, 1977; Schnittjer, 1976; Wimberley, 1978). There were more studies developing and applying a variety of statistical computational programs (e.g., Cerny & Kaiser, 1978; Conard & Lutz, 1979; James, 1979; Milligan, 1979; Steiger, 1979; Sterrett & Karian, 1978; Thompson & Frankiewicz, 1979). Some tutorials (Knief & Cunningham, 1976; Scalzo & Hughes, 1976) and simulations (Snyder, 1977) were used to assist statistical instruction. Computers were also used as a problem-solving tool to teach statistics (Tubb, 1977). The IBM personal computer was introduced in 1981 and the Apple Macintosh in 1984 (Kidwell & Ceruzzi, 1994). In the 1970s, the main statistical packages, such as SPSS, SAS, BMDP, and Minitab, were run on the DEC-11 minicomputer. In the 1980s, these packages all had personal computer versions (Evans & Newman, 1988). The interactive mode on the personal computer allowed users to enter commands one at a time while executing a program. More computer statistical programs were also developed to assist statistical learning and teaching (e.g., Bajgier, Atkinson, & Prybutok, 1989; Butler & Neudecker, 1989; Cake & Hostetter, 1986; Collis, 1983; Dambolena, 1986; Emond, 1982; Furtuck, 1981; Goodman, 1986; Gordon & Gordon, 1989; Mausner, Wolff, Evans, DeBoer, Gulkus, D'Amore, et al., 1983; O'Keeffe & Klagge, 1986; Olson & Bozeman, 1988; Rogers, 1987; Stemmer & Berger, 1985; Stockburger, 1982; Ware & Chastain, 1989). Butler and Eamon (1985) evaluated 17 microcomputer statistical packages and indicated that the emphasis of these packages was usually not on the analysis of research data but on helping students learn statistical concepts or procedures. In general, the microcomputer packages were easier to learn and to use than were mainframe packages. Couch and Stoloff (1989) conducted a national survey of microcomputer use by academic psychologists and found that the most commonly used type of software was statistical packages (31%), and the most valued courses for computer use were research methods and statistics. In the early 1980s, the idea of creating intelligent statistical software was presented (Hahn, 1985).
The purpose was to have the knowledge of statistical experts embedded in computer programs that could provide guidance on which analyses to conduct and how to interpret the results (Hahn, 1985). Pregibon and Gale (1984) developed an expert system called REX to provide guidance, interpretation, and instruction for doing regression analysis. Athey (1987) developed a knowledge-based mentor system to assist statistical decision-making and to stimulate students' learning of data analysis. In the 1990s, as computer technology continued to be upgraded and improved, the benefits and power of computers attracted further development and application of computer programs and packages for teaching college-level statistics. An increasing number of studies was published in related journals and publications. For example, tutorial programs were developed to demonstrate statistical concepts (e.g., Mitchell & Jolley, 1999; Sedlmeier, 1997; Strube, 1991; Strube & Goldstein, 1995). A large number of simulation programs were used to present difficult and abstract statistical concepts (e.g., Albert, 1993; Bradley, Hemstreet, & Ziegenhagen, 1992; Derry, Levin, & Schauble, 1995; Marasinghe & Meeker, 1996; Perry & Kader, 1995; Sterling & Gray, 1991). Computational statistical packages such as SPSS, SAS, Minitab, and others were also frequently used to assist students in data analysis and interpretation (e.g., Christmann & Badgett, 1997; Eamon, 1992; Gilligan, 1990; Gratz, Volpe, & Kind, 1993; High, 1998; Hollowell & Duch, 1991; Stephenson, 1990; Walsh, 1994; Wang, 1999). In addition, an increasing number of expert-system programs was applied to support statistical decision-making and analysis (e.g., Marcoulides, 1990; Sandals & Pyryt, 1992; White, 1995). As multimedia technology developed and was used in different educational settings, multimedia tools were also vigorously applied to teaching statistics (e.g., Carpenter, 1993; Dorn, 1993; Gonzalez & Birch, 2000; Hassebrock & Snyder, 1997; Koch & Gobell, 1999; Moore, 1993). Velleman and Moore (1996) indicated that multimedia has had a dramatic influence on education in general and on statistics education in particular. Multimedia offers a highly interactive and individualized environment with text, sound, images, full-motion video, animations, and computer graphics in which students can manipulate animations, respond to questions, and work independently on newly learned concepts. Finally, another important development of computer technology is the prevalence of the Internet and the fast-growing applications on the World Wide Web. The Internet is popular because it is widely available, easy to use, and highly visual and graphic (Roblyer & Edwards, 2000). Naturally, statistics teachers have been trying to take advantage of the Internet by building web-based computer-assisted tools for statistical teaching (e.g., Aberson, Berger, Emerson, & Romero, 1997; Britt, Sellinger, & Stillerman, 2002; Brigg & Sheu, 1998; Lane, 1999; Leon & Parr, 2000; Malloy & Jensen, 2001; Romero, Berger, Healy, & Aberson, 2000; West, Ogden, & Rossini, 1998). This review of computer technology applied to statistics education over the past 40 years shows that computers have played an important role in learning and teaching statistics. Teachers, learning theorists, and computer specialists have placed great effort in developing a wide range of programs and packages.
In particular, in recent years, the newer technologies used to create Web-based collaborative programs, intelligent expert systems, simulations, and multimedia tools have been based on socio-cultural theories, constructivist theories, and cognitive theories (Schacter & Fagnano, 1999). The questions are: Do computer technologies really have effects on improving students' statistical achievement and learning? What types of programs are the most effective? Over the years, some researchers have conducted experimental studies to evaluate the effectiveness of different types of computer programs and methods (e.g., Aberson, Berger, Healy, Kyle, & Romero, 2000; Athey, 1987; Christmann & Badgett, 1997; Earley, 2001; Dorn, 1993; Gilligan, 1990; Gonzalez & Birch, 2000; Gratz, Volpe, & Kind, 1993; High, 1998; Hollowell & Duch, 1991; Hurlburt, 2001; Kao & Lehman, 1997; Koch & Gobell, 1999; Jones, 1999; Lane, 2002; Lane & Tang, 2000; Marcoulides, 1990; McBride, 1996; Myers, 1989; Olson & Bozeman, 1988; Porter & Riley, 1996; Raymondo & Garrett, 1998; Rosen, Feeney, & Petty, 1994; Sterling & Gray, 1991; Varnhagen & Zumbo, 1990; Wang, 1999; Ware & Chastain, 1989). These empirical studies have produced different results and conclusions.

Research Synthesis Methods

In the social and behavioral sciences, a single experiment or a single study can rarely provide definitive answers to research questions. In fact, conducting a few studies may not even resolve a minor issue. After the accumulation and refinement of a set of studies, literature reviews of empirical research are important to summarize and clarify the research findings (Cooper & Hedges, 1994; Glass, 1978; Hunter & Schmidt, 1990; Wolf, 1986). Methods of combining results across studies have existed since the early 1900s (Cooper & Hedges, 1994; Olkin, 1990). For example, in 1904 "Pearson took the average of estimates from five separate samples of the correlation between inoculation for typhoid fever and mortality" (Cooper & Hedges, 1994, p. 5). Other early work on combining estimates includes papers by Tippett (1931), Birge (1932), Fisher (1932), Pearson (1933), Cochran (1937), and Yates and Cochran (1938).

Traditional Review Methods

Prior to the late 1960s, primary studies on any specific education or social science topic were still not common (Hunter & Schmidt, 1990). Consequently, a traditional narrative review of the small number of studies was satisfactory for synthesizing the results. Such reviews are usually described as "literary", "qualitative", "nonquantitative", and "verbal" (Hunter & Schmidt, 1990, p. 468). When there are few studies, the researcher uses the results of each study and attempts to find explanations. If the number of studies is large, the studies will not all be comparable, and the results will usually become "pedestrian reviewing where verbal synopses of studies are strung out in dizzying lists" (Glass, 1976, p. 4). In addition, the researcher may use unrepresentative studies to simplify the integration by excluding other studies which do not agree with the chosen ones. These traditional narrative approaches have been criticized for potential problems which include (a) the researcher's subjective view in selecting the studies to include, (b) differential weighting in the interpretation, (c) misleading interpretations, (d) failure to examine study characteristics as potential explanations for different or consistent results across studies, and (e) failure to examine moderator variables (Wolf, 1986).
Another approach was the traditional "vote-counting" method. Light and Smith (1971) were the first to propose a method for taking a vote of study results. The researcher categorizes the findings on the relationship between the independent variable and the dependent variable in all relevant studies into three outcomes (i.e., positively significant, negatively significant, or no significant relationship in either direction). The number of studies in each category is simply counted, and the modal category is taken as the best estimate of the relationship between the independent and dependent variables (Light & Smith, 1971). Hedges and Olkin (1985) demonstrated the inadequacy of the traditional vote-counting approach for detecting treatment effects as the number of primary studies increases. Wang and Bushman (1999) summarized the problems of the vote-counting approach. First, this approach does not incorporate sample size into the vote; when sample size increases, the probability of obtaining a statistically significant result increases. Second, this approach does not provide any effect size estimate. Third, this approach has low power across the typical range of sample sizes. For example, in an illustration from Hunter and Schmidt (1990) concerning the correlation between general intelligence and proficiency in clerical work, proficiency measures cannot be obtained for all applicants because performance can be measured only for those hired; the resulting restriction of range lowers the effectiveness of the vote-counting methods.

Statistically Correct Vote-Counting Methods

As described above, the traditional vote-counting method is statistically inadequate (Hedges & Olkin, 1985). There are, however, methods of integrating research results based on vote counting that are statistically correct (Hunter & Schmidt, 1990). Hedges and Olkin (1980) proposed procedures to solve the statistical problems. Vote-counting procedures are used when studies do not provide enough information to calculate an estimate of effect size but do contain information about the direction or statistical significance of results. One type of vote-counting method uses only significance levels: the researcher counts the proportion of studies that report statistically significant results and tests that proportion against the proportion expected under the null hypothesis (Hunter & Schmidt, 1990). Another type of vote-counting method can yield estimates of effect sizes if the sample sizes are known for all studies. The effect size can be estimated from the proportion of positive results or from the proportion of positive significant results. The detailed procedures and formulas can be found in Bushman (1994), Hedges and Olkin (1980), and Wang and Bushman (1999). When synthesizing research studies, the researcher usually collects studies with the information needed to calculate effect sizes. However, it is not unusual to find some studies without enough information. One method of dealing with this problem is to omit these studies. Another is to apply the vote-counting methods to estimate the effect sizes so that these studies are not lost from the synthesis.
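To illustrate the second type of estimator, the sketch below (illustrative code, not part of the original procedures) recovers a common effect size from the proportion of studies reporting a positive result when all studies share the same group sizes. It uses a simplified normal approximation that ignores the small d-squared term in each study's sampling variance.

```python
from scipy.stats import norm

def effect_size_from_positive_proportion(p_positive: float,
                                         n_e: int, n_c: int) -> float:
    """Estimate a common effect size from the proportion of studies
    reporting a positive result (simplified vote-counting estimator)."""
    # Approximate sampling SD of each study's standardized mean difference:
    sd = ((n_e + n_c) / (n_e * n_c)) ** 0.5
    # P(d_i > 0) = Phi(delta / sd)  =>  delta = sd * Phi^{-1}(p)
    return sd * norm.ppf(p_positive)

# Hypothetical example: 14 of 20 studies (70%) with 25 participants
# per group report positive results.
print(effect_size_from_positive_proportion(0.70, 25, 25))  # about 0.148
```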
Meta-Analysis Methods

In 1952, Hans Eysenck argued that psychotherapy had no positive effects on patients in clinical psychology (Eysenck, 1952) and started a strong debate. By the mid-1970s, many studies of psychotherapy had produced positive, null, and negative results. To assess Eysenck's argument, Smith and Glass (1977) integrated 375 psychotherapy studies by statistically standardizing and averaging treatment-control differences. Glass (1976) coined the term "meta-analysis" to refer to "the analysis of analyses" and "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" (p. 3). At the same time Glass was developing his meta-analysis method, several applications of meta-analytical techniques also drew the attention of the contemporary social science research community to the importance of systematic synthesis and evaluation across studies. Among these applications were Schmidt and Hunter's (1977) validity generalization of employment tests, Rosenthal and Rubin's (1978) integration of interpersonal expectancy effects, and Glass and Smith's (1979) synthesis of the literature on class size and achievement. Early meta-analytic research basically involved three types of procedures: (a) summarizing relationships, (b) determining moderator variables, and (c) establishing relationships by aggregate analysis (Rosenthal, 1991). The first type estimated the average correlation, or the combined p level associated with that correlation, across all the studies. The second procedure calculated a correlation between some characteristic of the studies and an index of the effect size determined in the primary studies. And the third type of procedure correlated mean data obtained from each study with other mean data or with other characteristics found in each study. More recent work by meta-analysts has added to the variety of approaches (Rosenthal, 1991). An essential element of meta-analysis is the "effect size." Meta-analysis represents each study's findings in the form of an effect size. It is a common metric for measuring "the degree to which the phenomenon is present in the population," or "the degree to which the null hypothesis is false" (Cohen, 1988, pp. 9-10). The purpose of using the effect size is to standardize the different findings as numerical values which are interpretable in a consistent way across all the variables and measures (Lipsey & Wilson, 2001). In recent years, the importance of effect size has been increasingly emphasized in reporting experimental results in publications (Thompson, 1994, 2001). For example, the American Psychological Association (APA) Task Force on Statistical Inference emphasized "Always provide some effect-size estimate when reporting a p value," and at least 19 journals require effect size reporting (Wilkinson & APA Task Force on Statistical Inference, 1999, p. 599). The fifth edition of the Publication Manual of the American Psychological Association (2001) also includes "failure to report effect sizes" (p. 5) as a kind of defect in the design and reporting of research. The advocacy of reporting effect sizes results from the deficiency of statistical hypothesis testing in interpreting research results (Thompson, 1999). For a long time, statistical significance hypothesis testing has been criticized for (a) being overly dependent on sample size, (b) inviting misinterpretation of p as the probability that the null hypothesis is false, (c) testing an assumption rather than the research hypothesis, and (d) making some nonsensical comparisons (Anderson, Burnham, & Thompson, 2000; Cohen, 1994; Hunter & Schmidt, 1997).
Schmidt (1996) strongly advocated that we must "abandon the statistical significance test" and must teach "point estimates of effect sizes and confidence intervals around these point estimates. For analysis of data from multiple studies, the appropriate method is meta-analysis" (Schmidt, 1996, p. 116). Different measures of effect size have been developed over several decades. Cohen (1988) describes several dimensionless entities that result in specific experimental effect size statistics. Kirk (1996), Rosenthal (1994), and Snyder and Lawson (1993) provided useful and practical summaries of these measures. The variety of effect size measures can be categorized into two broad families: group mean differences (the d family) and strength of association (the r family) (Elmore & Rotou, 2001; Maxwell & Delaney, 1990; Rosenthal, 1994). In 1969, Cohen proposed d, which is the difference between population means divided by the average population standard deviation (Hedges & Olkin, 1985). In 1976, Glass proposed the metric Δ, which is defined as the mean difference between the experimental group and the control group divided by the control group standard deviation. Hedges (1981) presented another index of effect size, g, the mean difference between the experimental group and the control group divided by the pooled standard deviation, which is an approximately unbiased estimate of the population standard deviation. The Pearson product moment correlation r usually involves a finding that deals with the strength of association between two variables. Rosenthal (1984) presented r as an effect size index with the Binomial Effect Size Display (BESD) and explained that the BESD is a way to show the practical importance of the correlation index: "The correlation is shown to be the simple difference in outcome rates between the experimental and control groups in a standard table in which column and row totals always add up to 100" (p. 242). The BESD can be produced from any effect size r by computing the treatment condition success rate as 0.50 plus r/2 and the control condition success rate as 0.50 minus r/2. For example, an r of .22 gives a treatment success rate of 0.50 + 0.22/2 = 0.61 and a control success rate of 0.50 - 0.22/2 = 0.39. There are other correlation indices, such as the correlation between two dichotomous variables (φ), the correlation between one continuous variable and one dichotomous variable (the point-biserial correlation), and the correlation between two ranked variables (Spearman's ρ). In addition, there are some squared effect size indices, such as r² and η². However, because directionality is lost when squaring indices of effect size and their magnitudes are hard to interpret, researchers usually avoid using them in meta-analysis (Rosenthal, 1994). Cohen (1988) also offered a variety of effect size indices depending on the specific application. For example, the effect size q represents the difference between correlation coefficients, the effect size g is the difference between a population proportion and 0.50, and the effect size h is the difference between proportions.
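To make the BESD arithmetic described above concrete, here is a minimal sketch (illustrative code, not from the original study) that converts an effect size r into the implied treatment and control success rates.

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial Effect Size Display: convert a correlation r into
    implied treatment and control 'success rates' (Rosenthal, 1984)."""
    treatment_rate = 0.50 + r / 2
    control_rate = 0.50 - r / 2
    return treatment_rate, control_rate

# Example from the text: r = .22 gives rates of .61 and .39.
print(besd(0.22))  # (0.61, 0.39)
```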
When Glass proposed meta-analysis methods in 1976, Hunter and Schmidt were unaware of Glass's work and developed their own meta-analysis methods for validity generalization. They applied their methods to an empirical data set from personnel selection research in the field of industrial psychology (Schmidt & Hunter, 1977). The meta-analysis methods of Hunter and Schmidt emphasize effect sizes, as do Glass's; effect sizes in their methods are usually expressed as correlations. Unlike Glass's meta-analysis, Hunter and Schmidt corrected the mean effect size by "testing the hypothesis that the variance of observed effect sizes is entirely due to various statistical artifacts" (Hunter & Schmidt, 1990, p. 484). These artifacts include (a) sampling error, (b) error of measurement in the dependent and independent variables, (c) range restriction in the independent variable, (d) instrument validity, and (e) computational, transcription, and typing errors. There are many sources of error that may decrease the obtained effect sizes. With the development of meta-analytical methods, the integration of research studies has become objective, systematic, and scientific (Wolf, 1986). Through appropriate use of statistical techniques, useful information can be obtained from primary studies, and population parameters can be estimated by objective and accurate methods. Also, the relationships among study characteristics can be examined simultaneously. Researchers can explore possible moderator variables when there is a weak or inconsistent relationship between the independent variable and the dependent variable; that is, the interaction between the treatment and studies can be effectively investigated. In addition, an analysis of outliers, which may contribute to the heterogeneity of findings among studies, may allow researchers to obtain more understanding of the topic of interest (Wolf, 1986). Meta-analysis has received some criticism. Wolf (1986) summarized these criticisms into four categories. The first criticism is related to the quality of the studies. Poorly designed studies are generally included along with good studies, which makes the results of meta-analysis hard to interpret. One way to handle the problem is to code the quality of the design of each study and examine how the results differ for poor and good studies (Wolf, 1986). The second criticism is the "apples and oranges" problem. Any synthesis of results from multiple studies usually involves a combination of studies dissimilar in some respects, such as measuring methods, variable definitions, populations, or research design. Critics argue that it is not logical to draw conclusions by combining studies that are operationalized differently or measured with different metrics. Hall, Tickle-Degnen, Rosenthal, and Mosteller (1994) argued that "some degree of mixing apples and oranges must occur in the tidiest of studies. Even when studies are intended to be direct replication, exact replication probably cannot occur" (p. 20). However, researchers need to be sensitive to the degree of the dissimilarity. Wolf (1986) suggested that this problem can be examined by coding the characteristics of each study and statistically testing whether these differences are related to the results of the meta-analysis. The third criticism is the "file drawer" problem. Studies published in the behavioral and social sciences are likely to be a "biased sample" of the studies that are actually conducted (Rosenthal, 1991). Usually, published research is biased in favor of significant findings, and nonsignificant findings are proportionally published less. Although meta-analysts make efforts to obtain comprehensive and representative studies, the samples of empirical primary studies are still likely to be biased. Wolf (1986) mentioned an approach of reviewing results in books, dissertations, and unpublished papers presented at professional conferences and comparing them to the results of published studies. Begg (1994) also suggested attempting to track down relevant unpublished studies on the topic by following up on published abstracts and contacting the investigators in the field. Cooper (1979) proposed calculating an estimate of the number of unpublished studies with nonsignificant findings that would be needed to reverse a significant result in a meta-analysis. This number is known as the "Fail Safe Number" (Wolf, 1986). If the number is large, the concern about publication bias may be reasonably reduced.
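A fail safe number can be computed in several ways; the sketch below uses Rosenthal's combined-Z formulation as one common illustration (the Z values are invented, and this is not necessarily the exact computation applied later in this dissertation).

```python
def fail_safe_n(z_scores: list[float], alpha_z: float = 1.645) -> float:
    """Estimate how many unpublished null-result studies would be needed
    to bring a combined significance test down to the alpha threshold
    (Rosenthal's fail safe N for one-tailed p = .05)."""
    k = len(z_scores)
    z_sum = sum(z_scores)
    # Solve (sum of Z)^2 / (k + X) = alpha_z^2 for X.
    return (z_sum ** 2) / (alpha_z ** 2) - k

# Hypothetical Z values from six primary studies:
print(fail_safe_n([2.1, 1.8, 2.5, 0.9, 1.4, 2.0]))  # about 36 studies
```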
Finally, the fourth criticism is the problem of including multiple effect sizes from the same experimental study in one meta-analysis. Hunter and Schmidt (1990) pointed out that the method of accumulating multiple effect sizes depends on the nature of the research design of the study. In general, three kinds of replication are considered: (a) fully replicated designs, (b) conceptual replication, and (c) analysis of subgroups. First, a fully replicated design occurs when a study can be divided into several parts that are conceptually equivalent but statistically independent. For example, if data are collected at different sites, the outcomes from each site are statistically independent and can be treated as different studies. Second, a design of conceptual replication occurs "when more than one observation that is relevant to a given relationship is made on each subject" (Hunter & Schmidt, 1990, p. 451). One example is replicated measurement that uses multiple indicators to assess a variable. In this design, each part can be calculated to yield a different result, and these results can be accumulated within the study; or the results can be combined into a single measure. And third, the design of subgroups involves subgroups, such as race or gender, within the study. Usually, the subgroup estimates can be used as independent results. Hunter and Schmidt (1990) provided further discussion of the different designs and the appropriate methods for including multiple effect sizes in a meta-analysis. Although there are some problems associated with meta-analysis methods, these methods have greatly clarified confusion and answered questions on many topics.

Meta-Analyses on Computer-Assisted Instruction

Since meta-analysis methods were developed, many researchers have applied them to examine the effectiveness of various computer-related instructional methods on aspects of learning in various disciplines and at different levels of education. Kulik (1994) summarized the findings of 12 meta-analyses on computer-based instruction conducted from 1978 to 1991, ranging from elementary to adult education. The obtained average effect sizes ranged between 0.22 and 0.57 (see Table 2.2). Some general conclusions were drawn from the findings of these meta-analyses: (a) students usually learn more when receiving computer-based instruction; (b) students learn faster with computer-based instruction; (c) students like their classes more when receiving assistance on computers; (d) students develop more positive attitudes toward computers; and (e) computers do not, however, have positive effects in every area in which they are used (Kulik, 1994). In the last decade, there have been a number of meta-analyses investigating the effect of computer-based instruction on students at different levels and in various areas. Some of these meta-analysis studies are listed in Table 2.3.
The results of these studies show that the range of these average effect sizes is between 0.127 and 0.51, which indicates that computer-assisted instruction has an overall moderate effect on student learning. The results of some meta-analyses also reveal that CAI has different effects for different subject areas and for different levels of students. For example, Christmann and Badgett (1997) compared the academic achievement of secondary students who received traditional instruction supplemented with CAI and those who received only traditional instruction across eight subject areas. Although the combined effect size of the primary studies was 0.209, the comparative effectiveness of CAI was very different among the eight subject areas: science, 0.639; reading, 0.262; music, 0.23; special education, 0.214; social studies, 0.205; math, 0.179; vocational education, -0.08; and English, -0.42. Among the literature applying meta-analytical methods to investigate the effectiveness of CAI, there have been a handful of studies in mathematics education. However, the subject of statistical learning and teaching has rarely been examined. Only one meta-analysis has been found that addresses the effect of CAI in statistics education. Christmann and Badgett (1999) integrated nine studies to examine the comparative effectiveness of using various microcomputer-based software packages on statistical achievement. The nine primary studies were conducted from 1987 to 1997. The computer statistical software packages included Minitab, SPSS/PC, MyStat, TruStat, an expert mentoring system, a HyperCard problem-solving program, and statistical exercises. There were 14 effect sizes produced from the nine primary studies. The effect sizes of these studies ranged from -0.555 to 0.708, and the overall mean effect size was calculated to be 0.256. The meta-analysis concluded that the typical student moved from the 50th percentile to the 60th percentile when exposed to microcomputer-based software. This study also categorized the software into three types: CAI, problem solving, and statistical software packages. The mean effect size was 0.929 for the problem-solving type, 0.651 for CAI, and 0.043 for the statistical software packages (MyStat, Minitab, and SPSS). In addition, the study correlated the effect sizes with the progressive time span in years and found no significant correlation; the correlation between mean effect size and year was -0.052. This meta-analysis urged that continuing studies are needed to examine the effectiveness or lack of effectiveness of computers in college-level statistics education (Christmann & Badgett, 1999).

Table 2.2
Findings of 12 Meta-Analyses on Computer-Based Instruction Published between 1978 and 1991

Meta-Analysis                                    Instructional Level             No. of Studies   Average Effect Size
Bangert-Drowns, Kulik, & Kulik (1985)            secondary                       51               0.25
Burns & Bozeman (1981)                           elementary and secondary        44               0.36
Cohen & Dacanay (1991)                           health professions education    38               0.46
Fletcher (1990)                                  higher education & adult        28               0.50
Hartley (1978)                                   elementary & secondary math     33               0.41
Kulik & Kulik (1986)                             college                         119              0.29
Kulik, Kulik, & Shwalb (1986)                    adult education                 30               0.38
Kulik, Kulik, & Bangert-Drowns (1985)            elementary                      44               0.40
Niemiec & Walberg (1985)                         elementary                      48               0.37
Roblyer (1988)                                   elementary to adult education   82               0.31
Schmidt, Weinstein, Niemiec, & Walberg (1985)    special education               18               0.57
Willett, Yamashita, & Anderson (1983)            precollege science              11               0.22

Note. Adapted from Kulik, 1994, p. 12.
Table 2.3
Findings of 12 Meta-Analyses on Computer-Based Instruction Published between 1993 and 2000

Meta-Analysis                      Instructional Level       Content                   No. of Studies   Average ES
Bayraktar (2000)                   secondary & college       science education         42               0.273
Chadwick (1997)                    secondary                 mathematics               41               0.51
Christmann (1995)                  secondary                 mixed                     24               0.233
Christmann & Badgett (1997)        secondary                 eight curricular areas    42               0.209
Christmann & Badgett (1999)        college                   statistical achievement   9                0.256
Christmann & Badgett (2000)        college                   mixed                     18               0.127
Fletcher-Flinn & Gravatt (1995)    elementary to college     mixed                     120              0.24
Khalili & Shashaani (1994)         elementary to college     mixed                     36               0.38
Kuchler (1998)                     secondary                 mathematics               61               0.32
Liao (1998)                        kindergarten to college   hypermedia                35               0.48
Ouyang (1993)                      elementary                mixed                     79               0.495
Susman (1998)                      elementary to college     cooperative learning      23               0.413

CHAPTER 3
METHOD

Cooper and Hedges (1994) divided the process of research synthesis into five stages: (a) the problem formulation stage; (b) the collection stage: searching the literature; (c) the data-evaluation stage: coding the literature; (d) the analysis and interpretation stage; and (e) the public presentation stage. For this meta-analysis, the purpose was to integrate the individual primary research studies concerning the use of computers to assist introductory statistics teaching at the college level. The process of conducting this synthesis study included the following five tasks: (a) determining and specifying the sampling criteria for selecting the primary studies to be included in and excluded from the meta-analysis, (b) identifying the characteristic variables which might be related to the effect of the outcomes, (c) coding these data, (d) calculating individual results from these primary studies and analyzing these outcomes by the appropriate characteristics, and (e) interpreting and reporting the results of the analysis. This chapter presents the meta-analysis method in four sections. First, the research questions provided in Chapter 1 are restated. Second, the sampling criteria and procedure used in this meta-analysis study are described. Third, the study characteristics which might influence the outcome effects are determined and described. And, fourth, the procedures of the statistical analysis are presented.

Research Questions

This meta-analysis seeks to examine the following questions:

1. How effective is the use of CAI in enhancing the statistical learning of college students as compared with non-computer instructional techniques?
2. Does the effectiveness of CAI differ by the publication year of the study?
3. Does the effectiveness of CAI differ by the source of the study (dissertation, journal article, or ERIC document)?
4. Does the effectiveness of CAI differ by students' level of education (undergraduate or graduate)?
5. Which modes of CAI techniques are the most effective for statistical instruction for college students? For example, there are drill-and-practice, tutorials, multimedia, simulations, computational statistical programs, expert systems, and web-based programs.
6. Does the effectiveness of CAI differ by the software type (commercial or teacher-made)?
7. Does the effectiveness of CAI differ by the level of interactivity of the program (interactive-PC, interactive-mainframe, or batch-mainframe)?
8. Does the effectiveness of CAI differ by the role of the program (supplement or substitute)?
9. Does the effectiveness of CAI differ by the sample size of the participants?
Sampling Criteria and Procedure

Two sampling criteria were applied in selecting an adequate sample of primary studies for this meta-analysis. The first required that a complete and representative sample of primary studies addressing the questions of interest be located and selected. The second required that the participants and treatments in the primary studies represent the participant and treatment populations of interest (Hedges, 1986). Studies selected for this meta-analysis were those that investigated the use of computers to assist college-level students in introductory statistics instruction between 1985 and 2002 in the United States. College level includes 2-year and 4-year colleges and universities and both undergraduate and graduate students. These studies must have reported sufficient descriptive and inferential statistical data to be included in the meta-analysis. For example, studies needed to provide means, standard deviations, variances, t tests, or F tests to allow the calculation of effect sizes. The main sources for searching for these primary studies were published journals and books, as well as unpublished dissertations and conference papers. The major computer databases used consist of Dissertation Abstracts International, the Educational Resources Information Center (ERIC), and Psychological Abstracts (PsycINFO). Descriptive search phrases were used to identify related materials, including combinations of "computer-assisted instruction", "computer-based instruction", "computer-based learning", "computer-based education", "computer-enhanced instruction", "statistics education", "statistics teaching", and "statistics learning". Manual literature searches were also used to examine relevant journals, such as the American Statistician; Behavioral Research Methods, Instruments, & Computers; College Teaching; Computers in the Schools; Educational Researcher; Journal of Educational Computing Research; Journal of Educational Research; Journal of Research on Computing in Education; and Teaching of Psychology. In addition, the references in the primary studies and in the relevant studies identified through computer databases and manual searches were used as another source to make the searches more comprehensive.

Study Characteristics

One of the purposes of research synthesis is to integrate empirical research for creating generalizations (Cooper & Hedges, 1994). Another important purpose is to identify study characteristics that may be moderator variables associated with effect magnitudes (Rosenthal, 1991). Inconsistent findings may imply that some variables moderate the treatment effect. The selection of study characteristics for inclusion in this study follows several guidelines (e.g., Hall, Tickle-Degnen, Rosenthal, & Mosteller, 1994; Lipsey, 1994; Rosenthal, 1991) and some studies regarding the use of computers to teach statistics (e.g., Christmann & Badgett, 1997; Langdon, 1989; Liao, 1998; Roblyer & Edwards, 2000; Schram, 1996). The review sheet used to record the statistical data and study characteristics is shown in Appendix A. The reasons for selecting these characteristics are provided in the following sections.

Publication Year

The first study characteristic was publication year. The primary studies selected for this meta-analysis are from 1985 to 2002. In the early 1980s, the IBM microcomputer was introduced and gradually became popular.
The quality of the computer programs for teaching statistics prior to 1980 was very poor (Langdon, 1989), and empirical research investigating the effect of computer use on teaching statistics was rare. With the rapid expansion of the microcomputer market, the enhancement of software, and the development of diverse applications, programming languages, and web applications, the quality of CAI in statistics teaching may have changed from the time of the initial studies.

Publication Source

The second study characteristic was the source of the study. There are three sources of the primary studies: published journal articles, ERIC documents, and dissertations. Since unpublished manuscripts or studies were not obtained, the file drawer problem is investigated to examine the possibility of selection bias (Wolf, 1986). The file drawer problem is discussed in a later section of this dissertation.

Educational Level of Participants

The third study characteristic was the educational level of the participants in the primary studies. Introductory statistics courses are a common element of the curriculum for undergraduate and graduate students in colleges and universities. The use of computers may have different effects on the two student types: undergraduate and graduate.

Mode of CAI Program

The fourth study characteristic was the mode of computer-assisted instruction. The major modes of computer-assisted instruction for statistics instruction include drill and practice, tutorial, simulation/gaming, problem solving, computation packages, hypermedia, and expert systems (Roblyer & Edwards, 2000). The effectiveness of CAI may differ among these modes.

Type of CAI Program

The fifth study characteristic was the type of computer software used in instruction. Over the years, there have been an increasing number of commercially developed computer applications and programs designed to enhance student learning of statistical concepts, to facilitate computational skills, to assist in selecting the correct statistical analysis, and to present graphical statistical results. In addition, there have been many computer programs specifically designed or developed by instructors to meet different purposes. Consequently, software type was a promising variable for this study to examine.

Level Of Interactivity of CAI Program

The sixth study characteristic was the interaction between the CAI program and the students. Steinberg (1991) indicated that interaction is an important feature of CAI, and one of the main functions of interaction is to foster learning. In most CAI programs, interactions consist of a sequence of question-response-feedback. However, "interaction is not synonymous with learning" (Steinberg, 1991, p. 100). While most computer programs operated on microcomputers are in an interactive mode, programs on mainframe computers are in both interactive and batch modes. To investigate whether the effectiveness of CAI programs differs by the level of interactivity of the program, this variable was included with three categories (interactive PC, interactive mainframe, and batch mainframe).

Instructional Role of CAI Program

The seventh study characteristic was the role of the CAI program in teaching statistics. There are two common roles of CAI: one is as a supplement to traditional instruction and the other is as a substitute for traditional instruction. This variable was included in this study.

Sample Size of Participants

The eighth study characteristic was the total sample size.
Cohen (1988) indicated that the sample size of the treatment group influences the reliability of statistical tests. Therefore, this variable was used to determine whether the effect of CAI differs by sample size.

Dependent Variable

The dependent variable investigated by this meta-analysis was students' achievement in statistics. The outcomes include statistical concepts in diverse topics, computational skills, problem-solving skills, and programming skills.

Statistical Analysis

Various meta-analytic methods, as presented by Cooper and Hedges (1994), Glass, McGaw, and Smith (1981), Hedges and Olkin (1985), Hunter and Schmidt (1990), Rosenthal (1991), and Wolf (1986), have been applied to integrate primary studies, with consideration of the specific statistical data reported in individual studies. Different statistics or metrics (e.g., Glass's Δ; Hedges's g and d; Cohen's d, q, g, and h; and the correlation r) have been used to calculate the effect size for each primary study and for each result within each study. For this meta-analytical study, the purpose was to examine the effect of CAI in college-level statistics instruction across studies. Hedges and Olkin's (1985) methods provide an appropriate framework and meaning. In this study, all statistics are converted to Hedges's d. This effect size index allows understandable comparisons of measures of the effectiveness of different treatments among primary studies for answering the research questions.

Conceptualization of Effect Size

Hedges (1986) distinguished three fundamentally different ways of conceptualizing effect size. One is that effect size is an index of overlap between distributions, which was emphasized by Glass (1976) and Glass, McGaw, and Smith (1981). This conceptualization of effect size is illustrated in Figure 3.1, which has been adapted from Glass, McGaw, and Smith (1981, p. 29).

[Figure 3.1. Graphical representation of effect size: the therapy group distribution is shifted .85 standard deviation above the control group distribution, placing its mean at the 80th percentile of the control group.]

This figure can be explained as representing an effect size of .85 for a treatment that improves the outcome of the therapy group by .85 standard deviation compared to the control group. That is, the treatment moves the mean of the therapy group to the 80th percentile of the control group. Hedges (1986) emphasized that this conceptualization of effect size is important because "overlap between distributions is a concept that has the same interpretation regardless of whether the distributions are the distributions of measures of the same (or similar) construct" (p. 366). The conceptualization is appropriate when combining effect sizes from a broad range of outcome constructs.
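The percentile statement above follows directly from the normal curve: the cumulative probability of the standard normal distribution at .85 is about .80. A minimal numerical check (illustrative only; it assumes both distributions are normal with equal variance):

```python
from scipy.stats import norm

effect_size = 0.85
# Percentile of the control distribution reached by the treatment mean,
# assuming normal distributions with equal variance in both groups.
percentile = norm.cdf(effect_size) * 100
print(f"About the {percentile:.0f}th percentile")  # about the 80th
```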
The second way of conceptualizing effect size is as a scale-free index of treatment effect, which means that "its value does not change under linear transformations of the original observation" (Hedges, 1986, p. 367). The scale-free characteristic of effect size is important because it does not depend on the particular outcome measure used; therefore, effect size is a way of placing treatment effects from different studies on the same scale. Hedges (1986) also emphasized that this interpretation of effect size depends on the outcomes of the different studies measuring the same construct. Effect size analyses can also be viewed as an analogue to pooling the raw data from all k studies into a single analysis of a two-treatments-by-k-studies design (Hedges, 1986). The third way of conceptualizing effect size is as one of many equivalent methods of expressing the magnitude of a relationship between variables (Hedges, 1986). Cohen (1988) presented many different effect size indices for various conditions. Rosenthal (1984, 1991) developed transformations to convert effect sizes to correlation coefficients. Rosenthal (1991) also demonstrated the Binomial Effect Size Display (BESD), a contingency table that illustrates the magnitude of the relationship for a given effect size. Expressed in a BESD, a treatment effect which looks small as an effect size or a correlation coefficient often appears to be much larger.

Effect Size and Statistical Significance

In assessing the relationship between two variables, two parts must be included. One is the estimate of the magnitude of the effect (the effect size). The other is an indication of the reliability or accuracy of the estimate of effect size, which is usually given by the result of the significance test of the difference between the observed and expected effect sizes under the null hypothesis involving the two variables (Rosenthal, 1991). The general relationship between the effect size and the test of significance can be simply expressed as

Test of significance = size of effect × size of study.

Rosenthal (1991) illustrated this relationship with examples for independent and for correlated observations.

Definition and Calculation of Effect Size

Since an effect size refers to the strength of a relationship between the treatment and the outcome variable, the effect size estimate is often expressed as the magnitude of the difference between two group means in standardized terms as

$$\delta = \frac{\mu_E - \mu_C}{\sigma} \qquad (3.1)$$

where $\mu_E$ and $\mu_C$ are the population means for the experimental and control groups, respectively, and $\sigma$ is the population standard deviation. In 1969, Cohen proposed the index d as an estimator of $\delta$. Cohen's d is mathematically expressed as (Rosenthal, 1994, p. 237)

$$d = \frac{M_E - M_C}{\sigma_{pooled}} \qquad (3.2)$$

where $M_E$ and $M_C$ are the means of the experimental group and control group, respectively, and $\sigma_{pooled}$ is the population standard deviation estimator, computed from the pooled sums of squares divided by N, the sum of the sample sizes for the experimental and control groups in the study. In 1976, Glass proposed the metric $\Delta$, which is defined as the mean difference between the experimental group $M_E$ and the control group $M_C$ divided by the control group standard deviation $S_C$. This effect size index is mathematically expressed (Rosenthal, 1994, p. 237) as

$$\Delta = \frac{M_E - M_C}{S_C} \qquad (3.3)$$

When more than one experimental group is compared to a common control group, or when the standard deviations of the experimental and control group populations are clearly different, Glass's $\Delta$ is appropriate (Wang & Bushman, 1999). Hedges (1981) presented another index of effect size, g, the mean difference between the experimental group and the control group divided by the pooled standard deviation $S_{pooled}$, which is an approximately unbiased estimate of the population standard deviation. Hedges and Olkin (1985) indicated that it is often assumed that the population variances do not differ for the experimental and control groups. Under the assumption of equal variances, a more precise estimator of $\delta$ can be obtained by pooling the variances of the experimental and control groups. Thus, Hedges's g is defined by (Rosenthal, 1994, p. 237)

$$g = \frac{M_E - M_C}{S_{pooled}} \qquad (3.4)$$

with the pooled sample standard deviation (Hedges & Olkin, 1985, p. 79)
$$S_{pooled} = \sqrt{\frac{(n_E - 1)S_E^2 + (n_C - 1)S_C^2}{n_E + n_C - 2}} \qquad (3.5)$$

where $n_E$ and $n_C$ are the sample sizes of the experimental and control groups, and $S_E$ and $S_C$ are the standard deviations of the experimental and control groups, respectively. Under the equal variance assumption, Hedges's g or Cohen's d provides a more precise estimator than Glass's $\Delta$. Hedges's g is generally preferred because it has a smaller sampling variance than Cohen's d. However, using a pooled standard deviation of the experimental and control groups under heterogeneity of variance will lead to biased estimates of the effect sizes (Gleser & Olkin, 1994). Hedges's g has been shown to be an overestimate of the population effect size for small sample sizes, particularly samples with n less than 20 (Hedges, 1981). To correct the small sample bias, Hedges (1982) proposed a new estimator d that corrects the biased g by a correction factor $c_m$. The unbiased estimator d is given by

$$d = c_m g = c_m \frac{M_E - M_C}{S_{pooled}} \qquad (3.6)$$

An exact and an approximate expression of the correction factor for the sample bias are provided as

$$c_m = \frac{\Gamma(m/2)}{\sqrt{m/2}\;\Gamma\!\left(\frac{m-1}{2}\right)} \approx 1 - \frac{3}{4m - 1} \qquad (3.7)$$

where $m = n_E + n_C - 2$, and $\Gamma(\cdot)$ is the gamma function. In this meta-analysis study, Hedges's d, as in Equation 3.6, is the basic effect size index used in the statistical analysis. A positive (negative) effect size indicates that the experimental group mean is greater than (less than) the control group mean. When a parameter is estimated, the distribution of the estimator is important. The distribution of Hedges's d is approximately normal with mean $\delta$ and variance $\sigma^2(d)$. The estimated variance $\sigma^2(d)$ is given as (Hedges & Olkin, 1985, p. 86)

$$\sigma^2(d) = \frac{n_E + n_C}{n_E n_C} + \frac{d^2}{2(n_E + n_C)} \qquad (3.8)$$

when the sample sizes of both the experimental and control groups are large. A 100(1 - α)% confidence interval for the population parameter $\delta$ is given by (Hedges & Olkin, 1985, p. 86)

$$d - z_{\alpha/2}\,\sigma(d) \le \delta \le d + z_{\alpha/2}\,\sigma(d) \qquad (3.9)$$

where $z_{\alpha/2}$ is the two-tailed critical value of the standard normal distribution. Although calculating the effect size is straightforward, some studies did not report means and standard deviations. The most common alternative statistics are t or F, along with the experimental and control group sample sizes. These inferential statistics can be converted into Hedges's g through Equation 3.10 (Rosenthal, 1994, p. 238)

$$g = t\,\sqrt{\frac{n_E + n_C}{n_E n_C}} \qquad (3.10)$$

So far, the effect size estimates discussed here index the magnitude by the standardized difference between independent experimental and control groups. There are situations in which only a single group is in the experiment, i.e., a repeated-measures design in which each observation receives pre- and post-treatment assessments. Rosenthal (1991, 1994) presented the test of significance t for correlated observations as

$$t = \frac{\bar{D}}{S_D}\,\sqrt{n} \qquad (3.11)$$

where $\bar{D}/S_D$ is analogous to Hedges's g. Cruz and Sabers (1996) responded to Ritter and Low (1996) and argued that effect sizes from between-groups and repeated-measures designs should not be combined without appropriate adjustment because the effect sizes are inherently different for the two designs. Rosenthal (1991) listed examples of the relationship between tests of significance and effect size estimates for independent and for correlated observations.
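Equations 3.4 through 3.9 can be implemented in a few lines. The following sketch (illustrative code with invented group statistics; it uses the approximate correction factor of Equation 3.7) computes Hedges's g, the bias-corrected d, its estimated variance, and a 95% confidence interval.

```python
import math

def hedges_d(m_e, m_c, s_e, s_c, n_e, n_c):
    """Bias-corrected standardized mean difference (Hedges's d)
    and its large-sample variance (Hedges & Olkin, 1985)."""
    # Pooled standard deviation, Equation 3.5
    s_pooled = math.sqrt(((n_e - 1) * s_e**2 + (n_c - 1) * s_c**2)
                         / (n_e + n_c - 2))
    g = (m_e - m_c) / s_pooled            # Equation 3.4
    m = n_e + n_c - 2
    c_m = 1 - 3 / (4 * m - 1)             # approximate correction, Eq. 3.7
    d = c_m * g                           # Equation 3.6
    var_d = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))  # Eq. 3.8
    return d, var_d

# Hypothetical CAI (experimental) vs. traditional (control) groups:
d, var_d = hedges_d(m_e=78.0, m_c=72.0, s_e=10.0, s_c=11.0, n_e=30, n_c=28)
se = math.sqrt(var_d)
print(f"d = {d:.3f}, 95% CI = [{d - 1.96*se:.3f}, {d + 1.96*se:.3f}]")  # Eq. 3.9
```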
In addition to calculating a Hedges's effect size d for each primary study, a Hedges's weighted effect size $d_w$ (Hedges & Olkin, 1985, p. 302) was also calculated for each study by

$$d_w = w\,d \qquad (3.12)$$

where the effect size estimate is weighted by the inverse of its variance,

$$w = \frac{1}{\sigma^2(d)} \qquad (3.13)$$

The reason for calculating a weighted effect size is that the studies have different sample sizes, and experiments with larger sample sizes produce more precise estimates of the population effect size. When combining the effect sizes from individual primary studies, the effect size estimates of studies with large sample sizes should be given more weight (Hedges & Olkin, 1985).

Combination of Effect Sizes

There are several methods used to combine the independent effects to estimate the population effect size. One method is simply to calculate an arithmetic mean (Glass, McGaw, & Smith, 1981). However, in a fixed-effects meta-analysis, a preferred strategy is to use unbiased weighted estimators for the population effect size. The pooled estimator of the population effect size across the k studies is mathematically represented by (Hedges & Olkin, 1985, p. 111)

$$d_+ = \frac{\sum_{i=1}^{k} w_i d_i}{\sum_{i=1}^{k} w_i} \qquad (3.14)$$

with each effect size estimate $d_i$ weighted by $w_i$, the reciprocal of its corresponding variance $\sigma^2(d_i)$. The variance of $d_+$ is (Hedges & Olkin, 1985, p. 113)

$$\sigma^2(d_+) = \frac{1}{\sum_{i=1}^{k} w_i} \qquad (3.15)$$

and the 100(1 - α)% confidence interval is

$$d_+ - z_{\alpha/2}\,\sigma(d_+) \le \delta \le d_+ + z_{\alpha/2}\,\sigma(d_+) \qquad (3.16)$$

where $z_{\alpha/2}$ is the two-sided critical value from the standard normal distribution. In meta-analyses, effect size estimates from the primary studies are expected to be representative of a normally distributed population. The data should be examined before performing other statistical analyses. When the results differ greatly, it may not be appropriate to combine the results into a single effect size estimate. One way to investigate whether the distribution of effect size estimates is approximately normal is to use a normal quantile plot (Wang & Bushman, 1999). The quantiles of the observed distribution are plotted against the quantiles of the standard normal distribution. The points on the plot will be close to the line X = Y if the observed data have a standard normal distribution. If the data are not normally distributed, the data might be from different populations (Wang & Bushman, 1999). Hedges and Olkin (1985) proposed a homogeneity test for testing whether all studies can reasonably be described as sharing a common effect size before combining the estimates of effect size from the individual primary studies. For a series of k primary studies, a statistical test for the homogeneity of effect sizes is a test of the hypothesis

$$H_0:\ \delta_1 = \delta_2 = \cdots = \delta_k \qquad (3.17)$$

versus the alternative hypothesis that at least one of the effect sizes differs from the others. The test is based on the statistic

$$Q = \sum_{i=1}^{k} \frac{(d_i - d_+)^2}{\sigma^2(d_i)} \qquad (3.18)$$

where $d_+$ is the weighted estimator of effect size based on the sample estimates, given by Equation 3.14 (Hedges & Olkin, 1985, p. 111), and $\sigma^2(d_i)$ is given in Equation 3.8. The statistic Q is "the sum of squares of the $d_i$ about the weighted mean $d_+$, where the ith square is weighted by the reciprocal of the estimated variance of $d_i$" (Hedges & Olkin, 1985, p. 123). The Q statistic is compared with the critical values of the chi-square ($\chi^2$) distribution with k - 1 degrees of freedom. If the Q statistic exceeds the critical value of the $\chi^2$ distribution at α = .05, then the hypothesis that the studies share a common effect size is rejected.
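The pooled estimate of Equation 3.14, its variance (Equation 3.15), and the homogeneity statistic Q (Equation 3.18) can be computed together, as in this illustrative sketch with invented effect sizes and variances.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical effect sizes d_i and their variances from k = 5 studies:
d = np.array([0.42, 0.15, 0.61, 0.30, 0.05])
var = np.array([0.04, 0.02, 0.09, 0.03, 0.05])

w = 1.0 / var                          # weights, Equation 3.13
d_plus = np.sum(w * d) / np.sum(w)     # pooled effect size, Equation 3.14
var_plus = 1.0 / np.sum(w)             # its variance, Equation 3.15
se_plus = np.sqrt(var_plus)

Q = np.sum(w * (d - d_plus) ** 2)      # homogeneity statistic, Equation 3.18
k = len(d)
p_value = chi2.sf(Q, df=k - 1)         # compare Q to chi-square with k-1 df

print(f"d+ = {d_plus:.3f}, 95% CI = [{d_plus - 1.96*se_plus:.3f}, "
      f"{d_plus + 1.96*se_plus:.3f}]")
print(f"Q = {Q:.2f}, p = {p_value:.3f}")  # small p rejects homogeneity
```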
ANOVA Approach to Test the Moderating Effects of Categorical Study Characteristics

When the studies do not share a common effect size, a test for treatment-by-study interaction needs to be investigated. Sometimes it is also helpful to categorize the effect sizes into groups according to similarity. Then, one can test the moderating effect of a categorical study characteristic that had been overlooked previously. Hedges and Olkin (1985) proposed partitioning the Q statistic into two independent homogeneity statistics, $Q_B$ and $Q_W$, and then conducting an analogous ANOVA comparison. The $Q_B$ represents the between-group homogeneity statistic and the $Q_W$ represents the within-group homogeneity statistic.

Assume that the studies are sorted into p groups and there are $m_1, m_2, \ldots, m_p$ studies in the p groups. The $Q_B$ statistic is used to test the null hypothesis that the average effect size does not differ across groups,

$$ H_0: \delta_{1+} = \delta_{2+} = \cdots = \delta_{p+}. \tag{3.19} $$

The $Q_B$ is essentially a weighted sum of squares of the weighted group mean effect size estimates about the overall weighted mean effect size and is mathematically expressed by (Hedges & Olkin, 1985, p. 154)

$$ Q_B = \sum_{i=1}^{p} w_{i+}\,(d_{i+} - d_{++})^2, \tag{3.20} $$

where

$$ w_{i+} = \sum_{j=1}^{m_i} w_{ij}, \tag{3.21} $$

the weight

$$ w_{ij} = 1/\hat{\sigma}^2(d_{ij}), \tag{3.22} $$

$$ d_{i+} = \frac{\sum_{j=1}^{m_i} w_{ij} d_{ij}}{w_{i+}} $$

is the weighted mean of the effect size estimates in the ith group, and

$$ d_{++} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{m_i} w_{ij} d_{ij}}{\sum_{i=1}^{p}\sum_{j=1}^{m_i} w_{ij}} \tag{3.23} $$

is the grand weighted mean. The distribution of the $Q_B$ statistic approximates a chi-square distribution with (p - 1) degrees of freedom. If $Q_B$ exceeds the $100(1-\alpha)\%$ critical value of the chi-square distribution with p - 1 degrees of freedom, the hypothesis that the mean group effect sizes from the p groups are equal is rejected.

The $Q_W$ statistic is used to test the hypothesis that effect sizes are homogeneous within groups of the studies,

$$ H_0: \delta_{i1} = \cdots = \delta_{i m_i} = \delta_{i+}, \qquad i = 1, \ldots, p. \tag{3.24} $$

The homogeneity statistic is calculated for each of the p groups and summed, which is mathematically expressed as (Hedges & Olkin, 1985, p. 155)

$$ Q_W = \sum_{i=1}^{p}\sum_{j=1}^{m_i} w_{ij}\,(d_{ij} - d_{i+})^2. \tag{3.25} $$

The $Q_W$ has an approximate chi-square distribution with $\sum_{i=1}^{p}(m_i - 1)$ degrees of freedom. If $Q_W$ is greater than the critical value, the null hypothesis that the effect sizes within each of the p groups are equal is rejected.

Hedges and Olkin (1985) presented a graphical method that plots the effect size estimates with their corresponding confidence intervals on a set of horizontal lines for identifying deviant effect sizes. This plot provides a simple visual representation of the individual effect sizes and the amount of variation for each study. A useful technique is to sort the studies into groups based on the proposed study characteristics and to rank order the studies by the effect size estimate. The plots also display whether each confidence interval includes the value zero and how some effect sizes deviate from the others (Hedges & Olkin, 1985). This type of plot is frequently called a forest plot or tree plot in medical research (Egger, Smith, & Altman, 2001; Lewis & Clarke, 2001).
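A minimal sketch of the $Q_B$/$Q_W$ partition in Equations 3.20 through 3.25, assuming hypothetical effect sizes sorted into two groups by some categorical study characteristic (the dissertation's own computations used SAS):

```python
import numpy as np

def partition_q(effects_by_group):
    """Partition the total homogeneity statistic into Q_B (between groups)
    and Q_W (within groups), following Equations 3.20-3.25."""
    d_all, w_all, group_means, q_w = [], [], [], 0.0
    for d, var in effects_by_group:
        d, var = np.asarray(d, float), np.asarray(var, float)
        w = 1.0 / var
        d_i = np.sum(w * d) / np.sum(w)        # weighted group mean d_{i+}
        q_w += np.sum(w * (d - d_i) ** 2)      # within-group contribution (Eq. 3.25)
        group_means.append((d_i, np.sum(w)))   # (d_{i+}, w_{i+})
        d_all.append(d)
        w_all.append(w)
    d_all, w_all = np.concatenate(d_all), np.concatenate(w_all)
    d_grand = np.sum(w_all * d_all) / np.sum(w_all)   # grand weighted mean d_{++}
    q_b = sum(w_i * (d_i - d_grand) ** 2 for d_i, w_i in group_means)  # Eq. 3.20
    return q_b, q_w

# Hypothetical groups given as (effect sizes, variances)
groups = [([0.2, 0.4], [0.05, 0.04]), ([0.8, 0.9, 1.0], [0.06, 0.05, 0.07])]
q_b, q_w = partition_q(groups)
print(f"Q_B = {q_b:.3f} (df = 1), Q_W = {q_w:.3f} (df = 3)")
```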
Comparisons Among Groups

The results obtained from the between-groups homogeneity test reveal whether the mean effects are equal among the groups. However, when there are more than two groups to be compared, the $Q_B$ statistic gives no insight about which groups are associated with the largest effect size. If a priori knowledge or a significant value of $Q_B$ leads to the conclusion that the effect sizes are not the same among groups, the methods of contrasts or comparisons can be used to explore the differences among group means.

Such comparisons are analogous to contrasts in ANOVA (Hedges & Olkin, 1985). A contrast γ is defined as a linear combination of the population mean effects $\delta_{i+}$ in which the coefficients $c_i$ sum to zero:

$$ \gamma = \sum_{i=1}^{p} c_i \delta_{i+}, \tag{3.26} $$

where $\sum_{i=1}^{p} c_i = 0$. The contrast coefficients are chosen to reflect a comparison of interest. The contrast γ is estimated by a linear combination of the sample effect size means,

$$ \hat{\gamma} = \sum_{i=1}^{p} c_i d_{i+}, \tag{3.27} $$

where $d_{i+}$ is the weighted average effect size for the ith group. The estimated variance is

$$ \hat{\sigma}^2(\hat{\gamma}) = \sum_{i=1}^{p} c_i^2\,\hat{\sigma}^2(d_{i+}), \tag{3.28} $$

and the $100(1-\alpha)\%$ confidence interval is

$$ \hat{\gamma} - z_{\alpha/2}\hat{\sigma}(\hat{\gamma}) \le \gamma \le \hat{\gamma} + z_{\alpha/2}\hat{\sigma}(\hat{\gamma}), \tag{3.29} $$

where $z_{\alpha/2}$ is the two-sided critical value from the standard normal distribution (Hedges & Olkin, 1985, p. 159).
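As an illustrative sketch of Equations 3.26 through 3.29 (hypothetical group means and standard errors; the actual analyses used SAS):

```python
import numpy as np

def contrast(coef, d_group, se_group, z=1.96):
    """Estimate a contrast among weighted group mean effect sizes and its
    confidence interval (Equations 3.26-3.29)."""
    coef, d_group, se_group = map(np.asarray, (coef, d_group, se_group))
    assert abs(coef.sum()) < 1e-9, "contrast coefficients must sum to zero"
    gamma = np.sum(coef * d_group)             # estimated contrast (Equation 3.27)
    var_gamma = np.sum(coef**2 * se_group**2)  # its variance (Equation 3.28)
    half = z * var_gamma**0.5
    return gamma, (gamma - half, gamma + half)

# Hypothetical comparison: group 1 versus the average of groups 2 and 3
gamma, ci = contrast([1.0, -0.5, -0.5], [0.12, 0.48, 0.74], [0.08, 0.04, 0.21])
print(f"gamma = {gamma:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```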
Regression Approach to Test Moderating Effects of Continuous Study Characteristics

A regression approach can be used to model the relation between continuous study characteristics and estimates of effect size for testing a moderating effect (Hedges, 1994; Wang & Bushman, 1999). Suppose that $d_1, \ldots, d_k$ are k estimates of effect size with estimated variances $\hat{\sigma}^2(d_1), \ldots, \hat{\sigma}^2(d_k)$. The regression model can be stated as follows:

$$ \delta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \tag{3.30} $$

where $i = 1, \ldots, k$, $\delta_i$ is the ith population effect size, $x_{i1}, \ldots, x_{ip}$ are p study characteristics, $\beta_0, \beta_1, \ldots, \beta_p$ are regression parameters, and the $\varepsilon_i$ are independent random error terms with normal distribution $N(0, \sigma_i^2)$. The weighted least squares estimates of the regression parameters $\beta_0, \beta_1, \ldots, \beta_p$ are $b_0, b_1, \ldots, b_p$, where the weight for the effect-size estimate $d_i$ is defined as the reciprocal of its variance, as in Equation 3.13. The $100(1-\alpha)\%$ confidence interval for the regression coefficient $\beta_j$ is given by

$$ b_j - z_{\alpha/2}\frac{\hat{\sigma}(b_j)}{\sqrt{MSE}} \le \beta_j \le b_j + z_{\alpha/2}\frac{\hat{\sigma}(b_j)}{\sqrt{MSE}}, \tag{3.31} $$

where $\hat{\sigma}(b_j)$ is the standard error of $b_j$, MSE is the error mean square for the regression model, and $z_{\alpha/2}$ is the two-sided critical value from the standard normal distribution (Hedges, 1994; Wang & Bushman, 1999).

File Drawer Problem

One source of publication bias is the "file drawer problem," which affects the results of meta-analysis. The problem is that studies with statistically significant results are more likely to be published than those with nonsignificant results (Begg, 1994; Rosenthal, 1991). An extreme case of this problem would occur if the publications were filled with the five percent of studies that show Type I errors while the file drawers of researchers were filled with the 95 percent of studies that show nonsignificant results, in a situation where no population effect exists (Rosenthal, 1991). The primary studies of a meta-analysis are generally more likely to be retrieved from published than unpublished materials.

One way to detect possible publication bias and to investigate whether all the studies come from a single population is to use a "funnel graph," a plot of sample size versus the effect sizes from the individual primary studies (Begg, 1994). With effect size graphed on the horizontal axis and sample size graphed on the vertical axis, the plot should be shaped like a symmetrical funnel with the spout pointing up if there is no selection bias. If the plot is skewed or shows holes in its center, selection bias is suspected. The funnel plot is based on the statistical principle that sampling error decreases as sample size increases (Wang & Bushman, 1999). However, funnel plots have limited use when the number of studies included in a meta-analysis is very small; it is then very difficult to determine whether the plot is shaped like a funnel.

One technique for handling the file drawer problem is to calculate a "fail safe" number, which estimates the number of unpublished primary studies with null results on tests of significance that would be needed to raise the overall probability of a Type I error to any desired level of significance (Hedges & Olkin, 1985; Wolf, 1986). This study computes the fail safe number proposed by Orwin (1983), expressed in Equation 3.32, to examine whether the number of unpublished studies threatens the overall results of this meta-analysis:

$$ N_{fs} = \frac{N(\bar{d} - d_c)}{d_c}, \tag{3.32} $$

where N is the number of studies in the meta-analysis, $\bar{d}$ is the average effect size for the studies synthesized, and $d_c$ is the criterion value that $\bar{d}$ would equal when some knowable number of hypothetical studies ($N_{fs}$) were added to the meta-analysis (Wolf, 1986, p. 39). Cohen (1988) suggests d = 0.2 (small effect), d = 0.5 (medium effect), and d = 0.8 (large effect). For this study, the small effect size d = 0.2 is used as the criterion value to compute the fail safe number.
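Equation 3.32 is simple arithmetic; a one-function sketch, using the values applied later in Chapter 4 (N = 25, average effect size 0.43, criterion 0.20):

```python
def orwin_fail_safe(n_studies, d_bar, d_criterion):
    """Orwin's (1983) fail safe number (Equation 3.32): how many null-result
    studies would pull the average effect size down to the criterion value."""
    return n_studies * (d_bar - d_criterion) / d_criterion

print(orwin_fail_safe(25, 0.43, 0.20))  # 28.75, i.e., at least 29 studies
```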
CHAPTER 4

ANALYSIS AND RESULTS

This chapter presents the analysis and the results for this meta-analysis. Five sections are included. The first section describes the process and selection of the primary studies. The second section describes the process of reviewing and coding the primary data. The third section examines the possibility of selection bias. The fourth section presents the overall effect size estimator, the methods used to handle the dependency of effect sizes, and the computation of the fail safe number for the file drawer problem. Finally, the fifth section presents the findings from analyzing the study characteristics to answer the research questions.

Primary Study Selection

Through an intensive and exhaustive search of the electronic databases ERIC and PsycINFO, as well as a manual review of the literature and study references, 25 experimental studies conducted between 1985 and 2002 regarding the use of computer-assisted instruction (CAI) to teach college-level introductory statistics courses in the United States met the criteria and were selected for this meta-analysis. These 25 studies with primary data are listed in Table B.1 in Appendix B. In addition, there were five experimental studies that did not provide complete quantitative information for meta-analysis but concluded that the effect of the use of CAI in teaching statistics was significant or nonsignificant. These studies were Lane and Tang (2000), Mausner et al. (1983), Stephenson (1990), Stockburger (1982), and Varnhagen and Zumbo (1990). Three of these five studies reported significant differences between the computer group and the group using the traditional teaching method. The studies are briefly described below.

Lane and Tang (2000) examined the effectiveness of simulation program training on the transfer of statistical concepts. This study compared computer and traditional groups and concluded that the difference was significant, with F(1, 107) = 5.84, p = 0.017, without reporting the subject numbers of the two groups. Mausner et al. (1983) developed a simulation computer program based on the DELTA project at the University of Delaware. This program contained 16 instructional units of descriptive and inferential statistics and was run on the mainframe. This study compared the test results from students in the computer-based group with a workbook-based group and concluded that the difference was significant, with t = 9.8 and df = 46. The sample sizes of the two groups were not reported. Stephenson (1990) conducted an experimental study to investigate student reaction to the use of Minitab on the mainframe in an introductory statistics course. There were 22 students in the control group and 23 students in the computer group. The study concluded that the difference between exam scores for students in the computer group and the control group was not statistically significant, without reporting the group means or significance test results. Stockburger (1982) evaluated three simulation exercises in an introductory statistics course. The study concluded that the difference between the treatment group and the control group was significant for two of the simulation exercises, with F(1,47) = 11.692 and F(1,46) = 21.254, respectively. However, the sample sizes of the two groups were not reported. Varnhagen and Zumbo (1990) evaluated two CAI programs run on PLATO systems and their relationship with attitudes and performance in statistics instruction. The study investigated the relationship by path analysis. There was one control group with 49 students and two experimental groups with 41 and 44 students, respectively. The study measured performance and mentioned that there were no significant differences among the groups, without reporting detailed statistics. Since these five studies did not provide adequate data and did not meet the criteria, this meta-analysis did not include them.

Reviewing and Coding the Primary Data

After the 25 primary studies were selected for this meta-analysis, a review sheet (Appendix A) was prepared, and detailed information from each study was recorded on it. The process of examining the correctness of the data was repeated three times. The primary data and the study characteristics were then typed and coded as a text data file. SAS was used to analyze the data and plot the graphs.

Examination for Selection Bias

This meta-analysis selected the studies that met the criteria and provided adequate quantitative data for statistical analysis. Since only 25 primary studies were included, the estimates of the population effect size based on these studies may be biased. In order to examine possible selection bias, a funnel plot was employed (Figure 4.1). The funnel plot is a scatterplot in which the sample sizes of the primary studies are graphed against their respective effect size estimates (Wang & Bushman, 1999). The funnel plot is based on the principle that sampling error decreases as sample size increases. The standard deviations of the effect sizes obtained from these primary studies are listed according to the sample sizes of the studies in Table B.4 in Appendix B; this table shows that as the sample size increases, the standard deviation decreases. If the sampled studies come from a single population, the plot should look like a funnel, with the width of the funnel becoming smaller as the sample size increases.

[Figure 4.1: Funnel plot. Axes: Effect Size Estimate (horizontal, -1 to 2) by Sample Size (vertical, 0 to 1200).]

The funnel plot in Figure 4.1 shows more data in the bottom part because there are three studies with extremely large sample sizes. There are also more studies with positive effect sizes than with negative effect sizes. The center of the funnel falls on a value greater than zero. In addition, the effect sizes of the three large-sample studies are 0.49, 0.49, and 0.36, respectively. They are close to the value of the population estimate, 0.43 (presented in the following section). In this funnel plot, there is a small gap on the lower left, which shows that fewer small-sample studies with negative effects were available for this meta-analysis. The funnel plot suggests that a slight selection bias may exist in the sample of primary studies.
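Figure 4.1 was produced with SAS; a matplotlib sketch of the same kind of display, using hypothetical effect sizes and sample sizes (the first three pairs loosely echo the three large-sample studies mentioned above), would look like this:

```python
import matplotlib.pyplot as plt

# Hypothetical effect size estimates and total sample sizes
effect_sizes = [0.37, 0.50, 0.49, 1.2, -0.3, 0.8, 0.1, 0.6]
sample_sizes = [480, 1021, 1116, 19, 35, 50, 111, 90]

plt.scatter(effect_sizes, sample_sizes)
plt.xlabel("Effect Size Estimate")
plt.ylabel("Total Sample Size")
plt.title("Funnel plot")  # a symmetric funnel (spout up) suggests no selection bias
plt.show()
```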
Estimate of Overall Effect Size

The overall corrected unbiased effect size d for the estimation of the population effect size δ is 0.43, obtained by combining the 31 effect sizes from the 25 primary studies. This result indicates that the use of computers to assist the teaching of statistics to college students increases the mean of the experimental group to the 67th percentile of the control group. The lower bound of the 95% confidence interval is 0.37 and the upper bound is 0.49. The confidence interval does not include the value zero, indicating that the overall effect size estimate is significantly different from zero. The effect size 0.43 reaches Cohen's (1988) criterion value for a medium effect size. This estimate of the overall effect size suggests that computer-assisted instruction has a small to moderate positive effect on teaching statistics at the college level.

Prior to further statistical analysis, the effect size estimates should be examined. The effect sizes range from -0.267 to 1.77, with a median value of 0.52. Their individual standard errors and 95% confidence intervals are reported in Table B.4 in Appendix B and are displayed in the forest plots by groups of study characteristics in Appendix C. Although the combined effect size indicates a small to moderate positive effect for CAI in statistics education, 19 of the confidence intervals of the 31 effect sizes contain zero (see Table B.4 in Appendix B).

The histogram of the effect sizes is shown in Figure 4.2 and indicates an approximately normal shape.

[Figure 4.2: Histogram of effect sizes, with fitted normal curve (Mu = 0.5141, Sigma = 0.4687).]

A normal quantile plot can also be used to examine the normality of the distribution of the effect sizes (Wang & Bushman, 1999). The normal quantile plot compares an observed distribution against the quantiles of the standard normal distribution. If the observed data have a normal distribution, the points on the plot will be close to the line X = Y. The normal quantile plot is shown in Figure 4.3.

[Figure 4.3: Normal quantile plot. Axes: Normal Quantiles (horizontal) by effect size (vertical); normal reference line with Mu = 0.5141, Sigma = 0.4687.]

The points on the plot show an approximately straight line, which indicates that the data are distributed normally. In addition, statistical tests were performed to examine the normality of the effect sizes. The Anderson-Darling normality test (p = 0.35) and the Shapiro-Wilk test (p = 0.96) retain the hypothesis that the distribution of the effect sizes is sufficiently normal.
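These normality checks can be reproduced in outline with scipy; in the sketch below, the 31 effect sizes are simulated from the fitted normal (mu = 0.5141, sigma = 0.4687, as reported in Figure 4.2) purely as a stand-in for the actual values in Table B.2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effects = rng.normal(0.5141, 0.4687, size=31)  # stand-in for the 31 effect sizes

# Normal quantile (Q-Q) plot data: points near the line X = Y indicate normality
(osm, osr), (slope, intercept, r) = stats.probplot(effects, dist="norm")

# Anderson-Darling and Shapiro-Wilk tests of normality
print(stats.anderson(effects, dist="norm"))
print(stats.shapiro(effects))  # a large p-value retains the normality hypothesis
```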
Dependence of Effect Sizes

The problem of dependence of effect sizes occurs in four ways (Hedges, 1986): first, when multiple effect sizes are calculated from different measures on the same subjects; second, when several experimental groups are compared with one control group in a study; third, when several different samples in the same study are used to calculate several effect size estimates; and fourth, when the same researchers or investigators conduct a series of studies and generate several related effect sizes from those studies.

In this meta-analysis, for the first situation of dependency, six of the 25 primary studies calculated more than one effect size from different measures on the same subjects. These studies are Gratz and Kind (1993), Koch and Gobell (1999), Myers (1996), Wang (1999), Ware and Chastain (1989), and White (1986). Hedges (1986) suggested taking the median of the multiple effect sizes to avoid correlation among effect sizes, and this approach was used to obtain one effect size from each of the above six studies.

For the second situation of dependence, concerning multiple treatment groups compared with the same control group, three studies have that problem. Gonzalez and Birch (2000) applied two types of CAI (a computational statistical program and a tutorial) as two experimental groups to compare with one traditional group. Lane and Aleksic (2002) applied a simulation program to three groups of students in three consecutive semesters and compared the effects with the same control group. And Marcoulides (1990) implemented two types of programs (expert systems and simulation) with two groups and compared the effects with the same control group. Since there are only a few such correlated estimates, the dependence can be cautiously ignored (Hedges, 1986).

The third type of dependence concerns several different samples used in the same study. In this meta-analysis, Athey (1987) and Dorn (1993) applied computer programs to different groups of samples and generated more than one effect size. Because there are only two such studies, the dependence can be cautiously ignored. As to the fourth type of dependence, no studies conducted by the same researcher were selected in this meta-analysis.

Fail Safe Number

The file drawer problem is handled by calculating the fail safe number for this meta-analysis. The fail safe number (N_fs) estimates how many additional primary studies with findings of no effect would have to be included in this meta-analysis to overturn its result. Using Equation 3.32, the criterion value of 0.20 (Cohen's definition of a small effect size), and the calculated population effect size estimate (0.43), the fail safe number is 25 × (0.43 − 0.2)/0.2 = 28.75; that is, at least 29 additional studies would be needed to decrease the overall effect size estimate to 0.20 or less. Over the period from 1985 to 2002, there might possibly have been 29 unpublished studies with nonsignificant results for using computers in teaching statistics. Therefore, the results obtained from the small sample of studies in this meta-analysis need to be interpreted with caution.

Primary Study Characteristics

The population effect size estimate calculated from the 31 effect sizes of the 25 primary studies indicates the magnitude of the overall effect of using CAI in statistics education in this meta-analysis. Some study characteristics might have contributed to or moderated the effect. The following sections investigate eight study characteristics: the publication year, the publication source, the educational level of participants, the mode of the CAI program, the type of the CAI program, the level of interactivity of the CAI program, the instructional role of the CAI program, and the total sample size.

Publication Year

Computer technology has undergone tremendous change and development over the past twenty years (Wurster, 2001).
Does the effectiveness of CAI in statistics education differ by the publication year of the study? Since this is a continuous variable, a weighted regression analysis was used to study its relationship with the estimates of effect size. The scatterplot of the effect sizes versus publication year with a regression line is presented in Figure 4.4.

[Figure 4.4: Regression of effect sizes on publication year (1984-2004).]

The regression weights for the intercept and year are 13.86752 and -0.00672, respectively. The equation for the regression line is given in Equation 4.1:

$$ \hat{d} = 13.86752 - 0.00672 \times \text{year}. \tag{4.1} $$

The standard errors of the intercept and publication year are 20.81896 and 0.01042, respectively. The mean square error from the ANOVA for the regression is 2.36297. The 100(1-α)% confidence interval for the weight of publication year is given by

$$ -0.00672 - 1.96 \times \frac{0.01042}{\sqrt{2.36297}} \le \beta \le -0.00672 + 1.96 \times \frac{0.01042}{\sqrt{2.36297}}, \tag{4.2} $$

or

$$ -0.02 \le \beta \le 0.0066. \tag{4.3} $$

The confidence interval includes the value zero, indicating that the weight of publication year is not significantly different from zero. That is, the effect sizes do not change as the publication year changes.
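Using the reported regression results, the interval in Equations 4.2 and 4.3 follows directly from Equation 3.31; a few lines of Python verify the arithmetic.

```python
import math

# Reported weighted-regression results for effect size on publication year
b_year, se_year, mse = -0.00672, 0.01042, 2.36297

# Per Equation 3.31, the reported standard error is rescaled by sqrt(MSE)
half = 1.96 * se_year / math.sqrt(mse)
print(f"({b_year - half:.4f}, {b_year + half:.4f})")  # approximately (-0.0200, 0.0066)
```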
The Q statistic proposed by Hedges and Olkin (1985) was calculated to test the homogeneity of the effect sizes of the primary studies. Using Equation 3.18, the Q statistic was 69.511 with 30 degrees of freedom, which was compared with the critical value (43.773) of the chi-square distribution with 30 degrees of freedom at α = .05. The hypothesis of homogeneity of the effect sizes was rejected. Hedges and Olkin (1985) proposed a method analogous to ANOVA to decompose the total Q statistic into the Q_B statistic for between groups and the Q_W statistic for within groups.

For examining the between and within Q statistics of the effect sizes by publication year, the effect sizes of the studies were divided into four groups: 1985-1989, 1990-1994, 1995-1999, and 2000-2002. During the 1980s, microcomputers became increasingly popular and were more frequently used in teaching. In the early 1990s, the Internet and multimedia appeared and were applied in teaching. Since 1995, the World Wide Web has been widely used in various aspects of teaching. With the different stages of development of computer technology and tools, the effects of using computers in statistics teaching may have differed across the four periods of time.

Table 4.1
Statistics of Study Effect Sizes by Year

Group       N   d+        SE(d+)    Lower     Upper
1985-1989   7   0.55834   0.12875   0.30599   0.81069
1990-1994   9   0.47382   0.08217   0.31277   0.63486
1995-1999   8   0.24495   0.10120   0.04660   0.44331
2000-2002   7   0.43366   0.04022   0.35483   0.51249

Table 4.1 presents the number of studies in each group, the weighted means, standard errors, and 95% confidence intervals for the four groups. Figure C.1 in Appendix C presents the forest plots for the effect sizes of the studies grouped by publication year. For the two groups of 1985-1989 and 1990-1994, the weighted means of the effect sizes of most studies are greater than zero. From the regression plot, one study (McBride, 1996) seems to have an extremely large effect size compared to the rest. In order to examine the influence of this study, a weighted regression was analyzed excluding it. Since this study was a repeated-measures design with ten participants, the results of the weighted regression did not differ much and led to the same conclusion.

In order to test the differences among the groups, the Q statistics for the differences among groups and within groups were computed and are presented in Table 4.2.

Table 4.2
Q Statistics by Year

Source             df   Q        p-value
Between Groups      3    4.627   0.201
Within Groups      27   64.884   0.000
  1985-1989         6    8.809   0.185
  1990-1994         8   21.150   0.007
  1995-1999         7   24.474   0.001
  2000-2002         6   10.451   0.107
Corrected Total    30   69.511   0.000

The Q_B of 4.627 (p = 0.201) is not significant. The Q_W is 64.884 (p < 0.001), which shows significant variation within some of the four groups. The Q_W statistics of the groups of 1990-1994 and 1995-1999 indicate the sources of significant variation within those two groups.

Publication Source

In order to examine whether there is publication bias and whether the effect of using computers differs according to the source of publication, the source of publication was used as a variable for the analogous ANOVA. In general, journal articles are more likely to report significant results, and unpublished studies are more likely to include nonsignificant results. The primary studies in this meta-analysis were selected from three sources: dissertations, journals, and ERIC documents. Table 4.3 shows that there are 10 dissertations, 13 journal articles, and 8 ERIC documents. The means of the effect sizes are 0.58, 0.43, and 0.41, respectively. The standard errors for the three groups are 0.11, 0.07, and 0.04. Figure C.2 presents the forest plots for the effect sizes of the studies grouped by publication source.

Table 4.3
Statistics of Study Effect Sizes by Source

Group         N    d+        SE(d+)    Lower     Upper
Dissertation  10   0.58545   0.11091   0.36807   0.80283
Journal       13   0.43055   0.07463   0.28428   0.57681
ERIC           8   0.40843   0.03883   0.33233   0.48453

Table 4.4
Q Statistics by Source

Source             df   Q        p-value
Between Groups      2    2.270   0.321
Within Groups      28   67.240   0.000
  Dissertation      9    5.215   0.815
  Journal          12   39.772   0.000
  ERIC              7   22.253   0.002
Corrected Total    30   69.511   0.000

In Table 4.4, the Q_B is 2.270 (p = 0.321), indicating that the means of the three groups of study sources do not differ. The Q_W statistic for the dissertation group is 5.215 (p = 0.815), which shows that the 10 dissertation studies are homogeneous. The Q_W statistic for the 13 journal articles is 39.772 (p < 0.001), which shows that the effect sizes from the journal articles vary significantly. And the Q_W statistic for the ERIC documents is 22.253 (p = 0.002), which shows that the effect sizes of the studies from ERIC documents also have significant variation.

Educational Level of Participants

In colleges and universities, introductory statistics courses are generally offered to both undergraduate and graduate students. Among the studies selected for this meta-analysis, there are 23 effect sizes from studies in which the computer programs were used with undergraduate students, five effect sizes from studies with graduate students, and three effect sizes from studies with both undergraduate and graduate students. Table 4.5 presents the weighted means of the three groups: 0.43, 0.53, and 0.31.
Figure C.3 presents the forest plots for the effect sizes of the studies by the three groups.

Table 4.5
Statistics of Study Effect Sizes by Educational Level

Group          N    d+        SE(d+)    Lower     Upper
Undergraduate  23   0.43078   0.03432   0.36351   0.49804
Graduate        5   0.53147   0.18133   0.17607   0.88686
Mixed           3   0.31102   0.14957   0.01787   0.60417

Table 4.6
Q Statistics by Educational Level

Source             df   Q        p-value
Between Groups      2    0.944   0.624
Within Groups      28   68.567   0.000
  Undergraduate    22   50.455   0.001
  Graduate          4   10.167   0.038
  Mixed             2    7.946   0.019
Corrected Total    30   69.511   0.000

In Table 4.6, the Q_B is 0.944 (p = 0.624), which reveals that the magnitudes of the estimates of effect size for the undergraduate, graduate, and mixed groups of students do not differ. The within-group heterogeneity test with Q_W = 68.567 indicates that the variation of the effect sizes is significant within the three groups of primary studies. The Q_W statistic is 50.455 (p = 0.001) for the undergraduate group, 10.167 (p = 0.038) for the graduate group, and 7.946 (p = 0.019) for the mixed group. The individual heterogeneity test results also show relatively heterogeneous effects within each group.

Mode of CAI Program

This variable examines the different modes in which CAI is used to teach statistics. The seven specific modes are drill-and-practice, tutorials, computational programs, simulations, multimedia, Web-based programs, and expert systems.

Table 4.7
Statistics of Study Effect Sizes by Mode

Group           N    d+        SE(d+)     Lower      Upper
Drill            1   0.92296   0.33698    0.26250    1.58342
Tutorial         2   0.68748   0.20353    0.28857    1.08640
Computation     12   0.11950   0.07560   -0.02867    0.26767
Simulation       5   0.48417   0.04166    0.40251    0.56582
Multimedia       3   0.73997   0.20537    0.33744    1.14250
Web-based        4   0.23294   0.11092    0.01555    0.45033
Expert systems   4   0.98945   0.16867    0.65886    1.32005

Table 4.7 indicates that there is only one effect size in the drill-and-practice group, two effect sizes in the tutorial group, three in the multimedia group, four each in the Web-based program and expert system groups, five in the simulation group, and 12 in the computational program group. The weighted means of the effect sizes for the seven groups are 0.99 for expert systems, 0.92 for drill-and-practice, 0.74 for multimedia, 0.69 for tutorials, 0.48 for simulations, 0.23 for Web-based programs, and 0.12 for computational programs. Figure C.4 presents the forest plots for the effect sizes of the studies by the seven groups. Only the confidence interval of the computational program group includes zero, which indicates that the computational programs have no significant effect in teaching statistics.

In Table 4.8, the Q_B is 38.733 (p < 0.001), indicating that the means of the seven groups of effect sizes differ significantly. The Q_W is 30.773 (p = 0.160), which indicates that the effect sizes within the seven groups are homogeneous. The Q_W statistic for the drill-and-practice group is 0 because there is only one study and its effect size has no variation. The Q_W for the tutorial group is 1.269 (p = 0.260), which also shows that the effect sizes within this group do not differ significantly.
The Q_W statistic for the computational program group is 17.169 (p = 0.103), which also indicates that the effect sizes within the group do not have significant variation. The remaining four groups show similar results, as presented in Table 4.8.

Table 4.8
Q Statistics by Mode

Source             df   Q        p-value
Between Groups      6   38.733   0.000
Within Groups      24   30.773   0.160
  Drill             0    0.000   --
  Tutorial          1    1.269   0.260
  Computation      11   17.169   0.103
  Simulation        4    3.652   0.455
  Multimedia        2    1.250   0.535
  Web-based         3    5.456   0.141
  Expert systems    3    1.982   0.576
Corrected Total    30   69.511   0.000

Type of CAI Program

The two types of computer programs for teaching statistics are usually developed either by commercial professionals or by teachers with adequate computer knowledge and skills. In this meta-analysis, 12 effect sizes were obtained from studies that applied commercially developed statistical packages. For example, Christmann and Badgett (1997) used MYSTAT, Gilligan (1990) used Minitab, and Gratz, Volpe, and Kind (1993), High (1998), Rosen, Feeney, and Linda (1994), and Wang (1999) used SPSS. However, some teachers or researchers were interested in developing statistical programs that focus on specific topics to address different purposes and needs. For example, Aberson, Berger, Healy, Kyle, and Romero (2000) developed a Web-based interactive tutorial (WISE) to teach many topics in introductory statistics courses. Athey (1987), Olsen (1988), and Marcoulides (1990) developed expert systems to teach statistics.

The results of the effect sizes for the two types of programs are listed in Table 4.9.

Table 4.9
Statistics of Study Effect Sizes by Type

Group         N    d+        SE(d+)      Lower      Upper
Commercial    12   0.14279   0.074495   -0.00321    0.28880
Teacher-made  19   0.49745   0.036662    0.42559    0.56930

Table 4.10
Q Statistics by Type

Source             df   Q        p-value
Between Groups      1   18.246   0.000
Within Groups      29   51.265   0.007
  Commercial       11   18.987   0.061
  Teacher-made     18   32.278   0.020
Corrected Total    30   69.511   0.000

The weighted mean of the commercial group is 0.143, and its 95% confidence interval includes zero, reflecting that simply using a commercial statistical program may not have a significant effect in teaching statistics. In contrast, the weighted mean of the teacher-made group is 0.497, and its 95% confidence interval does not include zero, indicating a positive effect. Figure C.5 presents the forest plots for the effect sizes of the studies by the two groups. In Table 4.10, the Q_B is 18.246 (p < 0.001), which indicates that the means of the effect sizes differ significantly between the two groups. The within-groups homogeneity statistic Q_W of 51.265 (p = 0.007) is also significant. The Q_W for the teacher-made group is 32.278 (p = 0.02), indicating that the effect sizes within this group are significantly different. The Q_W for the commercial program group is 18.987 (p = 0.061), showing that the effect sizes do not differ significantly within this group.

Level of Interactivity of CAI Program

This variable was included to examine whether the effect differs by the level of interactivity of the computer programs. Three groups were used for this variable: interactive PC, interactive mainframe, and batch mainframe. With the popularity and ease of use of microcomputers, more statistical programs are run on microcomputers in interactive mode. The results in Table 4.11 show that there are 28 effect sizes in the interactive-PC group, two in the interactive-mainframe group, and only one in the batch-mainframe group.
The means of the three groups are 0.434, 0.424, and 0.211, respectively. The standard errors are 0.034, 0.204, and 0.207 for the three groups. The 95% confidence intervals for the interactive-PC and interactive-mainframe groups do not include zero, which shows a positive effect of using the interactive mode in teaching statistics. However, the 95% confidence interval for the batch-mainframe mode includes zero, which indicates no significant effect. Figure C.6 presents the forest plots for the effect sizes of the studies by the three groups. Because there is only one effect size in the batch-mainframe group and two effect sizes in the interactive-mainframe group, these results need to be interpreted with caution.

Table 4.11
Statistics of Study Effect Sizes by Level of Interactivity

Group                  N    d+        SE(d+)     Lower      Upper
Interactive PC         28   0.43420   0.03377    0.36802    0.50039
Interactive mainframe   2   0.42436   0.20481    0.02294    0.82578
Batch mainframe         1   0.21064   0.20689   -0.19485    0.61614

Table 4.12
Q Statistics by Level of Interactivity

Source                    df   Q        p-value
Between Groups             2    1.138   0.566
Within Groups             28   68.373   0.000
  Interactive PC          27   68.291   0.000
  Interactive mainframe    1    0.082   0.775
  Batch mainframe          0    0.000   --
Corrected Total           30   69.511   0.000

In Table 4.12, the Q_B is 1.138 (p = 0.566), which indicates that the means of the three groups of effect sizes have no significant difference. The Q_W statistic of 68.373 (p < 0.001) is significant. The Q_W for the interactive-PC group is 68.291 (p < 0.001), which shows significant variation among the 28 effect sizes in that group. The Q_W statistic for the interactive-mainframe group is 0.082 (p = 0.775), which shows that the effect sizes are not significantly different in this group. Since there is only one effect size in the batch-mainframe group, it has no variation.

Instructional Role of CAI Program

This variable was included to address whether the effect differs by the instructional role of the computer program. In the primary studies, some programs were used as a supplement or adjunct to the traditional method of teaching statistics, while others were used as a substitute for traditional instructional methods. Table 4.13 presents the means for the 22 effect sizes in the supplement group and the nine effect sizes in the substitute group. The means for the two groups are 0.439 and 0.355, with standard errors of 0.035 and 0.092, respectively. The 95% confidence intervals for both groups do not include zero, indicating positive effects on teaching statistics. Figure C.7 presents the forest plots for the effect sizes of the studies by the two groups.

Table 4.13
Statistics of Study Effect Sizes by Instructional Role

Group       N    d+        SE(d+)     Lower     Upper
Supplement  22   0.43902   0.035207   0.37001   0.50802
Substitute   9   0.35466   0.092272   0.17381   0.53551

Table 4.14
Q Statistics by Instructional Role

Source             df   Q        p-value
Between Groups      1    0.730   0.393
Within Groups      29   68.781   0.000
  Supplement       21   54.552   0.000
  Substitute        8   14.230   0.076
Corrected Total    30   69.511   0.000

In Table 4.14, the Q_B is 0.730 (p = 0.393), which indicates that the means of the two groups of effect sizes do not differ significantly. The Q_W statistic is 68.781 (p < 0.001), which indicates significant variation within the groups. The Q_W statistic for the supplement group is 54.552 (p < 0.001), which indicates that the effect sizes are significantly different within the supplement group.
The Q_W statistic for the substitute group is 14.230 (p = 0.076), showing that the effect sizes do not differ significantly within the substitute group.

Sample Size of Participants

This variable was included to examine whether the sample size of a study changes the effect size. Table 4.15 shows the means for the 16 effect sizes obtained from studies with fifty or fewer participants, the nine effect sizes from studies with 51-100 participants, and the six effect sizes from studies with more than 100 participants. The weighted means for the three groups are 0.577, 0.360, and 0.418, and the standard errors are 0.091, 0.078, and 0.040, respectively. None of the 95% confidence intervals for the three groups includes zero, indicating positive effects for using computers in teaching statistics. Figure C.8 presents the forest plots for the effect sizes of the studies by the three groups.

Table 4.15
Statistics of Study Effect Sizes by Sample Size

Group    N    d+        SE(d+)     Lower     Upper
0-50     16   0.57724   0.090555   0.39976   0.75472
51-100    9   0.35975   0.077513   0.20783   0.51167
101+      6   0.41768   0.039658   0.33995   0.49540

Table 4.16
Q Statistics by Sample Size

Source             df   Q        p-value
Between Groups      2    3.559   0.169
Within Groups      28   65.952   0.000
  0-50             15   22.357   0.099
  51-100            8   29.910   0.000
  101+              5   13.685   0.018
Corrected Total    30   69.511   0.000

In Table 4.16, the Q_B is 3.559 (p = 0.169), indicating no significant difference in effect sizes among the three groups. The Q_W statistic is 65.952 (p < 0.001), which indicates significant variation within the groups. The Q_W statistic for the group with fifty or fewer participants is 22.357 (p = 0.099), which indicates that the 16 effect sizes within this group do not differ significantly. The Q_W statistic for the group of size 51-100 is 29.910 (p < 0.001), which shows that the effect sizes differ significantly within this group. And the Q_W statistic for the group with more than 100 participants is 13.685 (p = 0.018), which shows that the effect sizes also differ significantly within this group.

Comparisons Among Groups for Mode of CAI Program

The results obtained from the between-groups homogeneity tests for the eight study characteristics show that the mean effect sizes of the various modes of CAI programs and of the two types of CAI programs differ significantly. Since there are more than two groups for the mode of CAI program, the differences among these groups can be explored by employing comparisons analogous to contrasts in ANOVA. Among the different modes of CAI programs, the groups of computational statistical packages and Web-based programs have smaller mean effect sizes than the other groups. The reason might be that computational statistical packages were usually used to facilitate computation; this type of program did not emphasize statistical concepts and understanding. For the Web-based programs, students used the Web to reach the content of the statistics course, and the Web served as a tool for reaching information. To contrast the effect sizes of the two groups of computational statistical packages and Web-based programs with the other five groups, the contrast coefficients were chosen to be -0.5 for each of the computational package and Web-based program groups, and 0.2 for each of the other five groups. Then, a linear combination of the sample effect size means as in Equation 3.27 was calculated to estimate the contrast parameter γ as

$$ \hat{\gamma} = (-0.5)(0.11950) + (-0.5)(0.23294) + (0.2)(0.92296) + (0.2)(0.68748) + (0.2)(0.48417) + (0.2)(0.73997) + (0.2)(0.98945) = 0.59, \tag{4.4} $$

with an estimated variance of

$$ \hat{\sigma}^2(\hat{\gamma}) = (-0.5)^2(0.07560)^2 + (-0.5)^2(0.11092)^2 + (0.2)^2(0.33698)^2 + (0.2)^2(0.20353)^2 + (0.2)^2(0.04166)^2 + (0.2)^2(0.20537)^2 + (0.2)^2(0.16867)^2 = 0.0136. \tag{4.5} $$

The 95% confidence interval for γ is

$$ 0.59 - 1.96\sqrt{0.0136} \le \gamma \le 0.59 + 1.96\sqrt{0.0136}, \tag{4.6} $$

or

$$ 0.36 \le \gamma \le 0.82. \tag{4.7} $$

Because this confidence interval does not include zero, the contrast is significant at the α = 0.05 level. That is, the mean effect size of the combined groups of computational statistical packages and Web-based programs is significantly different from the mean effect size of the combined group of drill-and-practice, tutorials, simulations, multimedia, and expert systems.
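A short Python sketch (illustrative only; the original computations were in SAS) reproduces Equations 4.4 through 4.7 from the group statistics in Table 4.7:

```python
import math

# Group mean effect sizes and standard errors from Table 4.7
modes = {
    "drill":       (0.92296, 0.33698),
    "tutorial":    (0.68748, 0.20353),
    "computation": (0.11950, 0.07560),
    "simulation":  (0.48417, 0.04166),
    "multimedia":  (0.73997, 0.20537),
    "web":         (0.23294, 0.11092),
    "expert":      (0.98945, 0.16867),
}
coef = {"computation": -0.5, "web": -0.5, "drill": 0.2, "tutorial": 0.2,
        "simulation": 0.2, "multimedia": 0.2, "expert": 0.2}

gamma = sum(coef[k] * modes[k][0] for k in modes)            # Equation 4.4
var_gamma = sum(coef[k]**2 * modes[k][1]**2 for k in modes)  # Equation 4.5
half = 1.96 * math.sqrt(var_gamma)
print(f"gamma = {gamma:.2f}, 95% CI = ({gamma - half:.2f}, {gamma + half:.2f})")
# gamma = 0.59, 95% CI = (0.36, 0.82)
```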
CHAPTER 5

SUMMARY, DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS

This chapter provides a summary of the results answering the research questions, a discussion of the significant findings in Chapter 4, the conclusions, and some recommendations for using CAI in teaching introductory statistics at the college level.

Summary

The literature on using CAI in teaching statistics has shown that various computer programs have been popular and beneficial to both undergraduate and graduate students. Moore (1997) and Ben-Zvi (2000) pointed out that an introductory statistics course should place more emphasis on concepts, data analysis, inference, and statistical thinking; foster active learning through alternatives to lecturing; and use technology and computers to automate computations and graphics. This meta-analysis reviewed the research on CAI in statistics education during the past few decades and performed a quantitative synthesis of 25 primary studies with experimental results comparing the effectiveness of CAI and traditional methods in teaching statistics. This section restates the research questions and summarizes the results.

The first question was "How effective is the use of computer-assisted instruction (CAI) in enhancing the statistical learning of college students as compared with non-computer instructional techniques?" The overall estimate of the population effect size δ is 0.43 for the 25 primary studies with 31 effect sizes, suggesting a medium effect according to Cohen's (1988) criterion. This result indicates that the use of computer programs to assist teaching statistics to college students increases the mean of the experimental group to the 67th percentile of the control group. The standard error of this overall effect size estimate is 0.033. The lower bound of the 95% confidence interval is 0.36 and the upper bound is 0.49; the interval does not include zero. This effect size estimate suggests that computer-assisted instruction has a medium positive effect on teaching statistics at the college level. Examination of normality plots as well as statistical tests for normality (Anderson-Darling and Shapiro-Wilk) indicated that the distribution of the effect sizes was sufficiently normal. The Q statistics were used to determine whether study effects were influenced by eight moderating variables. The results indicate that the effect sizes were not homogeneous. The analogous ANOVA was used to examine the eight variables.
However, conclusions drawn from these results should be tempered by the fact that there are only 25 primary studies in this meta-analysis and that the funnel plot detected a slight selection bias.

The second question was "Does the effectiveness of CAI differ by the publication year of the study?" The results of the analogous ANOVA (Q_B) show that there is no significant difference in the effect sizes among the four publication year categories. However, the results from the Q_W show that the effect sizes within the year categories of 1990-1994 and 1995-1999 are significantly different, indicating that the substantial variation within categories reduces the power to detect significant differences between categories. The weighted regression approach also provides evidence that the effect size estimates do not change as the publication year changes. The variable of publication year does not affect the estimates of the effectiveness of CAI in teaching statistics.

The third question was "Does the effectiveness of CAI differ by the source of the study (dissertation, journal article, or ERIC document)?" The results of the analogous ANOVA show that the effectiveness of CAI does not differ among the three sources of studies: dissertations, journal articles, and ERIC documents. In general, published articles are expected to have greater effects than unpublished reports. However, in this study, the effect sizes from journal articles are not significantly different from the effect sizes from dissertations and ERIC documents. The Q_W statistics indicate substantial variation within the journal articles and the ERIC documents. The source of study does not appear to affect the estimates of effect size.

The fourth question was "Does the effectiveness of CAI differ by students' level of education (undergraduate or graduate)?" The results indicate there is no significant difference in the magnitude of effect sizes among the studies with graduate students, those with undergraduate students, and those with both graduate and undergraduate students. However, more studies in this meta-analysis used CAI with undergraduate students. One reason might be that more introductory statistics courses are offered for undergraduate than for graduate students. The Q_W statistics indicate substantial variation within the three groups.

The fifth question was "Which modes of computer-assisted instruction (CAI) techniques are the most effective for statistical instruction for college students?" The modes examined were drill-and-practice, tutorials, multimedia, simulations, computational statistical programs, expert systems, and Web-based programs. The results show that there are significant differences among the seven modes of CAI programs. The comparison of the mean effect sizes of the group combining the expert systems, drill-and-practice, tutorials, simulations, and multimedia programs with the group combining the computational statistical programs and the Web-based programs concludes that the mean effect sizes of the two contrast groups differ significantly. The Q_W statistics indicate no significant variation within any of the seven modes. However, the small number of studies in each of these modes may limit inferences from these results.

The sixth question was "Does the effectiveness of CAI differ by the software type (commercial or teacher-made)?"
The results suggest that the mean effect size of the group of teacher-made programs is significantly greater than the mean effect size of the group of commercial programs. This result might be explained by the rationale that the teachers who can design and develop statistical computer programs are usually knowledgeable in computer programming and can design specific programs to meet their goals of instruction. They may also be more involved in the teaching process and have more commitment to teaching the CAI course. The commercial programs are usually more general, and the teachers who use them might not have used the programs as well as the teachers who developed their own programs. The results of the Q_W statistics indicate that the effect sizes have no significant variation within the group of commercial programs. However, the effect sizes have significant variation within the group of teacher-made programs.

The seventh question was "Does the effectiveness of CAI differ by the level of interactivity of the program (interactive-PC, interactive-mainframe, or batch-mainframe)?" The results show that there are no significant differences among the programs run on a PC or mainframe in interactive mode and on a mainframe in batch mode. However, the number of studies run on the interactive PC exceeds the number run on the mainframe. Statistical computer programs have been implemented more on microcomputers than on the mainframe, and the interactive mode has been more widely used than the batch mode. The results of the Q_W statistics indicate that the effect sizes have significant variation within the interactive-PC group.

The eighth question was "Does the effectiveness of CAI differ by the role of the program (supplement or substitute)?" The results show that there is no significant difference between the programs used as a supplement to the traditional instructional method and those used as a substitute for the traditional method in the primary studies of this meta-analysis. The Q_W statistics indicate that the effect sizes have significant variation within the supplement group but no significant variation within the substitute group.

The ninth question was "Does the effectiveness of CAI differ by the sample size of the participants?" The results indicate no significant differences among the three groups of different sample sizes. There are 16 effect sizes in the group of 0-50 participants, nine effect sizes in the group of 51-100, and six effect sizes in the group of 101+. The Q_W statistics indicate that the effect sizes have no significant variation within the group of 0-50 but have significant variation within the groups of 51-100 and 101+.

Discussion

The combined overall effect size is estimated to be 0.43 from the 25 primary studies in this meta-analysis. A similar meta-analysis conducted by Christmann and Badgett (1999) compared the effectiveness of some microcomputer-based software packages on statistical achievement. Christmann and Badgett (1999) selected only nine primary studies from 1987 to 1997 and generated 14 effect sizes from these studies. Among the 14 effect sizes, ten were from studies using computational statistical software to teach statistics; the effect size estimate for that group of programs is 0.043. Two of the 14 effect sizes were from studies using expert systems and statistical exercises; the effect size for that group is 0.651. The other two effect sizes were from a study using HyperCard to teach statistics.
The effect size for that group is 0.929. The overall effect size estimate for Christmann and Badgett's (1999) meta-analysis is 0.256. The present meta-analysis included eight studies from Christmann and Badgett's (1999) meta-analysis and excluded one study that was conducted with Korean college students. The estimate of overall effect size for this meta-analysis is 0.43, which is larger than the 0.256 obtained by Christmann and Badgett (1999). One reason might be that the present meta-analysis includes more modes of CAI programs, which have larger effect sizes than the computational statistical programs.

From the examination of the relationship between the eight study characteristics and the effectiveness of CAI programs, two characteristics show significant results according to the analogous ANOVA method proposed by Hedges and Olkin (1985). The two variables are the mode of the CAI program and the type of the CAI program (commercial or teacher-made).

The results of the analogous ANOVA show that the modes of CAI programs differ significantly in their effects on teaching statistics. The effectiveness can be examined with the group means of the seven modes: the means of the expert system, drill-and-practice, multimedia, and tutorial modes are larger than the mean of the computational programs, and the mean of the simulation mode is at a medium level. However, only one study (Porter, 1996) that used a drill-and-practice program was included in this meta-analysis. In the early years, drill-and-practice computer programs were a common mode used in teaching but were not evaluated by experimental studies. This mode of program is based on behavioral learning theory. As the paradigm of learning theories expanded to include constructivism, educators might have used the drill-and-practice mode less frequently. Nevertheless, in this meta-analysis the effect size obtained from this drill-and-practice program is 0.92, which shows a large effect in assisting the learning of statistics.

While Gagne, Wager, and Rojas (1981) observed that drill-and-practice, simulations, and tutorials are the most common modes of CAI, this meta-analysis reveals that statistical computational packages were the mode most used in the 25 primary studies, with 12 effect sizes obtained. This result might be due to the computational needs of teaching and learning statistics. However, the effect size for this group is only 0.12 and is not significantly different from zero. One reasonable explanation for this result may be that computational statistical packages usually provide students with tools to perform statistical analyses and facilitate computing skills rather than enhance statistical concepts or achievement. When students use statistical packages, they sometimes cannot fully understand the computational procedures and spend much time and effort performing the computer tasks. Although the effect is not significant, the use of computational packages is very important in learning statistics at the college level. The students using computer packages may have learned computational and computer skills that are important and required by employers in the professional market. For the group of Web-based programs, the mean effect size is also small. One reason may be that the Web is a tool for accessing information and facilitating communication, and the use of the Web does not contribute much to the learning of statistics.
Expert systems have also been increasingly used in teaching statistics. The effect size for this mode of program is about 0.98, which is large. Athey (1987) developed a mentor expert system incorporating expertise for guiding and teaching statistical decision making. Olsen and Bozeman (1988) used an expert system to assist the selection of appropriate statistical procedures. These programs enhance higher order thinking and emphasize statistical reasoning. Students using this type of program may have better problem-solving and application skills in statistics.

Simulation programs have also been frequently used in teaching statistics. They provide opportunities for students to become actively involved in the process of simulating abstract statistical concepts, such as the central limit theorem. In this meta-analysis, the effect size for simulation programs is at a medium level. Tutorial programs have the same level of effect on teaching statistics. The advantage of tutorials is that they provide students with graphical displays of abstract and complicated statistical concepts and allow students to proceed at their own pace and learning style. Multimedia programs also play a role in teaching statistics. They integrate various media to display graphics, text, and sound in computer programs. In this study, the effect size for multimedia programs is 0.74, which is close to a large effect. In recent years, with the rapid development of the Internet and the World Wide Web, some Web-based programs have also been used to teach statistics. However, in this meta-analysis, the effect size of the Web-based programs is 0.23, a small effect size. The results indicate that Web presentation may not be an important factor in the effectiveness of teaching statistics.

Among the eight study characteristics, the type of CAI program was the other variable that showed a significant difference in effectiveness in teaching statistics; the remaining six variables did not provide significant results. For the type of CAI program, the analysis showed that the mean effect size of the programs designed or developed by teachers was significantly higher than that of the commercial statistical programs. This result might be explained by the rationale that the teachers who can design and develop statistical computer programs are usually more capable and knowledgeable in computer programming and can design specific programs to meet the special goals of their instruction. Kuchler's (1998) meta-analysis of the effectiveness of using computers to teach secondary school mathematics also found that teacher-made software is more effective for mathematics instruction. The effect was likewise explained by suggesting that teachers who develop their own software may be more committed to teaching than those who use commercial software.

Conclusions and Recommendations

The overall results of this meta-analysis indicate a positive medium effect, implying that using CAI can increase students' statistical achievement to a moderate extent. The examination of the selected characteristics shows that the different modes have significantly different effects in teaching statistics. The computational statistical packages and the Web-based programs are the least effective modes, even though commercially developed statistical packages are the most commonly used in statistics courses.
In spite of the results of this meta-analysis, students' skills in using statistical packages to perform data analysis should still be reinforced and emphasized, because this ability is required for jobs in the real world. The other nonsignificant mode is the Web-based programs for teaching statistics. This result implies that the effect of learning statistics may not differ according to whether the computer programs are delivered on the Web or not. Computer programs of the drill-and-practice, tutorial, and simulation modes are effective, as are expert systems and multimedia. These programs convey statistical concepts and emphasize comprehension. However, they are usually not available for general use in statistics courses because they are mostly developed to address specific topics. Teachers need to invest more effort to obtain such programs or to develop them themselves, which requires more commitment and cost; this investment may explain why these programs are more effective. There have been a number of teacher-made programs for teaching specific topics or objectives. The problem is that it is difficult to evaluate these programs and to distribute the good ones to other students. A recommendation is therefore made that an online outlet be established to collect successful CAI programs so that interested teachers can locate these programs and share teaching experiences and ideas.

Computer programs are usually used in statistics courses as a supplement to lectures. In this meta-analysis, most of the primary studies used the computer as an instructional aid to reinforce students' understanding of statistics. Although the results do not show a significant difference between the two instructional roles of supplement and substitute, it is reasonable to suggest that lectures are important for providing explanations and responses when questions are raised in learning statistics. Although computer technology has advanced over the past few decades, the effectiveness of CAI in teaching statistics does not differ significantly by publication year. This result implies that learning statistics may not depend on the development of technology. However, as computers have become more popular and available to most students, students' abilities in performing various computer tasks have become stronger than those of students in past years. When using computers in teaching statistics, teachers can therefore focus more on learning statistics rather than on computer skills. Since the 1980s, microcomputers have been introduced and made available to many students, and many CAI programs for teaching statistics are available on microcomputers and run interactively. Students expend less effort to learn to use a statistical package on a microcomputer than on a mainframe, and the interactive mode is now typical of most computer programs. A common problem of using CAI in teaching statistics is that not all statistics teachers are competent in computer technology and familiar with the computer programs; under this circumstance, the effect of teaching statistics is reduced. Especially as technology has changed dramatically in recent years, statistics teachers need to constantly learn new technology as well.

The results of this meta-analysis present a positive medium effect size for using CAI in teaching statistics, although many factors may affect the effectiveness of a computer program. Can the use of computers improve students' learning in statistics, as Moore (1997) expected? The answer is "yes".
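The effect sizes tabulated in Appendix B follow the conventions of Hedges and Olkin (1985): Hedges' g is the mean difference divided by the pooled standard deviation, d is the small-sample bias-corrected estimate, and the 95% confidence interval is d plus or minus 1.96 SE. The following minimal sketch in Python reproduces one row of Tables B.2 and B.4, assuming the Athey (1987) study reads M_E = 3.08, SD_E = 1.11, n_E = 12 and M_C = 2.07, SD_C = 1.38, n_C = 13 in Table B.1.

from math import sqrt

# Summary statistics for the experimental (E) and control (C) groups.
m_e, sd_e, n_e = 3.08, 1.11, 12
m_c, sd_c, n_c = 2.07, 1.38, 13

# Pooled standard deviation and Hedges' g.
s_pooled = sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2))
g = (m_e - m_c) / s_pooled

# Bias correction for small samples: d = g * (1 - 3 / (4m - 1)),
# where m = n_e + n_c - 2 is the degrees of freedom.
m = n_e + n_c - 2
d = g * (1 - 3 / (4 * m - 1))

# Large-sample variance of d and the 95% confidence interval.
var_d = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))
se = sqrt(var_d)
lower, upper = d - 1.96 * se, d + 1.96 * se

print(f"g = {g:.5f}, d = {d:.5f}, SE = {se:.5f}")
print(f"95% CI: ({lower:.5f}, {upper:.5f})")

Run as written, the sketch gives g of about 0.8028, d of about 0.7763, SE of about 0.4151, and a confidence interval of about (-0.037, 1.590), agreeing with the Athey (1987) entries in Tables B.2 and B.4 to within rounding. The "Weighted d" column of Table B.2 is d divided by this variance, that is, the inverse-variance weight times d.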
APPENDIX A

PRIMARY STUDY REVIEW SHEET

Author(s):
Data: M_E: SD_E: n_E: M_C: SD_C: n_C: t: F:
Repeated-measures design: Yes or No
Study characteristics:
1. Year of publication:
2. Source of study:
3. Level of education:
4. Mode of application:
5. Commercial/teacher-made of program:
6. Interactive/batch on PC/mainframe:
7. Supplement/substitute:
8. Total sample size:
9. Others:

APPENDIX B

TABLES OF DATA

Primary study data are listed in Table B.1. Effect size data are listed in Table B.2. Primary study characteristics are listed in Table B.3. Standard errors and confidence intervals are listed in Table B.4.

Table B.1
Primary Study Data

Year  Author       n_E  n_C
2001  Aberson       55   56
1987  Athey         12   13
1987  Athey         10    9
1997  Christmann    36   14
1985  Dinkins        9    9
1993  Dorn          17   18
1993  Dorn          19   20
1990  Gilligan      36   42
2000  Gonzalez      15   14
2000  Gonzalez      14   14
1993  Gratz         27   28
1998  High          43   44
1991  Hollowell     52   81
2001  Hurlburt      36  116
1999  Jones         33   56
1999  Koch          12   14
2002  Lane         140  340
2002  Lane         681  340
2002  Lane         776  340
1990  Marcoulides   43   44
1990  Marcoulides   41   44
1996  McBride       10   10
1989  Myers         23   29
1988  Olsen         15   13
1996  Porter        20   19
1998  Raymondo      36   51
1994  Rosen         25   25
1991  Sterling      38   28
1999  Wang          12   14
1989  Ware          55   41
1986  White         10   10

[The group means, standard deviations, and t and F values of Table B.1 did not survive scanning with their row alignment intact; the effect sizes derived from them appear in Table B.2.]

Table B.2
Effect Size Data

ID  Author       Year  Hedges' g  Hedges' d  Weighted d
1   Aberson      2001   0.00000    0.00000      0.000
2   Athey        1987   0.80279    0.77627      4.505
3   Athey        1987   0.54566    0.59266      2.690
4   Christmann   1997   0.18708    0.18414      1.850
5   Dinkins      1985   1.27933    1.21825      4.624
6   Dorn         1993   1.09477    1.06967      8.183
7   Dorn         1993   0.62764    0.61482      5.720
8   Gilligan     1990   0.39881    0.39486      7.509
9   Gonzalez     2000   0.23542    0.22881      1.646
10  Gonzalez     2000   0.55517    0.53898      3.641
11  Gratz        1993  -0.03637   -0.03585     -0.493
12  High         1998  -0.26938   -0.26699     -5.755
13  Hollowell    1991   0.08885    0.08834      2.795
14  Hurlburt     2001   0.11829    0.11769      3.229
15  Jones        1999   0.58640    0.58133     11.613
16  Koch         1999   0.69038    0.66854      4.093
17  Lane         2002   0.36978    0.36920     36.104
18  Lane         2002   0.49719    0.49683    109.663
19  Lane         2002   0.49398    0.49365    113.770
20  Marcoulides  1990   1.03507    1.02590     19.717
21  Marcoulides  1990   0.59384    0.58845     11.971
22  McBride      1996   1.84677    1.76855      6.357
23  Myers        1989   0.52902    0.52104      6.467
24  Olsen        1988   1.45010    1.40780      7.866
25  Porter       1996   0.94221    0.92296      8.128
26  Raymondo     1998  -0.07192   -0.07129     -1.503
27  Rosen        1994   0.15000    0.14764      1.841
28  Sterling     1991   0.88853    0.87807     12.937
29  Wang         1999  -0.13860   -0.13422     -0.865
30  Ware         1989   0.21234    0.21064      4.921
31  White        1986   0.56456    0.54065      2.608

Table B.3
Primary Study Characteristics

Year  Author       Source   Level  Mode      Type     Inter    Instr  N
2001  Aberson      journal  mixed  web       teacher  interpc  subs   111
1987  Athey        disser   under  expert    teacher  interpc  subs    25
1987  Athey        disser   grad   expert    teacher  interpc  subs    19
1997  Christmann   journal  grad   comp      comm     interpc  supp    50
1985  Dinkins      disser   grad   tutorial  teacher  interpc  supp     9
1993  Dorn         disser   mixed  multi     teacher  interpc  supp    35
1993  Dorn         disser   mixed  multi     teacher  interpc  supp    39
1990  Gilligan     disser   under  comp      comm     intermn  supp    49
2000  Gonzalez     disser   under  comp      teacher  interpc  subs    29
2000  Gonzalez     disser   under  multi     teacher  interpc  subs    28
1993  Gratz        ERIC     under  comp      comm     interpc  supp    55
1998  High         ERIC     under  comp      comm     interpc  supp    87
1991  Hollowell    ERIC     under  comp      comm     interpc  supp   133
2001  Hurlburt     ERIC     under  web       teacher  interpc  subs   152
1999  Jones        ERIC     under  web       teacher  interpc  subs    89
1999  Koch         journal  under  web       teacher  interpc  supp    26
2002  Lane         ERIC     under  simu      teacher  interpc  supp   480
2002  Lane         ERIC     under  simu      teacher  interpc  supp  1021
2002  Lane         ERIC     under  simu      teacher  interpc  supp  1116
1990  Marcoulides  journal  under  expert    teacher  interpc  supp    87
1990  Marcoulides  journal  under  tutorial  teacher  interpc  supp    85
1996  McBride      journal  under  comp      comm     interpc  supp    10
1989  Myers        disser   under  simu      comm     interpc  subs    52
1988  Olsen        journal  grad   expert    teacher  interpc  subs    28
1996  Porter       journal  under  drill     teacher  interpc  supp    39
1998  Raymondo     journal  under  comp      comm     interpc  supp    87
1994  Rosen        journal  under  comp      comm     interpc  supp    25
1991  Sterling     journal  under  simu      teacher  interpc  supp    66
1999  Wang         journal  grad   comp      comm     interpc  supp    26
1989  Ware         journal  under  comp      comm     batchmn  supp    96
1986  White        disser   under  comp      comm     intermn  supp    20

Note. Source: disser = dissertation; Level: mixed = undergraduate and graduate, under = undergraduate, grad = graduate; Mode: web = Web-based program, comp = computational statistical program, expert = expert system, multi = multimedia, simu = simulation, drill = drill-and-practice; Type: teacher = teacher-made, comm = commercial; Inter: interpc = interactive PC, intermn = interactive mainframe, batchmn = batch mainframe; Instr: subs = substitute, supp = supplement.

Table B.4
Standard Errors and Confidence Intervals

ID  Author       Hedges' d       SE     N     Lower     Upper
5   Dinkins       1.21825   0.51327     9   0.21226   2.22424
22  McBride       1.76855   0.52744    10   0.73479   2.80232
3   Athey         0.59266   0.46942    19  -0.32738   1.51271
31  White         0.54065   0.45531    20  -0.35174   1.43304
2   Athey         0.77627   0.41510    25  -0.03731   1.58985
27  Rosen         0.14764   0.28323    25  -0.40747   0.70276
16  Koch          0.66854   0.40417    26  -0.12362   1.46071
29  Wang         -0.13422   0.39384    26  -0.90613   0.63769
10  Gonzalez      0.53898   0.38477    28  -0.21515   1.29311
24  Olsen         1.40780   0.42306    28   0.57861   2.23698
9   Gonzalez      0.22881   0.37282    29  -0.50191   0.95953
6   Dorn          1.06967   0.36156    35   0.36103   1.77831
7   Dorn          0.61482   0.32784    39  -0.02773   1.25737
25  Porter        0.92296   0.33698    39   0.26250   1.58342
8   Gilligan      0.39486   0.22932    49  -0.05460   0.84432
4   Christmann    0.18414   0.31551    50  -0.43424   0.80253
23  Myers         0.52104   0.28385    52  -0.03530   1.07737
11  Gratz        -0.03585   0.26975    55  -0.56455   0.49284
28  Sterling      0.87807   0.26052    66   0.36746   1.38869
21  Marcoulides   0.58845   0.22171    85   0.15391   1.02299
12  High         -0.26699   0.21539    87  -0.68915   0.15517
20  Marcoulides   1.02590   0.22810    87   0.57883   1.47298
26  Raymondo     -0.07129   0.21775    87  -0.49807   0.35549
15  Jones         0.58133   0.22374    89   0.14281   1.01985
30  Ware          0.21064   0.20689    96  -0.19485   0.61614
1   Aberson       0.00000   0.18984   111  -0.37208   0.37208
13  Hollowell     0.08834   0.17778   133  -0.26010   0.43678
14  Hurlburt      0.11769   0.19090   152  -0.25647   0.49186
17  Lane          0.36920   0.10112   480   0.17100   0.56740
18  Lane          0.49683   0.06731  1021   0.36491   0.62875
19  Lane          0.49365   0.06587  1116   0.36455   0.62276

APPENDIX C

FOREST PLOTS FOR EFFECT SIZES GROUPED BY STUDY CHARACTERISTICS

Forest plots by publication year are displayed in Figure C.1. Forest plots by publication source are displayed in Figure C.2. Forest plots by level of education are displayed in Figure C.3.
Forest plots by mode of CAI program are displayed in Figure C.4. Forest plots by type of CAI program are displayed in Figure C.5. Forest plots by level of interactivity of CAI program are displayed in Figure C.6. Forest plots by instructional role of CAI program are displayed in Figure C.7. Forest plots by sample size are displayed in Figure C.8.

[Figures C.1 through C.8 are forest plots of the effect sizes and 95% confidence intervals listed in Table B.4, grouped by the study characteristics named in their captions; the plots themselves are not reproducible from the scanned original.]

Figure C.1. Forest plots for effect sizes grouped by publication year.

Figure C.2. Forest plots for effect sizes grouped by publication source.
Figure C.3. Forest plots for effect sizes grouped by level of education.

Figure C.4. Forest plots for effect sizes grouped by mode.

Figure C.5. Forest plots for effect sizes grouped by type.
Figure C.6. Forest plots for effect sizes grouped by level of interactivity.

Figure C.7. Forest plots for effect sizes grouped by instructional role.

Figure C.8. Forest plots for effect sizes grouped by sample size.

REFERENCES

References marked with an asterisk indicate studies included in the meta-analysis.

Aberson, C. L., Berger, D. E., Emerson, E. P., & Romero, V. L. (1997). WISE: Web interface for statistics education. Behavior Research Methods, Instruments, & Computers, 29, 217-221.
*Aberson, C. L., Berger, D. E., Healy, M. R., Kyle, D. J., & Romero, V. L. (2000). Evaluation of an interactive tutorial for teaching the central limit theorem. Teaching of Psychology, 27, 289-291.
Albert, J. H. (1993). Teaching Bayesian statistics using sampling methods and Minitab. The American Statistician, 47, 182-191.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Anderson, D. R., Burnham, K. P., & Thompson, W. L. (2000). Null hypothesis testing: Problems, prevalence, and an alternative. Journal of Wildlife Management, 64, 912-923.
*Athey, S. (1987). A mentor system incorporating expertise to guide and teach statistical decision making. (Doctoral dissertation, The University of Arizona, 1987). Dissertation Abstracts International, 48, 0238.
Bajgier, S. M., Atkinson, M., & Prybutok, V. P. (1989). Visual fits in the teaching of regression concepts. The American Statistician, 43, 229-235.
Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C.-L. C. (1985). Effectiveness of computer-based education in secondary schools. Journal of Computer-Based Instruction, 12, 59-68.
Barnet, B. D. (1999). A comparison of the effects of using interactive WWW simulations versus hands-on activities on the conceptual understanding and attitudes of introductory statistics students. (Doctoral dissertation, Iowa State University, 1999). Dissertation Abstracts International, 60, 3940.
Bartz, A. E. (2001). Computer and software use in teaching the beginning statistics course. Teaching of Psychology, 28, 147-149.
Bayraktar, S. (2000). A meta-analysis on the effectiveness of computer-assisted instruction in science education. (Doctoral dissertation, Ohio University, 2000). Dissertation Abstracts International, 61, 2570.
Becker, B. J. (1996). A look at the literature (and other resources) on teaching statistics. Journal of Educational and Behavioral Statistics, 21, 71-90.
Begg, C. B. (1994). Publication bias. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 400-409). New York: Russell Sage Foundation.
Beins, B. C. (1989). A BASIC program for generating integer means and variances. Teaching of Psychology, 16, 230-231.
Ben-Zvi, D. (2000). Toward understanding the role of technological tools in statistical learning. Mathematical Thinking & Learning, 2, 127-155.
Birge, R. T. (1932). The calculation of errors by the method of least squares. Physical Review, 40, 207-227.
Bisgaard, S. (1991). Teaching statistics to engineers. The American Statistician, 45, 274-283.
Bower, G. H., & Hilgard, E. R. (1981). Theories of learning. Englewood Cliffs, NJ: Prentice Hall.
Bradley, D. R., Hemstreet, R. L., & Ziegenhagen, S. T. (1992). A simulation laboratory for statistics. Behavior Research Methods, Instruments, & Computers, 24, 190-204.
Bradstreet, T. E. (1996). Teaching introductory statistics courses so that nonstatisticians experience statistical reasoning. The American Statistician, 50, 69-78.
Briggs, N. E., & Sheu, C. F. (1998). Using Java in introductory statistics. Behavior Research Methods, Instruments, & Computers, 30, 246-249.
Britt, M. A., Sellinger, J., & Stillerman, L. M. (2002). A review of ESTAT: An innovative program for teaching statistics. Teaching of Psychology, 29, 73-75.
Bureau of Labor Statistics. (2002). Occupational outlook handbook, 2002-2003.
Burns, P. K., & Bozeman, W. C. (1981). Computer-assisted instruction and mathematics achievement: Is there a relationship? Educational Technology, 21, 32-39.
Bushman, B. J. (1994). Vote-counting procedures in meta-analysis. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 193-213). New York: Russell Sage Foundation.
Butler, D. L., & Eamon, D. B. (1985). An evaluation of statistical software for research and instruction. Behavior Research Methods, Instruments, & Computers, 17, 352-358.
Butler, D. L., & Neudecker, W. (1989). A comparison of inexpensive statistical packages for microcomputers running MS-DOS. Behavior Research Methods, Instruments, & Computers, 21, 113-120.
Cake, L., & Hostetter, R. C. (1986). DATAGEN: A BASIC program for generating and analyzing data for use in statistics courses. Teaching of Psychology, 13, 210-212.
Carpenter, E. H. (1993). Statistics and research methodology: Authoring, multimedia, and automation of social science research. Social Science Computer Review, 11, 500-514.
Castellan, N. J. (1982). Computers in psychology: A survey of instructional applications. Behavior Research Methods and Instrumentation, 14, 198-202.
Cerny, B., & Kaiser, H. F. (1978). Computer program for the canonical analysis of a contingency table. Educational & Psychological Measurement, 38, 835.
Chadwick, D. K. H. (1997). Computer-assisted instruction in secondary mathematics classroom: A meta-analysis. (Doctoral dissertation, Drake University, 1997). Dissertation Abstracts International, 58, 3478.
Christmann, E. P. (1995). A meta-analysis of the effect of computer-assisted instruction on the academic achievement of students in grades 6 through 12: A comparison of urban, suburban, and rural educational settings. (Doctoral dissertation, Old Dominion University, 1995). Dissertation Abstracts International, 56, 3089.
*Christmann, E. P., & Badgett, J. L. (1997). Microcomputer-based computer-assisted instruction within differing subject areas: A statistical deduction. Journal of Educational Computing Research, 16, 281-296.
Christmann, E. P., & Badgett, J. L. (1999). The comparative effectiveness of various microcomputer-based software packages on statistical achievement. Computers in the Schools, 16, 209-220.
Christmann, E., & Badgett, J. L. (2000). The comparative effectiveness of CAI on collegiate academic performance. Journal of Computing in Higher Education, 11, 91-103.
Cobb, G. (1992). Teaching statistics. In L. A. Steen (Ed.), Heeding the call for change: Suggestions for curricular action (pp. 3-43). Washington, DC: Mathematical Association of America.
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society, 4 (Suppl.), 102-118.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Cohen, P. A., & Dacanay, L. S. (1991). Computer-based instruction and health professions education: A meta-analysis of outcomes. Evaluation and the Health Professions, 15, 259-281.
Collis, B. (1983). Teaching descriptive and inferential statistics using a classroom microcomputer. Mathematics Teacher, 76, 318-322.
Conard, E. H., & Lutz, J. G. (1979). APRIORI: A FORTRAN IV computer program to select the most powerful a priori comparison methods in an analysis of variance. Educational & Psychological Measurement, 39, 689-691.
Cooley, W. W. (1969). Computer-assisted instruction in statistics. Paper presented at the Conference on Statistical Computation, Madison: University of Wisconsin. (ERIC Document Reproduction Service No. ED035249)
Cooper, H. M. (1979). Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology, 37, 131-146.
Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Cooper, H., & Hedges, L. V. (1994). Research synthesis as a scientific enterprise. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 3-14). New York: Russell Sage Foundation.
Couch, J. V., & Stoloff, M. L. (1989). A national survey of microcomputer use by academic psychologists. Teaching of Psychology, 16, 145-147.
Cruz, R. F., & Sabers, D. L. (1996). All effect sizes are not created equally: Response to Ritter and Low (1996). Unpublished manuscript.
Dambolena, I. G. (1986). Using simulations in statistics courses. Collegiate Microcomputer, 4, 339-343.
Derry, S. J., Levin, J. R., & Schauble, L. (1995). Stimulating statistical thinking through situated simulations. Teaching of Psychology, 22, 51-57.
Derry, S. J., Levin, J. R., Osana, H. P., & Jones, M. S. (1998). Developing middle-school students' statistical reasoning abilities through simulation gaming. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching, and assessment in grades K-12. Studies in mathematical thinking and learning (pp. 175-195). Mahwah, NJ: Erlbaum.
*Dinkins, P. (1985). Development of a computer-assisted instruction courseware package in statistics and a comparative analysis of three management strategies. (Doctoral dissertation, Louisiana State University, 1985). Dissertation Abstracts International, 47, 800.
Dixon-Krauss, L. (1996). Vygotsky in the classroom: Mediated literacy instruction and assessment. White Plains, NY: Longman.
Dokter, C., & Heimann, L. (1999). A web site as a tool for learning statistics. Computers in the Schools, 16, 221-229.
*Dorn, M. J. (1993). The effect of an interactive, problem-based HyperCard modular instruction on statistical reasoning. (Doctoral dissertation, Southern Illinois University at Carbondale, 1993). Dissertation Abstracts International, 55, 2770.
Duchastel, P. C. (1974). Computer applications and instructional innovation: A case study in the teaching of statistics. International Journal of Mathematical Education in Science & Technology, 5, 713-716.
Eamon, D. B. (1992). Data generation and analysis using spreadsheets. Behavior Research Methods, Instruments, & Computers, 24, 174-179.
Earley, M. A. (2001). Improving statistics education through simulations: The case of the sampling distribution. Paper presented at the Annual Meeting of the Mid-Western Educational Research Association, Chicago, IL. (ERIC Document Reproduction Service No. ED458282)
Edgar, S. M. (1973). Teaching statistics: While simultaneously saving time, chalk, etc. Paper presented at the Conference on Computers in the Undergraduate Curricula, Claremont, CA. (ERIC Document Reproduction Service No. ED079993)
Egger, M., Smith, G. D., & Altman, D. G. (2001). Systematic reviews in health care: Meta-analysis in context (2nd ed.). London: BMJ Books.
Elmore, P. B., & Rotou, O. (2001). A primer on basic effect size concepts. Paper presented at the Annual Meeting of the American Educational Research Association, Seattle, WA. (ERIC Document Reproduction Service No. ED453260)
Emond, W. J. (1982). Some benefits of micro-computers in teaching statistics. Computers & Education, 6, 51-54.
Erickson, M. L., & Jacobson, R. B. (1973). On computer applications and statistics in sociology: Toward the passing away of an antiquated technology. Teaching Sociology, 1, 84-102.
Evans, G. E., & Newman, W. A. (1988). A comparison of SPSS PC+, SAS PC, and BMDP PC. Collegiate Microcomputer, 6, 97-106.
Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319-324.
Fisher, R. A. (1932). Statistical methods for research workers (4th ed.). London: Oliver and Boyd.
Fletcher, J. D. (1990). Effectiveness and cost of interactive videodisc instruction in defense training and education (IDA Paper P-2372). Alexandria, VA: Institute for Defense Analyses.
Fletcher-Flinn, C. M., & Gravatt, B. (1995). The efficacy of computer-assisted instruction (CAI): A meta-analysis. Journal of Educational Computing Research, 12, 219-242.
Friel, S. N., Corwin, R. B., & Rowan, T. E. (1990). The statistics standards in K-8 mathematics (Implementing the standards). Arithmetic Teacher, 38, 35-40.
Furtuck, L. (1981). The TREE system as a teaching aid in statistics, modeling and business courses. Computers and Education, 5, 31-36.
Gagne, R. (1985). The conditions of learning and theory of instruction (4th ed.). New York: Holt, Rinehart and Winston.
Gagne, R., & Briggs, L. J. (1979). Principles of instructional design (2nd ed.). New York: Holt, Rinehart and Winston.
Gagne, R., Wager, W., & Rojas, A. (1981). Planning and authoring computer-assisted instruction lessons. Educational Technology, 21, 17-21.
Garfield, J. (1995). How students learn statistics. International Statistical Review, 63, 25-34.
Garfield, J., & Ahlgren, A. (1994). Student reactions to learning about probability and statistics: Evaluating the quantitative literacy project. School Science and Mathematics, 94, 89-97.
*Gilligan, W. P. (1990). The use of a computer statistical package in teaching a unit of descriptive statistics. (Doctoral dissertation, Boston University, 1990). Dissertation Abstracts International, 51, 2302.
Glass, G. V. (1976). Primary, secondary, and meta-analysis. Educational Researcher, 5, 3-8.
Glass, G. V. (1978). Integrating findings: The meta-analysis of research. In L. L. Shulman (Ed.), Review of research in education (pp. 351-379). Itasca, IL: F. E. Peacock.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Glass, G. V., & Smith, M. L. (1979). Meta-analysis of research on the relationship of class size and achievement. Educational Evaluation and Policy Analysis, 1, 2-16.
Gleser, L. J., & Olkin, I. (1994). Stochastically dependent effect sizes. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 339-355). New York: Russell Sage Foundation.
*Gonzalez, G. M., & Birch, M. A. (2000). Evaluating the instructional efficacy of computer-mediated interactive multimedia: Comparing three elementary statistics tutorial modules. Journal of Educational Computing Research, 22, 411-430.
Goodman, T. (1986). Using the microcomputer to teach statistics. Mathematics Teacher, 79, 210-215.
Gordon, F. S., & Gordon, S. P. (1989). Computer graphics simulation of sampling distributions. Collegiate Microcomputer, 7, 185-189.
*Gratz, Z. S., Volpe, G. D., & Kind, B. M. (1993). Attitudes and achievement in introductory psychological statistics classes: Traditional versus computer-supported instruction. Proceedings of the Annual Conference on Undergraduate Teaching of Psychology, Ellenville, NY. (ERIC Document Reproduction Service No. ED365405)
Gredler, M. E. (2001). Learning and instruction: Theory into practice (4th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
Grubb, R. E., & Selfridge, L. D. (1964). Computer tutoring in statistics. Computers and Automation, March, 20-26.
Hahn, G. J. (1985). More intelligent statistical software and statistical expert systems: Future directions. The American Statistician, 39, 1-16.
Hall, J. A., Tickle-Degnen, L., Rosenthal, R., & Mosteller, F. (1994). Hypotheses and problems in research synthesis. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 17-28). New York: Russell Sage Foundation.
Hartley, S. (1978). Meta-analysis of the effects of individually paced instruction in mathematics. (Doctoral dissertation, University of Colorado, 1978). Dissertation Abstracts International, 38, 4003.
Hassebrock, F., & Snyder, R. (1997). Applications of a computer algebra system for teaching bivariate relationships in statistics courses. Behavior Research Methods, Instruments, & Computers, 29, 246-249.
Hatchette, V., Zivian, A. R., Zivian, M. T., & Okada, R. (1999). STAZ: Interactive software for undergraduate statistics. Behavior Research Methods, Instruments, & Computers, 31, 19-23.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107-128.
Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490-499.
Hedges, L. V. (1984). Advances in statistical methods for meta-analysis. New Directions for Program Evaluation, 24, 25-42.
Hedges, L. V. (1986). Issues in meta-analysis. In E. Z. Rothkopf (Ed.), Review of Research in Education, 13 (pp. 353-398). Washington, DC: American Educational Research Association.
Hedges, L. V. (1994). Fixed effects models. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 285-299). New York: Russell Sage Foundation.
Hedges, L. V., & Olkin, I. (1980). Vote-counting methods in research synthesis. Psychological Bulletin, 88, 359-369.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Hergenhahn, B. R. (1988). An introduction to theories of learning (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
*High, R. V. (1998). Some variables in relation to students' choice of statistics classes: Traditional versus computer-supported instruction. (ERIC Document Reproduction Service No. ED427762)
Hilton, S. C., Grimshaw, S. D., & Anderson, G. T. (2001). Statistics in preschool. The American Statistician, 55, 332-336.
Hogg, R. V. (1991). Statistical education: Improvements are badly needed. The American Statistician, 45, 342-343.
Hogg, R. V. (1992). Towards lean and lively courses in statistics. In F. Gordon & S. Gordon (Eds.), Statistics for the twenty-first century, MAA Notes No. 26 (pp. 3-13). Washington, DC: Mathematical Association of America.
*Hollowell, K. A., & Duch, B. J. (1991). Functions and statistics with computers at college level. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.
Holcomb, J. P., Jr., & Ruffer, R. L. (2000). Using a term-long project sequence in introductory statistics. The American Statistician, 54, 49-53.
Howe, K. R., & Berv, J. (2000). Constructing constructivism, epistemological and pedagogical. In D. C. Phillips (Ed.), Ninety-ninth yearbook of the National Society for the Study of Education: Part I. Constructivism in education: Opinions and second opinions on controversial issues (pp. 19-40). Chicago: University of Chicago Press.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37-64). Mahwah, NJ: Erlbaum.
*Hurlburt, R. T. (2001). "Lectlets" deliver content at a distance: Introductory statistics as a case study. Teaching of Psychology, 28, 15-20.
James, M. (1979). Mixed effects MANOVA using BMD12V. Educational & Psychological Measurement, 39, 45-47.
Johnson, M. C. (1965). Note on the computer as an instructional tool in statistics. The American Statistician, 19, 32, 36.
Jonassen, D. (1996). Computers in the classroom: Mindtools for critical thinking. Englewood Cliffs, NJ: Merrill/Prentice Hall.
Jonassen, D., Peck, K., & Wilson, B. (1999). Learning with technology: A constructivist perspective. Englewood Cliffs, NJ: Prentice Hall.
*Jones, E. R. (1999). A comparison of an all web-based class to a traditional class. Paper presented at the meeting of the Society for Information Technology & Teacher Education International Conference, San Antonio, TX.
Kao, M. T., & Lehman, J. D. (1997). Scaffolding in a computer-based constructivist environment for teaching statistics to college learners. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL. (ERIC Document Reproduction Service No. ED408317)
Khalili, A., & Shashaani, L. (1994). The effectiveness of computer applications: A meta-analysis. Journal of Research on Computing in Education, 27, 48-61.
Khamis, H. J. (1991). Manual computations: A tool for reinforcing concepts and techniques. The American Statistician, 45, 294-299.
Kidwell, P. A., & Ceruzzi, P. E. (1994). Landmarks in digital computing: A Smithsonian pictorial history. Washington, DC: Smithsonian Institution Press.
Kirk, R. (1996). Practical significance: A concept whose time has come. Educational & Psychological Measurement, 56, 746-759.
Knief, L. M., & Cunningham, G. K. (1976). Effects of tutorial CBI on performance in statistics. AEDS Journal, 9, 43-45.
*Koch, C., & Gobell, J. (1999). A hypertext-based tutorial with links to the Web for teaching statistics and research methods. Behavior Research Methods, Instruments, & Computers, 31, 7-13.
Krieger, H., & James, P. L. (1992). Computer graphics and simulations in teaching statistics. In F. Gordon & S. Gordon (Eds.), Statistics for the twenty-first century, MAA Notes No. 26 (pp. 167-188). Washington, DC: Mathematical Association of America.
Kuchler, J. M. (1998). The effectiveness of using computers to teach secondary school (grades 6-12) mathematics: A meta-analysis. (Doctoral dissertation, University of Massachusetts Lowell, 1999). Dissertation Abstracts International, 59, 3764.
Kulik, C.-L. C., & Kulik, J. A. (1986). Effectiveness of computer-based education in college. AEDS Journal, 19, 81-108.
Kulik, C.-L. C., Kulik, J. A., & Bangert-Drowns, R. L. (1985). Effectiveness of computer-based education in elementary schools. Computers in Human Behavior, 1, 59-74.
Kulik, C.-L. C., Kulik, J. A., & Shwalb, B. J. (1986). The effectiveness of computer-based adult education: A meta-analysis. Journal of Educational Computing Research, 2, 235-252.
Kulik, J. A. (1994). Meta-analytic studies of findings on computer-based instruction. In E. L. Baker & H. F. O'Neil, Jr. (Eds.), Technology assessment in education and training (pp. 9-33). Hillsdale, NJ: Erlbaum.
Lamb, A. (1992). Multimedia and the teaching-learning process in higher education. In J. Albright & D. Graf (Eds.), Teaching in the information age: The role of educational technology (pp. 33-42). San Francisco: Jossey-Bass.
Lane, D. M. (1999). The Rice virtual lab in statistics. Behavior Research Methods, Instruments, & Computers, 31, 24-33.
Lane, D. M., & Tang, Z. (2000). Effectiveness of simulation training on transfer of statistical concepts. Journal of Educational Computing Research, 22, 383-396.
*Lane, J. L., & Aleksic, M. (2002). Transforming elementary statistics to enhance student learning. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA. (ERIC Document Reproduction Service No. ED463332)
Langdon, J. S. (1989). The effects of the use of software on students' understanding of selected statistical concepts. (Doctoral dissertation, The American University, 1989). Dissertation Abstracts International, 50, 1971.
Lehman, R. S. (1972). The use of the unknown in teaching statistics. Paper presented at the EPA convention, Boston, MA. (ERIC Document Reproduction Service No. ED068581)
Lee, C. (1999). Computer-assisted approach for teaching statistical concepts. Computers in the Schools, 16, 193-208.
Leon, R. V., & Parr, W. C. (2000). Use of course home pages in teaching statistics. The American Statistician, 54, 44-48.
Lewis, S., & Clark, M. (2001). Forest plots: Trying to see the wood and the trees. BMJ, 322, 1479-1480. Retrieved April 13, 2003, from http://bmj.com/cgi/content/full/322/7300/1479
Liao, Y. C. (1998). Effects of hypermedia versus traditional instruction on students' achievement: A meta-analysis. Journal of Research on Computing in Education, 30, 341-359.
Light, R. J. (1984). Six evaluation issues that synthesis can resolve better than single studies. In W. H. Yeaton & P. M. Wortman (Eds.), New directions for program evaluation: Issues in data synthesis, 26 (pp. 57-73). San Francisco: Jossey-Bass.
Light, R. J., & Smith, P. V. (1971). Accumulating evidence: Procedures for resolving contradictions among different research studies. Harvard Educational Review, 41, 429-471.
Lipsey, M. W. (1994). Identifying potentially interesting variables and analysis opportunities. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 111-123). New York: Russell Sage Foundation.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Lockard, J. D. (1967). Computers in undergraduate education: Mathematics, physics, statistics, and chemistry. College Park, MD: The University of Maryland, Science Teaching Center.
Loftsgaarden, D. O., Rung, D. C., & Watkins, A. E. (1997). Statistical abstract of undergraduate programs in the mathematical sciences: Fall 1995 CBMS survey. MAA Notes, No. 2. Washington, DC: Mathematical Association of America.
Loftsgaarden, D. O., & Watkins, A. E. (1998). Statistics teaching in colleges and universities: Courses, instructors, and degrees in Fall 1995. The American Statistician, 52, 308-314.
Long, K. E. (1998). Statistics in the high school mathematics curriculum: Is the curriculum preparing students to be quantitatively literate? (Doctoral dissertation, The American University, 1998). Dissertation Abstracts International, 60, 87.
Maddux, C. D., & Cummings, R. (1999). Constructivism: Has the term outlived its usefulness? Computers in the Schools, 16, 5-19.
Malloy, T. E., & Jensen, G. C. (2001). Utah Virtual Lab: Java interactivity for teaching science and statistics on line. Behavior Research Methods, Instruments, & Computers, 33, 282-286.
Marasinghe, M. G., & Meeker, W. Q. (1996). Using graphics and simulation to teach statistical concepts. The American Statistician, 50, 342-351.
*Marcoulides, G. A. (1990). Improving learning performance with computer based programs. Journal of Educational Computing Research, 6, 147-155.
Matthews, M. R. (2000). Appraising constructivism in science and mathematics education. In D. C. Phillips (Ed.), Ninety-ninth yearbook of the National Society for the Study of Education: Part I. Constructivism in education: Opinions and second opinions on controversial issues (pp. 161-191). Chicago: University of Chicago Press.
Mausner, B., Wolff, E. F., Evans, R. W., DeBoer, M. M., Gulkus, S. P., D'Amore, A., et al. (1983). A program of computer assisted instruction for a personalized instructional course in statistics. Teaching of Psychology, 10, 195-200.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.
*McBride, A. B. (1996). Creating a critical thinking learning environment: Teaching statistics to social science undergraduates. PS: Political Science & Politics, 29, 517-521.
McCarty, L. P., & Schwandt, T. A. (2000). Seductive illusions: Von Glasersfeld and Gergen on epistemology and education. In D. C. Phillips (Ed.), Ninety-ninth yearbook of the National Society for the Study of Education: Part I. Constructivism in education: Opinions and second opinions on controversial issues (pp. 41-85). Chicago: University of Chicago Press.
Mead, R. (1974). The use of computer simulation games in the teaching of elementary statistics to agriculturists. International Journal of Mathematical Education in Science & Technology, 5, 705-712.
Milligan, G. W. (1979). A computer program for calculating power of the chi-square test. Educational & Psychological Measurement, 39, 681-684.
Mills, J. D. (2002). Using computer simulation methods to teach statistics: A review of the literature. Journal of Statistics Education, 10. Retrieved November 19, 2002, from http://www.amstat.org/publications/jse/v10n1/mills.html
Mitchell, M. L., & Jolley, J. M. (1999). The Correlator: A self-guided tutorial. Teaching of Psychology, 26, 298-299.
Mittag, K. C. (1993). A Delphi study to determine standards for essential topics and suggested instructional approaches for an introductory non-calculus-based college-level statistics course. (Doctoral dissertation, Texas A&M University, 1993). Dissertation Abstracts International, 54, 2933.
Moore, C. N. (1974). Computer-assisted laboratory experiments of teaching business and economic statistics. International Journal of Mathematical Education in Science & Technology, 5, 713-716.
Moore, D. S. (1993). The place of video in new styles of teaching and learning statistics. The American Statistician, 47, 172-176.
Moore, D. S. (1997). New pedagogy and new content: The case of statistics. International Statistical Review, 65, 123-165.
*Myers, K. N. (1989). An exploratory study of the effectiveness of computer graphics and simulations in a computer-student interactive environment in illustrating random sampling and the central limit theorem. (Doctoral dissertation, The Florida State University, 1989). Dissertation Abstracts International, 51, 441.
National Center for Education Statistics. Retrieved November 22, 2002, from http://nces.ed.gov
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: National Council of Teachers of Mathematics.
Newby, T. J. (1996). Instructional technology for teaching and learning: Designing instruction, integrating computers, and using media. Englewood Cliffs, NJ: Merrill.
Newmark, J. (1996). Statistics and probability in modern life. Fort Worth, TX: Saunders College Publishing.
Niemiec, R. P., & Walberg, H. J. (1985). Computers and achievement in the elementary schools. Journal of Educational Computing Research, 1, 435-440.
O'Keeffe, L., & Klagge, J. (1986). Statistical packages for the IBM PC family. New York: McGraw-Hill.
Olkin, I. (1990). History and goals. In K. W. Wachter & M. L. Straf (Eds.), The future of meta-analysis (pp. 3-10). New York: Russell Sage Foundation.
*Olsen, C. R., & Bozeman, W. C. (1988). Decision support systems: Applications in statistics and hypothesis testing. Journal of Research on Computing in Education, 20, 206-212.
Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8, 157-159.
Ouyang, R. (1993). A meta-analysis: Effectiveness of computer-assisted instruction at the level of elementary education (K-6). (Doctoral dissertation, Indiana University of Pennsylvania, 1993). Dissertation Abstracts International, 54, 0421.
Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. New York: Basic Books.
Pearson, K. (1933). On a method of determining whether a sample of size n supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random. Biometrika, 25, 379-410.
Perry, M., & Kader, G. (1995). Using simulation to study estimation. Mathematics and Computer Education, 29, 53-64.
Phillips, D. C. (2000). An opinionated account of the constructivist landscape. In D. C. Phillips (Ed.), Ninety-ninth yearbook of the National Society for the Study of Education: Part I. Constructivism in education: Opinions and second opinions on controversial issues (pp. 1-16). Chicago: University of Chicago Press.
Pollane, L. P., & Schnittjer, C. J. (1977). The relative performance of five computer program packages which perform factorial-univariate analysis of covariance. Educational & Psychological Measurement, 37, 227-231.
*Porter, T. S., & Riley, T. M. (1996). The effectiveness of computer exercises in introductory statistics. Journal of Economic Education, 27, 291-299.
Pregibon, D., & Gale, W. A. (1984). REX: An expert system for regression analysis. Computational Statistics Quarterly, 1, 242-248.
*Raymondo, J. C., & Garrett, J. R. (1998). Assessing the introduction of a computer laboratory experience into a behavioral science statistics course. Teaching Sociology, 26, 29-37.
Ritter, M., & Low, K. G. (1996). Effects of dance/movement therapy: A meta-analysis. Arts in Psychotherapy, 23, 249-260.
Roblyer, M. D. (1988). The effectiveness of microcomputers in education: A review of the research from 1980-1987. Technological Horizons in Education Journal, 16, 85-89.
Roblyer, M. D., & Edwards, J. (2000). Integrating educational technology into teaching (2nd ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
Rogers, R. L. (1987). A microcomputer-based statistics course with individualized assignment. Teaching of Psychology, 14, 109-111.
Romero, V. L., Berger, D. E., Healy, M. R., & Aberson, C. L. (2000). Using cognitive learning theory to design effective on-line statistics tutorials. Behavior Research Methods, Instruments, & Computers, 32, 246-249.
*Rosen, E., Feeney, B., & Petty, L. C. (1994). An introductory statistics class and examination using SPSS/PC. Behavior Research Methods, Instruments, & Computers, 26, 242-244.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231-281). New York: Russell Sage Foundation.
Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. Behavioral and Brain Sciences, 3, 377-386.
Ryan, T. A., Joiner, B., & Ryan, B. (1976). Minitab student handbook. North Scituate, MA: Duxbury Press.
Sandals, L. H., & Pyryt, M. C. (1992). New directions for teaching research methods and statistics: The development of a computer-based expert system. Paper presented at the Annual Meeting of the American Education Research Association, San Francisco, CA. (ERIC Document Reproduction Service No. ED349960)
Scalzo, F., & Hughes, R. (1976). Integration of prepackaged computer programs into an undergraduate introductory statistics course. Journal of Computer-Based Instruction, 2, 73-79.
Schacter, J., & Fagnano, C. (1999). Does computer technology improve student learning and achievement? How, when, and under what conditions? Journal of Educational Computing Research, 20, 329-343.
Scheaffer, R. L. (1990). Toward a more quantitatively literate citizenry. The American Statistician, 44, 2-3.
Scheaffer, R. L. (2001). In a world of data, statisticians count. Retrieved November 10, 2002, from http://www.amstat.org/publications/amstat_news/2001/pres09.html
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115-129.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
Schmidt, M., Weinstein, T., Niemiec, R., & Walberg, H. J. (1985). Computer-assisted instruction with exceptional children: A meta-analysis of research findings. Journal of Special Education, 19, 493-501.
Schnittjer, C. J. (1976). Canonical correlation program: A comparative analysis of performance. Educational & Psychological Measurement, 36, 179-182.
Schram, C. M. (1996). A meta-analysis of gender differences in applied statistics achievement. Journal of Educational and Behavioral Statistics, 21, 55-70.
Sedlmeier, P. (1997). BasicBayes: A tutor system for simple Bayesian inference. Behavior Research Methods, Instruments, & Computers, 29, 328-336.
Skavaril, R. V. (1974). Computer-based instruction of introductory statistics. Journal of Computer-Based Instruction, 1, 32-40.
Skinner, B. F. (1989). Recent issues in the analysis of behavior. Upper Saddle River, NJ: Merrill/Prentice Hall.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752-760.
Snee, R. D. (1993). What's missing in statistical education? The American Statistician, 47, 149-154.
Snell, J. L., & Peterson, W. P. (1992). Does the computer help us understand statistics? In F. Gordon & S. Gordon (Eds.), Statistics for the twenty-first century, MAA Notes No. 26 (pp. 167-188). Washington, DC: Mathematical Association of America.
Snyder, P., & Lawson, S. (1993). Effect size estimates. The Journal of Experimental Education, 61, 334-349.
Snyder, R. R. (1977). Computer simulations in teaching psychology. Paper presented at the annual meeting of the American Educational Research Association, New York. (ERIC Document Reproduction Service No. ED143313)
Steiger, J. H. (1979). MULTICORR: A computer program for fast, accurate, small sample testing of correlational pattern hypotheses. Educational & Psychological Measurement, 39, 677-680.
Steinberg, E. R. (1991). Computer-assisted instruction: A synthesis of theory, practice, and technology. Hillsdale, NJ: Erlbaum.
Stemmer, P. M., & Berger, C. F. (1985). Microcomputer programs for educational statistics: A review of popular programs. (ERIC Document Reproduction Service No. ED269442)
Stephenson, W. R. (1990). A study of student reaction to the use of Minitab in an introductory statistics course. The American Statistician, 44, 231-235.
*Sterling, J., & Gray, M. W. (1991). The effect of simulation software on students' attitudes and understanding in introductory statistics. Journal of Computers in Mathematics & Science Teaching, 10, 51-56.
Sterrett, A., & Karian, Z. A. (1978). A laboratory for an elementary statistics course. American Mathematical Monthly, 85, 113-116.
Stockburger, D. W. (1982). Evaluation of three simulation exercises in an introductory statistics course. Contemporary Educational Psychology, 7, 365-370.
Strube, M. (1991). Demonstrating the influence of sample size and reliability on study outcome. Teaching of Psychology, 18, 113-115.
Strube, M. J., & Goldstein, M. D. (1995). A computer program that demonstrates the difference between main effects and interactions. Teaching of Psychology, 22, 207-208.
Susman, E. B. (1998). Cooperative learning: A review of factors that increase the effectiveness of cooperative computer-based instruction. Journal of Educational Computing Research, 18, 303-322.
Tanis, E. A. (1973). A computer laboratory for mathematical probability and statistics. Paper presented at the Conference on Computers in the Undergraduate Curricula, Claremont, CA. (ERIC Document Reproduction Service No. ED079985)
Thomas, D. B. (1971). STATSIM: Exercises in statistics. Tallahassee: Florida State University, Computer-Assisted Instruction Center. (ERIC Document Reproduction Service No. ED055440)
Thompson, B. (1994). Guidelines for authors. Educational & Psychological Measurement, 54, 837-847.
Thompson, B. (1999). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology, 9, 165-181.
Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. The Journal of Experimental Education, 70, 80-93.
Thompson, B., & Frankiewicz, R. G. (1979). CANON: A computer program which produces canonical structure and index coefficients. Educational & Psychological Measurement, 39, 219-222.
Tippett, L. H. C. (1931). The methods of statistics. London: Williams & Norgate.
Tubb, G. W. (1977). Current use of computers in the teaching of statistics. Paper presented at the Computer Science and Statistics Annual Symposium, Gaithersburg, MD. (ERIC Document Reproduction Service No. ED141109)
Varnhagen, C. K., & Zumbo, B. D. (1990). CAI as an adjunct to teaching introductory statistics: Affect mediates learning. Journal of Educational Computing Research, 6, 29-40.
Velleman, P. F., & Moore, D. S. (1996). Multimedia for teaching statistics: Promises and pitfalls. The American Statistician, 50, 217-225.
Vogel, D., & Klassen, J. (2001). Technology-supported learning: Status, issues and trends. Journal of Computer Assisted Learning, 17, 104-114.
Walker, H. M. (1929). Studies in the history of statistical method, with special reference to certain educational problems. Baltimore: Williams & Wilkins.
Walsh, J. F. (1993). Crafting questionnaire-style data: An SAS implementation. Teaching of Psychology, 20, 188-190.
Walsh, J. F. (1994). One-way between subjects design: Simulated data and analysis using SAS. Teaching of Psychology, 21, 53-55.
Wang, M. C., & Bushman, B. J. (1999). Integrating results through meta-analytic review using SAS software. Cary, NC: SAS Institute.
*Wang, X. (1999). Effectiveness of statistical assignments in MPA education: An experiment. Journal of Public Affairs Education, 4, 319-326.
*Ware, M. E., & Chastain, J. D. (1989). Computer-assisted statistical analysis: A teaching innovation? Teaching of Psychology, 16, 222-227.
Warner, C. B., & Meehan, A. M. (2001). Microsoft Excel(TM) as a tool for teaching basic statistics. Teaching of Psychology, 28, 295-298.
Watts, D. G. (1991). Why is introductory statistics difficult to learn and what can we do to make it easier? The American Statistician, 45, 290-291.
Wegman, E. J. (1974). Computer graphics in undergraduate statistics. International Journal of Mathematical Education in Science & Technology, 5, 15-23.
West, R. W., Ogden, R. T., & Rossini, A. J. (1998). Statistical tools on the World Wide Web. The American Statistician, 52, 257-262.
White, A. P. (1995). An expert system for choosing statistical tests. The New Review of Applied Expert Systems, 1, 111-121.
*White, S. L. (1985). Teaching introductory statistics: Hand calculations versus computer data analysis. Unpublished master's thesis, California State University.
Wilkinson, L., & American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Willett, J. B., Yamashita, J. J., & Anderson, R. D. (1983). A meta-analysis of instructional systems applied in science teaching. Journal of Research in Science Teaching, 20, 405-417.
Wimberley, R. C. (1978). Comparing package programs for factor analysis. Educational & Psychological Measurement, 38, 143-145.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Beverly Hills, CA: Sage.
Wurster, C. (2001). Computers: An illustrated history. New York: Taschen.
Yates, F., & Cochran, W. G. (1938). The analysis of groups of experiments. Journal of Agricultural Science, 28, 556-580.