THE EFFECTIVENESS OF
COMPUTER-ASSISTED INSTRUCTION
IN STATISTICS EDUCATION: A META-ANALYSIS
by
Yung-chen Hsu
A Dissertation Submitted to the Faculty of the
DEPARTMENT OF EDUCATIONAL PSYCHOLOGY
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2003
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have
read the dissertation prepared by Yung-Chen Hsu entitled The Effectiveness of
Computer-Assisted Instruction in Statistics Education: A Meta-Analysis and
recommend that it be accepted as fulfilling the dissertation requirement for the
Degree of Doctor of Philosophy.

Final approval and acceptance of this dissertation is contingent upon
the candidate's submission of the final copy of the dissertation to the
Graduate College.

I hereby certify that I have read this dissertation prepared under my
direction and recommend that it be accepted as fulfilling the dissertation
requirement.

Dissertation Director
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of requirements for
an advanced degree at The University of Arizona and is deposited in the University
Library to be made available to borrowers under rules of the Library.
Brief quotations from this dissertation are allowable without special
permission, provided that accurate acknowledgment of source is made. Requests for
permission for extended quotation from or reproduction of this manuscript in whole
or in part may be granted by the head of the major department or the Dean of the
Graduate College when in his or her judgment the proposed use of the material is in
the interests of scholarship. In all other instances, however, permission must be
obtained from the author.
SIGNED:
ACKNOWLEDGEMENTS
I would like to express my sincerest gratitude to the following people who
have taught me knowledge and skills and supported me on the journey of
completing the dissertation and my graduate study.
Professor Darrell L. Sabers, my advisor and chairman of my dissertation
committee, has supported me from my first statistics course through the last
moment of my study, has taught me knowledge in measurement and psychological
testing, and has encouraged me not to give up whenever I felt frustrated. Without
his strong support and encouragement, the completion of this dissertation would
have been impossible.
Professor Kenneth J. Smith, the chairman of my minor, has guided me to the
field of instructional technology. Through his courses, I have learned how computer
technology is applied in instruction, knowledge that benefited the section on
learning theories in this dissertation.
Dr. Patricia B. Jones, a member of my dissertation committee, has taught
me the knowledge and skills in applying SAS and SPSS in statistical analysis. The
courses I have taken with Dr. Jones inspired me to examine the effectiveness of
computer-assisted instruction in statistics education. With knowledge of performing
statistical analyses in SAS, I was able to complete the dissertation successfully.
Dr. Philip E. Callahan and Professor Sarah M. Dinham, as members of my
dissertation committee, have also taught me technology and statistics and have
greatly supported my graduate study.
I would also like to thank Ms. Patricia R. Bauerle, a long-time friend since I
came to the United States, who read drafts of the dissertation several times
and provided valuable comments and suggestions. Special thanks go to
Mrs. Karoleen P. Wilsey for checking the format and the references of this
dissertation.
Above all, my parents, my husband, my in-laws, my sister, and many
friends gave me their full support throughout my graduate study. And my three
children, Andy, Alice, and Jasmine, all waited patiently for me to finish my
endless homework.
DEDICATION
To my parents and parents-in-law
TABLE OF CONTENTS

LIST OF ILLUSTRATIONS

LIST OF TABLES

ABSTRACT

CHAPTER 1. INTRODUCTION
    Background
    Statement of the Problem
    Research Questions
    Significance of the Study
    Definitions

CHAPTER 2. LITERATURE REVIEW
    Directed Instruction
        Skinner's Operant Conditioning Theory
        Information-Processing Theory
        Gagne's Learning Condition Theory
    Constructivism
        Piaget's Cognitive-Development Theory
        Vygotsky's Cultural-Historical Theory
        Varieties and Characteristics of Constructivism
    Computer-Assisted Instruction in Statistics Education
    Research Synthesis Methods
        Traditional Review Methods
        Statistically Correct Vote-Counting Methods
        Meta-Analysis Methods
    Meta-Analyses on Computer-Assisted Instruction

CHAPTER 3. METHOD
    Research Questions
    Sampling Criteria and Procedure
    Study Characteristics
        Publication Year
        Publication Source
        Educational Level of Participants
        Mode of CAI Program
        Type of CAI Program
        Level of Interactivity of CAI Program
        Instructional Role of CAI Program
        Sample Size of Participants
    Dependent Variable
    Statistical Analysis
        Conceptualization of Effect Size
        Effect Size and Statistical Significance
        Definition and Calculation of Effect Size
        Combination of Effect Sizes
        ANOVA Approach to Test the Moderating Effects of Categorical Study Characteristics
        Comparisons Among Groups
        Regression Approach to Study Moderating Effects of Continuous Study Characteristics
        File Drawer Problem

CHAPTER 4. ANALYSIS AND RESULTS
    Primary Study Selection
    Reviewing and Coding the Primary Data
    Examination for Selection Bias
        Estimate of Overall Effect Size
        Dependence of Effect Sizes
        Fail Safe Number
    Primary Study Characteristics
        Publication Year
        Publication Source
        Educational Level of Participants
        Mode of CAI Program
        Type of CAI Program
        Level of Interactivity of CAI Program
        Instructional Role of CAI Program
        Sample Size of Participants
    Comparisons Among Groups for Mode of CAI Program

CHAPTER 5. SUMMARY, DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS
    Summary
    Discussion
    Conclusions and Recommendations

APPENDIX A. PRIMARY STUDY REVIEW SHEET

APPENDIX B. TABLES OF DATA

APPENDIX C. FOREST PLOTS FOR EFFECT SIZES GROUPED BY STUDY CHARACTERISTICS

REFERENCES
LIST OF ILLUSTRATIONS

3.1. Graphical representation of effect size
4.1. Funnel plot
4.2. Histogram of effect sizes
4.3. Normal quantile plot
4.4. Regression of effect sizes on publication year
C.1. Forest plots for effect sizes grouped by publication year
C.2. Forest plots for effect sizes grouped by publication source
C.3. Forest plots for effect sizes grouped by level of education
C.4. Forest plots for effect sizes grouped by mode
C.5. Forest plots for effect sizes grouped by type
C.6. Forest plots for effect sizes grouped by level of interactivity
C.7. Forest plots for effect sizes grouped by instructional role
C.8. Forest plots for effect sizes grouped by sample size
LIST OF TABLES

1.1.  Enrollments in Introductory Statistics, in Thousands (CBMS)
2.1.  Relationships Between Learning Phases and Instruction Events
2.2.  Findings of 12 Meta-Analyses on Computer-Based Instruction Published
      Between 1978 and 1991
2.3.  Findings of 12 Meta-Analyses on Computer-Based Instruction Published
      Between 1993 and 2000
4.1.  Statistics of Study Effect Sizes by Year
4.2.  Q Statistics by Year
4.3.  Statistics of Study Effect Sizes by Source
4.4.  Q Statistics by Source
4.5.  Statistics of Study Effect Sizes by Educational Level
4.6.  Q Statistics by Educational Level
4.7.  Statistics of Study Effect Sizes by Mode
4.8.  Q Statistics by Mode
4.9.  Statistics of Study Effect Sizes by Type
4.10. Q Statistics by Type
4.11. Statistics of Study Effect Sizes by Level of Interactivity
4.12. Q Statistics by Level of Interactivity
4.13. Statistics of Study Effect Sizes by Instructional Role
4.14. Q Statistics by Instructional Role
4.15. Statistics of Study Effect Sizes by Sample Size
4.16. Q Statistics by Sample Size
B.1.  Primary Study Data
B.2.  Effect Size Data
B.3.  Primary Study Characteristics
B.4.  Standard Errors and Confidence Intervals
ABSTRACT
The purpose of this study was to investigate the effectiveness of
computer-assisted instruction (CAI) in statistics education at the college level in the
United States. This study employed meta-analysis to integrate the findings from 25
primary studies which met a specific set of criteria. The primary studies were
selected from journal articles, ERIC documents, and dissertations.
Results of the meta-analysis produced an overall effect size estimate of 0.43,
indicating that applying CAI in teaching college-level introductory statistics has a
small to medium positive effect on students' achievement. Several study
characteristics were examined for their association with the magnitude of the
effect. These
characteristics included the publication year, the publication source, the educational
level of participants, the mode of the CAI program, the type of CAI program, the
level of interactivity of the CAI program, the instructional role of the CAI program,
and the sample size. The results of the analogous analysis of variance showed that
different modes of CAI programs produced significantly different effects on students'
achievement in learning statistics. Expert systems and drill-and-practice programs
were the most effective modes and were followed by multimedia, tutorials, and
simulations. Computational statistical packages and web-based programs were the
least effective modes. The teacher-made CAI programs were significantly more
effective than the commercially-developed CAI programs. The effectiveness of CAI
programs in teaching statistics did not differ significantly according to the study
characteristics of the publication year, the publication source, the educational level
of participants, the level of interactivity of CAI program, the instructional role of
CAI program, and the sample size.
CHAPTER 1
INTRODUCTION
Background
Many people live in societies that depend heavily on information and
technology. Issues regarding politics, economics, education, and science are decided
and judged on the basis of data. Statistical reports, such as the results of surveys
and of observational and experimental studies, appear regularly in the media.
Statistical information has affected people's lives in various aspects. Therefore, the
ability to understand, interpret, and evaluate statistical findings has become an
essential skill for future citizens and workers in society (Ben-Zvi, 2000). The
19th-century prophet H. G. Wells predicted that "statistical thinking will one day be
as necessary for efficient citizenship as the ability to read and write" (cited in
Newmark, 1996, p. 6). Jobs of all kinds have increasingly
required workers to have analytical, quantitative, and computing skills. These
requirements have placed more pressure on educational systems to prepare and
equip students with statistical concepts and quantitative knowledge (Moore, 1997).
The teaching and learning of statistics has affected the curriculum at all
levels of education. In the United States, students from preschool through grade 12
have been taught how to collect and interpret quantitative information (Derry,
Levin, Osana, & Jones, 1998; Friel, Corwin, & Rowan, 1990; Hilton, Grimshaw, &
Anderson, 2001; Long, 1998). The Center for Statistical Education has been
developing curriculum materials and conducting workshops mostly under the
Quantitative Literacy Project (QLP) of the American Statistical Association. The
QLP provides instructional materials on probability and statistics that can be used
in the pre-college curriculum (Garfield & Ahlgren, 1994). A set of guidelines has
been suggested for teaching statistics to K-12 students (Scheaffer, 1990).
In addition, the release of the Principles and Standards of the National Council of
Teachers of Mathematics (NCTM) includes a content standard that emphasizes
statistical reasoning about data analysis and probability (NCTM, 2000).
In the past 20 years, the number of statistics courses at the college level has
been increasing for most of the disciplines in the United States (Loftsgaarden &
Watkins, 1998; Moore, 1997). More undergraduate and graduate departments
require their students to acquire some understanding of statistics. According to
Barnet (1999), the first inclusion of statistics courses at the college level started in
the late 1800s. Economics and psychology departments usually offered these
statistics courses. In 1898, the Department of Mathematics at the University of
Illinois became the first mathematics department to offer a statistics course
(Walker, 1929). In 1925, the American Statistical Association found that 84 of 125
colleges surveyed offered statistics courses. Currently, the Bureau of Labor Statistics
(2002) reports that "About 80 colleges and universities offered bachelor's degrees in
statistics in 2000", and "In 2000, approximately 110 universities offered a master's
degree program in statistics, and about 60 offered a doctoral degree program"
(p. 179). However, there are about 4100 degree-granting colleges and universities
(including two-year and four-year public and private institutions) and about 14
million college students (NCES, 2002). Thus, relatively few colleges and
universities offer degrees in statistics. The web site of the American Statistical
Association (http://www.amstat.org/education/) provides detailed information
about the departments of statistics in the colleges and universities.
During the past decades, society and employment have become more
quantitative. The increasing need can be seen in Table 1.1, which presents data
from the five-year census of mathematical sciences departments conducted by the
Conference Board of the Mathematical Sciences (CBMS) (Loftsgaarden, Rung, &
Watkins, 1997; Scheaffer, 2001).
Table 1.1 Enrollments in Introductory Statistics, in Thousands (CBMS)

Term         Statistics Depts   Mathematics Depts   2-Year Colleges   Totals
Fall, 1990         30                  87                 54           170
Fall, 1995         49                 115                 72           236
Fall, 2000         54                 136                 84*          274

*Based on incomplete data
As the enrollments in the mathematics and statistics departments steadily
grow, the departments of the nonmathematical disciplines also increasingly provide
introductory statistics courses to meet the needs of students without strong
mathematical backgrounds. However, due to the mathematical nature of statistics,
students often have a fear of formulas and mathematical reasoning which easily
leads to negative attitudes and severe frustration in learning statistics (Hogg, 1992).
The question turns to how teachers can help these students learn effectively.
A successful introductory statistics course depends on many factors and efforts from
both the teacher and the students. Barnet (1999) and Moore (1997) describe how
statistics courses have usually been taught in a traditional way. Many people who
have taken statistics courses probably have had the experience of sitting in a class in
which the teacher stood in front of the blackboard or used the overhead projector to
give a lecture with the students taking notes. Then, assignments or projects were
given to reinforce students' understanding, and tests were used to evaluate
students' learning. There has been criticism that traditional teaching of statistics
focuses on computation, formulas, and procedures rather than on statistical
reasoning and the ability to interpret, evaluate, and flexibly apply statistical ideas
(Ben-Zvi, 2000). As Hogg (1992) pointed out, a problem with these traditional
methods is that they do not adequately equip the students to apply statistics in the
real world. The lectures usually focus on learning statistical concepts rather than on
the process of using statistical concepts to solve problems (Garfield, 1995).
Moreover, because the important fundamental concepts are highly abstract and
theoretical, the beginning students usually have difficulty in understanding the
lectures (Watts, 1991). Another explanation for why traditional
methods do not work well is that lectures sometimes contribute to a high level of
anxiety, fear, and negative attitudes (Barnet, 1999). In a traditional class, the
students have a very passive role in learning and often feel helpless when facing a
subject that is difficult and intimidating. Also, teachers are often unimaginative in
their methods of delivery and unable to use the wide variety of simulations,
experiments, and individual or group projects that are possible (Hogg, 1992).
There has been a growing feeling that statistics education needs significant
changes (Bisgaard, 1991; Bradstreet, 1996; Hogg, 1991, 1992; Snee, 1993; Watts,
1991). Moore (1997) called for a reform movement in the teaching of statistics to
beginners at the university level. The main suggestion was that "the most effective
learning takes place when content, pedagogy, and technology reinforce each other in
a balanced manner" (p. 124). Also, the American Statistical Association and the
Mathematical Association of America have provided the following
recommendations (Moore, 1997, p. 127):
1. Emphasize the elements of statistical thinking: (a) the need for data; (b) the
importance of data production; (c) the omnipresence of variability; and (d) the
measuring and modeling of variability.
2. Incorporate more data and concepts with fewer recipes and derivations.
Whenever possible, automate computations and graphics. An introductory
course should: (a) rely heavily on real data; (b) emphasize statistical concepts;
(c) rely on computers rather than computational recipes; and (d) treat formal
derivations as secondary in importance.
3. Foster active learning, through the following alternatives to lecturing: (a)
group problem solving and discussion; (b) laboratory exercises; (c)
demonstrations based on class-generated data; (d) written and oral
presentations; and (e) projects, either group or individual.
There is broad general agreement on the guiding principles for
changing beginning statistics instruction (Moore, 1997). That is, the teacher
should place more emphasis on concepts, data analysis, inference, and statistical
thinking; foster active learning through various alternatives to lecturing; and use
technological tools to automate computations and graphics (Barnet, 1999; Ben-Zvi,
2000; Cobb, 1992; Garfield, 1995; Hogg, 1992; Holcomb & Ruffer, 2000; Moore,
1997). Moore (1997) indicated that the chain of influence begins with technology.
The continuing revolution in computing has changed the practice of statistics, and
has subsequently changed the tastes of what constitutes interesting research in
statistics. Gradually, the combination of technology, professional practices, and
research tastes has affected introductory instruction in statistics education.
The kinds of technology that are generally used in introductory statistics
classes, according to Moore (1997), include television, video, computing
software, graphing calculators, statistical software, simulation tools, and multimedia
products. For the purpose of this study, the focus will be on the computer-based
technology.
Statement of the Problem
Computers have increasingly been incorporated in introductory statistics
classes at the college and university level. For example, the survey conducted by
Castellan (1982) showed that about 50 percent of the respondents reported using
computers in courses of statistics and experimental methods in psychology
departments in the United States. Couch and Stoloff (1989) documented an
increasing use of computers for research methods and statistics at 71 percent and 66
percent, respectively, of a national sample of psychology departments. Mittag
(1993) concluded in a study that 49 percent of the instructional time of the
non-calculus-based statistics course should be data-based, 28 percent
computer-based, 13 percent probability-based, and 10 percent based on other
approaches. Bartz (2001) reported that 79 percent of North American
undergraduate departments indicated that computers were used in their statistics
courses. In most colleges and universities, the amount and quality of technology
continues to improve. Students have increased access to computers, graphing
programs, Internet resources, and multimedia. Excellent software programs have
been made available for exploring data, presenting statistical concepts, and even
tutoring students (Moore, 1997).
As advanced computer technology has developed, there have been numerous
tutorials, simulations/demonstrations, and computational packages that support
statistical instruction and learning (Lee, 1999; Marasinghe & Meeker, 1996; Mills,
2002; West, Ogden, & Rossini, 1998). Becker (1996) found that in the literature the
most frequent topics of statistical instruction focus on teaching approaches and
the second most frequent focus on using computers in teaching statistics.
However, the problem was that most of these studies described the
development of the computer programs and their implementation in classes
without investigating the effectiveness of these computer tools (e.g., Beins, 1989;
Brigg & Sheu, 1998; Britt, Bellinger, & Stillerman, 2002; Butler & Eamon, 1985;
Derry, Levin, & Schauble, 1995; Eamon, 1992; Hatchett, Zivian, Zivian, & Okada,
1999; Malloy & Jensen, 2001; Mitchell & Jolley, 1999; Rogers, 1987; Sedlmeier,
1997; Strube & Goldstein, 1995; Walsh, 1993, 1994; Warner & Meehan, 2001). Even
though some studies have conducted experiments to test the effects, the results
seem to be, at times, inconsistent and contradictory (e.g., Athey, 1987; Gonzales &
Birch, 2000; Gratz, Volpe, & Kind, 1993; Hurlburt, 2001; Lane & Aleksic, 2002;
Marcoulides, 1990; Porter & Riley, 1996; Ware & Chastain, 1989).
As many instructors strive to incorporate computer tools in teaching
statistics, an important question is whether the tools have positive
effects on students' statistical learning. There are a number of
meta-analyses that synthesize the research of computer-assisted instruction in
colleges. For example, Kulik and Kulik (1986) found that when receiving
computer-assisted instruction, college students overall outperformed 60 percent (an
effect size of 0.26) of other students who were taught with a traditional method.
However, few meta-analyses have been conducted for the effectiveness of
computer-assisted instruction in the field of statistics education.
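The percentile interpretation follows from the standard normal model: an effect
size d places the average student in the treatment group at the Φ(d) quantile of
the control group's score distribution. The short Python sketch below illustrates
the conversion; it is illustrative only and is not code from any of the studies
reviewed.

    from statistics import NormalDist

    def effect_size_to_percentile(d: float) -> float:
        """Percentile of the control distribution reached by the average
        student in the treatment group, assuming normal score distributions."""
        return NormalDist().cdf(d) * 100

    # An effect size of 0.26, as reported by Kulik and Kulik (1986), places
    # the average CAI student at about the 60th percentile of the
    # traditionally taught group.
    print(round(effect_size_to_percentile(0.26)))  # prints 60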
Research Questions
The purpose of this meta-analysis is to examine the following questions:
1. How effective is the use of computer-assisted instruction (CAI) in enhancing
the statistical learning of college students as compared with non-computer
instructional techniques?
2. Does the effectiveness of CAI differ by the publication year of the study?
3. Does the effectiveness of CAI differ by the source of the study (dissertation,
journal article, or ERIC document)?
4. Does the effectiveness of CAI differ by students' level of education
(undergraduate or graduate)?
5. Which modes of computer-assisted instruction (CAI) techniques are the most
effective for statistical instruction for college students? For example, there are
drill-and-practice, tutorials, multimedia, simulations, computational statistical
programs, expert systems, and web-based programs.
6. Does the effectiveness of CAI differ by the software type (commercial or
teacher-made)?
7. Does the effectiveness of CAI differ by the level of interactivity of the program
(interactive-PC, interactive-mainframe, or batch-mainframe)?
8. Does the effectiveness of CAI differ by the role of the program (supplement or
substitute)?
9. Does the effectiveness of CAI differ by the sample size of the participants?
Significance of the Study
This meta-analysis synthesized and integrated the results from various
experimental studies that investigated the effectiveness of using computers to teach
college-level introductory statistics courses during the last twenty years. As
computers have increasingly been incorporated into teaching statistics, the
investigation of the effect of the computer programs and instruction has become
important. While there have been numerous studies and reports regarding a
diversity of computer programs in statistics education, relatively few studies
have conducted experiments to examine the effects of these computer programs
and methods. In addition, applications of computer programs in teaching
statistics have produced both positive and negative results. This
meta-analysis provides a systematic and quantitative analysis by integrating the
results from the selected primary experimental studies and examining the
effects of various study characteristics. This meta-analysis identifies the most
advantageous modes of computer programs and provides an overall effect size as
well as detailed effect sizes according to various study characteristics. Moreover, this study
contributes to the literature that contains few meta-analytic studies investigating
the effectiveness of applying CAI in statistics education.
Definitions
For the purposes of this study, the following terms are defined.
Computer-Assisted Instruction (CAI) can be defined as the use of computers
to assist in instructional activities. CAI is synonymous with computer-assisted
learning, computer-based instruction, computer-based education, computer-based
learning, and computer-enhanced instruction. The students can receive some or all
of their course materials or instruction by interfacing with computer programs on a
microcomputer or mainframe computer, or through the Internet. These instructional
methods usually include various modes (Jonassen, 1996; Roblyer & Edwards, 2000;
Steinberg, 1991):
1. Drill-and-practice provides the students with problems of increasing
complexity to solve. With drill, the students are presented with relatively easy
problems which they can answer quickly; with practice, the students will
answer more complex problems which may require more problem-solving
activities. This method allows the students to work problems or answer
questions at their own pace and obtain immediate feedback on correctness.
2. Tutorials act like tutors by providing the information and instructional
activities students need to master a topic. Tutorials usually present
information summaries, explanations, practice routines, feedback, and
assessment.
3. Simulations & games model real-world phenomena or artificial environments.
These programs require students to respond to computer-driven and changing
situations that allow the students to predict outcomes based on their input
and to explore or discover new information.
4. Problem solving programs perform complex numerical, algebraic, or symbolic
calculations. These programs assist students in understanding the principles
and rules through explanation or practice. The steps involved in solving
problems can help students acquire problem-solving skills by providing the
opportunities to solve problems.
5. Expert systems act as advisors and offer suggestions to assist students'
decision-making process. These systems employ artificial intelligence to
computerize human expertise in a specific domain.
6. Multimedia is software that connects elements of a computer system (e.g.,
texts, movies, pictures, and other graphics). It includes hypermedia which
contains hypertext links. Hypertext consists of text elements such as keywords
that can be cross-referenced with other occurrences of the same words or with
related concepts.
Effect size (ES) refers to an estimate of the magnitude of an effect or, more
generally, the size of the relationship between two variables. The most basic form is
the standardized mean difference of the outcomes between an experimental group
and a control group (Rosenthal, 1991).
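To make the definition concrete, the sketch below computes a standardized mean
difference from group summary statistics, scaling by the pooled within-group
standard deviation (one common convention; Chapter 3 describes the formulation
actually used in this study, and the numbers here are hypothetical).

    from math import sqrt

    def standardized_mean_difference(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
        """Standardized mean difference between an experimental group and
        a control group, scaled by the pooled within-group standard deviation."""
        pooled_sd = sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2)
                         / (n_e + n_c - 2))
        return (mean_e - mean_c) / pooled_sd

    # Hypothetical example: a CAI group averaging 78 (SD 10, n = 30) against
    # a control group averaging 73 (SD 12, n = 30) gives an effect size of
    # about 0.45.
    print(round(standardized_mean_difference(78, 73, 10, 12, 30, 30), 2))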
Meta-analysis is a statistical analysis of a collection of analysis results
from individual studies for the purpose of integrating the findings. It is a
quantitative method of research synthesis that uses various measurement
techniques and statistical analyses to aggregate and analyze the descriptive and
inferential statistics of primary studies focusing on a common topic of interest.
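The aggregation step typically weights each study's effect size by the inverse of
its variance, so that larger, more precise studies contribute more to the combined
estimate. The following is a minimal fixed-effect sketch, one of several possible
combination models; the data are hypothetical, and the procedures actually used
in this study are described in Chapter 3.

    def combine_effect_sizes(effects, variances):
        """Fixed-effect (inverse-variance weighted) mean of study effect
        sizes, returned together with its standard error."""
        weights = [1.0 / v for v in variances]
        mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
        se = (1.0 / sum(weights)) ** 0.5
        return mean, se

    # Three hypothetical studies: the most precise study (variance 0.01)
    # pulls the combined estimate toward its own effect size of 0.5.
    mean, se = combine_effect_sizes([0.2, 0.5, 0.8], [0.04, 0.01, 0.09])
    print(f"combined d = {mean:.2f} (SE {se:.2f})")  # combined d = 0.47 (SE 0.09)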
CHAPTER 2
LITERATURE REVIEW
This chapter first reviews the learning theories which have influenced various
forms of computer-assisted instruction since the 1970s. During these years, as
computer technology has increasingly and dramatically developed, more educators
have supported the appropriate use and the instructional role of computers in
education (e.g., Jonassen, Peck, & Wilson, 1999; Lamb, 1992; Newby, 1996; Vogel &
Klassen, 2001). The content of higher education has also changed so rapidly that
content knowledge sometimes becomes outdated before students even graduate.
Thus, educational goals should not be confined to specific knowledge or skills.
Rather, the goals should focus on providing students with creative problem-solving
skills, teaching students with pro-active approaches, and equipping students with
abilities to adapt to social needs (Newby, 1996; Vogel & Klassen, 2001). Most
educators seem to agree that changes are needed in education. However,
disagreements among learning theorists have centered on which strategies will be
more effective in achieving today's educational goals (Roblyer & Edwards, 2000).
The brief review provides a general overview of the learning theories for directed
instruction and constructivism and the influences of these learning theories on the
development of computer-assisted instruction.
A review of research synthesis methods is also provided in this chapter. As
compared with a single study, research syntheses can increase statistical power as a
result of increased sample sizes. Research syntheses can also be effective in
identifying interactions between the treatment and the study (Light, 1984). In
education, the ability to identify such interactions can benefit students
and assist policy decision-making regarding various educational programs. These
benefits include: (a) helping to match instructional methods with individual student
needs, (b) explaining the most effective treatment features, (c) explaining
inconsistent or conflicting learning outcomes, (d) determining critical performance
outcomes, (e) assessing the stability of treatment effectiveness, and (f) assessing the
importance of research design (Light, 1984). Both research synthesis and learning
theories provide important information for education policy and educational
practice.
The third section reviews the research studies that have applied
meta-analysis methods to investigate the effectiveness of computer-assisted
instruction in general education and statistics education in colleges and universities.
Directed Instruction
Learning can occur in many situations and might be the result of deliberate
efforts or unintended circumstances. Bower and Hilgard (1981) defined learning as
"the change in a subject's behavior or behavior potential to a given situation
brought about by the subject's repeated experiences in that situation, provided that
the behavior change cannot be explained on the basis of the subject's native
response tendencies, maturation, or temporary states" (p. 11). Newby (1996) also
defined learning as "a change in human performance potential that results from
practice or other experience and endures over time" (p. 25). Learning is basically
concerned with a change in the possession of knowledge.
There are different views on teaching and learning. Roblyer and Edwards
(2000) have provided two basic categories. One is called directed instruction, and
the other is constructivism. Directed instruction is grounded primarily in
behaviorist learning theories and the information processing of the cognitive
learning theories. Constructivism evolved from branches of cognitive theory. A few
computer applications such as "drill and practice" and tutorials are associated with
24
directed instruction. Most others, such as problem-solving, multimedia applications,
and telecommunications, can facilitate either directed instruction or constructivist
environments, depending on how the teacher implements the applications.
Behavioral theories and information-processing theories have contributed to
the development of directed instruction (Roblyer & Edwards, 2000). Behaviorists
have concentrated on changes in observable behavior and performance as
indicators of learning that is not the result of maturation. There have been many
important behavioral theories, such as Edward L. Thorndike's connectionism, Ivan
Petrovich Pavlov's classical conditioning, Edwin R. Guthrie's contiguous
conditioning, Clark Hull's systematic behavior theory, B. F. Skinner's operant
conditioning, and William K. Estes's stimulus sampling theory (Bower & Hilgard,
1981).
Skinner's Operant Conditioning Theory
Among these behavioral theorists, B.F. Skinner generated much of the
experimental data that laid the basis of behavioral learning theory (Roblyer &
Edwards, 2000). In Skinner's view, learning is behavioral change. Learning is
defined as "a change in the likelihood or probability of a response" (Gredler, 2001,
p. 90). Skinner's operant conditioning model postulated three essential elements of
learning: discriminative stimulus, response, and the reinforcing stimulus. He
distinguished responses into two classes: respondents and operants. Respondents
are the reflex actions elicited by a given stimulus, and operants are emitted
responses without any obvious stimulus, which is attributed to internal processes in
the brain. Operants act on the environment to have different kinds of consequences
which affect the person and change future behavior (Gredler, 2001). For example,
singing a song may operate on the environment to produce consequences like praise,
applause, or money.
Within Skinner's model, the major job of the teacher is to modify students'
behavior by setting up situations to reinforce students when they show desired
responses and also teach the students to exhibit the same response in such
situations (Roblyer & Edwards, 2000). Skinner emphasized that teaching occurs
when a response is evoked for the first time and is then reinforced. Therefore, the
design of effective instruction requires careful attention in the selection of the
discriminative stimuli and the use of reinforcement (Gredler, 2001).
In 1954, Skinner started to invent a mechanical device to assist teaching
math, reading, spelling, and other subjects. Skinner's devices and other models were
called "teaching machines" or "autoinstructional devices", and the materials were
called programs (Bower & Hilgard, 1981). The teaching machines provided
contingent reinforcement for right answers in the form of (a) confirmation for
correct answers, (b) a move forward to new materials, and (c) operating the
equipment by the students. The students could move forward at their own pace. Skinner
(1989) described the teaching machine as a mechanical anticipation of the
computer. He also considered the computer as the ideal teaching machine because
computers can bring aspects of real life into the classroom and also expand the
range of potential reinforcers. The characteristics of the computer make it especially
appropriate for use in tutorials, drill and practice, and simulation/gaming
instructional modes (Kuchler, 1998).
Information-Processing Theory
Behaviorists have only paid attention to external, directly observable
indicators of human learning. However, many people have found the explanation
insufficient to guide instruction. During the 1950s and 1960s, some cognitive
theorists started to propose the internal mental processes. Information-processing
theorists hypothesized that processes inside the brain allow people to learn and
remember. A model of memory and storage proposed by Atkinson and
Shiffrin (cited in Roblyer & Edwards, 2000) posits that the brain
contains certain structures that process information much like a computer does. It
hypothesizes that the human brain has three kinds of memory or stores: (a) sensory
registers, the part of memory that receives all the information a person takes in
through the five senses (sight, hearing, touch, taste, and smell); (b) short-term
memory (working memory), the part where new memory is held temporarily until it
is lost or placed into long-term memory; and (c) long-term memory, the part that
has an unlimited capacity and can hold information indefinitely. In this model, learning
occurs through a process. First, sensory registers receive information and hold it for
a very short time, after which it either enters short-term memory or is lost. If the
person does not pay attention, the information may be lost before going to
short-term memory. Then, the information stays in short-term memory for 5 to 20
seconds. At this time, the person needs to practice or process the information which
then is stored in long-term memory. Otherwise, the information is lost. The
theorists also believed that the new information needs to be linked in some way to
prior knowledge (existing schema) in long-term memory.
The information-processing views provide the basis for instruction that uses
a variety of methods to increase the chances that students will pay attention to new
information and transfer the information to long-term memory. Some processing
aids, such as advance organizers, instructional-based aids, and learner-generated
cues for encoding and recalling, are suggested for improving learning (Gredler,
2001). The analogy between the human learning process and a computer
information-processing system has also inspired computer-assisted instruction to
develop simulation and gaming that foster problem-solving skills, and has guided
artificial intelligence (AI) applications that simulate human thinking and learning
behaviors. Conversely, students can use programming languages to instruct the
computer to solve complex problems. Computers can thus be used as a mindtool to
enhance learning.
Gagne's Learning Condition Theory
Robert M. Gagne built on the behavioral and information-processing learning
theories in developing practical instructional guidelines that teachers could
implement with directed instruction (Roblyer & Edwards, 2000). He defined
learning as "the set of cognitive processes that transforms the stimulation from the
environment into the several phases of information processing necessary for
acquiring a new capability" (Gagne & Briggs, 1979, p. 43). He believed that
learning is an important causal factor in development and is cumulative. Students
must have all the prerequisite skills they need to learn a new skill. Low-level skills
provide a foundation for higher-level skills. In order to acquire an intellectual skill, a
student has to go through a process of a "learning hierarchy". For instance,
students must possess the skills of number recognition, number facts, simple
addition and subtraction, multiplication, and simple division before they can work
long division problems (Roblyer & Edwards, 2000).
Gagne identified several varieties of learning outcomes when the student
acquires the knowledge. In general, they could be classified into five types:
intellectual skill, cognitive strategies, verbal information, motor skills, and attitudes.
The five distinct skills reflect the capabilities that the student acquires as a result of
learning. Gagne further subdivided the intellectual skills into four subcategories:
concept learning, discrimination learning, higher-order rule learning, and procedure
learning.
Gagne (1985) identified the internal states required in the student to acquire
the new skills. These states are called the internal conditions of learning. However,
Gagne thought that learning new skills also depends on interactions with the
external environment. These environmental supports are called the external
conditions of learning (Gagne, 1985). They are also referred to as the events of
instruction.
Table 2.1 Relationships Between Learning Phases and Instruction Events

Description       Learning Phase                    Instruction Event

Preparation       1. Attending                      1. Gaining attention
for learning      2. Expectancy                     2. Informing learner of lesson objective
                  3. Retrieval to working memory    3. Stimulating recall of prior learning

Acquisition       4. Selective perception of        4. Presenting distinctive stimulus features
and performance      stimulus features
                  5. Semantic encoding              5. Providing learning guidance
                  6. Retrieval and responding       6. Eliciting performance
                  7. Reinforcement                  7. Providing informative feedback

Transfer of       8. Cueing retrieval               8. Assessing performance
learning          9. Generalizing                   9. Enhancing retention and learning transfer

Note. Adapted from Gredler, 2001, p. 149.
Gagne (1985) applied the internal processes of the information processing
theories to analyzing learning. He identified nine stages of the learning process that
are fundamental to learning and must be executed in sequential order. The nine
stages are called phases of learning and can be categorized into three groups: (a)
preparation for learning, (b) acquisition and performance, and (c) transfer of
learning (see Table 2.1). Gagne believed that learning can occur whether or not
instruction is present. However, each of the learning processes might be influenced in
some way by events external to the learner (Gredler, 2001). Gagne also proposed a
set of nine "events of instruction" that teachers could follow to arrange optimal
conditions of learning.
Gagne's principles focus on instruction rather than on simply teaching. His
instructional design uses the systems approach that is characterized by three major
features. First, instruction is designed for specific goals and objectives. Second, the
development of instruction uses media and other instructional technologies. Third,
pilot tryouts, material revisions, and field testing of the materials are an important
part of the systems design process (Gredler, 2001). The specific instructional plan
should be based on a detailed learning task analysis. Next, the instructor should
select the appropriate media which are compatible with the intended learning
outcomes, the students' characteristics, and the capabilities of the different
instructional media. The use of media can include low technology options (e.g., the
teacher's voice, printed texts, and real objects) or high technology options (e.g.,
computer-assisted instruction, instructional television, videocassette recording, and
mechanized delivery systems). Computer-assisted instruction may incorporate the
necessary external events of instruction which promote the corresponding internal
processes as outlined in Table 2.1.
The most common modes of computer-assisted instruction are
drill-and-practice, simulation, gaming, and tutorials (Gagne, Wager, & Rojas, 1981).
Drill-and-practice computer programs provide two external instructional events,
which can help elicit a learner's performance and offer informative feedback. For
simulation and gaming computer programs, two additional events are included:
informing the students of the lesson objectives and presenting stimuli with
distinctive features. The tutorial mode may be the most comprehensive mode, in
which all nine external instructional events can be included (Gagne, Wager, &
Rojas, 1981).
Constructivism
For the teaching methods in statistics education, Moore (1997) stated that
"the central idea of the new pedagogy is the abandonment of an information
transfer model in favor of a constructivist view of learning" (p. 124). Although there
are different opinions on how introductory statistics should be taught, most still
agree with the current thinking in the field of education that relies on the theory of
constructivism and the use of active learning situations in the classroom (Barnet,
1999; Dokter & Heimann, 1999). Basically, constructivist strategies are based on
the principles of learning that have been rooted in cognitive theories. The common
principle among these theories is that "learners construct knowledge themselves
rather than simply receiving it from knowledgeable teachers" (Roblyer & Edwards,
2000, p. 67).
Several early educators have contributed some of the fundamental thinking to
constructivism. As early as 1897, John Dewey argued that "education must be
conceived as a continuing reconstruction of experience that occurs through the
stimulation of the child's powers by the demands of the social situation in which he
finds himself" (cited in Newby, 1996, p. 34). Dewey emphasized the need to center
instruction around activities that are related and meaningful to students' own
experience.
Constructivist learning theory has its primary foundations in the work of
Jean Piaget, Lev Vygotsky, and others (Howe & Berv, 2000; Maddux & Cummings,
1999). The following sections will briefly describe the main ideas of these theorists.
Piaget's Cognitive-Development Theory
Central to Piaget's stage theory is the idea that cognitive development from
infancy through adulthood is brought about by the individual's efforts to adapt to
the environment with specific goals (Gredler, 2001). Piaget proposed that children
pass through four stages of cognitive development: (a) the sensorimotor stage (from
birth to about age 2), where children's innate reflexes interact with the environment;
(b) preoperational stage (from about age 2 to about age 7), where children begin
basic concept formation; (c) concrete operational stage (from about age 7 to about
age 11), where children use interiorized actions or thoughts to solve problems in
their immediate experience; and (d) formal operations stage (from about age 12 to
about age 15), where children can think through complete hypothetical situations
(Hergenhahn, 1988). The ages at which children pass through these stages might vary.
In the sensorimotor stage, children explore the environment with senses and motor
activities. Then, children begin to understand the relation of cause and effect. In
the preoperational stage, children develop greater abilities in language and engage
in symbolic activities like drawing objects and imaginative play. Also, children
begin to classify things in certain classes of similarity but still make mistakes.
However, one of the most interesting characteristics of the preoperational stage is
that children fail to develop the concept of conservation. That is, they do not
understand that number, length, substance or area remains constant when these
things are presented in different ways. In the stage of concrete operation, children
develop some specific mental operations such as: conservation (i.e., ability to
perform reversible mental operation), class inclusion (i.e., ability to reason about a
part and the whole), seriation (i.e., ability to arrange things according to some
quantified dimension), decentration (i.e., ability to take another's point of view), and
relational thinking (i.e., ability to compare two or more objects simultaneously). In
the formal operations stage, children have developed the abilities of reasoning and
logical thinking. Children can also form and test hypotheses to organize
information, reason scientifically, think abstractly, use higher-order
operations to solve problems, and think about their own thoughts (Gredler, 2001).
Piaget believed that a child's cognition develops from one stage to another
through a gradual process of interacting with the environment. When the child
confronts new and unfamiliar features of the environment which do not fit his or her
current views of the world, the situation of "disequilibration" occurs and then the
child finds ways to resolve these conflicts through one of the two processes of
adaptation. One way is assimilation, when the child attempts to modify or integrate
the new experiences into his or her existing view of the world. The other way is
accommodation, when the child changes his or her schema to incorporate the new
experiences. As the child assimilates or accommodates the new situation, the state
of equilibration is gradually established (Roblyer & Edwards, 2000).
In Piaget's stage development theory, the goal of the learner is to move
successively and successfully through the lower stages of development to the highest
stage, the formal operations stage. Piaget indicated that formal operational thinking
cannot be acquired through direct transmission of knowledge. He recommended
"the use of active methods that require the learner to rediscover or reconstruct the
truths to be learned" (Gredler, 2001, p. 256). That is, the changes in the cognitive
structures (or schema) and the development of problem solving skills cannot be
brought about directly by an instructor. The only way for the child to successfully
achieve the formal operations stage is through repeated experimentation and
reinvention of the related rules. Direct teaching of ideas usually hampers the
learner's initiative and motivation in the construction of knowledge.
Piaget's theory is strongly oriented to scientific and mathematical thinking.
He emphasized that the use of actions and operations is important in mathematics
education. The use of activities and self-directed experimentation should be
provided as much as possible for science education. Collaboration and interchange
among the students themselves is crucial for the development of learning. However,
when cost or the availability of experimentation equipment is restrictive,
the computer is a convenient and important tool with great potential to provide
the students with opportunities to work through a variety of activities and
exercises (Gredler, 2001).
One of Piaget's famous pupils, Seymour Papert, had a profound influence on
applying technology in instruction (Roblyer & Edwards, 2000). Papert was a
mathematician. After studying with Piaget in Geneva from 1959 to 1964, he joined
the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology
and conducted experiments with Logo, a programming language. Papert published
a book entitled Mindstorms: Children, Computers, and Powerful Ideas, which raised
national concerns about the potential role of technology in providing alternatives in
educational methods. This book also became the first widely recognized
constructivist view of educational practice with technology resources. In this book,
Papert viewed Logo as a resource for encouraging learning because Logo is
graphics-oriented and allows children to see cause-and-effect relationships between
the logic of programming commands and the pictures. These Logo activities make
possible "microworlds" to incubate for knowledge and allow children to pose and
test hypotheses (Papert, 1980). Unlike Piaget who was not concerned with
instructional methods or curriculum matters and did not try to accelerate the stage
of cognitive development, Papert felt that "children could advance in their
intellectual abilities more quickly with the right kind of environment and assistance"
(Roblyer & Edwards, 2000, p. 64).
Vygotsky's Cultural-Historical Theory
The work of the Russian philosopher and educational psychologist Lev
Vygotsky also contributed great support to constructivist approaches. He had more
influence on the development of educational theory in the United States than in
Russia (Gredler, 2001). The primary goal of his work was to "reformulate
psychology as a part of a unified social science and to create a comprehensive
analysis of psychological function" (Gredler, 2001, p. 277). He felt that cognitive
development was directly related to and based on social development. Human
mental abilities develop through the interactions of the individual with the world.
That is, cognitive development is based on social interaction and experiences. He
designated "signs and symbols" as "psychological tools", which direct the mind and
change the process of thinking. These psychological tools are different across
cultures and throughout human history (Gredler, 2001). His concepts of
"scaffolding" and the "zone of proximal development" are important. He explained
that the student represents one end of the continuum of understanding and the
teacher represents the other. The gap between the two ends is called the "zone of
proximal development". Scaffolding is the process of bridging the gap between
teacher-supervised work and independent work.
Computer technology can be used as a "psychological tool" for students
(Dixon-Krauss, 1996). Technological devices and media are designed to facilitate
instruction that is developed slightly ahead of the student's development. Social
processes can also be facilitated through or imitated by the computer, which acts as
the more competent peer to enhance the zone of proximal development and
artificially provide a sociocultural means of learning. For example, multimedia,
which gathers any combination of heading text, word-processed text, clip art,
animation, sound, graphics, movie clips, and control buttons into one format, can be
highly interactive with the student (Dixon-Krauss, 1996). The advantage of
multimedia is that the student can choose an appropriate path and pace based on
his or her preferences and prior knowledge. Also, multimedia allows the student to
function at a high level on his or her own, as well as on a higher level in interacting
with the tool. That is, the tool acts as a "scaffold between superordinate and
subordinate concepts, linking the learner's prior knowledge to new knowledge"
(Dixon-Krauss, 1996, p. 180).
Vygotsky believed that the goal of education is to develop children's
personalities. The human personality has its creative potential, and education
should assist in discovering and developing this potential to its fullest. With proper
activities, students can master their inner value and cognitive development.
Teachers can direct and guide the activities but cannot force their will on the
students. The most valuable methods for teaching are those that can meet
individual student's developmental stages and needs. Therefore, these methods
cannot be the same for every student (Roblyer & Edwards, 2000). Vygotsky's ideas
have shown great influence on constructivist thought.
Varieties and Characteristics of Constructivism
According to Phillips (2000), constructivism refers to at least two different
things. For the first type, constructivism describes "the thesis about the disciplines
or bodies of knowledge that have been built up during the course of human history"
and " these disciplines are human constructs, and that the form that knowledge has
taken in these fields has been determined by such things as politics, ideologies,
values, the exertion of power and the preservation of status, religious beliefs, and
economic self-interest" (Phillips, 2000, p. 6). Many theorists believe that the origin
of human knowledge is to be explicated using sociological tools. This broad area of
constructivism is often called "social constructivism" or sometimes "social
constructionism".
There were different degrees of belief within social constructivism, such as
radical, progressive, conservative, reactionary, and so on. The most extreme version
was developed by a group known as the "Edinburgh School" of sociologists of
knowledge. This school believes that "the form that knowledge takes in a discipline
can be fully explained, or entirely accounted for, in sociological terms" (Phillips,
2000, p. 8). Lev Vygotsky's social emphasis on constructing knowledge has had
great influence on social constructivism. Some famous contemporary theorists
include David Bloor, Barry Barnes, Steve Woolgar, Bruno Latour, and Kenneth
Gergen (Phillips, 2000). Among them, Kenneth Gergen is the representative figure
of the radical end of social constructivism.
Regarding the second type, constructivism refers to a set of views about how
individuals learn (and about how those who help them to learn ought to teach).
Simply, this type of constructivist view posits that "learners actively construct their
own sets of meanings or understandings; knowledge is not mere copy of the external
world, nor is knowledge acquired by passive absorption or by simple transference
from one person (a teacher) to another (a learner)". In sum, "knowledge is made,
not acquired" (Phillips, 2000, p. 7). Phillips (2000) labeled this type as
"psychological constructivism". However, he clarified that not all psychological
constructivists are psychologists. The focus of psychological constructivism is on
the way that individuals construct their own psychological understanding. Rand
Spiro, Ernst von Glasersfeld, and a group of researchers developed a constructivist
theory. These constructivists described their radical view of psychological
constructivism and labeled themselves "radical constructivists" (Roblyer &
Edwards, 2000; Phillips, 2000). Among them, Glasersfeld was the representative
figure. The main idea of radical constructivism is that "human knowledge cannot
consist of accurate representation or faithful copying of an external reality, that is,
of a reality which is nonphenomenal, existing apart from the subject's experiences"
(McCarty & Schwandt, 2000, p. 43). Also, knowledge is in the heads of persons who
have no alternative but to construct the knowledge based on their subjective
experiences. There is no way to know if two persons' experiences are exactly the
same (Phillips, 2000). Radical constructivism holds the belief that individuals can
only really know their own constructions of reality. They can construct truth that
needs no corroboration from outside (Howe & Berv, 2000). Piaget had a great
influence on radical constructivism. From the constructivist perspective, learning is
also a process of assimilation and accommodation. The achievement of equilibration
helps to develop complex levels of learning. Learning usually occurs without any
formal instruction (McCarty & Schwandt, 2000).
Both social and psychological constructivists have had great influence on and
implications for education. They stress the importance of students being active
learners. Students construct knowledge themselves rather than passively receiving
knowledge from teachers. Students need the ability to solve real-life
practical problems rather than learning "inert knowledge" that they cannot use in
authentic situations. Also, students work in cooperative groups rather than
individually. For radical constructivists, a teacher is expected to know every
student's mental constructions and to establish an environment in
which experiential and conceptual differences can support learning. For social
constructivists, the teacher is expected to be a coordinator, facilitator, or resource
advisor in assisting students to adapt to various social environments (McCarty &
Schwandt, 2000).
Constructivism has had a significant influence in literature, art, social
science, religious education, and particularly in contemporary science and
mathematics education (Matthews, 2000; Phillips, 2000). There have been a
number of special issues in journals, such as Educational Studies in Mathematics,
Journal for Research in Mathematics Education, and Educational Research devoted
to constructivism (Matthews, 2000). The curriculum has also been influenced by
the constructivist theory. For example, the revised 1994 National Science Education
Standards illustrated that science is a mental representation constructed by the
individual (cited in Matthews, 2000). The influence has also spread to statistics
education as Moore (1997) has advocated.
Computer-Assisted Instruction in Statistics Education
When used in teaching statistics, computers are usually helpful in three
broad areas: (a) reducing the need for lengthy manual calculations, (b) facilitating
graphical data analysis, and (c) illustrating statistical concepts by means of
simulation experiments (Snell & Peterson, 1992). For example, when using
computational packages, such as SAS (Statistical Analysis System), SPSS
(Statistical Package for the Social Sciences), and Minitab, students can save time
on tedious computational work. However, a package sometimes hides useful
information necessary for understanding. Manual computations are still needed to
enhance the learning process and to reinforce the statistical concepts and techniques
(Khamis, 1991). Statistical graphics, such as histograms, boxplots, stem-and-leaf
diagrams, sampling distributions, and graphical presentations are helpful and
important for learning statistics (Krieger & James, 1992; Snell & Peterson, 1992).
Simulations are excellent tools in presenting some abstract concepts in dynamic and
interactive ways. For example, students can simulate the central limit theorem and
construct their understanding in the process with computer graphics (Krieger &
James, 1992). Simulations allow students to investigate phenomena in a simplified
and concrete setting (Barnet, 1999). Computers have been used in
teaching statistics for the past several decades. The history of computer-assisted
instruction is briefly described in the remainder of this section.
The use of computers in teaching statistics began as early as the 1960s.
Grubb and Selfridge (1964) developed a computer-based teaching machine using the
IBM 650 RAMAC System to tutor the students in learning statistics. They
mentioned that "a computer seems to be the only all-encompassing efficient tutorial
device in the growing teaching machine movement" (p. 20). Johnson (1965)
developed a program using the Michigan Algorithm Decoder to generate
quasi-normally distributed random numbers for teaching statistics. Cooley (1969)
used computers for laboratory exercises, generating random numbers, empirical and
theoretical distributions, Monte Carlo studies, and computing means. A report of
the Computer Science Conference emphasized that computers would alter the
curriculum of some fields, including statistics, in fundamental ways (Lockard,
1967).
In the early 1970s, more work was presented using computers in statistics
education. Most of these studies applied the techniques of graphical displays,
simulations, computational aids, drill-and-practice exercises, and tutorials on
mainframe computers (e.g., Duchastel, 1974; Edgar, 1973; Erickson & Jacobson, 1973; Lehman,
1972; Mead, 1974; Moore, 1974; Skavaril, 1974; Tanis, 1973; Thomas, 1971;
Wegman, 1974). In 1973, Minitab was developed for an introductory pre-calculus
statistics course offered at Pennsylvania State University. At that time, Minitab was
command driven and did not provide help or advice (Ryan, Joiner, & Ryan, 1976).
SAS, SPSS, and BMDP (Biomedical Computer Program) were also increasingly
used in analyzing data (Pollane & Schnittjer, 1977; Schnittjer, 1976; Wimberley,
1978). There were more studies in developing and applying a variety of statistical
computational programs (e.g., Cerny & Kaiser, 1978; Conard & Lutz, 1979; James,
1979; Milligan, 1979; Steiger, 1979; Sterrett & Karian, 1978; Thompson &
Frankiewicz, 1979). Some tutorials (Knief & Cunningham, 1976; Scalzo & Hughes,
1976) and simulations (Snyder, 1977) were used in assisting statistical instruction.
Computers were also used as a problem-solving tool to teach statistics (Tubb, 1977).
The IBM personal computer was introduced in 1981 and Apple Macintosh in
1984 (Kidwell & Ceruzzi, 1994). In the 1970s, the main statistical packages such
SPSS, SAS, BMDP and Minitab, were run on the DEC-11 minicomputer. In the
1980s, these packages all had personal computer versions (Evans & Newman, 1988).
The interactive mode on the personal computer allowed users to enter commands
one at a time in the process of executing a program. There were also more
computer statistical programs developed to assist statistical learning and teaching
(e.g., Bajgier, Atkinson, & Prybutok, 1989; Butler & Neudecker, 1989; Cake &
Hostetter, 1986; Collis, 1983; Dambolena, 1986; Emond, 1982; Furtuck, 1981;
Goodman, 1986; Gordon & Gordon, 1989; Mausner, Wolff, Evans, DeBoer, Gulkus,
D'Amore, et al., 1983; O'Keeffe & Klagge, 1986; Olson & Bozeman, 1988; Rogers,
1987; Stemmer & Berger, 1985; Stockburger, 1982; Ware & Chastain, 1989). Butler
and Eamon (1985) evaluated 17 microcomputer statistical packages and indicated
that the emphasis of these packages was usually not on analysis of research data but
helping students learn statistics concepts or procedures. In general, the
microcomputer packages were easier to learn and to use than were mainframe
packages. Couch and Stoloff (1989) conducted a national survey of microcomputer
use by academic psychologists and found that the most commonly used type of
software was the statistical package (31%), and the most valued courses for computer
use were research methods and statistics. In the early 1980s, the idea of creating
intelligent statistical software was presented (Hahn, 1985). The purpose was to embed
the knowledge of statistical experts in computer programs that provide guidance on
which analyses to conduct and how to interpret the results (Hahn, 1985). Pregibon
and Gale (1984) developed an expert system called REX to provide guidance,
interpretation, and instruction for doing regression analysis. Athey (1987) developed
a knowledge-based mentor system to assist statistical decision-making and to
stimulate students' learning of data analysis.
In the 1990s, as computer technology continued to be upgraded and
improved, the benefits and power of computers also attracted more development
and applications of computer programs and packages in teaching college-level
statistics. An increasing number of studies have been published in related journals
and publications. For example, tutorial programs were developed to demonstrate
statistical concepts (e.g., Mitchell & Jolley, 1999; Sedlmeier, 1997; Strube, 1991;
Strube & Goldstein, 1995). A large number of simulation programs were used in
presenting difficult and abstract statistical concepts (e.g., Albert, 1993; Bradley,
Hemstreet, & Ziegenhagen, 1992; Derry, Levin, & Schauble, 1995; Marasinghe &
Meeker, 1996; Perry &: Kader, 1995; Sterling & Gray, 1991). Computational
statistical packages such as SPSS, SAS, Minitab, and others were also frequently
used in assisting students in data analysis and interpretation (e.g., Christmann &
Badgett, 1997; Eamon, 1992; Gilligan, 1990; Gratz, Volpe, & Kind, 1993; High,
1998; Hollowell & Duch, 1991; Stephenson, 1990; Walsh, 1994; Wang, 1999). In
addition, an increasing number of applications of expert-system programs have been
used to support statistical decision-making and analysis (e.g., Marcoulides, 1990;
Sandals & Pyryt, 1992; White, 1995).
As multimedia technology was developed and used in different educational
settings, multimedia tools were also vigorously applied in teaching statistics
(e.g., Carpenter, 1993; Dorn, 1993; Gonzalez & Birch, 2000;
Hassebrock & Snyder, 1997; Koch & Gobell, 1999; Moore, 1993). Velleman and
Moore (1996) indicated that multimedia has had a dramatic influence on education
as well as statistics education. Multimedia offers a highly interactive and
individualized environment with text, sound, images, full-motion video, animations,
and computer graphics, in which students can manipulate animations, respond to
questions, and work independently on newly learned concepts. Finally, another
important development of computer technology is the prevalence of the Internet and
the fast growing applications on the World Wide Web. The Internet is popular
because it is widely available, easy to use, and highly visual and graphic (Roblyer &
Edwards, 2000). Naturally, statistics teachers have been trying to take advantage of
the Internet and build web-based computer-assisted tools for statistical teaching
(e.g., Aberson, Berger, Emerson, & Romero, 1997; Britt, Sellinger, & Stillerman,
2002; Brigg & Sheu, 1998; Lane, 1999; Leon & Parr, 2000; Malloy & Jensen, 2001;
Romero, Berger, Healy, & Aberson, 2000; West, Ogden, & Rossini, 1998).
This review of computer technology applied to statistics education
over the past 40 years shows that computers have played an important role in learning
and teaching statistics. Teachers, learning theorists, and computer specialists have
put great effort into developing a wide range of programs and packages. In
particular, during these recent years, the newer technologies used to create
Web-based collaborative programs, intelligent expert systems, simulations, and
multimedia tools are based on socio-cultural theories, constructivist theories, and
cognitive theories (Schacter & Fagnano, 1999). The questions are: Do computer
technologies really have effects on improving students' statistical achievement and
learning? What types of programs are the most effective ones? During these years,
some researchers have conducted experimental studies to evaluate the effectiveness
of different types of computer programs and methods (e.g., Aberson, Berger, Healy,
Kyle, & Romero, 2000; Athey, 1987; Christmann & Badgett, 1997; Earley, 2001;
Dorn, 1993; Gilligan, 1990; Gonzalez & Birch, 2000; Gratz, Volpe, & Kind, 1993;
High, 1998; Hollowell & Duch, 1991; Hurlburt, 2001; Kao & Lehman, 1997; Koch &:
Gobell, 1999; Jones, 1999; Lane, 2002; Lane & Tang, 2000; Marcoulides, 1990;
McBride, 1996; Myers, 1989; Olson & Bozeman, 1988; Porter & Riley, 1996;
Raymondo & Garrett, 1998; Rosen, Feeney, & Petty, 1994; Sterling & Gray, 1991;
Varnhagen & Zumbo, 1990; Wang, 1999; Ware & Chastain, 1989). There are
different results and conclusions from these empirical studies.
Research Synthesis Methods
In the social and behavioral sciences, a single experiment or a single study
can rarely provide definitive answers to research questions. In fact, conducting a few
studies may not even resolve a minor issue. After the accumulation and refinement
of a set of studies, literature reviews of empirical research are important to
summarize and clarify the research findings (Cooper & Hedges, 1994; Glass, 1978;
Hunter & Schmidt, 1990; Wolf, 1986). Methods of combining results across studies
have existed since the early 1900s (Cooper & Hedges, 1994; Olkin, 1990). For
example, in 1904 "Pearson took the average of estimates from five separate samples
of the correlation between inoculation for typhoid fever and mortality" (Cooper &
Hedges, 1994, p. 5). Some other early work for combining estimates include papers
by Tippett (1931), Birge (1932), Fisher (1932), Pearson (1933), Cochran (1937),
and Yates and Cochran (1938).
Traditional Review Methods
Prior to the late 1960s, primary studies on any specific education or
social science topic were still not common (Hunter & Schmidt, 1990).
Consequently, the traditional narrative review of the small number of studies was
satisfactory for synthesizing the results. Such reviews are usually described as "literary",
"qualitative", "nonquantitative", and "verbal" (Hunter & Schmidt, 1990, p. 468).
When there are few studies, the researcher uses the results of each study and
attempts to find explanations. When the number of studies is large, however, the
studies can no longer be compared one by one. The results usually become "pedestrian reviewing where
verbal synopses of studies are strung out in dizzying lists" (Glass, 1976, p. 4). In
addition, the researcher may use unrepresentative studies to simplify the integration
by excluding other studies that do not agree with the chosen ones. These
traditional narrative approaches have been criticized for potential problems
that include (a) the researcher's subjective view in selecting the studies, (b)
differential weighting in the interpretation, (c) misleading interpretations, (d) failure
to examine study characteristics as potential explanations for different or consistent
results across studies, and (e) failure to examine moderator variables (Wolf, 1986).
Another approach was the traditional "vote-counting" method. Light and
Smith (1971) were the first to propose a method for taking a vote of study results.
The researcher categorized the findings of the relationship between the independent
variable and the dependent variable of all relevant studies into three outcomes (i.e.,
positively significant, negatively significant, or no specific relationship in either
direction). The number of studies of each category was simply counted. The modal
category was used as the best estimate of the relationship between the independent
and dependent variables (Light & Smith, 1971).
Hedges and Olkin (1985) demonstrated the inadequacy of the traditional
vote-counting approach for detecting treatment effects as the number of primary
studies increases. Wang and Bushman (1999) summarized the problems of the
vote-counting approach. First, this approach does not incorporate sample size into
the vote. When sample size increases, the probability of obtaining a statistically
significant result increases. Second, this approach does not provide any effect size
estimate. Third, this approach has low power across the range of typical sample
sizes. For example, in Hunter and Schmidt's (1990) discussion of the correlation
between general intelligence and proficiency in clerical work, proficiency measures
cannot be obtained on all the applicants, because performance can be measured only
on those hired. This restriction of range in the sample lowers the effectiveness of
vote-counting methods.
Statistically Correct Vote-Counting Methods
As described above, the traditional vote-counting method is statistically
inadequate (Hedges & Olkin, 1985). There are methods of integrating research
results based on vote-counting that are statistically correct (Hunter & Schmidt,
1990). Hedges & Olkin (1980) proposed procedures to solve the statistical problems.
Vote-counting procedures are used when studies do not provide enough information
to calculate an estimate of effect size, but do contain information about the
direction or statistical significance.
One type of vote-counting method uses only significance levels. Basically, the
researcher counts the proportion of studies that report statistically
significant results and tests that proportion against the proportion expected under the
null hypothesis (Hunter & Schmidt, 1990). Another type of vote-counting method
can yield estimates of effect sizes if the sample sizes are known for all studies. The
effect size can be estimated from the proportion of positive results or from the
proportion of positive significant results. The detailed procedures and formulas can
be found in Bushman (1994), Hedges and Olkin (1980), and Wang and Bushman
(1999).
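To make the first type of vote-counting procedure concrete, the following minimal
sketch (a hypothetical illustration, not part of any primary study reviewed here)
counts statistically significant "votes" among k studies and computes an exact
binomial tail probability of observing at least that many under the joint null
hypothesis:

    # Sketch of a vote-counting test on significance levels (illustrative only).
    # Under the joint null hypothesis, each study reports a positive significant
    # result with some small probability p0 (e.g., .05 for a one-tailed test).
    from math import comb

    def vote_count_p(k_studies, k_significant, p0=0.05):
        # Exact binomial upper-tail probability of at least k_significant
        # "votes" out of k_studies when each occurs with probability p0.
        return sum(comb(k_studies, x) * p0**x * (1 - p0)**(k_studies - x)
                   for x in range(k_significant, k_studies + 1))

    # Example: 7 of 20 studies report significant positive results.
    print(vote_count_p(20, 7))  # very small p, so the joint null is rejected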
When synthesizing research studies, the researcher usually collects the studies
with needed information to calculate the effect sizes. However, it is not unusual to
find some studies without enough information. One method of dealing with this
problem is to omit these studies. Another method is to apply the vote counting
methods to estimate the effect sizes to avoid losing these studies in the count.
Meta-Analysis Methods
In 1952, Hans Eysenck argued that psychotherapy had no positive effects on
patients in clinical psychology (Eysenck, 1952) and started a strong debate. By the
mid-1970s, many studies of psychotherapy had produced positive, null, and negative
results. To assess Eysenck's argument, Smith and Glass (1977) integrated 375
psychotherapy studies by statistically standardizing and averaging treatment-control
differences. Glass (1976) coined the term "meta-analysis" to refer to "the
analysis of analyses" and "the statistical analysis of a large collection of analysis
results from individual studies for the purpose of integrating the findings" (p. 3). At
the same time when Glass was developing his meta-analysis method, several
applications of meta-analytical techniques also attracted the attention of the
contemporary social science research community to the importance of systematic
synthesis and evaluation across studies. Some included among these applications
were Schmidt and Hunter's (1977) validity generalization of employment tests,
Rosenthal and Rubin's (1978) integration of interpersonal expectancy effect, and
Glass and Smith's (1979) synthesis of the literature on class size and achievement.
The early meta-analytic research basically involved three types of procedures:
(a) summarizing relationships, (b) determining moderator variables, and (c)
establishing relationships by aggregate analysis (Rosenthal, 1991). The first type
estimated the average correlation or the combined p level associated with that
correlation for all the studies. The second procedure calculated a correlation
between some characteristic of the studies and an index of the effect size determined
in the primary studies. The third type of procedure correlated mean data
obtained from each study with other mean data or with other characteristics found
in each study. More recent work by meta-analysts has added to the variety of
approaches (Rosenthal, 1991).
An essential element of meta-analysis is the "effect size". Meta-analysis
represents each study's findings in the form of effect size. It is a common metric for
measuring "the degree to which the phenomenon is present in the population," or
"the degree to which the null hypothesis is false" (Cohen, 1988, pp. 9-10). The
purpose of using the effect size is to standardize the different findings in numerical
values which are interpretable in a consistent way across all the variables and
measures (Lipsey &: Wilson, 2001).
In recent years, the importance of effect size has been increasingly
emphasized in reporting experimental results in publications (Thompson, 1994,
2001). For example, the American Psychological Association (APA) Task Force on
Statistical Inference emphasized "Always provide some effect-size estimate when
reporting a p value", and at least 19 journals require effect size reporting
(Wilkinson & APA Task Force on Statistical Inference, 1999, p. 599). The fifth
edition of Publication Manual of the American Psychological Association (2001)
also includes "failure to report effect sizes" (p. 5) as a kind of defect in the design
and reporting of research. The advocacy of reporting effect sizes results from the
deficiency of statistical hypothesis testing in interpreting the research results
(Thompson, 1999). For a long time, statistical significance testing has
been criticized for (a) being overly dependent on sample size, (b) inviting
misinterpretation of p as the probability that the null hypothesis is false, (c) testing
assumptions rather than the research hypothesis, and (d) encouraging some
nonsensical comparisons (Anderson, Burnham, & Thompson, 2000; Cohen, 1994;
Hunter & Schmidt, 1997).
Schmidt (1996) strongly advocated that we must "abandon the statistical
significance test" and we must teach "point estimates of effect sizes and confidence
intervals around these point estimates. For analysis of data from multiple studies,
the appropriate method is meta-analysis" (Schmidt, 1996, p. 116).
Different measures of effect size have been developed over several decades.
Cohen (1988) describes several dimensionless entities that result in specific
experimental effect size statistics. Kirk (1996), Rosenthal (1994), and Snyder and
Lawson (1993) provided useful and practical summaries of these measures. The
variety of effect size measures can be categorized into two broad families: group
mean differences (the d family) and association strength (the r family) (Elmore & Rotou,
2001; Maxwell & Delaney, 1990; Rosenthal, 1994).
In 1969, Cohen proposed d, which is the difference between population means
divided by the average population standard deviation (Hedges & Olkin, 1985). In
1976, Glass proposed the metric Δ, which is defined as the mean difference between
the experimental group and the control group divided by the control group standard
deviation. Hedges (1981) presented another index of effect size g as the mean
difference between the experimental group and the control group divided by the
pooled standard deviation which is an approximately unbiased estimate of the
population standard deviation.
The Pearson product moment correlation r usually involves a finding that
deals with the strength of association between two variables. Rosenthal (1984)
presented r as the effect size index with the Binomial Effect Size Display (BESD)
and explained that the BESD is a way to show the practical importance of the
correlation index. "The correlation is shown to be the simple difference in outcome
rates between the experimental and control groups in a standard table in which the
column and row totals always add up to 100" (p. 242). The BESD can be
produced from any effect size r by computing the treatment condition success rate as
0.50 plus r/2 and the control condition success rate as 0.50 minus r/2. For example,
an r of .22 yields a treatment success rate of 0.50 + 0.22/2 = 0.61 and a control
success rate of 0.50 - 0.22/2 = 0.39.
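The arithmetic of the BESD can be stated in a few lines of code; the following
minimal sketch simply restates the computation described above:

    # Sketch of the Binomial Effect Size Display (BESD) computation.
    def besd(r):
        # Return the (treatment, control) success rates implied by r.
        return 0.50 + r / 2, 0.50 - r / 2

    print(besd(0.22))  # (0.61, 0.39), matching the worked example above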
There are other correlation indices, such as the correlation between two
dichotomous variables (the phi coefficient, φ), the correlation between one continuous
variable and one dichotomous variable (the point-biserial correlation, r_pb), and the
correlation between two ranked variables (Spearman's ρ). In addition, there are
squared effect size indices, such as r², ω², and η².
However, because directionality is lost when squaring indices of effect size and their
magnitudes are hard to interpret, researchers usually avoid using them in
meta-analysis (Rosenthal, 1994). Cohen (1988) also offered a variety of effect size
indices depending on the specific application. For example, the effect size q
represents the difference between correlation coefficients. The effect size g is
the distance of a population proportion from 0.50. The effect size h is the difference between
proportions.
When Glass proposed his meta-analysis methods in 1976, Hunter and Schmidt,
unaware of Glass's work, were developing their own meta-analysis methods for validity
generalization. They applied their methods to empirical data from personnel
selection research in the field of industrial psychology (Schmidt & Hunter, 1977).
The meta-analysis methods of Hunter and Schmidt emphasize effect sizes, as do
Glass's. Effect sizes in their methods are usually expressed as correlations. Unlike
Glass's meta-analysis, Hunter and Schmidt corrected the mean effect size by
"testing the hypothesis that the variance of observed effect sizes is entirely due to
various statistical artifacts" (Hunter & Schmidt, 1990, p. 484). These artifacts
include (a) sampling error, (b) error of measurement in the dependent and
independent variables, (c) range restriction in the independent variable, (d)
instrument validity, and (e) computational, transcription, and typing errors. Such
sources of error may decrease the obtained effect sizes.
With the development of meta-analytical methods, the integration of
research studies becomes objective, systematic, and scientific (Wolf, 1986). Through
appropriate use of statistical techniques, useful information can be obtained from
primary studies, and population parameters can be estimated by the objective and
accurate methods. Also, the relationship among the study characteristics can be
simultaneously examined. Researchers can explore possible moderator variables
when there is a weak or inconsistent relationship between the independent variable
and the dependent variable. That is, the interaction between the treatment and
studies can be effectively investigated. In addition, an analysis of outliers, which
may contribute to the heterogeneity of findings among studies, may allow
researchers to obtain more understanding of the topic of interest (Wolf, 1986).
Meta-analysis has received some criticism. Wolf (1986) summarized these
criticisms into four categories. The first criticism is related to the quality of the
studies. Poorly designed studies are generally included along with results from good
studies, which makes the results of meta-analysis hard to interpret. One way to
handle the problem is by coding the quality of the design of each study and
examining how the results differ for poor and good studies (Wolf, 1986).
The second criticism is the "apples and oranges" problem. Any synthesis of
results from multiple studies usually involves a combination of studies dissimilar in
some respects, such as measuring methods, variable definitions, populations, or
research design. The critics argue that it is not logical to draw conclusions by
combining studies that are operationalized differently or measured with different
metrics. Hall, Tickle-Degnen, Rosenthal, and Mosteller (1994) argued that "some
degree of mixing apples and oranges must occur in the tidiest of studies. Even when
studies are intended to be direct replication, exact replication probably cannot
occur." (p. 20). However, researchers need to be sensitive to the degree of the
dissimilarity. Wolf (1986) suggested that this problem can be examined by coding
the characteristics in each study and statistically testing if these differences are
related to the results of the meta-analysis.
The third criticism is the "file drawer" problem. Studies published in the
behavioral and social sciences are likely to be a "biased sample" of the actual
studies that are conducted (Rosenthal, 1991). Usually, published research is biased
in favor of significant findings, and nonsignificant findings are proportionally
published less. Although meta-analysts make efforts to obtain comprehensive and
representative studies, the samples of empirical primary studies are still likely to be
biased. Wolf (1986) mentioned an approach to review results in books, dissertations,
and unpublished papers presented at professional conferences and compare them to
the results of published studies. Begg (1994) also suggested attempting to track
down relevant unpublished studies on the topic by following up on published
abstracts and contacting investigators in the field. Cooper (1979) proposed
calculating an estimate of the number of unpublished studies with nonsignificant
findings that would be needed to reverse a significant result in a meta-analysis. This
number is known as the "Fail Safe Number" (Wolf, 1986). If the number is large,
the concern about publication bias may be reasonably reduced.
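As an illustration, the Fail Safe Number can be sketched from the standard normal
deviates (Z scores) of the k retrieved studies; the formula below is Rosenthal's, and
the Z values are hypothetical:

    # Sketch of the Fail Safe Number: how many unretrieved null-result studies
    # (averaging Z = 0) would be needed to pull a combined result down to bare
    # significance at p = .05, one-tailed (critical z = 1.645).
    def fail_safe_n(z_scores, z_crit=1.645):
        k = len(z_scores)
        return (sum(z_scores) ** 2) / (z_crit ** 2) - k

    print(fail_safe_n([2.1, 1.8, 2.5, 1.2, 2.9]))  # about 35.7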
Finally, the fourth criticism is the problem of including multiple effect sizes
from the same experimental study in one meta-analysis. Hunter and Schmidt (1990)
pointed out that the method of accumulating multiple effect sizes depends on the
nature of the research design of the study. In general, three kinds of replication are
considered: (a) fully replicated designs, (b) conceptual replication, and (c) analysis
of subgroups. First, a fully replicated design occurs when a study can be divided
into several parts that are conceptually equivalent but statistically independent. For
example, if data are collected at different sites, the outcomes from each site are
statistically independent and can be treated as different studies. Second, a design of
conceptual replication occurs "when more than one observation that is relevant to a
given relationship is made on each subject" (Hunter & Schmidt, 1990, p. 451). One
example is replicated measurement that uses multiple indicators to assess a variable.
In this design, each part can be calculated to yield a different result, and these
results can be accumulated within the study, or combined into
a single measure. Third, the subgroup design divides the sample into subgroups,
such as race or gender. Usually, the subgroup estimates can be used as independent
results. Hunter and Schmidt (1990) provided further discussion of the different
designs and appropriate methods for including multiple effect sizes in
meta-analysis. Although there are some problems associated with meta-analysis
methods, these methods have greatly clarified the confusion and have answered the
questions in many topics.
Meta-Analyses on Computer-Assisted Instruction
Since meta-analysis methods have been developed, many researchers have
applied these methods to examine the effectiveness of various computer-related
instruction methods in some aspects of learning for various disciplines and different
levels of education. Kulik (1994) summarized findings of 12 meta-analyses on
computer-based instruction conducted from 1978 to 1991 ranging from elementary
to adult education. The range of the obtained average effect sizes was between 0.22
and 0.57 (see Table 2.2). Some general conclusions were made from the findings of
these meta-analyses: (a) students usually learn more when receiving computer-based
instruction; (b) students learn faster with computer-based instruction; (c) students
like the classes more when receiving assistance on computers; (d) students develop
more positive attitudes toward computers; and (e) computers do not, however, have
positive effects in every area in which they are used (Kulik, 1994).
In the last decade, there have been a number of meta-analyses to investigate
the effect of computer-based instruction on students at different levels and in
various areas. Some meta-analysis studies are listed in Table 2.3. The results of
these studies show that the average effect sizes range between 0.127 and
0.51, which indicates that computer-assisted instruction has an overall moderate
effect on students' learning. The results of some meta-analyses also reveal
that CAI has different effects for different subject areas and different levels of
students. For example, Christmann and Badgett (1997) compared the academic
achievement of secondary students who received traditional instruction
supplemented with CAI and those who received only traditional instruction across
eight subject areas. Although the combined effect size of the primary studies is
0.209, the comparative effectiveness of CAI was very different among the eight
subject areas: science, 0.639; reading, 0.262; music, 0.23; special education, 0.214;
social studies, 0.205; math, 0.179; vocational education, -0.08; and English, -0.42.
Among the literature on applying meta-analytical methods to investigate the
effectiveness of CAI, there have been a handful of studies in mathematics education.
However, the subject of statistical learning and teaching has rarely been examined.
Only one meta-analysis has been found to address the effect of CAI in statistics
education. Christmann and Badgett (1999) integrated nine studies to examine the
comparative effectiveness of using various microcomputer-based software packages
on statistical achievement. The nine primary studies were conducted from 1987 to
1997. The computer statistical software packages include Minitab, SPSS/PC,
MyStat, TruStat, an expert mentoring system, a HyperCard problem-solving program,
and statistical exercises. There were 14 effect sizes produced from the nine primary
studies. The average effect sizes of these studies ranged from -0.555 to 0.708. The
overall mean effect size was calculated to be 0.256. The meta-analysis concluded
that the typical student moved from the 50th percentile to the 60th percentile when
exposed to microcomputer-based software. This study also categorized the software
into three types: CAI, problem solving, and statistical software packages. The mean
effect size was 0.929 for problem solving, 0.651 for CAI, and 0.043 for
statistical software packages (MyStat, Minitab, and SPSS). In addition, the study
correlated the effect sizes with the progressive time span in years and found that
there was no significant correlation. The correlation was -0.052 between mean effect
size and years. This meta-analysis urged that more continuing studies are needed to
examine the effectiveness or lack of effectiveness of computers in college-level
statistics education (Christmann & Badgett, 1999).
Table 2.2 Findings of 12 Meta-Analyses on Computer-Based Instruction Published
between 1978 and 1991

Meta-Analysis                              Instructional Level            No. of   Average
                                                                          Studies  Effect Size
Bangert-Drowns, Kulik, & Kulik (1985)      secondary                      51       0.25
Burns & Bozeman (1981)                     elementary and secondary       44       0.36
Cohen & Dacanay (1991)                     health professions education   38       0.46
Fletcher (1990)                            higher education & adult       28       0.50
Hartley (1978)                             elementary & secondary math    33       0.41
Kulik & Kulik (1986)                       college                        119      0.29
Kulik, Kulik, & Shwalb (1986)              adult education                30       0.38
Kulik, Kulik, & Bangert-Drowns (1985)      elementary                     44       0.40
Niemiec & Walberg (1985)                   elementary                     48       0.37
Roblyer (1988)                             elementary to adult education  82       0.31
Schmidt, Weinstein, Niemiec,
  & Walberg (1985)                         special education              18       0.57
Willett, Yamashita, & Anderson (1983)      precollege science             11       0.22

Note. Adapted from Kulik, 1994, p. 12.
Table 2.3 Findings of 12 Meta-Analyses on Computer-Based Instruction Published
between 1993 and 2000

Meta-Analysis                    Instructional Level      Content                  No. of   Average
                                                                                   Studies  ES
Bayraktar (2000)                 secondary & college      science education        42       0.273
Chadwick (1997)                  secondary                mathematics              41       0.51
Christmann (1995)                secondary                mixed                    24       0.233
Christmann & Badgett (1997)      secondary                eight curricular areas   42       0.209
Christmann & Badgett (1999)      college                  statistical achievement  9        0.256
Christmann & Badgett (2000)      college                  mixed                    18       0.127
Fletcher-Flinn & Gravatt (1995)  elementary to college    mixed                    120      0.24
Khalili & Shashaani (1994)       elementary to college    mixed                    36       0.38
Kuchler (1998)                   secondary                mathematics              61       0.32
Liao (1998)                      kindergarten to college  hypermedia               35       0.48
Ouyang (1993)                    elementary               mixed                    79       0.495
Susman (1998)                    elementary to college    cooperative learning     23       0.413
CHAPTER 3
METHOD
Cooper and Hedges (1994) divided the process of research synthesis into five
stages: (a) the problem formulation stage; (b) the collection stage: searching the
literature; (c) the data-evaluation stage: coding the literature; (d) the analysis and
interpretation stage; and (e) the public presentation stage. The purpose of this
meta-analysis was to integrate the individual primary research studies concerning
the use of computers to assist introductory statistics teaching at the college level.
The process of conducting this synthesis study included the following five tasks: (a)
determining and specifying the sampling criteria to select the primary studies to be
included in and excluded from the meta-analysis, (b) identifying the characteristic
variables which might be related to the effect of the outcomes, (c) coding these
data, (d) calculating individual results from these primary studies and analyzing
these outcomes by the appropriate characteristics, and (e) interpreting and
reporting the results of the analysis.
This chapter presents the meta-analysis method in four sections. First, the
research questions provided in Chapter 1 are restated. Second, the sampling criteria
and procedure used in this meta-analysis study are described. Third, the study
characteristics which might influence the outcome effects are determined and
described. And, fourth, the procedures of the statistical analysis are presented.
Research Questions
This meta-analysis seeks to examine the following questions:
1. How effective is the use of CAI in enhancing the statistical learning of college
students as compared with non-computer instructional techniques?
2. Does the effectiveness of CAI differ by the publication year of the study?
3. Does the effectiveness of CAI differ by the source of the study (dissertation,
journal article, or ERIC document)?
4. Does the effectiveness of CAI differ by students' level of education
(undergraduate or graduate)?
5. Which modes of CAI are the most effective for statistical
instruction for college students? Possible modes include drill-and-practice,
tutorials, multimedia, simulations, computational statistical programs, expert
systems, and web-based programs.
6. Does the effectiveness of CAI differ by the software type (commercial or
teacher-made)?
7. Does the effectiveness of CAI differ by the level of interactivity of the program
(interactive-PC, interactive-mainframe, or batch-mainframe)?
8. Does the effectiveness of CAI differ by the role of the program (supplement or
substitute)?
9. Does the effectiveness of CAI differ by the sample size of the participants?
Sampling Criteria and Procedure
Two sampling criteria were applied for selecting an adequate sample of
primary studies in this meta-analysis. The first required that a complete and
representative sample of primary studies which have addressed the questions of
interest be located and selected. The second required that the participants and
treatment in the primary studies represent the participant population and
treatment populations of interest (Hedges, 1986).
Studies selected in this meta-analysis were those that have investigated the
use of computers in assisting college level students in introductory statistics
instruction between 1985 and 2002 in the United States. College level includes
2-year and 4-year colleges and universities and both undergraduate and graduate
students. These studies must have reported sufficient descriptive and inferential
statistical data to be included in the meta-analysis. For example, studies need to
provide means, standard deviations, variances, t tests, or F tests to allow the calculation
of effect sizes.
The main sources for searching these primary studies were published journals
and books, as well as unpublished dissertations and conference papers. The major
computer databases used consist of Dissertation Abstracts International, the
Educational Resources Information Center (ERIC), and Psychological Abstracts
(PsycINFO). Descriptive search phrases were used to identify related materials,
including the combination of "computer-assisted instruction", "computer-based
instruction", "computer-based learning", "computer-based education",
"computer-enhanced instruction", "statistics education", "statistics teaching", and
"statistics learning". Manual literature searches were also used to examine some
relevant journals, such as the American Statistician, Behavioral Research Methods,
Instruments, & Computers, College Teaching, Computers in the Schools,
Educational Researcher, Journal of Educational Computing Research, Journal of
Educational Research, Journal of Research on Computing in Education, and Teaching of
Psychology. In addition, the
references in the primary studies and the relevant studies identified through
computer databases and manual searches were used as another source to make more
comprehensive searches.
Study Characteristics
One of the purposes of research synthesis is to integrate empirical research
for creating generalizations (Cooper & Hedges, 1994). Another important purpose is
to identify study characteristics that may be moderator variables that are
associated with effect magnitudes (Rosenthal, 1991). Inconsistent findings may
imply that some variables moderate the treatment effect. The selection of study
characteristics for inclusion in this study follows some guidelines (e.g., Hall,
Tickle-Degnen, Rosenthal, & Mosteller, 1994; Lipsey, 1994; Rosenthal, 1991) and
some studies regarding the use of computers to teach statistics (e.g., Christmann &
Badgett, 1997; Langdon, 1989; Liao, 1998; Roblyer & Edwards, 2000; Schram,
1996). The review sheet to record the statistical data and study characteristics is
shown in Appendix A. The reasons for selecting these characteristics are provided in
the following sections.
Publication Year
The first study characteristic was publication year. The primary studies
selected for this meta-analysis are from 1985 to 2002. The IBM
microcomputer was introduced in 1981 and gradually became popular. The quality of
computer programs for teaching statistics prior to 1980 was very poor (Langdon,
1989), and empirical research investigating the effect of computer use to teach
statistics was rare. With the rapid expansion of the microcomputer market, the
enhancement of software, and the development of diverse applications,
programming languages, and web applications, the quality of CAI in statistics
teaching may have changed since the time of the initial studies.
Publication Source
The second study characteristic was the source of the study. There are three
sources of the primary studies: published journal articles, ERIC documents, and
dissertations. Because unpublished manuscripts and studies could not be obtained,
the file drawer problem is investigated to examine the possibility of selection bias
(Wolf, 1986). The file drawer problem will be discussed in a later section of this
dissertation.
Educational Level of Participants
The third study characteristic was the educational level of participants in the
primary studies. Introductory statistics courses are a common element of the
curriculum for undergraduate and graduate students in colleges and universities.
The use of computers may have different effects on the two groups:
undergraduates and graduates.
Mode of CAI Program
The fourth study characteristic was the mode of computer-assisted
instruction. The major modes of computer-assisted instruction for statistics
instruction include drill and practice, tutorial, simulation/gaming, problem solving,
computation packages, hypermedia, and expert systems (Roblyer & Edwards, 2000).
The effectiveness of CAI may differ among these modes.
Type of CAI Program
The fifth study characteristic was the type of computer software used in
instruction. Over the years, there have been an increasing number of commercially
developed computer applications and programs designed to enhance student
learning of statistical concepts, to facilitate computational skills, to assist selecting
correct statistical analysis, and to present graphical statistical results. In addition,
there have been many computer programs specifically designed or developed by
instructors to meet different purposes. Consequently, software type (commercial or
teacher-made) was a promising variable for this study to examine.
Level of Interactivity of CAI Program
The sixth study characteristic was the interaction between the CAI program
and the students. Steinberg (1991) indicated that interaction is an important feature
of CAI, and one of the main functions of interaction is to foster learning. In most
CAI programs, interactions consist of a sequence of question-response-feedback.
However, "interaction is not synonymous with learning" (Steinberg, 1991, p. 100).
While most computer programs operated on microcomputers are in an interactive
mode, programs on mainframe computers are in both interactive and batch modes.
To investigate whether the effectiveness of CAI programs differs by the level of
interactivity of the program, this variable was included with three categories
(interactive PC, interactive mainframe, and batch mainframe).
Instructional Role of CAI Program
The seventh study characteristic was the role of the CAI program in teaching
statistics. There are two common roles of CAI. One is a supplement to traditional
instruction and the other is a substitute for traditional instruction. This variable
was included in this study.
Sample Size of Participants
The eighth study characteristic was the total sample size. Cohen (1988)
indicated that the sample size of the treatment group influences the reliability of
statistical tests. Therefore, this variable was used to determine if the effect of CAI
differs by the sample size.
Dependent Variable
The dependent variable investigated by this meta-analysis was students'
achievement in statistics. The outcomes include statistical concepts in diverse
topics, computational skills, problem-solving skills, and programming skills.
Statistical Analysis
Various meta-analytic methods, as presented by Cooper and Hedges (1994),
Glass, McGaw, and Smith (1981), Hedges and Olkin (1985), Hunter and Schmidt
(1990), Rosenthal (1991), and Wolf (1986), have been applied to integrate
primary studies with consideration of the specific statistical data reported in individual
studies. Different statistics or metrics (e.g., Glass's Δ, Hedges's g and d, Cohen's d,
q, g, and h, and the correlation r) have been used to calculate the effect size for each
primary study and for each result within each study. The purpose of this meta-analytic
study was to examine the effect of CAI in college-level statistics instruction
across studies, and Hedges and Olkin's (1985) methods provide an appropriate
framework. In this study, all statistics are converted to Hedges's d. This
effect size index allows understandable comparisons of the effectiveness
of different treatments among primary studies for answering the research questions.
Conceptualization of Effect Size
Hedges (1986) distinguished three fundamentally different means of
conceptualizing effect size. One is that effect size is an index of overlap between
distributions, which was emphasized by Glass (1976) and Glass, McGaw, and Smith
(1981). This conceptualization of effect size is illustrated in Figure 3.1, which has
been adapted from Glass, McGaw, and Smith (1981, p. 29).

[Figure 3.1. Graphical representation of effect size: two overlapping distributions
for a therapy group and a control group, with the therapy group mean lying 0.85
standard deviations above the control group mean, at the 80th percentile of the
control group.]
This figure can be explained as representing an effect size of .85 for a
treatment that improves the outcome of the therapy group by .85 standard deviations
relative to the control group. That is, the treatment raises the mean of the
therapy group to the 80th percentile of the control group. Hedges (1986)
emphasized that this conceptualization of effect size is important because "overlap
between distributions is a concept that has the same interpretation regardless of
whether the distributions are the distributions of measures of the same (or similar)
construct" (p. 366). The conceptualization is appropriate when combining effect
sizes from a broad range of outcome constructs.
The second way of conceptualizing effect size is as a scale-free index of
treatment effect, which means that "its value does not change under linear
transformations of the original observation" (Hedges, 1986, p. 367). The scale-free
characteristic of effect size is important because it does not depend on the particular
outcome measure used. Therefore, effect size is a way of placing treatment effect
from different studies on the same scale. Hedges (1986) also emphasized that the
interpretation of effect size depends on the outcomes of different studies which
measure the same construct. Effect size analyses can also be viewed as analogous
to pooling the raw data from all k studies into a single two-treatments-by-k-studies
analysis (Hedges, 1986).
The third way of conceptualizing effect size is as one of many equivalent
methods to express the magnitude of a relationship between variables (Hedges,
1986). Cohen (1988) presented many different effect size indices for various
conditions. Rosenthal (1984, 1991) developed transformations to convert effect sizes
to correlation coefficients. Rosenthal (1991) also demonstrated the Binomial Effect
Size Display (BESD), a contingency table to illustrate the magnitude of the
relationship for a given effect size. Expressed in BESD, a treatment effect which
looks small in an effect size or a correlation coefficient often appears to be much
larger.
Effect Size and Statistical Significance
In assessing the relationship between two variables, two parts must be
included. One is the estimate of the magnitude of the effect (the effect size). The
other is an indication of the reliability or accuracy of that estimate,
which is usually given by the significance test of the difference
between the observed and expected effect sizes under the null hypothesis involving the
two variables (Rosenthal, 1991). The general relationship between the effect size and
the test of significance can be expressed simply as

Test of significance = size of effect × size of study.
Rosenthal (1991) illustrated this relationship with some examples for independent
and for correlated observations.
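As one illustration of this relationship (a standard identity rather than one of
Rosenthal's own examples), the t test for a correlation r based on N observations
factors exactly into an effect term and a study-size term:

$$t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{N - 2},$$

so the same effect size r produces a larger t, and hence a smaller p value, as the
study size N grows.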
Definition and Calculation of Effect Size
Since an effect size refers to the strength of a relationship between the
treatment and the outcome variable, the effect size estimate is often expressed as
the magnitude of the difference between two group means in standardized terms as

$$\delta = \frac{\mu_E - \mu_C}{\sigma}, \qquad (3.1)$$

where $\mu_E$ and $\mu_C$ are the population means for the experimental and control
groups, respectively, and $\sigma$ is the population standard deviation.

In 1969, Cohen proposed the index d as an estimator of $\delta$. Cohen's d is
mathematically expressed as (Rosenthal, 1994, p. 237)

$$d = \frac{M_E - M_C}{\sigma_{\mathrm{pooled}}}, \qquad (3.2)$$

where $M_E$ and $M_C$ are the means of the experimental group and control group,
respectively, and $\sigma_{\mathrm{pooled}}$ is the population standard deviation estimator,
which is computed from the pooled sums of squares divided by N, the sum of the
sample sizes for the experimental and control groups in the study.
In 1976, Glass proposed the metric $\Delta$, which is defined as the mean difference
between the experimental group mean $M_E$ and the control group mean $M_C$
divided by the control group standard deviation $S_C$. This effect size index is
mathematically expressed (Rosenthal, 1994, p. 237) as

$$\Delta = \frac{M_E - M_C}{S_C}. \qquad (3.3)$$

When more than one experimental group is compared to a common control group,
or when the standard deviations of the experimental and control group populations
are surely different, Glass's $\Delta$ is appropriate (Wang & Bushman, 1999).
Hedges (1981) presented another index of effect size, g, defined as the mean
difference between the experimental group and the control group divided by the
pooled standard deviation $S_{\mathrm{pooled}}$, which is an approximately unbiased
estimate of the population standard deviation. Hedges and Olkin (1985) indicated
that it is often assumed that the population variances do not differ for the
experimental and control groups. Under the assumption of equal variance, a more
precise estimator of $\delta$ can be obtained by pooling the variances of the
experimental and control groups. Thus, Hedges's g is defined by (Rosenthal, 1994,
p. 237)

$$g = \frac{M_E - M_C}{S_{\mathrm{pooled}}}, \qquad (3.4)$$

with the pooled sample standard deviation (Hedges & Olkin, 1985, p. 79)

$$S_{\mathrm{pooled}} = \sqrt{\frac{(n_E - 1)S_E^2 + (n_C - 1)S_C^2}{n_E + n_C - 2}}, \qquad (3.5)$$

where $n_E$ and $n_C$ are the sample sizes of the experimental and control groups, and
$S_E$ and $S_C$ are the standard deviations of the experimental and control groups,
respectively. Under the equal variance assumption, Hedges's g or Cohen's d provides
a more precise estimator than Glass's $\Delta$. Hedges's g is generally preferred because
it has a smaller sampling variance than Cohen's d. However, using a pooled standard
deviation of the experimental and control groups under heterogeneity of variance will
lead to biased estimates of the effect sizes (Gleser & Olkin, 1994).
Hedges's g has been shown to be an overestimate of the population effect size
for small sample sizes, particularly samples with n less than 20 (Hedges, 1981). To
correct the small-sample bias, Hedges (1982) proposed a new estimator d that
corrects the biased g by a correction factor $c_m$. The unbiased estimator d is given by

$$d = c_m g = c_m \frac{M_E - M_C}{S_{\mathrm{pooled}}}. \qquad (3.6)$$

An exact and an approximate expression of the correction factor $c_m$ for the sample
bias are provided as

$$c_m = \frac{\Gamma(m/2)}{\sqrt{m/2}\;\Gamma\!\left(\frac{m-1}{2}\right)} \approx 1 - \frac{3}{4m - 1}, \qquad (3.7)$$

where $m = n_E + n_C - 2$, and $\Gamma(\cdot)$ is the gamma function.
In this meta-analysis study, Hedges's d, as in Equation 3.6, is the basic effect
size index used in the statistical analysis. A positive (negative) effect size indicates
that the experimental group mean is greater than (less than) the control group
mean.
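A minimal computational sketch of Equations 3.4 through 3.7 follows, using
hypothetical group summaries; the exact gamma-function correction factor is
computed alongside its approximation for comparison:

    # Sketch of Hedges's g (Eqs. 3.4-3.5) and the bias-corrected d (Eqs. 3.6-3.7).
    from math import sqrt, lgamma, exp

    def hedges_g(m_e, m_c, s_e, s_c, n_e, n_c):
        # Pooled standard deviation, Eq. 3.5
        s_pooled = sqrt(((n_e - 1) * s_e**2 + (n_c - 1) * s_c**2)
                        / (n_e + n_c - 2))
        return (m_e - m_c) / s_pooled

    def c_m(m):
        # Exact correction factor: Gamma(m/2) / (sqrt(m/2) * Gamma((m-1)/2))
        return exp(lgamma(m / 2) - lgamma((m - 1) / 2)) / sqrt(m / 2)

    # Hypothetical study: CAI group versus control group on a statistics exam.
    g = hedges_g(78.0, 72.0, 10.0, 11.0, n_e=15, n_c=15)
    m = 15 + 15 - 2
    d = c_m(m) * g                        # unbiased estimator, Eq. 3.6
    print(g, d, 1 - 3 / (4 * m - 1))      # approximate c_m for comparison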
When a parameter is estimated, the distribution of the estimator is
important. The distribution of Hedges's d is approximately normal with mean $\delta$
and variance $\sigma^2(d)$. The estimated variance $\hat{\sigma}^2(d)$ is given as
(Hedges & Olkin, 1985, p. 86)

$$\hat{\sigma}^2(d) = \frac{n_E + n_C}{n_E n_C} + \frac{d^2}{2(n_E + n_C)} \qquad (3.8)$$

when the sample sizes of both experimental and control groups are large. A
$100(1 - \alpha)\%$ confidence interval for the population parameter $\delta$ is given by
(Hedges & Olkin, 1985, p. 86)

$$d - z_{\alpha/2}\,\hat{\sigma}(d) \le \delta \le d + z_{\alpha/2}\,\hat{\sigma}(d), \qquad (3.9)$$

where $z_{\alpha/2}$ is the two-tailed critical value of the standard normal distribution.
Although calculating effect size is straightforward, some studies did not
report means and standard deviations. The most common alternative statistics are t
or F, along with the experimental and control group sample sizes. These inferential
statistics can be converted into Hedges's g through Equation 3.10 (Rosenthal, 1994,
p. 238):

$$g = t\sqrt{\frac{n_E + n_C}{n_E n_C}} = \sqrt{F}\sqrt{\frac{n_E + n_C}{n_E n_C}}, \qquad (3.10)$$

where $\sqrt{F}$ may be substituted for t when the F test has one degree of freedom in
the numerator.
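A brief sketch of the conversion in Equation 3.10, with hypothetical reported
statistics:

    # Sketch of Eq. 3.10: recovering Hedges's g from a reported t, or from a
    # one-degree-of-freedom F (in which case t = sqrt(F)).
    from math import sqrt

    def g_from_t(t, n_e, n_c):
        return t * sqrt((n_e + n_c) / (n_e * n_c))

    def g_from_f(f, n_e, n_c):
        return g_from_t(sqrt(f), n_e, n_c)  # assumes the F has df1 = 1

    print(g_from_t(2.4, 20, 22))   # hypothetical reported t test
    print(g_from_f(5.76, 20, 22))  # the same g from the equivalent F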
So far, the effect size estimates discussed here index the magnitude of effect by the
standardized difference between independent experimental and control groups.
There are situations in which only a single group is in the experiment, i.e., a
repeated-measures design in which each observation receives pre- and
post-treatment assessments. Rosenthal (1991, 1994) presented the test of
significance t for correlated observations as

$$t = \frac{\bar{D}}{S_D} \times \sqrt{n}, \qquad (3.11)$$

where $\bar{D}/S_D$ is analogous to Hedges's g. Cruz and Sabers (1996) responded to
Ritter and Low (1996) and argued that effect sizes from between-groups and
repeated-measures designs should not be combined without appropriate adjustment
because the effect sizes are inherently different for the two designs. Rosenthal (1991)
listed examples of the relationship between tests of significance and effect size
estimates for independent and for correlated observations.
In addition to calculating a Hedges's effect size d for each primary study, a Hedges's weighted effect size d_w (Hedges & Olkin, 1985, p. 302) was also calculated for each study by

$$d_w = w\, d, \qquad (3.12)$$

where each effect size estimate is weighted by the inverse of its variance,

$$w = \frac{1}{\hat{\sigma}^2(d)}. \qquad (3.13)$$
Weighted effect sizes are calculated because the studies have different sample sizes, and experiments with larger sample sizes produce more precise estimates of the population effect size. When combining the effect sizes from individual primary studies, the effect size estimates of studies with large sample sizes should be given more weight (Hedges & Olkin, 1985).
Combination of Effect Sizes
There are several methods for combining the independent effects to estimate the population effect size. One method is simply to calculate an arithmetic mean (Glass, McGaw, & Smith, 1981). However, in a fixed-effects meta-analysis, a preferred strategy is to use unbiased weighted estimators for the population effect size. The pooled estimator of the population effect size across the k studies is mathematically represented by (Hedges & Olkin, 1985, p. 111)

$$d_+ = \frac{\sum_{i=1}^{k} w_i\, d_i}{\sum_{i=1}^{k} w_i}, \qquad (3.14)$$

with each effect size estimate d_i weighted by w_i, the reciprocal of its corresponding variance σ²(d_i). The variance of d_+ is (Hedges & Olkin, 1985, p. 113)

$$\hat{\sigma}^2(d_+) = \frac{1}{\sum_{i=1}^{k} w_i}, \qquad (3.15)$$

and the 100(1 − α)% confidence interval is

$$d_+ - z_{\alpha/2}\,\hat{\sigma}(d_+) \le \delta \le d_+ + z_{\alpha/2}\,\hat{\sigma}(d_+), \qquad (3.16)$$

where z_{α/2} is the two-sided critical value from the standard normal distribution.
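A minimal sketch of this fixed-effects combination, assuming each study contributes an effect size and its estimated variance (the arrays below are hypothetical):

```python
import math

def combine(d, var):
    """Inverse-variance weighted mean, its variance, and a 95% CI (Eqs. 3.14-3.16)."""
    w = [1 / v for v in var]  # w_i = 1 / var(d_i)
    d_plus = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return d_plus, (d_plus - 1.96 * se, d_plus + 1.96 * se)

d = [0.49, 0.36, 0.92, -0.10]    # hypothetical study effect sizes
var = [0.01, 0.02, 0.11, 0.05]   # their estimated variances
print(combine(d, var))
```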
In meta-analyses, effect size estimates from the primary studies are expected to be representative of a normally distributed population. The data should be examined before performing other statistical analyses. When the results differ greatly, it may not be appropriate to combine the results into a single effect size estimate. One way to investigate whether the distribution of effect size estimates is approximately normal is to use a normal quantile plot (Wang & Bushman, 1999). The quantiles of the observed distribution are plotted against the quantiles of the standard normal distribution. The points on the plot will be close to the line X = Y if the observed data have a standard normal distribution. If the data are not normally distributed, the data might be from different populations (Wang & Bushman, 1999).
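A quick way to produce such a plot is SciPy's probplot (the effect sizes below are hypothetical):

```python
import matplotlib.pyplot as plt
from scipy.stats import probplot

effect_sizes = [0.49, 0.36, 0.92, -0.10, 0.55, 0.24]  # hypothetical data
probplot(effect_sizes, dist="norm", plot=plt)  # normal quantile-quantile plot
plt.show()
```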
Hedges and Olkin (1985) proposed a homogeneity test for testing whether all studies can reasonably be described as sharing a common effect size before combining the estimates of effect size from individual primary studies. For a series of k primary studies, the statistical test for the homogeneity of effect sizes is a test of the hypothesis

$$H_0 : \delta_1 = \delta_2 = \cdots = \delta_k \qquad (3.17)$$

versus the alternative hypothesis that at least one of the effect sizes differs from the others. The test is based on the statistic

$$Q = \sum_{i=1}^{k} \frac{(d_i - d_+)^2}{\hat{\sigma}^2(d_i)}, \qquad (3.18)$$

where d_+ is the weighted estimator of effect size based on the sample estimates of δ and is given by Equation 3.14 (Hedges & Olkin, 1985, p. 111), and σ²(d_i) is given in Equation 3.8. The statistic Q is "the sum of squares of the d_i about the weighted mean d_+, where the ith square is weighted by the reciprocal of the estimated variance of d_i" (Hedges & Olkin, 1985, p. 123). The Q statistic is compared with the percentage points of the χ² distribution with k − 1 degrees of freedom. If the Q statistic exceeds the critical value of the χ² distribution at α = .05, then the hypothesis that the studies share a common effect size is rejected.
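A sketch of the homogeneity test of Equation 3.18, using SciPy's chi-square survival function for the p-value (the arrays are hypothetical):

```python
from scipy.stats import chi2

def q_statistic(d, var):
    # Q = sum of (d_i - d_plus)^2 / var(d_i), with d_plus the weighted mean
    w = [1 / v for v in var]
    d_plus = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    return sum((di - d_plus) ** 2 / vi for di, vi in zip(d, var))

d = [0.49, 0.36, 0.92, -0.10]
var = [0.01, 0.02, 0.11, 0.05]
q = q_statistic(d, var)
p = chi2.sf(q, df=len(d) - 1)  # compare Q with chi-square on k - 1 df
print(q, p)
```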
ANOVA Approach to Test the Moderating Effects of Categorical Study Characteristics

When the studies do not share a common effect size, a test for treatment-by-study interaction needs to be investigated. Sometimes it is also helpful to categorize the effect sizes into groups according to similarity. Then, one can test the moderating effect of a categorical study characteristic that had been overlooked previously. Hedges and Olkin (1985) proposed partitioning the Q statistic into two independent homogeneity statistics, Q_B and Q_W, and then conducting an analogous ANOVA comparison. The Q_B represents the between-group homogeneity statistic and the Q_W represents the within-group homogeneity statistic.
Assume that the studies are sorted into p groups and there are m_1, m_2, ..., m_p studies in the p groups. The Q_B statistic is used to test the null hypothesis that the average effect size does not differ across groups,

$$H_0 : \delta_{1+} = \delta_{2+} = \cdots = \delta_{p+}. \qquad (3.19)$$

The Q_B is essentially a weighted sum of squares of weighted group mean effect size estimates about the overall weighted mean effect size and is mathematically expressed by (Hedges & Olkin, 1985, p. 154)

$$Q_B = \sum_{i=1}^{p} w_{i+}\,(d_{i+} - d_{++})^2, \qquad (3.20)$$

where

$$w_{i+} = \sum_{j=1}^{m_i} w_{ij} \qquad (3.21)$$

with the weight w_ij = 1/σ²(d_ij),

$$d_{i+} = \frac{\sum_{j=1}^{m_i} w_{ij}\, d_{ij}}{\sum_{j=1}^{m_i} w_{ij}} \qquad (3.22)$$

is the weighted mean of the effect size estimates in the ith group, and

$$d_{++} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{m_i} w_{ij}\, d_{ij}}{\sum_{i=1}^{p} w_{i+}} \qquad (3.23)$$

is the grand weighted mean. The distribution of the Q_B statistic approximates a χ² distribution with p − 1 degrees of freedom. If Q_B exceeds the 100(1 − α)% critical value of the χ² distribution with p − 1 degrees of freedom, the hypothesis that the mean group effect sizes from the p groups are equal is rejected.
The Q_W statistic is used to test the hypothesis that effect sizes are homogeneous within groups of the studies,

$$H_0 : \delta_{i1} = \delta_{i2} = \cdots = \delta_{im_i} = \delta_{i+}, \quad i = 1, \ldots, p. \qquad (3.24)$$

The sum of the homogeneity statistics is calculated over each of the p groups and is mathematically expressed as (Hedges & Olkin, 1985, p. 155)

$$Q_W = \sum_{i=1}^{p} Q_{W_i} = \sum_{i=1}^{p}\sum_{j=1}^{m_i} w_{ij}\,(d_{ij} - d_{i+})^2. \qquad (3.25)$$

The Q_W has an approximate χ² distribution with Σ_{i=1}^{p}(m_i − 1) degrees of freedom. If Q_W is greater than the critical value, the null hypothesis that effect sizes within each of the p groups are equal is rejected.
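The following sketch partitions Q into its between- and within-group components for grouped studies (the grouping and values are hypothetical):

```python
from scipy.stats import chi2

def partition_q(groups):
    """groups: list of (effect sizes, variances) per group; returns (Q_B, Q_W)."""
    group_stats = []   # (w_{i+}, d_{i+}) for each group
    q_w, total_w, total_wd = 0.0, 0.0, 0.0
    for d, var in groups:
        w = [1 / v for v in var]
        d_i = sum(wi * di for wi, di in zip(w, d)) / sum(w)  # group mean d_{i+}
        q_w += sum(wi * (di - d_i) ** 2 for wi, di in zip(w, d))
        group_stats.append((sum(w), d_i))
        total_w += sum(w)
        total_wd += sum(wi * di for wi, di in zip(w, d))
    d_pp = total_wd / total_w                                # grand mean d_{++}
    q_b = sum(w_i * (d_i - d_pp) ** 2 for w_i, d_i in group_stats)
    return q_b, q_w

groups = [([0.5, 0.6], [0.02, 0.03]), ([0.1, 0.2, 0.3], [0.02, 0.04, 0.05])]
q_b, q_w = partition_q(groups)
print(q_b, chi2.sf(q_b, df=len(groups) - 1))  # between-groups test on p - 1 df
```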
Hedges and Olkin (1985) presented a graphical method that plots the effect size estimates with their corresponding confidence intervals on a set of horizontal lines for identifying deviant effect sizes. This plot provides a simple visual representation of the individual effect sizes and the amount of variation for each study. A useful technique is to sort the studies into groups based on the proposed study characteristics and to rank order the studies by the effect size estimate. The plots also display whether a confidence interval includes the value zero and how some effect sizes deviate from the others (Hedges & Olkin, 1985). This type of plot is frequently called a forest plot or tree plot in medical research (Egger, Smith, & Altman, 2001; Lewis & Clarke, 2001).
Comparisons Among Groups

The results obtained from the between-groups homogeneity test reveal whether the mean effects are equal among the groups. However, when there are more than two groups to be compared, the Q_B statistic gives no insight about which groups are associated with the largest effect size. If a priori knowledge or a significant value of Q_B leads to the conclusion that the effect sizes are not the same among groups, the methods of contrasts or comparisons can be used to explore the differences among group means. Such comparisons are analogous to contrasts in ANOVA (Hedges & Olkin, 1985).

A contrast γ is defined as a linear combination of the population mean effects δ_{i+} in which the coefficients c_i sum to zero:

$$\gamma = \sum_{i=1}^{p} c_i\, \delta_{i+}, \qquad (3.26)$$

where Σ c_i = 0. The contrast coefficients are chosen to reflect a comparison of interest. The contrast γ is estimated by a linear combination of sample effect size means,

$$\hat{\gamma} = \sum_{i=1}^{p} c_i\, d_{i+}, \qquad (3.27)$$

where d_{i+} is the weighted average effect size for the ith group. The estimated variance is

$$\hat{\sigma}^2(\hat{\gamma}) = \sum_{i=1}^{p} c_i^2\, \hat{\sigma}^2(d_{i+}), \qquad (3.28)$$

and the 100(1 − α)% confidence interval is

$$\hat{\gamma} - z_{\alpha/2}\,\hat{\sigma}(\hat{\gamma}) \le \gamma \le \hat{\gamma} + z_{\alpha/2}\,\hat{\sigma}(\hat{\gamma}), \qquad (3.29)$$

where z_{α/2} is the two-sided critical value from the standard normal distribution (Hedges & Olkin, 1985, p. 159).
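A sketch of such a contrast, assuming each group supplies its weighted mean effect size and standard error (the three-group comparison below is hypothetical):

```python
import math

def contrast(coefs, means, ses):
    """Estimate gamma, and a 95% CI for it (Eqs. 3.27-3.29)."""
    gamma = sum(c * m for c, m in zip(coefs, means))
    var = sum(c**2 * se**2 for c, se in zip(coefs, ses))
    half = 1.96 * math.sqrt(var)
    return gamma, (gamma - half, gamma + half)

# Hypothetical comparison: group 1 versus the average of groups 2 and 3
coefs = [1.0, -0.5, -0.5]
means = [0.74, 0.12, 0.23]
ses = [0.21, 0.08, 0.11]
print(contrast(coefs, means, ses))
```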
Regression Approach to Test Moderating Effects of Continuous Study Characteristics

A regression approach can be used to model the relation between continuous study characteristics and estimates of effect size for testing moderating effects (Hedges, 1994; Wang & Bushman, 1999). Suppose that d_1, ..., d_k are k estimates of effect size with estimated variances σ²(d_1), ..., σ²(d_k). The regression model can be stated as follows:

$$\delta_i = \beta_0 + \sum_{j=1}^{p} \beta_j\, x_{ij} + \varepsilon_i, \qquad (3.30)$$

where i = 1, ..., k, δ_i is the ith population effect size, x_{i1}, ..., x_{ip} are p study characteristics, β_0, β_1, ..., β_p are regression parameters, and the ε_i are independent random error terms with normal distribution N(0, σ_i²). The weighted least squares estimates of the regression parameters β_0, β_1, ..., β_p are b_0, b_1, ..., b_p, where the weight for the effect-size estimate d_i is defined as the reciprocal of its variance as in Equation 3.13. The 100(1 − α)% confidence interval for the regression coefficient β_j is given by

$$b_j - z_{\alpha/2}\,\frac{\sigma(b_j)}{\sqrt{MSE}} \le \beta_j \le b_j + z_{\alpha/2}\,\frac{\sigma(b_j)}{\sqrt{MSE}}, \qquad (3.31)$$

where σ(b_j) is the standard error of b_j, MSE is the error mean square for the regression model, and z_{α/2} is the two-sided critical value from the standard normal distribution (Hedges, 1994; Wang & Bushman, 1999).
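A minimal weighted least squares sketch using NumPy (the design matrix and data are hypothetical; the package-style standard errors are rescaled by √MSE as in Equation 3.31):

```python
import numpy as np

d = np.array([0.55, 0.47, 0.24, 0.43])        # hypothetical effect sizes
year = np.array([1987., 1992., 1997., 2001.]) # hypothetical study characteristic
var = np.array([0.017, 0.007, 0.010, 0.002])  # estimated variances of the d_i

w = 1.0 / var                                  # weights as in Equation 3.13
X = np.column_stack([np.ones_like(year), year])
XtW = X.T * w
b = np.linalg.solve(XtW @ X, XtW @ d)          # weighted least squares estimates
resid = d - X @ b
mse = (w * resid**2).sum() / (len(d) - X.shape[1])
se_pkg = np.sqrt(mse * np.diag(np.linalg.inv(XtW @ X)))  # package-reported SEs
se = se_pkg / np.sqrt(mse)                     # corrected as in Equation 3.31
print(b[1], b[1] - 1.96 * se[1], b[1] + 1.96 * se[1])    # slope and its 95% CI
```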
File Drawer Problem
One source of publication bias is the "file drawer problem," which affects the results of a meta-analysis. The term refers to the fact that studies with statistically significant results are more likely to be published than those with nonsignificant results (Begg, 1994; Rosenthal, 1991). An extreme case of this problem would be if the publications were filled with the five percent of studies that show Type I errors while the file drawers of researchers were filled with the 95 percent of studies that show nonsignificant results, in situations where no population effect exists (Rosenthal, 1991). The primary studies of a meta-analysis are generally more likely to be retrieved from published than unpublished materials.
One way to detect possible publication bias and to investigate whether all the studies come from a single population is to use a "funnel graph," a plot of sample size versus the effect sizes from the individual primary studies (Begg, 1994). With effect size graphed on the horizontal axis and sample size graphed on the vertical axis, the plot should be shaped like a symmetrical funnel with the spout pointing up if there is no selection bias. If the plot is skewed or shows holes in its center, selection bias is suspected. The funnel plot is based on the statistical principle that sampling error decreases as sample size increases (Wang & Bushman, 1999). However, funnel plots have limited use when the number of studies included in a meta-analysis is very small, because it is then difficult to determine whether the plot is shaped like a funnel.
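A quick matplotlib sketch of such a plot (the effect sizes and sample sizes are hypothetical):

```python
import matplotlib.pyplot as plt

effect = [0.49, 0.36, 0.92, -0.10, 0.55, 0.24]  # hypothetical effect sizes
n = [1100, 900, 40, 60, 35, 120]                # hypothetical total sample sizes

plt.scatter(effect, n)
plt.xlabel("Effect Size Estimate")
plt.ylabel("Total Sample Size")
plt.title("Funnel plot")
plt.show()
```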
One technique for handling the file drawer problem is to calculate a "fail safe" number, which estimates the number of unpublished primary studies with null results on tests of significance that would be needed to raise the overall probability of a Type I error to any desired level of significance (Hedges & Olkin, 1985; Wolf, 1986). This study computes the "fail safe" number proposed by Orwin (1983), expressed in Equation 3.32, to examine whether the number of unpublished studies threatens the overall results of this meta-analysis:

$$N_{fs} = \frac{N(\bar{d} - d_c)}{d_c}, \qquad (3.32)$$

where N is the number of studies in the meta-analysis, d̄ is the average effect size for the studies synthesized, and d_c is the criterion value that d̄ would equal when some knowable number of hypothetical studies (N_fs) were added to the meta-analysis (Wolf, 1986, p. 39). Cohen (1988) suggests d = 0.2 (small effect), d = 0.5 (medium effect), and d = 0.8 (large effect). For this study, the small effect size d_c = 0.2 is used as the criterion value to compute the "fail safe" number.
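The computation itself is a one-liner; the sketch below uses the values reported later in Chapter 4:

```python
def fail_safe_n(n_studies, mean_d, d_criterion):
    # Orwin's (1983) fail safe number (Eq. 3.32)
    return n_studies * (mean_d - d_criterion) / d_criterion

print(fail_safe_n(25, 0.43, 0.20))  # 28.75, i.e., about 29 additional studies
```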
CHAPTER 4
ANALYSIS AND RESULTS
This chapter presents the analysis and the results for this meta-analysis. Five
sections are included. The first section describes the process and selection of the
primary studies. The second section describes the process of reviewing and coding
the primary data. The third section examines the possibility of selection bias. The
fourth section presents the overall effect size estimator, the methods to handle the
dependency of effect sizes, and computation of the fail safe number for the file
drawer problem. Finally, the fifth section presents the findings of analyzing the
study characteristics for answering the research questions.
Primary Study Selection
Through an intensive and exhaustive search of the electronic databases ERIC and PsycINFO, as well as a manual review of the literature and study references, 25 experimental studies conducted between 1985 and 2002 regarding the use of computer-assisted instruction (CAI) to teach college-level introductory statistics courses in the United States met the criteria and were selected for this meta-analysis. These 25 studies with primary data are listed in Table B.1 in Appendix B.
In addition, there were five experimental studies that did not provide
complete quantitative information for meta-analysis but concluded the effect of the
use of CAI in teaching statistics to be significant or non-significant. These studies
were Lane and Tang (2000), Mausner et al. (1983), Stephenson (1990), Stockburger
(1982), and Varnhagen and Zumbo (1990). Three of these five studies reported
significant differences between the computer group and the group using the traditional teaching method. These studies are briefly described below.

Lane and Tang (2000) examined the effectiveness of training with a simulation program on the transfer of statistical concepts. This study compared computer and traditional groups and concluded that the difference was significant with F(1, 107) = 5.84, p = 0.017, without reporting the subject numbers of the two groups. Mausner et al. (1983) developed a simulation computer program based on the DELTA project at the University of Delaware. This program contained 16 instructional units of descriptive and inferential statistics and was run on a mainframe. This study compared the test results of students in the computer-based group with a workbook-based group and concluded that the difference was significant with t = 9.8 and df = 46. The sample sizes of the two
groups were not reported. Stephenson (1990) conducted an experimental study to
investigate student reaction to the use of Minitab on the mainframe in an
introductory statistics course. There were 22 students in the control group and 23
students in the computer group. The study concluded that the difference between
exam scores for students in the computer group and the control group was not
statistically significant without reporting the group means or significance test
results. Stockburger (1982) evaluated three simulation exercises in an introductory
statistics course. The study concluded that the difference between the treatment
group and the control group was significant for two of the simulation exercises, with F(1,47) = 11.692 and F(1,46) = 21.254, respectively. However, the sample sizes of the two groups were not reported. Varnhagen and Zumbo (1990) evaluated two CAI
programs run on PLATO systems and their relationship with attitudes and
performance in statistics instruction. The study investigated the relationship by
path analysis. There were one control group with 49 students and two experimental
groups with 41 and 44 students, respectively. The study measured the performance
and mentioned that there were no significant differences among the groups without
reporting detailed statistics. Since these five studies did not provide adequate data and did not meet the criteria, they were not included in this meta-analysis.
Reviewing and Coding the Primary Data
After the 25 primary studies were selected for this meta-analysis, a review sheet, shown in Appendix A, was prepared. Detailed information from each study was recorded on the review sheet. The process of verifying the correctness of the data was repeated three times. The primary data and the study characteristics were then typed and coded as a text data file. SAS was used to analyze the data and plot the graphs.
Examination for Selection Bias
This meta-analysis selected the studies that met the criteria and provided
adequate quantitative data for statistical analysis. Since only 25 primary studies were included, the estimates of the population effect size based on these studies may be biased. In order to examine possible selection bias, a funnel plot was constructed (Figure 4.1).
The funnel plot is a scatterplot where the sample sizes of the primary studies
are graphed against their respective effect size estimates (Wang & Bushman, 1999).
The funnel plot is based on the principle that sampling error decreases as sample
size increases. The standard deviations of the effect sizes obtained from these
primary studies are listed according to the sample sizes of the studies in Table B.4
in Appendix B. This table shows that as the sample size increases the standard
deviation decreases. If the sampled studies come from a single population, the plot
would look like a funnel with the width of the funnel becoming smaller as the
sample size increases. The funnel plot in Figure 4.1 shows more data on the bottom
part because there are three studies with extremely large sample sizes. There are
also more studies with positive effect sizes than with negative effect sizes. The center of the funnel falls on a value greater than zero. In addition, the effect sizes of the three large-sample studies are 0.49, 0.49, and 0.36, respectively. They are close to the value of the population estimator, 0.43 (which is presented in the following section). In this funnel plot, there is a small bite on the left side of the bottom, which shows that fewer small-sample studies with negative effects were available for this meta-analysis. The funnel plot suggests that a slight selection bias may exist in the sample of primary studies.

Figure 4.1 Funnel plot (total sample size on the vertical axis versus effect size estimate on the horizontal axis).
Estimate of Overall Effect Size
The overall corrected unbiased effect size d for the estimation of the population effect size δ is 0.43, obtained by combining the 31 effect sizes from the 25 primary studies. This result indicates that the use of computers to assist in teaching statistics to college students raises the mean of the experimental group to the 67th percentile of the control group. The lower bound of the 95% confidence interval is 0.37 and the upper bound is 0.49. The confidence interval does not include the value zero, indicating that the overall effect size estimate is significantly different from zero. The effect size of 0.43 reaches Cohen's (1988) criterion value for a medium effect size. This estimate of the overall effect size suggests that computer-assisted instruction has a small to moderate positive effect on teaching statistics at the college level.
Prior to further statistical analysis, the effect size estimates should be examined. The range of the effect sizes is from -0.267 to 1.77, with a median value of 0.52. Their individual standard errors and 95% confidence intervals are reported in Table B.4 in Appendix B and are displayed in the forest plots by groups of study characteristics in Appendix C. Although the combined effect size indicates a small to moderate positive effect for CAI in statistics education, 19 of the 31 effect sizes have confidence intervals that contain zero (see Table B.4 in Appendix B). The histogram of the effect sizes, shown in Figure 4.2, indicates an approximately normal shape.

Figure 4.2 Histogram of effect sizes (percent versus effect size; fitted normal curve with Mu = 0.5141, Sigma = 0.4687).
A normal quantile plot can also be used to examine the normality of the distribution of the effect sizes (Wang & Bushman, 1999). The normal quantile plot compares an observed distribution against the quantiles of the standard normal distribution. If the observed data have a normal distribution, the points on the plot will be close to the line X = Y. The normal quantile plot is shown in Figure 4.3.
The points on the plot show an approximate straight line, which indicates that the
data are distributed normally. In addition, statistical tests were performed to
examine the normality of the effect sizes. The Anderson-Darling normality test (p = 0.35) and the Shapiro-Wilk test (p = 0.96) retain the hypothesis that the distribution of the effect sizes is sufficiently normal.

Figure 4.3 Normal quantile plot (effect sizes versus normal quantiles; normal line with Mu = 0.5141, Sigma = 0.4687).
Dependence of Effect Sizes
The problem of dependence of effect sizes occurs in four ways (Hedges, 1986): first, when multiple effect sizes are calculated from different measures on the same subjects; second, when several experimental groups are compared with one control group in a study; third, when several different samples in the same study are used to calculate several effect size estimates; and fourth, when the same
researchers or investigators conduct a series of studies and generate several related
effect sizes from these studies.
In this meta-analysis, for the first situation of dependency, six of the 25 primary studies calculated more than one effect size from different measures on the same subjects. These studies are Gratz and Kind (1993), Koch and Gobell (1999), Myers (1996), Wang (1999), Ware and Chastain (1989), and White (1986). Hedges (1986) suggested taking the median of the multiple effect sizes to avoid correlation among effect sizes, and this approach was used to obtain one effect size from each of the above six studies.
For the second situation of dependence concerning multiple treatment groups
compared with the same control group, three studies have that problem. Gonzalez
and Birch (2000) applied two types of CAI (computational statistical program and tutorial) as two experimental groups to compare with one traditional group. Lane
and Aleksic (2002) applied a simulation program to three groups of students in
three consecutive semesters and compared the effect with the same control group.
And, Marcoulides (1990) implemented two types of programs (expert systems and
simulation) to two groups and compared the effect with the same control group.
Since there are only a few such correlated estimates, the dependence can be
cautiously ignored (Hedges, 1986).
The third type of dependence concerns several different samples used in the same study. In this meta-analysis, Athey (1987) and Dorn (1993) applied
computer programs to different groups of samples and generated more than one
effect size. Because there are only two such studies, the dependence can be
cautiously ignored. As to the fourth type of dependence, there are no studies
conducted by the same researcher selected in this meta-analysis.
Fail Safe Number
The file drawer problem is handled by calculating the fail safe number for this meta-analysis. The fail safe number (N_fs) estimates how many additional primary studies with null results would have to be included in this meta-analysis to overturn its result. Using Equation 3.32, the criterion value of 0.20 (Cohen's definition of a small effect size), and the calculated population effect size estimator (0.43), the fail safe number is 25 × (0.43 − 0.2)/0.2 = 28.75; that is, at least 29 additional studies with null results would be needed to decrease the overall effect size estimator to 0.20 or less. Over the period from 1985 to 2002, there might possibly have been 29 unpublished studies with non-significant results on using computers in teaching statistics. Therefore, the results obtained from the small sample of studies in this meta-analysis need to be interpreted with caution.
Primary Study Characteristics
The population effect size estimator calculated from the 31 effect sizes of the 25 primary studies indicates the magnitude of the overall effect of using CAI in statistics education in this meta-analysis. Some study characteristics might have contributed to or moderated this effect. The following sections investigate eight study characteristics: the publication year, the publication source, the educational level of participants, the mode of the CAI program, the type of the CAI program, the level of interactivity of the CAI program, the instructional role of the CAI program, and the total sample size.
Publication Year
Computer technology has undergone tremendous change and development over the past twenty years (Wurster, 2001). Does the effectiveness of CAI in statistics education differ by publication year? Since this is a continuous variable, a weighted regression analysis was used to study its relationship with the estimates of effect size. The scatterplot of the effect sizes versus publication year with a regression line is presented in Figure 4.4. The regression weights for the intercept and year are 13.86752 and -0.00672, respectively. The equation for the regression line is

$$\hat{d} = 13.86752 - 0.00672 \times \mathrm{year}. \qquad (4.1)$$
The standard errors of the intercept and publication year are 20.81896 and 0.01042, respectively. The mean square error from the ANOVA for the regression is 2.36297. The 100(1 − α)% confidence interval for the weight of publication year is given by

$$-0.00672 - 1.96 \times \frac{0.01042}{\sqrt{2.36297}} \le \beta \le -0.00672 + 1.96 \times \frac{0.01042}{\sqrt{2.36297}}, \qquad (4.2)$$
or

$$-0.02 \le \beta \le 0.0066. \qquad (4.3)$$

The confidence interval includes the value zero, indicating that the weight of publication year is not significantly different from zero. That is, the effect sizes do not change as the publication year changes.

Figure 4.4 Regression of effect sizes on publication year (effect size versus publication year, with fitted regression line).
The Q statistic proposed by Hedges and Olkin (1985) was calculated to test the homogeneity of the effect sizes of the primary studies. The Q statistic, calculated using Equation 3.18, was 69.511 with 30 degrees of freedom. This value exceeds the critical value (43.773) of the χ² distribution with 30 degrees of freedom at α = .05, so the hypothesis of homogeneity of the effect sizes was rejected. Hedges and Olkin (1985) proposed a method analogous to ANOVA that decomposes the total Q statistic into the Q_B statistic for between groups and the Q_W statistic for within groups.

Table 4.1 Statistics of Study Effect Sizes by Year

Group       N   d+        σ(d+)     lower     upper
1985-1989   7   0.55834   0.12875   0.30599   0.81069
1990-1994   9   0.47382   0.08217   0.31277   0.63486
1995-1999   8   0.24495   0.10120   0.04660   0.44331
2000-2002   7   0.43366   0.04022   0.35483   0.51249
For examining the between and within Q statistics of the effect sizes by publication year, the effect sizes of the studies were divided into four groups: 1985-1989, 1990-1994, 1995-1999, and 2000-2002. During the 1980s, microcomputers became increasingly popular and were more frequently used in teaching. In the early 1990s, the Internet and multimedia appeared and were applied in teaching. Since 1995, the World Wide Web has been widely used in various aspects of teaching. With the different stages of development of computer technology and tools, the effects of using computers in statistics teaching may have differed across the four periods of time. Table 4.1 summarizes the group data from Figure C.1 in Appendix C, presenting the number of studies in each group, the weighted means, standard errors, and 95% confidence intervals for the four groups. Figure C.1 presents the forest plots for the effect sizes of the studies grouped by publication year. For the two groups of 1985-1989 and 1990-1994, the weighted means of the effect sizes of most studies are greater than zero. From the regression plot, one study (McBride, 1996) appears to have an extremely large effect size compared to the rest. In order to examine the influence of this study, a weighted regression was run excluding it. Since this study was a repeated-measures design with ten participants, the results of the weighted regression did not differ much and led to the same conclusion.
In order to test the differences among the groups, the Q statistics for the differences between groups and within groups were computed and are presented in Table 4.2. The result of Q_B is p(Q_B(3) = 4.544) = 0.208. The Q_W is 64.884 (p < 0.001), which shows significant variation within some of the four groups. The Q_W statistics for the groups of 1990-1994 and 1995-1999 indicate the sources of significant variation within those two groups.

Table 4.2 Q Statistics by Year

Source            df   Q stat    p-value
Between Groups     3    4.627    0.201
Within Groups     27   64.884    0.000
  1985-1989        6    8.809    0.185
  1990-1994        8   21.150    0.007
  1995-1999        7   24.474    0.001
  2000-2002        6   10.451    0.107
Corrected Total   30   69.511    0.000
Publication Source
In order to examine whether there is publication bias and whether the effect of using computers differs according to the source of publication, the source of publication was used as a variable for the analogous ANOVA. In general, journal articles are more likely to report significant results, whereas unpublished studies are more likely to include non-significant results. The primary studies in this meta-analysis were selected from three sources: dissertations, journals, and ERIC documents. Table 4.3 shows that there are 10 dissertations, 13 journal articles, and 8 ERIC documents. The means of the effect sizes are 0.58, 0.43, and 0.41, respectively. The standard errors for the three groups are 0.11, 0.07, and 0.04. Figure C.2 presents the forest plots for the effect sizes of the studies grouped by publication source.
Table 4.3 Statistics of Study Effect Sizes by Source

Group          N    d+        σ(d+)     lower     upper
Dissertation   10   0.58545   0.11091   0.36807   0.80283
Journal        13   0.43055   0.07463   0.28428   0.57681
ERIC            8   0.40843   0.03883   0.33233   0.48453

Table 4.4 Q Statistics by Source

Source            df   Q stat    p-value
Between Groups     2    2.270    0.321
Within Groups     28   67.240    0.000
  Dissertation     9    5.215    0.815
  Journal         12   39.772    0.000
  ERIC             7   22.253    0.002
Corrected Total   30   69.511    0.000

In Table 4.4, the Q_B is 2.270 (p = 0.321), indicating that the means of the three groups of study sources do not differ. The Q_W statistic for the dissertation group is 5.215 (p = 0.815), which shows that the 10 dissertation studies are homogeneous. The Q_W statistic for the 13 journal articles is 39.772 (p < 0.001), which shows that the effect sizes from the journal articles vary significantly. And the Q_W statistic for the ERIC documents is 22.253 (p = 0.002), which shows that the effect sizes of the studies from ERIC documents also have significant variation.
Educational Level of Participants
In colleges and universities, introductory statistics courses are generally offered to both undergraduate and graduate students. Among the studies selected for this meta-analysis, there are 23 effect sizes from studies in which the computer programs were used with undergraduate students, five effect sizes from studies with graduate students, and three effect sizes from studies with both undergraduate and graduate students. Table 4.5 presents the weighted means of the three groups: 0.43, 0.53, and 0.31. Figure C.3 presents the forest plots for the effect sizes of the
studies by the three groups.

Table 4.5 Statistics of Study Effect Sizes by Educational Level

Group           N    d+        σ(d+)     lower     upper
Undergraduate   23   0.43078   0.03432   0.36351   0.49804
Graduate         5   0.53147   0.18133   0.17607   0.88686
Mixed            3   0.31102   0.14957   0.01787   0.60417

Table 4.6 Q Statistics by Educational Level

Source            df   Q stat    p-value
Between Groups     2    0.944    0.624
Within Groups     28   68.567    0.000
  Undergraduate   22   50.455    0.001
  Graduate         4   10.167    0.038
  Mixed            2    7.946    0.019
Corrected Total   30   69.511    0.000
In Table 4.6, the Q_B is 0.944 (p = 0.624), which reveals that the magnitudes of the effect size estimates for the undergraduate, graduate, and mixed groups of students do not differ. The within-group heterogeneity test with Q_W = 68.567 indicates that the variation of the effect sizes is significant within the three groups of primary studies. The Q_W statistic is 50.455 (p = 0.001) for the undergraduate group, 10.167 (p = 0.038) for the graduate group, and 7.946 (p = 0.019) for the mixed group, respectively. The individual heterogeneity test results also show relatively heterogeneous effects within each group.
Mode of CAI Program
This variable examines the different modes in which CAI is used to teach
statistics. The seven specific modes are drill-and-practice, tutorials, computational programs, simulations, multimedia, Web-based programs, and expert systems.

Table 4.7 Statistics of Study Effect Sizes by Mode

Group            N    d+        σ(d+)     lower      upper
Drill             1   0.92296   0.33698    0.26250   1.58342
Tutorial          2   0.68748   0.20353    0.28857   1.08640
Computation      12   0.11950   0.07560   -0.02867   0.26767
Simulation        5   0.48417   0.04166    0.40251   0.56582
Multimedia        3   0.73997   0.20537    0.33744   1.14250
Web-based         4   0.23294   0.11092    0.01555   0.45033
Expert systems    4   0.98945   0.16867    0.65886   1.32005
Table 4.7 indicates that there is only one effect size in the drill-and-practice group, two effect sizes in the tutorial group, three in the multimedia group, four in the Web-based program group, four in the expert systems group, five in the simulation group, and 12 in the computational program group. The weighted means of the effect sizes for the seven groups are 0.99 for expert systems, 0.92 for drill-and-practice, 0.74 for multimedia, 0.69 for tutorials, 0.48 for simulations, 0.23 for Web-based programs, and 0.12 for computational programs. Figure C.4 presents the forest plots for the effect sizes of the studies by the seven groups. Only the confidence interval of the computational program group includes zero, which indicates that the computational programs have no significant effect in teaching statistics.
In Table 4.8, the Q_B is 38.733 (p < 0.001), indicating that the means of the seven groups of effect sizes differ significantly. The Q_W is 30.778 (p = 0.16), which indicates that the effect sizes within the seven groups are homogeneous. The Q_W statistic for the drill-and-practice group is 0 because there is only one study and the effect size has no variation. The Q_W for the tutorial group is 1.269 (p = 0.26), which shows that the effect sizes within this group do not differ significantly. The Q_W statistic for the computational program group is 17.169 (p = 0.103), which indicates that the effect sizes within that group also do not have significant variation. The remaining four groups show similar results, as presented in Table 4.8.

Table 4.8 Q Statistics by Mode

Source             df   Q stat    p-value
Between Groups      6   38.733    0.000
Within Groups      24   30.778    0.160
  Drill             0    0.000
  Tutorial          1    1.269    0.260
  Computation      11   17.169    0.103
  Simulation        4    3.652    0.455
  Multimedia        2    1.250    0.535
  Web-based         3    5.456    0.141
  Expert systems    3    1.982    0.576
Corrected Total    30   69.511    0.000
Type of CAI Program
Computer programs for teaching statistics are usually developed either by commercial professionals or by teachers with sufficient computer knowledge and skills. In this meta-analysis, 12 effect sizes were obtained from studies that applied commercially developed statistical packages. For example, Christmann and Badgett (1997) used MYSTAT; Gilligan (1990) used Minitab; and Gratz, Volpe, and Kind (1993), High (1998), Rosen, Feeney, and Linda (1994), and Wang (1999) used SPSS. However, some teachers or researchers were interested in developing statistical programs that focus on specific topics to address different purposes and needs. For example, Aberson, Berger, Healy, Kyle, and Romero (2000) developed a Web-based interactive tutorial (WISE) to teach many topics in introductory statistics courses. Athey (1987), Olsen (1988), and Marcoulides (1990) developed expert systems to teach statistics.
The results of the effect sizes for the two types of programs are listed in Table 4.9.

Table 4.9 Statistics of Study Effect Sizes by Type

Group          N    d+        σ(d+)      lower      upper
Commercial     12   0.14279   0.074495   -0.00321   0.28880
Teacher-made   19   0.49745   0.036662    0.42559   0.56930

Table 4.10 Q Statistics by Type

Source            df   Q stat    p-value
Between Groups     1   18.246    0.000
Within Groups     29   51.265    0.007
  Commercial      11   18.987    0.061
  Teacher-made    18   32.278    0.020
Corrected Total   30   69.511    0.000

The weighted mean of the commercial group is 0.143, and its 95% confidence interval includes zero, reflecting that simply using a commercial statistical program may not have a significant effect in teaching statistics. In contrast, the teacher-made program group has a weighted mean of 0.497, and its 95% confidence interval does not include zero, indicating a positive effect. Figure C.5 presents the forest plots for the effect sizes of the studies by the two groups.
In Table 4.10, the Q_B is 18.246 (p < 0.001), which indicates that the means of the effect sizes differ significantly between the two groups. The within-groups homogeneity test with Q_W = 51.265 (p = 0.007) is also significant. The Q_W for the teacher-made group is 32.278 (p = 0.02), indicating that the means of the effect sizes within this group are significantly different. The Q_W for the commercial program group is 18.987 (p = 0.061), showing that the means of the effect sizes do not differ significantly within this group.
Level of Interactivity of CAI Program

This variable was included to examine whether the effect differs by the level of interactivity of the computer programs. Three groups were used for this variable: interactive PC, interactive mainframe, and batch mainframe. With the popularity and ease of use of microcomputers, more statistical programs are run on microcomputers in interactive mode. The results in Table 4.11 show that there are 28 effect sizes in the interactive-PC group, two in the interactive-mainframe group, and only one in the batch-mainframe group. The means of the three groups are 0.434, 0.424, and 0.211, respectively. The standard errors are 0.034, 0.204, and 0.207 for the three groups. The 95% confidence intervals for the interactive-PC and interactive-mainframe groups do not include zero, which shows a positive effect of using interactive mode in teaching statistics. However, the 95% confidence interval for the batch-mainframe mode includes zero, which indicates no significant effect. Figure C.6 presents the forest plots for the effect sizes of the studies by the three groups. Because there is only one effect size in the batch-mainframe group and two effect sizes in the interactive-mainframe group, the results need to be interpreted with caution.
In Table 4.12, the Q_B is 1.138 (p = 0.566), which indicates that the means of the three groups of effect sizes have no significant difference. The Q_W statistic of 68.373 (p < 0.001) is significant. The Q_W for the interactive-PC group is 68.291 (p < 0.001), which shows that the variation among the 28 effect sizes in the interactive-PC group is significant. The Q_W statistic for the interactive-mainframe group is 0.082 (p = 0.775), which shows that the means of the effect sizes are not significantly different in this group. Since there is only one effect size in the batch-mainframe group, it has no variation.
Table 4.11 Statistics of Study Effect Sizes by Level of Interactivity

Group                   N    d+        σ(d+)     lower      upper
Interactive PC          28   0.43420   0.03377    0.36802   0.50039
Interactive mainframe    2   0.42436   0.20481    0.02294   0.82578
Batch mainframe          1   0.21064   0.20689   -0.19485   0.61614

Table 4.12 Q Statistics by Level of Interactivity

Source                    df   Q stat    p-value
Between Groups             2    1.138    0.566
Within Groups             28   68.373    0.000
  Interactive PC          27   68.291    0.000
  Interactive mainframe    1    0.082    0.775
  Batch mainframe          0    0.000
Corrected Total           30   69.511    0.000
Instructional Role of CAI Program
This variable was included to address whether the effect differs by the instructional role of the computer program. In the primary studies, some programs were used as a supplement or adjunct to the traditional method of teaching statistics, while others were used as a substitute for traditional instructional methods. Table 4.13 presents the means for the 22 effect sizes in the supplement group and the nine effect sizes in the substitute group. The means for the two groups are 0.439 and 0.355, with standard errors of 0.035 and 0.092, respectively. Neither of the 95% confidence intervals includes zero, indicating positive effects on teaching statistics. Figure C.7 presents the forest plots for the effect sizes of the studies by the two groups.
In Table 4.14, the Q_B is 0.730 (p = 0.393), which indicates that the means of the two groups of effect sizes do not differ significantly. The Q_W statistic is 68.781 (p < 0.001), which indicates significant variation within the groups. The Q_W statistic for the supplement group is 54.552 (p < 0.001), which indicates that the means of the effect sizes differ significantly within the supplement group. The Q_W statistic for the substitute group is 14.230 (p = 0.076), showing that the means of the effect sizes do not differ significantly within the substitute group.

Table 4.13 Statistics of Study Effect Sizes by Instructional Role

Group        N    d+        σ(d+)      lower     upper
Supplement   22   0.43902   0.035207   0.37001   0.50802
Substitute    9   0.35466   0.092272   0.17381   0.53551

Table 4.14 Q Statistics by Instructional Role

Source            df   Q stat    p-value
Between Groups     1    0.730    0.393
Within Groups     29   68.781    0.000
  Supplement      21   54.552    0.000
  Substitute       8   14.230    0.076
Corrected Total   30   69.511    0.000
Sample Size of Participants
This variable was included to examine whether the sample size of a study changes the effect size. Table 4.15 shows the means for the 16 effect sizes obtained from studies with fifty or fewer participants, the nine effect sizes from studies of size 51-100, and the six effect sizes from studies with more than 100 participants. The weighted means for the three groups are 0.577, 0.360, and 0.418, and the standard errors are 0.091, 0.076, and 0.040, respectively. None of the 95% confidence intervals for the three groups includes zero, indicating positive effects of using computers in teaching statistics. Figure C.8 presents the forest plots for the effect sizes of the studies by the three groups.
In Table 4.16, the Q_B is 3.559 (p = 0.169), indicating no significant difference in effect sizes among the groups. The Q_W statistic is 65.952 (p < 0.001), which indicates significant variation within the groups. The Q_W statistic for the group with fifty or fewer participants is 22.357 (p = 0.099), which indicates that the 16 effect sizes within this group do not differ significantly. The Q_W statistic for the group of size 51-100 is 29.910 (p < 0.001), which shows that the effect sizes differ significantly within this group. And the Q_W statistic for the group with more than 100 participants is 13.685 (p = 0.018), which shows that the effect sizes differ significantly within this group.

Table 4.15 Statistics of Study Effect Sizes by Sample Size

Group    N    d+        σ(d+)      lower     upper
0-50     16   0.57724   0.090555   0.39976   0.75472
51-100    9   0.35975   0.077513   0.20783   0.51167
101+      6   0.41768   0.039658   0.33995   0.49540

Table 4.16 Q Statistics by Sample Size

Source            df   Q stat    p-value
Between Groups     2    3.559    0.169
Within Groups     28   65.952    0.000
  0-50            15   22.357    0.099
  51-100           8   29.910    0.000
  101+             5   13.685    0.018
Corrected Total   30   69.511    0.000
Comparisons Among Groups for Mode of CAI Program
The results obtained from the between-groups homogeneity tests for the eight study characteristics show that the mean effect sizes of the various modes of CAI programs and of the two types of CAI programs differ significantly. Since there are more than two groups for the mode of CAI program, the differences among these groups can be explored by employing comparisons analogous to contrasts in ANOVA.
Among the different modes of CAI programs, the groups of computational statistical packages and Web-based programs have smaller mean effect sizes than the other groups. The reason might be that computational statistical packages were usually used to facilitate computation; this type of program did not emphasize statistical concepts and understanding. For the Web-based programs, students used the Web to access the content of the statistics course, and the Web was used as a tool for accessing information.
To contrast the effect sizes of the two groups of computational statistical packages and Web-based programs with the other five groups, the contrast coefficients were chosen to be -0.5 for each of the groups of computational packages and Web-based programs, and 0.2 for each of the other five groups. Then, a linear combination of the sample effect size means as in Equation 3.27 was calculated to estimate the contrast parameter γ as

$$\hat{\gamma} = (-0.5)(0.1195) + (-0.5)(0.23294) + (0.2)(0.92296) + (0.2)(0.68748) + (0.2)(0.48417) + (0.2)(0.73997) + (0.2)(0.98945) = 0.59, \qquad (4.4)$$

with an estimated variance of

$$\hat{\sigma}^2(\hat{\gamma}) = (-0.5)^2(0.0756)^2 + (-0.5)^2(0.11092)^2 + (0.2)^2(0.33698)^2 + (0.2)^2(0.20353)^2 + (0.2)^2(0.04166)^2 + (0.2)^2(0.20537)^2 + (0.2)^2(0.16867)^2 = 0.0136. \qquad (4.5)$$

The 95% confidence interval for γ is

$$0.59 - 1.96\sqrt{0.0136} \le \gamma \le 0.59 + 1.96\sqrt{0.0136}, \qquad (4.6)$$

or

$$0.36 \le \gamma \le 0.82. \qquad (4.7)$$
Because this confidence interval does not include zero, the contrast is significant at the α = 0.05 level. That is, the mean effect size of the combined group of computational statistical packages and Web-based programs is significantly different from the mean effect size of the combined group of drill-and-practice, tutorial, simulation, multimedia, and expert systems programs.
CHAPTER 5
SUMMARY, DISCUSSION, CONCLUSIONS, AND
RECOMMENDATIONS
This chapter provides a summary of the results for answering the research
questions, a discussion of the significant findings in Chapter 4, the conclusions, and
some recommendations for using CAI in teaching introductory statistics at the
college level.
Summary
The literature on using CAI in teaching statistics has shown that various computer programs have been popular and beneficial to both undergraduate and graduate students. Moore (1997) and Ben-zvi (2000) pointed out that an introductory statistics course should place more emphasis on concepts, data analysis, inference, and statistical thinking; foster active learning through alternatives to lecturing; and use technology and computers to automate computations and graphics. This meta-analysis reviewed the research on CAI in statistics education during the past few decades and performed a quantitative synthesis of 25 primary studies with experimental results comparing the effectiveness of CAI and traditional methods in teaching statistics. This section presents the research questions with the results summarized.
The first question was "How effective is the use of computer-assisted
instruction (CAI) in enhancing the statistical learning of college students as
compared with non-computer instructional techniques?" The overall estimate of
the population effect size δ is 0.43 for the 25 primary studies with 31 effect sizes, suggesting a medium effect according to Cohen's (1988) criterion. This result indicates that the use of computer programs to assist in teaching statistics to college students raises the mean of the experimental group to the 67th percentile of the control group. The standard error of this overall effect size estimate is 0.033. The lower bound of the 95% confidence interval is 0.36 and the upper bound is 0.49, so the interval does not include zero. This effect size estimate suggests that computer-assisted instruction has a medium positive effect on teaching statistics at the college level. Examination of normality plots as well as statistical tests for normality (Anderson-Darling and Shapiro-Wilk) indicated that the distribution of the effect sizes was sufficiently normal. The Q statistics were used to determine whether study effects were influenced by eight moderating variables. The results indicate that the effect sizes were not homogeneous. The analogous ANOVA was used to examine the eight variables. However, conclusions drawn from these results should be tempered by the fact that there are only 25 primary studies in this meta-analysis and that the funnel plot detected a slight selection bias.
The second question was "Does the effectiveness of CAI differ by the publication year of the study?" The results of the analogous ANOVA (Q_B) show that there is no significant difference in the effect sizes among the four publication year categories. However, the results from the Q_W show that the effect sizes within the year categories of 1990-1994 and 1995-1999 are significantly different, indicating that the substantial variation within categories reduces the power to detect significant differences between categories. The weighted regression approach also provides evidence that the effect size estimates do not change as the publication year changes. The variable of publication year does not affect the estimates of the effectiveness of CAI in teaching statistics.
The third question was "Does the effectiveness of CAI differ by the source of
the study (dissertation, journal article, or ERIC document)?" The results of the
analogous ANOVA show that the effectiveness of CAI does not differ according to
the three sources of studies from dissertations, journal articles, and ERIC
documents. In general, published articles are expected to have greater effects than
unpublished reports. However, in this study, the effect sizes from journal articles are
not significantly different from the effect sizes from dissertations and ERIC
documents. The Qw statistics indicate substantial variation within the journal
articles and ERIC documents. The variable source of study does not appear to
affect the estimates of effect size.
The fourth question was "Does the effectiveness of CAI differ by students'
level of education (undergraduate or graduate)?" The results indicate there is no
significant difference in the magnitude of effect sizes among the studies with
graduate students, those with undergraduate students, and those with both
graduate and undergraduate students. However, more studies in this meta-analysis
used CAI with undergraduate students. One reason might be that more introductory statistics courses are offered for undergraduate than for graduate students. The Q_W statistics indicate substantial variation within the three groups.
The fifth question was "Which modes of computer-assisted instruction (CAI)
techniques are the most effective for statistical instruction for college students?" For
example, there are drill-and-practice, tutorials, multimedia, simulations,
computational statistical programs, expert systems, and Web-based programs. The
results show that there are significant differences among the seven modes of CAI programs. The comparison of the mean effect sizes of the group combining the expert systems, the drill-and-practice program, the tutorials, the simulations, and the multimedia programs with the group of the computational statistical programs and the Web-based programs shows that the mean effect sizes of the two contrast groups differ significantly. The Q_W statistics indicate no significant variation within any of the seven modes. However, the small number of studies in each of these modes may limit inferences from these results.
The sixth question was "Does the effectiveness of CAI differ by the software
type (commercial or teacher-made)?" The results suggest that the mean effect size
of the group of the teacher-made programs is significantly greater than the mean
effect size of the group of the commercial programs. This result might be explained by the rationale that teachers who can design and develop statistical computer programs are usually knowledgeable in computer programming and can design specific programs to meet their instructional goals. They may also be more involved in the teaching process and have more commitment to teaching the CAI course. The commercial programs are usually more general, and the teachers who used them might not have used the programs as effectively as the teachers who developed their own programs. The results of the Q_W statistics indicate that the effect sizes have no significant variation within the group of commercial programs. However, the effect sizes have significant variation within the group of teacher-made programs.
The seventh question was "Does the effectiveness of CAI differ by the level of
interactivity of the program (interactive-PC, interactive-mainframe, or
batch-mainframe)?" The results show that there are no significant differences
among the programs run on PC or mainframe in interactive mode and on
mainframe in batch mode. However, the number of the studies run on the
interactive-PC exceeds the number of the programs run on mainframe. The
statistical computer programs were implemented more on microcomputers than on
the mainframe. And, the interactive mode of programs have been more widely used
than the batch mode. The results of the Qw statistics indicate the effect sizes have
significant variation within the group of interactive PC.
The eighth question was "Does the effectiveness of CAI differ by the role of
the program (supplement or substitute)?" The results show that there is no
significant difference between the programs used as a supplement to the traditional
instructional method or as a substitute for the traditional method for the primary
studies in this meta-analysis. The Qw statistics indicate the effect sizes have
significant variation within the supplement group but no significant variation within
the substitute group.
The ninth question was "Does the effectiveness of CAI differ by the sample size of the participants?" The results indicate no significant differences among the three groups of different sample sizes. There are 16 effect sizes in the group of 0-50, 9 effect sizes in the group of 51-100, and 6 effect sizes in the group of 101+. The Q_W statistics indicate that the effect sizes have no significant variation within the group of 0-50, but have significant variation within the groups of 51-100 and 101+.
Discussion
The combined overall effect size is estimated to be 0.43 from the 25 primary studies in this meta-analysis. A similar meta-analysis conducted by Christmann and Badgett (1999) compared the effectiveness of some microcomputer-based software packages on statistical achievement. Christmann and Badgett (1999) selected only nine primary studies from 1987 to 1997 and generated 14 effect sizes from these studies. Among the 14 effect sizes, ten were from studies using computational statistical software to teach statistics; the effect size estimate for that group of programs is 0.043. Two of the 14 effect sizes were from studies using expert systems and statistical exercises; the effect size for that group is 0.651. The other two effect sizes were from a study using HyperCard to teach statistics; the effect size for that group is 0.929. The overall effect size estimate for Christmann and Badgett's (1999) meta-analysis is 0.256. The present meta-analysis included eight studies from Christmann and Badgett's (1999) set and excluded one study that was conducted with Korean college students. The estimate of the overall effect size for this meta-analysis is 0.43, which is larger than the 0.256 obtained by Christmann and Badgett (1999). One reason might be that the present meta-analysis includes more modes of CAI programs, which have larger effect sizes
than the computational statistical programs.
From the examination of the relationships between the eight study characteristics and the effectiveness of CAI programs, two characteristics show significant results according to the analogous ANOVA method proposed by Hedges and Olkin (1985). The two variables are the mode of the CAI program and the type of the CAI program (commercial or teacher-made).
The results of the analogous ANOVA show that the modes of CAI programs differ significantly in their effects on teaching statistics. The effectiveness can be examined through the group means of the seven modes: the means for expert systems, drill-and-practice, multimedia, and tutorials are larger than the mean for computational programs, and the mean for simulations is at a medium level. However, only one study (Porter, 1996) using a drill-and-practice program was included in this meta-analysis. In the early years, drill-and-practice computer programs were the common mode used in teaching but were not evaluated by experimental studies. This mode of program is based on behavioral learning theory. As the paradigm of learning theories expanded to include constructivism, educators might have used drill-and-practice programs less frequently. However, in this meta-analysis the effect size obtained from this drill-and-practice program is 0.92 and shows a large effect in assisting the learning of statistics.
While Gagne, Wager, and Rojas (1981) observed that drill-and-practice,
simulations, and tutorials are the most common modes of CAI, this
meta-analysis reveals that statistical computational packages are the mode
most often used in the 25 primary studies, accounting for 12 of the 31 effect
sizes. This result might be due to the computational demands of teaching and
learning statistics. However, the effect size for this group is only 0.12 and is
not significantly different from zero. One reasonable explanation for this result
may be that computational statistical packages usually provide students with
tools to perform statistical analyses and facilitate computing skills rather than
enhance statistical concepts or achievement.
When students use statistical packages, they sometimes cannot fully
understand the computational procedures and spend much time and effort
performing the computer tasks. Although the effect is not significant, the use
of computational packages is still important in learning statistics at the college
level: students who use these packages learn computational and computer
skills that are required by employers in the professional market. For the group
of Web-based programs, the mean effect size is also small. One reason may be
that the Web is a tool for accessing information and facilitating
communication, and its use does not contribute much to the learning of
statistics.
Expert systems have also been increasingly used in teaching statistics. The
effect size for this mode of program is about 0.98, which is large. Athey (1987)
developed a mentor expert system incorporating expertise for guiding and
teaching statistical decision making. Olsen and Bozeman (1988) used an expert
system to assist in the selection of appropriate statistical procedures. These
programs enhance higher-order thinking and emphasize statistical reasoning.
Students using this type of program may have better problem-solving and
application skills in statistics.
Simulation programs have also been frequently used in teaching statistics.
They provide opportunities for students to become actively involved in the
process of simulating abstract statistical concepts, such as the central limit
theorem. In this meta-analysis, the effect size for simulation programs is at a
medium level, and tutorial programs have the same level of effect. The
advantage of tutorials is that they provide students with graphical displays of
abstract and complicated statistical concepts and allow students to proceed at
their own pace and learning style. Multimedia programs also play a role in
teaching statistics; they integrate various media to display graphics, text, and
sound in computer programs. In this study, the effect size for multimedia
programs is 0.76, close to a large effect.
In recent years, with the rapid development of the Internet and the World
Wide Web, some Web-based programs have also been used to teach statistics.
However, in this meta-analysis, the effect size of the Web-based programs is
0.23, a small effect. This result indicates that Web presentation may not be an
important factor in the effectiveness of teaching statistics.
Among the eight study characteristics, the type of CAI program was the other
variable that showed a significant difference in effectiveness in teaching
statistics; the remaining six variables did not produce significant results. For
the type of CAI program, the analysis showed that the mean effect size of the
programs designed or developed by teachers was significantly higher than that
of the commercial statistical programs. This result might be explained by the
rationale that teachers who can design and develop statistical computer
programs are usually more capable and knowledgeable in computer
programming and can design specific programs to meet the particular goals of
their instruction. Kuchler's (1998) meta-analysis of the effectiveness of using
computers to teach secondary school mathematics also found that
teacher-made software is more effective for mathematics instruction, and
explained the effect by suggesting that teachers who develop their own
software may be more committed to teaching than those who use commercial
software.
Conclusions and Recommendations
The overall results of this meta-analysis indicate a positive medium effect,
implying that using CAI can increase students' statistical achievement to a
moderate extent. The examination of the selected characteristics show that the
different modes have significant differences in teaching statistics. The computational
statistical packages and the Web-based programs are the least effective modes.
However, the commercially-developed statistical packages are most commonly used
107
in statistics courses. In spite of the results of this meta-analysis, the adequate skills
of students to use statistical packages to perform data analysis should still be
enforced and emphasized because the ability is required for the job in the real
world. The other nonsignificant mode is the Web-based programs for teaching
statistics. This result implies that the effect of learning statistics may not differ by
whether the computer programs are delivered on the Web or not.
Computer programs of the drill-and-practice, tutorial, and simulation modes
are effective, as are expert systems and multimedia. These programs convey
statistical concepts and emphasize comprehension. However, they are usually
not available for general use in statistics courses; they are mostly developed to
address specific topics. Teachers need to invest more effort to obtain these
programs or to develop them themselves, which requires more commitment
and cost. This reason may explain why these programs are more effective.
There have been a number of teacher-made programs for teaching specific
topics or objectives; the problem is that it is difficult to evaluate these
programs and to distribute the good ones to other students. One
recommendation is that an online outlet be established to collect successful
CAI programs so that interested teachers can locate them and share teaching
experiences and ideas.
Computer programs are usually used in statistics courses as a supplement to
lectures, and most of the primary studies in this meta-analysis used the
computer as an instructional aid to reinforce students' understanding of
statistics. Although the results show no significant difference between the two
instructional roles of supplement and substitute, it is reasonable to suggest
that lectures remain important for providing explanations and responding to
the questions that arise in learning statistics.
Although computer technology has advanced over the past few decades, the
effectiveness of CAI in teaching statistics does not differ significantly by
publication year. This result implies that learning statistics may not depend
on the development of technology. However, as computers have become more
popular and available, students' abilities in performing various computer tasks
have become stronger than in past years. When using computers in teaching
statistics, teachers can therefore focus on statistics rather than on computer
skills. Since the 1980s, microcomputers have been available to many students,
and many CAI programs for teaching statistics run interactively on
microcomputers. Students expend less effort to learn to use a statistical
package on a microcomputer than on the mainframe, and the interactive mode
is typical in most computer programs.
A common problem of using CAI in teaching statistics is that not all statistics
teachers are competent in computer technology and familiar with the
computer programs, and the effectiveness of teaching statistics is affected by
this factor. Because technology has changed dramatically in recent years,
statistics teachers need to keep learning new technology as well.
The results of this meta-analysis present a positive medium effect size for
using CAI in teaching statistics, although many factors may affect the
effectiveness of a computer program. Can the use of computers improve
students' learning in statistics as Moore (1997) expected? The answer is "yes".
APPENDIX A
PRIMARY STUDY REVIEW SHEET
Author(s):
Data:
  M_E:          SD_E:
  M_C:          SD_C:
  t:            F:
Repeated-measures design: Yes or No
Study characteristics:
1. Year of publication:
2. Source of study:
3. Level of education:
4. Mode of application:
5. Commercial/teacher-made program:
6. Interactive/batch on PC/mainframe:
7. Supplement/substitute:
8. Total sample size:
9. Others:
APPENDIX B
TABLES OF DATA
Primary study data are listed in Table B.1.
Effect size data are listed in Table B.2.
Primary study characteristics are listed in Table B.3.
Standard errors and confidence intervals are listed in Table B.4.
Table B.1 Primary Study Data

Year  Author       n_E  n_C    M_E     M_C
2001  Aberson       55   56    6.73    6.79
1987  Athey         12   13    3.08    2.07
1987  Athey         10    9    3.7     2.88
1997  Christmann    36   14   85.67   83.93
1985  Dinkins        9    9   18.67   13.44
1993  Dorn          17   18   12.9    10
1993  Dorn          19   20   15.2    13.5
1990  Gilligan      36   42   16.98   15.55
2000  Gonzalez      15   14    4.9     4.3
2000  Gonzalez      14   14    5.8     4.3
1993  Gratz         27   28   11.7    11.8
1998  High          43   44   71.7    75.5
1991  Hollowell     52   81    2.846   2.938
2001  Hurlburt      36  116    0.09   -0.03
1999  Jones         33   56   80.9    75.2
1999  Koch          12   14    2.5     1.75
2002  Lane         140  340
2002  Lane         681  340
2002  Lane         776  340
1990  Marcoulides   43   44
1990  Marcoulides   41   44
1996  McBride       10   10
1989  Myers         23   29
1988  Olsen         15   13
1996  Porter        20   19
1998  Raymondo      36   51
1994  Rosen         25   25
1991  Sterling      38   28
1999  Wang          12   14
1989  Ware          55   41
1986  White         10   10

[M_E and M_C are shown for the studies where they could be recovered; the
SD_E, SD_C, t, and F columns of this table are not legible in this copy.]
Table B.2 Effect Size Data

ID  Author       Year  Hedges' g  Hedges' d  Weighted d
 1  Aberson      2001    0.00000    0.00000     0.000
 2  Athey        1987    0.80279    0.77627     4.505
 3  Athey        1987    0.54566    0.59266     2.690
 4  Christmann   1997    0.18708    0.18414     1.850
 5  Dinkins      1985    1.27933    1.21825     4.624
 6  Dorn         1993    1.09477    1.06967     8.183
 7  Dorn         1993    0.62764    0.61482     5.720
 8  Gilligan     1990    0.39881    0.39486     7.509
 9  Gonzalez     2000    0.23542    0.22881     1.646
10  Gonzalez     2000    0.55517    0.53898     3.641
11  Gratz        1993   -0.03637   -0.03585    -0.493
12  High         1998   -0.26938   -0.26699    -5.755
13  Hollowell    1991    0.08885    0.08834     2.795
14  Hurlburt     2001    0.11829    0.11769     3.229
15  Jones        1999    0.58640    0.58133    11.613
16  Koch         1999    0.69038    0.66854     4.093
17  Lane         2002    0.36978    0.36920    36.104
18  Lane         2002    0.49719    0.49683   109.663
19  Lane         2002    0.49398    0.49365   113.770
20  Marcoulides  1990    1.03507    1.02590    19.717
21  Marcoulides  1990    0.59384    0.58845    11.971
22  McBride      1996    1.84677    1.76855     6.357
23  Myers        1989    0.52902    0.52104     6.467
24  Olsen        1988    1.45010    1.40780     7.866
25  Porter       1996    0.94221    0.92296     8.128
26  Raymondo     1998   -0.07192   -0.07129    -1.503
27  Rosen        1994    0.15000    0.14764     1.841
28  Sterling     1991    0.88853    0.87807    12.937
29  Wang         1999   -0.13860   -0.13422    -0.865
30  Ware         1989    0.21234    0.21064     4.921
31  White        1986    0.56456    0.54065     2.608
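The relation between Hedges' g and the bias-corrected estimator d in Table B.2 can be reproduced with the standard small-sample correction factor. The sketch below (illustrative Python using the n values from Table B.1, not the program actually used to build the table) checks one row:

    def hedges_d(g, n_e, n_c):
        # Hedges' small-sample bias correction: d = g * (1 - 3 / (4N - 9)),
        # where N is the total sample size of the two groups.
        n = n_e + n_c
        return g * (1.0 - 3.0 / (4.0 * n - 9.0))

    # Athey (ID 2): g = 0.80279 with n_E = 12, n_C = 13
    print(hedges_d(0.80279, 12, 13))  # about 0.7763; Table B.2 lists 0.77627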
Table B.3 Primary Study Characteristics

Year  Author       Source   Level  Mode      Type     Inter    Instr     N
2001  Aberson      journal  mixed  web       teacher  interpc  subs    111
1987  Athey        disser   under  expert    teacher  interpc  subs     25
1987  Athey        disser   grad   expert    teacher  interpc  subs     19
1997  Christmann   journal  grad   comp      comm     interpc  supp     50
1985  Dinkins      disser   grad   tutorial  teacher  interpc  supp      9
1993  Dorn         disser   mixed  multi     teacher  interpc  supp     35
1993  Dorn         disser   mixed  multi     teacher  interpc  supp     39
1990  Gilligan     disser   under  comp      comm     intermn  supp     49
2000  Gonzalez     disser   under  comp      teacher  interpc  subs     29
2000  Gonzalez     disser   under  multi     teacher  interpc  subs     28
1993  Gratz        ERIC     under  comp      comm     interpc  supp     55
1998  High         ERIC     under  comp      comm     interpc  supp     87
1991  Hollowell    ERIC     under  comp      comm     interpc  supp    133
2001  Hurlburt     ERIC     under  web       teacher  interpc  subs    152
1999  Jones        ERIC     under  web       teacher  interpc  subs     89
1999  Koch         journal  under  web       teacher  interpc  supp     26
2002  Lane         ERIC     under  simu      teacher  interpc  supp    480
2002  Lane         ERIC     under  simu      teacher  interpc  supp   1021
2002  Lane         ERIC     under  simu      teacher  interpc  supp   1116
1990  Marcoulides  journal  under  expert    teacher  interpc  supp     87
1990  Marcoulides  journal  under  tutorial  teacher  interpc  supp     85
1996  McBride      journal  under  comp      comm     interpc  supp     10
1989  Myers        disser   under  simu      comm     interpc  subs     52
1988  Olsen        journal  grad   expert    teacher  interpc  subs     28
1996  Porter       journal  under  drill     teacher  interpc  supp     39
1998  Raymondo     journal  under  comp      comm     interpc  supp     87
1994  Rosen        journal  under  comp      comm     interpc  supp     25
1991  Sterling     journal  under  simu      teacher  interpc  supp     66
1999  Wang         journal  grad   comp      comm     interpc  supp     26
1989  Ware         journal  under  comp      comm     batchmn  supp     96
1986  White        disser   under  comp      comm     intermn  supp     20

Note. Source: disser=dissertation; Level: mixed=undergraduate and graduate,
under=undergraduate, grad=graduate; Mode: web=web-based programs,
comp=computation statistical programs, expert=expert systems,
multi=multimedia, simu=simulation, drill=drill-and-practice;
Type: teacher=teacher-made, comm=commercial; Inter: interpc=interactive pc,
intermn=interactive mainframe, batchmn=batch mainframe;
Instr: subs=substitute, supp=supplement.
Table B.4 Standard Errors and Confidence Intervals

ID  Author       Hedges' d       SE     N     lower     upper
 5  Dinkins        1.21825  0.51327     9   0.21226   2.22424
22  McBride        1.76855  0.52744    10   0.73479   2.80232
 3  Athey          0.59266  0.46942    19  -0.32738   1.51271
31  White          0.54065  0.45531    20  -0.35174   1.43304
 2  Athey          0.77627  0.41510    25  -0.03731   1.58985
27  Rosen          0.14764  0.28323    25  -0.40747   0.70276
16  Koch           0.66854  0.40417    26  -0.12362   1.46071
29  Wang          -0.13422  0.39384    26  -0.90613   0.63769
10  Gonzalez       0.53898  0.38477    28  -0.21515   1.29311
24  Olsen          1.40780  0.42306    28   0.57861   2.23698
 9  Gonzalez       0.22881  0.37282    29  -0.50191   0.95953
 6  Dorn           1.06967  0.36156    35   0.36103   1.77831
 7  Dorn           0.61482  0.32784    39  -0.02773   1.25737
25  Porter         0.92296  0.33698    39   0.26250   1.58342
 8  Gilligan       0.39486  0.22932    49  -0.05460   0.84432
 4  Christmann     0.18414  0.31551    50  -0.43424   0.80253
23  Myers          0.52104  0.28385    52  -0.03530   1.07737
11  Gratz         -0.03585  0.26975    55  -0.56455   0.49284
28  Sterling       0.87807  0.26052    66   0.36746   1.38869
21  Marcoulides    0.58845  0.22171    85   0.15391   1.02299
12  High          -0.26699  0.21539    87  -0.68915   0.15517
20  Marcoulides    1.02590  0.22810    87   0.57883   1.47298
26  Raymondo      -0.07129  0.21775    87  -0.49807   0.35549
15  Jones          0.58133  0.22374    89   0.14281   1.01985
30  Ware           0.21064  0.20689    96  -0.19485   0.61614
 1  Aberson        0.00000  0.18984   111  -0.37208   0.37208
13  Hollowell      0.08834  0.17778   133  -0.26010   0.43678
14  Hurlburt       0.11769  0.19090   152  -0.25647   0.49186
17  Lane           0.36920  0.10112   480   0.17100   0.56740
18  Lane           0.49683  0.06731  1021   0.36491   0.62875
19  Lane           0.49365  0.06587  1116   0.36455   0.62276
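The standard errors and 95% confidence limits in Table B.4 follow the usual large-sample variance of d (Hedges & Olkin, 1985). The sketch below (again illustrative Python, with the group sizes taken from Table B.1) reproduces one row:

    import math

    def se_and_ci(d, n_e, n_c, z=1.959964):
        # Large-sample variance of d:
        # (n_E + n_C) / (n_E * n_C) + d^2 / (2 * (n_E + n_C))
        n = n_e + n_c
        se = math.sqrt(n / (n_e * n_c) + d ** 2 / (2.0 * n))
        return se, (d - z * se, d + z * se)

    # Aberson (ID 1): d = 0 with n_E = 55, n_C = 56
    # SE about 0.18984; CI about (-0.3721, 0.3721), matching Table B.4
    print(se_and_ci(0.0, 55, 56))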
APPENDIX C
FOREST PLOTS FOR EFFECT SIZES GROUPED BY
STUDY CHARACTERISTICS
Forest plots by publication year are displayed in Figure C.1.
Forest plots by publication source are displayed in Figure C.2.
Forest plots by level of education are displayed in Figure C.3.
Forest plots by mode of CAI program are displayed in Figure C.4.
Forest plots by type of CAI program are displayed in Figure C.5.
Forest plots by level of interactivity of CAI program are displayed in
Figure C.6.
Forest plots by instructional role of CAI program are displayed in Figure C.7.
Forest plots by sample size are displayed in Figure C.8.
Figure C.1 Forest plots for effect sizes grouped by publication year.
Figure C.2 Forest plots for effect sizes grouped by publication source.
Figure C.3 Forest plots for effect sizes grouped by level of education.
Figure C.4 Forest plots for effect sizes grouped by mode.
Figure C.5 Forest plots for effect sizes grouped by type.
Figure C.6 Forest plots for effect sizes grouped by level of interactivity.
Figure C.7 Forest plots for effect sizes grouped by instructional role.
Figure C.8 Forest plots for effect sizes grouped by sample size.
REFERENCES
References marked with an asterisk indicate studies included in the
meta-analysis.
Aberson, C. L., Berger, D. E., Emerson, E. P., & Romero, V. L. (1997). WISE:
Web interface for statistics education. Behavior Research Methods,
Instruments, & Computers, 29, 217-221.
*Aberson, C. L., Berger, D. E., Healy, M. R., Kyle, D. J., & Romero, V. L.
(2000). Evaluation of an interactive tutorial for teaching the central limit
theorem. Teaching of Psychology, 27, 289-291.
Albert, J. H. (1993). Teaching Bayesian statistics using sampling methods and
Minitab. The American Statistician, 47, 182-191.
American Psychological Association. (2001). Publication manual of the
American Psychological Association (5th ed.). Washington, DC: Author.
Anderson, D. R., Burnham, K. P., & Thompson, W. L. (2000). Null hypothesis
testing: Problems, prevalence, and an alternative. Journal of Wildlife
Management, 64, 912-923.
*Athey, S. (1987). A mentor system incorporating expertise to guide and teach
statistical decision making. (Doctoral dissertation, The University of
Arizona, 1987). Dissertation Abstracts International, 48, 0238.
Bajgier, S. M., Atkinson, M., & Prybutok, V. P. (1989). Visual fits in the
teaching of regression concepts. The American Statistician, 43, 229-235.
Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C.-L. C. (1985). Effectiveness of
computer-based education in secondary schools. Journal of Computer-Based
Instruction, 12, 59-68.
Barnet, B. D. (1999). A comparison of the effects of using interactive WWW
simulations versus hands-on activities on the conceptual understanding and
attitudes of introductory statistics students. (Doctoral dissertation, Iowa
State University, 1999). Dissertation Abstracts International, 60, 3940.
Bartz, A. E. (2001). Computer and software use in teaching the beginning
statistics course. Teaching of Psychology, 28, 147-149.
Bayraktar, S. (2000). A meta-analysis on the effectiveness of computer-assisted
instruction in science education. (Doctoral dissertation, Ohio University,
2000). Dissertation Abstracts International, 61, 2570.
Becker, B. J. (1996). A look at the literature (and other resources) on teaching
statistics. Journal of Educational and Behavioral Statistics, 21, 71-90.
Begg, C. B. (1994). Publication bias. In H. Cooper & L. V. Hedges (Eds.), The
handbook of research synthesis (pp. 400-409). New York: Russell Sage
Foundation.
Beins, B. C. (1989). A BASIC program for generating integer means and
variances. Teaching of Psychology, 16, 230-231.
Ben-Zvi, D. (2000). Toward understanding the role of technological tools in
statistical learning. Mathematical Thinking & Learning, 2, 127-155.
Birge, R. T. (1932). The calculation of errors by the method of least squares.
Physical Review, 40, 207-227.
Bisgarrd, S. (1991). Teaching statistics to engineers. The American Statistician,
45, 274-283.
Bower, G. H., & Hilgard, E. R. (1981). Theories of learning. Englewood Cliffs,
NJ: Prentice Hall.
Bradley, D. R., Hemstreet, R. L., & Ziegenhagen, S. T. (1992). A simulation
laboratory for statistics. Behavior Research Methods, Instruments, &
Computers, 24, 190-204.
Bradstreet, T. E. (1996). Teaching introductory statistics courses so that
nonstatisticians experience statistical reasoning. The American Statistician,
50, 69-78.
Briggs, N. E., & Sheu, C. F. (1998). Using Java in introductory statistics.
Behavior Research Methods, Instruments, & Computers, 30, 246-249.
Britt, M. A., Sellinger, J., & Stillerman, L. M. (2002). A review of ESTAT: An
innovative program for teaching statistics. Teaching of Psychology, 29,
73-75.
Bureau of Labor Statistics (2002). Occupational outlook handbook, 2002-2003.
Burns, P. K., & Bozeman, W. C. (1981). Computer-assisted instruction and
mathematics achievement: Is there a relationship? Educational Technology,
21, 32-39.
Bushman, B. J. (1994). Vote-counting procedures in meta-analysis. In H. Cooper
& L. V. Hedges (Eds.), The handbook of research synthesis (pp. 193-213).
New York: Russell Sage Foundation.
Butler, D. L., & Eamon, D. B. (1985). An evaluation of statistical software for
research and instruction. Behavior Research Methods, Instruments, &
Computers, 17, 352-358.
Butler, D. L., & Neudecker, W. (1989). A comparison of inexpensive statistical
packages for microcomputers running MS-DOS. Behavior Research
Methods, Instruments, & Computers, 21, 113-120.
Cake, L., & Hostetter, R. C. (1986). DATAGEN: A BASIC program for
generating and analyzing data for use in statistics courses. Teaching of
Psychology, 13, 210-212.
Carpenter, E. H. (1993). Statistics and research methodology: Authoring,
multimedia, and automation of social science research. Social Science
Computer Review, 11, 500-514.
Castellan, N. J. (1982). Computers in psychology: A survey of instructional
applications. Behavior Research Methods and Instrumentation, 14, 198-202.
Cerny, B., & Kaiser, H. F. (1978). Computer program for the canonical analysis
of a contingency table. Educational & Psychological Measurement, 38, 835.
Chadwick, D. K. H. (1997). Computer-assisted instruction in secondary
mathematics classroom: A meta-analysis. (Doctoral dissertation, Drake
University, 1997). Dissertation Abstracts International, 58, 3478.
Christmann, E. P. (1995). A meta-analysis of the effect of computer-assisted
instruction on the academic achievement of students in grades 6 through 12:
A comparison of urban, suburban, and rural educational settings
(sixth-grade, 12th-grade, urban education, rural education). (Doctoral
dissertation, Old Dominion University, 1995). Dissertation Abstracts
International, 56, 3089.
*Christmann, E. P., & Badgett, J. L. (1997). Microcomputer-based
computer-assisted instruction within differing subject areas: A statistical
deduction. Journal of Educational Computing Research, 16, 281-296.
Christmann, E. P., & Badgett, J. L. (1999). The comparative effectiveness of
various microcomputer-based software packages on statistical achievement.
Computers in the Schools, 16, 209-220.
Christmann, E., & Badgett, J. L. (2000). The comparative effectiveness of CAI
on collegiate academic performance. Journal of Computing in Higher
Education, 11, 91-103.
Cobb, G. (1992). Teaching statistics. In L. A. Steen (Ed.), Heeding the call for
changes: Suggestions for curricular action (pp. 3-43). Washington, D.C.:
Mathematical Association of American.
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar
experiments. Journal of the Royal Statistical Society, 4 (Suppl.), 102-118.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale,
NJ: Erlbaum.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49,
997-1003.
Cohen, P. A., & Dacanay, L. S. (1991). Computer-based instruction and health
professions education: A meta-analysis of outcomes. Evaluation and the
Health Professions, 15, 259-281.
Collis, B. (1983). Teaching descriptive and inferential statistics using a
classroom microcomputer. Mathematics Teacher, 76, 318-322.
Conard, E. H., & Lutz, J. G. (1979). APRIORI: A FORTRAN IV computer
program to select the most powerful a priori comparison methods in an
analysis of variance. Educational & Psychological Measurement, 39,
689-691.
Cooley, W. W. (1969). Computer-assisted instruction in statistics. Paper
presented at the conference on Statistical Computation. Madison:
University of Wisconsin. (ERIC Document Reproduction Service
No. ED035249)
Cooper, H. M. (1979). Statistically combining independent studies: A
meta-analysis of sex differences in conformity research. Journal of
Personality and Social Psychology, 37, 131-146.
Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis.
New York: Russell Sage Foundation.
Cooper, H., & Hedges, L. V. (1994). Research synthesis as a scientific enterprise.
In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis
(pp. 3-14). New York: Russell Sage Foundation.
Couch, J. V., & Stoloff, M. L. (1989). A national survey of microcomputer use
by academic psychologists. Teaching of Psychology, 16, 145-147.
Cruz, R. F., & Sabers, D. L. (1996). All effect sizes are not created equally:
Response to Ritter and Low (1996). Unpublished manuscript.
Dambolena, I. G. (1986). Using simulations in statistics courses. Collegiate
Microcomputer, 4, 339-343.
Derry, S. J., Levin, J. R., & Schauble, L. (1995). Stimulating statistical thinking
through situated simulations. Teaching of Psychology, 22, 51-57.
Derry, S. J., Levin, J. R., Osana, H. P., & Jones, M. S. (1998). Developing
middle-school students' statistical reasoning abilities through simulation
gaming. In S. P. Lajoie (Ed.), Reflections on statistics: Learning, teaching,
and assessment in grades K-12. Studies in mathematical thinking and
learning (175-195). Mahwah, NJ: Erlbaum.
*Dinkins, P. (1985). Development of a computer-assisted instruction courseware
package in statistics and a comparative analysis of three management
strategies. (Doctoral dissertation, Louisiana State University, 1985).
Dissertation Abstracts International, 47, 800.
Dixon-Krauss, L. (1996). Vygotsky in the classroom: Mediated literacy
instruction and assessment. White Plains, NY: Longman.
Dokter, C., & Heimann, L. (1999). A web site as a tool for learning statistics.
Computers in the Schools, 16, 221-229.
*Dorn, M. J. (1993). The effect of an interactive, problem-based HyperCard
modular instruction on statistical reasoning. (Doctoral dissertation,
Southern Illinois University at Carbondale, 1993). Dissertation Abstracts
International, 55, 2770.
Duchastel, P. C. (1974). Computer applications and instructional innovation: A
case study in the teaching of statistics. International Journal of
Mathematical Education in Science & Technology, 5, 713-716.
Eamon, D. B. (1992). Data generation and analysis using spreadsheets.
Behavioral Research Methods, Instruments, & Computers, 24, 174-179.
Earley, M. A. (2001). Improving statistics education through simulations: The
case of the sampling distribution. Paper presented at the Annual Meeting of
the Mid-Western Educational Research Association, Chicago, IL. (ERIC
Document Reproduction Service No. ED458282)
Edgar, S. M. (1973). Teaching statistics; while simultaneously saving time, chalk,
etc... Paper presented at the conference on Computers in the
Undergraduate Curricula, Claremont, CA. (ERIC Document Reproduction
Service No. ED079993)
Egger, M., Smith, G. D., & Altman, D. G. (2001). Systematic reviews in health
care: Meta-analysis in context (2nd ed.). London: BMJ books.
Elmore, P. B., & Rotou, O. (2001). A primer on basic effect size concepts. Paper
presented at the Annual Meeting of the American Educational Research
Association, Seattle, WA. (ERIC Document Reproduction Service
No. ED453260)
Emond, W. J. (1982). Some benefits of micro-computers in teaching statistics.
Computers & Education, 6, 51-54.
Erickson, M. L., & Jacobson, R. B. (1973). On computer applications and
statistics in sociology: Toward the passing away of an antiquated
technology. Teaching Sociology, 1, 84-102.
Evans, G. E., & Newman, W. A. (1988). A comparison of SPSS PC+, SAS PC,
and BMDP PC. Collegiate Microcomputer, 6, 97-106.
Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of
Consulting Psychology, 16, 319-324.
Fisher, R. A. (1932). Statistical methods for research workers (4th ed.).
London: Oliver and Boyd.
Fletcher, J. D. (1990). Effectiveness and cost of interactive videodisc instruction
in defense training and education (IDA Paper P-2372). Alexandria, VA:
Institute for Defense Analyses.
Fletcher-Flinn, C. M., & Gravatt, B. (1995). The efficacy of computer-assisted
instruction (CAI): A meta-analysis. Journal of Educational Computing
Research, 12, 219-242.
Friel, S. N., Corwin, R. B., & Rowan, T. E. (1990). The statistics standards in
K-8 mathematics (Implementing the standards). Arithmetic Teacher, 38,
35-40.
Furtuck, L. (1981). The TREE system as a teaching aid in statistics, modeling
and business courses. Computers and Education, 5, 31-36.
Gagne, R. (1985). The conditions of learning and theory of instruction (4th ed.).
New York: Holt, Rinehart and Winston.
Gagne, R., & Briggs, L. J. (1979). Principles of instructional design (2nd ed.).
New York: Holt, Rinehart and Winston.
Gagne, R., Wager, W., & Rojas, A. (1981). Planning and authoring
computer-assisted instruction lessons. Educational Technology, 11, 17-21.
Garfield, J. (1995). How students learn statistics. International Statistical
Review, 63, 25-34.
Garfield, J., & Ahlgren, A. (1994). Student reactions to learning about
probability and statistics: Evaluating the quantitative literacy project.
School Science and Mathematics, 94, 89-97.
*Gilligan, W. P. (1990). The use of a computer statistical package in teaching
a unit of descriptive statistics. (Doctoral dissertation, Boston University,
1990). Dissertation Abstracts International, 51, 2302.
Glass, G. V. (1976). Primary, secondary, and meta-analysis. Educational
Researcher, 5, 3-8.
Glass, G. V. (1978). Integrating findings: The meta-analysis of research. In
L. S. Shulman (Ed.), Review of research in education (pp. 351-379). Itasca, IL:
F. E. Peacock.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social
research. Beverly Hills, CA: Sage.
Glass, G. V., & Smith, M. L. (1979). Meta-analysis of research on the
relationship of class size and achievement. Educational Evaluation and
Policy Analysis, 1, 2-16.
Gleser, L. J., & Olkin, I. (1994). Stochastically dependent effect sizes. In
H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis
(pp. 339-355). New York: Russell Sage Foundation.
*Gonzalez, G. M., & Birch, M. A. (2000). Evaluating the instructional efficacy of
computer-mediated interactive multimedia: Comparing three elementary
statistics tutorial modules. Journal of Educational Computing Research,
22, 411-430.
Goodman, T. (1986). Using the microcomputer to teach statistics. Mathematics
Teacher, 79, 210-215.
Gordon, F. S., & Gordon, S. P. (1989). Computer graphics simulation of
sampling distributions. Collegiate Microcomputer, 7, 185-189.
*Gratz, Z. S., Volpe, G. D., & Kind, B. M. (1993). Attitudes and achievement in
introductory psychological statistics classes: Traditional versus
computer-supported instruction. Proceedings of the Annual Conference on
Undergraduate Teaching of Psychology, Ellenville, New York. (ERIC
Document Reproduction Service No. ED365405)
Gredler, M. E. (2001). Learning and instruction: Theory into practice (4th ed.).
Upper Saddle River, NJ: Merrill/Prentice Hall.
Grubb, R. E., & Selfridge, L. D. (1964). Computer tutoring in statistics.
Computers and Automation, March, 20-26.
Hahn, G. J. (1985). More intelligent statistical software and statistical expert
systems: Future directions. The American Statistician, 39, 1-16.
Hall, J. A., Tickle-Degnen, L., Rosenthal, R., & Mosteller, F. (1994). Hypotheses
and problems in research synthesis. In H. Cooper & L. V. Hedges (Eds.),
The handbook of research synthesis (pp. 17-28). New York: Russell Sage
Foundation.
Hartley, S. (1978). Meta-analysis of the effects of individually paced instruction
in mathematics. (Doctoral dissertation, University of Colorado, 1978).
Dissertation Abstracts International, 38, 4003.
Hassebrock, F., & Snyder, R. (1997). Applications of a computer algebra system
for teaching bivariate relationships in statistics courses. Behavior Research
Methods, Instruments, & Computers, 29, 246-249.
Hatchette, V., Zivian, A. R., Zivian, M. T., & Okada, R. (1999). STAZ:
Interactive software for undergraduate statistics. Behavioral Research
Methods, Instruments, & Computers, 31, 19-23.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and
related estimators. Journal of Educational Statistics, 6, 107-128.
Hedges, L. V. (1982). Estimation of effect size from a series of independent
experiments. Psychological Bulletin, 92, 490-499.
Hedges, L. V. (1984). Advances in statistical methods for meta-analysis. New
Directions for Program Evaluation, 24, 25-42.
Hedges, L. V. (1986). Issues in meta-analysis. In E. Z. Rothkopf (Ed.), Review
of Research in Education, 13 (pp. 353-398). Washington, DC: American
Education Research Association.
Hedges, L. V. (1994). Fixed effects models. In H. Cooper & L. V. Hedges (Eds.),
The handbook of research synthesis (pp. 285-299). New York: Russell Sage
Foundation.
Hedges, L. V., & Olkin, I. (1980). Vote-counting methods in research synthesis.
Psychological Bulletin, 88, 359-369.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New
York: Academic Press.
Hergenhahn, B. R. (1988). An introduction to theories of learning (3rd ed.).
Englewood Cliffs, NJ: Prentice Hall.
*High, R. V. (1998). Some variables in relation to students' choice of statistics
classes: Traditional versus computer-supported instruction. (ERIC
Document Reproduction Service No. ED427762)
Hilton, S. C., Grimshaw, S. D., & Anderson, G. T. (2001). Statistics in
preschool. The American Statistician, 55, 332-336.
Hogg, R. V. (1991). Statistical education: Improvements are badly needed. The
American Statistician, 45, 342-343.
Hogg, R. V. (1992). Towards a lean and lively course in statistics. In F. Gordon
& S. Gordon (Eds.), Statistics for the twenty-first century. MAA notes
No. 26 (pp. 3-13). Washington: Mathematical Association of America.
*Hollowell, K. A., & Duch, B. J. (1991). Functions and statistics with computers
at college level. Paper presented at the Annual Meeting of the American
Educational Research Association, Chicago, IL.
Holcomb, J. P., Jr., & Ruffer, R. L. (2000). Using a term-long project sequence
in introductory statistics. The American Statistician, 54, 49-53.
Howe, K. R., & Berv, J. (2000). Constructing constructivism, epistemological
and pedagogical. In D. C. Phillips (Ed.), Ninety-ninth yearbook of the
national society for the study of education: Part I. Constructivism in
education opinions and second opinions on controversial issues (pp. 19-40).
Chicago: The University of Chicago Press.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting
error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (1997). Eight common but false objections to
the discontinuation of significance testing in the analysis of research data.
In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no
significance tests? (pp. 37-64). Mahwah, NJ: Erlbaum.
*Hurlburt, R. T. (2001). "Lectlets" deliver content at a distance: Introductory
statistics as a case study. Teaching of Psychology, 28, 15-20.
James, M. (1979). Mixed effects MANOVA using BMD12V. Educational &
Psychological Measurement, 39, 45-47.
Johnson, M. C. (1965). Note on the computer as an instructional tool in
statistics. The American Statistician, 19, 32, 36.
Jonassen, D. (1996). Computers in the classroom: Mindtools for critical thinking.
Englewood Cliffs, NJ: Merrill/Prentice Hall.
Jonassen, D., Peck, K., & Wilson, B. (1999). Learning with technology: A
constructivist perspective. Englewood Cliffs, NJ: Prentice Hall.
*Jones, E. R. (1999). A comparison of an all web-based class to a traditional
class. Paper presented at the meeting of Society for Information Technology
& Teacher Education International Conference, San Antonio, TX.
Kao, M. T., & Lehman, J. D. (1997). Scaffolding in a computer-based
constructivist environment for teaching statistics to college learners. Paper
presented at the Annual Meeting of the American Educational Research
Association, Chicago, IL. (ERIC Document Reproduction Service
No. ED408317)
Khalili, A., & Shashaani, L. (1994). The effectiveness of computer application: A
meta-analysis. Journal of Research on Computing in Education, 21, 48-61.
Khamis, H. J. (1991). Manual computations — A tool for reinforcing concepts
and techniques. The American Statistician, 45, 294-299.
Kidwell, P. A., & Ceruzzi, P. E. (1994). Landmarks in digital computing: A
Smithsonian pictorial history. Washington, DC: Smithsonian Institution
Press.
Kirk, R. (1996). Practical significance: A concept whose time has come.
Educational & Psychological Measurement, 56, 746-759.
Knief, L. M., & Cunningham, G. K. (1976). Effects of tutorial CBI on
performance in statistics. AEDS Journal, 9, 43-45.
*Koch, C., & Gobell, J. (1999). A hypertext-based tutorial with links to the
Web for teaching statistics and research methods. Behavioral Research
Methods, Instruments, & Computers, 31, 7-13.
Krieger, H., & James, P. L. (1992). Computer graphics and simulations in
teaching statistics. In F. Gordon & S. Gordon (Eds.), Statistics for the
twenty-first century. MAA notes No. 26 (pp. 167-188). Washington:
Mathematical Association of America.
Kuchler, J. M. (1998). The effectiveness of using computers to teach secondary
school (grades 6-12) mathematics: A meta-analysis. (Doctoral dissertation,
University of Massachusetts Lowell, 1999). Dissertation Abstracts
International, 59, 3764.
Kulik, C.-L. C., & Kulik, J. A. (1986). Effectiveness of computer-based
education in college. AEDS Journal, 19, 81-108.
Kulik, C.-L. C., Kulik, J. A., & Bangert-Drowns, R. L. (1985). Effectiveness of
computer-based education in elementary schools. Computer in Human
Behavior, 1, 59-74.
Kulik, C.-L. C., Kulik, J. A., & Shwalb, B. J. (1986). The effectiveness of
computer-based adult education: A meta-analysis. Journal of Educational
Computing Research, 2, 235-252.
Kulik, J. A. (1994). Meta-analytic studies of findings on computer-based
instruction. In E. L. Baker & H. F. O'Neil, Jr. (Eds.), Technology
assessment in education and training (pp. 9-33). Hillsdale, NJ: Erlbaum.
Lamb, A. (1992). Multimedia and the teaching-learning process in higher
education. In J. Albright and D. Graf (Eds.), Teaching in the information
age: The role of educational technology (pp. 33-42). San Francisco:
Jossey-Bass.
Lane, D. M. (1999). The Rice virtual lab in statistics. Behavioral Research
Methods, Instruments, & Computers, 31, 24-33.
Lane, D. M., & Tang, Z. (2000). Effectiveness of simulation training on transfer
of statistical concepts. Journal of Educational Computing Research, 22,
383-396.
*Lane, J. L., & Aleksic, M. (2002). Transforming elementary statistics to
enhance student learning. Paper presented at the Annual Meeting of the
American Educational Research Association, New Orleans, LA. (ERIC
Document Reproduction Service No. ED463332)
Langdon, J. S. (1989). The effects of the use of software on students'
understanding of selected statistical concepts. (Doctoral dissertation, The
American University, 1989). Dissertation Abstracts International, 50, 1971.
Lehman, R. S. (1972). The use of the unknown in teaching statistics. Paper
presented at the EPA convention, Boston, MA. (ERIC Document
Reproduction Service No. ED068581)
Lee, C. (1999). Computer-assisted approach for teaching statistical concepts.
Computers in the Schools, 16, 193-208.
Leon, R. V., & Parr, W. C. (2000). Use of course home pages in teaching
statistics. The American Statistician, 54, 44-48.
Lewis, S., & Clark, M. (2001). Forest plots: Trying to see the wood and the
trees. BMJ, 322, 1479-1480. Retrieved April 13, 2003, from
http://bmj.com/cgi/content/full/322/7300/1479
Liao, Y. C. (1998). Effects of hypermedia versus traditional instruction on
students' achievement: A meta-analysis. Journal of Research on Computing
in Education, 30, 341-359.
Light, R. J. (1984). Six evaluation issues that synthesis can resolve better than
single studies. In W. H. Yeaton & P. M. Wortman (Eds.), New directions
for program evaluation: Issues in data synthesis, 26 (pp. 57-73). San
Francisco: Jossey-Bass.
Light, R. J., & Smith, P. V. (1971). Accumulating evidence: Procedures for
resolving contradictions among different research studies. Harvard
Educational Review, 41, 429-471.
Lipsey, M. W. (1994). Identifying potentially interesting variables and analysis
opportunities. In H. Cooper & L. V. Hedges (Eds.), The handbook of
research synthesis (pp. 111-123). New York: Russell Sage Foundation.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks,
CA: Sage.
Lockard, J. D. (1967). Computers in undergraduate education: Mathematics,
physics, statistics, and chemistry. College Park, MD: The University of
Maryland, Science Teaching Center.
Loftsgarrden, D. O., Rung, D. C., & Watkins, A. E. (1997). Statistical abstract of
undergraduate programs in the mathematical science: Fall 1995 CBMS
Survey, MAA Notes, No. 2. Washington, DC: Mathematical Association of
America.
Loftsgarrden, D. O., & Watkins, A. E. (1998). Statistics teaching in colleges and
universities: Courses, instructors, and degrees in Fall 1995. The American
Statistician, 52, 308-314.
Long, K. E. (1998). Statistics in the high school mathematics curriculum: Is the
curriculum preparing students to be quantitatively literate? (Doctoral
dissertation, The American University, 1998). Dissertation Abstracts
International, 60, 87.
Maddux, C. D., & Cummings, R. (1999). Constructivism: Has the term outlived
its usefulness? Computers in the Schools, 16, 5-19.
Malloy, T. E., & Jensen, G. C. (2001). Utah Virtual Lab: Java interactivity for
teaching science and statistics on line. Behavioral Research Methods,
Instruments, & Computers, 33, 282-286.
Marasinghe, M. G., & Meeker, W. Q. (1996). Using graphics and simulation to
teach statistical concepts. The American Statistician, 50, 342-351.
*Marcoulides, G. A. (1990). Improving learning performance with computer
based programs. Journal of Educational Computing Research, 6, 147-155.
Matthews, M. R. (2000). Appraising constructivism in science and mathematics
education. In D. C. Phillips (Ed.), Ninety-ninth yearbook of the national
society for the study of education: Part I. Constructivism in education
opinions and second opinions on controversial issues (pp. 161-191).
Chicago: The University of Chicago Press.
Mausner, B., Wolff, E. F., Evans, R. W., DeBoer, M. M., Gulkus, S. P.,
D'Amore, A., et al. (1983). A program of computer assisted instruction for
a personalized instructional course in statistics. Teaching of Psychology, 10,
195-200.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing
data: A model comparison perspective. Belmont, CA: Wadsworth.
*McBride, A. B. (1996). Creating a critical thinking learning environment:
Teaching statistics to social science undergraduates. PS: Political Science &
Politics, 29, 517-521.
McCarty, L. P., & Schwandt, T. A. (2000). Seductive illusions: Von Glasersfeld
and Gergen on epistemology and education. In D. C. Phillips (Ed.),
Ninety-ninth yearbook of the national society for the study of education:
Part I. Constructivism in education opinions and second opinions on
controversial issues (pp. 41-85). Chicago: The University of Chicago Press.
Mead, R. (1974). The use of computer simulation games in the teaching of
elementary statistics to agriculturists. International Journal of
Mathematical Education in Science & Technology, 5, 705-712.
Milligan, G. W. (1979). A computer program for calculating power of the
chi-square test. Educational & Psychological Measurement, 39, 681-684.
Mills, J. D. (2002). Using computer simulation methods to teach statistics: A
review of the literature. Journal of Statistics Education, 10. Retrieved
November 19, 2002, from
http://www.amstat.org/publications/jse/vlOnl/mills.html
Mitchell, M. L., & Jolley, J. M. (1999). The Correlator: A self-guided tutorial.
Teaching of Psychology, 26, 298-299.
Mittag, K. C. (1993). A Delphi study to determine standards for essential topics
and suggested instructional approaches for an introductory
non-calculus-based college-level statistics course. (Doctoral dissertation,
Texas A&M University, 1993). Dissertation Abstracts International, 54,
2933.
Moore, C. N. (1974). Computer-assisted laboratory experiments of teaching
business and economic statistics. International Journal of Mathematical
Education in Science & Technology, 5, 713-716.
Moore, D. S. (1993). The place of video in new styles of teaching and learning
statistics. The American Statistician, 47, 172-176.
Moore, D. S. (1997). New pedagogy and new content: The case of statistics.
International Statistical Review, 65, 123-165.
*Myers, K. N. (1989). An exploratory study of the effectiveness of computer
graphics and simulations in a computer-student interactive environment in
illustrating random sampling and the central limit theorem. (Doctoral
dissertation, The Florida State University, 1989). Dissertation Abstracts
International, 51, 441.
National Center for Education Statistics. Retrieved November 22, 2002, from
http://nces.ed.gov
National Council of Teachers of Mathematics. (2000). Principles and standards
for school mathematics. Reston, VA: National Council of Teachers of
Mathematics.
Newby, T. J. (1996). Instructional technology for teaching and learning:
Designing instruction, integrating computers, and using media. Englewood
Cliffs, NJ: Merrill.
Newmark, J. (1996). Statistics and probability in modern life. Fort Worth:
Saunders College Publishing.
Niemiec, R. P., & Walberg, H. J. (1985). Computers and achievement in the
elementary schools. Journal of Educational Computing Research, 1,
435-440.
O'Keeffe, L., & Klagge, J. (1986). Statistical packages for the IBM PC family.
New York: McGraw-Hill.
Olkin, I. (1990). History and goals. In K. W. Wachter & M. L. Straf (Eds.). The
future of meta-analysis (pp. 3-10). New York: Russell Sage Foundation.
*Olsen, C. R., & Bozeman, W. C. (1988). Decision support systems:
Applications in statistics and hypothesis testing. Journal of Research on
Computing in Education, 20, 206-212.
Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of
Educational Statistics, 8, 157-159.
Ouyang, R. (1993). A meta-analysis: Effectiveness of computer-assisted
instruction at the level of elementary education (K-6). (Doctoral
dissertation, Indiana University of Pennsylvania, 1993). Dissertation
Abstracts International, 54, 0421.
Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. New
York: Basic Books.
Pearson, K. (1933). On a method of determining whether a sample of size n
supposed to have been drawn from a parent population having a known
probability integral has probably been drawn at random. Biometrika, 25,
379-410.
Perry, M., & Kader, G. (1995). Using simulation to study estimation.
Mathematics and Computer Education, 29, 53-64.
Phillips, D. C. (2000). An opinionated account of the constructivist landscape.
In D. C. Phillips (Ed.), Ninety-ninth yearbook of the national society for the
study of education: Part I. Constructivism in education opinions and second
opinions on controversial issues (pp. 1-16). Chicago: The University of
Chicago Press.
Pollane, L. P., & Schnittjer, C. J. (1977). The relative performance of five
computer program packages which perform factorial-univariate analysis of
covariance. Educational & Psychological Measurement, 37, 227-231.
*Porter, T. S., & Riley, T. M. (1996). The effectiveness of computer exercises in
introductory statistics. Journal of Economic Education, 27, 291-299.
Pregibon, D., & Gale, W. A. (1984). REX: an expert system for regression
analysis. Computational Statistics Quarterly, 1, 242-248.
*Raymondo, J. C., & Garrett, J. R. (1998). Assessing the introduction of a
computer laboratory experience into a behavioral science statistics course.
Teaching Sociology, 26, 29-37.
Ritter, M., & Low, K. G. (1996). Effects of dance/movement therapy: A
meta-analysis. Arts in Psychotherapy, 23, 249-260.
Roblyer, M. D. (1988). The effectiveness of microcomputers in education: A
review of the research from 1980-1987. Technological Horizons in
Educational Journal, 16, 85-89.
Roblyer, M. D., & Edwards, J. (2000). Integrating educational technology
into teaching (2nd ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.
Rogers, R. L. (1987). A microcomputer-based statistics course with
individualized assignment. Teaching of Psychology, 14, 109-111.
Romero, V. L., Berger, D. E., Healy, M. R., & Aberson, C. L. (2000). Using
cognitive learning theory to design effective on-line statistics tutorials.
Behavior Research Methods, Instruments, & Computers, 32, 246-249.
*Rosen, E., Feeney, B., & Petty, L. C. (1994). An introductory statistics class
and examination using SPSS/PC. Behavior Research Methods, Instruments,
& Computers, 26, 242-244.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly
Hills, CA: Sage.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.).
Newbury Park, CA: Sage.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper &
L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231-281).
New York: Russell Sage Foundation.
Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first
345 studies. Behavioral and Brain Sciences, 1, 377-386.
Ryan, T. A., Joiner, B., & Ryan, B. (1976). Minitab student handbook. North
Scituate, MA: Duxbury Press.
Sandals, L. H., & Pyryt, M. C. (1992). New directions for teaching research
methods and statistics: The development of a computer-based expert system.
Paper presented at the Annual Meeting of the American Educational
Research Association, San Francisco, CA. (ERIC Document Reproduction
Service No. ED349960)
Scalzo, F., & Hughes, R. (1976). Integration of prepackaged computer programs
into an undergraduate introductory statistics course. Journal of
Computer-Based Instruction, 2, 73-79.
Schacter, J., & Fagnano, C. (1999). Does computer technology improve student
learning and achievement? How, when, and under what condition? Journal
of Educational Computing Research, 20, 329-343.
Scheaffer, R. L. (1990). Toward a more quantitatively literate citizenry. The
American Statistician, 44, 2-3.
Scheaffer, R. L. (2001). In a world of data, Statisticians count. Retrieved
November 10, 2002, from
http://www.amstat.org/publications/amstat_news/2001/pres09.html
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in
psychology: Implications for training of researchers. Psychological
Methods, 1, 115-129.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the
problem of validity generalization. Journal of Applied Psychology, 62,
529-540.
Schmidt, M., Weinstein, T., Niemiec, R., & Walberg, H. J. (1985).
Computer-assisted instruction with exceptional children: A meta-analysis of
research findings. Journal of Special Education, 19, 493-501.
Schnittjer, C. J. (1976). Canonical correlation program: A comparative analysis
of performance. Educational & Psychological Measurement, 36, 179-182.
Schram, C. M. (1996). A meta-analysis of gender differences in applied statistics
achievement. Journal of Educational and Behavioral Statistics, 21, 55-70.
Sedlmeier, P. (1997). BasicBayes: A tutor system for simple Bayesian inference.
Behavioral Research Methods, Instruments, & Computers, 29, 328-336.
Skavaril, R. V. (1974). Computer-based instruction of introductory statistics.
Journal of Computer-Based Instruction, 1, 32-40.
Skinner, B. F. (1989). Recent issues in the analysis of behavior. Upper Saddle
River, NJ: Merrill/Prentice Hall.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome
studies. American Psychologist, 32, 752-760.
Snee, R. D. (1993). What's missing in statistical education? The American
Statistician, 47, 149-154.
Snell, J. L., & Peterson, W. P. (1992). Does the computer help us understand
statistics? In F. Gordon & S. Gordon (Eds.), Statistics for the twenty-first
century. MAA Notes No. 26 (pp. 167-188). Washington, DC: Mathematical
Association of America.
Snyder, P., & Lawson, S. (1993). Effect size estimates. The Journal of
Experimental Education, 61, 334-349.
Snyder, R. R. (1977). Computer simulations in teaching psychology. Paper
presented at the annual meeting of the American Educational Research
Association, New York. (ERIC Document Reproduction Service
No. ED143313)
Steiger, J. H. (1979). MULTICORR: A computer program for fast, accurate,
small sample testing of correlational pattern hypotheses. Educational &
Psychological Measurement, 39, 677-680.
Steinberg, E. R. (1991). Computer-assisted instruction: A synthesis of theory,
practice, and technology. Hillsdale, NJ: Erlbaum.
Stemmer, P. M., & Berger, C. F. (1985). Microcomputer programs for
educational statistics: A review of popular programs. (ERIC Document
Reproduction Service No. ED269442)
Stephenson, W. R. (1990). A study of student reaction to the use of Minitab in
an introductory statistics course. The American Statistician, 44, 231-235.
*Sterling, J., & Gray, M. W. (1991). The effect of simulation software on
students' attitudes and understanding in introductory statistics. Journal of
Computers in Mathematics & Science Teaching, 10, 51-56.
Sterrett, A., & Karian, Z. A. (1978). A laboratory for an elementary statistics
course. American Mathematical Monthly, 85, 113-116.
Stockburger, D. W. (1982). Evaluation of three simulation exercises in an
introductory statistics course. Contemporary Educational Psychology, 7,
365-370.
Strube, M. (1991). Demonstrating the influence of sample size and reliability on
study outcome. Teaching of Psychology, 18, 113-115.
Strube, M. J., & Goldstein, M. D. (1995). A computer program that
demonstrates the difference between main effects and interactions. Teaching
of Psychology, 22, 207-208.
Susman, E. B. (1998). Cooperative learning: A review of factors that increase
the effectiveness of cooperative computer-based instruction. Journal of
Educational Computing Research, 18, 303-322.
Tanis, E. A. (1973). A computer laboratory for mathematical probability and
statistics. Paper presented at the conference on Computers in the
Undergraduate Curricula, Claremont, CA. (ERIC Document Reproduction
Service No. ED079985)
Thomas, D. B. (1971). STATSIM: Exercises in statistics. Tallahassee: Florida
State University, Computer-Assisted Instruction Center. (ERIC
Document Reproduction Service No. ED055440)
Thompson, B. (1994). Guidelines for authors. Educational & Psychological
Measurement, 54, 837-847.
Thompson, B. (1999). If statistical significance tests are broken/misused, what
practices should supplement or replace them? Theory & Psychology, 9,
165-181.
Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other
issues: Strong arguments move the field. The Journal of Experimental
Education, 70, 80-93.
Thompson, B., & Frankiewicz, R. G. (1979). CANON: A computer program
which produces canonical structure and index coefficients. Educational &
Psychological Measurement, 39, 219-222.
Tippett, L. H. C. (1931). The methods of statistics. London: Williams &
Norgate.
Tubb, G. W. (1977). Current use of computer in the teaching of statistics. Paper
presented at the Computer Science and Statistics Annual Symposium,
Gaithersburg, MD. (ERIC Document Reproduction Service
No. ED141109)
Varnhagen, C. K., & Zumbo, B. D. (1990). CAI as an adjunct to teaching
introductory statistics: Affect mediates learning. Journal of Educational
Computing Research, 6, 29-40.
Velleman, P. F., & Moore, D. S. (1996). Multimedia for teaching statistics:
Promises and pitfalls. The American Statistician, 50, 217-225.
Vogel, D., & Klassen, J. (2001). Technology-supported learning: Status, issues
and trends. Journal of Computer Assisted Learning, 17, 104-114.
Walker, H. M. (1929). Studies in the history of statistical method, with special
reference to certain educational problems. Baltimore: Williams & Wilkins.
Walsh, J. F. (1993). Crafting questionnaire-style data: An SAS implementation.
Teaching of Psychology, 20, 188-190.
Walsh, J. F. (1994). One-way between subjects design: Simulated data and
analysis using SAS. Teaching of Psychology, 21, 53-55.
Wang, M. C., & Bushman, B. J. (1999). Integrating results through
meta-analytic review using SAS software. Cary, NC: SAS Institute.
*Wang, X. (1999). Effectiveness of statistical assignments in MPA education: An
experiment. Journal of Public Affairs Education, 4, 319-326.
*Ware, M. E., & Chastain, J. D. (1989). Computer-assisted statistical analysis:
A teaching innovation? Teaching of Psychology, 16, 222-227.
Warner, C. B., & Meehan, A. M. (2001). Microsoft Excel™ as a tool
for teaching basic statistics. Teaching of Psychology, 28, 295-298.
Watts, D. G. (1991). Why is introductory statistics difficult to learn and what
can we do to make it easier? The American Statistician, 45, 290-291.
Wegman, E. J. (1974). Computer graphics in undergraduate statistics.
International Journal of Mathematical Education in Science & Technology,
5, 15-23.
West, R. W., Ogden, R. T., & Rossini, A. J. (1998). Statistical tools on the
World Wide Web. The American Statistician, 52, 257-262.
White, A. P. (1995). An expert system for choosing statistical tests. The New
Review of Applied Expert Systems, 1, 111-121.
*White, S. L. (1985). Teaching introductory statistics: Hand calculations
versus computer data analysis. Unpublished master's thesis, California
State University.
Wilkinson, L., & American Psychological Association Task Force on Statistical
Inference. (1999). Statistical methods in psychology journals: Guidelines
and explanations. American Psychologist, 54, 594-604.
Willett, J. B., Yamashita, J. J., & Anderson, R. D. (1983). A meta-analysis of
instructional systems applied in science teaching. Journal of Research in
Science Teaching, 20, 405-417.
Wimberley, R. C. (1978). Comparing package programs for factor analysis.
Educational & Psychological Measurement, 38, 143-145.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis.
Beverly Hills, CA: Sage.
Wurster, C. (2001). Computers: An illustrated history. New York: Taschen.
Yates, F., & Cochran, W. G. (1938). The analysis of groups of experiments.
Journal of Agricultural Science, 28, 556-580.