The Use of Randomization in Educational Research and Evaluation:
A Critical Analysis of Underlying Assumptions
ABSTRACT
This paper considers the underlying assumptions related to the use of random assignment in
educational research and evaluation; more specifically, the ability of random assignment to
create similar comparison groups for the purpose of determining the effectiveness of educational
programs. In theory, randomly assigning individuals to comparison groups is considered the best available method for maximizing the likelihood that groups used in this type of research will be similar. However, in educational research designed to identify proven best practices, random assignment of individuals is rarely possible; other methods, including random assignment of intact units and non-random selection techniques, are often used. Using a database simulation, this study set out to determine the degree to which various selection methods might be effective at creating comparable groups. Given the complex dynamics of the teaching and learning process and the abundance of potentially confounding variables, it seems likely that comparison groups will always be dissimilar to some degree. While random assignment of individuals performed as expected when controlling for a single extraneous factor, comparison groups created in this manner remain extremely likely to differ when multiple confounding variables are present. Based on the results of this study, random assignment of intact units is not an acceptable
alternative to random assignment of individuals. In fact, when using intact units, non-random
selection techniques were considerably more effective at controlling for potentially confounding
influences than randomly assigning existing classrooms to treatment and control groups.
When attempting to determine which instructional approaches work in education, the
use of an experimental design with random assignment is considered by many to be the strongest
method for determining the net impact of a specific treatment or intervention (Dennis, 1994;
Gay, 1996; Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002; Slavin, 2007;
USDOE, 2003; Winston, 1988). As a result, in a funding priority proposed by the U.S.
Department of Education (2003), the government stated:
"Evaluation methods using an experimental design are best for determining project effectiveness. Thus, the project should use an experimental design under which participants—e.g., students, teachers, classrooms, or schools—are randomly assigned to participate in the project activities being evaluated or to a control group that does not participate in the project activities being evaluated." (p. 62446)
The reason for using random assignment in evaluation research is to eliminate the
confounding effect of selection bias that occurs when experimental treatment and control groups
are not similar (Slavin, 2007). Without randomization, the researcher would be obligated to
account for all threats to validity such as maturation, selection, experimental mortality, the Hawthorne effect, and the presence of any history threats that might affect the observed outcome
(Campbell & Stanley, 1963; Gay, 1996; Shadish, Cook, & Campbell, 2002). While proponents
of experimental methods believe that random assignment reduces the chance that systematic
differences will exist between groups, there are those who question the feasibility and usefulness
of random assignment to accomplish this task (Leaf, 1993). It is possible that incorrect
assumptions about random assignment’s ability to control for all potential threats have resulted
in an unwarranted reliance on experiments using random assignment to revolutionize educational
practice (Schon, 1983; Teddlie & Tashakkori, 2003; Yanchar, Gantt, & Clay, 2005).
Given the trend in the philosophy of science that persuasively argues that all methods are limited and so must be critically examined for their appropriateness, it is important to critically
analyze the underlying assumptions related to the use of randomization in educational research
(Burgess-Limerick, Abernathy, & Limerick, 1994). Either the use of randomization is best practice or it is not. If a well-implemented experiment using random assignment is the best method for determining "proven practices," and thus what teachers should use in their classrooms, we need to be sure that relying on random assignment to create comparably similar experimental groups is a reasonable assumption.
Randomization Assumptions
Some researchers advocating the use of random controlled trials in education
acknowledge that it would be incorrect to imply or claim that randomization will adequately
control for all threats to validity, or that there is a single research method that can and should be
used in all cases (Abel & Koch, 1997; Feinstein, 1997; Rossi, Freeman, & Lipsey, 1999;
Shadish, Cook, & Campbell, 2002). Still, the assumption is made that experimental methods
utilizing random assignment are best for determining program effectiveness.
"Such examples of proven effectiveness are rare – in fact, nonexistent in many areas of education – because randomized trials are uncommon in educational research and evaluation. Meanwhile, the study designs that are often used (including pre-post designs and most comparison-group designs) have been shown in careful empirical investigations to produce erroneous conclusions in many cases." (Paige, 2002, p. 1)
The assertion that non-random trials are less reliable than randomized trials is based on
the belief that experiments using random assignment will provide accurate, strong evidence of
cause and effect relationships; thus, non-random trial results that do not replicate random trial results must be problematic in some way (Shadish & Ragsdale, 1996; USDOE, 2003). Lack of treatment and control group comparability is the main argument against the acceptance of non-random experimental studies (Abel & Koch, 1997). From a traditional methodological
standpoint, the use of comparison groups that are dissimilar increases the likelihood of a biased
or erroneous research result.
There are two aspects to randomization that must be considered: random selection and
random assignment. While random selection is seldom possible in experiments, random
assignment is usually possible at some level and is believed to be crucial. The theoretical basis
for the assumption that randomization will create equivalent comparison groups relies on the
assertion that when selecting a sample from a given population, if the sample is large enough and
drawn at random, the sample will likely resemble the population from which it is drawn (Krejcie
& Morgan, 1970). Thus two groups created using random selection are considered to be suitable
for the purpose of making generalizable comparisons. However, generalizations based on the results of an experiment are only valid for the population from which the individual participants have been randomly sampled (Gay, 1996).
Unlike random sampling, random assignment of participants to treatment and control
groups is believed to solve the problem of selection bias, thus improving a study’s ability to
establish cause and effect relationships (Gay, 1996; Slavin, 2007). Random assignment of
individuals to treatment and control groups is believed to maximize the probability that
confounding extraneous variables will affect both comparison groups in a similar fashion. In
fact, random assignment is considered to be the only research technique able to control for
unknown extraneous variables as potentially confounding influences (Gay, 1996; Johnson & Christensen, 2004; Shadish, Cook, & Campbell, 2002; USDOE, 2003). In the case of
experimental research, if two groups are selected at random from a population and randomly
assigned to treatment and control groups, it is assumed that the only important difference
between them will be the intervention or educational practice being tested.
Given the volume of literature supporting this supposition, it may well be true. If done properly, an experiment using randomly sampled and randomly assigned groups of individuals might be helpful in identifying best practices. Unfortunately, several questions about the use of randomization in education exist, and some of the assumptions supporting the use of these practices in educational research remain largely unsubstantiated; specifically, the claim that random assignment of individuals, classrooms, and schools can be used systematically to create similar comparison groups that are equally likely to be adversely affected by confounding influences. A critical analysis of these underlying assumptions is needed; more specifically, we need to think critically about the methods used to determine program effectiveness, or "what works," and whether they are reasonable (Burgess-Limerick, Abernathy, & Limerick, 1994; Yanchar, Gantt, & Clay, 2005; Yanchar & Williams, 2006).
Control Challenged
The use of control and treatment group comparisons is fundamental to the experimental
method’s ability to determine program effectiveness. Ideally the two comparison groups will be
equal in all aspects except for the independent variable which is manipulated by amount, type,
absence, or presence (Gay, 1996; Johnson & Christensen, 2004; Shadish, Cook, & Campbell, 2002). The belief that groups will be similar if created using randomization relies on the premise that certain attributes and aspects of human behavior regulate learning (e.g., learning ability, natural intelligence, interest, and student effort). These personality traits or attributes are considered to vary by degree, are somewhat stable, and are normally distributed in the population (Johnson & Christensen, 2004; Linn & Miller, 2005). In other words, students can be expected to
react predictably and somewhat consistently when introduced to an effective educational
program or intervention. In addition, the theory that randomization can control for extraneous,
potentially confounding variables relies on the belief that all the external influences that might
affect the learning of an individual each vary by degree, are equally likely to occur within the
population, and can be reduced to a single comprehensive factor or influence. Statistical theory
suggests that only 5% of the comparison groups created through randomization will differ
significantly on any single factor when H₀ is true and α = .05 (Rossi, Freeman, & Lipsey, 1999;
Shadish, Cook, & Campbell, 2002). However, it is possible that the true nature of the teaching and learning process cannot be understood in this way (Brentano, 1973). If the influences that affect student learning are independent of one another (i.e., they exist and affect learning independently of other influences) and cannot be consistently combined into a stable, single predictive influence, then we must consider additional evidence regarding the probability that the process of randomization will produce suitable groups for experimental comparisons.
It is a commonly held belief that there are many factors that have the potential to affect
learning outcomes (Gay, 1996; Johnson & Christensen, 2004; Rossi, Freeman, & Lipsey, 1999;
Shadish, Cook, & Campbell, 2002). If these influences are by nature independent of one another,
then it is likely that randomization will be far less effective at controlling for these variables than
might be expected. The mathematical probability that comparison groups will differ on one or
more factors is based on the number of independent factors involved and the probability that they
will occur. In fact, the probability that at least one of fourteen independent factors will differ significantly in any given two-group comparison is approximately 51%.¹ Given the existence of numerous independent and potentially confounding variables in the teaching and learning process, if one hundred or more variables can be expected to influence the result, the probability of getting a significant group difference on at least one potentially confounding variable is greater than 99%. It should be noted that while the formula mentioned here does not take account of sample size, the samples used in this study were within the limits of randomization acceptability set out in Krejcie and Morgan's (1970) paper, whose formula does take account of different population sizes.

¹ P(at least one significant difference) = 1 − (1 − α)^k, where k is the number of potentially confounding independent factors and α = .05.
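To make the footnoted calculation concrete, the following short Python sketch (added here for illustration; not part of the original study) evaluates the formula for one, fourteen, and one hundred independent factors:

```python
# Probability that at least one of k independent factors differs
# significantly between two randomly assigned groups, per the footnote.
def p_at_least_one(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 14, 100):
    print(f"k = {k:3d}: P = {p_at_least_one(k):.1%}")
# k =   1: P = 5.0%
# k =  14: P = 51.2%
# k = 100: P = 99.4%
```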
Both the theories regarding randomization and the mathematical probability calculations governing the occurrence of independent events are well established; the conclusion one draws may depend on how one sees the teaching and learning process: as a predictable and controllable process in which all the factors that affect learning can be reduced to a single, somewhat stable influence, or as a complex and dynamic process that varies constantly and depends on a number of factors, influences, and interactions. If the teaching and learning process is more complex and dynamic than predictable and controllable, then creating equal comparison groups, while theoretically possible, is highly unlikely; it is more likely that group comparability cannot be achieved in research dealing with human subjects and cannot be verified given the plethora of influential extraneous variables to account for. In practice, many of the extraneous variables that have the potential to affect achievement outcomes are typically not measured, difficult to measure, or impossible to measure. Some examples include peer influence, student interest, motivation, health, eyesight, hearing, home life, personality, learning style preference, effort expenditure, learner intent, academic potential, test anxiety, and the amount of sleep students get (Gay, 1996; Johnson & Christensen, 2004; Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002). The best any researcher might hope for is that the groups employed will be similar to within a tolerable degree of difference, because total control of extraneous variables is impossible.
Research Focus
In practice, it is impossible to verify whether randomization accomplishes the task of creating suitable comparison groups. The influences of all potentially confounding variables are either not measured or cannot be measured, most notably the unknown influences for which random assignment is expected to control. In order to explore the claim that randomization is the best method for creating comparison groups, a database simulation was created. This simulation was used to assess whether, and how well, random assignment might be expected to accomplish the task of ensuring comparison group equivalence or similarity. The use of random assignment (i.e., randomly assigning individuals and existing classrooms) to create comparison groups was compared to quasi-experimental methods (i.e., group selection techniques that do not use random assignment).
Method
A database simulation was created consisting of over 10,000 fictitious students placed in 252 schools and 490 different classrooms. Class sizes of between 12 and 32 were determined randomly. Demographic factors (i.e., race, gender, location, and socioeconomic status [SES]) were randomly or purposively assigned to each of these hypothetical students, simulating the distribution of demographic characteristics for students in the state of Indiana. Some
individual attributes (e.g., student ability) are believed to be normally distributed in the
population independent of other demographic factors. Student ability was randomly assigned a
value between 80 and 130 based on a normal distribution within the population (i.e., more
students around the average than toward the extremes). Developmental factors which often affect
learning (e.g., student age) are not normally distributed but more systematically distributed
across the population. Student age was randomly assigned to fall within a range of one year and
two months. This range in ages was thought to be typical of most public school classrooms.
In order to explore how well random assignment controls for extraneous influences, each
student was randomly assigned a student effort factor, teacher/student interaction factor, friend
influence factor, home influence factor, and two other influence factors representing unknown
influences. It is important to note that these factors are really only labels and could represent any number of different extraneous influences. Labels were attached simply for convenience. For the purpose of this analysis it is supposed that these influences could be, and were, all measured accurately, that they remain somewhat constant, that these variables are accurately distributed in the simulated population, and that they are independent of the other factors under consideration. Each school and classroom was also randomly assigned a value representing the influence it might have. This was based on the assumption that a school environment, for example, will have a similar effect on each of the students in the class or school. In practice, this may or may not be true. The values assigned to each of these factors ranged from positive 10 to negative 10, from having an extremely positive influence to a very negative influence academically. For the purposes of this analysis, these factors are assumed to exist as independent influences; that is, for example, the expertise of an individual teacher is independent of a child's ability, his or her home life, SES, school location, and other factors the teacher has no control over.
In all, fourteen factors were used in the simulation: gender, SES, location, race, student ability, age, school environment, classroom environment, student effort, teacher/student interaction, friend influence, home influence, and two other factors representing unknown influences. Again, the extraneous influences mentioned are for the most part simply labels that might represent any number of potentially confounding variables. The creation of a simulated database was required because most such variables typically are not or cannot be measured in practice. The purpose of the simulation is to determine whether random assignment can reasonably be expected to control for extraneous variables that are not or cannot be measured.
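As an illustration of how such a database might be constructed, the sketch below generates a comparable dataset. All column names, distributions, and proportions are assumptions made for illustration (school nesting is omitted for brevity); the paper does not specify the simulation's exact construction:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
N_STUDENTS, N_CLASSES = 10_000, 490

students = pd.DataFrame({
    # Demographic factors, with proportions standing in for a state profile.
    "gender": rng.choice(["F", "M"], N_STUDENTS),
    "ses": rng.choice(["low", "mid", "high"], N_STUDENTS, p=[0.3, 0.5, 0.2]),
    "location": rng.choice(["urban", "suburban", "rural"], N_STUDENTS),
    "race": rng.choice(["White", "Black", "Hispanic", "Asian", "Other"],
                       N_STUDENTS, p=[0.78, 0.10, 0.07, 0.01, 0.04]),
    # Ability: normally distributed, clipped to the 80-130 range.
    "ability": np.clip(rng.normal(105, 10, N_STUDENTS), 80, 130).round(),
    # Age in months, spanning one year and two months.
    "age_months": rng.integers(72, 86, N_STUDENTS),
    # Simplified class assignment (the study drew class sizes of 12 to 32).
    "classroom": rng.integers(0, N_CLASSES, N_STUDENTS),
})

# Six per-student influence factors, each ranging from -10 to +10.
for factor in ["effort", "teacher_interaction", "friend",
               "home", "unknown1", "unknown2"]:
    students[factor] = rng.integers(-10, 11, N_STUDENTS)

# One influence value per intact classroom, shared by its students.
class_effect = rng.integers(-10, 11, N_CLASSES)
students["class_factor"] = class_effect[students["classroom"]]
```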
Research Procedures
Comparability estimates. The fourteen factors recorded in the database were used to
estimate how equivalent comparison groups might be when random assignment and other
methods of comparison group creation are used. The similarity of the selected groups was
calculated based on the average effect of each numerical factor (i.e., one-way ANOVA) and the distribution similarity of categorical factors (i.e., chi-square analysis). Several trials of each selection technique were conducted and compared, including random assignment of individual students, random assignment of existing classrooms, random selection of schools with a control and a treatment group in each school, and non-random assignment of schools and classrooms to treatment and control groups based on matching location, school, and classroom similarities.
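A minimal sketch of this comparability check, assuming the simulated columns from the sketch above, might look as follows (scipy's f_oneway and chi2_contingency implement the two tests):

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway, chi2_contingency

NUMERIC = ["ability", "age_months", "class_factor", "effort",
           "teacher_interaction", "friend", "home", "unknown1", "unknown2"]
CATEGORICAL = ["gender", "ses", "location", "race"]

def differing_factors(g1: pd.DataFrame, g2: pd.DataFrame, alpha=0.05) -> list:
    """Return the factors on which two comparison groups differ significantly."""
    flagged = []
    for col in NUMERIC:  # one-way ANOVA on each numeric factor
        if f_oneway(g1[col], g2[col]).pvalue < alpha:
            flagged.append(col)
    for col in CATEGORICAL:  # chi-square on each categorical distribution
        table = pd.crosstab(
            np.concatenate([g1[col], g2[col]]),
            np.array(["g1"] * len(g1) + ["g2"] * len(g2)))
        if chi2_contingency(table)[1] < alpha:
            flagged.append(col)
    return flagged
```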
Sample size matters. Statistically speaking, with a population of 10,000 students, a
sample size of 370 students is suggested (Krejcie & Morgan, 1970). With this many students
sampled randomly from the population, the sample groups are likely to resemble the population
95% of the time (based on an α = .05). In these trials, care was taken to ensure that the selection
techniques used resulted in groups with greater than 370 students in each group.
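For reference, Krejcie and Morgan's (1970) sample-size formula, evaluated with the conventional values (chi-square table value 3.841 for one degree of freedom at α = .05, P = .5, and d = .05), reproduces the suggested figure; the small sketch below is illustrative, not part of the study:

```python
def krejcie_morgan(N: int, chi2: float = 3.841,
                   P: float = 0.5, d: float = 0.05) -> int:
    """Required sample size s for a population of size N."""
    return round(chi2 * N * P * (1 - P) /
                 (d ** 2 * (N - 1) + chi2 * P * (1 - P)))

print(krejcie_morgan(10_000))  # 370
```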
Sampling. When assigning individual students to comparison groups, each student was
randomly assigned using a random number generator to one of 25 groups. Each group contained
more than 370 subjects, closer to 400 on average. Given 25 groups, there are 300 possible
comparison group pairings. Assignment was done without replacement. Using a pairwise
analysis, each of the 300 pairs was tested for comparability on each of the factors separately and
collectively. Separately, the number of times a pairing differed significantly on a specific
variable was counted and the frequency recorded. Collectively, the number of times a specific
pairing differed significantly on one or more factors was also counted and the frequency
recorded. The same was done for random assignment of intact classrooms. Each classroom was
Randomization in Educational Research 9
randomly assigned to one of 25 groups. Each of the 300 possible pairs was tested for
comparability.
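The pairwise procedure could be sketched as follows, reusing the hypothetical differing_factors helper from above; this is an illustrative reconstruction, not the study's actual code:

```python
from itertools import combinations

def run_trial(students, rng, n_groups: int = 25):
    """Randomly assign individuals to 25 groups and test all C(25,2) = 300 pairs."""
    ids = rng.permutation(len(students)) % n_groups  # assignment without replacement
    groups = [students[ids == g] for g in range(n_groups)]
    per_factor, problematic = {}, 0
    for g1, g2 in combinations(groups, 2):  # 300 pairings
        flagged = differing_factors(g1, g2)
        for col in flagged:
            per_factor[col] = per_factor.get(col, 0) + 1
        problematic += bool(flagged)  # pairs differing on one or more factors
    return per_factor, problematic
```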
In practice it is rare for researchers to use random assignment of individual students, for a variety of practical reasons; random assignment of existing classrooms is far more common (Johnson & Christensen, 2004; Shadish, Cook, & Campbell, 2002). Another common research practice is to use various non-random selection methods to create comparison groups. To test some of these selection techniques, twenty trials were completed for each of the following: randomly selected groups of schools with a control and a treatment group in each school; comparison groups purposively selected based on similar location, school, and classroom attributes; and randomly selected pairs of existing classrooms.
Results
Random Assignment of Individual Students
In order to test the assumption that random assignment can ensure comparison group
similarity, each of the students in the database was randomly assigned to a group based on the
previously described method. Three trials were conducted with 300 comparisons in each trial.
The results are displayed in Table 1. Random assignment seems to control for individual factors as predicted; over an infinite number of trials it could be expected that approximately 95% of the time the two groups will be similar on any specific single factor. Notably, factors such as race may have had more occurrences of significant difference due to the existence of small subgroups within the population distribution. For example, students of Asian descent are few in the overall population relative to Caucasian or African American students; they are thus more likely to be underrepresented in any given comparison group sample, resulting in a significant difference being identified.
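This small-subgroup effect can be probed directly by simulation. The snippet below, which is hypothetical and uses illustrative proportions rather than the study's actual ones, repeatedly draws pairs of random groups from a population containing a rare category and records how often the chi-square test flags a difference:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(7)
# Population with a small subgroup (about 1% "Asian"), as discussed above.
races = ("Caucasian", "AfricanAm", "Asian")
population = rng.choice(races, 10_000, p=[0.80, 0.19, 0.01])

flags = 0
for _ in range(1_000):
    g1, g2 = rng.choice(population, 400), rng.choice(population, 400)
    table = np.array([[np.sum(g == r) for r in races] for g in (g1, g2)])
    table = table[:, table.sum(axis=0) > 0]  # drop categories absent from both
    flags += chi2_contingency(table)[1] < 0.05
print(f"flagged in {flags / 1_000:.1%} of trials")
```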
Also as might be expected, given 14 different factors, the number of comparison groups
identified as problematic (i.e., significantly different on one or more factors than could be
expected by chance) was much larger than for any single factor. Given 14 factors, the probability
that one or more of the individual factors would differ significantly in any given pairing is
expected to be 51%; based on the results of this simulation, averaging across the three trials, approximately 45% of the potential pairs were deemed problematic in terms of comparability. In
an infinite number of trials it is reasonable to expect that approximately 51% of the time any two
groups will differ significantly on one or more of the 14 factors just as predicted.
Table 1
Rate of Significant Difference by Factor in Randomly Assigned Groups of Individuals

Separate Analysis of             Trial 1*        Trial 2         Trial 3     Average
Individual Factors             Freq.     %     Freq.     %     Freq.     %      %
Gender                            3     1.0       0     0.0       0     0.0    0.3
Social Economic Status            0     0.0       0     0.0       0     0.0    0.0
Location                          0     0.0       1     0.3       0     0.0    0.1
Race                             43    14.3      17     5.7      23     7.7    9.2
Age                               4     1.3      22     7.3       6     2.0    3.6
Student Ability                  13     4.3      10     3.3       2     0.7    2.8
School factor                    17     5.7      17     5.7      22     7.3    5.1
Classroom factor                 18     6.0       5     1.7      30    10.0    5.9
Student Effort                   37    12.3      24     8.0      17     5.7    8.7
Student/Teacher Interaction      20     6.7      34    11.3       1     0.3    6.1
Friend factor                    13     4.3      29     9.7      12     4.0    6.0
Home factor                      19     6.3       5     1.7      18     6.0    4.7
Unknown factor 1                 17     5.7      10     3.3      14     4.7    4.6
Unknown factor 2                 16     5.3       8     2.7       4     1.3    3.1
Collective Analysis
One or more factors different   152    50.7     132    44.0     122    40.7   45.1
* 300 individual group comparisons considered in each trial
Looking at the data from Trial 1 as an example, the majority of the 14 individual factors fell within the range of expected difference: of the 300 possible pairs, a comparison group differed significantly on any given single factor only about 5% of the time. However, in any given comparison group pairing, while the groups may have been adequately similar on one factor, other aspects of that pairing were often significantly dissimilar; in many instances the groups were similar on several factors but not on all of them. In this example, over half of the 300 possible comparison groups differed significantly on one or more of the 14 factors.
Random Assignment of Intact Classrooms
Because random assignment of individual students is rarely possible (Shadish, Cook, & Campbell, 2002), many studies use existing classrooms or schools when creating comparison groups. The assumption is that random assignment will work equally well for creating comparison groups from intact units (e.g., existing classrooms or schools) as it does for a sample of individuals (U.S. Department of Education, 2003). To test this assumption, three trials were conducted with 300 comparisons in each trial. Table 2 shows the results of these comparisons.
Table 2
Rate of Significant Difference by Factor in Randomly Assigned Groups of Intact Classrooms

Separate Analysis of             Trial 1*        Trial 2         Trial 3     Average
Individual Factors             Freq.     %     Freq.     %     Freq.     %      %
Gender                            6     2.0       8     2.7       2     0.7    1.7
Social Economic Status          104    34.7      50    16.7      31    10.3   20.6
Location                        247    82.3     267    89.0     253    84.3   85.2
Race                             64    21.3      60    20.0      69    23.0   21.4
Age                              30    10.0       1     0.3       5     1.7    4.0
Student Ability                  55    18.3      54    18.0      81    27.0   21.1
School factor                   202    67.3     215    71.7     207    69.0   69.3
Classroom factor                215    71.7     219    73.0     211    70.3   71.7
Student Effort                    5     1.7      10     3.3      16     5.3    3.4
Student/Teacher Interaction     163    54.3     166    55.3     172    57.3   55.7
Friend factor                    42    14.0      52    17.3      34    11.3   14.2
Home factor                       6     2.0      22     7.3      36    12.0    7.1
Unknown factor 1                 24     8.0      20     6.7       7     2.3    5.7
Unknown factor 2                  5     1.7       5     1.7      14     4.7    2.7
Collective Analysis
One or more factors different   298    99.3     298    99.3     299    99.7   99.4
* 300 individual group comparisons considered in each trial
Based on these results it is unlikely that randomly assigning existing classrooms to treatment and control groups will create suitable comparison groups for individual analysis. With the exception of factors like gender and age, which are more likely to be similarly distributed within intact units, group differences were evident for a considerable number of individual factors. Differences in the distribution of these influences are likely magnified by the lack of overall homogeneity in classrooms, resulting in a considerable number of problematic cases. In each of the trials, over 99% of the comparison groups were found to be significantly different on one or more of the fourteen factors used in this simulation. It seems that the conditions that govern and control the effectiveness of randomization when using individuals do not hold true for random assignment of intact units. The use of this technique to produce similar groups for comparison purposes is extremely likely to produce unacceptable results. Clearly one cannot sample schools or classrooms and then analyze the individuals within them as one might when using random assignment of individuals; if intact units are sampled, then it is the units that must be representative, requiring a much larger sample than is typically used. Randomly assigning intact units (e.g., classrooms or schools) to treatment and control groups will likely produce comparison groups that are significantly different from one another on one or more factors virtually every time.
Non-random Comparison Group Selection
A common quasi-experimental practice some studies utilize when creating comparison
groups is to take two classrooms within the same school and assign them as a control and
treatment group pair. This technique is used to control for location and school influence. A notable limitation of this technique is that it excludes schools with only one class at that specific grade level. To overcome this limitation, some studies use purposive or stratified matching techniques to create comparison groups. It is believed that this method of group selection will
control for a few known factors that might reasonably be expected to affect the result. In this
case the selection was done based on location, school, and classroom similarities. The results of
20 group comparisons using these techniques are displayed in Table 3 along with 20 group
comparisons of randomly assigned intact units. Each of these comparisons involves the use of
existing classrooms.
Based on the results of these trials, these selection techniques do seem to control for the targeted extraneous variables. Both non-random techniques for comparison group selection performed better than randomly assigning existing classrooms to treatment and control groups. Far fewer individual factors were problematic when the researcher attempted to control for known factors that might result in comparison group difference. In addition, the magnitude of the differences in individual factors was greater when existing classes were randomly assigned. When using intact units, it appears more likely that suitable comparison groups will be produced using non-random quasi-experimental selection techniques than random selection techniques.
Table 3
Rate of Significant Difference by Factor in Groups of Intact Classrooms Based on Selection Method

Separate Analysis of          In-school Pairings  Purposive Pairings  Random Pairings
Individual Factors              Freq.      %        Freq.      %       Freq.      %
Gender                             3     15.0          0      0.0         2     10.0
Social Economic Status             1      5.0          4     20.0         9     45.0
Location                           0      0.0          0      0.0        19     95.0
Race                               0      0.0          6     30.0         8     40.0
Age                                0      0.0          0      0.0         0      0.0
Student Ability                    2     10.0          0      0.0         5     25.0
School factor                      0      0.0          3     15.0        11     55.0
Classroom factor                  16     80.0          8     40.0        15     75.0
Student Effort                     0      0.0          2     10.0         6     30.0
Student/Teacher Interaction       12     60.0          2     10.0        14     70.0
Friend factor                      0      0.0          2     10.0         0      0.0
Home factor                        0      0.0          0      0.0         1      5.0
Unknown factor 1                   1      5.0          1      5.0         1      5.0
Unknown factor 2                   0      0.0          4     20.0         1      5.0
Collective Analysis
One or more factors different     18     90.0         17     85.0        20    100.0
* 20 individual group comparisons considered for each selection method
Discussion and Conclusions
A fundamental basis for determining effective practices using experimental studies
depends on the researcher’s ability to create similar comparison groups. Most research textbooks
acknowledge that random assignment does not guarantee the creation of similar comparison
groups; the literature suggests only that the use of random assignment maximizes the probability that systematic differences will not exist, and that if sufficient sample sizes are drawn, the likelihood that comparison groups will be significantly different is small (Gay, 1996; Rossi,
Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002). However, a reliance on
randomization to produce equivalent comparison groups may be unwarranted.
Based on the results of this simulation, random assignment of individuals to treatment and control groups may very well be the best method one might use to maximize the chances of comparison group similarity if you are primarily concerned with a single potentially confounding influence; however, there appears to be an extremely high probability that random assignment
will not do an adequate job of controlling for multiple extraneous variables. Given the complex
nature of the teaching and learning process and the number of potentially confounding variables
that may affect the results of any experiment (DBRC, 2003; Patton, 2007), the chance that
comparison groups will be different in some way is much larger than the 5% prediction (when α
= .05). In fact, while the use of random assignment to assign individual students to control and
treatment groups did perform the best of any of the techniques simulated in this study, given the
fourteen factors used in this simulation, about 45% of the potential comparison groups were
found to be significantly different on one or more factors. Given the number of factors likely to
affect learning, the question is not whether randomization will produce equal groups, but
whether the differences that inevitably exist in randomly assigned comparison groups are severe
enough to affect the results.
While random assignment of individual students to treatment and control groups seems to
be the best method for selecting comparison groups, it is rarely possible (Shadish, Cook, &
Campbell, 2002). The realities of conducting randomized controlled trials make the use of this
technique impractical when attempting to determine the general effectiveness of educational
practices. Moreover, the suggestion that using random assignment of intact units will be an
acceptable alternative to randomly assigning individuals appears to be unfounded. Non-random
assignment techniques did considerably better than random assignment methods in this respect.
Given that random assignment of individuals randomly sampled from the entire population is impractical and often impossible, and that random assignment of intact groups is an inadequate alternative, reliance on random assignment to adequately control for multiple extraneous, potentially confounding variables seems to be misguided, just as Leaf (1993) suggested.
We conclude from the results of these simulations that given the number of potentially
confounding variables that affect learning, random assignment of individuals to comparison
groups will not result in the creation of equivalent groups; this technique does, however, seem to function as expected when controlling for individual influences. Furthermore, we conclude that
the practice of randomly assigning existing classrooms is not a reasonable alternative to
randomly assigning individuals. Based on the results of these simulations, the use of non-random
assignment with careful, systematic comparison group creation efforts that attempt to control for
known extraneous variables is likely to produce better comparison groups than randomly
assigning intact units to treatment and control groups. And while there is a chance that the researcher may systematically or unknowingly cause groups to be different, this seems to be an inevitable outcome with or without random assignment. In real-world applications of these
methods, we will simply never know how poorly random assignment functions as a control for
potentially confounding variables. Thus the assumption that random assignment adequately
controls for potentially confounding variables when attempting to determine program
effectiveness remains unsubstantiated, especially when assigning intact units.
One might reasonably conclude from this study that the assumption that proven program
effectiveness can be determined using randomized experiments is faulty, especially if intact units
are used. This type of experimentation provides solutions that are too simplistic given the
complexity of most educational situations (Patton, 2007). On the issue of comparability, randomization likely will not produce adequately similar treatment and control groups. Results obtained from evaluation research utilizing these methods will always be bound to a specific local context and an abundance of potentially confounding influences and interactions (DBRC, 2003). If assumptions about the use of randomization and this technique's ability to create
comparable groups are fundamentally incorrect, then the results of these studies will always be
inconsistent and potentially misleading. This may be the reason why random controlled trials in
education to date have failed to revolutionize the effectiveness of the education process to any
great extent (Branson & Hirumi, 1994; DBRC, 2003). In practice, after promising practices have been identified, repeated trials using quasi-experimental (i.e., non-random) comparison group selection techniques, along with a wide variety of other qualitative and quantitative methods, conducted in many settings and at many times by multiple researchers, may be the most reasonable method available for determining "what works" in education.
REFERENCES
Abel, U., & Koch, A. (1997). The mythology of randomization. Conference proceedings of the
international conference of nonrandomized comparative clinical studies in Heidelberg,
April 10-11. Retrieved September 12, 2006, from http://www.symosion.com/nrccs/abel.htm
Branson, R. & Hirumi, A. (1994). Designing the future: The Florida Schoolyear 2000 Initiative.
In G. Kearsley & W. Lynch (Eds.), Educational technology: Leadership perspectives (pp.
91-112). Englewood Cliffs, NJ: Educational Technology Publications.
Brentano, F. (1973). Psychology from an empirical standpoint. New York: Humanities Press.
Burgess-Limerick, R., Abernathy, B., & Limerick, B. (1994). Identification of underlying
assumptions is an integral part of research: An example from motor control. Theory &
Psychology, 4, 139-146.
Campbell, D.T., & Stanley, J. (1963). Experimental and Quasi-experimental Designs for
Research on Teaching. Boston, MA: Houghton Mifflin.
Dennis, M. L. (1994). Ethical and practical randomized field experiments. In J. Wholey, H.
Hatry, & K. Newcomer (Eds.), Handbook of practical program evaluation. San
Francisco, CA: Jossey-Bass Publications.
Design-Based Research Collective. (2003). Design-based research: An emerging paradigm for
educational inquiry. Educational Researcher, 32(1), 5-8.
Feinstein, A. (1997). Problems of randomized trials. Conference proceedings of the international
conference of nonrandomized comparative clinical studies in Heidelberg, April 10-11.
Retrieved September 12, 2006, from http://www.symosion.com/nrccs/feinstein.htm
Gay, L. R. (1996). Educational research: Competencies for analysis and application (5th ed.).
Englewood Cliffs, NJ: Prentice-Hall.
Johnson, B., & Christensen, L. (2004). Educational research: Qualitative, quantitative, and
mixed methods approaches. (2nd ed.). Boston: Pearson Education.
Koch, S. (1999). Psychology in human context: Essays in dissidence and reconstruction.
Chicago: The University of Chicago Press.
Krejcie, R.V., & Morgan, D.W. (1970). Determining sample size for research activities.
Educational and Psychological Measurement, 30, 607-610.
Leaf, R. C. (1993). Control, Volition, and the "Experimental Method." New Ideas in
Psychology, 11 (1), 3-33.
Linn, R. L., & Miller, M. D. (2005). Measurement and assessment in teaching (9th ed.). Upper
Saddle River, NJ: Pearson Education, Inc.
Paige, R. (2002). Rigorous Evidence: The Key To Progress in Education? Lessons from
Medicine, Welfare, and Other Fields. Policy forum of the Council for Excellence in
Government. Retrieved November 11, 2006, from
http://www.excelgov.org/index.php?keyword=a4339246667652
Patton, M.Q. (2007, Nov). Facilitating Fast-paced Learning: Developmental evaluation for
complex emergent innovations. Conference presentation at the American Evaluation
Association, Baltimore, MD.
Rossi, P. H., Freeman, H. E., & Lipsey, M. W. (1999). Evaluation: A systematic approach
(6th ed.). Newbury Park, CA: Sage.
Schon, D. (1983). The Reflective Practitioner: How Professionals Think in Action. New York,
NY: Basic Books.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for
generalized causal inference. Boston, MA: Houghton Mifflin Company.
Shadish, W., & Ragsdale, K. (1996). Random versus nonrandom assignment in controlled
experiments: Do you get the same answers? Journal of Consulting and Clinical
Psychology, 64(6), 1290-1305.
Slavin, R. (2007). Educational Research in an age of accountability. Boston, MA: Pearson
Education Inc.
Teddlie, C., & Tashakkori, A. (2003). Major Issues and Controversies in the Use of Mixed
Methods in the Social and Behavioral Sciences. In Tashakkori and Teddlie (Eds.)
Handbook of mixed methods in social and behavioral research. Thousand Oaks, CA:
Sage.
U.S. Department of Education (2003). Scientifically Based Evaluation Methods: Notice of
proposed priority. Federal Register, Vol. 68, No. 213, p. 62446. Retrieved November 11,
2006, from http://www.ed.gov/legislation/FedRegister/proprule/2003-4/110403b.pdf
USDOE, Institute of Education Sciences, National Center for Education Evaluation and Regional
Assistance (2003). Identifying and implementing educational practices supported by
rigorous evidence: A user friendly guide. Retrieved November 11, 2006, from
http://www.ed.gov/rschstat/research/pubs/rigorousevid/rigorousevid.pdf
Winston, A. S. (1988). Cause and experiment in introductory psychology: An analysis of R. S.
Woodworth’s textbooks. Teaching of Psychology, 15, 79-83.
Yanchar, S. C., Gantt, E. E., & Clay, S. L. (2005). On the nature of a critical methodology.
Theory and Psychology, 15, 27-50.
Yanchar, S. C., & Williams, D. D. (2006). Reconsidering the compatibility thesis and
eclecticism: Five proposed guidelines for method use. Educational Researcher, 35(9), 3-12.