The Use of Randomization in Educational Research and Evaluation: A Critical Analysis of Underlying Assumptions

ABSTRACT

This paper considers the underlying assumptions related to the use of random assignment in educational research and evaluation; more specifically, the ability of random assignment to create similar comparison groups for the purpose of determining the effectiveness of educational programs. In theory, randomly assigning individuals to comparison groups is considered the best method available to maximize the likelihood that the groups used in this type of research will be similar. In educational research designed to identify proven best practices, however, random assignment of individuals is rarely possible, and other methods, including random assignment of intact units and non-random selection techniques, are often used instead. Using a database simulation, this study set out to determine the degree to which various selection methods might be effective at creating comparable groups. Given the complex dynamics of the teaching and learning process and the abundance of potentially confounding variables, it seems likely that comparison groups will always be dissimilar to some degree. While random assignment of individuals performed as expected when controlling for a single extraneous factor, comparison groups created in this manner are extremely likely to differ when multiple confounding variables are present. Based on the results of this study, random assignment of intact units is not an acceptable alternative to random assignment of individuals. In fact, when using intact units, non-random selection techniques were considerably more effective at controlling for potentially confounding influences than randomly assigning existing classrooms to treatment and control groups.

When attempting to determine which instructional approaches work in education, the use of an experimental design with random assignment is considered by many to be the strongest method for determining the net impact of a specific treatment or intervention (Dennis, 1994; Gay, 1996; Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002; Slavin, 2007; USDOE, 2003; Winston, 1988). As a result, in a funding priority proposed by the U.S. Department of Education (2003), the government stated that: "Evaluation methods using an experimental design are best for determining project effectiveness.
Thus, the project should use an experimental design under which participants—e.g., students, teachers, classrooms, or schools—are randomly assigned to participate in the project activities being evaluated or to a control group that does not participate in the project activities being evaluated" (p. 62446).

The reason for using random assignment in evaluation research is to eliminate the confounding effect of selection bias that occurs when experimental treatment and control groups are not similar (Slavin, 2007). Without randomization, the researcher would be obligated to account for all threats to validity, such as maturation, selection, experimental mortality, the Hawthorne effect, and the presence of any history threats that might affect the observed outcome (Campbell & Stanley, 1963; Gay, 1996; Shadish, Cook, & Campbell, 2002). While proponents of experimental methods believe that random assignment reduces the chance that systematic differences will exist between groups, there are those who question the feasibility and usefulness of random assignment for accomplishing this task (Leaf, 1993). It is possible that incorrect assumptions about random assignment's ability to control for all potential threats have resulted in an unwarranted reliance on experiments using random assignment to revolutionize educational practice (Schon, 1983; Teddlie & Tashakkori, 2003; Yanchar, Gantt, & Clay, 2005). Given the trend in the philosophy of science that persuasively argues that all methods are limited and so must be critically examined for their appropriateness, it is important to critically analyze the underlying assumptions related to the use of randomization in educational research (Burgess-Limerick, Abernathy, & Limerick, 1994). Either the use of randomization is best practice or it is not; if a well-implemented experiment using random assignment is the best method for determining "proven practices," and thus what teachers should use in their classrooms, we need to be sure that relying on random assignment to create similar experimental groups is a reasonable assumption.

Randomization Assumptions

Some researchers advocating the use of randomized controlled trials in education acknowledge that it would be incorrect to imply or claim that randomization will adequately control for all threats to validity, or that there is a single research method that can and should be used in all cases (Abel & Koch, 1997; Feinstein, 1997; Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002). Still, the assumption is made that experimental methods utilizing random assignment are best for determining program effectiveness:

"Such examples of proven effectiveness are rare – in fact, nonexistent in many areas of education – because randomized trials are uncommon in educational research and evaluation. Meanwhile, the study designs that are often used (including pre-post designs and most comparison-group designs) have been shown in careful empirical investigations to produce erroneous conclusions in many cases." (Paige, 2002, p. 1)

The assertion that non-random trials are less reliable than randomized trials is based on the belief that experiments using random assignment will provide accurate, strong evidence of cause and effect relationships; thus, results from non-random trials that do not replicate randomized trial results must be problematic in some way (Shadish & Ragsdale, 1996; USDOE, 2003).
Lack of treatment and control group comparability is the main argument against the acceptance of non-random experimental studies (Abel & Koch, 1997). From a traditional methodological standpoint, the use of comparison groups that are dissimilar increases the likelihood of a biased or erroneous research result.

There are two aspects of randomization that must be considered: random selection and random assignment. While random selection is seldom possible in experiments, random assignment is usually possible at some level and is believed to be crucial. The theoretical basis for the assumption that randomization will create equivalent comparison groups relies on the assertion that when selecting a sample from a given population, if the sample is large enough and drawn at random, the sample will likely resemble the population from which it is drawn (Krejcie & Morgan, 1970). Thus two groups created using random selection are considered to be suitable for the purpose of making generalizable comparisons. However, generalizations based on the results of an experiment are only valid for the population from which the individual participants have been randomly sampled (Gay, 1996). Unlike random sampling, random assignment of participants to treatment and control groups is believed to solve the problem of selection bias, thus improving a study's ability to establish cause and effect relationships (Gay, 1996; Slavin, 2007). Random assignment of individuals to treatment and control groups is believed to maximize the probability that confounding extraneous variables will affect both comparison groups in a similar fashion. In fact, random assignment is considered to be the only research technique able to control for unknown extraneous variables as potentially confounding influences (Gay, 1996; Johnson & Christensen, 2004; Shadish, Cook, & Campbell, 2002; USDOE, 2003). In the case of experimental research, if two groups are selected at random from a population and randomly assigned to treatment and control groups, it is assumed that the only important difference between them will be the intervention or educational practice being tested. Based on the volume of literature supporting this supposition, it may be true. If done properly, an experiment using randomly sampled and randomly assigned groups of individuals might be helpful in identifying best practices.

Unfortunately, several questions about the use of randomization in education remain, and some of the assumptions supporting the use of these practices in educational research remain largely unsubstantiated; specifically, the claim that random assignment of individuals, classrooms, and schools can be used systematically to create similar comparison groups that are equally likely to be adversely affected by confounding influences. A critical analysis of these underlying assumptions is needed; more specifically, we need to think critically about the methods we use to determine program effectiveness, or "what works," and whether they are reasonable (Burgess-Limerick, Abernathy, & Limerick, 1994; Yanchar, Gantt, & Clay, 2005; Yanchar & Williams, 2006).

Control Challenged

The use of control and treatment group comparisons is fundamental to the experimental method's ability to determine program effectiveness.
Ideally, the two comparison groups will be equal in all aspects except for the independent variable, which is manipulated by amount, type, absence, or presence (Gay, 1996; Johnson & Christensen, 2004; Shadish, Cook, & Campbell, 2002). The belief that groups will be similar if created using randomization relies on the premise that certain attributes and aspects of human behavior regulate learning (e.g., learning ability, natural intelligence, interest, and student effort). These personality traits or attributes are considered to vary by degree, to be somewhat stable, and to be normally distributed in the population (Johnson & Christensen, 2004; Linn & Miller, 2005). In other words, students can be expected to react predictably and somewhat consistently when introduced to an effective educational program or intervention. In addition, the theory that randomization can control for extraneous, potentially confounding variables relies on the belief that all the external influences that might affect the learning of an individual each vary by degree, are equally likely to occur within the population, and can be reduced to a single comprehensive factor or influence. Statistical theory suggests that only 5% of the comparison groups created through randomization will differ significantly on any single factor, when H0 is true and α = .05 (Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002). However, it is possible that the true nature of the teaching and learning process cannot be understood in this way (Brentano, 1973). If the influences that affect student learning are independent of one another (i.e., they exist and affect learning independently of other influences) and cannot be consistently combined into a stable single predictive influence, then we must consider additional evidence regarding the probability that the process of randomization will produce suitable groups for experimental comparisons.

It is a commonly held belief that there are many factors that have the potential to affect learning outcomes (Gay, 1996; Johnson & Christensen, 2004; Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002). If these influences are by nature independent of one another, then randomization is likely to be far less effective at controlling for these variables than might be expected. The mathematical probability that comparison groups will differ on one or more factors depends on the number of independent factors involved and the probability that each will occur. In fact, the probability that at least one of fourteen independent factors will differ significantly in any given two-group comparison is approximately 51%, following P = 1 − (1 − α)^k, where k is the number of potentially confounding independent factors and α = .05. Given the existence of numerous independent and potentially confounding variables in the teaching and learning process, if one hundred or more variables can be expected to influence the result, the probability of getting a significant group difference on at least one potentially confounding variable is greater than 99%. It should be noted that while the formula mentioned here does not take account of sample size, the samples used in this study were within the limits of randomization acceptability set out in Krejcie and Morgan's (1970) paper, whose formula takes account of different population sizes.
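The formula above is easy to check directly. The following minimal sketch (Python is our choice here, and the function name is ours, not the authors') computes the chance that at least one of k independent factors differs significantly between two randomly assigned groups:

```python
# Chance that one or more of k independent factors differs significantly
# between two groups when each factor is tested at alpha = .05.
# This evaluates the formula from the text: P = 1 - (1 - alpha)**k.

def prob_at_least_one_difference(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 14, 100):
    print(f"k = {k:3d}: P = {prob_at_least_one_difference(k):.3f}")
# k =   1: P = 0.050
# k =  14: P = 0.512
# k = 100: P = 0.994
```

The printed values reproduce the figures cited in the text: roughly 5% for a single factor, roughly 51% for fourteen independent factors, and over 99% for one hundred.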
Both the theories regarding randomization and the mathematical probability calculations governing the occurrence of independent events are well established; which applies may depend on how one sees the teaching and learning process: as a predictable and controllable process in which all the factors that affect learning can be reduced to a single, somewhat stable influence, or as a complex and dynamic process that varies constantly and depends on a number of influences and their interactions. If the teaching and learning process is more complex and dynamic than predictable and controllable, then creating equal comparison groups, while theoretically possible, is highly unlikely; it is more likely that group comparability is something that cannot be achieved in research dealing with human subjects, and it cannot be verified given the plethora of influential extraneous variables to account for. In practice, many of the extraneous variables that have the potential to affect achievement outcomes are typically not measured, difficult to measure, or impossible to measure. Some examples include peer influence, student interest, motivation, health, eyesight, hearing, home life, personality, learning style preference, effort expenditure, learner intent, academic potential, test anxiety, and the amount of sleep students get (Gay, 1996; Johnson & Christensen, 2004; Rossi, Freeman, & Lipsey, 1999; Shadish, Cook, & Campbell, 2002). The best any researcher might hope for is that the groups employed will be similar to within a tolerable degree of difference, because total control of extraneous variables is impossible.

Research Focus

In practice, it is impossible to verify whether randomization accomplishes the task of creating suitable comparison groups. The influence of all potentially confounding variables is not or cannot be measured; most notably, the unknown influences for which random assignment is expected to control. In order to explore the claim that randomization is the best method for creating comparison groups, a database simulation was created. This simulation was used to assess whether, and how well, random assignment might be expected to accomplish the task of ensuring comparison group equivalence or similarity. The use of random assignment (i.e., randomly assigning individuals and existing classrooms) to create comparison groups was compared to quasi-experimental methods (i.e., group selection techniques that do not use random assignment).

Method

A database simulation was created consisting of over 10,000 fictitious students placed in 252 schools and 490 different classrooms. Class sizes of between 12 and 32 were determined randomly. Demographic factors (i.e., race, gender, location, and socioeconomic status [SES]) were randomly or purposively assigned to each of these hypothetical students, simulating the distribution of the demographic characteristics of students in the state of Indiana. Some individual attributes (e.g., student ability) are believed to be normally distributed in the population independent of other demographic factors. Student ability was randomly assigned a value between 80 and 130 based on a normal distribution within the population (i.e., more students around the average than toward the extremes). Developmental factors that often affect learning (e.g., student age) are not normally distributed but are more systematically distributed across the population.
Student age was randomly assigned to fall within a range of one year and two months. This range in ages was thought to be typical of most public school classrooms. In order to explore how well random assignment controls for extraneous influences, each student was randomly assigned a student effort factor, a teacher/student interaction factor, a friend influence factor, a home influence factor, and two other influence factors representing unknown influences. It is important to note that these factors are really only labels and could represent any number of different extraneous influences; labels were attached simply for convenience. For the purpose of this analysis, it is supposed that these influences can be and were all measured accurately, that they remain somewhat constant, that they are accurately distributed in the simulated population, and that they are independent of the other factors under consideration. Each school and classroom was also randomly assigned a value representing the influence it might have. This was based on the assumption that a school environment, for example, will have a similar effect on each of the students in the class or school. In practice, this may or may not be true. The values assigned to each of these factors ranged from positive 10 to negative 10, from an extremely positive influence to a very negative influence academically. For the purposes of this analysis, these factors are assumed to exist as independent influences; for example, the expertise of an individual teacher is independent of a child's ability, his or her home life, SES, school location, and other factors the teacher has no control over. In all, fourteen factors were used in the simulation: gender, SES, location, race, student ability, age, school environment, classroom environment, student effort, teacher/student interaction, friend influence, home influence, and two other factors representing unknown influences. Again, the extraneous influences mentioned are for the most part simply labels that might represent any number of potentially confounding variables. The creation of a simulated database is required because most such variables typically are not or cannot be measured in practice. The purpose of the simulation is to determine whether random assignment can reasonably be expected to control for extraneous variables that are not or cannot be measured.
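A minimal sketch of how such a simulated database might be constructed is shown below. This is our own illustration, not the authors' implementation: the field names, numpy calls, and category proportions are assumptions, and the nesting of classrooms within schools and the 12-to-32 class sizes are simplified away.

```python
import numpy as np

rng = np.random.default_rng(42)
N_STUDENTS, N_SCHOOLS, N_CLASSROOMS = 10_000, 252, 490

# Shared unit-level influences: one value per school and per classroom,
# ranging from a very negative (-10) to a very positive (+10) influence.
school_effect = rng.uniform(-10, 10, N_SCHOOLS)
class_effect = rng.uniform(-10, 10, N_CLASSROOMS)

school = rng.integers(0, N_SCHOOLS, N_STUDENTS)        # school membership
classroom = rng.integers(0, N_CLASSROOMS, N_STUDENTS)  # intact-unit membership

students = {
    # Demographic factors (category proportions are placeholders, not
    # the Indiana distributions used in the study)
    "gender": rng.choice(["F", "M"], N_STUDENTS),
    "race": rng.choice(["A", "B", "C", "D"], N_STUDENTS,
                       p=[0.75, 0.12, 0.08, 0.05]),
    "ses": rng.choice(["low", "mid", "high"], N_STUDENTS, p=[0.3, 0.5, 0.2]),
    "location": rng.choice(["urban", "suburban", "rural"], N_STUDENTS),
    # Ability: normally distributed, truncated to the 80-130 range
    "ability": np.clip(rng.normal(105, 10, N_STUDENTS), 80, 130),
    # Age: spread across a fourteen-month span (in months)
    "age_months": rng.uniform(0, 14, N_STUDENTS),
    # Independent student-level extraneous influences on a -10..+10 scale
    "effort": rng.uniform(-10, 10, N_STUDENTS),
    "interaction": rng.uniform(-10, 10, N_STUDENTS),
    "friend": rng.uniform(-10, 10, N_STUDENTS),
    "home": rng.uniform(-10, 10, N_STUDENTS),
    "unknown1": rng.uniform(-10, 10, N_STUDENTS),
    "unknown2": rng.uniform(-10, 10, N_STUDENTS),
    # Unit-level influences: every student in a unit shares its value
    "school_env": school_effect[school],
    "class_env": class_effect[classroom],
}
```

The key structural feature, carried through the later sketches, is that the school and classroom influences are shared by every student in the unit, while the student-level influences are drawn independently per student.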
Research Procedures

Comparability estimates. The fourteen factors recorded in the database were used to estimate how equivalent comparison groups might be when random assignment and other methods of comparison group creation are used. The similarity of the selected groups was calculated based on the average effect of each numerical factor (i.e., one-way ANOVA) and the distribution similarity of categorical factors (i.e., chi-square analysis). Several trials of each selection technique were conducted and compared, including random assignment of individual students, random assignment of existing classrooms, randomly selecting schools with a control and a treatment group in each school, and non-random assignment of schools and classrooms to treatment and control groups based on matching location, school, and classroom similarities.

Sample size matters. Statistically speaking, with a population of 10,000 students, a sample size of 370 students is suggested (Krejcie & Morgan, 1970). With this many students sampled randomly from the population, the sample groups are likely to resemble the population 95% of the time (based on α = .05). In these trials, care was taken to ensure that the selection techniques used resulted in groups with more than 370 students in each group.

Sampling. When assigning individual students to comparison groups, each student was randomly assigned, using a random number generator, to one of 25 groups. Each group contained more than 370 subjects, closer to 400 on average. Given 25 groups, there are 300 possible comparison group pairings. Assignment was done without replacement. Using a pairwise analysis, each of the 300 pairs was tested for comparability on each of the factors separately and collectively. Separately, the number of times a pairing differed significantly on a specific variable was counted and the frequency recorded. Collectively, the number of times a specific pairing differed significantly on one or more factors was also counted and the frequency recorded. The same was done for random assignment of intact classrooms: each classroom was randomly assigned to one of 25 groups, and each of the 300 possible pairs was tested for comparability. In practice it is rare for researchers to use random assignment of individual students, for a variety of practical reasons; random assignment of existing classrooms is far more common (Johnson & Christensen, 2004; Shadish, Cook, & Campbell, 2002). Another common research practice is to use various non-random selection methods to create comparison groups. To test some of these selection techniques, twenty trials were completed for each of the following: randomly selected groups of schools with a control and a treatment group in each school; comparison groups purposively selected based on similar location, school, and classroom attributes; and randomly selected pairs of existing classrooms.
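The comparability test described above might look roughly like the following sketch, assuming the student records from the earlier sketch are held in pandas DataFrames. The function name and column lists are our own; for two groups, a one-way ANOVA is equivalent to a two-sample t-test.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05
NUMERIC = ["ability", "age_months", "effort", "interaction", "friend",
           "home", "unknown1", "unknown2", "school_env", "class_env"]
CATEGORICAL = ["gender", "race", "ses", "location"]

def groups_differ_on(a: pd.DataFrame, b: pd.DataFrame) -> list:
    """Return the factors on which groups a and b differ at ALPHA."""
    flagged = []
    for col in NUMERIC:
        # One-way ANOVA comparing the two groups' means on this factor
        _, p = stats.f_oneway(a[col], b[col])
        if p < ALPHA:
            flagged.append(col)
    for col in CATEGORICAL:
        # Chi-square test comparing the category distributions
        values = np.concatenate([a[col].to_numpy(), b[col].to_numpy()])
        labels = np.array(["a"] * len(a) + ["b"] * len(b))
        _, p, _, _ = stats.chi2_contingency(pd.crosstab(values, labels))
        if p < ALPHA:
            flagged.append(col)
    return flagged

# Pairwise analysis: given a list `groups` of 25 randomly assigned
# DataFrames, count the pairings that differ on one or more factors.
# problematic = sum(bool(groups_differ_on(groups[i], groups[j]))
#                   for i, j in combinations(range(25), 2))
```

Running such a test over all 300 pairings, separately per factor and collectively, yields counts of the kind reported in the tables that follow.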
Results

Random Assignment of Individual Students

In order to test the assumption that random assignment can ensure comparison group similarity, each of the students in the database was randomly assigned to a group using the previously described method. Three trials were conducted with 300 comparisons in each trial. The results are displayed in Table 1. Random assignment seems to control for individual factors as predicted; over an infinite number of trials it could be expected that approximately 95% of the time two groups will be similar on any specific single factor. Notably, factors such as race may have had more occurrences of significant difference due to the existence of small subgroups within the population distribution. For example, the number of students of Asian descent is small in the overall population relative to the number of Caucasian or African American students, so these students are more likely to be underrepresented in any given comparison group sample, resulting in a significant difference being identified. Also, as might be expected given 14 different factors, the number of comparison groups identified as problematic (i.e., significantly different on one or more factors than could be expected by chance) was much larger than for any single factor. Given 14 factors, the probability that one or more of the individual factors would differ significantly in any given pairing is expected to be 51%; in this simulation, averaging the three trials, approximately 45% of the potential pairs were deemed problematic in terms of comparability. Over an infinite number of trials it is reasonable to expect that approximately 51% of the time any two groups will differ significantly on one or more of the 14 factors, just as predicted.

Table 1
Rate of Significant Difference by Factor in Randomly Assigned Groups of Individuals

Separate analysis of              Trial 1*       Trial 2        Trial 3      Average
individual factors               Freq.    %     Freq.    %     Freq.    %       %
Gender                              3    1.0      0    0.0      0    0.0      0.3
Socioeconomic Status                0    0.0      0    0.0      0    0.0      0.0
Location                            0    0.0      1    0.3      0    0.0      0.1
Race                               43   14.3     17    5.7     23    7.7      9.2
Age                                 4    1.3     22    7.3      6    2.0      3.6
Student Ability                    13    4.3     10    3.3      2    0.7      2.8
School Factor                      17    5.7     17    5.7     22    7.3      5.1
Classroom Factor                   18    6.0      5    1.7     30   10.0      5.9
Student Effort                     37   12.3     24    8.0     17    5.7      8.7
Student/Teacher Interaction        20    6.7     34   11.3      1    0.3      6.1
Friend Factor                      13    4.3     29    9.7     12    4.0      6.0
Home Factor                        19    6.3      5    1.7     18    6.0      4.7
Unknown Factor 1                   17    5.7     10    3.3     14    4.7      4.6
Unknown Factor 2                   16    5.3      8    2.7      4    1.3      3.1

Collective analysis
One or more factors different     152   50.7    132   44.0    122   40.7     45.1

* 300 individual group comparisons considered in each trial.

Looking at the data from Trial 1 as an example, the majority of the 14 individual factors fell within the range of expected difference. Across the 300 possible pairs, a comparison group differed significantly on any given single factor only about 5% of the time. However, in any given comparison group pairing, while the groups may be adequately similar on one factor, other aspects of that pairing were often significantly dissimilar. In several instances the groups were similar on several factors but not on all the factors. In this example, over half of the 300 possible comparison groups differed significantly on one or more of the 14 factors.

Random Assignment of Intact Classrooms

Because random assignment of individual students is rarely possible (Shadish, Cook, & Campbell, 2002), many studies use existing classrooms or schools when creating comparison groups. The assumption is that random assignment will work equally well for creating comparison groups from intact units (e.g., existing classrooms or schools) as it does for a sample of individuals (U.S. Department of Education, 2003). To test this assumption, three trials were conducted with 300 comparisons in each trial. Table 2 shows the results of these comparisons.

Table 2
Rate of Significant Difference by Factor in Randomly Assigned Groups of Intact Classrooms

Separate analysis of              Trial 1*       Trial 2        Trial 3      Average
individual factors               Freq.    %     Freq.    %     Freq.    %       %
Gender                              6    2.0      8    2.7      2    0.7      1.7
Socioeconomic Status              104   34.7     50   16.7     31   10.3     20.6
Location                          247   82.3    267   89.0    253   84.3     85.2
Race                               64   21.3     60   20.0     69   23.0     21.4
Age                                30   10.0      1    0.3      5    1.7      4.0
Student Ability                    55   18.3     54   18.0     81   27.0     21.1
School Factor                     202   67.3    215   71.7    207   69.0     69.3
Classroom Factor                  215   71.7    219   73.0    211   70.3     71.7
Student Effort                      5    1.7     10    3.3     16    5.3      3.4
Student/Teacher Interaction       163   54.3    166   55.3    172   57.3     55.7
Friend Factor                      42   14.0     52   17.3     34   11.3     14.2
Home Factor                         6    2.0     22    7.3     36   12.0      7.1
Unknown Factor 1                   24    8.0     20    6.7      7    2.3      5.7
Unknown Factor 2                    5    1.7      5    1.7     14    4.7      2.7

Collective analysis
One or more factors different     298   99.3    298   99.3    299   99.7     99.4

* 300 individual group comparisons considered in each trial.

Based on these results, it is unlikely that randomly assigning existing classrooms to treatment and control groups will create suitable comparison groups for individual analysis. With the exception of factors like gender and age, which are more likely to be similarly distributed within intact units, group differences were evident for a considerable number of individual factors. Differences in the distribution of these influences are likely magnified by the lack of overall homogeneity in classrooms, resulting in a considerable number of problematic cases. In each of the trials, 99% of the comparison groups were found to be significantly different on one or more of the fourteen factors used in this simulation. It seems that the conditions that govern the effectiveness of randomization when using individuals do not hold true for random assignment of intact units; using this technique to produce similar groups for comparison purposes is extremely likely to produce unacceptable results. Clearly one cannot sample schools or classrooms and then analyze the individuals within them as one might when using random assignment of individuals; if intact units are sampled, then it is the units that must be representative, requiring a much larger sample than is typically used. Randomly assigning intact units (e.g., classrooms or schools) to treatment and control groups will likely produce comparison groups that are significantly different from one another on one or more factors every time.
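The mechanism behind this failure can be illustrated with a small Monte Carlo comparison. The sketch below is our own illustration, independent of the study's database: when every student in a classroom shares that classroom's influence value, assigning intact classrooms leaves far larger group differences on that factor than assigning individuals does.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N_CLASSES, CLASS_SIZE, TRIALS, ALPHA = 40, 25, 1_000, 0.05

def rate_of_significant_difference(assign_intact: bool) -> float:
    """Share of trials in which two groups differ on the classroom factor."""
    hits = 0
    n = N_CLASSES * CLASS_SIZE
    class_of_student = np.repeat(np.arange(N_CLASSES), CLASS_SIZE)
    for _ in range(TRIALS):
        # Every student in a classroom shares that classroom's influence.
        value = rng.uniform(-10, 10, N_CLASSES)[class_of_student]
        if assign_intact:
            # Randomly assign half of the intact classrooms to one group
            chosen = rng.permutation(N_CLASSES)[: N_CLASSES // 2]
            in_group = np.isin(class_of_student, chosen)
        else:
            # Randomly assign half of the individual students to one group
            in_group = rng.permutation(n) < n // 2
        _, p = stats.ttest_ind(value[in_group], value[~in_group])
        hits += p < ALPHA
    return hits / TRIALS

print("individuals:      ", rate_of_significant_difference(False))  # near .05
print("intact classrooms:", rate_of_significant_difference(True))   # far higher
```

With individual assignment, each group receives a near-even mix of every classroom and the rejection rate stays near α; with intact assignment, whole-classroom values travel together, which is consistent with the inflated school and classroom factor rates in Table 2.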
Non-random Comparison Group Selection

A common quasi-experimental practice used when creating comparison groups is to take two classrooms within the same school and assign them as a control and treatment group pair. This technique is used as a control for location and school influences. A notable limitation of this technique is that it excludes schools with only one class at a specific grade level. To overcome this limitation, some studies use purposive or stratified matching techniques to create comparison groups (a sketch of the in-school pairing technique follows Table 3). It is believed that this method of group selection will control for a few known factors that might reasonably be expected to affect the result. In this case, the selection was done based on location, school, and classroom similarities. The results of 20 group comparisons using these techniques are displayed in Table 3, along with 20 group comparisons of randomly assigned intact units. Each of these comparisons involves the use of existing classrooms. Based on the results of these trials, each selection technique does seem to control for the extraneous variables it targets. Both non-random techniques for comparison group selection performed better than randomly assigning existing classrooms to treatment and control groups. Far fewer individual factors were problematic when the researcher attempted to control for known factors that might result in comparison group differences. In addition, the magnitude of the differences in individual factors was greater when randomly assigning existing classes. When using intact units, it seems more likely that suitable comparison groups will be produced using non-random quasi-experimental selection techniques than using random selection techniques.

Table 3
Rate of Significant Difference by Factor in Groups of Intact Classrooms by Selection Method

Separate analysis of             In-school      Purposive       Random
individual factors               Pairings*      Pairings       Pairings
                                 Freq.    %     Freq.    %     Freq.    %
Gender                              3   15.0      0    0.0      2   10.0
Socioeconomic Status                1    5.0      4   20.0      9   45.0
Location                            0    0.0      0    0.0     19   95.0
Race                                0    0.0      6   30.0      8   40.0
Age                                 0    0.0      0    0.0      0    0.0
Student Ability                     2   10.0      0    0.0      5   25.0
School Factor                       0    0.0      3   15.0     11   55.0
Classroom Factor                   16   80.0      8   40.0     15   75.0
Student Effort                      0    0.0      2   10.0      6   30.0
Student/Teacher Interaction        12   60.0      2   10.0     14   70.0
Friend Factor                       0    0.0      2   10.0      0    0.0
Home Factor                         0    0.0      0    0.0      1    5.0
Unknown Factor 1                    1    5.0      1    5.0      1    5.0
Unknown Factor 2                    0    0.0      4   20.0      1    5.0

Collective analysis
One or more factors different      18   90.0     17   85.0     20  100.0

* 20 individual group comparisons considered for each selection method.
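As a rough sketch of the in-school pairing technique referenced above (our own illustration; the variable names and data layout are assumptions carried over from the earlier sketches), one might pair classrooms that share a school and assign one of each pair to treatment and the other to control:

```python
import numpy as np

rng = np.random.default_rng(7)

# classroom_school[c] gives the school each classroom belongs to,
# reusing the layout assumed in the earlier simulation sketch.
N_SCHOOLS, N_CLASSROOMS = 252, 490
classroom_school = rng.integers(0, N_SCHOOLS, N_CLASSROOMS)

treatment, control = [], []
for school in range(N_SCHOOLS):
    rooms = np.flatnonzero(classroom_school == school)
    # Schools with only one classroom at this grade level are excluded,
    # which is the limitation noted in the text.
    if len(rooms) < 2:
        continue
    a, b = rng.choice(rooms, size=2, replace=False)
    treatment.append(a)
    control.append(b)
```

Because both members of each pair share a school, the school and location influences cancel by construction, which matches the zero rates for those factors in the in-school column of Table 3; classroom-level influences, by contrast, remain uncontrolled.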
Given that random assignment of individuals, randomly sampled from the entire population is impractical and often impossible; and that random assignment of intact groups is an inadequate Randomization in Educational Research 15 alternative; reliance on random assignment to adequately control for multiple extraneous, potentially confounding variables seems to be misguided, just as Leaf (1993) suggested. We conclude from the results of these simulations that given the number of potentially confounding variables that affect learning, random assignment of individuals to comparison groups will not result in the creation of equivalent groups; this technique does however seem to function as expected when controlling for individual influences. Furthermore, we conclude that the practice of randomly assigning existing classrooms is not a reasonable alternative to randomly assigning individuals. Based on the results of these simulations, the use of non-random assignment with careful, systematic comparison group creation efforts that attempt to control for known extraneous variables is likely to produce better comparison groups than randomly assigning intact units to treatment and control groups. And while there is a chance that the researcher may systematically, or unknowingly cause groups to be different, this seems to be an inevitable outcome with or without random assignment. In real world applications of these methods, we will simply never know how poorly random assignment functions as a control for potentially confounding variables. Thus the assumption that random assignment adequately controls for potentially confounding variables when attempting to determine program effectiveness remains unsubstantiated, especially when assigning intact units. One might reasonably conclude from this study that the assumption that proven program effectiveness can be determined using randomized experiments is faulty, especially if intact units are used. This type of experimentation provides solutions that are too simplistic given the complexity of most educational situations (Patton, 2007). On the issue of comparability, randomization likely will not produce adequately similar treatment and control groups. Results obtained from evaluation research utilizing these methods will always be bound to a specific Randomization in Educational Research 16 local context and an abundance of potentially confounding influences and interactions (DBRC, 2003). If assumptions about the use of randomization and this technique’s ability to create comparable groups are fundamentally incorrect, then the results of these studies will always be inconsistent and potentially misleading. This may be the reason why random controlled trials in education to date have failed to revolutionize the effectiveness of the education process to any great extent (Branson & Hirumi, 1994; DBRC, 2003). In practice, after promising practices have been identified, repeated trials using quasi-experimental (i.e., non-random) comparison group selection techniques and a wide variety of other qualitative and quantitative methods in many settings and times by multiple researchers may be the only and most reasonable method for determining ―what works‖ in education. Randomization in Educational Research 17 REFERENCES Abel, U., & Koch, A. (1997). The mythology of randomization. Conference proceedings of the international conference of nonrandomized comparative clinical studies in Heidelberg, April 10-11. 
Retrieved September 12, 2006, from http://www.symosion.com/nrccs/abel.htm

Branson, R., & Hirumi, A. (1994). Designing the future: The Florida Schoolyear 2000 Initiative. In G. Kearsley & W. Lynch (Eds.), Educational technology: Leadership perspectives (pp. 91-112). Englewood Cliffs, NJ: Educational Technology Publications.

Brentano, F. (1973). Psychology from an empirical standpoint. New York, NY: Humanities Press.

Burgess-Limerick, R., Abernathy, B., & Limerick, B. (1994). Identification of underlying assumptions is an integral part of research: An example from motor control. Theory & Psychology, 4, 139-146.

Campbell, D. T., & Stanley, J. (1963). Experimental and quasi-experimental designs for research on teaching. Boston, MA: Houghton Mifflin.

Dennis, M. L. (1994). Ethical and practical randomized field experiments. In J. Wholey, H. Hatry, & K. Newcomer (Eds.), Handbook of practical program evaluation. San Francisco, CA: Jossey-Bass.

Design-Based Research Collective. (2003). Design-based research: An emerging paradigm for educational inquiry. Educational Researcher, 32(1), 5-8.

Feinstein, A. (1997). Problems of randomized trials. Conference proceedings of the international conference on nonrandomized comparative clinical studies, Heidelberg, April 10-11. Retrieved September 12, 2006, from http://www.symosion.com/nrccs/feinstein.htm

Gay, L. R. (1996). Educational research: Competencies for analysis and application (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Johnson, B., & Christensen, L. (2004). Educational research: Qualitative, quantitative, and mixed methods approaches (2nd ed.). Boston, MA: Pearson Education.

Koch, S. (1999). Psychology in human context: Essays in dissidence and reconstruction. Chicago, IL: The University of Chicago Press.

Krejcie, R. V., & Morgan, D. W. (1970). Determining sample size for research activities. Educational and Psychological Measurement, 30, 607-610.

Leaf, R. C. (1993). Control, volition, and the "experimental method." New Ideas in Psychology, 11(1), 3-33.

Linn, R. L., & Miller, M. D. (2005). Measurement and assessment in teaching (9th ed.). Upper Saddle River, NJ: Pearson Education.

Paige, R. (2002). Rigorous evidence: The key to progress in education? Lessons from medicine, welfare, and other fields. Policy forum of the Council for Excellence in Government. Retrieved November 11, 2006, from http://www.excelgov.org/index.php?keyword=a4339246667652

Patton, M. Q. (2007, November). Facilitating fast-paced learning: Developmental evaluation for complex emergent innovations. Conference presentation at the American Evaluation Association, Baltimore, MD.

Rossi, P. H., Freeman, H. E., & Lipsey, M. W. (1999). Evaluation: A systematic approach (6th ed.). Newbury Park, CA: Sage.

Schon, D. (1983). The reflective practitioner: How professionals think in action. New York, NY: Basic Books.

Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Shadish, W., & Ragsdale, K. (1996). Random versus nonrandom assignment in controlled experiments: Do you get the same answers? Journal of Consulting and Clinical Psychology, 64(6), 1290-1305.

Slavin, R. (2007). Educational research in an age of accountability. Boston, MA: Pearson Education.

Teddlie, C., & Tashakkori, A. (2003). Major issues and controversies in the use of mixed methods in the social and behavioral sciences. In A. Tashakkori & C. Teddlie (Eds.),
Handbook of mixed methods in social and behavioral research. Thousand Oaks, CA: Sage.

U.S. Department of Education. (2003). Scientifically based evaluation methods: Notice of proposed priority. Federal Register, 68(213), 62446. Retrieved November 11, 2006, from http://www.ed.gov/legislation/FedRegister/proprule/2003-4/110403b.pdf

USDOE, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. (2003). Identifying and implementing educational practices supported by rigorous evidence: A user friendly guide. Retrieved November 11, 2006, from http://www.ed.gov/rschstat/research/pubs/rigorousevid/rigorousevid.pdf

Winston, A. S. (1988). Cause and experiment in introductory psychology: An analysis of R. S. Woodworth's textbooks. Teaching of Psychology, 15, 79-83.

Yanchar, S. C., Gantt, E. E., & Clay, S. L. (2005). On the nature of a critical methodology. Theory and Psychology, 15, 27-50.

Yanchar, S. C., & Williams, D. D. (2006). Reconsidering the compatibility thesis and eclecticism: Five proposed guidelines for method use. Educational Researcher, 35(9), 3-12.