Title: The Implications of “Contamination” for Experimental Design in Education

Author(s): Chris Rhoads, Ph.D., Northwestern University

2009 SREE Conference Abstract Template

Background/context:

Researchers planning a randomized field trial to evaluate the effectiveness of an educational intervention often face the following dilemma. They plan to recruit schools to participate in their study, and must decide: should they randomly assign individuals (either students or teachers, depending on the intervention) to treatment conditions within schools, or should all participating students (teachers) at a given school be assigned to the same treatment condition? That is, should we randomize schools (clusters), or individuals within schools? For certain interventions, individual randomization is not an option, because by their very nature they must be delivered to entire groups of students or teachers. Whole-school reforms must be delivered at the school level, and curricular interventions generally must be delivered at the classroom level. Other interventions, however, can be delivered directly to individuals; tutoring provided to readers who are below grade level is one example. One reason often given for preferring cluster-level randomization is a fear of “diffusion of treatment” (Raudenbush, 1997) or “contamination” (Donner & Klar, 2000). Contamination occurs when contact between members of the control group and members of the experimental group causes control group participants to behave more like experimental group participants than they would have had that contact not occurred.
For instance, if an intervention designed to reduce dropout rates by increasing engagement with school is delivered to some at-risk children but not others within a given school, the children not receiving the intervention may also become more engaged with school as a result of their peers' increased engagement. For certain interventions, contamination could also cause experimental subjects to behave more like control group subjects than they would otherwise. Both processes, experimental subjects acting more like control subjects and control subjects acting more like experimental subjects, tend to decrease the observed effect size of the experiment. For the purposes of this paper, I assume that contamination dilutes the observed effect size in this fashion. The methods considered here would not apply to a contamination process that tends to spuriously increase the effect size, such as control group demoralization (Shadish, Cook, & Campbell, 2002).

Cornfield (1978) noted that two penalties are paid for randomization by cluster rather than by individual. First, the variance of the estimated treatment effect increases. Second, the degrees of freedom available to estimate that variance decrease. Thus, in the absence of contamination, randomizing individuals within clusters is a more powerful design than randomizing whole clusters. This paper argues that the threat of contamination should not necessarily lead experimenters to opt for a cluster randomized design. Researchers also need to consider the expected extent of contamination in order to make a well-informed decision about which design is more powerful. As far as I am aware, the existing work on this problem consists of a few papers that have appeared in the health sciences literature.
The statistical model underlying the work in these papers is left unstated, but the calculations implicitly assume that the outcome of individual k in cluster j who receives treatment i obeys the following model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \quad i = 1, 2;\; j = 1, \ldots, 2m;\; k = 1, \ldots, n_{ij};\; n_{1j} + n_{2j} = n \qquad (1)$$

where $\mu$ and $\alpha_i$ are fixed parameters representing, respectively, the overall mean and the deviation of the mean in treatment group i from the overall mean; $\beta_j$ is a mean-zero, normally distributed random effect associated with cluster j, with variance $\sigma_B^2$; and $\varepsilon_{ijk}$ is a mean-zero, normally distributed random effect associated with individual k within cluster j, with variance $\sigma^2$. Many results depend on the intracluster correlation coefficient (ICC), defined as $\rho = \sigma_B^2/(\sigma_B^2 + \sigma^2)$. Notice that this model contains no term for the interaction between clusters and treatment, and thus assumes that treatment effects are homogeneous across clusters.

Slymen and Hovell (1997) and Torgerson (2001) explored the choice between cluster- and individual-level randomization in the health sciences; however, the most thorough treatment of the issue to date is Borm, Melis, Teerenstra, and Peer (2005) (hereafter BMTP). BMTP suggest an interesting compromise between cluster randomization and individual randomization, a method they call “pseudo cluster randomization.” They label the two treatments under investigation A and B. Suppose that the data can be modeled by equation (1), so that 2m clusters, each of size n, are available for the experiment. Pseudo cluster randomization is a two-step randomization procedure. First, the clusters are randomly assigned to two groups labeled a and b, resulting in m clusters per group. Then, for each cluster in group a, a fraction f ($0.5 \le f \le 1$) of the individuals in that cluster are randomly assigned to treatment A and the rest to treatment B.
In cluster group b, the same fraction f of individuals in each cluster are assigned to treatment B and the rest to treatment A. Using the same fraction f in each cluster group ensures that the design is balanced at both the individual and the cluster level. Next, define $\bar{y}_{Aaj}$ and $\bar{y}_{Abj}$ to be the mean outcomes of individuals in cluster j assigned to treatment A in cluster groups a and b, respectively. Notation for the mean outcomes of those receiving treatment B is defined analogously. Because it may be advantageous to weight responses in the two cluster groups differently, BMTP propose the following estimator of the mean outcome for those receiving A:

$$\bar{y}_A = \frac{\frac{1}{m}\sum_{j} f\,\bar{y}_{Aaj} + \frac{1}{m}\sum_{j} w(1-f)\,\bar{y}_{Abj}}{f + w(1-f)} \qquad (2)$$

where the two sums run over the m clusters in groups a and b, respectively. Then the variance of the estimated treatment difference $\bar{y}_A - \bar{y}_B$ is

$$\operatorname{var} = \frac{2\sigma^2\left[f + w^2(1-f) + nq\{f - w(1-f)\}^2\right]}{mn\{f + w(1-f)\}^2} \qquad (3)$$

where $q = \rho/(1-\rho)$.

Contamination is conceptualized as follows. Let $\mu_A$ and $\mu_B$ be the population mean outcomes under treatments A and B in the absence of contamination, and let $\delta = \mu_A - \mu_B$. Then the expected outcome for those receiving treatment A in group a is $\mu_A - c_{f,A}\,\delta$ ($0 \le c_{f,A} \le 1$), where $c_{f,A}$ is the contamination of A by B given that a fraction f receive A in each cluster. Defining the other contamination factors analogously (so that $c_{f,B}$ is the contamination of B by A when a fraction f of each cluster receives B), the expected value of $\bar{y}_A - \bar{y}_B$ with contamination is

$$E = \delta\left[1 - \frac{f(c_{f,A} + c_{f,B}) + w(1-f)(c_{1-f,A} + c_{1-f,B})}{f + w(1-f)}\right] \qquad (4)$$

BMTP note that efficiency is optimized by maximizing the ratio $t^2 = E^2/\operatorname{var}$. They proceed to derive three weights w that, respectively, minimize variance, minimize contamination, and maximize $t^2$. They then compare pseudo cluster randomization (with various fractions and weights) to cluster randomization and 1:1 individual randomization in terms of inverse $t^2$ ratios; such a ratio is proportional to the ratio of the numbers of subjects required to achieve fixed type I and type II error rates using z critical values.
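As a rough illustration, equations (3) and (4) can be evaluated numerically. The sketch below is a minimal Python transcription, assuming the forms of (3) and (4) given above and taking $\sigma^2$ as the individual-level variance in model (1); the function names (`pcr_variance`, `pcr_expectation`, `t_squared`) and the scenario values are hypothetical and are not taken from BMTP. Setting f = 1 recovers cluster randomization, and f = 0.5 with w = 1 recovers 1:1 randomization within clusters.

```python
def pcr_variance(f, w, n, m, rho, sigma2=1.0):
    """Equation (3): variance of ybar_A - ybar_B under pseudo cluster randomization.

    f: fraction of each group-a cluster assigned to A (and of each group-b cluster to B)
    w: weight given to the minority-allocation means
    n: cluster size; m: clusters per group (2m clusters in total)
    rho: intracluster correlation; sigma2: individual-level variance
    """
    q = rho / (1.0 - rho)
    s = f + w * (1.0 - f)
    num = f + w**2 * (1.0 - f) + n * q * (f - w * (1.0 - f)) ** 2
    return 2.0 * sigma2 * num / (m * n * s**2)


def pcr_expectation(f, w, delta, c_fA, c_fB, c_1fA, c_1fB):
    """Equation (4): expected value of ybar_A - ybar_B under contamination.

    c_fA, c_fB: contamination of A (B) when a fraction f of the cluster receives A (B)
    c_1fA, c_1fB: the same quantities when that fraction is 1 - f
    """
    s = f + w * (1.0 - f)
    loss = f * (c_fA + c_fB) + w * (1.0 - f) * (c_1fA + c_1fB)
    return delta * (1.0 - loss / s)


def t_squared(f, w, n, m, rho, delta, contamination, sigma2=1.0):
    """BMTP efficiency criterion t^2 = E^2 / var (larger is better)."""
    e = pcr_expectation(f, w, delta, *contamination)
    return e**2 / pcr_variance(f, w, n, m, rho, sigma2)


# Hypothetical scenario: n = 20, m = 10, rho = 0.05, delta = 0.3, and
# 1:1 randomization moving control means 30% of the way toward treatment.
n, m, rho, delta = 20, 10, 0.05, 0.3
t2_cluster = t_squared(1.0, 1.0, n, m, rho, delta, (0.0, 0.0, 0.0, 0.0))
t2_one_to_one = t_squared(0.5, 1.0, n, m, rho, delta, (0.0, 0.3, 0.0, 0.3))

# Inverse t^2 ratio: subjects needed under 1:1 relative to cluster randomization.
print(round(t2_cluster / t2_one_to_one, 3))
```

Because the inverse $t^2$ ratio is proportional to the required number of subjects, a printed value below 1 favors the 1:1 design and a value above 1 favors cluster randomization.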
They find that cluster randomization will be preferred if cluster sizes are small, if the ICC is small, or if the total amount of contamination is large. However, even if contamination reduces the uncontaminated average treatment effect by 30–50%, individual randomization will often be preferred.

Purpose/objective/research question/focus of study:

The current work overcomes two major limitations of the BMTP work. First, the BMTP results are asymptotic, and hence assume that a large number of clusters are enrolled in the experiment. This assumption has two implications: Cornfield's second penalty can be ignored, and the value of $\rho$ can be treated as known. The current work explores both issues; that is, it examines the impact on power of the additional degrees of freedom available under the individually randomized design, and what can be done when the value of $\rho$ cannot be assumed known. Second, the BMTP work assumes homogeneous treatment effects across clusters. This assumption appears to be standard in the health sciences literature, but it is not advisable in educational experiments, where it is possible, if not likely, that the effect of treatment varies across educational settings. Thus, in addition to presenting some results based on model (1), the current paper also considers the following model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2;\; j = 1, \ldots, 2m;\; k = 1, \ldots, n_{ij};\; n_{1j} + n_{2j} = n \qquad (5)$$

This model differs from (1) only by the addition of random effects for the interaction of treatment with clusters, $(\alpha\beta)_{ij}$, which have a mean-zero normal distribution with variance $\sigma_{TC}^2$.

Findings/Results:
I begin by noting that, in general, the weighted estimators derived by BMTP all depend on a known value of $\rho$ (with the exception of the minimal contamination estimator, which BMTP show is always inferior to the other approaches in terms of sample size requirements). Since I do not wish to assume knowledge of $\rho$, I focus on comparing full cluster randomization with 1:1 randomization within clusters (the cases f = 1 and f = 0.5). Results depend on the value of nq and on the total contamination at f = 0.5, which I denote $c_h^t = c_{0.5,A} + c_{0.5,B}$. Figure 1 shows the ratio of the total sample size required under 1:1 randomization to that required under cluster randomization, under model (1) and assuming z critical values. Even when contamination is as large as 0.7, 1:1 randomization is preferable provided n and/or $\rho$ is large enough; for instance, 1:1 randomization is preferred when n = 100 and $\rho = 0.1$.

Figure 1 actually understates the extent to which 1:1 randomization should be preferred to cluster randomization under model (1), because of the additional degrees of freedom available under 1:1 randomization to estimate the variance of the estimated treatment effect. With 2m clusters available for the experiment, the cluster randomized design has only 2m − 2 degrees of freedom, while the 1:1 randomized design has 2mn − 2m − 1. For many designs this difference will have only a negligible effect; however, if the number of clusters in the experiment is quite small (say 2m < 10), the additional degrees of freedom available under the 1:1 design can substantially add to its power advantage over the cluster randomized design. In the interest of space, I omit further discussion of this issue here.

An issue apparently overlooked by BMTP is that the choice w = f/(1 − f) allows a pseudo cluster randomized design to be used without assuming either a large number of enrolled clusters or prior knowledge of $\rho$: with this weight, the term $f - w(1-f)$ in (3) equals zero, so the variance no longer involves $\rho$.
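Both claims in the preceding paragraphs can be checked with a short sketch. Under model (1) with z critical values, substituting f = 1 and f = 0.5 (w = 1) into (3) and (4) gives a required-sample-size ratio of $1/\{(1 + nq)(1 - c_h^t)^2\}$, and setting w = f/(1 − f) removes the $\rho$-dependent term from (3). The Python below assumes equation (3) as given above and treats $\sigma^2$ as the individual-level variance; the function names and the n = 10 comparison case are hypothetical.

```python
def variance_eq3(f, w, n, m, rho, sigma2=1.0):
    """Equation (3): variance of the estimated treatment difference."""
    q = rho / (1.0 - rho)
    s = f + w * (1.0 - f)
    num = f + w**2 * (1.0 - f) + n * q * (f - w * (1.0 - f)) ** 2
    return 2.0 * sigma2 * num / (m * n * s**2)


def ratio_one_to_one_vs_cluster(n, rho, c_half_total):
    """Subjects needed under 1:1 randomization / subjects needed under
    cluster randomization, model (1), z critical values (the Figure 1 quantity)."""
    q = rho / (1.0 - rho)
    return 1.0 / ((1.0 + n * q) * (1.0 - c_half_total) ** 2)


# Example from the text: contamination 0.7, n = 100, rho = 0.1.
print(ratio_one_to_one_vs_cluster(100, 0.1, 0.7))  # below 1: 1:1 preferred
print(ratio_one_to_one_vs_cluster(10, 0.1, 0.7))   # above 1: cluster preferred

# Weighted-invariant choice w = f / (1 - f): the variance no longer depends on rho.
f = 0.8
w_wi = f / (1.0 - f)
for rho in (0.01, 0.1, 0.3):
    print(rho, variance_eq3(f, w_wi, n=20, m=10, rho=rho))
```

With the weighted-invariant weight, the printed variance is the same for every $\rho$ and equals $2\sigma^2/\{4f(1-f)mn\}$.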
I will refer to a treatment effect estimate constructed in this fashion as a weighted-invariant (WI) estimator. The expected value of the WI estimator depends on the total contamination at fractional randomization f, $c_f^t = c_{f,A} + c_{1-f,A} + c_{f,B} + c_{1-f,B}$. BMTP allow the values of $c_{f,i}$ to vary from 0 to 1 without restriction, but this would imply that contamination could cause all control subjects to behave as though they were uncontaminated experimental subjects and all experimental subjects to behave as though they were uncontaminated control subjects, so that the expected treatment difference would keep the absolute value of $\delta$ but change sign. This seems implausible, so the restrictions $c_{f,A} + c_{1-f,B} \le 1$ and $c_{f,B} + c_{1-f,A} \le 1$ seem logical. The WI estimator has the same degrees of freedom as 1:1 randomization, so a very good approximation to the ratio of the sample size required under the WI design to that required under the 1:1 design is

$$\frac{t_{1:1}^2}{t_{WI}^2} = \frac{(1 - c_h^t)^2}{4f(1-f)\{(2 - c_f^t)/2\}^2} \qquad (6)$$

Since f(1 − f) is at most 0.25, in the absence of contamination we obtain the well-known result that a balanced experimental design is the most efficient. The WI design can improve on the 1:1 design only if using f > 0.5 removes a substantial amount of contamination. For instance, suppose that we believe experimental subjects are unlikely to be contaminated by control subjects, and that under 1:1 randomization the control group mean will move halfway toward the experimental group mean; that is, $c_{0.5,A} = 0$ and $c_{0.5,B} = 0.5$. Now suppose that at a fractional allocation of f = 0.8, experimental subjects are still uncontaminated by control subjects. When 80% of the subjects in a cluster are in the experimental group, contamination of controls increases to $c_{0.2,B} = 0.6$; when only 20% are in the experimental group, it decreases to $c_{0.8,B} = 0.1$.
Then equation (6) gives 0.925, and the WI design would be preferred. On the other hand, if $c_{0.8,B} = 0.2$ and the other parameters remain unchanged, the 1:1 design is preferred.

Finally, I present results under the assumption of heterogeneous treatment effects, as described by model (5). Results under this model depend on the parameter $\omega = \sigma_{TC}^2/\sigma_B^2$; Hedges and Rhoads (under review) suggest that values of $\omega = 0.5$ are reasonable in educational experiments. Under this model the expected value calculations are unchanged, but the variance of the estimated treatment difference $\bar{y}_A - \bar{y}_B$ under pseudo cluster randomization becomes

$$\operatorname{var}_{het} = \frac{2\sigma^2\left[f + w^2(1-f) + nq\{f - w(1-f)\}^2 + nq\omega\{f^2 + w^2(1-f)^2\}\right]}{mn\{f + w(1-f)\}^2} \qquad (7)$$

Under the heterogeneous treatment effects model, there is no longer a weight w that makes the variance independent of the unknown parameters $\rho$ and $\omega$. Furthermore, unlike under the homogeneous treatment effects model, the degrees of freedom available to estimate the variance in (7) are quite similar under the cluster randomized and 1:1 designs: they remain 2m − 2 under the cluster randomized design and are 2m − 1 under the 1:1 design. Given this similarity, Figure 2 compares the two designs assuming z critical values. As expected, more heterogeneity puts the 1:1 design at a relative disadvantage, although perhaps not by as much as one might expect.

Conclusions:

Given the power advantages of randomization within clusters over randomization of intact clusters, researchers should be wary of opting for randomization by cluster when individual randomization is feasible, even when contamination is likely in the individually randomized design. Even with a conservative treatment effect heterogeneity value of $\omega = 0.5$, when clusters are of size n = 100 and the ICC is 0.1, contamination can be as high as 50% and individual randomization is still preferable.
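The numerical claims above can likewise be reproduced from equations (6) and (7). The sketch below is in Python; it assumes equations (6) and (7) as given above, z critical values, and the parameterization $\omega = \sigma_{TC}^2/\sigma_B^2$, and its function names are hypothetical. For the heterogeneous model, f = 0.5 with w = 1 gives a variance proportional to $1 + nq\omega/2$, and f = 1 gives one proportional to $1 + nq(1 + \omega)$.

```python
def wi_vs_one_to_one(f, c_half_total, c_f_total):
    """Equation (6): subjects needed by the WI design / subjects needed by 1:1."""
    return (1.0 - c_half_total) ** 2 / (
        4.0 * f * (1.0 - f) * ((2.0 - c_f_total) / 2.0) ** 2
    )


def het_one_to_one_vs_cluster(n, rho, omega, c_half_total):
    """Subjects needed under 1:1 / under cluster randomization, model (5).

    Uses equation (7) with the common factor 2*sigma^2/(m*n) dropped.
    """
    q = rho / (1.0 - rho)
    var_one_to_one = 1.0 + n * q * omega / 2.0   # f = 0.5, w = 1
    var_cluster = 1.0 + n * q * (1.0 + omega)    # f = 1 (cluster randomization)
    # 1:1 estimates the contaminated effect delta*(1 - c); cluster estimates delta.
    return var_one_to_one / ((1.0 - c_half_total) ** 2 * var_cluster)


# WI example from the text: c_h^t = 0.5, f = 0.8, c_f^t = 0 + 0 + 0.1 + 0.6.
print(round(wi_vs_one_to_one(0.8, 0.5, 0.7), 3))  # below 1: WI preferred
print(round(wi_vs_one_to_one(0.8, 0.5, 0.8), 3))  # above 1: 1:1 preferred

# Conclusion example: omega = 0.5, n = 100, ICC = 0.1, 50% contamination.
print(het_one_to_one_vs_cluster(100, 0.1, 0.5, 0.5))  # below 1: 1:1 preferred
```

Setting $\omega = 0$ in `het_one_to_one_vs_cluster` reduces it to the homogeneous-model ratio $1/\{(1 + nq)(1 - c_h^t)^2\}$.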
I conclude that situations in which cluster randomization should be preferred to individual randomization are quite rare. Researchers who opt for randomization within clusters should be aware that, if contamination exists, unbiased estimation of treatment effects will not be possible. However, in efficacy studies, power to detect non-null treatment effects may matter more than unbiased estimation of treatment effects. Put another way, we might be willing to accept some bias in exchange for a lower mean squared error. Furthermore, by allocating one subset of clusters to randomization by cluster and another subset to randomization of individuals within clusters, contamination effects may be estimated and an unbiased estimate of the treatment effect can be constructed. My continuing research explores issues involved in the estimation of contamination effects.

Appendix A. References

Borm, G. F., Melis, R. J. F., Teerenstra, S., & Peer, P. G. (2005). Pseudo cluster randomization: A treatment allocation method to minimize contamination and selection bias. Statistics in Medicine, 24, 3535–3547.

Cornfield, J. (1978). Randomization by group: A formal analysis. American Journal of Epidemiology, 108, 100–102.

Donner, A., & Klar, N. (2000). Design and analysis of cluster randomization trials in health research. New York: Oxford University Press.

Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomization trials. Psychological Methods, 2, 173–185.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Slymen, D. J., & Hovell, M. F. (1997).
Cluster versus individual randomization in adolescent tobacco and alcohol studies: Illustrations for design decisions. International Journal of Epidemiology, 26(4), 765–771.

Torgerson, D. J. (2001). Contamination in trials: Is cluster randomization the answer? BMJ, 322, 355–357.

Appendix B. Tables and Figures

Figure 1. Ratio of number of subjects required with 1:1 randomization within cluster vs. number required with cluster randomization. Homogeneous treatment effects, infinite degrees of freedom.

Figure 2. Ratio of number of subjects required with 1:1 randomization within cluster vs. number required with cluster randomization. Heterogeneous treatment effects, infinite degrees of freedom.