Abstract Title Page
Title:
The Implications of “Contamination” for Experimental Design in Education.
Author(s): Chris Rhoads, Ph.D., Northwestern University.
2009 SREE Conference Abstract Template
Abstract Body
Background/context:
Researchers planning a randomized field trial to evaluate the effectiveness of an educational
intervention often face the following dilemma. They plan to recruit schools to participate in their
study. Should they randomly assign individuals (either students or teachers, depending on the
intervention) within schools to treatment conditions, or should all participating students
(teachers) at a given school be assigned to the same treatment condition? That is, should we
randomize schools (clusters), or individuals within schools? Of course, for certain interventions,
individual randomization is not an option, since by their very nature they must be delivered to
entire groups of students or teachers. Whole school reforms must be delivered at the school level
and curricular interventions generally must be delivered at the classroom level. However, other
interventions, such as tutoring provided to below-grade-level readers, can be delivered directly
to individuals.
One reason often given for preferring cluster level randomization is a fear of “diffusion of
treatment” (Raudenbush, 1997) or “contamination” (Donner and Klar, 2000). Contamination
occurs when contact between members of the control group and members of the experimental
group causes control group participants to behave more like experimental group participants than
they would have had that contact not occurred. For instance, if an intervention that is designed
to reduce dropout rates by increasing engagement with school is delivered to some at-risk
children but not others within a given school, it is possible that the children not receiving the
intervention will also have their engagement with school increased as a result of the increased
engagement of their peers. For certain interventions, contamination could also cause
experimental subjects to behave more like control group subjects than they otherwise would.
Note that both processes, experimental subjects acting more like control subjects and control
subjects acting more like experimental subjects, would tend to decrease the observed
effect size of the experiment. I assume for the purposes of this paper that the contamination
dilutes the observed effect size in this fashion. The methods considered would not apply to a
contamination process that tends to spuriously increase the effect size, such as control group
demoralization (Shadish, Cook & Campbell, 2002).
Cornfield (1978) noted that two penalties are paid for randomization by cluster rather than by
individual. First, the variance of the estimated treatment effect increases. Second, the degrees of
freedom available to estimate that variance decrease. Thus, in the absence of contamination,
randomizing individuals within clusters is a more powerful design than randomizing whole
clusters. This paper argues that the threat of contamination should not necessarily cause
experimenters to opt for a cluster randomized design. Researchers also need to consider the
extent of the expected contamination in order to make a well-informed decision about which
design is more powerful.
As far as I am aware, the existing work on this problem consists of a few papers that have
appeared in the health sciences literature. The statistical model underlying the work in these
papers is left unstated, but implicitly the calculations assume that the outcome of individual k in
cluster j who receives treatment i obeys the following model:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \quad i = 1, 2;\; j = 1, \ldots, 2m;\; k = 1, \ldots, n_{ij};\; n_{1j} + n_{2j} = n \tag{1}$$
where $\mu$ and $\alpha_i$ are fixed parameters representing, respectively, the overall mean and the
deviation of the mean in treatment group i from the overall mean, $\beta_j$ is a mean zero, normally
distributed random effect associated with cluster j having variance $\sigma_B^2$, and $\varepsilon_{ijk}$ is a mean zero,
normally distributed random effect associated with individual k within cluster j and has variance
$\sigma^2$. Many results will depend on the intracluster correlation coefficient (ICC), defined as
$\rho = \sigma_B^2 / (\sigma_B^2 + \sigma^2)$. Notice that this model does not contain a term for the interaction between
clusters and treatment and thus assumes that treatment effects are homogeneous across clusters.
Slymen and Hovell (1997) and Torgerson (2001) explored the issue of choosing between
cluster and individual level randomization in the health sciences. However, the most thorough
treatment of the issue to date is Borm, Melis, Teerenstra and Peer (2005) (hereafter BMTP).
BMTP suggest an interesting compromise between cluster randomization and individual
randomization, a method they call “pseudo cluster randomization.” They label the two
treatments that are under investigation as treatments A and B. Suppose that the data can be
modeled by equation (1) so that 2m clusters, each of size n, are available for the experiment.
Pseudo cluster randomization is a two step randomization procedure. First, the clusters are
randomly assigned to two groups labeled a and b, resulting in m clusters per group. Then, for
each cluster in group a, a fraction f ($0.5 \le f \le 1$) of the individuals in that cluster are randomly
assigned to treatment A and the rest to treatment B. In cluster group b, the same fraction f of
individuals in each cluster are assigned to treatment B and the rest to treatment A. Using the
same fraction f in each cluster group ensures that the design is balanced on both the individual
and the cluster level.
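The two-step procedure described above can be sketched in Python. This is a minimal illustration, not BMTP's implementation: the function name `pseudo_cluster_randomize`, the dictionary input format, and the rounding of f times the cluster size to a whole number of individuals are all assumptions made here.

```python
import random


def pseudo_cluster_randomize(cluster_sizes, f, seed=None):
    """Two-step pseudo cluster randomization (illustrative sketch).

    cluster_sizes: dict mapping a cluster id to its number of individuals
                   (hypothetical input format).
    f: fraction, 0.5 <= f <= 1, of each cluster that receives the
       treatment matching the cluster's group (A in group a, B in group b).
    Returns a dict: cluster id -> (group label, list of treatment labels).
    """
    if not 0.5 <= f <= 1:
        raise ValueError("f must lie in [0.5, 1]")
    if len(cluster_sizes) % 2 != 0:
        raise ValueError("an even number of clusters (2m) is required")

    rng = random.Random(seed)

    # Step 1: randomly split the 2m clusters into groups a and b (m each).
    ids = list(cluster_sizes)
    rng.shuffle(ids)
    m = len(ids) // 2
    group_of = {cid: ("a" if pos < m else "b") for pos, cid in enumerate(ids)}

    # Step 2: within each cluster, randomize individuals in ratio f : (1 - f).
    allocation = {}
    for cid, n in cluster_sizes.items():
        group = group_of[cid]
        majority, minority = ("A", "B") if group == "a" else ("B", "A")
        n_major = round(f * n)  # assumption: round to a whole individual
        labels = [majority] * n_major + [minority] * (n - n_major)
        rng.shuffle(labels)
        allocation[cid] = (group, labels)
    return allocation


# Usage: four clusters of ten individuals, f = 0.8.
example = pseudo_cluster_randomize({"school_%d" % i: 10 for i in range(4)}, f=0.8, seed=0)
for cid, (group, labels) in sorted(example.items()):
    print(cid, group, labels.count("A"), labels.count("B"))
```

Setting f=1 reproduces full cluster randomization and f=0.5 reproduces 1:1 randomization within clusters, so the same routine covers the two designs compared later in the abstract.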
Next, define $\bar{y}_{Aa,j}$ and $\bar{y}_{Ab,j}$ to be the mean outcome of individuals in cluster j assigned to
treatment A in cluster groups a and b, respectively. Notation for the mean outcomes of those
receiving treatment B is defined analogously. It may be advantageous to weight responses in
the different cluster groups differently. Thus, BMTP propose the following estimator of the
mean outcome for those receiving A:
$$\bar{y}_A = \frac{\frac{1}{m}\sum_j f\,\bar{y}_{Aa,j} + \frac{1}{m}\sum_j w(1-f)\,\bar{y}_{Ab,j}}{f + w(1-f)}. \tag{2}$$
Then the variance of the estimated treatment difference $\bar{y}_A - \bar{y}_B$ is
$$\mathrm{var} = \frac{2\sigma^2}{mn} \cdot \frac{f + w^2(1-f) + nq\{f - w(1-f)\}^2}{\{f + w(1-f)\}^2}, \tag{3}$$
where q is defined by $q = \rho/(1-\rho)$.
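The variance in (3) can be evaluated numerically with a short Python sketch. It assumes the form (2σ²/mn)·[f + w²(1−f) + nq{f − w(1−f)}²]/{f + w(1−f)}², which follows from model (1) and reproduces the familiar results for the two limiting designs. The function name `pcr_variance` and the default `sigma2=1.0` are choices made here for illustration.

```python
def pcr_variance(f, w, n, m, rho, sigma2=1.0):
    """Variance of the estimated treatment difference under pseudo cluster
    randomization, homogeneous treatment effects (equation (3)).

    f: within-cluster allocation fraction, w: weight on the minority arm,
    n: cluster size, m: clusters per cluster group, rho: ICC,
    sigma2: individual-level variance (illustrative default of 1).
    """
    q = rho / (1.0 - rho)
    denom = (f + w * (1.0 - f)) ** 2
    numer = f + w ** 2 * (1.0 - f) + n * q * (f - w * (1.0 - f)) ** 2
    return (2.0 * sigma2 / (m * n)) * numer / denom


n, m, rho = 100, 10, 0.1

# f = 1: cluster randomization, variance inflated by the design effect 1 + nq.
print(pcr_variance(1.0, 1.0, n, m, rho))

# f = 0.5, w = 1: 1:1 randomization within clusters, variance 2*sigma2/(m*n).
print(pcr_variance(0.5, 1.0, n, m, rho))

# w = f/(1 - f): the nq term cancels, so the result is the same for any rho.
f = 0.8
print(pcr_variance(f, f / (1.0 - f), n, m, 0.1))
print(pcr_variance(f, f / (1.0 - f), n, m, 0.3))
```

The last two calls illustrate why the weight w=f/(1-f) is attractive when the ICC is unknown: up to floating point error, the variance does not change with `rho`.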
Contamination is conceptualized as follows. Let $\mu_A$ and $\mu_B$ be population mean outcomes
under treatments A and B in the absence of contamination and let $\delta = \mu_A - \mu_B$. Then the
expected outcome for those receiving treatment A in group a is $\mu_A - c_{f,A}\delta$ ($0 \le c_{f,A} \le 1$). Here
$c_{f,A}$ is the contamination of A by B given that a fraction, f, receive A in each cluster. Defining other
contamination factors in an analogous fashion we find that the expected value of $\bar{y}_A - \bar{y}_B$ with
contamination is given by
$$E = \delta - \delta\,\frac{f(c_{f,A} + c_{f,B}) + w(1-f)(c_{1-f,A} + c_{1-f,B})}{f + w(1-f)}. \tag{4}$$
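As a small numerical illustration of (4), the sketch below computes the expected treatment difference for given contamination factors. It assumes contamination dilutes the effect, so the expected difference is the uncontaminated effect times one minus a weighted average of the contamination factors. The function and argument names (`expected_effect`, `c_f_A`, `c_1mf_A`, and so on) are hypothetical labels introduced here.

```python
def expected_effect(delta, f, w, c_f_A, c_f_B, c_1mf_A, c_1mf_B):
    """Expected value of the estimated treatment difference under
    contamination (equation (4)).

    delta: uncontaminated treatment effect.
    c_f_A, c_f_B: contamination of A (B) when a fraction f of the cluster
                  receives A (B).
    c_1mf_A, c_1mf_B: contamination of A (B) when a fraction 1 - f of the
                      cluster receives A (B).
    """
    dilution = (f * (c_f_A + c_f_B) + w * (1.0 - f) * (c_1mf_A + c_1mf_B)) / (
        f + w * (1.0 - f)
    )
    return delta * (1.0 - dilution)


# No contamination: the full effect is recovered.
print(expected_effect(1.0, 0.8, 1.0, 0.0, 0.0, 0.0, 0.0))  # 1.0

# 1:1 randomization (f = 0.5, w = 1) with controls moving halfway toward
# the experimental mean and experimental subjects uncontaminated.
print(expected_effect(1.0, 0.5, 1.0, 0.0, 0.5, 0.0, 0.5))  # 0.5
```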
BMTP note that efficiency is optimized by maximizing the ratio $t^2 = E^2/\mathrm{var}$. They proceed
to derive three weights, w, that will, respectively, minimize variance, minimize contamination,
and maximize $t^2$. They then compare pseudo cluster randomization (with various fractions and
weights) to cluster randomization and 1:1 individual randomization in terms of inverse $t^2$ ratios
(this ratio is proportional to the number of subjects required to achieve fixed type I and type II
error rates using z critical values). They find that cluster randomization will be preferred if
cluster sizes are small, if the value of the ICC is small, or if the total amount of contamination is
large. However, even if contamination reduces the uncontaminated average treatment effect by
30-50%, individual randomization will often be preferred.
Purpose/objective/research question/focus of study:
There are two major limitations to the work done by BMTP that the current work overcomes.
First, the BMTP results are asymptotic, and hence assume a large number of clusters enrolled in
the experiment. This assumption has two implications. The first is that Cornfield’s second
penalty can be ignored. The second implication is that BMTP can assume that the value of $\rho$ is
known. The current work explores these two issues. That is, it looks at the impact on power of
the additional degrees of freedom available under the individually randomized design and at
what can be done when the value of $\rho$ cannot be assumed known. The other limitation of the
BMTP work is that it assumes homogeneous treatment effects across clusters. This assumption
appears to be standard in the health sciences literature. However, it is not recommended in
educational experiments, where it is possible, if not likely, that the effect of treatment may vary
across different educational settings. Thus, in addition to presenting some results based on the
model given in (1), the current paper also considers the following model:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, 2;\; j = 1, \ldots, 2m;\; k = 1, \ldots, n_{ij};\; n_{1j} + n_{2j} = n. \tag{5}$$
This model differs from (1) only by the addition of random effects for the interaction of
treatment with clusters, $(\alpha\beta)_{ij}$, which have a mean zero normal distribution with variance $\sigma_{TC}^2$.
Findings/Results:
I begin by noting that, in general, the weighted estimators derived by BMTP all depend on a
known value of $\rho$ (with the exception of the minimal contamination estimator, which BMTP
show is always inferior to other approaches in terms of sample size requirements). Since I don’t
wish to assume knowledge of $\rho$, I focus attention on comparing full cluster randomization to
randomization within cluster at a 1:1 ratio (the cases of f=1 and f=0.5). Results depend on the
total contamination at f=0.5, which I denote $c_{ht} = c_{0.5,A} + c_{0.5,B}$, and the value of nq. Figure 1
examines the ratio of the total sample size required under 1:1 randomization to that required
under cluster randomization under model (1) and assuming z critical values. I note that even
when contamination is as large as 0.7, 1:1 randomization is preferable provided n and/or $\rho$ is
large enough. For instance, 1:1 randomization is preferred when n=100 and $\rho$=0.1.
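The comparison behind Figure 1 can be reproduced for single parameter values. The sketch below assumes z critical values and the variance and expectation formulas (3) and (4), under which the ratio of subjects required by 1:1 randomization to subjects required by cluster randomization reduces to 1/[(1 + nq)(1 − c)²], where c is the total contamination at f=0.5. The function names are hypothetical.

```python
import math


def relative_n_1to1_vs_cluster(n, rho, c_ht):
    """Ratio of required sample sizes, 1:1 within-cluster randomization
    over cluster randomization, under model (1) with z critical values.
    Values below 1 favor 1:1 randomization. Requires c_ht < 1."""
    q = rho / (1.0 - rho)
    return 1.0 / ((1.0 + n * q) * (1.0 - c_ht) ** 2)


def break_even_contamination(n, rho):
    """Total contamination at which the two designs need equally many
    subjects; 1:1 randomization is preferred below this value."""
    q = rho / (1.0 - rho)
    return 1.0 - 1.0 / math.sqrt(1.0 + n * q)


# The example from the text: n = 100, ICC = 0.1, contamination 0.7.
print(round(relative_n_1to1_vs_cluster(100, 0.1, 0.7), 3))  # 0.917
print(round(break_even_contamination(100, 0.1), 3))         # 0.713
```

With n=100 and an ICC of 0.1 the ratio is below 1 at a contamination of 0.7, which agrees with the statement that 1:1 randomization is preferred in that case.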
Figure 1 actually understates the extent to which 1:1 randomization should be preferred to
cluster randomization under model (1). This is due to the additional degrees of freedom
available to estimate the variance of the estimated treatment effect under 1:1 randomization.
With 2m clusters available for the experiment, the cluster randomized design has only 2m-2
degrees of freedom, while the 1:1 randomized design has 2mn-2m-1. For many designs this
difference in degrees of freedom will have only a negligible effect. However, if the number of
clusters in the experiment is quite small (say 2m < 10), then the additional degrees of freedom
available under the 1:1 design can substantially add to its power advantages versus the cluster
randomized design. In the interest of space, I omit further discussion of this issue here.
An issue apparently overlooked by BMTP is that the choice of w=f/(1-f) will allow the use of
a pseudo cluster randomized design without assuming a large number of clusters enrolled or prior
knowledge of $\rho$. With this weight f = w(1-f), so the term in (3) that involves nq vanishes and the
variance no longer depends on $\rho$. I will refer to a treatment effect estimate constructed in this fashion as a
weighted-invariant (WI) estimator. The expected value of the estimated treatment effect for the
WI estimator will depend on the total contamination at fractional randomization f,
$c_{ft} = c_{f,A} + c_{1-f,A} + c_{f,B} + c_{1-f,B}$. While BMTP allow values of $c_{f,i}$ to vary from 0 to 1 in an
unrestricted fashion, this would imply that contamination could cause all control subjects to
behave as though they were uncontaminated experimental subjects and all experimental subjects
to behave as though they were uncontaminated control subjects, resulting in the absolute value of
$\delta$ staying the same but $\delta$ changing sign. This seems implausible, and so the restrictions
$c_{f,A} + c_{1-f,B} \le 1$ and $c_{f,B} + c_{1-f,A} \le 1$ seem logical. The degrees of freedom of the WI estimator are
the same as the degrees of freedom for 1:1 randomization, so a very good approximation to the
relative sample size required under the two designs is given by
2
tWI
(1  cht ) 2

.
t1:12 4 f (1  f ){(2  c ft ) 2}2
(6)
Since f (1  f ) is at most 0.25, in the absence of contamination we obtain the well known
result that a balanced experimental design is the most efficient. The WI design can improve on
the 1:1 design only if we can remove a substantial amount of contamination by using f > 0.5. For
instance, suppose that we believe that experimental subjects are unlikely to be contaminated by
control subjects and that at 1:1 randomization the control group mean will move halfway towards
the experimental group mean. That is, we think $c_{0.5,A} = 0$ and $c_{0.5,B} = 0.5$. Now suppose that at a
fractional allocation f=0.8 experimental subjects are still uncontaminated by control subjects.
However, when 80% of the subjects in a cluster are in the experimental group, contamination of
controls increases to $c_{0.2,B} = 0.6$, whereas when only 20% are in the experimental group,
contamination decreases to $c_{0.8,B} = 0.1$. Then equation (6) results in 0.925 and the WI design
would be preferred. On the other hand, if $c_{0.8,B} = 0.2$ and other parameters remain unchanged,
the 1:1 design is preferred.
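The worked example above can be checked with a few lines of Python. The sketch reads equation (6) as the relative sample size of the WI design versus the 1:1 design, (1 − c_ht)² / [4f(1−f){(2 − c_ft)/2}²], so that values below 1 favor the WI design. The function name and argument names are hypothetical.

```python
def relative_n_wi_vs_1to1(f, c_f_A, c_1mf_A, c_f_B, c_1mf_B, c_half_A, c_half_B):
    """Relative sample size of the weighted-invariant (WI) design versus
    1:1 randomization within clusters (equation (6)).

    c_f_A, c_1mf_A, c_f_B, c_1mf_B: contamination factors at allocation f.
    c_half_A, c_half_B: contamination factors at 1:1 allocation (f = 0.5).
    """
    c_ft = c_f_A + c_1mf_A + c_f_B + c_1mf_B   # total contamination at f
    c_ht = c_half_A + c_half_B                 # total contamination at 0.5
    return (1.0 - c_ht) ** 2 / (4.0 * f * (1.0 - f) * ((2.0 - c_ft) / 2.0) ** 2)


# First scenario: c_{0.8,A} = c_{0.2,A} = 0, c_{0.8,B} = 0.1, c_{0.2,B} = 0.6.
print(round(relative_n_wi_vs_1to1(0.8, 0.0, 0.0, 0.1, 0.6, 0.0, 0.5), 3))  # 0.925

# Second scenario: c_{0.8,B} = 0.2, everything else unchanged.
print(round(relative_n_wi_vs_1to1(0.8, 0.0, 0.0, 0.2, 0.6, 0.0, 0.5), 3))  # 1.085
```

The first ratio is below 1 (WI preferred) and the second is above 1 (1:1 preferred), matching the two cases described in the text.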
Finally, I present results under the assumption of heterogeneous treatment effects as
described by model (5). Results under this model will depend on the parameter $\omega = \sigma_{TC}^2/\sigma_B^2$.
Hedges and Rhoads (under review) suggest that values of $\omega \le 0.5$ are reasonable in educational
experiments. Under this model expected value calculations are unchanged, but the variance of
the estimated treatment difference $\bar{y}_A - \bar{y}_B$ under pseudo cluster randomization becomes
$$\mathrm{var}_{het} = \frac{2\sigma^2}{mn} \cdot \frac{f + w^2(1-f) + nq\{f - w(1-f)\}^2 + nq\omega\{f^2 + w^2(1-f)^2\}}{\{f + w(1-f)\}^2}. \tag{7}$$
I note that under the heterogeneous treatment effects model there no longer exists a weight w
that will result in a variance not depending on the unknown $\rho$ and $\omega$ parameters. Furthermore,
unlike the homogeneous treatment effects model, the degrees of freedom available to estimate
the variance in (7) are quite similar under the cluster randomized and 1:1 designs. They remain
2m-2 under the cluster randomized design and are 2m-1 under the 1:1 design. Given the
similarity in degrees of freedom, I compare the two designs in Figure 2 assuming z critical
values. As expected, more heterogeneity implies that the 1:1 design is relatively disadvantaged,
although perhaps not by as much as one might expect.
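The comparison under heterogeneous treatment effects can be sketched in the same way. The code assumes that (7) adds a term nq·ω·{f² + w²(1−f)²} to the numerator of (3), where `omega` stands for the ratio of the treatment-by-cluster variance to the between-cluster variance, and that z critical values are used. The function names and the symbol name `omega` are labels chosen here, not notation confirmed by the source.

```python
def pcr_variance_het(f, w, n, m, rho, omega, sigma2=1.0):
    """Variance of the estimated treatment difference under pseudo cluster
    randomization with heterogeneous treatment effects (equation (7))."""
    q = rho / (1.0 - rho)
    denom = (f + w * (1.0 - f)) ** 2
    numer = (
        f
        + w ** 2 * (1.0 - f)
        + n * q * (f - w * (1.0 - f)) ** 2
        + n * q * omega * (f ** 2 + w ** 2 * (1.0 - f) ** 2)
    )
    return (2.0 * sigma2 / (m * n)) * numer / denom


def relative_n_1to1_vs_cluster_het(n, rho, omega, c_ht, m=10):
    """Ratio of required sample sizes, 1:1 within-cluster randomization
    over cluster randomization; values below 1 favor 1:1 randomization.
    The number of clusters m cancels, so its default is arbitrary."""
    var_1to1 = pcr_variance_het(0.5, 1.0, n, m, rho, omega)
    var_cluster = pcr_variance_het(1.0, 1.0, n, m, rho, omega)
    # Contamination shrinks the expected effect under 1:1 by (1 - c_ht).
    return var_1to1 / (var_cluster * (1.0 - c_ht) ** 2)


# n = 100, ICC = 0.1, 50% contamination, with and without heterogeneity.
print(round(relative_n_1to1_vs_cluster_het(100, 0.1, 0.5, 0.5), 3))  # 0.855
print(round(relative_n_1to1_vs_cluster_het(100, 0.1, 0.0, 0.5), 3))  # 0.33
```

Heterogeneity moves the ratio from about 0.33 to about 0.855 in this example, so the 1:1 design loses much of its advantage but still requires fewer subjects than cluster randomization.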
Conclusions:
Given the power advantages of randomization within clusters vs. randomization of intact
clusters, researchers should be wary of opting for randomization by cluster when individual
randomization is feasible, even when contamination in the individually randomized design is
likely. Even with a conservative treatment effect heterogeneity value of $\omega$=0.5, when clusters
are of size n=100 and the ICC is 0.1, contamination can be as high as 50% and individual
randomization would still be preferable. I conclude that situations where cluster randomization
should be preferred to individual randomization are quite rare.
Researchers who opt for randomization within clusters should be aware that if
contamination exists, unbiased estimation of treatment effects will not be possible. However,
power to detect non-null treatment effects may matter more than unbiased estimation of
treatment effects in efficacy studies. Put another way, we might be willing to accept some bias
in exchange for lower mean square error. Furthermore, by allocating one subset of clusters for
randomization by clusters and another subset for randomization by individuals within clusters,
contamination effects may be estimated and an unbiased estimate of the treatment effect can be
constructed. My ongoing research explores issues involved in the estimation of
contamination effects.
Appendixes
Appendix A. References
Borm, G.F., Melis, R.J.F., Teerenstra, S. and Peer, P.G. (2005). Pseudo cluster randomization: a
treatment allocation method to minimize contamination and selection bias. Statistics in
Medicine, 24, 3535-3547.
Cornfield, J. (1978). Randomization by group: a formal analysis. American Journal of
Epidemiology, 108, 100-102.
Donner, A. and Klar, N. (2000). Design and Analysis of Cluster Randomization Trials in Health
Research. New York: Oxford University Press.
Raudenbush, S.W. (1997). Statistical analysis and optimal design for cluster randomized
trials. Psychological Methods, 2, 173-185.
Shadish, W.R., Cook, T.D. and Campbell, D.T. (2002). Experimental and quasi-experimental
designs for generalized causal inference. Boston: Houghton Mifflin.
Slymen, D.J. and Hovell, M.F. (1997). Cluster versus individual randomization in adolescent
tobacco and alcohol studies: illustrations for design decisions. International Journal of
Epidemiology, 26(4), 765-771.
Torgerson, D.J. (2001). Contamination in trials: is cluster randomization the answer? BMJ,
322, 355-357.
Appendix B. Tables and Figures
Figure 1. Ratio of number of subjects required with 1:1 randomization within cluster vs. number
required with cluster randomization. Homogeneous treatment effects, infinite degrees of
freedom.
Figure 2. Ratio of number of subjects required with 1:1 randomization within cluster vs.
number required with cluster randomization. Heterogeneous treatment effects, infinite degrees
of freedom.