RANDOMIZATION TESTS FOR RESTRICTED ALTERNATING TREATMENTS DESIGNS

PATRICK ONGHENA¹* and EUGENE S. EDGINGTON²

¹Katholieke Universiteit Leuven, Department of Psychology, Center for Mathematical Psychology and Psychological Methodology, Tiensestraat 102, B-3000 Leuven, Belgium and ²The University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4

(Received 29 Jr.& 1993; received for publication 17 November 1993)

Behav. Res. Ther. Vol. 32, No. 7, pp. 783-786, 1994
Copyright © 1994 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0005-7967/94 $7.00 + 0.00
Pergamon
~~7967(93)E~l4-X

Summary: Alternating Treatments Designs (ATD) with random assignment of the treatments to the measurement times provide very powerful single-case experiments. However, complete randomization might cause too many consecutive administrations of the same treatment to occur in the design. In order to exclude these possibilities, an ATD with restricted randomization can be used. In this article we provide a general rationale for the random assignment procedure in such a Restricted Alternating Treatments Design (RATD), and derive the corresponding randomization test. A software package for randomization tests in RATD, ATD and other single-case experimental designs [Van Damme & Onghena, Single-case randomization tests, version 1.1, Department of Psychology, Katholieke Universiteit Leuven, Belgium] is discussed.

INTRODUCTION
Single-case experimental designs can be classified as within-series designs, between-series designs and combined-series designs (Barlow, Hayes & Nelson, 1984; Hayes, 1981). Randomization tests for within-series designs and combined-series designs were presented by Onghena (1992) using the rationale of Edgington (1987). In this paper, valid randomization tests will be presented for between-series designs.
The Alternating Treatments Design (ATD) is the prototype between-series design, which provides an extremely powerful strategy to study the relative effectiveness of two or more treatments (Barlow & Hayes, 1979; Barlow & Hersen, 1984; Kratochwill & Levin, 1992). If randomization is introduced, it concerns random assignment of different levels of the independent variable (the treatments) to measurement times. An ATD is then a Completely Randomized Design (CRD) for a single unit, with repeated measures under different levels of the independent variable, and a standard independent t or ANOVA F randomization test may be used to assess the statistical significance (Edgington, 1987).
As Barlow and Hersen (1984) observed, the need for randomization in ATDs is obvious:

"Of course, one would not want to proceed in a simple A-B-A-B-A-B-A-B fashion. Rather, one would want to randomize the order of introduction of the treatments to control for sequential confounding, or the possibility that introducing Treatment A first, for example, would bias the results in favor of Treatment A." (p. 253)

On the other hand, for clinical, practical, or research reasons, several of the possible randomizations may be proscribed. As Barlow and Hersen (1984) also remarked:

"Finally, in arranging for random alternation of treatments to avoid order effects, one must be careful not to bunch too many administrations of the same treatment together in a row. For example, if eight alternations were available, ... then the investigator might want to set an upper limit of three consecutive administrations of one treatment." (p. 265)

Reasons to set an upper limit on the number of times a treatment can be administered consecutively include impracticality in certain situations of more than some fixed number of consecutive applications of a treatment, the aversive effects of certain enduring treatments, or the effect of length of exposure to some condition on the subject's becoming aware of the experimental manipulation.

*Author for correspondence.

As was demonstrated by Edgington (1987) and Onghena (1992), in order to have Type I error rate control, the permutation of the data to obtain the distribution of test statistics must correspond to the random assignment that is actually used. Therefore, if certain outcomes of the random assignment procedure lead to designs that are impossible or undesirable (e.g. A-A-A-A-B-B-B-B) and would be discarded, then the standard independent t or ANOVA F randomization test assuming complete randomization would not be a valid test. A valid randomization test for a Restricted Alternating Treatments Design (RATD), that is, a randomized single-case design with an upper limit on the number of times a treatment can be administered consecutively, however, is possible if the restrictions on the random assignments are taken into account to obtain the randomization distribution.

The school-board has decided to evaluate the effectiveness of two kinds of individual support after school hours for a highly gifted child with bad performance on a norm-referenced achievement test and with inferior grades on the school exams. One wants to compare support after school hours by the class-room teacher (Treatment A) vs support after school hours by a special-education teacher (Treatment B). Eight one-month periods are available, each period ending with an examination scored out of 100 points. Suppose four of the periods are randomly assigned to Treatment A and the others are assigned to Treatment B. There are 70 ways to pick 4 out of 8, and consequently there are 70 possible outcomes to this randomization procedure.
The following designs, however, are considered to be undesirable:
A-A-A-A-B-B-B-B
A-A-A-B-B-B-B-A
A-A-B-B-B-B-A-A
A-B-B-B-B-A-A-A
B-B-B-B-A-A-A-A
B-B-B-A-A-A-A-B
B-B-A-A-A-A-B-B
B-A-A-A-A-B-B-B
because one wants at least some alternation and because it is considered unfair if one teacher could follow the student for 4 consecutive periods while the other could not. This leaves 62 possible designs. These 62 designs are enumerated and one is randomly sampled to be the design that is actually used.
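
The enumeration described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used by the authors' software: the function names `longest_run` and `enumerate_designs` are hypothetical, and the restriction is expressed as an upper limit of three consecutive administrations, which for four A's and four B's excludes exactly the eight designs listed.

```python
from itertools import combinations, groupby


def longest_run(design):
    """Length of the longest run of identical consecutive treatments."""
    return max(len(list(group)) for _, group in groupby(design))


def enumerate_designs(n_times=8, n_a=4, max_run=3):
    """List every design with n_a A's among n_times periods
    whose longest run does not exceed max_run."""
    designs = []
    for a_positions in combinations(range(n_times), n_a):
        design = "".join("A" if i in a_positions else "B" for i in range(n_times))
        if longest_run(design) <= max_run:
            designs.append(design)
    return designs


all_designs = enumerate_designs(max_run=8)  # a limit of 8 restricts nothing
restricted = enumerate_designs(max_run=3)   # at most 3 in a row
print(len(all_designs), len(restricted))    # 70 62
```

One of the designs in `restricted` would then be drawn at random, for instance with `random.choice(restricted)`.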
The school-board expects Treatment B to be superior to Treatment A and decides to test the null hypothesis that there is no differential effect of the treatments for any of the measurement times, using a randomization test on the difference between means B - A. The level of significance is set at 5%.
Suppose the actual design is A-B-B-A-B-A-A-B and that the scores are respectively 40-51-56-45-61-54-55-66. Consequently, the observed value of the test statistic B - A is 10. The randomization distribution of the RATD randomization test is derived by keeping the scores fixed, superposing the 62 possible designs, and calculating and sorting the randomization statistics (see Table 1).

Table 1. Four highest and four lowest values of the test statistic B - A in the randomization distribution for the restricted alternating treatments design, conditional on the data 40-51-56-45-61-54-55-66

Design              B - A     Probability
A-A-B-A-B-A-B-B      12.0     1/62
A-A-B-A-B-B-A-B      11.5     1/62
A-B-B-A-B-A-A-B      10.0*    1/62
A-B-A-A-B-A-B-B       9.5     1/62
.                     .       .
.                     .       .
B-A-B-B-A-B-A-A      -9.5     1/62
B-A-A-B-A-B-B-A     -10.0     1/62
B-B-A-B-A-A-B-A     -11.5     1/62
B-B-A-B-A-B-A-A     -12.0     1/62

*Observed value of the test statistic.

Table 2. Four highest and four lowest values of the test statistic B - A in the randomization distribution for the completely randomized design, conditional on the data 40-51-56-45-61-54-55-66

Design              B - A     Probability
A-A-B-A-B-A-B-B      12.0     1/70
A-A-B-A-B-B-A-B      11.5     1/70
A-A-A-A-B-B-B-B      11.0     1/70
A-B-B-A-B-A-A-B      10.0*    1/70
.                     .       .
.                     .       .
B-A-A-B-A-B-B-A     -10.0     1/70
B-B-B-B-A-A-A-A     -11.0     1/70
B-B-A-B-A-A-B-A     -11.5     1/70
B-B-A-B-A-B-A-A     -12.0     1/70

*Observed value of the test statistic.

The P-value of the randomization test is the proportion of randomization statistics that are not smaller than the observed test statistic. As can be seen in Table 1, there are two randomization statistics larger than, and one as large as, the observed test statistic. Consequently, the P-value is 3/62 = 0.04839. Because the P-value is not larger than the level of significance, the null hypothesis of no difference between the treatments is rejected. For this child, the individual support after school hours by the special-education teacher seemed more effective, and continuing this treatment in the months to come might therefore be considered.
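
The calculation of this P-value can be reproduced with a short Python sketch. It assumes the design set is the 62 designs with at most three consecutive administrations of the same treatment; the helper names (`mean_difference`, `restricted_designs`, `p_value`) are hypothetical and are not taken from the SCRT program. Exact fractions are used so that the result can be compared directly with 3/62.

```python
from fractions import Fraction
from itertools import combinations, groupby

SCORES = [40, 51, 56, 45, 61, 54, 55, 66]
OBSERVED_DESIGN = "ABBABAAB"


def longest_run(design):
    """Length of the longest run of identical consecutive treatments."""
    return max(len(list(group)) for _, group in groupby(design))


def restricted_designs(n_times, n_a, max_run):
    """All designs with n_a A's and no run longer than max_run."""
    designs = []
    for a_positions in combinations(range(n_times), n_a):
        design = "".join("A" if i in a_positions else "B" for i in range(n_times))
        if longest_run(design) <= max_run:
            designs.append(design)
    return designs


def mean_difference(design, scores):
    """Test statistic: mean of the B scores minus mean of the A scores."""
    a = [s for t, s in zip(design, scores) if t == "A"]
    b = [s for t, s in zip(design, scores) if t == "B"]
    return Fraction(sum(b), len(b)) - Fraction(sum(a), len(a))


def p_value(observed_design, scores, designs):
    """Proportion of designs whose statistic is not smaller than the observed one."""
    observed = mean_difference(observed_design, scores)
    count = sum(mean_difference(d, scores) >= observed for d in designs)
    return Fraction(count, len(designs))


designs = restricted_designs(n_times=8, n_a=4, max_run=3)
p = p_value(OBSERVED_DESIGN, SCORES, designs)
print(p, round(float(p), 5))  # 3/62 0.04839
```

The scores stay fixed throughout; only the design labels are permuted, and only over the designs that the random assignment procedure could actually have produced.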
Comparison of the CRD and the RATD randomization test
Before carrying out randomization tests, one should take account of the lowest possible P-value that can be attained. The lowest possible P-value is the inverse of the number of randomizations. Because in a CRD the number of randomizations is at least as large as in an RATD, the lowest possible P-value of a CRD randomization test is never larger than the lowest possible P-value of an RATD randomization test.
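
For the design set of the example (70 completely randomized designs, 62 restricted designs), the lowest attainable P-values work out as follows; the symbol P_min is introduced here only for illustration.

```latex
\[
P_{\min}^{\mathrm{CRD}} = \frac{1}{70} \approx 0.0143,
\qquad
P_{\min}^{\mathrm{RATD}} = \frac{1}{62} \approx 0.0161 .
\]
```

Both values lie below the 5% level of significance, so either test can in principle lead to rejection of the null hypothesis with these eight measurement times.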
Notice, however, that with the data as given in the example the CRD randomization test would give a P-value of 4/70 = 0.05714 (see Table 2), which is larger than the P-value of the RATD randomization test, and larger than the level of significance. This is because one of the randomizations (viz. A-A-A-A-B-B-B-B) that gives a higher test statistic than the one observed is included in the CRD randomization test, but excluded in the RATD randomization test.
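
To make the source of this difference visible, the following Python sketch lists every completely randomized design whose statistic is at least as large as the observed one and marks whether it survives the restriction. The limit of three consecutive administrations and all names in the code are assumptions made for this illustration.

```python
from itertools import combinations, groupby

SCORES = [40, 51, 56, 45, 61, 54, 55, 66]
OBSERVED_DESIGN = "ABBABAAB"
MAX_RUN = 3  # assumed upper limit on consecutive identical treatments


def longest_run(design):
    """Length of the longest run of identical consecutive treatments."""
    return max(len(list(group)) for _, group in groupby(design))


def mean_difference(design, scores):
    """Mean of the B scores minus mean of the A scores."""
    a = [s for t, s in zip(design, scores) if t == "A"]
    b = [s for t, s in zip(design, scores) if t == "B"]
    return sum(b) / len(b) - sum(a) / len(a)


observed = mean_difference(OBSERVED_DESIGN, SCORES)

# Collect (statistic, design, allowed-under-restriction) for the upper tail.
extreme = []
for a_positions in combinations(range(8), 4):
    design = "".join("A" if i in a_positions else "B" for i in range(8))
    stat = mean_difference(design, SCORES)
    if stat >= observed:
        extreme.append((stat, design, longest_run(design) <= MAX_RUN))

extreme.sort(reverse=True)
for stat, design, allowed in extreme:
    status = "allowed" if allowed else "excluded by the restriction"
    print(f"{design}  {stat:5.1f}  {status}")
```

Four designs reach the upper tail under complete randomization, and three of them remain once the restriction is applied, which gives 4/70 and 3/62 respectively.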
Generating restricted randomizations
In the example, an upper limit on the number of times a treatment can be administered consecutively is set by listing all possible designs before the study is started, discarding those where the limit is exceeded, and taking at random one of the remaining designs. Because the number of possible designs increases very rapidly with increasing numbers of observations, however, the computational load of this procedure is prohibitive for many applications.
Two alternative procedures are available: (1) the waste-basket procedure, and (2) the constructive-sequential procedure.
The waste-basket procedure is interesting when the number of designs exceeding a limit is only a small proportion of the possible designs. With this procedure, the treatments are randomly assigned to the measurement times, and if the resulting design exceeds the upper limit, the design is discarded and another random assignment is performed.
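
A minimal Python sketch of the waste-basket procedure is given below. The function name `waste_basket_design`, its parameters, and the seed are hypothetical; the sketch assumes two treatments labelled A and B and that at least one admissible design exists, since otherwise the loop would never end.

```python
import random
from itertools import groupby


def longest_run(design):
    """Length of the longest run of identical consecutive treatments."""
    return max(len(list(group)) for _, group in groupby(design))


def waste_basket_design(n_a=4, n_b=4, max_run=3, rng=None):
    """Draw complete randomizations until one respects the run limit.

    Assumes that at least one admissible design exists for the
    given n_a, n_b and max_run.
    """
    rng = rng or random.Random()
    pool = ["A"] * n_a + ["B"] * n_b
    while True:
        rng.shuffle(pool)                 # complete randomization
        if longest_run(pool) <= max_run:  # keep it only if the limit is respected
            return "".join(pool)
        # otherwise the design goes into the waste basket and we draw again


design = waste_basket_design(rng=random.Random(1994))
print(design)
```

Because every complete randomization is equally likely and rejected designs are simply redrawn, each admissible design is returned with the same probability.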
The waste-basket procedure, however, is not efficient if a large proportion of designs has to be discarded. In this case, the constructive-sequential procedure is more efficient. With this procedure, treatment indicators are randomly sampled without replacement from a population with as many treatment indicators as there are measurement times for each treatment, and if the limit of the number of consecutive identical treatments is reached, treatment indicators for that treatment are temporarily withdrawn from the population of treatment indicators. For example, with two treatments and an upper limit of two consecutive identical treatments, Treatment A may be randomly assigned to the first and second measurement time, but consequently Treatment B has to be assigned to the third measurement time, and so on.
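
The constructive-sequential procedure can be sketched in Python as follows. The function name and parameters are hypothetical. The text does not say what should happen when the only indicators left belong to the treatment that has just been withdrawn; this sketch restarts the construction in that case, which is an assumption of the illustration and not part of the described procedure.

```python
import random


def constructive_sequential_design(n_a=4, n_b=4, max_run=2, rng=None):
    """Build a design one measurement time at a time.

    Indicators are drawn without replacement; a treatment whose run has
    reached max_run is temporarily withdrawn from the pool.
    """
    rng = rng or random.Random()
    while True:  # ASSUMPTION: restart from scratch after a dead end
        remaining = {"A": n_a, "B": n_b}
        design = []
        run_label, run_length = None, 0
        while len(design) < n_a + n_b:
            # Pool of indicators, without the temporarily withdrawn treatment.
            candidates = [
                t
                for t, k in remaining.items()
                for _ in range(k)
                if not (t == run_label and run_length >= max_run)
            ]
            if not candidates:
                break  # dead end: only withdrawn indicators are left
            choice = rng.choice(candidates)
            remaining[choice] -= 1
            if choice == run_label:
                run_length += 1
            else:
                run_label, run_length = choice, 1
            design.append(choice)
        else:
            return "".join(design)  # all measurement times were filled


design = constructive_sequential_design(rng=random.Random(7))
print(design)
```

No complete design is ever discarded for exceeding the limit, which is where the gain in efficiency over the waste-basket procedure comes from.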
The difference in efficiency between both procedures is obvious if the statistical significance of the randomization test is assessed by random data permutation (see Edgington, 1987, for the difference between systematic and random data permutation tests). If the statistical significance of the randomization test is assessed by systematic data permutation, however, one must take account of the fact that, with the constructive-sequential procedure, the designs are not equally likely. For example, with two treatments, 8 measurement times, and an upper limit of two consecutive identical treatments, the probability that the design A-B-A-B-A-B-A-B is the design that is actually used is

$$\left(\tfrac{4}{8}\right)\left(\tfrac{4}{7}\right)\left(\tfrac{3}{6}\right)\left(\tfrac{3}{5}\right)\left(\tfrac{2}{4}\right)\left(\tfrac{2}{3}\right)\left(\tfrac{1}{2}\right)(1) = \tfrac{1}{70}$$

while for the design A-A-B-B-A-A-B-B this probability is

$$\left(\tfrac{4}{8}\right)\left(\tfrac{3}{7}\right)(1)\left(\tfrac{3}{5}\right)(1)\left(\tfrac{1}{3}\right)(1)(1) = \tfrac{3}{70}$$

Consequently, the randomization statistics have to be weighted with this probability (Cox, 1956; Kempthorne & Doerfler, 1969). If the statistical significance of the randomization test is assessed by random data permutation, a valid randomization test is obtained if the algorithm to perform the initial randomization is the same as the algorithm to generate the randomization statistics.
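
The probability of a given design under the constructive-sequential procedure can be computed as the product of the successive draw probabilities. The Python sketch below does this with exact fractions; the function name `sequential_probability` is hypothetical, and the function only multiplies the draw probabilities as described in the text, without modelling what happens when the procedure cannot be completed.

```python
from fractions import Fraction


def sequential_probability(design, max_run=2):
    """Product of the draw probabilities that lead to `design` when a
    treatment is withdrawn from the pool once its run reaches max_run."""
    remaining = {t: design.count(t) for t in set(design)}
    probability = Fraction(1)
    run_label, run_length = None, 0
    for treatment in design:
        # Indicators that can be drawn at this measurement time.
        available = {
            t: k
            for t, k in remaining.items()
            if not (t == run_label and run_length >= max_run)
        }
        if available.get(treatment, 0) == 0:
            return Fraction(0)  # the design violates the run limit
        probability *= Fraction(available[treatment], sum(available.values()))
        remaining[treatment] -= 1
        if treatment == run_label:
            run_length += 1
        else:
            run_label, run_length = treatment, 1
    return probability


print(sequential_probability("ABABABAB"))  # 1/70
print(sequential_probability("AABBAABB"))  # 3/70
```

In a systematic data permutation test these probabilities would serve as the weights of the corresponding randomization statistics.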
Software availability
Randomization tests for RATDs cannot be performed with the usual permutation algorithms because of the restrictions on the assignments and the permutation of the data. The SCRT program (Van Damme & Onghena, 1993), however, is especially designed to deal with these sorts of single-case experiments. In addition to the randomization tests for within-series and combined-series designs, one can perform randomization tests for RATDs easily. It is possible to restrict the number of consecutive administrations of the same treatment separately for each treatment or to restrict it for some treatments and not for others. Other interesting features of SCRT for the single-case researcher are: the possibility to read any customized set of possible designs from an external file, a Statistics Editor to define a tailor-made test statistic prior to data collection, and a nonparametric meta-analytic procedure to analyze replicated single-case experiments or small-N designs. The program runs on IBM/PC (80286, 80386, or 80486) and compatibles, and can be obtained together with a 30-page manual by e-mail (fpaag02@blekul11.earn or Patrick.Onghena@psy.kuleuven.ac.be) or by writing to the first author.

Acknowledgements: The authors wish to thank Luc Delbeke, Geert Van Damme, and two anonymous reviewers for their helpful comments on an earlier version of the manuscript. The first author is Research Assistant of the National Fund for Scientific Research (Belgium).

REFERENCES
Barlow, D. H. & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12, 199-210.
Barlow, D. H., Hayes, S. C. & Nelson, R. O. (1984). The scientist-practitioner: Research and accountability in clinical and educational settings. New York: Pergamon Press.
Barlow, D. H. & Hersen, M. (1984). Single case experimental designs: Strategies for studying behavior change (2nd edn). New York: Pergamon.
Cox, D. R. (1956). A note on weighted randomization. Annals of Mathematical Statistics, 27, 1144-1151.
Edgington, E. S. (1987). Randomization tests (2nd edn). New York: Marcel Dekker.
Hayes, S. C. (1981). Single case experimental design and empirical clinical practice. Journal of Consulting and Clinical Psychology, 49, 193-211.
Kempthorne, O. & Doerfler, T. E. (1969). The behaviour of some significance tests under experimental randomization. Biometrika, 56, 231-248.
Kratochwill, T. R. & Levin, J. R. (Eds). (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Erlbaum.
Onghena, P. (1992). Randomization tests for extensions and variations of ABAB single-case experimental designs: A rejoinder. Behavioral Assessment, 14, 153-171.
Van Damme, G. & Onghena, P. (1993). Single-case randomization tests (version 1.1) [Computer program]. Department of Psychology, Katholieke Universiteit Leuven (Belgium).