In this section, The American Statistician publishes articles and notes of interest to teachers of the first mathematical statistics course and of applied statistics courses. To be suitable for this section, articles and notes should be useful to a substantial number of teachers of such a course or should have the potential for fundamentally affecting the way in which the course is taught.

Comparative Analyses of Pretest-Posttest Research Designs
DONNA R. BROGAN AND MICHAEL H. KUTNER*
Two common methods of analyzing data from a two-group pretest-posttest research design are (a) two-sample t test on the difference score between pretest and posttest and (b) repeated-measures/split-plot analysis of variance. The repeated-measures/split-plot analysis subsumes the t test analysis, although the former requires more assumptions to be satisfied. A numerical example is given to illustrate some of the equivalences of the two methods of analysis. The investigator should choose the method of analysis based on the research objective(s).
KEY WORDS: Repeated-measures/split-plot analysis; t test; Pretest-posttest designs.
1. INTRODUCTION
A common research design is the two-group pretest/posttest design with one dependent variable where subjects are not matched and may or may not be randomly assigned to the two groups (Cook and Campbell 1979). When the two groups are not formed by random assignment of subjects, a random sample from each of the two groups is necessary. This design can be extended to more than two groups; an example is the comparison of several different treatments with each other or with a control group in which each group is measured on a pretest and posttest.
The statistical analysis for these designs can be approached from several viewpoints. If the dependent variable is measured on an interval or ratio scale, a common analysis is to define a difference score for each subject (posttest minus pretest or vice versa) or a relative difference measure (the difference divided by the pretest) and then test the null hypothesis that the means or medians of the (relative) differences are equal for each group. In many cases the t test or analysis of variance is used, although nonparametric tests could also be used, for example, the Mann-Whitney U test, or the median test, or their analogs for more than two groups.
* Donna R. Brogan is Professor and Michael H. Kutner is Associate Professor in the Statistics and Biometry Department, School of Medicine, Emory University, Atlanta, GA 30322. Work on this article was partially supported by NCI Contract No. CB-74101 and USPHS Grant No. RR39.
Covariance analysis, where the pretest score is used as the covariate, is another method used for analyzing this design. The difference score method is essentially a special case of the analysis of covariance where the regression coefficient of the posttest on the pretest is assumed to equal unity. Neter and Wasserman (1974, p. 717) and Cox (1958, pp. 55-56) point out that if the common slope is not near one the covariance analysis probably will be better than the difference score analysis. We note that when an experimental group is to be compared to a control group, it is often likely that inequality of slopes will prevail among groups, thus violating an assumption of the analysis of covariance. Bock (1975, Sec. 7.3) compares the interpretation of a difference-score analysis and covariance analysis and suggests guidelines regarding which analysis to use.
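To make the special-case relationship between the two analyses concrete, the following sketch writes a one-covariate analysis of covariance model and then fixes the common slope at one. The notation (Y for the posttest, X for the pretest, \gamma for the common slope, \tau_i for the group effect, e_{ik} for the error) is introduced here only for illustration and is not taken from the article.

```latex
% Analysis of covariance with the pretest as covariate and common slope \gamma:
Y_{ik} = \mu + \tau_i + \gamma\,(X_{ik} - \bar{X}_{..}) + e_{ik}

% Setting \gamma = 1 and moving the pretest to the left-hand side:
Y_{ik} - X_{ik} = (\mu - \bar{X}_{..}) + \tau_i + e_{ik}

% The right-hand side is a one-way classification model for the
% difference scores, which is the model behind the difference-score analysis.
```

When the estimated common slope is far from one, the second line no longer describes the data well, which is the situation in which the covariance analysis is expected to do better.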
Still another method of analyzing this design is to view the pretest and posttest as a repeated-measures/split-plot design or as a profile of two measurements for each subject. Repeated-measures/split-plot designs are discussed in detail by Winer (1971) and Steel and Torrie (1980), whereas both repeated measures and profile analysis are discussed in Morrison (1976, Secs. 4.5, 4.6, and 5.6).
This article illustrates some of the equivalences and differences between the difference score analysis and the repeated-measures/split-plot or profile analysis. The numerical example and major discussion are for a two-group pretest/posttest design where subjects are not matched. Concluding remarks indicate how the results can be extended easily to more than two groups.
2. A NUMERICAL EXAMPLE
We consider data from Rikkers et al. (1978), who report results of a prospective randomized surgical trial allocating cirrhotic patients who had bled from varices to either a nonselective shunt (standard operation) or a selective shunt (new operation). The dependent variable is the maximal rate of urea synthesis (MRUS), which is a quantitative test of liver function. Poor liver function is associated with a low MRUS value. MRUS was measured preoperatively and early postoperatively in eight selective shunt patients and thirteen nonselective shunt patients. The purposes of the study were to assess preoperatively the comparability of the selective and the nonselective groups and to longitudinally evaluate the change in liver function of the two groups. Table 1 reports the MRUS values for each patient for the preoperative and postoperative periods and the respective cell means.

Table 1. Pre and Post Maximal Rate of Urea Synthesis Level (mg urea N/hr/kg BW^(3/4)) and Sample Cell Means, by Group

Group                                     Subject   Pre                          Post
Selective Shunt (new operation)           1         51                           48
                                          2         35                           55
                                          3         66                           60
                                          4         40                           35
                                          5         39                           36
                                          6         46                           43
                                          7         52                           46
                                          8         42                           54
                                          Mean      $\hat{\mu}_{11}$ = 46.375    $\hat{\mu}_{12}$ = 47.125
Nonselective Shunt (standard operation)   9         34                           16
                                          10        40                           36
                                          11        34                           16
                                          12        36                           18
                                          13        38                           32
                                          14        32                           14
                                          15        44                           20
                                          16        50                           43
                                          17        60                           45
                                          18        63                           67
                                          19        50                           36
                                          20        42                           34
                                          21        43                           32
                                          Mean      $\hat{\mu}_{21}$ = 43.538    $\hat{\mu}_{22}$ = 31.462

© The American Statistician, November 1980, Vol. 34, No. 4

For completeness Table 2 displays the standard repeated-measures analysis of variance table (analysis of means method); in this example the total number of subjects, n, is 21. The hypotheses of interest to the researchers were the interaction test and the simple effects test on equality of preoperative population means between groups. The test for interaction is significant (F = 11.36 with 1 and 19 df, p < .005); therefore, it is concluded that the pre/post average change in the nonselective group is significantly different from the pre/post average change in the selective group (see the figure). In the presence of a significant interaction effect it is generally of interest to test simple effects rather than main effects (Winer 1971, p. 529). The Bonferroni multiple-comparison procedure (see Neter and Wasserman 1974) was adopted to test the following contrasts:

$$(\mu_{12} - \mu_{11}) - (\mu_{22} - \mu_{21}) = 0, \quad \mu_{12} - \mu_{11} = 0, \quad \mu_{22} - \mu_{21} = 0, \quad \text{and} \quad \mu_{21} - \mu_{11} = 0,$$

where $\mu_{ij}$ is the population mean of group i (1 = selective, 2 = nonselective) at time j (1 = pre, 2 = post), as labeled in Table 1.
Using an experimentwise error rate of .05, we conclude that the interaction effect is significant and $\mu_{21}$ is significantly greater than $\mu_{22}$. Therefore, significant deterioration of liver function occurred in the nonselective patients between preoperative and early postoperative evaluation periods, whereas no apparent deterioration occurred in the selective group. Two points are worth noting in the example just cited: (a) The equality of slopes test using the preoperative MRUS values as a covariate is rejected (p < .02); and (b) the significant interaction effect requires special handling when testing the last contrast since, for the pretest level, we have a two-group experiment in which there are no repeated measures. Therefore, the appropriate error term for this type of comparison is MS (within cell). For a more extensive coverage of this point the reader is referred to Winer (1971, pp. 529-532).
3. REPEATED-MEASURES ANALYSIS
We now discuss the statistical properties of the repeated-measures analysis of variance for this example and compare it with the statistical properties of the difference score analysis. Using the model proposed by Winer (1971, p. 519), we have

$$X_{ijk} = \mu + \alpha_i + \pi_{k(i)} + \beta_j + \alpha\beta_{ij} + \beta\pi_{jk(i)} + \epsilon_{m(ijk)}, \tag{3.1}$$

i = 1, 2 (group 1 = 1, group 2 = 2), j = 1, 2 (pretest = 1, posttest = 2), k = 1, 2, ..., $n_i$, m = 1,

where
$X_{ijk}$ is the observed value of subject k within group i at time j,
$\mu$ is the overall mean,
$\alpha_i$ is the effect of group i,
$\pi_{k(i)}$ is the effect of subject k nested within group i,
$\beta_j$ is the effect of level j of the repeated-measures variable (i.e., pretest and posttest),
$\alpha\beta_{ij}$ is the interaction of group i with level j of the repeated-measures factor, and
$\beta\pi_{jk(i)}$ is the interaction of subject k within group i with level j of the repeated-measures factor.

Table 2. Repeated-Measures Analysis of Variance for Maximal Rate of Urea Synthesis Level

Source of Variation                       df           Sum of Squares   Mean Squares     F Ratio
Between Subjects                          20 (n - 1)
  Groups                                  1            847.48           847.48 (MSG)     3.63 (MSG/MSE)
  Subjects Within Groups                  19 (n - 2)   4440.00          233.68 (MSE)
Within Subjects                           21 (n)
  Pre/Post                                1            317.69           317.69 (MSP)     8.86 (MSP/MSPE)
  Groups x Pre/Post                       1            407.41           407.41 (MSGP)    11.36 (MSGP/MSPE)
  (Pre/Post) x Subjects Within Groups     19 (n - 2)   681.21           35.85 (MSPE)

The following constraints are imposed on the parameters:

$$\alpha_{\cdot} = \beta_{\cdot} = \alpha\beta_{i\cdot} = \alpha\beta_{\cdot j} = 0, \tag{3.2}$$

where $\alpha\beta_{\cdot j} = \sum_i \alpha\beta_{ij}$, and so on.
In the design under discussion, the repeated-measures factor and the group factor are each at two levels. The general analysis of variance also is indicated in Table 2, where n is the total number of subjects. Note that it is not necessary for each group to contain the same number of subjects. Assuming the group factor and the pre/post factor to be fixed effects, Winer (1971) shows that the appropriate F tests are as indicated in the F ratio column of Table 2.
It is worth noting exactly what null hypotheses are tested in Table 2. The ratio MSG/MSE tests the null hypothesis that there is no group main effect. This is equivalent to testing whether the sum of the pretest and posttest observations on each subject has the same population mean in the two groups. The ratio MSP/MSPE tests the null hypothesis that there is no pre/post main effect and is equivalent to testing whether the population mean of the pretest observations is the same as the population mean of the posttest observations. The ratio MSGP/MSPE tests the null hypothesis that there is no interaction between the group main effect and the pre/post main effect. This ratio also tests whether the difference between pretest and posttest observations has the same population mean in both groups. This is the test many researchers are interested in when using this research design, since they often wish to assess whether a treatment has had any effect upon an experimental group. Note that this F test has (1, n - 2) df, which will correspond to the t test with (n - 2) df, as discussed in the next section.
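The sums of squares and F ratios of Table 2 can be reproduced from the Table 1 data with a short script. The following is a minimal sketch in Python (standard library only) and is not the authors' program. It assumes that the pre and post columns of Table 1 pair up subject by subject in the order listed, and it builds every sum of squares from per-subject sums and differences using unweighted-means formulas, which agree with the printed Table 2 entries to rounding. All function and variable names are ours.

```python
from statistics import mean

# Table 1 MRUS values; pre and post are assumed paired by listing order.
SEL_PRE = [51, 35, 66, 40, 39, 46, 52, 42]
SEL_POST = [48, 55, 60, 35, 36, 43, 46, 54]
NON_PRE = [34, 40, 34, 36, 38, 32, 44, 50, 60, 63, 50, 42, 43]
NON_POST = [16, 36, 16, 18, 32, 14, 20, 43, 45, 67, 36, 34, 32]


def corrected_ss(values):
    """Sum of squared deviations about the sample mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def repeated_measures_anova(pre1, post1, pre2, post2):
    """Two-group pre/post ANOVA built from subject sums and differences."""
    n1, n2 = len(pre1), len(pre2)
    df_error = n1 + n2 - 2
    sums1 = [a + b for a, b in zip(pre1, post1)]
    sums2 = [a + b for a, b in zip(pre2, post2)]
    diff1 = [b - a for a, b in zip(pre1, post1)]  # post minus pre
    diff2 = [b - a for a, b in zip(pre2, post2)]
    # Scale factor that puts contrasts of group means on the
    # per-observation sum-of-squares scale (unweighted means).
    scale = 2 * (1 / n1 + 1 / n2)
    ss = {
        "G": (mean(sums1) - mean(sums2)) ** 2 / scale,
        "E": (corrected_ss(sums1) + corrected_ss(sums2)) / 2,
        "P": (mean(diff1) + mean(diff2)) ** 2 / scale,
        "GP": (mean(diff1) - mean(diff2)) ** 2 / scale,
        "PE": (corrected_ss(diff1) + corrected_ss(diff2)) / 2,
    }
    ms_e = ss["E"] / df_error
    ms_pe = ss["PE"] / df_error
    f = {"G": ss["G"] / ms_e, "P": ss["P"] / ms_pe, "GP": ss["GP"] / ms_pe}
    return ss, f


ss, f = repeated_measures_anova(SEL_PRE, SEL_POST, NON_PRE, NON_POST)
for name, value in ss.items():
    print(f"SS_{name} = {value:.2f}")
for name, value in f.items():
    print(f"F_{name} = {value:.2f}")
```

Working from sums and differences mirrors the interpretation given above: the between-subjects tests use each subject's sum, and the within-subjects tests use each subject's difference.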
Two assumptions are required to arrive at the F tests indicated in Table 2 (Winer 1971).
1. The pretest and posttest population variance-covariance matrices for each group are assumed equal.
2. The random effects $\pi_{k(i)}$, $\beta\pi_{jk(i)}$, and $\epsilon_{m(ijk)}$ from the model in (3.1) are all independently and normally distributed with mean zero and variances $\sigma_{\pi}^2$, $\sigma_{\beta\pi}^2$, and $\sigma_{\epsilon}^2$, respectively.
[Figure. Mean Pre and Post Maximal Rate of Urea Synthesis Level (MRUS) by Type of Surgery. Horizontal axis: Pre (Before Surgery), Post (After Surgery); lines: Selective Shunt, Nonselective Shunt.]
Assumption (1), equality of the variance-covariance matrices, implies two other results worth noting. First, the variation of subjects within the two groups is homogeneous. That is, if each subject's pretest and posttest observations are added together, this sum has the same population variance in both groups. This allows pooling over groups to calculate SSE. Second, the variation of the interaction of subject and the pre/post factor is homogeneous for the two groups. That is, if the difference score between pretest and posttest is defined for each subject, the population variance of the difference scores is the same for both groups. This allows pooling over groups to calculate SSPE.
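As an informal look at these two homogeneity conditions, one can compare the sample variances of the per-subject sums and of the per-subject differences across the two groups, and then pool them. The sketch below uses only the Python standard library; the names are ours and the subject-by-subject pairing of the Table 1 columns is assumed. It reports variances and pooled sums of squares only and is not a formal test of equal variance-covariance matrices.

```python
from statistics import variance

# Table 1 MRUS values; pre and post are assumed paired by listing order.
SEL_PRE = [51, 35, 66, 40, 39, 46, 52, 42]
SEL_POST = [48, 55, 60, 35, 36, 43, 46, 54]
NON_PRE = [34, 40, 34, 36, 38, 32, 44, 50, 60, 63, 50, 42, 43]
NON_POST = [16, 36, 16, 18, 32, 14, 20, 43, 45, 67, 36, 34, 32]


def sums_and_diffs(pre, post):
    """Per-subject sum (pre + post) and difference (post - pre)."""
    sums = [a + b for a, b in zip(pre, post)]
    diffs = [b - a for a, b in zip(pre, post)]
    return sums, diffs


sel_sums, sel_diffs = sums_and_diffs(SEL_PRE, SEL_POST)
non_sums, non_diffs = sums_and_diffs(NON_PRE, NON_POST)

# Sample variances (n - 1 denominator) within each group.
var_sums = {"selective": variance(sel_sums), "nonselective": variance(non_sums)}
var_diffs = {"selective": variance(sel_diffs), "nonselective": variance(non_diffs)}


def pooled_ss(x, y):
    """Pooled corrected sum of squares over two groups, divided by 2 to
    put it on the per-observation scale used in the ANOVA table."""
    return ((len(x) - 1) * variance(x) + (len(y) - 1) * variance(y)) / 2


ss_e = pooled_ss(sel_sums, non_sums)     # pooled over groups from the sums
ss_pe = pooled_ss(sel_diffs, non_diffs)  # pooled over groups from the differences

print("variance of sums:", var_sums)
print("variance of differences:", var_diffs)
print(f"SSE = {ss_e:.2f}, SSPE = {ss_pe:.2f}")
```

The pooled quantities correspond to the Subjects Within Groups and (Pre/Post) x Subjects Within Groups sums of squares in Table 2; large discrepancies between the per-group variances would be a reason to question the pooling.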
4. DIFFERENCE-SCORE ANALYSIS
Using model (3.1) and forming a difference score $d_{ik}$ for each subject k nested in group i yields

$$d_{ik} = X_{i1k} - X_{i2k} = (\beta_1 - \beta_2) + (\alpha\beta_{i1} - \alpha\beta_{i2}) + (\beta\pi_{1k(i)} - \beta\pi_{2k(i)}) + (\epsilon_{m(i1k)} - \epsilon_{m(i2k)}). \tag{4.1}$$

The term $(\alpha\beta_{i1} - \alpha\beta_{i2})$ is a parameter associated with group i and measures the "effect" of group i on the difference score $d_{ik}$. The null hypothesis to be tested is $H_0$: $\alpha\beta_{11} - \alpha\beta_{12} = \alpha\beta_{21} - \alpha\beta_{22}$ or $H_0$: $\alpha\beta_{11} - \alpha\beta_{12} = \alpha\beta_{21} - \alpha\beta_{22} = 0$. The difference scores in (4.1) can be viewed as a one-way classification model in which the error term is the sum of the following two terms: $(\beta\pi_{1k(i)} - \beta\pi_{2k(i)})$ and $(\epsilon_{m(i1k)} - \epsilon_{m(i2k)})$.
If we assume this error term is normally distributed with mean zero and also assume homogeneous variances for the two groups, an appropriate test statistic is the Student t test for two independent samples with (n - 2) df.
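The difference-score analysis described in this section amounts to a pooled-variance two-sample t test on the per-subject differences. A minimal sketch in Python (standard library only) follows; the helper name `pooled_two_sample_t` is ours, the Table 1 columns are assumed to pair up subject by subject, and the difference is taken as pretest minus posttest. Reversing the orientation of the difference, or the order of the groups, only flips the sign of t.

```python
from math import sqrt
from statistics import mean, variance

# Table 1 MRUS values; pre and post are assumed paired by listing order.
SEL_PRE = [51, 35, 66, 40, 39, 46, 52, 42]
SEL_POST = [48, 55, 60, 35, 36, 43, 46, 54]
NON_PRE = [34, 40, 34, 36, 38, 32, 44, 50, 60, 63, 50, 42, 43]
NON_POST = [16, 36, 16, 18, 32, 14, 20, 43, 45, 67, 36, 34, 32]


def pooled_two_sample_t(x, y):
    """Student t statistic for two independent samples, equal variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Difference scores, pretest minus posttest, one per subject.
sel_d = [pre - post for pre, post in zip(SEL_PRE, SEL_POST)]
non_d = [pre - post for pre, post in zip(NON_PRE, NON_POST)]

# Nonselective minus selective, so a positive t means a larger
# pre-to-post drop in the nonselective group.
t_stat, df = pooled_two_sample_t(non_d, sel_d)
print(f"t = {t_stat:.3f} with {df} df; t^2 = {t_stat ** 2:.2f}")
```

The squared statistic can be compared with the Groups x Pre/Post F ratio in Table 2, which has (1, n - 2) df.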
5. COMPARISON OF THE TWO ANALYSES
The following three results are useful computationally and can be verified easily with the example from Section 2.
1. If the pretest and the posttest are summed for each subject and a two-sample t test is used to compare the group means of these sums, then the calculated t statistic is the square root of the F test for Groups in Table 2 (F = 3.63) with (1,19) df.
2. If the difference between the pretest and the posttest is formed for each subject and a two-sample t test is used to compare the group means of these differences, then the calculated t = 3.371 and is the square root of the F test for Groups x (pre/post) in Table 2 with (1,19) df.
3. If all 21 subjects are considered to be in one group, then the t statistic to test the null hypothesis that the mean difference score is zero has 20 df and equals 3.158. From Table 2, if we reanalyze the within-subjects component by assuming that the Group x (pre/post) interaction is zero and SSGP is pooled with SSPE, then the F test for the main effect pre/post yields F = 9.97 with (1,20) df, which equals the square of the preceding t statistic.
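The first and third of these equivalences can be checked numerically in the same spirit as the second. The sketch below (Python standard library; helper names are ours; subject-by-subject pairing of the Table 1 columns is assumed) runs a pooled two-sample t test on the per-subject sums and a one-sample t test on all 21 differences, then squares each statistic for comparison with the F ratios quoted above.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Table 1 MRUS values; pre and post are assumed paired by listing order.
SEL_PRE = [51, 35, 66, 40, 39, 46, 52, 42]
SEL_POST = [48, 55, 60, 35, 36, 43, 46, 54]
NON_PRE = [34, 40, 34, 36, 38, 32, 44, 50, 60, 63, 50, 42, 43]
NON_POST = [16, 36, 16, 18, 32, 14, 20, 43, 45, 67, 36, 34, 32]


def pooled_two_sample_t(x, y):
    """Student t statistic for two independent samples, equal variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2)), df


def one_sample_t(x):
    """t statistic for the null hypothesis that the population mean is zero."""
    n = len(x)
    return mean(x) / (stdev(x) / sqrt(n)), n - 1


# Result 1: two-sample t test on the per-subject sums (pre + post).
sel_sums = [a + b for a, b in zip(SEL_PRE, SEL_POST)]
non_sums = [a + b for a, b in zip(NON_PRE, NON_POST)]
t_sum, df_sum = pooled_two_sample_t(sel_sums, non_sums)

# Result 3: all 21 differences (post - pre) treated as a single group.
all_diffs = [b - a for a, b in zip(SEL_PRE + NON_PRE, SEL_POST + NON_POST)]
t_all, df_all = one_sample_t(all_diffs)

print(f"sums:  t^2 = {t_sum ** 2:.2f} with (1,{df_sum}) df")
print(f"diffs: |t| = {abs(t_all):.3f}, t^2 = {t_all ** 2:.2f} with (1,{df_all}) df")
```

The sign of the one-sample t depends on whether the difference is taken as post minus pre or the reverse, so the comparison with the reported value uses its absolute value.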
These results demonstrate that the various F tests in the repeated-measures analysis of variance can be obtained by using simple t tests on linear combinations of the pre and post scores. It can be shown algebraically that interpretations (1), (2), and (3) of the F tests hold in the particular research design discussed in this article, that is, a two-group pretest/posttest design. In fact, the numerical operations in (1) and (2) of summing and differencing the pretest/posttest observations are used by the latest version of the BMD P2V program in calculating sums of squares in repeated-measures designs (Dixon and Brown 1979).
Since the difference-score analysis is embedded in the repeated-measures analysis, the repeated-measures analysis provides more information about the data at hand. Fewer assumptions, however, are required in the difference-score analysis. The difference-score analysis assumes only homogeneous variances for the difference scores and a normally distributed error term with mean zero. It is easy to show that if the assumptions of the repeated-measures analysis are satisfied, then the assumptions of the difference-score analysis are also met. However, the converse is not true.
In our experience, the researcher rarely is interested in only the interaction test, that is, the difference-score analysis. Furthermore, simple effects are commonly of interest even in the no interaction effect experiments. Therefore, we advocate the use of the repeated-measures/split-plot analysis in most instances. However, we urge the user to empirically validate the underlying assumptions.
In the example discussed in Section 2, the researchers would have been interested in assessing the significance of the main effect time (pre vs. post) if the Groups x pre/post interaction had been nonsignificant. That is, the nonsignificant interaction would have indicated that the two groups did not differ significantly on their MRUS difference scores. The pre/post main effect test would then indicate whether the MRUS difference score in both groups was significantly different from zero, that is, whether the treatment did or did not affect both groups.
6. GENERALIZATION OF FINDINGS
If there are more than two groups, similar results can be obtained. The difference-score analysis would no longer be performed by a t test but by an F test in a one-way analysis of variance. A further extension can be made where the different groups being compared may be defined by several factors in a factorial design. However, there is no logical extension of this discussion to more than two levels of the repeated-measures factor since a simple difference score analysis would no longer be appropriate.
[Received July 1977. Revised June 1980.]
REFERENCES
Bock, R. Darrell (1975), Multivariate Statistical Methods in Behavioral Research, New York: McGraw-Hill Book Co.
Cook, Thomas D., and Campbell, Donald T. (1979), Quasi-Experimentation: Design and Analysis Issues for Field Settings, Chicago: Rand McNally.
Cox, D.R. (1958), Planning of Experiments, New York: John Wiley & Sons.
Dixon, Wilfred J., and Brown, Morton B. (eds.) (1979), Biomedical Computer Programs P-Series, Berkeley: University of California Press.
Morrison, Donald F. (1976), Multivariate Statistical Methods (2nd ed.), New York: McGraw-Hill Book Co.
Neter, John, and Wasserman, William (1974), Applied Linear Statistical Models, Homewood, Ill.: Richard D. Irwin.
Rikkers, Layton F., Rudman, Daniel, Galambos, John T., Fulenwider, J. Timothy, Milliken, William J., Kutner, Michael H., Smith, Robert B., Salam, Atef A., Sones, Peter J., and Warren, W. Dean (1978), "A Randomized, Controlled Trial of the Distal Splenorenal Shunt," Annals of Surgery, 188, 271-282.
Steel, Robert G.D., and Torrie, James H. (1980), Principles and Procedures of Statistics (2nd ed.), New York: McGraw-Hill Book Co.
Winer, B.J. (1971), Statistical Principles in Experimental Design (2nd ed.), New York: McGraw-Hill Book Co.