Lecture 6: Repeated Measures Analyses

Child Psychiatry
Research Methods Lecture Series
Elizabeth Garrett
esg@jhu.edu
Outline for Today
Overview
ANOVA models
Repeated Measures ANOVA
Longitudinal Data Analysis
Overview
• Linear and logistic regression thus far:
– assume each individual has one observation
– e.g. one exposure → one outcome
– can’t go back and “unexpose” the individual and see what happens
• Practically,
– useful to do “experiments” with more than one exposure and more than
one outcome per individual
– each individual serves as his own control
– much “tighter” design in terms of variability
– this is called “repeated measures:” the outcome is observed on the
same individual at multiple times under different conditions.
• Generalization of repeated measures: longitudinal analysis
– observe multiple outcomes on the same individual at different times
– might be observational or experimental
– exposure/treatment may or may not vary at different times.
ANOVA models
• ANOVA = analysis of variance (bad name!)
• A simple case of linear regression
– continuous outcome
– categorical independent (predictor) variable(s)
• Why do we hear about ANOVA so often if it is just a special
case of linear regression?
– Historically, very popular because….easy to perform WITHOUT a
computer!
– Very prevalent in psychometrics
– Interpretation is nice and simple
– In its simplest form, an ANOVA represents a generalization of the
two sample t-test. It allows for the testing of more than two groups.
– Tests to see if means in all groups are equal.
– Instead of t-statistic, we look at F-statistic
ANOVA for Independent Observations
Example: Drug Study of Hyperactivity in Children under Age 10
• 180 observations on children with hyperactivity.
• Hyperactivity (H) measured by a “scale” instrument
– range is 0 to 30
– child is designated as “hyperactive” if score > 15
– to enter the study, must score > 20
• 3 Treatments: 60 placebo, 60 ritalin, 60 “new” drug.
• Evaluation based on hyperactivity score (H) measured at
study end (2 weeks).
• Questions:
– Do all three treatments have approximately the same effect?
– Is the new drug better than placebo?
– Is the new drug as good as ritalin?
[Figure: Hyperactivity Score (axis 10 to 30) by treatment group: Placebo, Ritalin, New Drug]
Intuitive approach
• Estimate mean H in each group:
– $\mu_p$ = mean of H in the placebo group
– $\mu_r$ = mean of H in the ritalin group
– $\mu_n$ = mean of H in the new drug group
• Test if the means are the same or different
– H0: group means are all the same
– H1: at least one group mean is different from some other group mean.
$H_0: \mu_p = \mu_r = \mu_n$
$H_1: \mu_p \neq \mu_r$, or $\mu_p \neq \mu_n$, or $\mu_r \neq \mu_n$
Nice thing about ANOVA models…..
$H_i = \beta_0 + \beta_1 I(\text{ritalin}) + \beta_2 I(\text{new drug}) + \epsilon_i$
$\hat{H}_i \mid \text{placebo} = \beta_0 = \mu_p$
$\hat{H}_i \mid \text{ritalin} = \beta_0 + \beta_1 = \mu_r$
$\hat{H}_i \mid \text{new drug} = \beta_0 + \beta_2 = \mu_n$
• $\beta_0$ is the estimated score for kids on placebo
• $\beta_1$ is the “treatment” effect of ritalin
• $\beta_2$ is the “treatment” effect of the new drug
• $\beta_1 - \beta_2$ is the difference in effect between ritalin and the new drug
Hyperactivity Example Results
  Source |       SS       df       MS                  Number of obs =     180
---------+------------------------------               F(  2,   177) =  121.53
   Model |  2213.37778     2  1106.68889               Prob > F      =  0.0000
Residual |  1611.86667   177  9.10659134               R-squared     =  0.5786
---------+------------------------------               Adj R-squared =  0.5739
   Total |  3825.24444   179  21.3700807               Root MSE      =  3.0177

------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -8.366667   .5509565    -15.186   0.000      -9.453956   -7.279378
  Itrt_3 |  -5.866667   .5509565    -10.648   0.000      -6.953956   -4.779378
   _cons |       23.1   .3895851     59.294   0.000       22.33117    23.86883
------------------------------------------------------------------------------
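$\hat{H}_i = 23.1 - 8.37\, I(\text{ritalin}) - 5.87\, I(\text{new drug})$
$\hat{H}_i \mid \text{placebo} = \beta_0 = \mu_p = 23.1$
$\hat{H}_i \mid \text{ritalin} = \beta_0 + \beta_1 = \mu_r = 14.7$
$\hat{H}_i \mid \text{new drug} = \beta_0 + \beta_2 = \mu_n = 17.2$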
ANOVA Table
                  Number of obs =     180     R-squared     =  0.5786
                  Root MSE      = 3.01771     Adj R-squared =  0.5739

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  2213.37778     2  1106.68889     121.53     0.0000
           |
       trt |  2213.37778     2  1106.68889     121.53     0.0000
           |
  Residual |  1611.86667   177  9.10659134
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
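As a check on how the summary statistics in this output fit together, the sketch below recomputes them from the sums of squares. The numbers are copied from the table; the variable names are illustrative only.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA output.
ss_model, df_model = 2213.37778, 2
ss_resid, df_resid = 1611.86667, 177
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

ms_model = ss_model / df_model            # mean square between treatments
ms_resid = ss_resid / df_resid            # mean square within treatments
f_stat = ms_model / ms_resid              # F = MS(model) / MS(residual)
r2 = ss_model / ss_total                  # share of variability explained
adj_r2 = 1 - ms_resid / (ss_total / df_total)
root_mse = math.sqrt(ms_resid)

print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
print(f"R-squared = {r2:.4f}, Adj R-squared = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.4f}")
```

The F-statistic is simply the ratio of the two mean squares, which is why the same 121.53 appears for both the Model and the trt rows here.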
Answers to scientific questions
1. Do all three treatments have approximately the same effect?
No. There is evidence that the intercept alone is not sufficient for describing variability.
Why? P-value on the F-statistic < 0.001
2. Is the new drug better than placebo?
Yes. Treatment effect is -5.9.
Why? P-value on $\beta_2$ is less than 0.001
3. Is the new drug as good as ritalin?
No. The treatment effect difference is 2.5
Why? P-value on $\beta_2 - \beta_1$ is less than 0.001***
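The coefficient table does not report the new drug versus ritalin contrast directly, so it needs a separate calculation. The sketch below shows one way to get it, assuming independent observations and 60 children per group, so that the standard error of a difference between two group means is sqrt(MSE * (1/60 + 1/60)). The variable names are illustrative.

```python
import math

# Estimates copied from the regression output for independent observations.
b0, b_ritalin, b_new = 23.1, -8.366667, -5.866667
mse, n_per_group = 9.10659134, 60

# Fitted group means implied by the indicator coding.
means = {
    "placebo": b0,
    "ritalin": b0 + b_ritalin,
    "new drug": b0 + b_new,
}

# Contrast beta_2 - beta_1: new drug relative to ritalin.
diff = b_new - b_ritalin
se_diff = math.sqrt(mse * (1 / n_per_group + 1 / n_per_group))
t_stat = diff / se_diff

for group, mean in means.items():
    print(f"{group}: {mean:.1f}")
print(f"difference = {diff:.2f}, se = {se_diff:.4f}, t = {t_stat:.2f}")
```

With 177 residual degrees of freedom, a t-statistic of this size is consistent with the p-value below 0.001 quoted above.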
Repeated Measures
• What happens when we have more than one treatment per individual?
• Most often “experiments” and not “observational” studies
• Need special methods
– each individual is considered more than once
– observations from the same person are likely to be correlated
• Example:
– Consider two kids: One has placebo score of 20 and other has placebo score of 30.
– Child with the LOW placebo score also likely to have a LOW ritalin score.
– Child with the HIGH placebo score also likely to have a HIGH ritalin score.
⇒ Observations from the same child are CORRELATED.
⇒ Independence assumption of linear regression is violated.
Repeated Measures
Example: Drug Study of Hyperactivity in Children under Age 10
• 180 observations on 60 children with hyperactivity.
• 3 Treatments: placebo, ritalin, “new” drug.
• Each child receives all three treatments, one at each of times 1, 2, and 3.
• Order of treatments is random
• There is sufficient “wash out” period between treatments to minimize “carry over” effects
• Evaluation based on hyperactivity score (H) measured at
study end (2 weeks).
• Questions:
– Do all three treatments have approximately the same effect?
– Is the new drug better than placebo?
– Is the new drug as good as ritalin?
[Figure: four panels of Hyperactivity Score (axis 10 to 30) by treatment: Placebo, Ritalin, New Drug]
Repeated Measures ANOVA Results
  Source |       SS       df       MS                  Number of obs =     180
---------+------------------------------               F( 61,   118) =   10.37
   Model |  3223.95556    61  52.8517304               Prob > F      =  0.0000
Residual |  601.288889   118  5.09566855               R-squared     =  0.8428
---------+------------------------------               Adj R-squared =  0.7616
   Total |  3825.24444   179  21.3700807               Root MSE      =  2.2574

------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -8.366667   .4121355    -20.301   0.000      -9.174437   -7.558896
  Itrt_3 |  -5.866667   .4121355    -14.235   0.000      -6.674437   -5.058896
   _cons |       23.1   .3895851     59.294   0.000       22.33643    23.86357
---------+--------------------------------------------------------------------
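$\hat{H}_i = 23.1 - 8.37\, I(\text{ritalin}) - 5.87\, I(\text{new drug})$
$\hat{H}_i \mid \text{placebo} = \beta_0 = \mu_p = 23.1$
$\hat{H}_i \mid \text{ritalin} = \beta_0 + \beta_1 = \mu_r = 14.7$
$\hat{H}_i \mid \text{new drug} = \beta_0 + \beta_2 = \mu_n = 17.2$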
ANOVA Table
                  Root MSE      = 2.25736     Adj R-squared =  0.7616

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3223.95556    61  52.8517304      10.37     0.0000
           |
        id |  1010.57778    59  17.1284369       3.36     0.0000
       trt |  2213.37778     2  1106.68889     217.18     0.0000
           |
  Residual |  601.288889   118  5.09566855
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
So where is the difference?
                     ANOVA               Repeated Measures ANOVA
              estimate    se             estimate    se
$\beta_0$       23.1      0.39             23.1      0.39
$\beta_1$      -8.37      0.55            -8.37      0.41
$\beta_2$      -5.87      0.55            -5.87      0.41
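One way to see where the smaller standard errors come from is to recompute them from the residual mean squares of the two analyses. This is a minimal sketch, assuming the balanced design (60 scores per treatment), in which the standard error of a treatment contrast is sqrt(2 * MSE / 60). The numbers are copied from the two outputs; the names are illustrative.

```python
import math

n_children = 60
ms_trt = 1106.68889            # treatment mean square (same in both analyses)
mse_independent = 9.10659134   # residual MS, ANOVA for independent observations
mse_repeated = 5.09566855      # residual MS after removing child (id) effects

# Standard error of a treatment-versus-placebo contrast in each analysis.
se_independent = math.sqrt(2 * mse_independent / n_children)
se_repeated = math.sqrt(2 * mse_repeated / n_children)

# F-statistic for the treatment term in each analysis.
f_independent = ms_trt / mse_independent
f_repeated = ms_trt / mse_repeated

print(f"se: {se_independent:.2f} (ANOVA) vs {se_repeated:.2f} (repeated measures)")
print(f"F for trt: {f_independent:.2f} vs {f_repeated:.2f}")
```

The estimated treatment effects are identical. What changes is the residual mean square: removing child-to-child variability through the id term shrinks it from about 9.11 to about 5.10, and that tightens the treatment standard errors.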
Answers to scientific questions
1. Do all three treatments have approximately the same effect?
No. There is evidence that the intercept alone is not sufficient for describing variability.
Why? P-value on the Model F-statistic < 0.001
2. Is the new drug better than placebo?
Yes. Treatment effect is -5.9.
Why? P-value on $\beta_2$ is less than 0.001
3. Is the new drug as good as ritalin?
No. The treatment effect difference is 2.5
Why? P-value on $\beta_2 - \beta_1$ is less than 0.001***
Other issues in repeated measures ANOVA
• “Period” Effects Example:
– Children screened into study if H > 20.
– It is likely that, if we didn’t give them anything, on average, the H scores
would go down
– This phenomenon is called “regression to the mean”
• Why is this an issue?
– We might expect all kids at time 1 to be “worse” than at other time periods.
– We need to adjust for the “period” in which drug was given.
• Related issue: “Carry over” effects
– In many studies, the treatment might be curative or at least long-lasting.
– If an individual is cured by a treatment at time 1, we would not want to
attribute his effect to placebo at time 2.
– In addition to adjustment (as we will see in a minute), it is important to
consider building a “wash out” period into cross-over designs.
Period Adjustment
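$H_{ij} = \beta_{0i} + \beta_1 I_j(\text{ritalin}) + \beta_2 I_j(\text{new drug}) + \beta_3 I_j(j = 2) + \beta_4 I_j(j = 3) + \epsilon_{ij}$
$\hat{H}_i \mid \text{placebo, time 1} = \beta_0$
$\hat{H}_i \mid \text{ritalin, time 1} = \beta_0 + \beta_1$
$\hat{H}_i \mid \text{new drug, time 1} = \beta_0 + \beta_2$
$\hat{H}_i \mid \text{placebo, time 2} = \beta_0 + \beta_3$
$\hat{H}_i \mid \text{ritalin, time 2} = \beta_0 + \beta_1 + \beta_3$
$\hat{H}_i \mid \text{new drug, time 2} = \beta_0 + \beta_2 + \beta_3$
$\hat{H}_i \mid \text{placebo, time 3} = \beta_0 + \beta_4$
$\hat{H}_i \mid \text{ritalin, time 3} = \beta_0 + \beta_1 + \beta_4$
$\hat{H}_i \mid \text{new drug, time 3} = \beta_0 + \beta_2 + \beta_4$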
Repeated Measures ANOVA Results
  Source |       SS       df       MS                  Number of obs =     180
---------+------------------------------               F( 63,   116) =   11.69
   Model |  3304.89281    63  52.4586161               Prob > F      =  0.0000
Residual |   520.35163   116  4.48578991               R-squared     =  0.8640
---------+------------------------------               Adj R-squared =  0.7901
   Total |  3825.24444   179  21.3700807               Root MSE      =   2.118

------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -8.289234   .3878228    -21.374   0.000      -9.049352   -7.529115
  Itrt_3 |  -5.876126   .3892903    -15.094   0.000      -6.639121   -5.113131
 Itime_2 |  -.8925736   .3888018     -2.296   0.022      -1.654611   -.1305361
 Itime_3 |  -1.643255   .3873324     -4.242   0.000      -2.402413   -.8840976
   _cons |   23.92262   .4452343     53.730   0.000       23.04998    24.79526
---------+--------------------------------------------------------------------
$\hat{H}_{ij} = 23.9 - 8.3\, I_j(\text{ritalin}) - 5.9\, I_j(\text{new drug}) - 0.89\, I_j(j = 2) - 1.64\, I_j(j = 3)$
$\hat{H}_i \mid \text{placebo, time 1} = 23.9$
$\hat{H}_i \mid \text{ritalin, time 1} = 15.6$
$\hat{H}_i \mid \text{new drug, time 1} = 18.0$
$\hat{H}_i \mid \text{placebo, time 2} = 23.0$
$\hat{H}_i \mid \text{ritalin, time 2} = 14.7$
$\hat{H}_i \mid \text{new drug, time 2} = 17.2$
$\hat{H}_i \mid \text{placebo, time 3} = 22.3$
$\hat{H}_i \mid \text{ritalin, time 3} = 14.0$
$\hat{H}_i \mid \text{new drug, time 3} = 16.4$
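The nine fitted values above follow mechanically from the five coefficients. A minimal sketch of that bookkeeping, with coefficients copied from the period-adjusted output and dictionary names chosen only for illustration:

```python
# Coefficients from the period-adjusted model (placebo at time 1 is the reference).
cons = 23.92262
trt_effect = {"placebo": 0.0, "ritalin": -8.289234, "new drug": -5.876126}
time_effect = {1: 0.0, 2: -0.8925736, 3: -1.643255}

# Without interactions, each fitted cell is intercept + treatment + period.
pred = {
    (trt, period): cons + trt_effect[trt] + time_effect[period]
    for period in time_effect
    for trt in trt_effect
}

for (trt, period), value in pred.items():
    print(f"{trt}, time {period}: {value:.1f}")
```

Because the model is additive, the gap between any drug and placebo is the same in every period. That is the restriction the interaction terms relax later.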
ANOVA Table
                  Number of obs =     180     R-squared     =  0.8640
                  Root MSE      = 2.11797     Adj R-squared =  0.7901

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3304.89281    63  52.4586161      11.69     0.0000
           |
        id |  1010.57778    59  17.1284369       3.82     0.0000
       trt |  2163.63726     2  1081.81863     241.17     0.0000
      time |  80.9372588     2  40.4686294       9.02     0.0002
           |
  Residual |   520.35163   116  4.48578991
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
Answers to scientific questions
1. Do all three treatments have approximately the same effect?
No. There is evidence that the intercept alone is not sufficient for describing variability.
Why? P-value on the F-statistic < 0.001
2. Is the new drug better than placebo?
Yes. Treatment effect is -5.9.
Why? P-value on $\beta_2$ is less than 0.001
3. Is the new drug as good as ritalin?
No. The treatment effect difference is 2.4
Why? P-value on $\beta_2 - \beta_1$ is less than 0.001****
Is that it? Not quite…..
• Interactions!
• Is it possible that the effect of treatment is different at
different times?
• Current model: forces treatment effects to be the same
across all time periods.
• Why might this not be okay?
– What if the kids would get “better” by time period 3 anyway?
(Think about diseases/disorders which have “flares”, e.g.
depression, herpes).
• Interactions allow more flexibility in the model
• They allow the treatment effects to be different at different
times.
“Full blown” model
$H_{ij} = \beta_{0i} + \beta_1 I_j(\text{ritalin}) + \beta_2 I_j(\text{new drug})$
$\quad + \beta_3 I_j(j = 2) + \beta_4 I_j(j = 3)$
$\quad + \beta_5 I_j(j = 2) I_j(\text{ritalin}) + \beta_6 I_j(j = 2) I_j(\text{new drug})$
$\quad + \beta_7 I_j(j = 3) I_j(\text{ritalin}) + \beta_8 I_j(j = 3) I_j(\text{new drug})$
$\quad + \epsilon_{ij}$
Including Interactions
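$\hat{H}_i \mid \text{placebo, time 1} = \beta_0$
$\hat{H}_i \mid \text{ritalin, time 1} = \beta_0 + \beta_1$
$\hat{H}_i \mid \text{new drug, time 1} = \beta_0 + \beta_2$
$\hat{H}_i \mid \text{placebo, time 2} = \beta_0 + \beta_3$
$\hat{H}_i \mid \text{ritalin, time 2} = \beta_0 + \beta_1 + \beta_3 + \beta_5$
$\hat{H}_i \mid \text{new drug, time 2} = \beta_0 + \beta_2 + \beta_3 + \beta_6$
$\hat{H}_i \mid \text{placebo, time 3} = \beta_0 + \beta_4$
$\hat{H}_i \mid \text{ritalin, time 3} = \beta_0 + \beta_1 + \beta_4 + \beta_7$
$\hat{H}_i \mid \text{new drug, time 3} = \beta_0 + \beta_2 + \beta_4 + \beta_8$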
How do we measure treatment effects?
• Now that we have interactions, $\beta_1$ is not the “treatment effect” of ritalin and $\beta_2$ is not the “treatment effect” of new drug
• For ritalin:
– $\beta_1$ is the treatment effect of ritalin at time 1
– $\beta_1 + \beta_5$ is the treatment effect of ritalin at time 2
– $\beta_1 + \beta_7$ is the treatment effect of ritalin at time 3
• For new drug:
– $\beta_2$ is the treatment effect of new drug at time 1
– $\beta_2 + \beta_6$ is the treatment effect of new drug at time 2
– $\beta_2 + \beta_8$ is the treatment effect of new drug at time 3
Are they significant?
  Source |       SS       df       MS                  Number of obs =     180
---------+------------------------------               F( 67,   112) =   11.72
   Model |  3347.79611    67  49.9671062               Prob > F      =  0.0000
Residual |  477.448331   112  4.26293153               R-squared     =  0.8752
---------+------------------------------               Adj R-squared =  0.8005
   Total |  3825.24444   179  21.3700807               Root MSE      =  2.0647

------------------------------------------------------------------------------
       H |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_2 |  -9.486124   .8174333    -11.605   0.000      -11.08826   -7.883985
  Itrt_3 |  -6.954453   .7701922     -9.030   0.000      -8.464002   -5.444904
 Itime_2 |  -1.788544   .7639526     -2.341   0.019      -3.285864   -.2912243
 Itime_3 |  -3.104565     .82816     -3.749   0.000      -4.727729   -1.481401
ItXt_2_2 |   1.682091   1.195857      1.407   0.160      -.6617452    4.025926
ItXt_2_3 |   1.904287   1.207546      1.577   0.115      -.4624599    4.271034
ItXt_3_2 |   .9351192   1.203284      0.777   0.437      -1.423273    3.293512
ItXt_3_3 |   2.305485    1.20026      1.921   0.055      -.0469805     4.65795
   _cons |   24.69504   .6075533     40.647   0.000       23.50426    25.88583
Need another F-test
                  Number of obs =     180     R-squared     =  0.8752
                  Root MSE      = 2.06469     Adj R-squared =  0.8005

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  3347.79611    67  49.9671062      11.72     0.0000
           |
        id |  1049.02243    59  17.7800412       4.17     0.0000
       trt |  2110.38776     2  1055.19388     247.53     0.0000
      time |  89.5774582     2  44.7887291      10.51     0.0001
  trt*time |  42.9032988     4  10.7258247       2.52     0.0454
           |
  Residual |  477.448331   112  4.26293153
-----------+----------------------------------------------------
     Total |  3825.24444   179  21.3700807
$\hat{H}_i \mid \text{placebo, time 1} = 24.7$
$\hat{H}_i \mid \text{ritalin, time 1} = 15.2$
$\hat{H}_i \mid \text{new drug, time 1} = 17.7$
$\hat{H}_i \mid \text{placebo, time 2} = 22.9$
$\hat{H}_i \mid \text{ritalin, time 2} = 15.1$
$\hat{H}_i \mid \text{new drug, time 2} = 16.9$
$\hat{H}_i \mid \text{placebo, time 3} = 21.6$
$\hat{H}_i \mid \text{ritalin, time 3} = 14.0$
$\hat{H}_i \mid \text{new drug, time 3} = 16.9$

          Ritalin effect   New drug effect   Difference      p
Time 1        -9.5              -7.0            2.53       0.001
Time 2        -7.8              -6.0            1.78       0.03
Time 3        -7.6              -4.6            2.93       0.001
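The time-specific effects can be reproduced from the interaction model's coefficients. The sketch below assumes the Stata naming in the output (ItXt_2_2 is ritalin at time 2, ItXt_3_3 is new drug at time 3, and so on); the function names are hypothetical helpers, not part of any package.

```python
# Coefficients copied from the interaction model (placebo at time 1 is the reference).
cons = 24.69504
trt = {"placebo": 0.0, "ritalin": -9.486124, "new drug": -6.954453}
time = {1: 0.0, 2: -1.788544, 3: -3.104565}
interaction = {
    ("ritalin", 2): 1.682091, ("ritalin", 3): 1.904287,
    ("new drug", 2): 0.9351192, ("new drug", 3): 2.305485,
}

def effect(drug, period):
    """Treatment effect of `drug` versus placebo within one period."""
    return trt[drug] + interaction.get((drug, period), 0.0)

def cell_mean(drug, period):
    """Fitted hyperactivity score for one drug in one period."""
    return cons + time[period] + effect(drug, period)

for period in (1, 2, 3):
    r, n = effect("ritalin", period), effect("new drug", period)
    print(f"time {period}: ritalin {r:.1f}, new drug {n:.1f}, difference {n - r:.2f}")
```

The p-values for the differences are not reproduced here because they need the covariance of the estimates, which the output does not show.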
Answers to scientific questions
1. Do all three treatments have approximately the same effect?
No. There is evidence that the intercept alone is not sufficient for describing variability.
Why? P-value on the F-statistic < 0.001
2. Is the new drug better than placebo?
Yes. Treatment effects at times 1, 2, and 3 are -7.0, -6.0, -4.6.
Why? P-value on $\beta_2$ is less than 0.001***
3. Is the new drug as good as ritalin?
No. The treatment effect differences at times 1, 2, and 3 are 2.5, 1.8, 2.9
Why? P-values on the differences are less than 0.05.
Longitudinal Analyses
• Repeated measures ANOVA is a simple
case of a longitudinal analysis
• In longitudinal analysis:
– can be observational or experimental study
– the observations can be at random times (need
not be at time 1, time 2, and time 3 as
previously.)
– Generally, time is an important component of
the study.
New example:
• Depression in adults
• Due to the episodic nature of depression, if an
individual is in a depressive episode today, s/he is
likely to not be in one in 8 weeks
• Evaluation of anti-depressants can be difficult for
that reason.
• In evaluation of treatments, we care about not only
IF a treatment works, but HOW SOON it works.
Clinical Trial of Paroxetine (hypothetical)
• 200 individuals blindly randomized to receive either
paroxetine or placebo beginning at week 0.
• Subjects are screened at week -1 and have to score at least
22 on Hamilton D depression scale.
• Subjects are followed for 8 weeks with evaluations at
weeks 0 (baseline), 1, 2, 4, 6, 8.
• Outcome measure is Hamilton D score.
• Questions:
– Is paroxetine more effective than placebo?
– Do individuals tend to improve more quickly on paroxetine versus
placebo?
[Figure: Change in HamD score from week 0 to week 6 (axis -12 to -6), Paroxetine versus Placebo]
If we stopped here, we would conclude that the drug was useless!
Look at data over time….
[Figure: Hamilton D score (axis 15 to 30) versus time in weeks (0 to 8), Paroxetine and Placebo]
Random Effects Models
• Assumes that an individual has his/her own
“intercept”/”effect”.
• Observations within individuals are correlated.
• The model estimates intercept for each person, but assumes
that individuals have the same slope (within covariate
groups)
$Y_{ij} = \beta_{0i} + \beta_1 \text{time} + e_{ij}$
Notice the “i” subscript
[Figure: Y versus Time]
Covariates
• Time: we know that HamD changes over time. To get
“curvy” line, we need to include more than just a linear
time variable.
• Paroxetine: we want to see if the paroxetine group differs
from the placebo group
$Y_{ij} = \beta_{0i} + \beta_1 \text{trt}_i + \beta_2 \text{week}_j + \beta_3 \text{week}_j^2 + e_{ij}$
Results
xtreg y week week2 trt, i(id)

Random-effects GLS regression                   Number of obs      =      1200
Group variable (i) : id                         Number of groups   =       200

R-sq:  within  = 0.7288                         Obs per group: min =         6
       between = 0.4242                                        avg =       6.0
       overall = 0.6090                                        max =         6

Random effects u_i ~ Gaussian                   Wald chi2(3)       =   2828.24
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    week |  -1.112177   .0740175    -15.026   0.000      -1.257249   -.9671058
   week2 |   .0126631   .0089831      1.410   0.159      -.0049435    .0302697
     trt |    -3.4973   .2895713    -12.078   0.000      -4.064849    -2.92975
   _cons |   26.22405    .226547    115.755   0.000       25.78003    26.66808
---------+--------------------------------------------------------------------
 sigma_u |  1.8942426
 sigma_e |  1.9043451
     rho |  .49734047   (fraction of variance due to u_i)
------------------------------------------------------------------------------
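The rho line in this output is the share of total variance that comes from the person-level random intercepts. A short sketch of that calculation from the two reported standard deviations (variable names are illustrative):

```python
# Standard deviations copied from the xtreg output.
sigma_u = 1.8942426   # between-person SD (random intercepts u_i)
sigma_e = 1.9043451   # within-person residual SD

# rho = var(u_i) / (var(u_i) + var(e_ij)): the intraclass correlation.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(f"rho = {rho:.4f}")
```

A value near 0.5 says that about half of the variability in HamD scores is between individuals, so repeated scores from the same person are substantially correlated.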
Results
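$Y_{ij} = 26.2 - 3.5\, \text{trt}_i - 1.1\, \text{week}_j + 0.01\, \text{week}_j^2$
$[Y_{ij} \mid \text{trt}_i = 0] = 26.2 - 1.1\, \text{week}_j + 0.01\, \text{week}_j^2$
$[Y_{ij} \mid \text{trt}_i = 1] = 22.7 - 1.1\, \text{week}_j + 0.01\, \text{week}_j^2$
[Figure: fitted HamD Score (axis 16 to 26) versus time in weeks (0 to 8), Placebo and Paroxetine]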
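A quick way to see what this additive model implies is to evaluate both fitted curves at the scheduled visits. The sketch below uses the coefficients from the output; the `predict` function is a hypothetical helper for illustration.

```python
# Coefficients copied from the random-effects model without interactions.
b0, b_trt, b_week, b_week2 = 26.22405, -3.4973, -1.112177, 0.0126631

def predict(trt, week):
    """Fitted HamD score for placebo (trt=0) or paroxetine (trt=1)."""
    return b0 + b_trt * trt + b_week * week + b_week2 * week**2

weeks = [0, 1, 2, 4, 6, 8]   # scheduled evaluation weeks
gaps = [predict(1, w) - predict(0, w) for w in weeks]

for w, gap in zip(weeks, gaps):
    print(f"week {w}: placebo {predict(0, w):.1f}, "
          f"paroxetine {predict(1, w):.1f}, gap {gap:.2f}")
```

The gap equals the trt coefficient at every week, including week 0, so the two fitted curves are forced to be parallel.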
Something is wrong with our model!
[Figure: Hamilton D score (axis 15 to 30) versus time in weeks (0 to 8), Paroxetine and Placebo]
Problem: We need to let the treatment VARY over time!
• The approach above simply ADJUSTS for time.
• We want to see how the relationship differs between
treatment groups over time.
• We need interactions again!
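$Y_{ij} = \beta_{0i} + \beta_1 \text{trt}_i + \beta_2 \text{week}_j + \beta_3 \text{week}_j^2 + \beta_4 \text{trt}_i \text{week}_j + \beta_5 \text{trt}_i \text{week}_j^2 + e_{ij}$
$[Y_{ij} \mid \text{trt}_i = 0] = \beta_{0i} + \beta_2 \text{week}_j + \beta_3 \text{week}_j^2 + e_{ij}$
$[Y_{ij} \mid \text{trt}_i = 1] = (\beta_{0i} + \beta_1) + (\beta_2 + \beta_4) \text{week}_j + (\beta_3 + \beta_5) \text{week}_j^2 + e_{ij}$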
Results
. xi: xtreg y i.trt*week i.trt*week2 , i(id)

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  Itrt_1 |  -.0729227   .1007545     -0.724   0.469      -.2703978    .1245525
    week |   .8142721   .0546507     14.900   0.000       .7071588    .9213855
   week2 |  -.2294649   .0066327    -34.596   0.000      -.2424647   -.2164651
ItXwee_1 |  -3.852899   .0772877    -49.851   0.000       -4.00438   -3.701418
ItXweea1 |   .4842559     .00938     51.626   0.000       .4658714    .5026404
   _cons |   24.36439   .2169079    112.326   0.000       23.93926    24.78952
---------+--------------------------------------------------------------------
Results
$[Y_{ij} \mid \text{trt}_i = 0] = 24.4 + 0.81\, \text{week}_j - 0.23\, \text{week}_j^2$
$[Y_{ij} \mid \text{trt}_i = 1] = 24.3 - 3.0\, \text{week}_j + 0.25\, \text{week}_j^2$
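The two group-specific curves come from adding the interaction coefficients to the placebo terms. A minimal sketch, assuming ItXwee_1 is the trt-by-week term and ItXweea1 is the trt-by-week2 term in the output; the helper names are hypothetical.

```python
# Coefficients copied from the random-effects model with interactions.
b = {
    "cons": 24.36439, "trt": -0.0729227,
    "week": 0.8142721, "week2": -0.2294649,
    "trt_week": -3.852899, "trt_week2": 0.4842559,
}

def curve(trt):
    """(intercept, linear, quadratic) terms for placebo (0) or paroxetine (1)."""
    return (
        b["cons"] + trt * b["trt"],
        b["week"] + trt * b["trt_week"],
        b["week2"] + trt * b["trt_week2"],
    )

def predict(trt, week):
    c0, c1, c2 = curve(trt)
    return c0 + c1 * week + c2 * week**2

for w in [0, 1, 2, 4, 6, 8]:   # scheduled evaluation weeks
    gap = predict(1, w) - predict(0, w)
    print(f"week {w}: placebo {predict(0, w):.1f}, "
          f"paroxetine {predict(1, w):.1f}, gap {gap:.2f}")
```

Unlike the additive model, the gap between the groups now changes from week to week: it is close to zero at baseline and grows over the first several weeks.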
[Figure: fitted HamD Score (axis 16 to 24) versus time in weeks (0 to 8), Placebo and Paroxetine]
[Figure: Hamilton D score (axis 15 to 30) versus time in weeks (0 to 8), Paroxetine and Placebo]
References
• Diggle, Liang, and Zeger (1994) Analysis of Longitudinal Data.
• B.S. Everitt (1995) The analysis of repeated measures: a practical review with examples. The Statistician, 44, pp. 113-135.
• Crowder and Hand (1990) Analysis of Repeated Measures.
• D. Ekstrom (1990) Statistical analysis of repeated measures in psychiatric research. Archives of General Psychiatry, 47, pp. 770-772.
[ANOVA (for non-repeated measures) is covered in most
basic stats books.]