Introduction to Modeling Continuous Longitudinal Data

Introduction to Modeling
Continuous Longitudinal Data
and Repeated Measures ANOVA
Kristin Sainani Ph.D.
http://www.stanford.edu/~kcobb
Stanford University
Department of Health Research and Policy
Introduction to continuous
longitudinal data: Examples
Homeopathy vs. placebo in
treating pain after surgery
Day of surgery
Mean pain
assessments by
visual analogue
scales (VAS)
Days 1-7 after surgery
(morning and evening)
Copyright ©1995 BMJ Publishing Group Ltd.
Lokken, P. et al. BMJ 1995;310:1439-1442
Divalproex vs. placebo for
treating bipolar depression
Davis et al. “Divalproex in the treatment of bipolar depression: A placebo controlled study.” J
Affective Disorders 85 (2005) 259-266.
Randomized trial of in-field treatments
of acute mountain sickness
Mean (SD) score of acute
mountain sickness in subjects
treated with simulated descent
(One hour of treatment in the
hyperbaric chamber) or
dexamethasone.
Copyright ©1995 BMJ Publishing Group Ltd.
Keller, H.-R. et al. BMJ 1995;310:1232-1235
Pint of milk vs. control on bone
acquisition in adolescent females
Mean (SE) percentage increases in
total body bone mineral and bone
density over 18 months.
P values are for
the differences
between groups
by repeated
measures
analysis of
variance
Copyright ©1997 BMJ Publishing Group Ltd.
Cadogan, J. et al. BMJ 1997;315:1255-1260
Counseling vs. control on
smoking in pregnancy
Copyright ©2000 BMJ Publishing Group Ltd.
Hovell, M. F et al. BMJ 2000;321:337-342
Longitudinal data: broad form
id   time1   time2   time3   time4
1    31      29      15      26
2    24      28      20      32
3    14      20      28      30
4    38      34      30      34
5    25      29      25      29
6    30      28      16      34
Hypothetical data from Twisk, chapter 3, page 26, table 3.4
Jos W. R. Twisk. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. Cambridge University Press, 2003.
Longitudinal data: Long form
Hypothetical data from Twisk, chapter 3, page 26, table 3.4
id   time   score
1    1      31
1    2      29
1    3      15
1    4      26
2    1      24
2    2      28
2    3      20
2    4      32
3    1      14
3    2      20
3    3      28
3    4      30
4    1      38
4    2      34
4    3      30
4    4      34
5    1      25
5    2      29
5    3      25
5    4      29
6    1      30
6    2      28
6    3      16
6    4      34
Converting data from broad to
long in SAS…
data long;
  set broad;
  /* write one row per subject per time point */
  time=1; score=time1; output;
  time=2; score=time2; output;
  time=3; score=time3; output;
  time=4; score=time4; output;
run;
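The same broad-to-long reshaping can be sketched outside SAS. The following is a minimal Python illustration, not part of the original lecture; the function name `broad_to_long` and the dictionary layout are assumptions made for the example, and the data are the hypothetical Twisk scores from the broad-form table.

```python
# Broad form: one record per subject, one column per time point
# (hypothetical data from Twisk, table 3.4, as shown on the slides).
broad = [
    {"id": 1, "time1": 31, "time2": 29, "time3": 15, "time4": 26},
    {"id": 2, "time1": 24, "time2": 28, "time3": 20, "time4": 32},
    {"id": 3, "time1": 14, "time2": 20, "time3": 28, "time4": 30},
    {"id": 4, "time1": 38, "time2": 34, "time3": 30, "time4": 34},
    {"id": 5, "time1": 25, "time2": 29, "time3": 25, "time4": 29},
    {"id": 6, "time1": 30, "time2": 28, "time3": 16, "time4": 34},
]


def broad_to_long(rows, n_times=4):
    """Emit one (id, time, score) record per subject per time point."""
    long_rows = []
    for row in rows:
        for t in range(1, n_times + 1):
            long_rows.append(
                {"id": row["id"], "time": t, "score": row[f"time{t}"]}
            )
    return long_rows


long_rows = broad_to_long(broad)
print(len(long_rows), long_rows[:4])
```

Each inner loop iteration plays the role of one `output;` statement in the SAS data step: 6 subjects times 4 time points gives 24 long-form rows.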
Profile plots (use long form)
The plot tells a lot!
Mean response plot
[Figures: individual profile plots and the mean response plot, shown superimposed and smoothed]
Two groups (e.g., treatment vs. placebo)
id   group   time1   time2   time3   time4
1    A       31      29      15      26
2    A       24      28      20      32
3    A       14      20      28      30
4    B       38      34      30      34
5    B       25      29      25      29
6    B       30      28      16      34
Hypothetical data from Twisk, chapter 3, page 40, table 3.7
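Profile plots by group
[Figure: individual profiles over time for groups A and B]
Mean plots by group
[Figure: mean response over time for groups A and B]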
Possible questions…
• Overall, are there significant differences between time points?
  From plots: looks like some differences (time3 and 4 look different)
• Overall, are there significant changes from baseline?
  From plots: at time3 or time4 maybe
• Do the two groups differ at any time points?
  From plots: certainly at baseline; some difference everywhere
• Do the two groups differ in their responses over time?**
  From plots: their response profiles look similar over time, though A and B are closer by the end.
Statistical analysis strategies
Traditional approaches (this lecture):
• Strategy 1: ANCOVA on the final measurement, adjusting for baseline differences (end-point analysis)
• Strategy 2: repeated-measures ANOVA, “univariate” approach
• Strategy 3: “multivariate” ANOVA approach
Newer approaches (next lecture):
• Strategy 4: GEE
• Strategy 5: Mixed models
In lecture 8:
• Strategy 6: Modeling change

Comparison of traditional and
new methods
From: Ralitza Gueorguieva, PhD; John H. Krystal, MD. Move Over ANOVA: Progress in Analyzing Repeated-Measures Data and Its Reflection in Papers Published in the Archives of General Psychiatry. Arch Gen Psychiatry. 2004;61:310-317.
Things to consider:
1. Spacing of time intervals
• Repeated-measures ANOVA and MANOVA require that all subjects be measured at the same time intervals (our plots above assumed this too!)
• MANOVA weights all time intervals evenly (as if evenly spaced)
2. Assumptions of the model
• ALL strategies assume a normally distributed outcome and homogeneity of variances
• But all strategies are robust to violations of these assumptions, especially if the sample size is >30
• **Univariate repeated-measures ANOVA assumes sphericity, or compound symmetry
3. Missing data
• All traditional analyses require imputation of missing data
(also need to know: does the SAS PROC require long or broad form of data?)
Compound symmetry
Compound symmetry requires:
(a) The variance of the outcome variable must be the same at each time point.
(b) The correlations between repeated measurements must be equal, regardless of the time interval between measurements.
(a) Variances at each time point (visually)
Does the variance look equal across time points?
Looks like most variability at time1 and least at time4…
(a) Variances at each time point (numerically)
id          time1      time2      time3      time4
1           31         29         15         26
2           24         28         20         32
3           14         20         28         30
4           38         34         30         34
5           25         29         25         29
6           30         28         16         34
Variance:   65.60000   20.40000   39.46667   9.76667
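As a cross-check of the variance row, here is a small Python sketch (not from the original slides) that computes the sample variance at each time point with the standard library. The dictionary name `scores` and the max/min ratio are illustrative assumptions, not a formal test of equal variances.

```python
from statistics import variance

# Scores at each time point for the six subjects (hypothetical Twisk data).
scores = {
    "time1": [31, 24, 14, 38, 25, 30],
    "time2": [29, 28, 20, 34, 29, 28],
    "time3": [15, 20, 28, 30, 25, 16],
    "time4": [26, 32, 30, 34, 29, 34],
}

# Sample variance (n - 1 in the denominator), matching the table above.
variances = {name: variance(values) for name, values in scores.items()}

# A rough indicator of how unequal the variances are.
ratio = max(variances.values()) / min(variances.values())

for name, value in variances.items():
    print(f"{name}: {value:.5f}")
print(f"largest / smallest variance: {ratio:.2f}")
```

The largest variance (time1) is several times the smallest (time4), which is the numerical counterpart of the visual impression from the plots.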
(b) Correlation (covariance)
across time points
         time1      time2      time3      time4
time1    1.00000    0.94035   -0.14150    0.28445
time2    0.94035    1.00000   -0.02819    0.26921
time3   -0.14150   -0.02819    1.00000    0.27844
time4    0.28445    0.26921    0.27844    1.00000
Certainly do NOT have equal correlations!
Time1 and time2 are highly correlated (r = 0.94), but time1 and time3 are weakly inversely correlated (r = -0.14)!
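The correlation matrix can be reproduced with a few lines of Python. This is an illustrative sketch rather than the SAS call used in the lecture; the helper name `pearson` is an assumption made for the example.

```python
from math import sqrt

scores = {
    "time1": [31, 24, 14, 38, 25, 30],
    "time2": [29, 28, 20, 34, 29, 28],
    "time3": [15, 20, 28, 30, 25, 16],
    "time4": [26, 32, 30, 34, 29, 34],
}


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


names = list(scores)
corr = {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}

# Print the matrix row by row.
for a in names:
    print(a, " ".join(f"{corr[(a, b)]:8.5f}" for b in names))
```

Under compound symmetry all off-diagonal entries would be (approximately) equal; here they range from about -0.14 to 0.94.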
Compound symmetry would look
like…
         time1      time2      time3      time4
time1    1.00000   -0.04878   -0.04878   -0.04878
time2   -0.04878    1.00000   -0.04878   -0.04878
time3   -0.04878   -0.04878    1.00000   -0.04878
time4   -0.04878   -0.04878   -0.04878    1.00000
Missing Data




• Very important to fill in missing data! Otherwise, you have to throw out the whole observation.
• With missing data, changes in the mean over time may just reflect the drop-out pattern; you cannot compare time point 1 with 50 people to time point 2 with 35 people!
• We will implement the classic “last observation carried forward” strategy for simplicity.
• Other, more complicated imputation strategies may be more appropriate.
LOCF
Subject     HRSD 1   HRSD 2   HRSD 3   HRSD 4
Subject 1   20       13       .        .
Subject 2   21       21       20       19
Subject 3   19       18       10       6
Subject 4   30       .        25       23
(. = missing)
LOCF
Last Observation Carried Forward
Subject     HRSD 1   HRSD 2   HRSD 3   HRSD 4
Subject 1   20       13       13*      13*
Subject 2   21       21       20       19
Subject 3   19       18       10       6
Subject 4   30       30*      25       23
(* = imputed by carrying the last observed value forward)
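Last observation carried forward is simple enough to sketch in a few lines. The Python function below is an illustration only (the name `locf` and the use of `None` for a missing value are assumptions); it fills each missing value with the most recent observed value for the same subject.

```python
def locf(values):
    """Fill each missing value (None) with the last observed value.

    Values that are missing before any observation stay None,
    because there is nothing to carry forward.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Subject 1 from the slide drops out after the second HRSD measurement.
subject_1 = [20, 13, None, None]
# Subject 3 has complete data, so LOCF leaves it unchanged.
subject_3 = [19, 18, 10, 6]

print(locf(subject_1))
print(locf(subject_3))
```

Because the carried-forward value freezes a subject at the last observed score, LOCF can understate or overstate change over time, which is why the slide notes that more complicated imputation strategies may be more appropriate.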
Strategy 1: End-point analysis
Removes repeated measures problem by considering
only a single time point (the final one).
Ignores intermediate data completely
Asks whether or not the two group means differ at the
final time point, adjusting for differences at baseline
(using ANCOVA).
proc glm data=broad;
class group;
model time4 = time1 group;
run;
Comparing groups at every follow-up time point in this way
would hugely increase your type I error.
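To see why repeating the comparison at every follow-up inflates type I error, consider the chance of at least one false positive across k tests. The Python sketch below assumes the tests are independent, which is a simplifying assumption: repeated measurements on the same subjects are correlated, so the true inflation will differ, but the direction is the same. The function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05
# Three follow-up time points (time2, time3, time4) in the example data.
fw_three = familywise_error(alpha, 3)
# Fourteen follow-ups, e.g. morning and evening for 7 days.
fw_fourteen = familywise_error(alpha, 14)

print(f"3 tests:  {fw_three:.3f}")
print(f"14 tests: {fw_fourteen:.3f}")
```

With three tests the chance of at least one spurious "significant" result is already about 14% rather than 5%, and it passes 50% by fourteen tests.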
Strategy 1: End-point analysis
Source             DF   Sum of Squares   Mean Square    F Value   Pr > F
Model               2   13.50000000       6.75000000    0.57      0.6155
Error               3   35.33333333      11.77777778
Corrected Total     5   48.83333333

R-Square    Coeff Var   Root MSE    time4 Mean
0.276451    11.13041    3.431877    30.83333

Source   DF   Type I SS     Mean Square   F Value   Pr > F
time1     1   3.95121951    3.95121951    0.34      0.6031
group     1   9.54878049    9.54878049    0.81      0.4343

group   time4 LSMEAN   Pr > |t|
A       29.3333333     0.4343
B       32.3333
Strategy 1: End-point analysis
The LSMEAN rows of the output give the least-squares means of the two groups at time4, adjusted for baseline differences: A = 29.33, B = 32.33 (not significantly different, p = 0.4343).
From end-point analysis…
• Overall, are there significant differences between time points?
  Can’t say
• Overall, are there significant changes from baseline?
  Can’t say
• Do the two groups differ at any time points?
  They don’t differ at time4
• Do the two groups differ in their responses over time?
  Can’t say
Strategy 2: univariate repeated
measures ANOVA (rANOVA)
Just good-old regular ANOVA, but accounting for
between subject differences
BUT first… Naive analysis

Run ANOVA on long form of data, ignoring
correlations within subjects (also ignoring
group for now):
proc anova data=long;
class time;
model score= time ;
run;
Compares means from each time point as if they were independent samples (analogous to using a two-sample t-test when a paired t-test is appropriate). Results in loss of power!
One-way ANOVA (naïve)
id      time1   time2   time3   time4
1       31      29      15      26
2       24      28      20      32
3       14      20      28      30
4       38      34      30      34
5       25      29      25      29
6       30      28      16      34
MEAN:   27.00   28.00   22.33   30.83     (overall mean: 27.00)
Within time: variation of the scores around their own time-point mean (down each column).
Between times: variation of the four time-point means around the overall mean.
$$\text{SSB (between times)} = 6 \times \left[(27 - 27)^2 + (28 - 27)^2 + (22.33 - 27)^2 + (30.83 - 27)^2\right] = 224.79$$
$$\text{SSW (within time)} = (31 - 27)^2 + (24 - 27)^2 + \dots + (29 - 30.83)^2 + (34 - 30.83)^2 = 676.17$$
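The two sums of squares can be checked directly. The Python sketch below is an illustration, not the lecture's SAS run; variable names are assumptions. It uses the unrounded overall mean (649/24, about 27.04) rather than the rounded 27 shown on the slide, which is what reproduces 224.79.

```python
# Scores by time point, ignoring which subject each score came from.
scores = {
    "time1": [31, 24, 14, 38, 25, 30],
    "time2": [29, 28, 20, 34, 29, 28],
    "time3": [15, 20, 28, 30, 25, 16],
    "time4": [26, 32, 30, 34, 29, 34],
}

all_scores = [s for values in scores.values() for s in values]
grand_mean = sum(all_scores) / len(all_scores)

# Between-times SS: group size times squared deviation of each time mean.
ss_between = sum(
    len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in scores.values()
)
# Within-time SS: squared deviation of each score from its own time mean.
ss_within = sum(
    (s - sum(v) / len(v)) ** 2 for v in scores.values() for s in v
)

df_between = len(scores) - 1               # 3
df_within = len(all_scores) - len(scores)  # 20
f_naive = (ss_between / df_between) / (ss_within / df_within)

print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, F = {f_naive:.2f}")
```

The resulting F ratio uses 20 error degrees of freedom, as if the 24 scores came from 24 different people.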
One-way ANOVA results
The ANOVA Procedure
Dependent Variable: score

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   224.7916667      74.9305556    2.22      0.1177
Error              20   676.1666667      33.8083333
Corrected Total    23   900.9583333

Source   DF   Anova SS      Mean Square   F Value   Pr > F
time      3   224.7916667   74.9305556    2.22      0.1177

Twisk: Output 3.3
Univariate repeated-measures
ANOVA
Explain away some error variability by accounting for
differences between subjects:
-SSE was 676.17
-This will be reduced by variability between subjects
proc glm data=broad;
model time1-time4=;
repeated time;
run; quit;
rANOVA
id      time1   time2   time3   time4   MEAN (between subjects)
1       31      29      15      26      25.25
2       24      28      20      32      26.00
3       14      20      28      30      23.00
4       38      34      30      34      34.00
5       25      29      25      29      27.00
6       30      28      16      34      27.00
MEAN:   27.00   28.00   22.33   30.83   27.00
$$\text{SSB (between times)} = 224.79 \quad \text{(from before)}$$
$$\text{SSid (between subjects)} = 4 \times \left[(25.25 - 27)^2 + (26 - 27)^2 + (23 - 27)^2 + \dots + (27 - 27)^2\right] = 276.21$$
$$\text{unexplained variability} = 676.17 - 276.21 = 399.96$$
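The partition above can be verified numerically. This Python sketch (illustrative; names are assumptions) removes the between-subject sum of squares from the naive error term and recomputes the F test for time. As on the previous slide, the unrounded overall mean is used.

```python
# One row per subject: scores at time1..time4 (hypothetical Twisk data).
rows = [
    [31, 29, 15, 26],
    [24, 28, 20, 32],
    [14, 20, 28, 30],
    [38, 34, 30, 34],
    [25, 29, 25, 29],
    [30, 28, 16, 34],
]
n_subj, n_time = len(rows), len(rows[0])
grand = sum(map(sum, rows)) / (n_subj * n_time)

time_means = [sum(r[t] for r in rows) / n_subj for t in range(n_time)]
subj_means = [sum(r) / n_time for r in rows]

ss_total = sum((x - grand) ** 2 for r in rows for x in r)
ss_time = n_subj * sum((m - grand) ** 2 for m in time_means)
ss_subj = n_time * sum((m - grand) ** 2 for m in subj_means)
# What is left after removing time and subject effects.
ss_error = ss_total - ss_time - ss_subj

df_time = n_time - 1                   # 3
df_error = (n_time - 1) * (n_subj - 1)  # 15
f_time = (ss_time / df_time) / (ss_error / df_error)

print(f"SSid = {ss_subj:.2f}, SSE = {ss_error:.2f}, F = {f_time:.2f}")
```

The error sum of squares drops from 676.17 to about 399.96, so the F statistic for time rises from 2.22 to about 2.81 even though the error degrees of freedom fall from 20 to 15.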
Idea of the G-G and H-F corrections: analogous to the pooled vs. unpooled variance t-test. If we have to estimate more things because the variances/covariances aren’t equal, then we lose some degrees of freedom and the p-value increases.
rANOVA results
Repeated measures p-value = .0752
After G-G correction for non-sphericity = .1311
(H-F correction gives .1114)

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                  Adj Pr > F
Source        DF   Type III SS   Mean Square   F Value   Pr > F   G - G    H - F
time           3   224.7916667   74.9305556    2.81      0.0752   0.1311   0.1114
Error(time)   15   399.9583333   26.6638889

Greenhouse-Geisser Epsilon   0.4857
Huynh-Feldt Epsilon          0.6343

The time row is the between-time variability; the Error(time) row is the unexplained variability.
These epsilons should be 1.0 if sphericity holds. The sphericity assumption appears violated.
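For readers curious where the Greenhouse-Geisser epsilon comes from, the sketch below computes it from the sample covariance matrix of the four time points using Box's formula. This is an illustrative Python calculation under the assumption that SAS uses this standard estimator for the one-group case; variable names are hypothetical.

```python
rows = [
    [31, 29, 15, 26],
    [24, 28, 20, 32],
    [14, 20, 28, 30],
    [38, 34, 30, 34],
    [25, 29, 25, 29],
    [30, 28, 16, 34],
]
n, k = len(rows), len(rows[0])
means = [sum(r[j] for r in rows) / n for j in range(k)]

# Sample covariance matrix of the k repeated measurements.
cov = [
    [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
     for j in range(k)]
    for i in range(k)
]

mean_diag = sum(cov[i][i] for i in range(k)) / k      # mean variance
grand = sum(map(sum, cov)) / k ** 2                   # mean of all entries
row_means = [sum(row) / k for row in cov]
sum_sq = sum(c * c for row in cov for c in row)

# Box's epsilon: equals 1 under sphericity, lower bound 1 / (k - 1).
numerator = (k * (mean_diag - grand)) ** 2
denominator = (k - 1) * (
    sum_sq - 2 * k * sum(m * m for m in row_means) + k ** 2 * grand ** 2
)
gg_epsilon = numerator / denominator

# The correction multiplies both degrees of freedom by epsilon.
adj_df = (gg_epsilon * (k - 1), gg_epsilon * (k - 1) * (n - 1))
print(f"epsilon = {gg_epsilon:.4f}, adjusted df = {adj_df[0]:.2f}, {adj_df[1]:.2f}")
```

The F statistic itself (2.81) is unchanged; it is simply referred to an F distribution with fewer degrees of freedom, which is why the p-value rises from .0752 to .1311.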
With two groups: Naive
analysis

Run ANOVA on long form of data,
ignoring correlations within subjects:
proc anova data=long;
class time group;
model score= time group group*time;
run;
As if there are 8 independent samples: 2 groups at each time
point.
Two-way ANOVA (naïve)
grp     time1   time2   time3   time4   MEAN
A       31      29      15      26
A       24      28      20      32
A       14      20      28      30
MEAN:   23.00   25.67   21.00   29.33   24.75
B       38      34      30      34
B       25      29      25      29
B       30      28      16      34
MEAN:   31.00   30.33   23.67   32.33   29.33
Overall mean = 27
Within time: variation of the scores around their own group-by-time cell mean.
Between groups: variation of the two group means (24.75 and 29.33) around the overall mean.
$$\text{SSB (between times)} = 224.79 \quad \text{(from before)}$$
$$\text{SSB (between groups)} = 12 \times \left[(29.33 - 27)^2 + (24.75 - 27)^2\right] = 126.04$$
$$\text{SSE} = (31 - 23)^2 + (24 - 23)^2 + (14 - 23)^2 + (29 - 25.67)^2 + \dots + (34 - 32.33)^2 = 523.33$$
Recall: SST = 900.9583333; group by time = 900.9583 - 523.33 - 224.79 - 126.04 = 26.79
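The naive two-way decomposition can be reproduced as follows. This Python sketch is illustrative only (names such as `groups` and `mean` are assumptions) and treats the 8 group-by-time cells as independent samples, exactly as the naive analysis does.

```python
# Rows are subjects, columns are time1..time4 (hypothetical Twisk data).
groups = {
    "A": [[31, 29, 15, 26], [24, 28, 20, 32], [14, 20, 28, 30]],
    "B": [[38, 34, 30, 34], [25, 29, 25, 29], [30, 28, 16, 34]],
}
n_time = 4


def mean(values):
    values = list(values)
    return sum(values) / len(values)


all_scores = [x for rows in groups.values() for r in rows for x in r]
grand = mean(all_scores)
ss_total = sum((x - grand) ** 2 for x in all_scores)

# Between times: 6 scores per time point.
time_means = [mean(r[t] for rows in groups.values() for r in rows)
              for t in range(n_time)]
ss_time = (len(all_scores) // n_time) * sum((m - grand) ** 2 for m in time_means)

# Between groups: 12 scores per group.
ss_group = sum(
    len(rows) * n_time * (mean(x for r in rows for x in r) - grand) ** 2
    for rows in groups.values()
)

# Error: each score around its own group-by-time cell mean.
ss_error = sum(
    (r[t] - mean(q[t] for q in rows)) ** 2
    for rows in groups.values() for r in rows for t in range(n_time)
)
ss_interaction = ss_total - ss_time - ss_group - ss_error

print(f"group = {ss_group:.2f}, error = {ss_error:.2f}, "
      f"group*time = {ss_interaction:.2f}")
```

The interaction term is obtained by subtraction, mirroring the "Recall: SST = ..." line on the slide.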
Results: Naïve analysis
The ANOVA Procedure
Dependent Variable: score

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               7   377.6250000      53.9464286    1.65      0.1924
Error              16   523.3333333      32.7083333
Corrected Total    23   900.9583333

Source       DF   Anova SS      Mean Square    F Value   Pr > F
time          3   224.7916667    74.9305556    2.29      0.1173
group         1   126.0416667   126.0416667    3.85      0.0673
time*group    3    26.7916667     8.9305556    0.27      0.8439
Univariate repeated-measures
ANOVA
Reduce error variability by between subject differences:
-SSE was 523.33
-This will be reduced by variability between subjects
proc glm data=broad;
class group;
model time1-time4= group;
repeated time;
run; quit;
rANOVA
grp     time1   time2   time3   time4   MEAN (between subjects in each group)
A       31      29      15      26      25.25
A       24      28      20      32      26.00
A       14      20      28      30      23.00
MEAN:   23.00   25.67   21.00   29.33   24.75
B       38      34      30      34      34.00
B       25      29      25      29      27.00
B       30      28      16      34      27.00
MEAN:   31.00   30.33   23.67   32.33   29.33
Overall mean = 27
$$\text{SSid (between subjects)} = 4 \times \left[(25.25 - 24.75)^2 + (26 - 24.75)^2 + \dots + (27 - 29.33)^2\right] = 150.17$$
$$\text{unexplained variability} = 523.33 - 150.17 = 373.17$$
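With two groups, the between-subject variability is measured within each group, and it becomes the error term for the group effect. The Python sketch below (illustrative; names are assumptions) carries the partition through to the three F statistics.

```python
groups = {
    "A": [[31, 29, 15, 26], [24, 28, 20, 32], [14, 20, 28, 30]],
    "B": [[38, 34, 30, 34], [25, 29, 25, 29], [30, 28, 16, 34]],
}
n_time = 4
n_subj = sum(len(rows) for rows in groups.values())  # 6


def mean(values):
    values = list(values)
    return sum(values) / len(values)


all_scores = [x for rows in groups.values() for r in rows for x in r]
grand = mean(all_scores)
ss_total = sum((x - grand) ** 2 for x in all_scores)
ss_time = n_subj * sum(
    (mean(r[t] for rows in groups.values() for r in rows) - grand) ** 2
    for t in range(n_time)
)
ss_group = sum(
    len(rows) * n_time * (mean(x for r in rows for x in r) - grand) ** 2
    for rows in groups.values()
)
ss_cells = sum(  # naive error: scores around their cell means
    (r[t] - mean(q[t] for q in rows)) ** 2
    for rows in groups.values() for r in rows for t in range(n_time)
)
ss_interaction = ss_total - ss_time - ss_group - ss_cells

# Subjects around their own group mean (between subjects within group).
ss_subjects = n_time * sum(
    (mean(r) - mean(x for q in rows for x in q)) ** 2
    for rows in groups.values() for r in rows
)
ss_error_time = ss_cells - ss_subjects

df_subjects = n_subj - len(groups)                       # 4
df_error_time = (n_time - 1) * (n_subj - len(groups))    # 12
f_group = ss_group / (ss_subjects / df_subjects)
f_time = (ss_time / (n_time - 1)) / (ss_error_time / df_error_time)
f_interaction = (ss_interaction / (n_time - 1)) / (ss_error_time / df_error_time)

print(f"F group = {f_group:.2f}, F time = {f_time:.2f}, "
      f"F time*group = {f_interaction:.2f}")
```

Note the two different denominators: the group effect is tested against between-subject variability, while time and time*group are tested against the within-subject residual.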
rANOVA results (two groups)
The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects (usually of less interest!)

Source   DF   Type III SS   Mean Square    F Value   Pr > F
group     1   126.0416667   126.0416667    3.36      0.1408
Error     4   150.1666667    37.5416667

The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects (what we care about!)

                                                                  Adj Pr > F
Source        DF   Type III SS   Mean Square   F Value   Pr > F   G - G    H - F
time           3   224.7916667   74.9305556    2.41      0.1178   0.1743   0.1283
time*group     3    26.7916667    8.9305556    0.29      0.8338   0.6954   0.8118
Error(time)   12   373.1666667   31.0972222

Greenhouse-Geisser Epsilon   0.4863
Huynh-Feldt Epsilon          0.885

No apparent difference in responses over time between the groups.
From rANOVA analysis…
• Overall, are there significant differences between time points?
  No, Time not statistically significant (p=.1743, G-G)
• Overall, are there significant changes from baseline?
  No, Time not statistically significant
• Do the two groups differ at any time points?
  No, Group not statistically significant (p=.1408)
• Do the two groups differ in their responses over time?**
  No, not even close; Group*Time (p-value>.60)
Strategy 3: rMANOVA



• Multivariate: more than one dependent variable.
• Multivariate approach to repeated measures: treats the response variable as a multivariate response vector.
• Not just for repeated measures, but appropriate for other situations with multiple dependent variables.
Analogous to paired t-test
Recall: paired t-test:
$$\bar{Y}_{\text{diff}} = \frac{\sum_{i=1}^{n} (y_{2i} - y_{1i})}{n}, \qquad \frac{\bar{Y}_{\text{diff}}}{SD(\bar{Y}_{\text{diff}})} \sim T_{n-1}$$
The paired t-test compares the mean of the difference values between two time points to its standard error.
MANOVA is just a paired t-test where the outcome variable is a vector of differences rather than a single difference:
$$H^2 = \frac{N\,\bar{y}_{\text{diff}}^{T}\,\bar{y}_{\text{diff}}}{S^2_{\text{diff}}}, \qquad F = \left(\frac{N - T + 1}{(N - 1)(T - 1)}\right) H^2$$
where T is the number of time points.
Called: Hotelling's Trace
T-1 differences
id   group   diff1   diff2   diff3
1    A       -2      -14     11
2    A        4       -8     12
3    A        6        8      2
4    B       -4       -4      4
5    B        4       -4      4
6    B       -2      -12     18
Note: weights all differences equally, so hard to interpret if time intervals
are unevenly spaced.
Note: assumes differences follow a multivariate normal distribution +
multivariate homogeneity of variances assumption
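The statistic can be computed by hand for the time-only case. The Python sketch below is an illustration under one stated assumption: the division by S²diff in the formula is read as multiplication by the inverse of the sample covariance matrix of the differences, which is the standard form of Hotelling's T². The helper `solve` is a plain Gaussian elimination written for this example.

```python
rows = [
    [31, 29, 15, 26],
    [24, 28, 20, 32],
    [14, 20, 28, 30],
    [38, 34, 30, 34],
    [25, 29, 25, 29],
    [30, 28, 16, 34],
]
n, t = len(rows), len(rows[0])
p = t - 1

# Successive differences: time2-time1, time3-time2, time4-time3.
diffs = [[r[j + 1] - r[j] for j in range(p)] for r in rows]
mean_diff = [sum(d[j] for d in diffs) / n for j in range(p)]
cov = [
    [sum((d[i] - mean_diff[i]) * (d[j] - mean_diff[j]) for d in diffs) / (n - 1)
     for j in range(p)]
    for i in range(p)
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


# H^2 = N * ybar' S^{-1} ybar, then convert to an F statistic.
h2 = n * sum(m * x for m, x in zip(mean_diff, solve(cov, mean_diff)))
f_stat = (n - t + 1) / ((n - 1) * (t - 1)) * h2

print(f"H^2 = {h2:.3f}, F = {f_stat:.2f} on {t - 1} and {n - t + 1} df")
```

With only N - T + 1 = 3 denominator degrees of freedom, the multivariate test has little power in this small data set.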
On same output as rANOVA
proc glm data=broad;
model time1-time4=;
repeated time;
run; quit;
Null hypothesis: diff1=0, diff2=0, diff3=0
Results (time only)
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect
H = Type III SSCP Matrix for time
E = Error SSCP Matrix
S=1   M=0.5   N=0.5

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.24281920   3.12      3        3        0.1876
Pillai's Trace           0.75718080   3.12      3        3        0.1876
Hotelling-Lawley Trace   3.11829053   3.12      3        3        0.1876
Roy's Greatest Root      3.11829053   3.12      3        3        0.1876

• 4 separate F-statistics (slightly different versions of the MANOVA statistic)
• all give the same answer: change over time is not significant
• compare to rANOVA results: G-G time p-value=.13
Use Wilks’ Lambda in general.
Use Pillai’s Trace for small sample sizes (when assumptions of the model are violated).
On same output as rANOVA
proc glm data=broad;
class group;
model time1-time4= group;
repeated time;
run; quit;
Results (two groups)
The GLM Procedure
Repeated Measures Analysis of Variance
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.23333404   2.19      3        2        0.3287
Pillai's Trace           0.76666596   2.19      3        2        0.3287
Hotelling-Lawley Trace   3.28570126   2.19      3        2        0.3287
Roy's Greatest Root      3.28570126   2.19      3        2        0.3287

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*group Effect

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.77496006   0.19      3        2        0.8932
Pillai's Trace           0.22503994   0.19      3        2        0.8932
Hotelling-Lawley Trace   0.29038909   0.19      3        2        0.8932
Roy's Greatest Root      0.29038909   0.19      3        2        0.8932

No differences between times.
No differences in change over time between the groups (compare to G-G time*group p-value=.6954)
From rMANOVA analysis…
• Overall, are there significant differences between time points?
  No, Time not statistically significant (p=.3287)
• Overall, are there significant changes from baseline?
  No, Time not statistically significant
• Do the two groups differ at any time points?
  Can’t say (never looked at raw scores, only difference values)
• Do the two groups differ in their responses over time?**
  No, not even close; Group*Time (p-value=.89)
Can also test for the shape of the
response profile…
proc glm data=broad;
class group;
model time1-time4= group;
repeated time 4 polynomial / summary;
run; quit;
The GLM Procedure
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables
time_N represents the nth degree polynomial contrast for time

Contrast Variable: time_1 (linear)
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Mean      1    10.2083333   10.2083333    0.21      0.6716
group     1    21.6750000   21.6750000    0.44      0.5421
Error     4   195.7666667   48.9416667

Contrast Variable: time_2 (quadratic)
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Mean      1   84.37500000   84.37500000   3.80      0.1231
group     1    5.04166667    5.04166667   0.23      0.6586
Error     4   88.83333333   22.20833333

Contrast Variable: time_3 (cubic)
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Mean      1   130.2083333   130.2083333   5.88      0.0724
group     1     0.0750000     0.0750000   0.00      0.9564
Error     4    88.5666667    22.141666
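The "Mean" sums of squares in these contrast tables can be reproduced from the raw scores. The Python sketch below is illustrative and assumes the usual orthogonal polynomial coefficients for four equally spaced time points, normalized to unit length; the function name `mean_ss` is hypothetical.

```python
from math import sqrt

rows = [
    [31, 29, 15, 26],
    [24, 28, 20, 32],
    [14, 20, 28, 30],
    [38, 34, 30, 34],
    [25, 29, 25, 29],
    [30, 28, 16, 34],
]

# Orthogonal polynomial coefficients for 4 equally spaced time points.
contrasts = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def mean_ss(weights):
    """SS for the hypothesis that the mean contrast score is zero."""
    norm = sqrt(sum(w * w for w in weights))
    # One contrast score per subject, using unit-length weights.
    scores = [sum(w * x for w, x in zip(weights, r)) / norm for r in rows]
    return len(scores) * (sum(scores) / len(scores)) ** 2


ss = {name: mean_ss(weights) for name, weights in contrasts.items()}
for name, value in ss.items():
    print(f"{name}: {value:.4f}")
print(f"total: {sum(ss.values()):.4f}")
```

Because the three contrasts are orthogonal, their sums of squares add up to the overall time sum of squares (224.79), so the polynomial analysis simply splits the time effect into linear, quadratic, and cubic pieces.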
Can also get successive paired
t-tests
proc glm data=broad;
class group;
model time1-time4= group;
repeated time profile /summary ;
run; quit;
**Not adjusted for multiple comparisons!
Repeated Measures Analysis of Variance
Analysis of Variance of Contrast Variables
time_N represents the nth successive difference in time

Contrast Variable: time_1 (time1 vs. time2)
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Mean      1    6.00000000    6.00000000   0.35      0.5879
group     1   16.66666667   16.66666667   0.96      0.3823
Error     4   69.33333333   17.33333333

Contrast Variable: time_2 (time2 vs. time3)
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Mean      1   192.6666667   192.6666667   2.56      0.1850
group     1     6.0000000     6.0000000   0.08      0.7918
Error     4   301.3333333    75.3333333

Contrast Variable: time_3 (time3 vs. time4)
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Mean      1   433.5000000   433.5000000   9.06      0.0395
group     1     0.1666667     0.1666667   0.00      0.9558
Error     4   191.3333333    47.8333333
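Since these successive comparisons are not adjusted for multiple comparisons, one simple (and conservative) fix is a Bonferroni correction. The Python sketch below is illustrative: it recomputes the "Mean" F statistics for each successive difference, assuming a balanced design with error pooled within groups, and then compares the p-values reported in the output above against the Bonferroni threshold. Names such as `profile_f` are hypothetical.

```python
groups = {
    "A": [[31, 29, 15, 26], [24, 28, 20, 32], [14, 20, 28, 30]],
    "B": [[38, 34, 30, 34], [25, 29, 25, 29], [30, 28, 16, 34]],
}


def profile_f(j):
    """F test that the mean of (time j+2 minus time j+1) is zero.

    Assumes equal group sizes, so the overall mean difference is the
    average of the group means; error is pooled within groups.
    """
    diffs = {g: [r[j + 1] - r[j] for r in rows] for g, rows in groups.items()}
    all_diffs = [d for ds in diffs.values() for d in ds]
    n = len(all_diffs)
    ss_mean = n * (sum(all_diffs) / n) ** 2
    ss_error = sum(
        (d - sum(ds) / len(ds)) ** 2 for ds in diffs.values() for d in ds
    )
    df_error = n - len(diffs)
    return ss_mean / (ss_error / df_error)


f_values = [profile_f(j) for j in range(3)]

# p-values for the Mean rows as reported in the SAS output above.
reported_p = [0.5879, 0.1850, 0.0395]
bonferroni_alpha = 0.05 / len(reported_p)
significant = [p < bonferroni_alpha for p in reported_p]

print([round(f, 2) for f in f_values])
print(f"Bonferroni threshold: {bonferroni_alpha:.4f}", significant)
```

The time3 vs. time4 comparison (p = .0395) falls below .05 but not below the Bonferroni threshold of about .0167, so it would not be declared significant after adjustment.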
Univariate vs. multivariate


• If the compound symmetry assumption is met, the univariate approach has more power (more degrees of freedom).
• But if compound symmetry is not met, the type I error of the univariate approach is inflated.
Summary: rANOVA and
rMANOVA




• Require imputation of missing data
• rANOVA requires compound symmetry (though there are corrections for this)
• Require subjects measured at the same time points
• But, easy to implement and interpret
Practice: rANOVA and
rMANOVA
What effects do you expect to be statistically significant?
Time? Group? Time*group?
Within-subjects effects, but no between-subjects effects.
Time is significant.
Group*time is significant.
Group is not significant.
Practice: rANOVA and
rMANOVA
Between group
effects; no within
subject effects:
Time is not
significant.
Group*time is not
significant.
Group IS significant.
Practice: rANOVA and
rMANOVA
Some within-group effects, no between-group effect.
Time is significant.
Group is not
significant.
Time*group is not
significant.
References

Jos W. R. Twisk. Applied Longitudinal Data Analysis for Epidemiology: A
Practical Guide. Cambridge University Press, 2003.