Analysis of Repeated Measures Data Ramon C. Littell

Outline:
1. Introduction
2. Univariate and Multivariate Analyses Using PROC GLM
3. Mixed Model Analyses Using PROC MIXED
Effect of growth regulators on chrysanthemum plants
Data courtesy James Barrett and Terril Nell
Multivariate Data Set
proc print data=mumsmult;

Obs  blk  trt  plt   ht1   ht2   ht3   ht4   ht5   ht6  elong  chem
  1    1    3    1   3.0   4.0   5.5  19.5  33.0  44.5   41.5  bonzi
  2    1    3    2   2.5   3.0   5.5  18.0  33.5  46.5   44.0  bonzi
  3    1    9    1   1.0   2.0   2.0   6.0  14.0  26.5   25.5  sumag
  4    1    9    2   4.0   4.5   7.0  17.0  31.5  44.0   40.0  sumag
  5    2    3    1   1.0   2.0   4.0  17.5  35.5  47.0   46.0  bonzi
  6    2    3    2   4.0   4.5  11.5  29.0  41.5  54.0   50.0  bonzi
  7    2    9    1   3.0   4.0   5.5  17.5  33.0  48.0   45.0  sumag
  8    2    9    2   3.5   4.5   7.0  19.0  31.5  43.5   40.0  sumag
  9    3    3    1   3.0   3.5   5.5  19.0  34.5  46.0   43.0  bonzi
 10    3    3    2   3.5   4.0   6.0  17.0  35.0  48.5   45.0  bonzi
 11    3    9    1   3.0   3.5   6.5  15.5  30.0  40.5   37.5  sumag
 12    3    9    2   2.0   2.0   2.5   5.5  10.5  23.0   21.0  sumag
 13    4    3    1   2.5   4.0   5.0  13.0  30.0  44.0   41.5  bonzi
 14    4    3    2   3.0   3.5   5.0  17.0  33.5  48.0   45.0  bonzi
 15    4    9    1   4.0   4.5   6.5  14.0  28.0  38.0   34.0  sumag
 16    4    9    2   4.0   4.5   8.0  21.0  36.0  47.0   43.0  sumag
 17    5    3    1   2.0   2.5   3.0   9.0  19.5  32.5   30.5  bonzi
 18    5    3    2   3.5   4.0   6.5  20.0  35.5  47.5   44.0  bonzi
 19    5    9    1   3.0   4.0   7.0  19.0  31.5  43.0   40.0  sumag
 20    5    9    2   4.0   4.5   6.5  16.0  26.0  37.0   33.0  sumag
Univariate ANOVA at each time
proc glm data=mumsmult; class blk chem plt;
model ht1-ht6 = blk chem blk*chem;
estimate 'bonzi-sumag' chem 1 -1;
run;
The GLM Procedure

Class Level Information
Class   Levels  Values
blk     5       1 2 3 4 5
chem    2       bonzi sumag
plt     2       1 2

Number of Observations Read  20
Number of Observations Used  20

Dependent Variable: ht1

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              9       4.61250000    0.51250000      0.44   0.8834
Error             10      11.62500000    1.16250000
Corrected Total   19      16.23750000

R-Square   Coeff Var   Root MSE   ht1 Mean
0.284065    36.24178   1.078193   2.975000

Source     DF   Type III SS   Mean Square   F Value   Pr > F
blk         4    1.30000000    0.32500000      0.28   0.8846
chem        1    0.61250000    0.61250000      0.53   0.4846
blk*chem    4    2.70000000    0.67500000      0.58   0.6836

Parameter       Estimate   Standard Error   t Value   Pr > |t|
bonzi-sumag  -0.35000000       0.48218254     -0.73     0.4846
Summary of Results from ANOVA at each Time
Mean Squares (p-values) and Chem Differences (std. err.) at each Time

Time   Blk    Chem        Blk*Chem   Error   Difference
1      .325   .613(.48)   .675       1.16    -0.35(.48)
2      .481   .450(.51)   .794       0.95    -0.30(.44)
3      2.64   0.05(.93)   3.46       5.73    -0.10(1.07)
4      25.3   40.6(.25)   27.1       27.8     2.85(2.36)
5      46.0   177(.10)    46.0       54.4     5.95(3.30)
6      54.5   231(.06)    37.2       52.5     6.80(3.24)
Conclusions:
• Differences between Chems do not approach statistical significance until Time 6 (p=.06), even though trends
appear to separate at Time 4
• Mean squares increase with Time, corresponding to growth of plants
• Univariate analyses at each time are valid, but not the most efficient
Multivariate ANOVA
repeated time / printe;
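Only the REPEATED statement is shown above. A minimal sketch of a complete step that could produce this output follows. It assumes the same CLASS and MODEL statements as the univariate analysis; the NOUNI option (suppress the six separate ANOVAs) and the explicit level count 6 are additions not shown on the slide.

```sas
/* Sketch only: multivariate and split-plot-in-time analyses of ht1-ht6. */
proc glm data=mumsmult;
  class blk chem plt;
  /* NOUNI suppresses the separate univariate ANOVA for each height */
  model ht1-ht6 = blk chem blk*chem / nouni;
  /* "time" names the within-subject factor; PRINTE requests the      */
  /* error SSCP partial correlations and the sphericity tests         */
  repeated time 6 / printe;
run;
quit;
```

The name time in the REPEATED statement is a label for the within-subject factor, not a variable in mumsmult.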
Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 10

            ht1        ht2        ht3        ht4        ht5        ht6
ht1    1.000000   0.951572   0.915772   0.821807   0.688554   0.688022
                    <.0001     <.0001     0.0019     0.0191     0.0193
ht2    0.951572   1.000000   0.927271   0.831303   0.740812   0.722152
         <.0001                <.0001     0.0015     0.0091     0.0121
ht3    0.915772   0.927271   1.000000   0.922833   0.791272   0.788787
         <.0001     <.0001                <.0001     0.0037     0.0039
ht4    0.821807   0.831303   0.922833   1.000000   0.918780   0.909319
         0.0019     0.0015     <.0001                <.0001     0.0001
ht5    0.688554   0.740812   0.791272   0.918780   1.000000   0.989512
         0.0191     0.0091     0.0037     <.0001                <.0001
ht6    0.688022   0.722152   0.788787   0.909319   0.989512   1.000000
         0.0193     0.0121     0.0039     0.0001     <.0001
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect
H = Type III SSCP Matrix for Time
E = Error SSCP Matrix
S=1  M=1.5  N=2

Statistic                       Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda              0.00202528    591.31        5        6   <.0001
Pillai's Trace             0.99797472    591.31        5        6   <.0001
Hotelling-Lawley Trace   492.75942574    591.31        5        6   <.0001
Roy's Greatest Root      492.75942574    591.31        5        6   <.0001
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*chem Effect
H = Type III SSCP Matrix for Time*Chem
E = Error SSCP Matrix
S=1  M=1.5  N=2

Statistic                     Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.36066374      2.13        5        6   0.1925
Pillai's Trace           0.63933626      2.13        5        6   0.1925
Hotelling-Lawley Trace   1.77266579      2.13        5        6   0.1925
Roy's Greatest Root      1.77266579      2.13        5        6   0.1925
Conclusions from Multivariate ANOVA:
• Effect of Time is significant (p<.0001), which is no surprise
• Time*Chem is not significant (p=.19), which reflects the weakness of the multivariate test
Univariate ANOVA (Split-plot in time)
The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source               DF   Type III SS   Mean Square   F Value   Pr > F
blk                   4   293.9666667    73.4916667      0.82   0.5425
chem                  1   183.7687500   183.7687500      2.04   0.1833
blk*chem              4   288.7000000    72.1750000      0.80   0.5505
“Whole plot” Error   10   899.0208333    89.9020833
The GLM Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                       Adj Pr > F
Source             DF   Type III SS   Mean Square   F Value   Pr > F    G-G      H-F
time                5   26437.68542    5287.53708    502.04   <.0001   <.0001   <.0001
time*blk           20     223.03333      11.15167      1.06   0.4184   0.4270   0.4276
time*chem           5     266.16875      53.23375      5.05   0.0008   0.0403   0.0107
time*blk*chem      20     172.55000       8.62750      0.82   0.6799   0.5531   0.6112
“Sub plot” Error   50     526.60417      10.53208
Conclusions from Split-plot in Time ANOVA:
• Chem Diff not significant (p=.18)
• Chem*Time significant (unadj. p=.0008, H-F adj. p=.01)
Test for Justification of “Split-plot in time” ANOVA
Sphericity Tests

Variables               DF   Mauchly's Criterion   Chi-Square   Pr > ChiSq
Transformed Variates    14             4.6116E-7    118.17504       <.0001
Orthogonal Components   14             6.7767E-6    96.406309       <.0001
Conclusions:
• Sphericity Assumption does not hold
• Therefore the Split-plot in Time analysis is not justified. It would result in incorrect standard
errors and invalid hypothesis tests.
Mixed Model Repeated Measures Analyses
proc print data=mumsuni;

Obs  blk  trt  plt  chem   time    ht
  1    1    3    1  bonzi     1   3.0
  2    1    3    1  bonzi     2   4.0
  3    1    3    1  bonzi     3   5.5
  4    1    3    1  bonzi     4  19.5
  5    1    3    1  bonzi     5  33.0
  6    1    3    1  bonzi     6  44.5
  7    1    3    2  bonzi     1   2.5
  8    1    3    2  bonzi     2   3.0
  9    1    3    2  bonzi     3   5.5
 10    1    3    2  bonzi     4  18.0
 11    1    3    2  bonzi     5  33.5
 12    1    3    2  bonzi     6  46.5
 13    1    9    1  sumag     1   1.0
 14    1    9    1  sumag     2   2.0
 15    1    9    1  sumag     3   2.0
 16    1    9    1  sumag     4   6.0
 17    1    9    1  sumag     5  14.0
 18    1    9    1  sumag     6  26.5
 19    1    9    2  sumag     1   4.0
 20    1    9    2  sumag     2   4.5
 21    1    9    2  sumag     3   7.0
 22    1    9    2  sumag     4  17.0
 23    1    9    2  sumag     5  31.5
 24    1    9    2  sumag     6  44.0
 25    2    3    1  bonzi     1   1.0
 26    2    3    1  bonzi     2   2.0
 27    2    3    1  bonzi     3   4.0
 28    2    3    1  bonzi     4  17.5
 29    2    3    1  bonzi     5  35.5
 30    2    3    1  bonzi     6  47.0
 31    2    3    2  bonzi     1   4.0
 32    2    3    2  bonzi     2   4.5
 33    2    3    2  bonzi     3  11.5
 34    2    3    2  bonzi     4  29.0
 35    2    3    2  bonzi     5  41.5
 36    2    3    2  bonzi     6  54.0
 37    2    9    1  sumag     1   3.0
 38    2    9    1  sumag     2   4.0
 39    2    9    1  sumag     3   5.5
 40    2    9    1  sumag     4  17.5
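The slides do not show how the univariate ("long") data set mumsuni was built from the multivariate ("wide") data set mumsmult. One way it could be done is sketched below, assuming mumsmult has the variables listed in the first printout. The array name hts is hypothetical.

```sas
/* Sketch only: reshape one row per plant into one row per plant-time. */
data mumsuni;
  set mumsmult;
  array hts{6} ht1-ht6;      /* hypothetical array name over the six heights */
  do time = 1 to 6;
    ht = hts{time};          /* height at this measurement time */
    output;                  /* write one observation per time  */
  end;
  drop ht1-ht6 elong;        /* keep blk trt plt chem time ht   */
run;
```

Each of the 20 plants contributes 6 rows, so the result has 120 observations, in the variable order shown in the printout above.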
Mixed model analysis of repeated measures data incorporates the covariance structure into the
analysis, resulting in efficient and valid analyses. The first step is to model the covariance
structure. It is usual to begin with an unstructured covariance and examine the estimated covariance
matrix for patterns.
The MIXED procedure uses syntax similar to the GLM procedure. A major distinction is
that only fixed effects appear in the model statement.
The repeated statement is used to define the covariance structure. The MIXED procedure
employs likelihood methods to fit the model and compute inferential statistics.
proc mixed data=mumsuni; class blk chem plt time;
model ht = chem time chem*time / ddfm=kr;
repeated time / sub=plt(blk*chem) type=un r rcorr;
run;
The Mixed Procedure

Model Information
Data Set                    WORK.MUMSUNI
Dependent Variable          ht
Covariance Structure        Unstructured
Subject Effect              plt(blk*chem)
Estimation Method           REML
Residual Variance Method    None
Fixed Effects SE Method     Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method   Kenward-Roger

Class Level Information
Class   Levels  Values
blk     5       1 2 3 4 5
chem    2       bonzi sumag
plt     2       1 2
time    6       1 2 3 4 5 6

Iteration History
Iteration   Evaluations   -2 Res Log Like    Criterion
0           1                669.21399990
1           1                356.89670434   0.00000000

Convergence criteria met.
This is good news!
Covariance Matrix from r option in repeated statement:
Estimated R Matrix for plt(blk*chem) 1 1 bonzi

Row     Col1     Col2      Col3      Col4      Col5      Col6
1     0.8681   0.7806    1.6236    3.4014    4.3750    4.1736
2     0.7806   0.8111    1.5667    3.4639    4.7583    4.6444
3     1.6236   1.5667    4.5361   10.2403   12.5236   11.8722
4     3.4014   3.4639   10.2403   27.1181   34.4472   32.9903
5     4.3750   4.7583   12.5236   34.4472   50.6736   49.3125
6     4.1736   4.6444   11.8722   32.9903   49.3125   49.5417
Interpretation:
• The variance of height is .8681 at time 1, .8111 at time 2, 4.5361 at time 3, etc.
• The covariance is .7806 between heights at times 1 and 2, 1.6236 between times 1 and 3,
1.5667 between times 2 and 3, etc.
• General pattern:
- variances increase with time
- covariances increase with time
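These estimates also illustrate why the sphericity assumption fails. Sphericity is equivalent to all pairwise differences between times having the same variance. A worked check from the unstructured estimates above (the notation y_j for height at time j is introduced here for illustration):

```latex
\[
\operatorname{Var}(y_j - y_k) = \sigma_j^2 + \sigma_k^2 - 2\sigma_{jk}
\]
\begin{align*}
\operatorname{Var}(y_1 - y_2) &= 0.8681 + 0.8111 - 2(0.7806) = 0.118 \\
\operatorname{Var}(y_5 - y_6) &= 50.6736 + 49.5417 - 2(49.3125) = 1.590 \\
\operatorname{Var}(y_1 - y_6) &= 0.8681 + 49.5417 - 2(4.1736) = 42.06
\end{align*}
```

The difference variances range over more than two orders of magnitude, which is far from the equality that the split-plot in time analysis requires.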
Correlation Matrix from rcorr option in repeated statement:
Estimated R Correlation Matrix for plt(blk*chem) 1 1 bonzi

Row     Col1     Col2     Col3     Col4     Col5     Col6
1     1.0000   0.9302   0.8182   0.7011   0.6596   0.6364
2     0.9302   1.0000   0.8168   0.7386   0.7422   0.7327
3     0.8182   0.8168   1.0000   0.9233   0.8260   0.7920
4     0.7011   0.7386   0.9233   1.0000   0.9293   0.9001
5     0.6596   0.7422   0.8260   0.9293   1.0000   0.9842
6     0.6364   0.7327   0.7920   0.9001   0.9842   1.0000
Interpretation:
• The correlation is .9302 between heights at times 1 and 2, .8182 between times 1 and 3,
.8168 between times 2 and 3, etc.
• General pattern:
- correlations decrease as the time interval increases
- correlations at equal time lags are similar
Selecting a Covariance Structure
The next step is to select a covariance structure from those with the characteristics identified in
the “unstructured” covariance and correlation matrices. One such candidate is heterogeneous
autoregressive.
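For reference, the heterogeneous first-order autoregressive structure, type=arh(1) in PROC MIXED, can be written as follows. The symbols are notation introduced here: sigma_i is the standard deviation at time i and rho is the lag-1 correlation.

```latex
\[
\operatorname{Cov}(y_i, y_j) = \sigma_i \, \sigma_j \, \rho^{|i-j|},
\qquad i, j = 1, \dots, 6
\]
```

With six times this uses 7 parameters (six variances and one correlation), compared with 21 for the unstructured matrix. It allows the variances to grow over time while forcing correlations to depend only on the lag.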
proc mixed data=mumsuni; class blk chem plt time;
model ht = chem time chem*time / ddfm=kr;
repeated time / sub=plt(blk*chem) type=arh(1) r rcorr;
run;
The Mixed Procedure

Model Information
Data Set                    WORK.MUMSUNI
Dependent Variable          ht
Covariance Structure        Heterogeneous Autoregressive
Subject Effect              plt(blk*chem)
Estimation Method           REML
Residual Variance Method    None
Fixed Effects SE Method     Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method   Kenward-Roger

Iteration History
Iteration   Evaluations   -2 Res Log Like    Criterion
0           1                669.21399990
1           2                390.36247704   0.04137490
2           1                389.73086224   0.02128789
3           1                387.28045811   0.00456145
4           1                386.77198610   0.00057582
5           1                386.71272264   0.00001498
6           1                386.71128525   0.00000001
7           1                386.71128394   0.00000000

Convergence criteria met.
Interpretation:
The estimation algorithm converges in 7 steps; more good news.
Estimated R Matrix for plt(blk*chem) 1 1 bonzi

Row     Col1     Col2      Col3      Col4      Col5      Col6
1     1.0498   0.9780    2.0913    4.3011    5.0151    4.4498
2     0.9780   1.0623    2.2716    4.6720    5.4475    4.8335
3     2.0913   2.2716    5.6636   11.6484   13.5819   12.0511
4     4.3011   4.6720   11.6484   27.9341   32.5708   28.8999
5     5.0151   5.4475   13.5819   32.5708   44.2811   39.2904
6     4.4498   4.8335   12.0511   28.8999   39.2904   40.6491

Estimated R Correlation Matrix for plt(blk*chem) 1 1 bonzi

Row     Col1     Col2     Col3     Col4     Col5     Col6
1     1.0000   0.9261   0.8576   0.7942   0.7355   0.6812
2     0.9261   1.0000   0.9261   0.8576   0.7942   0.7355
3     0.8576   0.9261   1.0000   0.9261   0.8576   0.7942
4     0.7942   0.8576   0.9261   1.0000   0.9261   0.8576
5     0.7355   0.7942   0.8576   0.9261   1.0000   0.9261
6     0.6812   0.7355   0.7942   0.8576   0.9261   1.0000

Fit Statistics
-2 Res Log Likelihood       386.7
AIC (smaller is better)     400.7
AICC (smaller is better)    401.8
BIC (smaller is better)     407.7
Interpretation:
The arh(1) covariance and correlation matrices are similar to the “unstructured” matrices. The
AICC fit index is 401.8 and the BIC fit index is 407.7.
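The fit indices can be reproduced by hand from the -2 Res Log Likelihood of 386.7. The sketch below assumes the REML definitions documented for PROC MIXED: q = 7 covariance parameters for arh(1), s = 20 subjects (plants), and n* = 120 - 12 = 108 (observations minus the rank of the fixed-effects design). These counts are inferred from the model, not printed in the output shown.

```latex
\begin{align*}
\mathrm{AIC}  &= -2\ell_R + 2q = 386.7 + 2(7) = 400.7 \\
\mathrm{AICC} &= -2\ell_R + \frac{2q\,n^*}{n^* - q - 1}
               = 386.7 + \frac{14(108)}{100} \approx 401.8 \\
\mathrm{BIC}  &= -2\ell_R + q \ln s = 386.7 + 7 \ln 20 \approx 407.7
\end{align*}
```

All three indices add a penalty for the number of covariance parameters to the same likelihood term, so they are comparable only across covariance structures fitted with the same fixed effects.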
Type 3 Tests of Fixed Effects

Effect      Num DF   Den DF   F Value   Pr > F
chem             1     18.6      2.39   0.1393
time             5     35.9    274.95   <.0001
chem*time        5     35.9      2.70   0.0361
Interpretation:
Chem*time is significant (p=.036) when using the arh(1) covariance
Another covariance structure that is often useful is the Toeplitz structure.
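For reference, the Toeplitz structure, type=toep in PROC MIXED, gives every pair of times at the same lag the same covariance. In notation introduced here:

```latex
\[
\operatorname{Cov}(y_i, y_j) = \sigma_{|i-j|},
\qquad i, j = 1, \dots, 6
\]
```

With six times this uses 6 parameters: a common variance sigma_0 and one covariance for each of the lags 1 through 5. Unlike arh(1), it does not restrict how the correlation changes with lag, but it forces the variance to be the same at every time.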
proc mixed data=mumsuni; class blk chem plt time;
model ht = chem time chem*time / ddfm=kr;
repeated time / sub=plt(blk*chem) type=toep r rcorr;
run;
The Mixed Procedure

Model Information
Data Set                    WORK.MUMSUNI
Dependent Variable          ht
Covariance Structure        Toeplitz
Subject Effect              plt(blk*chem)
Estimation Method           REML
Residual Variance Method    Profile
Fixed Effects SE Method     Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method   Kenward-Roger

Iteration History
Iteration   Evaluations   -2 Res Log Like      Criterion
0           1                669.21399990
1           2                521.60761870   17800.229199
2           1                514.30158009   3121.5889640
3           1                508.26590893   999.48842706
4           1                505.47449670     0.02318228
5           3                503.09397952     0.00247659
6           1                502.66411897     0.00016872
7           1                502.63720490     0.00000106
8           1                502.63704334     0.00000000

Convergence criteria met.
Interpretation:
The estimation algorithm converges in 8 steps; even more good news.
Estimated R Matrix for plt(blk*chem) 1 1 bonzi

Row      Col1      Col2      Col3      Col4      Col5      Col6
1     22.0547   19.3033   14.4120    9.7663    7.1461    8.9552
2     19.3033   22.0547   19.3033   14.4120    9.7663    7.1461
3     14.4120   19.3033   22.0547   19.3033   14.4120    9.7663
4      9.7663   14.4120   19.3033   22.0547   19.3033   14.4120
5      7.1461    9.7663   14.4120   19.3033   22.0547   19.3033
6      8.9552    7.1461    9.7663   14.4120   19.3033   22.0547

Estimated R Correlation Matrix for plt(blk*chem) 1 1 bonzi

Row     Col1     Col2     Col3     Col4     Col5     Col6
1     1.0000   0.8752   0.6535   0.4428   0.3240   0.4060
2     0.8752   1.0000   0.8752   0.6535   0.4428   0.3240
3     0.6535   0.8752   1.0000   0.8752   0.6535   0.4428
4     0.4428   0.6535   0.8752   1.0000   0.8752   0.6535
5     0.3240   0.4428   0.6535   0.8752   1.0000   0.8752
6     0.4060   0.3240   0.4428   0.6535   0.8752   1.0000

Fit Statistics
-2 Res Log Likelihood       502.6
AIC (smaller is better)     514.6
AICC (smaller is better)    515.5
BIC (smaller is better)     520.6
Interpretation:
The “toep” and “unstructured” correlation matrices are similar, but the covariance matrices
are quite different due to heterogeneous variances, which are not accommodated by “toep.”
The AICC fit index is 515.5 and the BIC fit index is 520.6 for toep, both larger than for arh(1)
(401.8 and 407.7). Smaller is better, so the fit indices favor the arh(1) covariance structure.
The next step is to use the selected covariance structure and compute inferential statistics. For
this example, differences between the growth retardants at each time are of interest.
proc mixed data=mumsuni; class blk chem plt time;
model ht = chem time chem*time / ddfm=kr;
repeated time / sub=plt(blk*chem) type=arh(1);
estimate 'bonzi-sumag' chem 1 -1;
estimate 'b-s time1' chem 1 -1 chem*time 1 0 0 0 0 0
-1 0 0 0 0 0;
estimate 'b-s time2' chem 1 -1 chem*time 0 1 0 0 0 0
0 -1 0 0 0 0;
estimate 'b-s time3' chem 1 -1 chem*time 0 0 1 0 0 0
0 0 -1 0 0 0;
estimate 'b-s time4' chem 1 -1 chem*time 0 0 0 1 0 0
0 0 0 -1 0 0;
estimate 'b-s time5' chem 1 -1 chem*time 0 0 0 0 1 0
0 0 0 0 -1 0;
estimate 'b-s time6' chem 1 -1 chem*time 0 0 0 0 0 1
0 0 0 0 0 -1;
run;
Estimates

Label          Estimate   Standard Error     DF   t Value   Pr > |t|
bonzi-sumag      2.4750           1.6025   18.6      1.54     0.1393
b-s time1       -0.3500           0.4582   17.5     -0.76     0.4552
b-s time2       -0.3000           0.4609   17.4     -0.65     0.5237
b-s time3      -0.10000           1.0643   17.4     -0.09     0.9262
b-s time4        2.8500           2.3636   18.3      1.21     0.2433
b-s time5        5.9500           2.9759   19.6      2.00     0.0596
b-s time6        6.8000           2.8513   20.5      2.38     0.0268
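An alternative to writing the six time-specific ESTIMATE statements by hand is to let PROC MIXED test chem within each time with the SLICE= option of LSMEANS. The sketch below is not from the slides. It writes the subject effect as plt(blk*chem), matching the Subject Effect reported in the Model Information output.

```sas
/* Sketch only: test the chem effect separately at each time. */
proc mixed data=mumsuni;
  class blk chem plt time;
  model ht = chem time chem*time / ddfm=kr;
  repeated time / sub=plt(blk*chem) type=arh(1);
  /* SLICE=time gives one F test of chem per level of time */
  lsmeans chem*time / slice=time;
run;
```

With only two chemicals, each slice F statistic is the square of the corresponding t statistic. The slices give tests only, so the ESTIMATE statements are still needed for the differences and their standard errors.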
Comparison of inferential results with ANOVA at each time:

        ANOVA                         MIXED
Time    Chem p-value   Diff. (s.e.)   Chem p-value   Diff. (s.e.)
1       .48            -0.35(.48)     .45            -0.35(.46)
2       .51            -0.30(.44)     .52            -0.30(.46)
3       .93            -0.10(1.07)    .92            -0.10(1.06)
4       .25             2.85(2.36)    .24             2.85(2.36)
5       .10             5.95(3.30)    .06             5.95(2.97)
6       .06             6.80(3.24)    .03             6.80(2.85)
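The standard errors in both columns can be traced to the variance estimate used at each time. With 10 plants per chemical, the standard error of a difference between two means is as follows, where sigma-hat squared at time t is the ANOVA error mean square or the arh(1) variance estimate (notation introduced here):

```latex
\[
\mathrm{s.e.}(\bar{y}_{\text{bonzi},t} - \bar{y}_{\text{sumag},t})
  = \sqrt{\frac{2\,\hat{\sigma}_t^2}{10}}
\]
\begin{align*}
\text{ANOVA, time 1:} \quad & \sqrt{2(1.1625)/10} = 0.482 \\
\text{MIXED, time 1:} \quad & \sqrt{2(1.0498)/10} = 0.458 \\
\text{ANOVA, time 6:} \quad & \sqrt{2(52.5)/10} = 3.24 \\
\text{MIXED, time 6:} \quad & \sqrt{2(40.6491)/10} = 2.85
\end{align*}
```

The estimated differences are identical under both analyses. Only the variance estimates and the degrees of freedom differ, which is why the p-values differ at times 5 and 6.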