Covariance structures in longitudinal analysis

Which one to choose?
Repeated Measures
Importance of Covariance Structures
Variability not explained by the fixed effects is modelled in the covariance structure
The covariance structure represents the background variability that the fixed effects are tested against
An appropriate structure is needed for valid inferences about the fixed-effects parameters
Selecting the Appropriate Covariance
Structure
Choice of covariance structure is a balance:
Too simple → Type I error rate increases
Too complex → power and efficiency decrease
Example
 How does the left atrial dimension change over time in
patients newly diagnosed with atrial fibrillation?
 Atrial fibrillation is an irregularity of the heart’s rhythm
 Due to chaotic electrical activity in the upper chambers
(atria), the atria quiver instead of contracting in an
organized manner
 Atrial enlargement may be related to how easily a subject can return to a normal rhythm and to the likelihood of a blood clot forming, which can lead to stroke
Heart Diagram
Example - Data
 Data source: Canadian Registry of Atrial
Fibrillation
 Left atrial dimension measured at enrolment,
Year 2, Year 4, Year 7 and Year 10
 Fit a model with fixed effects only, adjusting for age at first diagnosis of atrial fibrillation (AF), gender, hypertension at enrolment and visit year
Example
Model specification
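$Y = X\beta + Z\gamma + \varepsilon$
(with fixed effects only, the model reduces to $Y = X\beta + \varepsilon$)
where:
$Y$ = response over time
$X$ = design matrix for the fixed effects
$\beta$ = parameters for the fixed effects
$Z$ = design matrix for the random effects (here a vector of 1s)
$\gamma$ = parameters for the random effects
$\varepsilon$ = within-subject variation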
SAS Code
PROC MIXED < options > ;
CLASS variables ;
MODEL dependent = < fixed-effects > < / options > ;
RANDOM random-effects < / options > ;
REPEATED < repeated-effect >
/ TYPE = covariance-structure ;
REPEATED vs. RANDOM statement
The RANDOM statement specifies the random effects
The REPEATED statement specifies the structure of the within-subject errors
Each statement has a different role, but a model with a compound symmetry covariance structure can be specified with either statement
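The equivalence between the two specifications can be checked numerically. The sketch below is plain Python, not SAS, and the variance values are made up for illustration (they are not estimates from the example). It builds the marginal covariance $V = ZGZ' + R$ for one subject with a random intercept, where $Z$ is a vector of 1s, and compares it with a compound symmetry matrix written down directly. The function names are hypothetical.

```python
def marginal_cov_random_intercept(n_times, var_between, var_within):
    """V = Z G Z' + R for one subject with a random intercept.

    Z is a column of 1s, G = [var_between], R = var_within * I.
    """
    return [
        [var_between + (var_within if i == j else 0.0) for j in range(n_times)]
        for i in range(n_times)
    ]


def compound_symmetry(n_times, sigma1, sigma2):
    """CS matrix: sigma2 + sigma1 on the diagonal, sigma1 elsewhere."""
    return [
        [sigma1 + sigma2 if i == j else sigma1 for j in range(n_times)]
        for i in range(n_times)
    ]


# Illustrative (made-up) variance components, 5 visits
V = marginal_cov_random_intercept(5, var_between=20.0, var_within=8.0)
CS = compound_symmetry(5, sigma1=20.0, sigma2=8.0)
print("Same matrix:", V == CS)
```

One caveat: the two routes agree only when the common covariance is non-negative, since a random-intercept variance cannot be negative.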
Models with REPEATED statement only
No random effects are specified in the model
Assumes the variability due to random effects is small compared with the within-subject error
The covariance structure is based only on the within-subject error
General covariance structure
Homogeneity (the same covariance matrix for every subject) is assumed for practical reasons: it reduces the number of parameters to be estimated
The homogeneity assumption can be relaxed (and tested), but this needs a sufficient amount of data
Block Diagonal Covariance Matrix
$$
r \sim N\left(0,\;
\begin{pmatrix}
\Sigma & 0 & \cdots & 0 \\
0 & \Sigma & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Sigma
\end{pmatrix}
\right)
$$
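As a small illustration of the block diagonal form, the following Python sketch stacks copies of one within-subject matrix along the diagonal, with zeros between subjects. The 2 x 2 block and the helper name `block_diagonal` are assumptions made for the example only.

```python
def block_diagonal(sigma, n_subjects):
    """Place n_subjects copies of the within-subject matrix sigma on the diagonal."""
    t = len(sigma)
    size = t * n_subjects
    full = [[0.0] * size for _ in range(size)]
    for s in range(n_subjects):
        for i in range(t):
            for j in range(t):
                full[s * t + i][s * t + j] = sigma[i][j]
    return full


# Illustrative 2 x 2 within-subject block, 3 subjects
sigma = [[4.0, 1.0], [1.0, 4.0]]
V = block_diagonal(sigma, n_subjects=3)
for row in V:
    print(row)
```

Observations from different subjects get covariance 0, which is the independence-between-subjects assumption built into this structure.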
Covariance structures
Simple (VC – Variance Component)
1 parameter
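$$
\Sigma =
\begin{pmatrix}
\sigma^2 & 0 & 0 & 0 & 0 \\
0 & \sigma^2 & 0 & 0 & 0 \\
0 & 0 & \sigma^2 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 & 0 \\
0 & 0 & 0 & 0 & \sigma^2
\end{pmatrix}
$$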
Covariance structures
Unstructured (UN)
15 parameters
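$$
\Sigma =
\begin{pmatrix}
\sigma_1^2 & \sigma_{21} & \sigma_{31} & \sigma_{41} & \sigma_{51} \\
\sigma_{21} & \sigma_2^2 & \sigma_{32} & \sigma_{42} & \sigma_{52} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{43} & \sigma_{53} \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 & \sigma_{54} \\
\sigma_{51} & \sigma_{52} & \sigma_{53} & \sigma_{54} & \sigma_5^2
\end{pmatrix}
$$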
Covariance structures
Compound Symmetry (CS)
2 parameters
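$$
\Sigma =
\begin{pmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
\end{pmatrix}
$$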
Covariance structures
First-order Autoregressive [AR(1)]
2 parameters
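$$
\Sigma = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 & \rho^4 \\
\rho & 1 & \rho & \rho^2 & \rho^3 \\
\rho^2 & \rho & 1 & \rho & \rho^2 \\
\rho^3 & \rho^2 & \rho & 1 & \rho \\
\rho^4 & \rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
$$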
Covariance structures
Toeplitz (TOEP)
5 parameters
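$$
\Sigma =
\begin{pmatrix}
\sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 & \sigma_4 \\
\sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\
\sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\
\sigma_4 & \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2
\end{pmatrix}
$$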
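To make the patterned structures and their parameter counts concrete, here is a minimal Python sketch that builds the VC, CS, AR(1) and TOEP matrices for a given number of time points and counts the parameters each one needs, including UN. The function names and the numeric values are illustrative assumptions; they are not SAS output or estimates from the example.

```python
def vc(t, sigma2):
    """Simple (variance component): sigma2 on the diagonal, 0 elsewhere."""
    return [[sigma2 if i == j else 0.0 for j in range(t)] for i in range(t)]


def cs(t, sigma2, sigma1):
    """Compound symmetry: sigma2 + sigma1 on the diagonal, sigma1 elsewhere."""
    return [[sigma2 + sigma1 if i == j else sigma1 for j in range(t)] for i in range(t)]


def ar1(t, sigma2, rho):
    """First-order autoregressive: sigma2 * rho^|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(t)] for i in range(t)]


def toep(sigma2, bands):
    """Toeplitz: sigma2 on the diagonal, bands[k - 1] on the k-th off-diagonal."""
    values = [sigma2] + list(bands)
    t = len(values)
    return [[values[abs(i - j)] for j in range(t)] for i in range(t)]


def n_params(structure, t):
    """Number of covariance parameters for t time points."""
    counts = {"VC": 1, "CS": 2, "AR(1)": 2, "TOEP": t, "UN": t * (t + 1) // 2}
    return counts[structure]


for name in ["VC", "CS", "AR(1)", "TOEP", "UN"]:
    print(name, n_params(name, 5))

# Illustrative AR(1) matrix: the covariance halves with each extra lag
for row in ar1(5, sigma2=2.0, rho=0.5):
    print(row)
```

With five time points the counts reproduce those quoted for each structure (1, 2, 2, 5 and 15). Note that this sketch, like the matrices it mirrors, indexes the lag by visit number rather than by elapsed time.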
Draftsman’s plots
2D array of scatterplots for each pair of
time lagged observations
For 3 time points: Y1, Y2 and Y3
Y1 vs. Y2
Y1 vs. Y3
Y2 vs. Y3
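The panels of a draftsman's plot are simply all unordered pairs of time points. A minimal Python sketch of that enumeration follows; the helper name `draftsman_pairs` is hypothetical, and the labels are just strings.

```python
from itertools import combinations


def draftsman_pairs(labels):
    """All unordered pairs of time-point labels, one per scatterplot panel."""
    return list(combinations(labels, 2))


for x, y in draftsman_pairs(["Y1", "Y2", "Y3"]):
    print(f"{x} vs. {y}")

# With T time points there are T * (T - 1) / 2 panels
print(len(draftsman_pairs(["Y1", "Y2", "Y3", "Y4", "Y5"])))
```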
Draftsman’s plot – Simulation examples
[Figure: simulated draftsman’s plots of Y1 to Y4 under independence, compound symmetry and autoregressive covariance structures]
Example – Draftsman’s plot
[Figure: draftsman’s plot of left atrial dimension at each pair of visits (variables la0, la2, la4, la7 and la10)]
Example - Correlation matrix
         LA_0     LA_2     LA_4     LA_7     LA_10
LA_0     1.000    0.703    0.702    0.674    0.589
LA_2              1.000    0.777    0.706    0.708
LA_4                       1.000    0.751    0.720
LA_7                                1.000    0.724
LA_10                                        1.000
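One way to read this matrix is to group the correlations by how many years separate the two visits. The Python sketch below does that with the values from the correlation matrix; the grouping helper is a hypothetical name, and averaging within a gap is an illustrative choice rather than part of the original analysis.

```python
from collections import defaultdict

# Correlations keyed by (visit year, visit year), copied from the matrix
corr = {
    (0, 2): 0.703, (0, 4): 0.702, (0, 7): 0.674, (0, 10): 0.589,
    (2, 4): 0.777, (2, 7): 0.706, (2, 10): 0.708,
    (4, 7): 0.751, (4, 10): 0.720,
    (7, 10): 0.724,
}


def mean_corr_by_gap(corr):
    """Average correlation for each time separation (in years)."""
    groups = defaultdict(list)
    for (t1, t2), r in corr.items():
        groups[t2 - t1].append(r)
    return {gap: sum(v) / len(v) for gap, v in sorted(groups.items())}


by_gap = mean_corr_by_gap(corr)
for gap, r in by_gap.items():
    print(f"{gap:>2} years apart: mean r = {r:.3f}")
```

A roughly flat pattern across gaps is what compound symmetry implies, whereas AR(1) implies a steady decay as the gap grows; this is only an informal check and does not replace the formal comparison of fit statistics.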
Variogram
graphical description of the time/spatial
correlation between observations
summarises the relationship between
differences in pairs of measurements and
the distance of the corresponding points
from each other
Can be used with equally or unequally spaced observation periods
Variogram
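Calculate the sample variogram components:
$v_{ijk} = \tfrac{1}{2}(r_{ij} - r_{ik})^2$, where $r_{ij}$ is the residual for subject $i$ at occasion $j$
$u_{ijk} = |t_{ij} - t_{ik}|$, where $t_{ij}$ is the corresponding time
Plot $v_{ijk}$ against $u_{ijk}$
 Process variance: estimated by the average of $\tfrac{1}{2}(r_{ij} - r_{lk})^2$ over pairs with $i \neq l$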
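The variogram components can be computed directly from residuals and times. The following Python sketch is a minimal illustration under stated assumptions: the residuals and times are made-up numbers, and the function names are hypothetical. Within-subject pairs give the points to plot; pairs from different subjects give the process variance estimate.

```python
from itertools import combinations


def sample_variogram(times, residuals):
    """(u, v) points for one subject: u = |t_j - t_k|, v = 0.5 * (r_j - r_k)^2."""
    points = []
    for (tj, rj), (tk, rk) in combinations(zip(times, residuals), 2):
        points.append((abs(tj - tk), 0.5 * (rj - rk) ** 2))
    return points


def process_variance(residuals_by_subject):
    """Average of 0.5 * (r_ij - r_lk)^2 over pairs from different subjects."""
    total, count = 0.0, 0
    for res_i, res_l in combinations(residuals_by_subject, 2):
        for r1 in res_i:
            for r2 in res_l:
                total += 0.5 * (r1 - r2) ** 2
                count += 1
    return total / count


# Made-up residuals for two subjects observed at the same three times
times = [0, 2, 4]
subjects = [[1.0, -1.0, 0.0], [2.0, 0.0, -2.0]]

for res in subjects:
    print(sample_variogram(times, res))
print("process variance estimate:", process_variance(subjects))
```

In practice the $v$ values would be averaged or smoothed within each lag $u$ before plotting, since individual points are very noisy.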
Variogram - Theoretical
[Figure: theoretical variogram plotted against time lag; the process variance is made up of random effects, within-subject correlation and measurement error]
Variogram – Sitka tree example
Example - Variogram
[Figures: sample variograms; the recoverable axis labels are "Variogram" and "lag in months"]
Which covariance structure?
 Fit the model with different covariance structures
 Compare goodness-of-fit statistics to choose the covariance structure
 Goodness-of-fit statistics:
Bayesian information criterion (BIC): $\mathrm{BIC} = -2\,\mathrm{loglik} + d \log n$
Akaike information criterion (AIC): $\mathrm{AIC} = -2\,\mathrm{loglik} + 2d$
Estimation method for the covariance
parameters
Maximum Likelihood (ML) versus
Restricted Maximum Likelihood (REML)
Both are based on likelihood principles, so the estimators share the properties of consistency, asymptotic normality and efficiency
Differences between the two increase as the number of fixed effects in the model increases
ML vs. REML
 Goodness-of-fit statistics from the two methods differ in which part of the model they assess
ML: describes the fit of the whole model (fixed and random effects)
REML: describes the fit of the stochastic portion (random effects)
Which goodness-of-fit statistic?
Bayesian information criterion (BIC): $\mathrm{BIC} = -2\,\mathrm{loglik} + d \log n$
Akaike information criterion (AIC): $\mathrm{AIC} = -2\,\mathrm{loglik} + 2d$
 The BIC has a higher penalty than the AIC for including more parameters (whenever $\log n > 2$), so it favours simpler models
 A model that is too simple has inflated Type I error rates
Typically, choose the model based on AIC
Example
Which covariance structure fits the best?
Fit Statistics               UN (15)    CS (2)     TOEP (5)   AR(1) (2)
-2 Res Log Likelihood        3655.5     3670.6     3663.5     3729.5
AIC (smaller is better)      3685.5     3674.6     3673.5     3733.5
BIC (smaller is better)      3726.4     3680.0     3687.2     3739.0
(number of covariance parameters in parentheses)
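The AIC values can be recomputed from the -2 Res Log Likelihood and the parameter counts using AIC = -2loglik + 2d. The Python sketch below does this with the reported fit statistics and also backs out the BIC penalty per parameter, (BIC - (-2loglik)) / d, which should equal log n. The dictionary layout and names such as `m2rll` are illustrative assumptions.

```python
# Values copied from the fit statistics; d = number of covariance parameters
fits = {
    "UN":    {"d": 15, "m2rll": 3655.5, "aic": 3685.5, "bic": 3726.4},
    "CS":    {"d": 2,  "m2rll": 3670.6, "aic": 3674.6, "bic": 3680.0},
    "TOEP":  {"d": 5,  "m2rll": 3663.5, "aic": 3673.5, "bic": 3687.2},
    "AR(1)": {"d": 2,  "m2rll": 3729.5, "aic": 3733.5, "bic": 3739.0},
}


def aic(m2ll, d):
    """AIC = -2 loglik + 2d."""
    return m2ll + 2 * d


for name, f in fits.items():
    recomputed = aic(f["m2rll"], f["d"])
    bic_penalty = (f["bic"] - f["m2rll"]) / f["d"]  # implied log n
    print(f"{name:6s} AIC recomputed = {recomputed:.1f} "
          f"(reported {f['aic']}), BIC penalty per parameter = {bic_penalty:.2f}")

best_aic = min(fits, key=lambda k: fits[k]["aic"])
best_bic = min(fits, key=lambda k: fits[k]["bic"])
print("Smallest AIC:", best_aic)
print("Smallest BIC:", best_bic)
```

The recomputed AIC values agree with the reported ones. The implied BIC penalty per parameter is well above the AIC's 2, which is why BIC leans towards the 2-parameter CS model while AIC marginally prefers TOEP.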
Fixed Effects Parameter Estimates
Effect        Covariance structure   Estimate   SE      t-statistic   p-value
Intercept     UN                     34.237     3.681   9.3           <.0001
              CS                     33.265     3.832   8.68          <.0001
              TOEP                   33.323     3.810   8.75          <.0001
              AR(1)                  33.361     3.412   9.78          <.0001
Age           UN                     0.048      0.064   0.74          0.4585
              CS                     0.060      0.066   0.9           0.3676
              TOEP                   0.059      0.066   0.9           0.3693
              AR(1)                  0.058      0.059   0.99          0.323
Female        UN                     -1.135     1.513   -0.75         0.455
              CS                     -1.213     1.574   -0.77         0.4425
              TOEP                   -1.141     1.563   -0.73         0.4672
              AR(1)                  -0.995     1.391   -0.72         0.4759
Fixed Effect Parameters – cont’d
Effect        Covariance structure   Estimate   SE      t-statistic   p-value
Hypertension  UN                     3.123      1.548   2.02          0.0461
              CS                     3.007      1.610   1.87          0.0645
              TOEP                   3.021      1.600   1.89          0.0616
              AR(1)                  3.044      1.423   2.14          0.0347
Time          UN                     0.626      0.064   9.76          <.0001
              CS                     0.629      0.057   11.02         <.0001
              TOEP                   0.632      0.065   9.72          <.0001
              AR(1)                  0.653      0.099   6.58          <.0001
Likelihood ratio test (LRT)
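For nested models, can also test whether the additional parameters give a statistically significant improvement in the model
For the example, the LRT for TOEP (5 parameters) vs. CS (2 parameters): the difference in -2 Res Log Likelihood is 3670.6 − 3663.5 = 7.1 on 3 degrees of freedom, which is not significant at the 5% level
→ choose the CS model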
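The TOEP vs. CS comparison can be reproduced from the reported -2 Res Log Likelihood values. The Python sketch below assumes the usual chi-square reference distribution with degrees of freedom equal to the difference in covariance parameters, and uses the closed-form upper tail for 3 degrees of freedom so that only the standard library is needed. The function name `chi2_sf_3df` is hypothetical. Comparing REML likelihoods this way relies on both models having the same fixed effects, which is the case here.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form: erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


m2rll_cs = 3670.6    # -2 Res Log Likelihood, CS (2 parameters)
m2rll_toep = 3663.5  # -2 Res Log Likelihood, TOEP (5 parameters)

lrt = m2rll_cs - m2rll_toep
df = 5 - 2
p_value = chi2_sf_3df(lrt)
print(f"LRT = {lrt:.1f} on {df} df, p = {p_value:.3f}")
```

The resulting p-value is above 0.05, so the three extra Toeplitz parameters are not clearly warranted, in line with the choice of the CS model.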
Summary
Graphical plots to help identify covariance
structure
AIC and BIC to choose between
covariance structures
LRT to test if additional parameters are
warranted
References
 Dawson, K.S., Gennings, C. and Carter, W.H. 1997. Two graphical
techniques useful in detecting correlation structure in repeated
measures data. The American Statistician. 51(3). 275-283.
 Diggle, P.J., Liang, K.Y. and Zeger, S.L. 1994. Analysis of
Longitudinal Data. Oxford. Clarendon Press.
 Littell, R.C., Pendergast, J. and Natarajan, R. 2000. Modelling
covariance structure in the analysis of repeated measures data.
Statistics in Medicine. 19. 1783-1819.
 Moser, E.B. 2004. Repeated Measures Modeling with PROC
MIXED. Paper 188-29. SUGI 29.
 Singer, J.D. 1998. Using SAS PROC MIXED to Fit Multilevel
Models, Hierarchical Models, and Individual Growth Models.
Journal of Educational and Behavioral Statistics. 24(40). 323-355.
 Singer, J.D. and Willett, J.B. 2003. Applied Longitudinal Data
Analysis: Modeling Change and Event Occurrence. New York.
Oxford University Press.
 Ware, J.H. 1985. Linear models for the analysis of longitudinal
studies. The American Statistician. 39(2). 95-101.