Longitudinal Studies
• Repeatedly measure individuals followed over time
• Sometimes called panel studies (e.g. Economics, Sociology, Food Science)
• Able to distinguish changes over time within individuals (age effects) from differences among individuals (cohort effects)
• Must account for correlations among measurements taken on the same individual
• Vector of measurements
$$Y_i = (Y_{i t_1}, Y_{i t_2}, \ldots, Y_{i t_{n_i}})^T$$
on the ith subject, or experimental unit.

Cross-Sectional Studies
• One observation on each subject
• Different subjects are measured at different points in time (e.g. at different ages)
• Cannot entirely distinguish between cohort and age effects
• Reading ability example (DHLZ, pages 1-2)
Example:
Measure muscle strength of elderly subjects at 0, 6, and 12 months (Daniels and Hogan, 2000, Biometrics)
• Randomized clinical trial
• Investigate the effects of recombinant human growth hormone (rhGH) therapy for building and maintaining muscle strength in the elderly (Kiel et al, 1998).
• The trial enrolled 161 subjects and randomized them to one of four treatments:
– Placebo
– Growth hormone only (0.015 mg/kg rhGH)
– Exercise plus placebo
– Exercise plus growth hormone
• Both placebo and growth hormone were administered via daily injections
• Muscle strength measures were recorded at baseline, six months and twelve months.
• Strength is measured as the maximum foot-pounds of torque which can be exerted against resistance provided by a mechanical device
Measure Depression over time (Pourahmadi and Daniels, 2002 Biometrics)
• Patients were assigned active treatment and measured weekly for 16 weeks
– Weekly depression scores
– 549 subjects with no missing baseline covariates.
• Main questions of interest
1. Is a combined drug/psychotherapy treatment more effective than the psychotherapy-only treatment in reducing depression?
2. Is initial severity an important predictor of patient improvement?
3. Do treatment and initial severity interact in their impact on the rate of improvement?
• Current practices for treatment of major depression emphasize symptom severity in determining the need for anti-depressant drugs.
• For these 549 patients, about 30% (2840) of the possible measurements were missing, mostly intermittently.
• Several of the studies measured depression bi-weekly for part or all of the active phase of treatment so we have some observations missing by design.
• Some subjects dropped out (about 16%). Some dropouts were missing completely at random (MCAR) but others were related to side effects of treatment or to being so depressed that they were provided an alternative treatment.
Classes of Models for Longitudinal Data
• Reduce the repeated measurements to a single value for each individual and perform a univariate analysis
• Marginal models
• Random effects (hierarchical) models
• Transition models

Marginal Models
• Consider a vector of observations, $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$ for $i = 1, \ldots, m$ individuals (or experimental units)
• Expect observations taken on the same individual (experimental unit) to be correlated
• Consider a linear model for the conditional means, e.g.,
$$E(Y_{ij}) = \beta_0 + \beta_1 x_i + \beta_2 t_{ij} + \beta_3 t_{ij}^2$$
• Model the variances and covariances
$$V_i = Var\begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{i,n_i} \end{pmatrix}
= \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1,n_i} \\
\sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2,n_i} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n_i,1} & \sigma_{n_i,2} & \cdots & \sigma_{n_i}^2
\end{bmatrix}$$
Estimation in Marginal Models
• Consider the model: $Y_i \sim N(X_i\beta, \Sigma_i)$, $i = 1, 2, \ldots, m$, where $\Sigma_i = V_i = Var(Y_i)$
• Assume independent responses from different individuals
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix}
\sim N\left(
\begin{bmatrix} X_1\beta \\ X_2\beta \\ \vdots \\ X_m\beta \end{bmatrix},
\begin{bmatrix}
V_1 & 0 & \cdots & 0 \\
0 & V_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & V_m
\end{bmatrix}\right)$$
• The block diagonal covariance structure is important
• The covariance structure $V_i$ can differ across individuals
• Marginal models are used when inferences about β are of primary interest
• Good reference for likelihood estimation: Jennrich and Schluchter (1986) Biometrics

• Ordinary Least Squares:
Minimize
$$(Y - X\beta)^T (Y - X\beta) = \sum_{i=1}^{m} (Y_i - X_i\beta)^T (Y_i - X_i\beta)$$
Set partial derivatives with respect to the elements of β equal to zero to obtain the estimating equations
$$\sum_{i=1}^{m} X_i^T (Y_i - X_i\beta) = 0$$
The unique solution is
$$\hat\beta = (X^T X)^{-1} X^T Y = \left(\sum_{i=1}^{m} X_i^T X_i\right)^{-1} \sum_{i=1}^{m} X_i^T Y_i$$
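• Generalized Least Squares:
Minimize
$$(Y - X\beta)^T \Sigma^{-1} (Y - X\beta) = \sum_{i=1}^{m} (Y_i - X_i\beta)^T V_i^{-1} (Y_i - X_i\beta)$$
Set partial derivatives with respect to the elements of β equal to zero to obtain the estimating equations
$$\sum_{i=1}^{m} X_i^T V_i^{-1} (Y_i - X_i\beta) = 0$$
The unique solution is
$$\hat\beta = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y = \left(\sum_{i=1}^{m} X_i^T V_i^{-1} X_i\right)^{-1} \sum_{i=1}^{m} X_i^T V_i^{-1} Y_i$$

• Maximum Likelihood Estimation:
The natural logarithm of the multivariate Gaussian likelihood is
$$\log(L(\beta, \Sigma)) = -\frac{1}{2} \sum_{i=1}^{m} \left[ n_i \log(2\pi) + \log(|V_i|) + (Y_i - X_i\beta)^T V_i^{-1} (Y_i - X_i\beta) \right]$$
Set partial derivatives with respect to the elements of β equal to zero to obtain the estimating equations
$$\sum_{i=1}^{m} X_i^T V_i^{-1} (Y_i - X_i\beta) = 0$$
These are the same estimating equations as for generalized least squares, so for a given Σ the unique solution is
$$\hat\beta = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y = \left(\sum_{i=1}^{m} X_i^T V_i^{-1} X_i\right)^{-1} \sum_{i=1}^{m} X_i^T V_i^{-1} Y_i$$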
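The generalized least squares solution can be sketched numerically by accumulating the per-subject sums $\sum_i X_i^T V_i^{-1} X_i$ and $\sum_i X_i^T V_i^{-1} Y_i$. The sketch below uses only the Python standard library; the helper names, the three made-up subjects and the compound symmetry values are all assumptions for illustration, not part of the course data.

```python
# Sketch: GLS estimate from per-subject sums,
#   beta_hat = (sum_i X_i' V_i^-1 X_i)^-1  sum_i X_i' V_i^-1 Y_i.
# All data and covariance values below are invented for illustration.

def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def gls(subjects):
    """subjects is a list of (X_i, Y_i, V_i); returns beta_hat as a flat list."""
    p = len(subjects[0][0][0])
    lhs = [[0.0] * p for _ in range(p)]   # sum_i X_i' V_i^-1 X_i
    rhs = [[0.0] for _ in range(p)]       # sum_i X_i' V_i^-1 Y_i
    for x, y, v in subjects:
        xt = transpose(x)
        a = matmul(xt, solve(v, x))                      # X_i' V_i^-1 X_i
        c = matmul(xt, solve(v, [[yj] for yj in y]))     # X_i' V_i^-1 Y_i
        for r in range(p):
            rhs[r][0] += c[r][0]
            for s in range(p):
                lhs[r][s] += a[r][s]
    return [row[0] for row in solve(lhs, rhs)]


def compound_symmetry(n, sigma1_sq, sigma2_sq):
    """V = sigma1_sq * I + sigma2_sq * J (n x n)."""
    return [[sigma2_sq + (sigma1_sq if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Three hypothetical subjects with different numbers of measurement times.
times_by_subject = [[0, 1, 2], [0, 1], [1, 2, 3]]
subjects = []
for times in times_by_subject:
    x = [[1.0, float(t)] for t in times]          # intercept and time
    y = [2.0 + 3.0 * t for t in times]            # noiseless line: intercept 2, slope 3
    subjects.append((x, y, compound_symmetry(len(times), 1.0, 0.5)))

beta_hat = gls(subjects)
print(beta_hat)
```

Because the made-up responses lie exactly on a line, any positive definite choice of $V_i$ returns the same coefficients here; with noisy data the weights $V_i^{-1}$ change the estimate, and setting every $V_i$ to the identity reduces the computation to ordinary least squares.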
• For given Σ, the mle for β is the generalized least squares estimator
• You can simultaneously obtain the maximum likelihood estimator $\hat\Sigma$ for Σ
• If we plug in $\hat\beta$ for β and $\hat\Sigma$ for Σ we obtain
$$-2\log(L(\hat\beta, \hat\Sigma)) = \sum_{i=1}^{m} \left[ n_i \log(2\pi) + \log(|\hat V_i|) + (Y_i - X_i\hat\beta)^T \hat V_i^{-1} (Y_i - X_i\hat\beta) \right]$$
• Likelihood ratio tests for comparing models
• Maximum likelihood estimates of variance components tend to be too small

Restricted Maximum Likelihood Estimation (REML)
• Maximizing a Gaussian likelihood that does not depend on E(Y) = Xβ.
• Maximize a likelihood function for “error contrasts”
– linear combinations of observations that do not depend on Xβ
– will need a set of
$$\left(\sum_{i=1}^{m} n_i\right) - rank(X)$$
linearly independent “error contrasts”
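As a minimal illustration of why maximum likelihood variance estimates tend to be too small, consider the simplest special case, $Y_1, \ldots, Y_n$ independent $N(\mu, \sigma^2)$. There the ML estimator of $\sigma^2$ divides the residual sum of squares by $n$, while the REML estimator divides by $n - rank(X) = n - 1$. The data values below are made up for the sketch.

```python
# Sketch: ML versus REML variance estimates for an i.i.d. normal sample
# (a single mean parameter, so rank(X) = 1). The numbers are invented.

def ml_and_reml_variance(y):
    n = len(y)
    mean = sum(y) / n
    ss = sum((v - mean) ** 2 for v in y)   # residual sum of squares
    return ss / n, ss / (n - 1)            # (ML estimate, REML estimate)


y = [3.9, 3.6, 3.4, 3.7, 3.5]
ml, reml = ml_and_reml_variance(y)
print(ml, reml)
```

The ML value is always the REML value multiplied by $(n-1)/n$, so the two differ most when the number of observations is small relative to the number of mean parameters.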
Gaussian model:
$$Y \sim N(X\beta, \Sigma)$$
For a non-random matrix L
$$LY \sim N(LX\beta, L\Sigma L^T)$$
Consequently, LY does not depend on Xβ if and only if LX = 0. But LX = 0 if and only if
$$L = M(I - P_X)$$
for some M with $n = \sum_{i=1}^{m} n_i$ columns, where
$$P_X = X(X^T X)^{-} X^T$$

To avoid losing information we must have
$$\text{row rank}(M) = n - rank(X) = n - p$$
Then a set of n − p error contrasts is
$$r = M(I - P_X)Y \sim N_{n-p}\left(0, \; M(I - P_X)\Sigma(I - P_X)M^T\right)$$
Call this covariance matrix W; then rank(W) = n − p and $W^{-1}$ exists.
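The claim that LX = 0 exactly when L has the form $M(I - P_X)$ takes only two lines to check. The following is a short sketch of the argument, using the fact that $P_X X = X$ for any generalized inverse $(X^T X)^{-}$:

```latex
\begin{align*}
\text{If } L = M(I - P_X): \quad
  & LX = M(X - P_X X) = M(X - X) = 0. \\
\text{If } LX = 0: \quad
  & L P_X = L X (X^T X)^{-} X^T = 0,
    \quad\text{so}\quad L = L(I - P_X), \text{ i.e. } M = L.
\end{align*}
```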
For any $M_{(n-p) \times n}$ with row rank equal to n − p = n − rank(X), the “Restricted” likelihood is
$$L(\Sigma; r) = \frac{1}{(2\pi)^{(n-p)/2} |W|^{1/2}} \, e^{-\frac{1}{2} r^T W^{-1} r}$$
The resulting log-likelihood is
$$\ell(\Sigma; r) = -\frac{(n-p)}{2} \log(2\pi) - \frac{1}{2} \log|W| - \frac{1}{2} r^T W^{-1} r$$

The log-likelihood can be expressed in terms of
$$e = \left(I - X(X^T \Sigma^{-1} X)^{-} X^T \Sigma^{-1}\right) Y$$
as
$$\ell(\Sigma; e) = \text{constant} - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} \log(|X_*^T \Sigma^{-1} X_*|) - \frac{1}{2} e^T \Sigma^{-1} e$$
where $X_*$ is any set of p = rank(X) linearly independent columns of X.
Denote the resulting REML estimator as $\hat\Sigma_{REML}$.
Estimation of fixed effects:
For any estimable function Cβ, the blue is the generalized least squares estimator
$$C b_{GLS} = C(X^T \Sigma^{-1} X)^{-} X^T \Sigma^{-1} Y$$
An approximation is
$$C\hat\beta = C(X^T \hat\Sigma_{REML}^{-1} X)^{-} X^T \hat\Sigma_{REML}^{-1} Y$$
and for “large” samples:
$$C\hat\beta \;\dot\sim\; N\left(C\beta, \; C(X^T \Sigma^{-1} X)^{-} C^T\right)$$
if you specified the correct model for Σ

Selecting Covariance Structure
• AIC: Akaike Information Criterion
– AIC = -2 loglik + 2*p (p is the number of parameters)
– When n is large, often favors models with too many parameters (penalty does not change with sample size)
– In SAS, when you specify ML, p is the number of fixed effects parameters plus the number of covariance parameters; when you specify REML, p is the number of covariance parameters
– AICC - small-sample corrected version (see Burnham and Anderson, 1998)
• BIC: Bayesian Information Criterion
– BIC= -2 loglik + p*log(n) (n is the
number of subjects)
– based on approximation to the Bayes
Factor
– In SAS, when you specify ML, p is
the number of fixed effects parameters + number of covariance parameters; in REML, p is the number of
covariance parameters
– In SAS, n is the number of subjects
• Bayesian Analysis: specify prior distributions for β (often either a Gaussian prior or a non-informative prior)
and Σ (inverse Wishart prior); for other
choices see Leonard, 1992 Annals;
Brown, Le, and Zidek, 1994; Daniels
and Kass, 1999 JASA; Barnard, McCulloch, and Meng, 2000, Statistica
Sinica
• Empirical Bayes: estimate hyperparameters of prior distributions from the
data
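The AIC and BIC definitions given above for REML fits in SAS can be checked against the Fit Statistics tables that PROC MIXED prints for the milk protein example later in these notes. The sketch below plugs in the final -2 Res Log Like values from the two iteration histories, with p equal to the number of covariance parameters (2 for compound symmetry, 19·20/2 = 190 for the unstructured 19 × 19 matrix) and n = 79 cows. The function and variable names are illustrative only.

```python
import math

# Sketch: reproduce the REML-based AIC and BIC reported by PROC MIXED
# from -2 Res Log Likelihood, the number of covariance parameters p,
# and the number of subjects n.

def aic(neg2_loglik, p):
    return neg2_loglik + 2 * p


def bic(neg2_loglik, p, n_subjects):
    return neg2_loglik + p * math.log(n_subjects)


n_cows = 79
fits = {
    "CS": {"neg2": 438.77813937, "p": 2},
    "UN": {"neg2": -338.64626381, "p": 19 * 20 // 2},
}

results = {
    name: (aic(f["neg2"], f["p"]), bic(f["neg2"], f["p"], n_cows))
    for name, f in fits.items()
}
for name, (a, b) in results.items():
    print(name, round(a, 1), round(b, 1))
```

The unstructured model has by far the smaller AIC, but its 190 covariance parameters are penalized much more heavily by BIC, which is why the two criteria rank the compound symmetry and unstructured fits differently.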
What if you Mis-Specify Σ?
• Often assume some parametric structure for Σ
– Time series structures (AR, MA)
– Structured antedependence models (SAD) (Zimmerman and Nunez-Anton, 1997)
– Compound Symmetry
• Likelihood is a function of fewer parameters, $L(\beta, \Sigma(\alpha))$
• More stable estimator for Σ (fewer parameters) and subsequently, a more stable estimator for β

What if you assumed the wrong structure for Σ?
• Often get an estimator for β that
– is consistent
– has a large sample normal distribution
– is not quite efficient
• A consistent estimator of the large sample covariance matrix for the estimator of β is obtained from a sandwich variance estimator
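For reference, one common form of the sandwich variance estimator can be written as follows. This is a sketch in the notation of these notes, where $\hat V_i$ denotes the fitted (possibly mis-specified) working covariance matrix for subject i and $\hat e_i = Y_i - X_i\hat\beta$ is the residual vector. The slides do not spell out the formula, so the exact form (for example, any small-sample adjustment) is an assumption here.

```latex
\[
\widehat{Var}(\hat\beta) =
\left(\sum_{i=1}^{m} X_i^T \hat V_i^{-1} X_i\right)^{-1}
\left(\sum_{i=1}^{m} X_i^T \hat V_i^{-1} \hat e_i \hat e_i^T \hat V_i^{-1} X_i\right)
\left(\sum_{i=1}^{m} X_i^T \hat V_i^{-1} X_i\right)^{-1}
\]
```

If the working structure is correct, $\hat e_i \hat e_i^T$ estimates $V_i$ on average, the middle term roughly cancels one of the outer terms, and the expression collapses to the model-based covariance $(\sum_i X_i^T \hat V_i^{-1} X_i)^{-1}$. In SAS, the EMPIRICAL option on the PROC MIXED statement requests an estimator of this kind.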
Shrinkage Estimators
Chen, 1979; Daniels and Kass (1999, 2001 Biometrics); Daniels and Pourahmadi (2001)
• Strategy:
– Shrink toward the structure
– Data determines the amount of shrinkage
– Many different parameterizations on which to shrink
• Properties:
– Consistent
– Asymptotic normality
– Asymptotic efficiency

The default in PROC MIXED in SAS is to take
$$Var(Y) = \sigma_e^2 I$$
You can change this by using the REPEATED statement in PROC MIXED

REPEATED / type =      subject = subj(program) r rcorr;

– subject = : variables in the class statement
– r : print the R matrix for one subject
– rcorr : print the correlation matrix for one subject
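Compound Symmetry: (type = CS)
$$R = \begin{bmatrix}
\sigma_1^2 + \sigma_2^2 & \sigma_2^2 & \sigma_2^2 & \sigma_2^2 \\
\sigma_2^2 & \sigma_1^2 + \sigma_2^2 & \sigma_2^2 & \sigma_2^2 \\
\sigma_2^2 & \sigma_2^2 & \sigma_1^2 + \sigma_2^2 & \sigma_2^2 \\
\sigma_2^2 & \sigma_2^2 & \sigma_2^2 & \sigma_1^2 + \sigma_2^2
\end{bmatrix}$$

Variance components: (type = VC) (default)
$$R = \begin{bmatrix}
\sigma_1^2 & 0 & 0 & 0 \\
0 & \sigma_2^2 & 0 & 0 \\
0 & 0 & \sigma_3^2 & 0 \\
0 & 0 & 0 & \sigma_4^2
\end{bmatrix}$$

Unstructured: (type = UN)
$$R = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2
\end{bmatrix}$$

Toeplitz: (type = TOEP)
$$R = \begin{bmatrix}
\sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\
\sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\
\sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma^2
\end{bmatrix}$$

Heterogeneous Toeplitz: (type = TOEPH)
$$R = \begin{bmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_2 & \sigma_1\sigma_4\rho_3 \\
\sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_1 & \sigma_2\sigma_4\rho_2 \\
\sigma_3\sigma_1\rho_2 & \sigma_3\sigma_2\rho_1 & \sigma_3^2 & \sigma_3\sigma_4\rho_1 \\
\sigma_4\sigma_1\rho_3 & \sigma_4\sigma_2\rho_2 & \sigma_4\sigma_3\rho_1 & \sigma_4^2
\end{bmatrix}$$

First Order Autoregressive: (type = AR(1))
$$R = \sigma^2 \begin{bmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{bmatrix}$$

Heterogeneous AR(1): (type = ARH(1))
$$R = \begin{bmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho^2 & \sigma_1\sigma_4\rho^3 \\
\sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho & \sigma_2\sigma_4\rho^2 \\
\sigma_3\sigma_1\rho^2 & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_3\sigma_4\rho \\
\sigma_4\sigma_1\rho^3 & \sigma_4\sigma_2\rho^2 & \sigma_4\sigma_3\rho & \sigma_4^2
\end{bmatrix}$$

First order Ante-dependence: (type = ANTE(1))
$$R = \begin{bmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_1\rho_2 \\
\sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_2 \\
\sigma_3\sigma_1\rho_2\rho_1 & \sigma_3\sigma_2\rho_2 & \sigma_3^2
\end{bmatrix}$$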
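To make the covariance structures above concrete, the sketch below builds a few of them as plain nested lists. The function names and parameter values are hypothetical and are not SAS syntax; they only mirror the CS, AR(1), ARH(1) and TOEP patterns.

```python
# Sketch: construct some of the covariance structures as nested lists.
# Function names and numeric values are illustrative only.

def compound_symmetry(n, sigma1_sq, sigma2_sq):
    """type=CS: sigma2_sq everywhere, plus sigma1_sq added on the diagonal."""
    return [[sigma2_sq + (sigma1_sq if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def ar1(n, sigma_sq, rho):
    """type=AR(1): sigma_sq * rho^|i-j|."""
    return [[sigma_sq * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def arh1(sigmas, rho):
    """type=ARH(1): sigma_i * sigma_j * rho^|i-j| with separate standard deviations."""
    n = len(sigmas)
    return [[sigmas[i] * sigmas[j] * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


def toeplitz(bands):
    """type=TOEP: bands[0] is the common variance, bands[k] the lag-k covariance."""
    n = len(bands)
    return [[bands[abs(i - j)] for j in range(n)] for i in range(n)]


cs = compound_symmetry(4, 1.0, 2.0)
ar = ar1(4, 1.0, 0.5)
arh = arh1([1.0, 2.0, 3.0], 0.5)
toep = toeplitz([4.0, 3.0, 2.0, 1.0])

for row in ar:
    print(row)
```

Counting free parameters shows the trade-off: for four time points CS and AR(1) each use 2 parameters, TOEP uses 4, ARH(1) uses 5, and UN uses 10.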
Spatial power: (type = sp(pow)(list))
where list is the list of variables defining coordinates
$$R = \sigma^2 \begin{bmatrix}
1 & \rho^{d_{12}} & \rho^{d_{13}} & \rho^{d_{14}} \\
\rho^{d_{12}} & 1 & \rho^{d_{23}} & \rho^{d_{24}} \\
\rho^{d_{13}} & \rho^{d_{23}} & 1 & \rho^{d_{34}} \\
\rho^{d_{14}} & \rho^{d_{24}} & \rho^{d_{34}} & 1
\end{bmatrix}$$
where $d_{ij}$ is the Euclidean distance between the i-th and j-th observations provided by one subject (or unit).
You can replace pow with a number of other choices.

Fitting Marginal Models in SAS and Splus
SAS: the MIXED procedure

/* Enter the cow protein data */
data set2;
infile 'c:\st565\dhlz.example1_4.data';
input diet cow week protein;
run;

proc sort data=set2; by diet week; run;

proc means data=set2 noprint;
by diet week;
var protein;
output out=means mean=pmean;
run;

proc print data=means;
run;

axis1 label=(f=swiss h=1.8 a=90 r=0 "Protein (percent)")
order = 2.5 to 4.5 by 0.5
value=(f=swiss h=1.8) w=3.0
length= 4.0in;
axis2 label=(f=swiss h=2.0 "Time(weeks)")
order = 0 to 20 by 5
value=(f=swiss h=1.8) w= 3.0
length = 6.5 in;
SYMBOL1 V=CIRCLE H=1.7 w=3 l=1 i=join ;
SYMBOL2 V=DIAMOND H=1.7 w=3 l=3 i=join ;
SYMBOL3 V=square H=1.7 w=3 l=9 i=join ;
PROC GPLOT DATA=means;
PLOT pmean*week=diet /
vaxis=axis1 haxis=axis2;
TITLE1 ls=0.01in H=2.0 F=swiss "Protein Content in Milk";
footnote ls=0.01in;
RUN;
/* perform a one-way anova at each time point */
proc sort data=set2; by week diet; run;
proc glm data=set2; by week;
class diet;
model protein = diet / ss1 ss3;
lsmeans diet / stderr pdiff;
run;
---------------------------- week=1 -----------------------------

The GLM Procedure
Dependent Variable: protein

                               Sum of
Source              DF        Squares    Mean Square   F Value   Pr > F
Model                2     0.24488572     0.12244286      0.76   0.4733
Error               76    12.31921807     0.16209497
Corrected Total     78    12.56410380

The GLM Procedure
Least Squares Means

            protein      Standard                LSMEAN
diet         LSMEAN         Error    Pr > |t|    Number
1        3.88680000    0.08052204      <.0001         1
2        3.86111111    0.07748237      <.0001         2
3        3.75814815    0.07748237      <.0001         3

Least Squares Means for effect diet
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: protein

i/j          1          2          3
1                  0.8188     0.2532
2       0.8188                0.3504
3       0.2532     0.3504

NOTE: To ensure overall protection level, only probabilities
associated with pre-planned comparisons should be used.

---------------------------- week=19 ----------------------------

The GLM Procedure
Dependent Variable: protein

                               Sum of
Source              DF        Squares    Mean Square   F Value   Pr > F
Model                2     1.27469477     0.63734739      6.53   0.0037
Error               38     3.71148571     0.09767068
Corrected Total     40     4.98618049

The GLM Procedure
Least Squares Means

            protein      Standard                LSMEAN
diet         LSMEAN         Error    Pr > |t|    Number
1        3.64000000    0.08667831      <.0001         1
2        3.39571429    0.08352531      <.0001         2
3        3.20571429    0.08352531      <.0001         3

Least Squares Means for effect diet
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: protein

i/j          1          2          3
1                  0.0495     0.0009
2       0.0495                0.1160
3       0.0009     0.1160

NOTE: To ensure overall protection level, only probabilities
associated with pre-planned comparisons should be used.
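The pieces of the week-by-week GLM output fit together in a way that is easy to verify: the F value is the ratio of the two mean squares, and in a one-way ANOVA each least squares mean has standard error $\sqrt{MSE/n_k}$, so the group sizes can be backed out as $MSE / SE^2$. The sketch below does this with the numbers printed for weeks 1 and 19. The dictionary layout and function names are illustrative only, and the group sizes it returns are implied by the output rather than stated in it.

```python
# Sketch: consistency checks on the one-way ANOVA output for weeks 1 and 19.
# Values are copied from the PROC GLM listings; names are illustrative.

week1 = {
    "ms_model": 0.12244286,
    "ms_error": 0.16209497,
    "lsmean_se": [0.08052204, 0.07748237, 0.07748237],
}
week19 = {
    "ms_model": 0.63734739,
    "ms_error": 0.09767068,
    "lsmean_se": [0.08667831, 0.08352531, 0.08352531],
}


def f_value(table):
    return table["ms_model"] / table["ms_error"]


def implied_group_sizes(table):
    # SE(lsmean_k) = sqrt(MSE / n_k)  =>  n_k = MSE / SE^2
    return [table["ms_error"] / se ** 2 for se in table["lsmean_se"]]


for label, table in (("week 1", week1), ("week 19", week19)):
    sizes = [round(n) for n in implied_group_sizes(table)]
    print(label, round(f_value(table), 2), sizes, sum(sizes))
```

The implied group sizes add up to one more than the Corrected Total degrees of freedom in each table (79 cows at week 1, 41 at week 19), which shows how much of the data is missing by the last week.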
/* Fit cubic trends across time with a compound
symmetry covariance structure */
proc mixed data=set2;
class diet cow;
model protein = diet diet*week diet*week*week
diet*week*week*week
/ noint s htype=1 outpm=means;
repeated / type=cs sub=cow(diet);
run;

The Mixed Procedure

Model Information
Data Set                     WORK.SET2
Dependent Variable           protein
Covariance Structure         Compound Symmetry
Subject Effect               cow(diet)
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Between-Within

Class Level Information
Class    Levels    Values
diet          3    1 2 3
cow          79    1 2 3 4 5 6 7 8 9 10 11 12 13
                   14 15 16 17 18 19 20 21 22 23
                   24 25 26 27 28 29 30 31 32 33
                   34 35 36 37 38 39 40 41 42 43
                   44 45 46 47 48 49 50 51 52 53
                   54 55 56 57 58 59 60 61 62 63
                   64 65 66 67 68 69 70 71 72 73
                   74 75 76 77 78 79
Iteration History
Iteration    Evaluations    -2 Res Log Like     Criterion
        0              1       730.59188696
        1              2       438.77899831    0.00000086
        2              1       438.77813937    0.00000000

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm    Subject      Estimate
CS          cow(diet)     0.02807
Residual                  0.06546

Fit Statistics
-2 Res Log Likelihood         438.8
AIC (smaller is better)       442.8
AICC (smaller is better)      442.8
BIC (smaller is better)       447.5

Null Model Likelihood Ratio Test
DF    Chi-Square    Pr > ChiSq
 1        291.81        <.0001

Solution for Fixed Effects
                                   Standard
Effect           diet   Estimate      Error     DF   t Value   Pr > |t|
diet             1        3.9264    0.06812     76     57.64     <.0001
diet             2        3.8878    0.06538     76     59.46     <.0001
diet             3        3.7948    0.06549     76     57.94     <.0001
week*diet        1       -0.1601    0.02555   1249     -6.27     <.0001
week*diet        2       -0.1890    0.02459   1249     -7.69     <.0001
week*diet        3       -0.1696    0.02472   1249     -6.86     <.0001
week*w*diet      1        0.0155   0.002994   1249      5.18     <.0001
week*w*diet      2        0.0192   0.002886   1249      6.67     <.0001
week*w*diet      3        0.0159   0.002901   1249      5.49     <.0001
week*w*w*diet    1       -0.0004   0.000101   1249     -4.23     <.0001
week*w*w*diet    2       -0.0005   0.000097   1249     -5.77     <.0001
week*w*w*diet    3       -0.0004   0.000098   1249     -4.58     <.0001

/* Fit cubic trends across time with a general
(unstructured) covariance structure */
proc mixed data=set2;
class diet cow;
model protein = diet diet*week diet*week*week
diet*week*week*week
/ noint s htype=1 outpm=means;
repeated / type=un sub=cow(diet);
run;
Iteration History
Iteration    Evaluations    -2 Res Log Like       Criterion
        0              1       730.59188696
        1              2       759.18351015     20883449819
        2              1       511.07147888    7427110783.9
        3              1       397.64762263    4375955610.8
        4              1       219.69572765      0.15895475
        5              1       -12.98148491      0.08776816
        6              1      -154.86883417      0.04982468
        7              1      -239.82383162      0.02786239
        8              1      -288.63157456      0.01487291
        9              1      -314.96920815      0.00741391
       10              1      -328.12928957      0.00348648
       11              1      -334.33609104      0.00162686
       12              1      -337.20776612      0.00065487
       13              1      -338.32927625      0.00017419
       14              1      -338.61339846      0.00002111
       15              1      -338.64539572      0.00000061
       16              1      -338.64626381      0.00000000

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm     Subject      Estimate
UN(1,1)      cow(diet)      0.1934
UN(2,1)      cow(diet)     0.05180
UN(2,2)      cow(diet)     0.07465
UN(3,1)      cow(diet)     0.04706
.            .              .
.            .              .
UN(19,16)    cow(diet)     0.08295
UN(19,17)    cow(diet)     0.05973
UN(19,18)    cow(diet)     0.08104
UN(19,19)    cow(diet)      0.1106

Fit Statistics
-2 Res Log Likelihood        -338.6
AIC (smaller is better)        41.4
AICC (smaller is better)      105.4
BIC (smaller is better)       491.5

The Mixed Procedure

Null Model Likelihood Ratio Test
 DF    Chi-Square    Pr > ChiSq
189       1069.24        <.0001

Solution for Fixed Effects
                                         Standard
Effect                 diet   Estimate      Error    DF   t Value
diet                   1        3.9106    0.07821    76     50.00
diet                   2        3.8365    0.07495    76     51.19
diet                   3        3.7323    0.07492    76     49.81
week*diet              1       -0.2073    0.03082    76     -6.73
week*diet              2       -0.1988    0.02969    76     -6.70
week*diet              3       -0.1827    0.02972    76     -6.15
week*week*diet         1       0.02223   0.003453    76      6.44
week*week*diet         2       0.02222   0.003339    76      6.65
week*week*diet         3       0.01877   0.003334    76      5.63
week*week*week*diet    1      -0.00063   0.000110    76     -5.79
week*week*week*diet    2      -0.00068   0.000106    76     -6.40
week*week*week*diet    3      -0.00056   0.000106    76     -5.26
The Mixed Procedure

Type 1 Tests of Fixed Effects
                        Num    Den
Effect                   DF     DF    F Value    Pr > F
diet                      3     76    13105.8    <.0001
week*diet                 3     76       6.13    0.0009
week*week*diet            3     76      21.47    <.0001
week*week*week*diet       3     76      34.06    <.0001

/* To test for diet effects fit another form of
the same model */
proc mixed data=set2;
class diet cow;
model protein = diet week diet*week week*week
diet*week*week week*week*week
diet*week*week*week
/ s htype=3 outpm=means;
repeated / type=un sub=cow(diet);
run;
Fit Statistics
-2 Res Log Likelihood        -338.6
AIC (smaller is better)        41.4
AICC (smaller is better)      105.4
BIC (smaller is better)       491.5

Null Model Likelihood Ratio Test
 DF    Chi-Square    Pr > ChiSq
189       1069.24        <.0001

Solution for Fixed Effects
                                         Standard
Effect                 diet   Estimate      Error    DF   t Value
Intercept                       3.7323    0.07492    76     49.81
diet                   1        0.1783     0.1083    76      1.65
diet                   2        0.1042     0.1060    76      0.98
diet                   3             0          .     .         .
week                           -0.1827    0.02972    76     -6.15
week*diet              1      -0.02459    0.04282    76     -0.57
week*diet              2      -0.01606    0.04201    76     -0.38
week*diet              3             0          .     .         .
week*week                      0.01877   0.003334    76      5.63
week*week*diet         1      0.003464   0.004800    76      0.72
week*week*diet         2      0.003449   0.004719    76      0.73
week*week*diet         3             0          .     .         .
week*week*week                -0.00056   0.000106    76     -5.26
week*week*week*diet    1      -0.00008   0.000152    76     -0.51
week*week*week*diet    2      -0.00012   0.000150    76     -0.82
week*week*week*diet    3             0          .     .         .
Type 3 Tests of Fixed Effects
                        Num    Den
Effect                   DF     DF    F Value    Pr > F
diet                      2     76       1.38    0.2587
week                      1     76     127.73    <.0001
week*diet                 2     76       0.17    0.8427
week*week                 1     76     116.91    <.0001
week*week*diet            2     76       0.35    0.7031
week*week*week            1     76     101.49    <.0001
week*week*week*diet       2     76       0.35    0.7082
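Several numbers in the PROC MIXED listings above can be reproduced by hand, which helps in reading the output. The sketch below assumes that the null model is the independent-errors model with a single variance parameter, so that its -2 Res Log Like is the iteration 0 value, and that the degrees of freedom of the Null Model Likelihood Ratio Test are the number of covariance parameters minus one. It also computes the within-cow correlation implied by the compound symmetry estimates. Variable names are illustrative.

```python
import math

# Sketch: reproduce the Null Model Likelihood Ratio Tests and the implied
# compound-symmetry correlation from the PROC MIXED output.

null_neg2 = 730.59188696      # iteration 0 (independent errors, one variance)
cs_neg2 = 438.77813937        # compound symmetry, final iteration
un_neg2 = -338.64626381       # unstructured, final iteration

chi_cs = null_neg2 - cs_neg2
chi_un = null_neg2 - un_neg2

n_times = 19
df_cs = 2 - 1                               # CS has 2 covariance parameters
df_un = n_times * (n_times + 1) // 2 - 1    # UN has 190 covariance parameters

cs_between, cs_residual = 0.02807, 0.06546  # Covariance Parameter Estimates
rho = cs_between / (cs_between + cs_residual)
total_sd = math.sqrt(cs_between + cs_residual)

print(round(chi_cs, 2), df_cs)
print(round(chi_un, 2), df_un)
print(round(rho, 4), round(total_sd, 4))
```

The implied correlation of about 0.30 and total standard deviation of about 0.306 agree with the Rho estimate and the residual standard error that the S-Plus gls( ) fit with a compound symmetry structure reports later in these notes.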
/* plot the fitted curves */
axis1 label=(f=swiss h=1.8 a=90 r=0 "Protein (percent)")
order = 2.5 to 4.5 by .5
value=(f=swiss h=1.8) w=3.0
length= 4.0in;
axis2 label=(f=swiss h=2.0 "Time(weeks)")
order = 0 to 20 by 5
value=(f=swiss h=1.8) w= 3.0
length = 6.5 in;
SYMBOL1 V=CIRCLE H=1.7 w=3 l=1 i=join ;
SYMBOL2 V=DIAMOND H=1.7 w=3 l=3 i=join ;
SYMBOL3 V=square H=1.7 w=3 l=9 i=join ;
PROC GPLOT DATA=means;
PLOT pred*week=diet /
vaxis=axis1 haxis=axis2;
TITLE1 ls=0.01in H=2.0 F=swiss "Estimated Protein Content";
footnote ls=0.01in;
RUN;
/* To test for diet effects fit another
form of the same model */
proc mixed data=set2;
class diet cow;
model protein = diet week diet*week week*week
diet*week*week week*week*week
diet*week*week*week
/ s htype=3 outpm=means df=kr;
repeated / type=un sub=cow(diet);
run;

Fit Statistics
-2 Res Log Likelihood        -338.6
AIC (smaller is better)        41.4
AICC (smaller is better)      105.4
BIC (smaller is better)       491.5

Null Model Likelihood Ratio Test
 DF    Chi-Square    Pr > ChiSq
189       1069.24        <.0001

Solution for Fixed Effects
                                         Standard
Effect                 diet   Estimate      Error      DF   t Value   Pr > |t|
Intercept                       3.7323    0.09639    66.5     38.72     <.0001
diet                   1        0.1783     0.1395    68.2      1.28     0.2053
diet                   2        0.1042     0.1365    66.1      0.76     0.4477
diet                   3             0          .       .         .          .
week                           -0.1827    0.04044    48.1     -4.52     <.0001
week*diet              1      -0.02459    0.05829    50.3     -0.42     0.6750
week*diet              2      -0.01606    0.05725    47.7     -0.28     0.7804
week*diet              3             0          .       .         .          .
week*week                      0.01877   0.004728    39.3      3.97     0.0003
week*week*diet         1      0.003464   0.006805    41.3      0.51     0.6135
week*week*diet         2      0.003449   0.006702      39      0.51     0.6097
week*week*diet         3             0          .       .         .          .
week*week*week                -0.00056   0.000156    35.4     -3.56     0.0011
week*week*week*diet    1      -0.00008   0.000225    37.2     -0.35     0.7297
week*week*week*diet    2      -0.00012   0.000222    35.1     -0.56     0.5810
week*week*week*diet    3             0          .       .         .          .
Type 3 Tests of Fixed Effects
                        Num     Den
Effect                   DF      DF    F Value    Pr > F
diet                      2    67.3       0.83    0.4403
week                      1    49.3      68.79    <.0001
week*diet                 2    49.2       0.09    0.9118
week*week                 1    40.4      58.04    <.0001
week*week*diet            2    40.3       0.18    0.8394
week*week*week            1    36.4      46.62    <.0001
week*week*week*diet       2    36.3       0.16    0.8537

# This file posted as milkprotein.ssc
#
# This code is applied to the milk protein
# data from DHLZ, page 8.

set1 <- read.table("c:/mydocuments/courses/st565/data/dhlz.example1_4.data",
col.names=c("diet","cow","week","protein"))
set1

# Create factors
set1$dietf <- as.factor(set1$diet)
set1$weekf <- as.factor(set1$week)
set1$cowf <- as.factor(set1$cow)

# Sort the data set by subject
i <- order(set1$cow,set1$week)
set1 <- set1[i,]

# Delete the list called i
rm(i)
# Compute sample means
means <- tapply(set1$protein,
list(set1$week,set1$diet),mean)
means

# Make a profile plot of the means
# Unix users should insert the motif( )
# command
x.axis <- unique(set1$week)
par(fin=c(6.0,6.0),pch=18,mkh=.1,mex=1.5,
cex=1.2,lwd=3)
matplot(c(1,19), c(3.0,4.0), type="n",
xlab="Time(weeks)", ylab="Protein (percent)",
main= "Observed Protein Means")
matlines(x.axis,means,type='l',lty=c(1,3,7))
matpoints(x.axis,means, pch=c(16,17,15))
legend(1,2.45,legend=c("Barley diet",
'Barley+lupins','Lupin diet'),lty=c(1,3,7),bty='n')

[Figure: Observed Protein Means. Protein (percent) plotted against Time(weeks), with one line each for the Barley diet, Barley+lupins, and Lupin diet groups.]
# Use the gls( ) function to fit a
# model where the errors have a
# compound symmetry covariance structure
# within cows.
# (In R, load library(nlme) and write the powers as I(week^2) and I(week^3).)

options(contrasts=c("contr.treatment","contr.poly"))

set1.glscs <- gls(protein ~ dietf+ week+
dietf*week+week^2+dietf*week^2 +
week^3 + dietf*week^3,
data=set1,
correlation = corCompSymm(form=~1|cow),
method=c("REML"))
summary(set1.glscs)
anova(set1.glscs)

     AIC      BIC    logLik
466.7781 539.4265 -219.3891

Correlation Structure: Compound symmetry
Formula: ~ 1 | cow
Parameter estimate(s):
      Rho
0.3001093

Coefficients:
                    Value  Std.Error   t-value p-value
(Intercept)      3.926357 0.06811861  57.64001  <.0001
dietf2          -0.038575 0.09441974  -0.40855  0.6829
dietf3          -0.131599 0.09449531  -1.39265  0.1640
week            -0.160102 0.02554698  -6.26697  <.0001
I(week^2)        0.015522 0.00299418   5.18417  <.0001
I(week^3)       -0.000427 0.00010094  -4.23121  <.0001
dietf2week      -0.028928 0.03545611  -0.81587  0.4147
dietf3week      -0.009489 0.03554717  -0.26694  0.7896
dietf2I(week^2)  0.003723 0.00415876   0.89519  0.3708
dietf3I(week^2)  0.000407 0.00416915   0.09754  0.9223
dietf2I(week^3) -0.000134 0.00014023  -0.95631  0.3391
dietf3I(week^3) -0.000021 0.00014057  -0.14699  0.8832
Standardized residuals:
    Min         Q1         Med        Q3     Max
-3.1085 -0.6782905 -0.03438773 0.6791416 3.42049

Residual standard error: 0.3058223
Degrees of freedom: 1337 total; 1325 residual

Denom. DF: 1325
                numDF  F-value p-value
(Intercept)         1 28938.35  <.0001
dietf               2     8.42  0.0002
week                1    20.06  <.0001
I(week^2)           1   109.03  <.0001
I(week^3)           1    71.03  <.0001
dietf:week          2     3.55  0.0290
dietf:I(week^2)     2     0.06  0.9459
dietf:I(week^3)     2     0.54  0.5831

# Try an autoregressive covariance
# structure across weeks within cows

set1.glsar <- gls(protein ~ dietf+ week+
dietf*week+week^2+dietf*week^2 +
week^3 + dietf*week^3,
data=set1,
correlation = corAR1(form=~1|cow),
method=c("REML"))
summary(set1.glsar)
anova(set1.glsar)
Generalized least squares fit by REML
Model: protein ~ dietf + week +
dietf * week + week^2 +
dietf * week^2 + week^3 +
dietf * week^3
Data: set1
     AIC      BIC    logLik
156.8072 229.4556 -64.40361

Correlation Structure: AR(1)
Formula: ~ 1 | cow
Parameter estimate(s):
      Phi
0.6519978

Coefficients:
                    Value Std.Error  t-value p-value
(Intercept)      4.037225 0.0862360 46.81599  <.0001
dietf2           0.003778 0.1198494  0.03153  0.9749
dietf3          -0.117365 0.1198399 -0.97935  0.3276
week            -0.197838 0.0371623 -5.32363  <.0001
I(week^2)        0.018849 0.0044381  4.24704  <.0001
I(week^3)       -0.000520 0.0001495 -3.48125  0.0005
dietf2week      -0.048511 0.0517505 -0.93741  0.3487
dietf3week      -0.019394 0.0518068 -0.37435  0.7082
dietf2I(week^2)  0.006141 0.0061814  0.99346  0.3207
dietf3I(week^2)  0.002009 0.0061863  0.32471  0.7455
dietf2I(week^3) -0.000222 0.0002081 -1.06449  0.2873
dietf3I(week^3) -0.000091 0.0002083 -0.43779  0.6616

Standardized residuals:
      Min         Q1          Med       Q3      Max
-3.272627 -0.6151624 -0.003893045 0.699279 3.373619

Residual standard error: 0.31561
Degrees of freedom: 1337 total; 1325 residual

Denom. DF: 1325
                numDF  F-value p-value
(Intercept)         1 40939.16  <.0001
dietf               2    13.05  <.0001
week                1    44.73  <.0001
I(week^2)           1    49.32  <.0001
I(week^3)           1    54.95  <.0001
dietf:week          2     1.42  0.2427
dietf:I(week^2)     2     0.15  0.8591
dietf:I(week^3)     2     0.57  0.5629
# Try a general correlation structure

set1.glss <- gls(protein ~ dietf + week+
dietf*week +week^2+dietf*week^2 +
week^3 + dietf*week^3,
data=set1,
correlation = corSymm(form=~1|cow),
weights = varIdent(form = ~1 | weekf),
method=c("REML"))
summary(set1.glss)
anova(set1.glss)

# Try an AR(1) correlation structure
# with heterogeneous variances

set1.glsarh <- gls(protein ~ dietf + week+
dietf*week +week^2+dietf*week^2 +
week^3 + dietf*week^3,
data=set1,
correlation = corAR1(form=~1|cow),
weights = varIdent(form = ~ 1 | week),
method=c("REML"))
summary(set1.glsarh)
anova(set1.glsarh)

# Compare the fit of various covariance
# structures.

anova(set1.glss, set1.glscs)
anova(set1.glss, set1.glsar)
anova(set1.glss, set1.glsar, set1.glsarh)

# To compare the continuous week model to the
# model where we fit a different mean at each
# time point, we must compare likelihood values
# instead of REML likelihood values.

set1.glsarmle <- gls(protein ~ dietf+
weekf+dietf*weekf,
data=set1,
correlation = corAR1(form=~1|cow),
method=c("ML"))

set1.glscarmle <- gls(protein ~ dietf+
week+ dietf*week + week^2 +
dietf*week^2 + week^3 + dietf*week^3,
data=set1,
correlation = corAR1(form=~1|cow),
method=c("ML"))
anova(set1.glsarmle,set1.glscarmle)

               Model df      AIC      BIC   logLik   Test  L.Ratio p-value
set1.glsarmle      1 59 19.19311 325.8859 49.40345
set1.glscarmle     2 14 29.85783 102.6324 -0.92891 1 vs 2 100.6647  <.0001

corStruct functions

corCompSymm    compound symmetry
corSymm        general
corAR1         autoregressive of order 1
corCAR1        continuous time AR(1)
corARMA        autoregressive-moving average
corExp         exponential           $1 - \exp(-s/\rho)$
corGaus        Gaussian              $1 - \exp[-(s/\rho)^2]$
corLin         linear                $1 - (1 - s/\rho) I(s < \rho)$
corRatio       rational quadratic    $(s/\rho)^2 / [1 + (s/\rho)^2]$
corSpher       spherical             $1 - [1 - 1.5(s/\rho) + 0.5(s/\rho)^3] I(s < \rho)$
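The five distance-based formulas listed for corExp through corSpher can be evaluated directly. The sketch below codes them as plain functions of the distance s and the range parameter ρ. It assumes, since every formula equals 0 at s = 0 and rises toward 1, that they are written in semivariogram form, so that the implied correlation at distance s is one minus the value. The function names are illustrative and are not the S-Plus constructors themselves.

```python
import math

# Sketch: the distance-based formulas from the corStruct table, as functions
# of distance s and range parameter rho. The correlation at distance s is
# taken to be 1 minus the returned value (an assumption about the table).

def exponential(s, rho):
    return 1.0 - math.exp(-s / rho)


def gaussian(s, rho):
    return 1.0 - math.exp(-((s / rho) ** 2))


def linear(s, rho):
    return s / rho if s < rho else 1.0


def rational_quadratic(s, rho):
    return (s / rho) ** 2 / (1.0 + (s / rho) ** 2)


def spherical(s, rho):
    if s >= rho:
        return 1.0
    return 1.5 * (s / rho) - 0.5 * (s / rho) ** 3


models = [exponential, gaussian, linear, rational_quadratic, spherical]
for f in models:
    values = [round(1.0 - f(s, 2.0), 3) for s in (0.0, 1.0, 2.0, 4.0)]
    print(f.__name__, values)   # implied correlations at distances 0, 1, 2, 4
```

The linear and spherical forms reach exactly 1 at s = ρ, so observations farther apart than ρ are treated as uncorrelated, while the other three only approach that limit.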