Longitudinal data
Longitudinal studies and cross-sectional studies

- The main characteristic of a longitudinal study is that subjects are measured repeatedly through time.
- In a cross-sectional study, a single outcome is measured for each subject. A cross-sectional study is typically designed to study cohort effects.
- The main advantage of a longitudinal study is its capacity to separate cohort and age (time) effects.
Cross-sectional studies

Cross-sectional studies measure individuals of various groups at one particular time point to study differences among groups (e.g. defined by age). Cohort effects refer to differences among groups of individuals.

- Advantages: easy to obtain samples, cost effective, able to collect a large sample.
- Disadvantages: not able to separate cohort and time effects.
Longitudinal studies

Longitudinal studies measure a single individual or groups over a period of time to provide information about age changes (time effects). Time effects refer to changes over time.

- Advantages: able to separate cohort and time effects, provide more detailed information.
- Disadvantages: expensive, time consuming, subject dropout.
Dependence

- Because the measurements are obtained from the same individual, they are naturally dependent on each other.
- We should account for the dependence among measurements taken from the same individual.
Example: protein content of milk

- Milk was collected weekly from 79 Australian cows and analyzed for its protein content.
- The cows were maintained on one of three diets: barley, a mixture of barley and lupins, or lupins alone.
- Cows were randomly assigned to the three diets: barley (25), mixed diet (27), and lupins (27).
- The protein content was measured weekly for 19 weeks. Time is measured in weeks since calving. The experiment was terminated 19 weeks after the earliest calving, so not all cows have 19 repeated measurements.
Example: protein content of milk

Figure: Protein content versus time plot. Black: barley diet, Red: barley+lupins, Green: lupin diet.
Data structure

Let Yij be the milk protein for the i-th cow measured at the j-th week (i = 1, ..., m; j = 1, ..., ni).

m: number of subjects;
ni: number of repeated measurements for the i-th subject.

The data structure is

    Y11   ...   Y1n1
    Y21   ...   Y2n2
     .     .     .
    Ym1   ...   Ymnm
A simple model

- Our main question is about the effect of diet on protein content.
- We might consider a random intercept model as follows:

      Yij = µ + βᵀXij + ui + eij,

  where Xij = (Xij1, Xij2)ᵀ, Xij1 = 1 if the i-th cow is on the 2nd diet and 0 otherwise, Xij2 = 1 if the i-th cow is on the 3rd diet and 0 otherwise, the ui are IID random intercepts with ui ∼ N(0, σu²), and the eij are IID random errors with eij ∼ N(0, σ²).
- In the above model, β represents the mean effect of diet on protein content.
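As a sketch, the random intercept model above can be simulated directly; every numerical value below (mean protein level, diet effects, variance components) is hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All parameter values are hypothetical, for illustration only
m, n = 79, 19                      # cows and weekly measurements
mu = 3.5                           # overall mean protein content
beta = np.array([-0.1, -0.2])      # effects of diets 2 and 3 relative to diet 1
sigma_u, sigma_e = 0.25, 0.30      # random-intercept and error standard deviations

# Diet indicators: first 25 cows on diet 1, next 27 on diet 2, last 27 on diet 3
diet = np.repeat([0, 1, 2], [25, 27, 27])
X = np.column_stack([diet == 1, diet == 2]).astype(float)   # (m, 2)

u = rng.normal(0.0, sigma_u, size=m)          # one random intercept per cow
e = rng.normal(0.0, sigma_e, size=(m, n))     # independent measurement errors

# Y[i, j] = mu + beta' X_i + u_i + e_ij (diet covariates are constant over time)
Y = (mu + X @ beta + u)[:, None] + e
```

The shared intercept ui is what induces the within-cow dependence discussed above.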
A simple model

The random intercept model assumes that

- The effect of diet on protein is the same across the 19 weeks. No time effects are modeled.
- The covariances among all the observations obtained from the same individual are the same.
Covariances

Let Y1 = (Y11, ..., Y1n1)ᵀ be the observations from the first cow. The random intercept model assumption implies that

    Var(Y1) = [ σ² + σu²   σu²        ...   σu²
                σu²        σ² + σu²   ...   σu²
                ...        ...        ...   ...
                σu²        σu²        ...   σ² + σu² ]

and Cov(Yij, Ykl) = 0 for i ≠ k.
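This compound-symmetric structure, Var(Y1) = σ²I + σu²J, can be checked numerically; the variance components below are placeholders.

```python
import numpy as np

sigma_u2, sigma2 = 0.06, 0.09   # placeholder variance components
n1 = 5                          # number of measurements for cow 1

# Var(Y1) = sigma^2 * I + sigma_u^2 * J  (compound symmetry)
V1 = sigma2 * np.eye(n1) + sigma_u2 * np.ones((n1, n1))

print(V1[0, 0])  # diagonal entry: sigma^2 + sigma_u^2
print(V1[0, 1])  # off-diagonal entry: sigma_u^2
```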
Example: protein content of milk

Figure: Mean protein content versus time plot. Black: barley diet, Red: barley+lupins, Green: lupin diet.
Example: protein content of milk

Figure: Heatmap of covariance.
Example: protein content of milk
We observe the following:
(1) The mean effects of diet on milk protein content might not
be time invariant;
(2) The compound symmetric covariance might not be very
appropriate.
A model with time effects

We could extend the simple model as follows:

    Yij = µ + β0ᵀXij + β1 tij + β2 tij² + εij,

where tij is the time at which Yij was measured. The term β1 tij + β2 tij² models the time effects using a quadratic function of time.

We typically assume Cov(εij, εkl) = 0 if k ≠ i and

    Var( (εi1, ..., εini)ᵀ ) = [ σ1²    σ12    ...   σ1ni
                                 σ12    σ2²    ...   σ2ni
                                 ...    ...    ...   ...
                                 σ1ni   σ2ni   ...   σni² ].
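The fixed-effects design implied by this model can be sketched for a single cow; the diet indicator values are hypothetical.

```python
import numpy as np

# Fixed-effects design matrix for one cow under the time-effects model:
#   E[Yij] = mu + beta0' Xij + beta1 * tij + beta2 * tij^2
t = np.arange(1.0, 20.0)           # weeks 1..19
x_diet = np.array([1.0, 0.0])      # hypothetical: this cow is on the 2nd diet

X_i = np.column_stack([
    np.ones_like(t),               # intercept (mu)
    np.tile(x_diet, (t.size, 1)),  # diet indicators Xij
    t,                             # linear time effect (beta1)
    t ** 2,                        # quadratic time effect (beta2)
])
```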
A general model

Let Yi = (Yi1, ..., Yini)ᵀ be the observations from the i-th cow. A general model for longitudinal data could be

    Yi = Xi β + Zi ui + εi,   i = 1, ..., m,

where the ui are random effects with variance G and the εi are random errors with variance R. For longitudinal data we also assume that the ui are independent across subjects. This model implies that

    Var(Yi) = Zi G Ziᵀ + R := Vi.
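As a minimal sketch: for a random-intercept model, Zi is a single column of ones and G is the scalar σu², so Zi G Ziᵀ + R recovers the compound-symmetric matrix. The variance values are placeholders.

```python
import numpy as np

n_i = 4
Z_i = np.ones((n_i, 1))         # random-intercept design: one column of ones
G = np.array([[0.06]])          # Var(u_i) = sigma_u^2 (placeholder value)
R = 0.09 * np.eye(n_i)          # Var(eps_i) = sigma^2 * I (placeholder value)

V_i = Z_i @ G @ Z_i.T + R       # Var(Y_i) = Z_i G Z_i' + R
```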
A general model

The general model assumes that the observations from different subjects are independent. As a result, the covariance matrix of Y = (Y1ᵀ, ..., Ymᵀ)ᵀ is of block diagonal form. Thus,

    [ Y1 ]       ( [ X1 ]      [ V1   0    ...   0  ] )
    [ .. ]  ∼  N ( [ .. ] β ,  [ 0    V2   ...   0  ] )
    [ .. ]       ( [ .. ]      [ ...  ...  ...  ... ] )
    [ Ym ]       ( [ Xm ]      [ 0    0    ...   Vm ] )
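The block-diagonal covariance of the stacked vector Y can be assembled with a short helper; the block values below are placeholders, and subjects may have different numbers of measurements.

```python
import numpy as np

def block_diag(blocks):
    """Stack square covariance blocks V_1, ..., V_m into a block-diagonal matrix."""
    n = sum(B.shape[0] for B in blocks)
    V = np.zeros((n, n))
    pos = 0
    for B in blocks:
        k = B.shape[0]
        V[pos:pos + k, pos:pos + k] = B
        pos += k
    return V

# Two subjects with different numbers of measurements (placeholder values)
V1 = 0.09 * np.eye(3) + 0.06      # compound-symmetric 3x3 block
V2 = 0.09 * np.eye(2) + 0.06      # compound-symmetric 2x2 block
V = block_diag([V1, V2])
print(V.shape)  # (5, 5); off-block entries are zero
```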
Estimation and statistical inference

- Since the general model is a special case of the general linear mixed model, the inference methods we learned could be applied here.
- For estimating the fixed effects, the maximum likelihood estimator and the generalized least squares method could be used.
- For estimating the variance components, one could apply REML.
Estimation of β

The generalized least squares estimate of β is

    β̂ = ( Σ_{i=1}^m Xiᵀ Vi⁻¹ Xi )⁻¹ ( Σ_{i=1}^m Xiᵀ Vi⁻¹ Yi ).

Using large sample theory, we have

    β̂ ∼ N( β, ( Σ_{i=1}^m Xiᵀ Vi⁻¹ Xi )⁻¹ ).
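A direct numpy sketch of the GLS estimator, using simulated data with a known β and a common, purely illustrative within-subject covariance; the estimate should land close to the true β for a moderate number of subjects.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated check of the GLS estimator; all values are placeholders
m, n_i, p = 200, 5, 2
beta_true = np.array([1.0, -0.5])
V_i = 0.09 * np.eye(n_i) + 0.06          # common within-subject covariance
L = np.linalg.cholesky(V_i)              # to draw errors with Var = V_i
V_inv = np.linalg.inv(V_i)

A = np.zeros((p, p))                     # accumulates sum_i X_i' V_i^-1 X_i
b = np.zeros(p)                          # accumulates sum_i X_i' V_i^-1 Y_i
for _ in range(m):
    X_i = rng.normal(size=(n_i, p))
    Y_i = X_i @ beta_true + L @ rng.normal(size=n_i)
    A += X_i.T @ V_inv @ X_i
    b += X_i.T @ V_inv @ Y_i

beta_hat = np.linalg.solve(A, b)         # GLS estimate
var_hat = np.linalg.inv(A)               # large-sample Var(beta_hat)
```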
Some commonly used covariance structures

Compound symmetric:

    [ σ1² + σ2²   σ2²         ...   σ2²
      σ2²         σ1² + σ2²   ...   σ2²
      ...         ...         ...   ...
      σ2²         σ2²         ...   σ1² + σ2² ].
Some commonly used covariance structures

Toeplitz:

    [ σ²   σ1   σ2   σ3
      σ1   σ²   σ1   σ2
      σ2   σ1   σ²   σ1
      σ3   σ2   σ1   σ² ].
Some commonly used covariance structures

Ante-dependence: assume Corr(Yij, Yik) = ρjk = Π_{s=j}^{k-1} ρ_{s(s+1)} for any j < k. Specifically,

    [ σ1²           σ1σ2 ρ12      σ1σ3 ρ12ρ23
      σ1σ2 ρ12      σ2²           σ2σ3 ρ23
      σ1σ3 ρ12ρ23   σ2σ3 ρ23      σ3² ].
Some commonly used covariance structures

Autoregressive (AR(1)):

    σ² [ 1    ρ    ρ²
         ρ    1    ρ
         ρ²   ρ    1 ].

Heterogeneous AR(1):

    [ σ1²       σ1σ2 ρ    σ1σ3 ρ²
      σ1σ2 ρ    σ2²       σ2σ3 ρ
      σ1σ3 ρ²   σ2σ3 ρ    σ3² ].
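Both AR(1) forms are easy to generate, since the (j, k) entry depends only on |j − k|; the values of ρ and the σ's below are placeholders.

```python
import numpy as np

def ar1_cov(n, rho, sigma2=1.0):
    """AR(1) covariance: sigma2 * rho**|j-k|."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

V = ar1_cov(4, rho=0.5)
print(V[0])  # first row: 1, 0.5, 0.25, 0.125

# Heterogeneous AR(1): entry (j, k) is sigma_j * sigma_k * rho^|j-k|
sig = np.array([1.0, 1.2, 0.8, 1.5])     # placeholder standard deviations
V_het = np.outer(sig, sig) * 0.5 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
```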
Selecting a covariance model: two information criteria

- AIC: Akaike information criterion.

      AIC = −2 log likelihood + 2p,

  where p is the number of free parameters.
- BIC: Bayesian information criterion.

      BIC = −2 log likelihood + p log(n),

  where p is the number of free parameters and n is the sample size.
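Both criteria are one-line computations from a fitted model's log-likelihood; the log-likelihood, parameter count, and sample size below are hypothetical.

```python
import numpy as np

def aic(loglik, p):
    """Akaike information criterion: -2 log likelihood + 2p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian information criterion: -2 log likelihood + p log(n)."""
    return -2.0 * loglik + p * np.log(n)

# Hypothetical fit: loglik = -100 with p = 5 free parameters, n = 1000
print(aic(-100.0, 5))        # 210.0
print(bic(-100.0, 5, 1000))  # larger than AIC here, since log(1000) > 2
```

The model with the lowest value of the chosen criterion is preferred.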
Selecting a covariance model: two information criteria

- The models being compared need not be nested, unlike the case when models are compared using a likelihood ratio test.
- When picking from several models, the one with the lowest BIC (or AIC) is preferred. The BIC generally penalizes free parameters more strongly than the AIC when n is large; the AIC tends to select models with too many parameters.
Incorrect model for Vi

What if you select a wrong model for Vi?

- The estimator of β is still consistent.
- The large sample asymptotic normality still holds.
- It may not be the most efficient (smallest variance) estimator.
Incorrect model for Vi

If you select the correct model for Vi, then β̂ has the following large sample distribution:

    β̂ ∼ N( β, ( Σ_{i=1}^m Xiᵀ Vi⁻¹ Xi )⁻¹ ).

If you select an incorrect model Vi*, which is not the same as Vi, i.e., β̂ = ( Σ_{i=1}^m Xiᵀ Vi*⁻¹ Xi )⁻¹ ( Σ_{i=1}^m Xiᵀ Vi*⁻¹ Yi ), then

    β̂ ∼ N( β, ( Σ_{i=1}^m Xiᵀ Vi*⁻¹ Xi )⁻¹ ( Σ_{i=1}^m Xiᵀ Vi*⁻¹ Vi Vi*⁻¹ Xi ) ( Σ_{i=1}^m Xiᵀ Vi*⁻¹ Xi )⁻¹ ).
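A quick numerical check of this sandwich formula: when the working covariance equals the true Vi, the sandwich collapses to the model-based covariance. All matrices below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def sandwich_cov(Xs, V_star, V_true):
    """Large-sample Var(beta_hat) when V_star is the working covariance but
    V_true is the actual Var(Y_i): A^-1 B A^-1 with
    A = sum X' Vs^-1 X and B = sum X' Vs^-1 V Vs^-1 X."""
    Vs_inv = np.linalg.inv(V_star)
    A = sum(X.T @ Vs_inv @ X for X in Xs)
    B = sum(X.T @ Vs_inv @ V_true @ Vs_inv @ X for X in Xs)
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

n_i, p, m = 4, 2, 10
Xs = [rng.normal(size=(n_i, p)) for _ in range(m)]
V_true = 0.09 * np.eye(n_i) + 0.06       # placeholder "true" covariance
V_work = np.eye(n_i)                     # misspecified working covariance

model_based = np.linalg.inv(sum(X.T @ np.linalg.inv(V_true) @ X for X in Xs))
sw_right = sandwich_cov(Xs, V_true, V_true)  # collapses to model-based form
sw_wrong = sandwich_cov(Xs, V_work, V_true)  # still a valid large-sample variance
```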