Longitudinal data

Longitudinal study and cross-sectional studies
- The main characteristic of a longitudinal study is that subjects are measured repeatedly through time.
- In a cross-sectional study, a single outcome is measured for each subject. A cross-sectional study is typically designed to study cohort effects.
- The main advantage of a longitudinal study is its capacity to separate cohort and age (time) effects.

Cross-sectional studies
Cross-sectional studies measure individuals from various groups at one particular time point to study differences among groups (e.g., groups defined by age). Cohort effects refer to differences among groups of individuals.
- Advantages: samples are easy to obtain, the design is cost-effective, and a large sample can be collected.
- Disadvantages: cohort and time effects cannot be separated.

Longitudinal studies
Longitudinal studies measure a single individual or groups of individuals over a period of time to provide information about changes with age (time effects). Time effects refer to changes over time.
- Advantages: cohort and time effects can be separated, and more detailed information is obtained.
- Disadvantages: expensive, time-consuming, and subject to dropout.

Dependence
- Because the measurements are obtained from the same individual, they are naturally dependent on each other.
- We should account for dependence among measurements taken from the same individual.

Example: protein content of milk
- Milk was collected weekly from 79 Australian cows and analyzed for its protein content.
- The cows were maintained on one of three diets: barley, a mixture of barley and lupins, or lupins alone.
- Cows were randomly assigned to the three diets: barley (25), mixed diet (27) and lupins (27).
- The protein content was measured weekly for 19 weeks, with time measured in weeks since calving. The experiment was terminated 19 weeks after the earliest calving, so cows that calved later have fewer than 19 repeated measurements.
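To see why repeated measurements on one individual are dependent, the sketch below simulates data under an assumed mechanism (for illustration only, not fitted to the milk data): each subject carries a persistent offset that is shared by all of its measurements, plus fresh noise at each visit. The function `simulate_repeated` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_repeated(m, n, sigma_u, sigma, seed=0):
    """Hypothetical generator: m subjects, n measurements each.

    Each subject gets one offset u_i ~ N(0, sigma_u^2) shared by all of its
    measurements, plus independent noise e_ij ~ N(0, sigma^2) per measurement.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=(m, 1))   # one offset per subject
    e = rng.normal(0.0, sigma, size=(m, n))     # fresh noise per measurement
    return u + e                                # row i = subject i's series

Y = simulate_repeated(m=5000, n=2, sigma_u=1.0, sigma=1.0)

# Correlation between the 1st and 2nd measurement of the SAME subject.
within = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
# Correlation between 1st measurements of DIFFERENT (neighbouring) subjects.
between = np.corrcoef(Y[:-1, 0], Y[1:, 0])[0, 1]

# Theory under this assumed mechanism:
# within = sigma_u^2 / (sigma_u^2 + sigma^2) = 0.5, between = 0.
print(round(within, 3), round(between, 3))
```

The shared offset is the only source of the within-subject correlation here; treating the two columns as independent observations would overstate how much information the data contain.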
Figure: Protein content versus time (weeks) for each cow. Black: barley diet, Red: barley+lupins, Green: lupin diet.

Data structure
Let $Y_{ij}$ be the milk protein content of the $i$-th cow measured in the $j$-th week ($i = 1, \dots, m$; $j = 1, \dots, n_i$), where $m$ is the number of subjects and $n_i$ is the number of repeated measurements for the $i$-th subject. The data structure is
$$\begin{array}{ccc} Y_{11} & \cdots & Y_{1n_1} \\ Y_{21} & \cdots & Y_{2n_2} \\ \vdots & & \vdots \\ Y_{m1} & \cdots & Y_{mn_m} \end{array}$$

A simple model
- Our main question is about the effect of diet on protein content.
- We might consider the following random intercept model:
$$Y_{ij} = \mu + \beta^T X_{ij} + u_i + e_{ij},$$
where $X_{ij} = (X_{ij1}, X_{ij2})^T$, with $X_{ij1} = 1$ if the $i$-th cow is on the 2nd diet and 0 otherwise, and $X_{ij2} = 1$ if the $i$-th cow is on the 3rd diet and 0 otherwise. The $u_i$ are IID random intercepts for the $i$-th individual, $u_i \sim N(0, \sigma_u^2)$, and the $e_{ij}$ are IID random errors, $e_{ij} \sim N(0, \sigma^2)$, independent of the $u_i$.
- In the above model, $\beta$ represents the mean effects of diets 2 and 3 on protein content relative to the first diet (barley).

The random intercept model assumes that
- The effect of diet on protein is the same across all 19 weeks; no time effects are modeled.
- The covariances among all the observations obtained from the same individual are the same.

Covariances
Let $Y_1 = (Y_{11}, \dots, Y_{1n_1})^T$ be the observations from the first cow. The random intercept model implies that
$$\mathrm{Var}(Y_1) = \begin{pmatrix} \sigma^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma^2 + \sigma_u^2 \end{pmatrix}$$
and $\mathrm{Cov}(Y_{ij}, Y_{kl}) = 0$ for $i \neq k$.

Example: protein content of milk
Figure: Observed mean protein content (percent) versus time (weeks). Black: barley diet, Red: barley+lupins, Green: lupin diet.
Figure: Heatmap of covariance (weeks 1 to 19 on both axes).
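The covariance implied by the random intercept model can be built directly. The minimal NumPy sketch below constructs $\mathrm{Var}(Y_i)$ for one cow (compound symmetry) and the covariance of all cows stacked cow by cow, with zeros between cows. The function names and the parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def random_intercept_cov(n_i, sigma2, sigma_u2):
    """Var(Y_i) implied by Y_ij = mu + beta^T X_ij + u_i + e_ij.

    sigma^2 + sigma_u^2 on the diagonal and sigma_u^2 everywhere else
    (compound symmetry); assumes u_i is independent of the e_ij.
    """
    return sigma2 * np.eye(n_i) + sigma_u2 * np.ones((n_i, n_i))

def stacked_cov(n_list, sigma2, sigma_u2):
    """Covariance of all cows' observations stacked cow by cow.

    Cov(Y_ij, Y_kl) = 0 for i != k, so only the per-cow blocks are non-zero.
    """
    total = sum(n_list)
    V = np.zeros((total, total))
    start = 0
    for n_i in n_list:
        stop = start + n_i
        V[start:stop, start:stop] = random_intercept_cov(n_i, sigma2, sigma_u2)
        start = stop
    return V

# Hypothetical example: three cows with 4, 3 and 2 measurements
# (unequal n_i, as in the milk data where later-calving cows have fewer weeks).
V1 = random_intercept_cov(4, sigma2=1.0, sigma_u2=0.5)
V = stacked_cov([4, 3, 2], sigma2=1.0, sigma_u2=0.5)
print(V1)

# Implied within-cow correlation: sigma_u^2 / (sigma^2 + sigma_u^2)
print(V1[0, 1] / V1[0, 0])
```

Every pair of weeks from the same cow gets the same correlation, $\sigma_u^2/(\sigma^2+\sigma_u^2)$, no matter how far apart the weeks are; this is the restriction the covariance heatmap above lets us check.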
We observe the following: (1) the mean effects of diet on milk protein content might not be time invariant; (2) the compound symmetric covariance might not be appropriate.

A model with time effects
We could extend the simple model as follows:
$$Y_{ij} = \mu + \beta_0^T X_{ij} + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \varepsilon_{ij},$$
where $t_{ij}$ is the time at which $Y_{ij}$ was measured. The term $\beta_1 t_{ij} + \beta_2 t_{ij}^2$ models the time effect as a quadratic function of time. We typically assume $\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{kl}) = 0$ for $k \neq i$ and
$$\mathrm{Var}\begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{in_i} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n_i} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2n_i} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1n_i} & \sigma_{2n_i} & \cdots & \sigma_{n_i}^2 \end{pmatrix}.$$

A general model
Let $Y_i = (Y_{i1}, \dots, Y_{in_i})^T$ be the observations from the $i$-th cow. A general model for longitudinal data is
$$Y_i = X_i \beta + Z_i u_i + \varepsilon_i, \quad i = 1, \dots, m,$$
where the $u_i$ are random effects with variance $G$ and the $\varepsilon_i$ are random errors with variance $R_i$ (an $n_i \times n_i$ matrix). For longitudinal data we also assume that the $u_i$'s are independent across subjects and independent of the $\varepsilon_i$'s. This model implies that
$$\mathrm{Var}(Y_i) = Z_i G Z_i^T + R_i := V_i.$$
The general model assumes that observations from different subjects are independent. As a result, the covariance matrix of $Y = (Y_1^T, \dots, Y_m^T)^T$ is block diagonal:
$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_m \end{pmatrix} \sim N\left( \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} \beta, \begin{pmatrix} V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_m \end{pmatrix} \right).$$

Estimation and statistical inference
- Since the general model is a special case of the general linear mixed model, the inference methods we learned can be applied here.
- For estimating fixed effects, maximum likelihood or generalized least squares can be used.
- For estimating variance components, one can apply REML (restricted maximum likelihood).

Estimation of $\beta$
The generalized least squares estimate of $\beta$ is
$$\hat\beta = \Big( \sum_{i=1}^m X_i^T V_i^{-1} X_i \Big)^{-1} \Big( \sum_{i=1}^m X_i^T V_i^{-1} Y_i \Big).$$
By large-sample theory, approximately
$$\hat\beta \sim N\Big( \beta, \Big( \sum_{i=1}^m X_i^T V_i^{-1} X_i \Big)^{-1} \Big).$$

Some commonly used covariance structures
Compound symmetric:
$$\begin{pmatrix} \sigma_1^2 + \sigma_2^2 & \sigma_2^2 & \cdots & \sigma_2^2 \\ \sigma_2^2 & \sigma_1^2 + \sigma_2^2 & \cdots & \sigma_2^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_2^2 & \sigma_2^2 & \cdots & \sigma_1^2 + \sigma_2^2 \end{pmatrix}.$$
Toeplitz (shown for 4 time points):
$$\begin{pmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{pmatrix}.$$
Ante-dependence: assume $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho_{jk} = \prod_{s=j}^{k-1} \rho_{s(s+1)}$ for any $j < k - 1$. Specifically, for 3 time points:
$$\begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_{12} & \sigma_1\sigma_3\rho_{12}\rho_{23} \\ \sigma_1\sigma_2\rho_{12} & \sigma_2^2 & \sigma_2\sigma_3\rho_{23} \\ \sigma_1\sigma_3\rho_{12}\rho_{23} & \sigma_2\sigma_3\rho_{23} & \sigma_3^2 \end{pmatrix}.$$
Autoregressive (AR(1)):
$$\sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}.$$
Heterogeneous AR(1):
$$\begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho^2 \\ \sigma_1\sigma_2\rho & \sigma_2^2 & \sigma_2\sigma_3\rho \\ \sigma_1\sigma_3\rho^2 & \sigma_2\sigma_3\rho & \sigma_3^2 \end{pmatrix}.$$

Selecting a covariance model: two information criteria
- AIC (Akaike information criterion): $\mathrm{AIC} = -2\log L + 2p$, where $L$ is the maximized likelihood and $p$ is the number of free parameters.
- BIC (Bayesian information criterion): $\mathrm{BIC} = -2\log L + p\log(n)$, where $p$ is the number of free parameters and $n$ is the sample size.
- The models being compared need not be nested, unlike when models are compared using a likelihood ratio test.
- When choosing among several models, the one with the lowest BIC (or AIC) is preferred. BIC penalizes each free parameter by $\log(n)$ rather than 2, so it penalizes more strongly than AIC whenever $n > e^2 \approx 7.4$. AIC tends to select models with too many parameters.

Incorrect model for $V_i$
What if you select a wrong model for $V_i$?
- The estimator of $\beta$ is still consistent.
- The large-sample asymptotic normality still holds.
- It may not be the most efficient (smallest variance) estimator.

If you select a correct model for $V_i$, then $\hat\beta$ has the following large-sample asymptotic normality:
$$\hat\beta \sim N\Big( \beta, \Big( \sum_{i=1}^m X_i^T V_i^{-1} X_i \Big)^{-1} \Big).$$
If you select an incorrect model $V_i^*$, which is not the same as $V_i$, i.e., $\hat\beta = \big( \sum_{i=1}^m X_i^T V_i^{*-1} X_i \big)^{-1} \big( \sum_{i=1}^m X_i^T V_i^{*-1} Y_i \big)$, then
$$\hat\beta \sim N\Big( \beta, \Big( \sum_{i=1}^m X_i^T V_i^{*-1} X_i \Big)^{-1} \Big( \sum_{i=1}^m X_i^T V_i^{*-1} V_i V_i^{*-1} X_i \Big) \Big( \sum_{i=1}^m X_i^T V_i^{*-1} X_i \Big)^{-1} \Big).$$
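The GLS estimator, its model-based variance, and the "sandwich" variance under a working model $V_i^*$ can be sketched in NumPy as follows. This is a minimal illustration, assuming the covariance matrices are known; in practice the variance parameters would first be estimated (e.g. by REML), which this sketch does not do. The helpers `ar1_cov`, `cs_cov`, `gls`, `sandwich_cov`, the simulated design, and all parameter values are hypothetical.

```python
import numpy as np

def ar1_cov(n_i, sigma2, rho):
    """AR(1) covariance: entry (j, k) is sigma^2 * rho^|j - k|."""
    idx = np.arange(n_i)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def cs_cov(n_i, sigma1_2, sigma2_2):
    """Compound symmetric: sigma_1^2 + sigma_2^2 on the diagonal, sigma_2^2 off it."""
    return sigma1_2 * np.eye(n_i) + sigma2_2 * np.ones((n_i, n_i))

def gls(X_list, Y_list, Vw_list):
    """GLS estimate of beta using covariances Vw_i (the true V_i or a working V_i^*).

    Returns beta_hat and (sum X_i^T Vw_i^{-1} X_i)^{-1}, the model-based variance,
    which is only valid when Vw_i is the correct V_i.
    """
    p = X_list[0].shape[1]
    A = np.zeros((p, p))  # sum_i X_i^T Vw_i^{-1} X_i
    b = np.zeros(p)       # sum_i X_i^T Vw_i^{-1} Y_i
    for X, Y, Vw in zip(X_list, Y_list, Vw_list):
        W = np.linalg.inv(Vw)
        A += X.T @ W @ X
        b += X.T @ W @ Y
    return np.linalg.solve(A, b), np.linalg.inv(A)

def sandwich_cov(X_list, Vw_list, V_list):
    """A^{-1} (sum_i X_i^T Vw_i^{-1} V_i Vw_i^{-1} X_i) A^{-1}, with V_i the true covariance."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    B = np.zeros((p, p))
    for X, Vw, V in zip(X_list, Vw_list, V_list):
        W = np.linalg.inv(Vw)
        A += X.T @ W @ X
        B += X.T @ W @ V @ W @ X
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

# Hypothetical design: 200 subjects with 3 to 6 visits, quadratic time effect.
rng = np.random.default_rng(1)
beta = np.array([3.5, -0.1, 0.005])
X_list, Y_list, V_list = [], [], []
for i in range(200):
    n_i = 3 + i % 4
    t = np.arange(1, n_i + 1, dtype=float)
    X = np.column_stack([np.ones(n_i), t, t ** 2])
    V = ar1_cov(n_i, sigma2=0.05, rho=0.6)  # assumed "true" V_i
    X_list.append(X)
    Y_list.append(rng.multivariate_normal(X @ beta, V))
    V_list.append(V)

# Deliberately wrong working model V_i^*: compound symmetry instead of AR(1).
W_list = [cs_cov(len(Y), 0.03, 0.02) for Y in Y_list]

beta_correct, cov_correct = gls(X_list, Y_list, V_list)  # correct V_i
beta_working, naive_cov = gls(X_list, Y_list, W_list)    # naive_cov is not valid here
robust_cov = sandwich_cov(X_list, W_list, V_list)        # valid variance for beta_working

print(np.sqrt(np.diag(cov_correct)))  # standard errors with the correct V_i
print(np.sqrt(np.diag(robust_cov)))   # standard errors with the working V_i^*
```

Both estimates are centred on the true $\beta$, but the sandwich standard errors for the working model are never smaller than those from the correct $V_i$, which matches the efficiency loss stated above. When the working model equals the true one, the sandwich collapses to $\big(\sum_i X_i^T V_i^{-1} X_i\big)^{-1}$. In real data $V_i$ is unknown; a common practical substitute in the middle term is the residual outer product $r_i r_i^T$.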