Longitudinal Studies • Repeatedly measure individuals followed over time Cross-Sectional Studies • Sometimes called panel studies (e.g. Economics, Sociology, Food Science) • One observation on each subject • Different subjects are measured at different points in time (e.g. at different ages) • Cannot entirely distinguish between cohort and age effects • Reading ability example (DHLZ, pages 1-2) • Able to distinguish changes over time within individuals (age effects) from differences among individuals (cohort effects) • Must account for correlations among measurements taken on the same individual • Vector of measurements Y i = (Yit1 , Yit2 , ...Yi,tn )T i on the ith subject, or experimental unit. 455 456 Example: Measure muscle strength of elderly subjects at 0, 6, and 12 months (Daniels and Hogan, 2000, Biometrics) • Randomized clinical trial • Investigate the effects of recombinant human growth hormone (rhGH) therapy for building and maintaining muscle strength in the elderly (Kiel et al, 1998). • The trial enrolled 161 subjects and randomized them to one of four treatments: – Placebo • Both placebo and growth hormone were administered via daily injections • Muscle strength measures were recorded at baseline, six months and twelve months. • Strength is measured as the maximum foot-pounds of torque which can be exerted against resistance provided by a mechanical device – Growth hormone only (0.015 mg/kg rhGH) – Exercise plus placebo – Exercise plus growth hormone 457 458 Measure Depression over time (Pourahmadi and Daniels, 2002 Biometrics) • Patients were assigned active treatment and measured weekly for 16 weeks – Weekly depression scores – 549 subjects with no missing baseline covariates. • Main questions of interest 1. Is a combined drug/psychotherapy treatment more effective than the only psychotherapy treatment in reducing depression? 2. Is initial severity an important predictor of patient improvement? 3. Do treatment and initial severity interact in their impact on the rate of improvement? • Current practices for treatment of major depression emphasizes symptom severity in determining the need for anti-depressant drugs. • For these 549 patients, about 30% (2840) of the possible measurements were missing, mostly intermittently. beginitemize • Several of the studies measured depression bi-weekly for part or all of the active phase of treatment so we have some observations missing by design. • Some subjects dropped out (about 16%). Some were missing completely at random (MCAR) but others were related to side effects of treatment or being so depressed that they were provided an alternative treatment. 459 460 Marginal Models Classes of Models • Consider a vector of observations, Yi = (Yi1 , . . . , Yini )T for i = 1, . . . , m individuals (or experimental units) for Longitudinal Data • Reduce the repeated measurements to a single value for each • Expect observations taken on the same individual (experimental unit) to be correlated individual and perform a univariate analysis • Marginal models • Random effects • Consider a linear model for the conditional means, eg, (hierarchical) models E(Yij ) = β0 + β1(xi) + β2tij + β3t2ij • Model the variances and covariances ⎛ • Transition models Vi = V ar 461 ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ Yi1 Yi2 .. Yi,ni ⎞ ⎡ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ = σ12 σ12 σ12 σ22 .. .. σni,1 σni,2 · · · σ1,ni · · · σ2,ni .. ··· 2 · · · σn i 462 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ • Consider the model: Yi ∼ N (Xiβ, Σi) i = 1,2,...,m Estimation in Marginal Models • Assume independent responses from different individuals ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ Y1 Y2 .. Ym ⎤ ⎛⎡ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎜⎢ ⎝⎣ ∼N ⎤ ⎡ X1β ⎥⎥ ⎢⎢ ⎥ ⎢ X2β ⎥⎥⎥ ⎢⎢⎢ ⎥,⎢ .. ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣ Xmβ V1 0 0 V2 .. .. 0 0 ··· ··· ... ··· 0 0 0 Vm • Good reference for likelihood estimation: Jennrich and Schluchter (1986) Biometrics ⎤⎞ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎥⎟ ⎦⎠ • Ordinary Least Squares: Minimize (Y −Xβ)T (Y −Xβ) = m (Yi−Xiβ)T (Yi−Xiβ) i=1 Set partial derivatives with respect to the elements of β equal to zero to obtain the estimating equations • The block diagonal covariance structure is important m • The covariance structure Vi can differ across individuals i=1 XiT (Yi − Xiβ) = 0 The unique solution is β̂ = (X T X)−1 X T Y ⎛ • Marginal models are used when inferences about β are of primary interest = m ⎜ ⎝ i=1 ⎞ XiT Xi⎟⎠ −1 m i=1 XiT Yi 463 • Maximum Likelihood Estimation: • Generalized Least Squares: The natural logarithm of the multivariate Gaussian likelihood is Minimize (Y − Xβ)T Σ−1(Y − Xβ) T −1 = m i=1(Yi − Xiβ) Vi (Yi − Xiβ) Set partial derivatives with respect to the elements of β equal to zero to obtain the estimating equations m i=1 XiT Vi−1(Yi − Xiβ) = 0 log(L(β, Σ)) = −1 m i=1 2 (2π)ni + log(|Vi |) + (Yi − Xiβ)T Vi−1(Yi − Xiβ) Set partial derivatives with respect to the elements of β equal to zero to obtain the estimating equations −1 m X T V −1(Yi − Xiβ) = 0 2 i=1 i i The unique solution is The unique solution is β̂ = (X T Σ−1X)−1 X T Σ−1Y ⎛ = m ⎜ ⎝ i=1 ⎞ XiT Vi−1Xi⎟⎠ −1 m i=1 XiT Vi−1Yi β̂ = (X T Σ−1X)−1 X T Σ−1Y ⎛ = m ⎜ ⎝ i=1 464 ⎞ XiT Vi−1Xi⎟⎠ −1 m i=1 XiT Vi−1Yi 465 • For given Σ, the mle for β is the generalized least squares estimator Restricted Maximum Likelihood Estimation (REML) • You can simultaneously obtain the maximum likelihood estimator Σ̂ for Σ • Maximizing a Gaussian likelihood that does not depend on E(Y) = Xβ . • If we plug in β̂ for β and Σ̂ for Σ we obtain • Maximize a likelihood function for “error contrasts” −2log(L(β, Σ)) = m i=1 (2π)ni + log(|V̂i|) + (Yi − Xiβ)T V̂i−1(Yi − Xiβ̂) • Likelihood ratio tests for comparing models • Maximum likelihood estimates of variance components tend to be too small – linear combinations of observations that do not depend on Xβ – will need a set of ⎛ m ⎜ ⎝ i=1 ⎞ ni⎟⎠ − rank(X) linearly independent “error contrasts” 466 467 Gaussian model: To avoid losing information we must have row rank(M ) = n − rank(X) Y ∼ N (Xβ, Σ) = n−p For a non-random matrix L LY ∼ N (L(Xβ, LΣLT ) Then a set of n − p error contrasts is Consequently, LY does not depend on Xβ if and only if LX = 0. But LX = 0 if and only if L = M (I − PX ) for some M with n = m n rows, where i=1 i PX = X(X T X)−X T 468 r = M (I − PX )Y ∼ Nn−p(0, M (I − PX )Σ−1(I − PX )M T ) call this W , then rank(W ) = n − p and W −1 exists. 469 For any M(n−p)×n with row rank equal to n − p = n − rank(X) The ”Restricted” likelihood is L(Σ; r) = 1 (2π)(n−p)/2 |W |1/2 1 T −1 e− 2 r W r the log-likelihood can be expressed in terms of e = (I − X(XΣ−1 X T )−1X T Σ−1)Y as 1 (Σ; e) = constant − log(|Σ|) 2 The resulting log-likelihood is (Σ; r) = −(n − p) 2 1 log(2π) − log|W | 2 1 − r T W −1r 2 1 1 − log(|X∗T Σ−1X∗|) − eT Σ−1e 2 2 where X∗ is any set of p =rank(X) linearly independent columns of X . Denote the resulting REML estimator as Σ̂REM L 470 471 Selecting Covariance Structure Estimation of fixed effects: • AIC: Akaike Information Criterion For any estimable function Cβ , the blue is the generalized least squares estimator – AIC=-2 loglik + 2*p (p is number of parameters) – When n is large, often favors models with too many parameters (penalty does not change with sample size) CbGLS = C(X T Σ−1X)−1 X T Σ−1Y An approximation is − T −1 C β̂ = C(X T Σ̂−1 REM L X) X Σ̂REM LY and for “large” samples: C β̂ ∼ ˙ N (Cβ, C(X T Σ−1X)−C T ) – In SAS, when specify ML, p is the number of fixed effects parameters plus the number of covariance parameters; when you specify REML, p is the number of covariance parameters if you specified the correct model for Σ – AICC - continuity corrected version (see Burnham and Anderson, 1998) 472 473 • BIC: Bayesian Information Criterion – BIC= -2 loglik + p*log(n) (n is the number of subjects) – based on approximation to the Bayes Factor – In SAS, when you specify ML, p is the number of fixed effects parameters + number of covariance parameters; in REML, p is the number of covariance parameters – In SAS, n is the number of subjects • Bayesian Analysis: specify prior distributions for β (often either a Gaussian prior or a non-informative prior) and Σ (inverse Wishart prior); for other choices see Leonard, 1992 Annals; Brown, Le, and Zidek, 1994; Daniels and Kass, 1999 JASA; Barnard, McCulloch, and Meng, 2000, Statistica Sinica • Empirical Bayes: estimate hyperparameters of prior distributions from the data 474 475 What if you assumed the wrong structure for Σ? What if you Mis-Specify Σ? • Often assume some parametric structure for Σ • Often get an estimator for β that – is consistent – Time series structures (AR, MA) – Structured antedependence models (SAD), (Zimmerman and NunezAnton, 1997) – has a large sample normal distribution – is not quite efficient – Compound Symmetry • Likelihood is a function paramters L(β, Σ(α)) of fewer • A consistent estimator of the large sample covariance matrix for the estimator of β is obtained from a sandwich variance estimator • More stable estimator for Σ (fewer parameters) and subsequently, a more stable estimator for β 476 477 Shrinkage Estimators Chen, 1979; Daniels and Kass (1999, 2001 Biometrics); Daniels and Pourahmadi (2001) The default in PROC MIXED in SAS is to take V ar(Y ) = σe2 I You can change this by using the REPEATED statement in PROC MIXED • Strategy: – Shrink toward the structure – Data determines shrinkage the amount of REPEATED / type = subject = subj (program) variables in the class statement – Many different parameterizations on which to shrink • Properties: rcorr; ↑ print the correlation martix for one subject r ↑ print the R matrix for one subject – Consistent – Asymptotic normality – Asymptotic efficiency 478 479 Compound Symmetry: (type = CS) ⎡ R= ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ Unstructured: (type = UN) σ12 + σ22 σ22 σ22 σ22 σ22 σ12 + σ22 σ22 σ22 2 2 2 2 σ2 σ2 σ1 + σ2 σ22 2 2 2 2 σ2 σ2 σ2 σ1 + σ22 Variance components: (type = VC) (default) ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ ⎡ R= ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ σ12 σ12 σ13 σ14 σ12 σ22 σ23 σ24 σ13 σ23 σ32 σ34 σ14 σ24 σ34 σ42 R= ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ σ12 0 0 0 0 σ22 0 0 0 0 σ32 0 0 0 0 σ42 ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ Toeplitz: (type = TOEP) ⎡ ⎡ ⎤ ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ R= 480 ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ σ2 σ1 σ2 σ3 σ1 σ2 σ1 σ2 σ2 σ1 σ2 σ1 σ3 σ2 σ1 σ2 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 481 Heterogeneous TOEPH) ⎡ R= ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ Toeplitz: (type = σ12 σ1σ2ρ1 σ1σ2ρ2 σ1σ4ρ3 σ2σ1ρ1 σ22 σ2σ3ρ1 σ2σ4ρ2 σ3σ1ρ2 σ3σ2ρ1 σ32 σ3σ4ρ1 σ4σ1ρ3 σ4σ2ρ2 σ4σ3ρ1 σ42 R= ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ σ12 σ2σ1ρ1 σ3σ1ρ2ρ1 Autoregressive: ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ (type ⎡ ⎤ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 1 ρ ρ2 ρ3 ρ 1 ρ ρ2 R = σ2 2 ρ ρ 1 ρ ρ3 ρ2 ρ 1 ⎤ = Heterogeneous AR(1): (type = ARH(1)) First order Ante-dependence: (type = ANTE(1)) ⎡ First Order AR(1)) ⎡ σ1σ2ρ1 σ1σ3ρ1ρ2 σ22 σ2σ3ρ2 σ3σ2ρ2 σ32 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ R= ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ σ12 σ1σ2ρ σ1σ3ρ2 σ1σ4ρ3 σ2σ1ρ σ22 σ2σ3ρ σ2σ4ρ2 2 σ3σ1ρ σ3σ2ρ σ32 σ3σ4ρ 3 2 σ4σ1ρ σ4σ2ρ σ4σ3ρ σ42 482 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 483 Fitting Marginal Models in SAS and Splus SAS: the MIXED procedure Spatial power: (type = sp(pow)(list)) ↑ list of variables defining coordinates ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 1 ρd12 R = σ2 ρd13 ρd14 ρd12 ρd13 ρd14 1 ρd23 ρd24 ρd23 1 ρd34 d d ρ 24 ρ 34 1 ⎤ /* Enter the cow protein data */ data set2; infile ’c:\st565\dhlz.example1_4.data’; input diet cow week protein; run; proc sort data=set2; by diet week; run; ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ proc means data=set2 noprint; by diet week; var protein; output out=means mean=pmean; run; where dij is the Euclidean distance between the i-th and j -th observations provided by one subject (or unit). You can replace pow with a number of other choices. 484 proc print data=means; run; axis1 label=(f=swiss h=1.8 a=90 r=0 "Protein (percent)") order = 2.5 to 4.5 by 0.5 value=(f=swiss h=1.8) w=3.0 length= 4.0in; axis2 label=(f=swiss h=2.0 "Time(weeks)") order = 0 to 20 by 5 value=(f=swiss h=1.8) w= 3.0 length = 6.5 in; 485 SYMBOL1 V=CIRCLE H=1.7 w=3 l=1 i=join ; SYMBOL2 V=DIAMOND H=1.7 w=3 l=3 i=join ; SYMBOL3 V=square H=1.7 w=3 l=9 i=join ; PROC GPLOT DATA=means; PLOT pmean*week=diet / vaxis=axis1 haxis=axis2; TITLE1 ls=0.01in H=2.0 F=swiss "Protein Content in Milk"; footnote ls=0.01in; RUN; /* perform a one-way anova at each time point */ proc sort data=set2; by week diet; run; proc glm data=set2; by week; class diet; model protein = diet / ss1 ss3; lsmeans diet / stderr pdiff; run; 486 487 ---------------------------- week=1 ----------------------------- ---------------------------- week=19 ---------------------------- The GLM Procedure The GLM Procedure Dependent Variable: protein Dependent Variable: protein Source DF Sum of Squares Model 2 0.24488572 0.12244286 Error 76 12.31921807 0.16209497 Corrected Total 78 12.56410380 F Value Pr > F Source DF Sum of Squares Mean Square F Value Pr>F 0.76 0.4733 Model 2 1.27469477 0.63734739 6.53 .0037 Error 38 3.71148571 0.09767068 Corrected Total 40 4.98618049 protein LSMEAN Standard Error Pr > |t| LSMEAN Number 3.88680000 3.86111111 3.75814815 0.08052204 0.07748237 0.07748237 <.0001 <.0001 <.0001 1 2 3 diet 1 2 3 Mean Square Least Squares Means for effect diet Pr > |t| for H0: LSMean(i)=LSMean(j) The GLM Procedure Least Squares Means protein LSMEAN Standard Error Pr > |t| LSMEAN Number 3.64000000 3.39571429 3.20571429 0.08667831 0.08352531 0.08352531 <.0001 <.0001 <.0001 1 2 3 diet 1 2 3 Least Squares Means for effect diet Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: protein i/j 1 2 3 1 0.8188 0.2532 2 3 0.8188 0.2532 0.3504 Dependent Variable: protein i/j 0.3504 NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used. 1 2 3 1 0.0495 0.0009 2 3 0.0495 0.0009 0.1160 0.1160 NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used. 488 489 The Mixed Procedure Model Information Data Set Dependent Variable Covariance Structure Subject Effect Estimation Method Residual Variance Method Fixed Effects SE Method Degrees of Freedom Method /* Fit cubic trends across time with a compound symmetry covariance structure */ WORK.SET2 protein Compound Symmetry cow(diet) REML Profile Model-Based Between-Within Class Level Information proc mixed data=set2; class diet cow; model protein = diet diet*week diet*week*week diet*week*week*week / noint s htype=1 outpm=means; repeated / type=cs sub=cow(diet); run; Class diet cow Levels Values 3 79 1 2 3 1 2 3 14 15 24 25 34 35 44 45 54 55 64 65 74 75 4 5 6 16 17 26 27 36 37 46 47 56 57 66 67 76 77 7 8 9 18 19 28 29 38 39 48 49 58 59 68 69 78 79 10 20 30 40 50 60 70 11 21 31 41 51 61 71 12 22 32 42 52 62 72 13 23 33 43 53 63 73 Iteration History 490 Iteration Evaluations -2 Res Log Like Criterion 0 1 2 1 2 1 730.59188696 438.77899831 438.77813937 0.00000086 0.00000000 491 Convergence criteria met. Covariance Parameter Estimates Cov Parm Subject CS Residual cow(diet) Estimate 0.02807 0.06546 Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) 438.8 442.8 442.8 447.5 /* Fit cubic trends across time with a general (unstructured) covariance structure */ Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 1 291.81 <.0001 Solution for Fixed Effects Effect diet diet 1 diet 2 diet 3 week*diet 1 week*diet 2 week*diet 3 week*w*diet 1 week*w*diet 2 week*w*diet 3 week*w*w*diet 1 week*w*w*diet 2 week*w*w*diet 3 Estimate 3.9264 3.8878 3.7948 -0.1601 -0.1890 -0.1696 0.0155 0.0192 0.0159 -0.0004 -0.0005 -0.0004 Standard Error 0.06812 0.06538 0.06549 0.02555 0.02459 0.02472 0.002994 0.002886 0.002901 0.000101 0.000097 0.000098 DF 76 76 76 1249 1249 1249 1249 1249 1249 1249 1249 1249 t Value 57.64 59.46 57.94 -6.27 -7.69 -6.86 5.18 6.67 5.49 -4.23 -5.77 -4.58 Pr > |t| <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 492 proc mixed data=set2; class diet cow; model protein = diet diet*week diet*week*week diet*week*week*week / noint s htype=1 outpm=means; repeated / type=un sub=cow(diet); run; 493 Iteration History Iteration Evaluations -2 Res Log Like Criterion 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 730.59188696 759.18351015 511.07147888 397.64762263 219.69572765 -12.98148491 -154.86883417 -239.82383162 -288.63157456 -314.96920815 -328.12928957 -334.33609104 -337.20776612 -338.32927625 -338.61339846 -338.64539572 -338.64626381 20883449819 7427110783.9 4375955610.8 0.15895475 0.08776816 0.04982468 0.02786239 0.01487291 0.00741391 0.00348648 0.00162686 0.00065487 0.00017419 0.00002111 0.00000061 0.00000000 Convergence criteria met. Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) UN(2,1) UN(2,2) UN(3,1) . . UN(19,16) UN(19,17) UN(19,18) UN(19,19) cow(diet) cow(diet) cow(diet) cow(diet) . . cow(diet) cow(diet) cow(diet) cow(diet) 0.1934 0.05180 0.07465 0.04706 . . 0.08295 0.05973 0.08104 0.1106 Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -338.6 41.4 105.4 491.5 The Mixed Procedure Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 189 1069.24 <.0001 Solution for Fixed Effects Effect diet Estimate Standard Error DF t Value diet diet diet week*diet week*diet week*diet week*week*diet week*week*diet week*week*diet week*week*week*diet week*week*week*diet week*week*week*diet 1 2 3 1 2 3 1 2 3 1 2 3 3.9106 3.8365 3.7323 -0.2073 -0.1988 -0.1827 0.02223 0.02222 0.01877 -0.00063 -0.00068 -0.00056 0.07821 0.07495 0.07492 0.03082 0.02969 0.02972 0.003453 0.003339 0.003334 0.000110 0.000106 0.000106 76 76 76 76 76 76 76 76 76 76 76 76 50.00 51.19 49.81 -6.73 -6.70 -6.15 6.44 6.65 5.63 -5.79 -6.40 -5.26 495 494 The Mixed Procedure /* To test for diet effects fit another form of the same model */ Type 1 Tests of Fixed Effects Effect diet week*diet week*week*diet week*week*week*diet Num DF Den DF F Value Pr > F 3 3 3 3 76 76 76 76 13105.8 6.13 21.47 34.06 <.0001 0.0009 <.0001 <.0001 496 proc mixed data=set2; class diet cow; model protein = diet week diet*week week*week diet*week*week week*week*week diet*week*week*week / s htype=3 outpm=means; repeated / type=un sub=cow(diet); run; 497 Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -338.6 41.4 105.4 491.5 Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 189 1069.24 <.0001 Solution for Fixed Effects Effect Intercept diet diet diet week week*diet week*diet week*diet week*week week*week*diet week*week*diet week*week*diet week*week*week week*week*week*diet week*week*week*diet week*week*week*diet diet 1 2 3 1 2 3 1 2 3 1 2 3 Estimate Standard Error DF t Value 3.7323 0.1783 0.1042 0 -0.1827 -0.02459 -0.01606 0 0.01877 0.003464 0.003449 0 -0.00056 -0.00008 -0.00012 0 0.07492 0.1083 0.1060 . 0.02972 0.04282 0.04201 . 0.003334 0.004800 0.004719 . 0.000106 0.000152 0.000150 . 76 76 76 . 76 76 76 . 76 76 76 . 76 76 76 . 49.81 1.65 0.98 . -6.15 -0.57 -0.38 . 5.63 0.72 0.73 . -5.26 -0.51 -0.82 . 498 /* To test for diet effects fit another form of the same model */ Type 3 Tests of Fixed Effects Effect diet week week*diet week*week week*week*diet week*week*week week*week*week*diet Num DF Den DF F Value Pr > F 2 1 2 1 2 1 2 76 76 76 76 76 76 76 1.38 127.73 0.17 116.91 0.35 101.49 0.35 0.2587 <.0001 0.8427 <.0001 0.7031 <.0001 0.7082 proc mixed data=set2; class diet cow; model protein = diet week diet*week week*week diet*week*week week*week*week diet*week*week*week / s htype=3 outpm=means; repeated / type=un sub=cow(diet); run; /* plot the fitted curves */ axis1 label=(f=swiss h=1.8 a=90 r=0 "Protein (percent)") order = 2.5 to 4.5 by .5 value=(f=swiss h=1.8) w=3.0 length= 4.0in; axis2 label=(f=swiss h=2.0 "Time(weeks)") order = 0 to 20 by 5 value=(f=swiss h=1.8) w= 3.0 length = 6.5 in; 499 500 SYMBOL1 V=CIRCLE H=1.7 w=3 l=1 i=join ; SYMBOL2 V=DIAMOND H=1.7 w=3 l=3 i=join ; SYMBOL3 V=square H=1.7 w=3 l=9 i=join ; PROC GPLOT DATA=means; PLOT pred*week=diet / vaxis=axis1 haxis=axis2; TITLE1 ls=0.01in H=2.0 F=swiss "Estimated Protein Content"; footnote ls=0.01in; RUN; 501 502 Solution for Fixed Effects Effect Intercept diet diet diet week week*diet week*diet week*diet week*week week*week*diet week*week*diet week*week*diet week*week*week week*week*week*diet week*week*week*diet week*week*week*diet /* To test for diet effects fit another form of the same model */ proc mixed data=set2; class diet cow; model protein = diet week diet*week week*week diet*week*week week*week*week diet*week*week*week / s htype=3 outpm=means df=kr; repeated / type=un sub=cow(diet); run; Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) -338.6 41.4 105.4 491.5 Chi-Square 1069.24 1 2 3 1 2 3 1 2 3 1 2 3 Estimate Standard Error DF t Value 3.7323 0.1783 0.1042 0 -0.1827 -0.02459 -0.01606 0 0.01877 0.003464 0.003449 0 -0.00056 -0.00008 -0.00012 0 0.09639 0.1395 0.1365 . 0.04044 0.05829 0.05725 . 0.004728 0.006805 0.006702 . 0.000156 0.000225 0.000222 . 66.5 68.2 66.1 . 48.1 50.3 47.7 . 39.3 41.3 39 . 35.4 37.2 35.1 . 38.72 1.28 0.76 . -4.52 -0.42 -0.28 . 3.97 0.51 0.51 . -3.56 -0.35 -0.56 . Effect Intercept diet diet diet week week*diet week*diet week*diet week*week week*week*diet week*week*diet week*week*diet week*week*week week*week*week*diet week*week*week*diet week*week*week*diet Null Model Likelihood Ratio Test DF 189 diet Pr > ChiSq <.0001 503 diet 1 2 3 1 2 3 1 2 3 1 2 3 Pr > |t| <.0001 0.2053 0.4477 . <.0001 0.6750 0.7804 . 0.0003 0.6135 0.6097 . 0.0011 0.7297 0.5810 . 504 # This file posted as milkprotein.ssc # # This code is applied to the milk protein # data from DHLZ, page 8. Type 3 Tests of Fixed Effects Effect diet week week*diet week*week week*week*diet week*week*week week*week*week*diet set1 <- read.table("c:/mydocuments/courses/st565/data/dhlz.example1_4.data", col.names=c("diet","cow","week","protein")) set1 Num DF Den DF F Value Pr > F 2 1 2 1 2 1 2 67.3 49.3 49.2 40.4 40.3 36.4 36.3 0.83 68.79 0.09 58.04 0.18 46.62 0.16 0.4403 <.0001 0.9118 <.0001 0.8394 <.0001 0.8537 # Create factors set1$dietf <- as.factor(set1$diet) set1$weekf <- as.factor(set1$week) set1$cowf <- as.factor(set1$cow) # Sort the data set by subject i <- order(set1$cow,set1$week) set1 <- set1[i,] # Delete the list called i rm(i) 505 506 Observed Protein Means Make a profile plot of the means Unix users should insert the motif( ) command 3.0 x.axis <- unique(set1$week) 3.4 # # # Protein (percent) means <- tapply(set1$protein, list(set1$week,set1$diet),mean) means 3.8 # Compute sample means par(fin=c(6.0,6.0),pch=18,mkh=.1,mex=1.5, cex=1.2,lwd=3) matplot(c(1,19), c(3.0,4.0), type="n", xlab="Time(weeks)", ylab="Protein (percent)", main= "Observed Protein Means") matlines(x.axis,means,type=’l’,lty=c(1,3,7)) matpoints(x.axis,means, pch=c(16,17,15)) legend(1,2.45,legend=c("Barley diet", ’Barley+lupins’,’Lupin diet’),lty=c(1,3,7),bty=’n’) 5 10 15 Time(weeks) Barley diet Barley+lupins Lupin diet 507 508 AIC BIC logLik 466.7781 539.4265 -219.3891 # # # # Correlation Structure: Compound symmetry Formula: ~ 1 | cow Parameter estimate(s): Rho 0.3001093 Use the gls( ) function to fit a model where the errors have a compound symmetry covariance structure within cows. Coefficients: options(contrasts=c("contr.treatment","contr.poly")) (Intercept) dietf2 dietf3 week I(week^2) I(week^3) dietf2week dietf3week dietf2I(week^2) dietf3I(week^2) dietf2I(week^3) dietf3I(week^3) set1.glscs <- gls(protein ~ dietf+ week+ dietf*week+week^2+dietf*week^2 + week^3 + dietf*week^3, data=set1, correlation = corCompSymm(form=~1|cow), method=c("REML")) summary(set1.glscs) anova(set1.glscs) Value 3.926357 -0.038575 -0.131599 -0.160102 0.015522 -0.000427 -0.028928 -0.009489 0.003723 0.000407 -0.000134 -0.000021 Std.Error 0.06811861 0.09441974 0.09449531 0.02554698 0.00299418 0.00010094 0.03545611 0.03554717 0.00415876 0.00416915 0.00014023 0.00014057 t-value p-value 57.64001 <.0001 -0.40855 0.6829 -1.39265 0.1640 -6.26697 <.0001 5.18417 <.0001 -4.23121 <.0001 -0.81587 0.4147 -0.26694 0.7896 0.89519 0.3708 0.09754 0.9223 -0.95631 0.3391 -0.14699 0.8832 509 510 Standardized residuals: Min Q1 Med Q3 Max -3.1085 -0.6782905 -0.03438773 0.6791416 3.42049 # Try an auto regressive covariance # structures across weeks within cows Residual standard error: 0.3058223 Degrees of freedom: 1337 total; 1325 residual Denom. DF: 1325 numDF F-value p-value (Intercept) 1 28938.35 <.0001 dietf 2 8.42 0.0002 week 1 20.06 <.0001 I(week^2) 1 109.03 <.0001 I(week^3) 1 71.03 <.0001 dietf:week 2 3.55 0.0290 dietf:I(week^2) 2 0.06 0.9459 dietf:I(week^3) 2 0.54 0.5831 set1.glsar <- gls(protein ~ dietf+ week+ dietf*week+week^2+dietf*week^2 + week^3 + dietf*week^3, data=set1, correlation = corAR1(form=~1|cow), method=c("REML")) summary(set1.glsar) anova(set1.glsar) 511 512 Generalized least squares fit by REML Model: protein ~ dietf + week + dietf * week + week^2 + dietf * week^2 + week^3 + dietf * week^3 Data: set1 AIC BIC logLik 156.8072 229.4556 -64.40361 Standardized residuals: Min Q1 Med Q3 Max -3.272627 -0.6151624 -0.003893045 0.699279 3.373619 Correlation Structure: AR(1) Formula: ~ 1 | cow Parameter estimate(s): Phi 0.6519978 Residual standard error: 0.31561 Degrees of freedom: 1337 total; 1325 residual Denom. DF: 1325 numDF F-value p-value (Intercept) 1 40939.16 <.0001 dietf 2 13.05 <.0001 week 1 44.73 <.0001 I(week^2) 1 49.32 <.0001 I(week^3) 1 54.95 <.0001 dietf:week 2 1.42 0.2427 dietf:I(week^2) 2 0.15 0.8591 dietf:I(week^3) 2 0.57 0.5629 Coefficients: (Intercept) dietf2 dietf3 week I(week^2) I(week^3) dietf2week dietf3week dietf2I(week^2) dietf3I(week^2) dietf2I(week^3) dietf3I(week^3) Value 4.037225 0.003778 -0.117365 -0.197838 0.018849 -0.000520 -0.048511 -0.019394 0.006141 0.002009 -0.000222 -0.000091 Std.Error 0.0862360 0.1198494 0.1198399 0.0371623 0.0044381 0.0001495 0.0517505 0.0518068 0.0061814 0.0061863 0.0002081 0.0002083 t-value p-value 46.81599 <.0001 0.03153 0.9749 -0.97935 0.3276 -5.32363 <.0001 4.24704 <.0001 -3.48125 0.0005 -0.93741 0.3487 -0.37435 0.7082 0.99346 0.3207 0.32471 0.7455 -1.06449 0.2873 -0.43779 0.6616 514 513 # # # Try a general correlation structure set1.glss <- gls(protein ~ dietf + week+ dietf*week +week^2+dietf*week^2 + week^3 + dietf*week^3, data=set1, correlation = corSymm(form=~1|cow), weights = varIdent(form = ~1 | weekf), method=c("REML")) summary(set1.glss) anova(set1.glss) # # Compare the fit of various covariance structures. anova(set1.glss, set1.glscs) anova(set1.glss, set1.glsar) anova(set1.glss, set1.glsar, set1.glsarh) # # # # Try an AR(1) correlation structure with hterogeneous variances To compare the continuous week model to the model where we fit a different mean at each time point, we must compare likelihood values instead of REML likelihood values. set1.glsarmle <- gls(protein ~ dietf+ weekf+dietf*weekf, data=set1, correlation = corAR1(form=~1|cow), method=c("ML")) set1.glss <- gls(protein ~ dietf + week+ dietf*week +week^2+dietf*week^2 + week^3 + dietf*week^3, data=set1, correlation = corAR1(form=~1|cow), weights = varIdent(form = ~ 1 | week), method=c("REML")) summary(set1.glsarh) anova(set1.glss) set1.glscarmle <- gls(protein ~ dietf+ week+ dietf*week + week^2 + dietf*week^2 + week^3 + dietf*week^3, data=set1, correlation = corAR1(form=~1|cow), method=c("ML")) 515 516 corStruct functions corCompSymm corSym corAR1 corCAR1 corARM A corEXP anova(set1.glsarmle,set1.glscarmle) set1.glsarmle set1.glscarmle set1.glsarmle set1.glacarmle Model df AIC BIC 1 59 19.19311 325.8859 2 14 29.85783 102.6324 logLik 49.40345 -0.92891 1 vs 2 100.6647 Test L.Ratio p-value 1 vs 2 100.6647 <.0001 <.0001 corGaus corLin corRatio corSpher 517 compound symmetry general autoregressive of order 1 continous time AR(1) autoregressive-moving average exponential 1 − exp(−s/ρ) expGaus 1 − exp[−(s/ρ)2 ] linear 1 − (1 − s/ρ)I(s < ρ) rational quadratic (s/ρ)2 /[1 + (s/ρ)2 ] spherical 1 − [1 − 1.5(s/ρ) + 0.5(sρ)3 ]I(s < ρ) 518