Marginal and Random Effects Models for Discrete Data

• Chapters 7-10 in Diggle, Heagerty, Liang and Zeger (DHLZ)
• Generalized Linear Models
  – Poisson regression (log-linear models)
  – Logistic regression models

• Suppose we have $i = 1, 2, \ldots, m$ groups of observations
• There are $n_i$ observations in the $i$-th group, denoted by $Y_{i1}, Y_{i2}, \ldots, Y_{i,n_i}$ for $i = 1, 2, \ldots, m$
• The observations in any group are independent of the observations in any other group
• Within a group, the observations may be correlated

Generalized Linear Models

Each model has a systematic part and a random part.
• Expectation of the response
$$E(Y_{ij}) = \mu_{ij} = h^{-1}(X_{ij}^T\beta)$$
  where $h$ is a known link function, or equivalently $h(\mu_{ij}) = X_{ij}^T\beta$
• Specify the distribution of $Y_{ij}$
  – $Y_{ij} \sim \text{Poisson}(\mu_{ij})$
  – $Y_{ij} \sim \text{Binomial}(n_{ij}, \pi_{ij})$

Example: Logistic regression

• $Y_{ij} \sim \text{Binomial}(n_{ij}, \pi_{ij})$
• $E(Y_{ij}) = \mu_{ij} = n_{ij}\pi_{ij}$
• Logit link function:
$$h(\mu_{ij}) = \log\left(\frac{\mu_{ij}}{n_{ij} - \mu_{ij}}\right) = \log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)$$
•
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1X_{1ij} + \cdots + \beta_kX_{kij} = X_{ij}^T\beta \quad\Longrightarrow\quad \pi_{ij} = \frac{\exp(X_{ij}^T\beta)}{1 + \exp(X_{ij}^T\beta)}$$
• $\exp(\beta_r)$ is a conditional odds ratio: the odds of success with $X_r + 1$ divided by the odds of success with $X_r$, when all other variables are held constant

If the $Y_{ij}$ are mutually independent, the likelihood function is
$$L(\beta) = \prod_{i=1}^{m}\prod_{j=1}^{n_i}\binom{n_{ij}}{Y_{ij}}\pi_{ij}^{Y_{ij}}(1 - \pi_{ij})^{n_{ij} - Y_{ij}}, \qquad \pi_{ij} = \frac{\exp(X_{ij}^T\beta)}{1 + \exp(X_{ij}^T\beta)}$$
and the log-likelihood function is
$$\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[\log(n_{ij}!) - \log(Y_{ij}!) - \log((n_{ij} - Y_{ij})!)\right] + \sum_{i=1}^{m}\sum_{j=1}^{n_i}Y_{ij}\log(\pi_{ij}) + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(n_{ij} - Y_{ij})\log(1 - \pi_{ij}) \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[\log(n_{ij}!) - \log(Y_{ij}!) - \log((n_{ij} - Y_{ij})!)\right] + \sum_{i=1}^{m}\sum_{j=1}^{n_i}Y_{ij}\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) + \sum_{i=1}^{m}\sum_{j=1}^{n_i}n_{ij}\log(1 - \pi_{ij}) \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[\log(n_{ij}!) - \log(Y_{ij}!) - \log((n_{ij} - Y_{ij})!)\right] + \sum_{i=1}^{m}\sum_{j=1}^{n_i}Y_{ij}X_{ij}^T\beta - \sum_{i=1}^{m}\sum_{j=1}^{n_i}n_{ij}\log\left(1 + \exp(X_{ij}^T\beta)\right)
\end{aligned}$$

Likelihood Equations:
$$0 = \frac{\partial\ell}{\partial\beta_0} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}Y_{ij} - \sum_{i=1}^{m}\sum_{j=1}^{n_i}n_{ij}\frac{\exp(X_{ij}^T\beta)}{1 + \exp(X_{ij}^T\beta)} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(Y_{ij} - n_{ij}\pi_{ij})$$
$$0 = \frac{\partial\ell}{\partial\beta_r} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}Y_{ij}X_{rij} - \sum_{i=1}^{m}\sum_{j=1}^{n_i}n_{ij}X_{rij}\frac{\exp(X_{ij}^T\beta)}{1 + \exp(X_{ij}^T\beta)} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}X_{rij}(Y_{ij} - n_{ij}\pi_{ij})$$
for $r = 1, 2, \ldots, k$.

For the $i$-th group of responses let
$$X_i = \begin{bmatrix} X_{i1}^T \\ X_{i2}^T \\ \vdots \\ X_{i,n_i}^T \end{bmatrix}, \qquad Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{i,n_i} \end{bmatrix}, \qquad \mu_i = \begin{bmatrix} n_{i1}\pi_{i1} \\ n_{i2}\pi_{i2} \\ \vdots \\ n_{i,n_i}\pi_{i,n_i} \end{bmatrix}$$
Stack these on top of each other to form a model matrix and vectors of responses and estimated means for the entire data set:
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{bmatrix}$$
The likelihood equations are
$$0 = Q = X^T(Y - \mu) = \sum_{i=1}^{m}X_i^T(Y_i - \mu_i)$$

Second partial derivatives of the log-likelihood function are:
$$H = \begin{bmatrix} -\dfrac{\partial^2\ell}{\partial\beta_0^2} & -\dfrac{\partial^2\ell}{\partial\beta_0\partial\beta_1} & \cdots & -\dfrac{\partial^2\ell}{\partial\beta_0\partial\beta_k} \\ \vdots & \vdots & \ddots & \vdots \\ -\dfrac{\partial^2\ell}{\partial\beta_k\partial\beta_0} & -\dfrac{\partial^2\ell}{\partial\beta_k\partial\beta_1} & \cdots & -\dfrac{\partial^2\ell}{\partial\beta_k^2} \end{bmatrix} = X^TVX = \sum_{i=1}^{m}X_i^TV_iX_i$$
where
$$V_i = \operatorname{diag}\left(n_{i1}\pi_{i1}(1 - \pi_{i1}), \ldots, n_{i,n_i}\pi_{i,n_i}(1 - \pi_{i,n_i})\right)$$
for $i = 1, 2, \ldots, m$, and
$$V = \operatorname{Var}(Y) = \operatorname{diag}(V_1, \ldots, V_m)$$
$H$ is the Fisher information matrix, and it does not depend on the observed counts.

Likelihood equations are solved iteratively
• Usually no closed form solution exists
• The Newton-Raphson algorithm is equivalent to Fisher scoring (because $H$ does not depend on the observed counts)
$$\hat\beta^{(S+1)} = \hat\beta^{(S)} + \hat H^{-1}\hat Q$$
  where $\hat H$ and $\hat Q$ are evaluated at $\hat\beta^{(S)}$
• Modification (halving):
  – Check if the log-likelihood is larger at $\hat\beta^{(S+1)}$ than at $\hat\beta^{(S)}$
  – If it is, go to the next iteration
  – If not, try $\hat\beta^{(S+1)} = \frac{1}{2}\left(\hat\beta^{(S)} + \hat\beta^{(S+1)}\right)$ and repeat the check
• Starting values: $\hat\beta_r^{(0)} = 0$ for $r = 1, 2, \ldots, k$, and
$$\hat\beta_0^{(0)} = \log\left(\frac{\hat P}{1 - \hat P}\right), \qquad \hat P = \frac{\sum_{i}\sum_{j}Y_{ij}}{\sum_{i}\sum_{j}n_{ij}}$$

The likelihood equations can be written as
$$0 = X^T(Y - \mu) = X^TVV^{-1}(Y - \mu) = \sum_{i=1}^{m}X_i^TV_iV_i^{-1}(Y_i - \mu_i) = \sum_{i=1}^{m}D_iV_i^{-1}(Y_i - \mu_i)$$
where $V_i = \operatorname{Var}(Y_i)$ and
$$D_i = X_i^TV_i = \begin{bmatrix} \dfrac{\partial\mu_{i1}}{\partial\beta_0} & \cdots & \dfrac{\partial\mu_{i,n_i}}{\partial\beta_0} \\ \dfrac{\partial\mu_{i1}}{\partial\beta_1} & \cdots & \dfrac{\partial\mu_{i,n_i}}{\partial\beta_1} \\ \vdots & & \vdots \\ \dfrac{\partial\mu_{i1}}{\partial\beta_k} & \cdots & \dfrac{\partial\mu_{i,n_i}}{\partial\beta_k} \end{bmatrix}$$
and
$$\mu_{ij} = n_{ij}\pi_{ij} = n_{ij}\frac{\exp(X_{ij}^T\beta)}{1 + \exp(X_{ij}^T\beta)}$$
The identity $D_i = X_i^TV_i$ holds because $\partial\mu_{ij}/\partial\beta_r = X_{rij}\,n_{ij}\pi_{ij}(1 - \pi_{ij})$.

Generalized Estimating Equations (GEE)

To estimate $\beta$ solve
$$0 = \sum_{i=1}^{m}D_iV_i^{-1}(Y_i - \mu_i)$$
where $V_i = \operatorname{Var}(Y_i)$ is now a full covariance matrix that allows correlation within group $i$, $D_i$ is the $(k+1) \times n_i$ matrix of partial derivatives $\partial\mu_{ij}/\partial\beta_r$ shown above, and $\mu_{ij} = n_{ij}\pi_{ij} = n_{ij}\exp(X_{ij}^T\beta)/(1 + \exp(X_{ij}^T\beta))$.

Marginal Model for Correlated Binary Responses

Suppose several measurements are taken on the same subject or experimental unit
• Examine a patient for the presence or absence of a rash at different points in time
• Mastitis infections in different lactation periods of a dairy cow
• Survival of ducklings to migration time
• Presence or absence of tumors in different organs

Suppose patients treated for a recurring rash are examined at a series of time points. For the $i$-th patient at the $j$-th time point
$$Y_{ij} = \begin{cases} 1 & \text{if rash is present} \\ 0 & \text{if rash is absent} \end{cases}$$
Assume $Y_{ij} \sim \text{Binomial}(1, \pi_{ij})$ where
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1X_{1ij} + \cdots + \beta_kX_{kij}$$
Specify a covariance matrix for $Y_i = (Y_{i1}, \ldots, Y_{i,n_i})^T$:
$$\operatorname{Var}(Y_{ij}) = \phi\,\pi_{ij}(1 - \pi_{ij}), \qquad \operatorname{Cov}(Y_{ij}, Y_{ir}) = \phi\,\rho_{jr}\sqrt{\pi_{ij}(1 - \pi_{ij})\,\pi_{ir}(1 - \pi_{ir})}$$
Some possibilities are:
$$\rho_{jr} = \rho, \qquad \rho_{jr} = \rho^{|j - r|}, \qquad \rho_{jr} = \rho^{|t_j - t_r|^{\alpha}}$$

• The marginal variance depends on the marginal mean, $\operatorname{Var}(Y_{ij}) = \phi\,\nu(\pi_{ij})$, where $\nu$ is a known variance function and $\phi$ is a scale parameter that may have to be estimated, e.g. $\nu(\pi_{ij}) = \pi_{ij}(1 - \pi_{ij})$ and $\phi = 1$
• The correlation between $Y_{ij}$ and $Y_{ir}$ is $\operatorname{Corr}(Y_{ij}, Y_{ir}) = \rho(\pi_{ij}, \pi_{ir}; \alpha)$, where $\rho$ is a known function

You must estimate association parameters (e.g. $\alpha$) so you can evaluate $V_i$ in the GEEs.
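The independence-model machinery from the logistic-regression slides (score $Q = X^T(Y - \mu)$, information $H = X^TVX$, Fisher scoring with step halving, and the starting values $\hat\beta_0^{(0)} = \log(\hat P/(1 - \hat P))$, $\hat\beta_r^{(0)} = 0$) can be sketched in plain Python. This is a minimal illustration, not the GENMOD implementation: the data are assumed to be `(x, y, n)` triples with the covariate row `x` starting with a 1 for the intercept, and the function names, the tolerance and the cap of 30 halvings are illustrative choices.

```python
import math


def softplus(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is small and square)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loglik(data, beta):
    """Binomial log-likelihood sum[Y x'beta - n log(1 + exp(x'beta))], dropping factorial constants."""
    total = 0.0
    for x, y, n in data:
        eta = sum(a * b for a, b in zip(x, beta))
        total += y * eta - n * softplus(eta)
    return total


def fisher_scoring(data, tol=1e-10, max_iter=50):
    """data: list of (x, y, n) with x = [1, x1, ..., xk]; returns the MLE of beta."""
    p = len(data[0][0])
    p_hat = sum(y for _, y, _ in data) / sum(n for _, _, n in data)
    beta = [math.log(p_hat / (1.0 - p_hat))] + [0.0] * (p - 1)  # starting values
    for _ in range(max_iter):
        q = [0.0] * p                       # Q = X^T (Y - mu)
        h = [[0.0] * p for _ in range(p)]   # H = X^T V X
        for x, y, n in data:
            eta = sum(a * b for a, b in zip(x, beta))
            pi = math.exp(eta - softplus(eta))
            for r in range(p):
                q[r] += x[r] * (y - n * pi)
                for c in range(p):
                    h[r][c] += n * pi * (1.0 - pi) * x[r] * x[c]
        step = solve(h, q)
        new = [b + s for b, s in zip(beta, step)]
        old_ll, halvings = loglik(data, beta), 0
        while loglik(data, new) < old_ll and halvings < 30:  # step halving
            new = [(b + nb) / 2.0 for b, nb in zip(beta, new)]
            halvings += 1
        beta = new
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

For two groups with 3 of 10 and 7 of 10 successes, `fisher_scoring([([1.0, 0.0], 3, 10), ([1.0, 1.0], 7, 10)])` should converge to the saturated-model values $\hat\beta_0 = \log(3/7) \approx -0.847$ and $\hat\beta_1 = 2\log(7/3) \approx 1.695$.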
• Get an initial estimate $\hat\beta_{IWM}$ by solving the independence working model (IWM) equations
$$0 = \sum_{i=1}^{m}D_iV_{IWM,i}^{-1}(Y_i - \mu_i)$$
  where $V_{IWM,i}$ is the diagonal part of $V_i = \operatorname{Var}(Y_i)$
• Obtain $\hat\mu_{IWM,i}$ by evaluating $\mu_i$ at $\hat\beta_{IWM}$
• Use values of $(Y_{ij} - \hat\mu_{IWM,ij})/\sqrt{\widehat{\operatorname{Var}}(Y_{ij})}$ to estimate association parameters (see page 147 in DHLZ)
• Use the estimates of the association parameters to evaluate $\hat V_i$
• Get an updated estimate $\hat\beta_{GEE}$ by solving
$$0 = \sum_{i=1}^{m}\hat D_i\hat V_i^{-1}(Y_i - \mu_i)$$
• The large sample (large $m$) distribution of $\hat\beta_{GEE}$ is approximately multivariate normal with expectation $\beta$ and covariance matrix (model-based estimate)
$$\left[\sum_{i=1}^{m}\hat D_i\hat V_i^{-1}\hat D_i^T\right]^{-1}$$
• A robust (empirical) estimate of the covariance matrix for $\hat\beta_{GEE}$, which remains consistent even if the working correlation structure is misspecified, is
$$\left[\sum_{i=1}^{m}\hat D_i\hat V_i^{-1}\hat D_i^T\right]^{-1}\left[\sum_{i=1}^{m}\hat D_i\hat V_i^{-1}(Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^T\hat V_i^{-1}\hat D_i^T\right]\left[\sum_{i=1}^{m}\hat D_i\hat V_i^{-1}\hat D_i^T\right]^{-1}$$

/* Analyze the data from a crossover trial on cerebrovascular deficiency
   from example 8.1 in DHLZ */

/* THE VARIABLES WERE CODED AS FOLLOWS:
     PATIENT = CLUSTER VARIABLE (PATIENT ID)
     CLASS   = 1, NEEDED TO RUN GEE2
     Y       = 1  abnormal electrocardiogram
               0  normal electrocardiogram
     INT     = 1  (INTERCEPT)
     X1      = 1  placebo (treatment B)
               0  active drug A
     X2      = 1  if period 2
               0  if period 1
     X12     = X1*X2
     ORDER   = 1  if B before A
               0  if A before B
   REFERENCE: JONES AND KENWARD (1989) CHAPMAN AND HALL, P.9 */

data set1;
  infile "c:\stat565\dhlz.example8_1.data";
  input patient class y int x1 x2 x12 order;
  /* Recode the treatment variable */
  x1 = abs(1-x1);
  x12 = x1*x2;
run;

/* First fit a model with an order effect */
proc genmod data=set1 descending;
  class patient;
  model y = x1 x2 x12 / dist=binomial link=logit itprint pscale
                        converge=1e-8 maxit=50;
  repeated subject=patient / type=un modelse covb corrw;
run;

The GENMOD Procedure

Model Information
  Data Set              WORK.SET1
  Distribution          Binomial
  Link Function         Logit
  Dependent Variable    y
  Observations Used     134

Class Level Information
  Class      Levels   Values
  patient    67       1 2 3 ... 67

Response Profile
  Ordered Value   y   Total Frequency
  1               1   92
  2               0   42

PROC GENMOD is modeling the probability that y='1'.

Criteria For Assessing Goodness Of Fit
  Criterion             DF   Value       Value/DF
  Deviance              130  162.0983    1.2469
  Scaled Deviance       130  162.0983    1.2469
  Pearson Chi-Square    130  133.9999    1.0308
  Scaled Pearson X2     130  133.9999    1.0308
  Log Likelihood             -81.0491

Analysis Of Initial Parameter Estimates
                       Standard   Wald 95%
  Parameter  DF  Estimate  Error  Confidence Limits   ChiSquare  Pr > ChiSq
  Intercept  1    0.4308  0.3563  -0.2675   1.1290      1.46      0.2266
  x1         1    1.1097  0.5738  -0.0151   2.2344      3.74      0.0531
  x2         1    0.1754  0.5057  -0.8158   1.1665      0.12      0.7288
  x12        1   -1.0226  0.7710  -2.5338   0.4885      1.76      0.1847
  Scale      0    1.0000  0.0000   1.0000   1.0000
NOTE: The scale parameter was held fixed.

GEE Model Information
  Correlation Structure           Unstructured
  Subject Effect                  patient (67 levels)
  Number of Clusters              67
  Correlation Matrix Dimension    2
  Maximum Cluster Size            2
  Minimum Cluster Size            2

Last Evaluation Of The Generalized Gradient And Hessian
         Gradient    Prm1        Prm2        Prm3        Prm4
  Prm1   1.882E-11   17.376763   5.8342866   10.618293   4.1806177
  Prm2   9.522E-11   5.8342866   20.797139   5.7067855   12.425128
  Prm3   -7.64E-11   10.618293   5.7067855   25.581146   12.425128
  Prm4   -1.3E-15    4.1806177   12.425128   12.425128   12.425128

Covariance Matrix (Model-Based)
         Prm1       Prm2       Prm3       Prm4
  Prm1    0.12692   -0.12692   -0.12692    0.21114
  Prm2   -0.12692    0.32930    0.23027   -0.51687
  Prm3   -0.12692    0.23027    0.25571   -0.44328
  Prm4    0.21114   -0.51687   -0.44328    0.96959

Covariance Matrix (Empirical)
         Prm1       Prm2       Prm3       Prm4
  Prm1    0.12692   -0.12692   -0.12692    0.20769
  Prm2   -0.12692    0.32930    0.22811   -0.51126
  Prm3   -0.12692    0.22811    0.25571   -0.43767
  Prm4    0.20769   -0.51126   -0.43767    0.95837

Working Correlation Matrix
         Col1     Col2
  Row1   1.0000   0.6402
  Row2   0.6402   1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept   0.4308    0.3563     -0.2675   1.1290      1.21   0.2266
  x1          1.1097    0.5739     -0.0151   2.2344      1.93   0.0531
  x2          0.1754    0.5057     -0.8158   1.1665      0.35   0.7288
  x12        -1.0227    0.9790     -2.9414   0.8961     -1.04   0.2962

Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept   0.4308    0.3563     -0.2675   1.1290      1.21   0.2266
  x1          1.1097    0.5739     -0.0151   2.2344      1.93   0.0531
  x2          0.1754    0.5057     -0.8158   1.1665      0.35   0.7288
  x12        -1.0227    0.9847     -2.9526   0.9073     -1.04   0.2990
  Scale       1.0000    .           .        .           .      .

/* Fit a model ignoring any order effect */
proc genmod data=set1 descending;
  class patient;
  model y = x1 x2 / dist=binomial link=logit itprint pscale;
  repeated subject=patient / type=un modelse covb corrw;
run;

Criteria For Assessing Goodness Of Fit
  Criterion             DF   Value       Value/DF
  Deviance              131  163.8863    1.2510
  Scaled Deviance       131  163.8863    1.2510
  Pearson Chi-Square    131  133.5123    1.0192
  Scaled Pearson X2     131  133.5123    1.0192
  Log Likelihood             -81.9432

Analysis Of Initial Parameter Estimates
                       Standard   Wald 95%
  Parameter  DF  Estimate  Error  Confidence Limits   ChiSquare  Pr > ChiSq
  Intercept  1    0.6604  0.3213   0.0307   1.2901      4.23      0.0398
  x1         1    0.5582  0.3784  -0.1835   1.2998      2.18      0.1402
  x2         1   -0.2743  0.3768  -1.0129   0.4642      0.53      0.4666
  Scale      0    1.0000  0.0000   1.0000   1.0000
NOTE: The scale parameter was held fixed.

Working Correlation Matrix
         Col1     Col2
  Row1   1.0000   0.6389
  Row2   0.6389   1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept   0.6660    0.2879      0.1017   1.2303      2.31   0.0207
  x1          0.5690    0.2327      0.1129   1.0252      2.45   0.0145
  x2         -0.2953    0.2311     -0.7483   0.1577     -1.28   0.2013

Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept   0.6660    0.2842      0.1090   1.2231      2.34   0.0191
  x1          0.5690    0.2288      0.1206   1.0174      2.49   0.0129
  x2         -0.2953    0.2272     -0.7405   0.1499     -1.30   0.1936
  Scale       1.0000    .           .        .           .      .
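The GEE recipe listed before the crossover example (IWM fit, moment estimate of the association parameter from Pearson residuals, refit, then model-based and robust covariance) can be sketched for binary responses. This is a minimal illustration under stated assumptions, not what GENMOD does internally: it assumes an exchangeable working correlation with $\phi = 1$ (with two observations per cluster, as in the crossover trial, this is the same as an unstructured correlation), and the moment estimator for $\rho$ is a simple average of residual cross-products without degrees-of-freedom corrections, so its output need not match the SAS tables exactly. All function names are hypothetical.

```python
import math


def inverse(a):
    """Invert a small square matrix (Gauss-Jordan elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def exch_inverse(n, rho):
    """Inverse of the n x n exchangeable correlation matrix (1 - rho) I + rho J."""
    c = rho / (1.0 + (n - 1) * rho)
    return [[((1.0 if j == k else 0.0) - c) / (1.0 - rho) for k in range(n)] for j in range(n)]

def cluster_terms(xs, ys, beta, rho):
    """D_i V_i^-1 D_i^T, D_i V_i^-1 (Y_i - mu_i), residuals and sqrt(pi(1-pi)) for one cluster."""
    n, p = len(ys), len(beta)
    pis = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x, beta)))) for x in xs]
    sd = [math.sqrt(q * (1.0 - q)) for q in pis]
    rinv = exch_inverse(n, rho)
    # V_i^-1 = A^-1/2 R^-1 A^-1/2 with A = diag(pi(1 - pi)) and phi = 1
    vinv = [[rinv[j][k] / (sd[j] * sd[k]) for k in range(n)] for j in range(n)]
    # D_i is p x n_i: d mu_ij / d beta_r = x_ijr * pi_ij * (1 - pi_ij)
    d = [[xs[j][r] * sd[j] ** 2 for j in range(n)] for r in range(p)]
    dv = matmul(d, vinv)
    resid = [y - q for y, q in zip(ys, pis)]
    info = matmul(dv, [list(col) for col in zip(*d)])
    score = [sum(dv[r][j] * resid[j] for j in range(n)) for r in range(p)]
    return info, score, resid, sd

def solve_gee(clusters, beta, rho, max_iter=50, tol=1e-10):
    """Fisher-scoring-type updates for 0 = sum_i D_i V_i^-1 (Y_i - mu_i) with rho held fixed."""
    p = len(beta)
    for _ in range(max_iter):
        info, score = [[0.0] * p for _ in range(p)], [0.0] * p
        for xs, ys in clusters:
            inf_i, sc_i, _, _ = cluster_terms(xs, ys, beta, rho)
            info = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(info, inf_i)]
            score = [u + v for u, v in zip(score, sc_i)]
        step = [sum(h * s for h, s in zip(row, score)) for row in inverse(info)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def estimate_rho(clusters, beta):
    """Simple moment estimator: mean product of Pearson residuals over within-cluster pairs."""
    num, pairs = 0.0, 0
    for xs, ys in clusters:
        _, _, resid, sd = cluster_terms(xs, ys, beta, 0.0)
        e = [r / s for r, s in zip(resid, sd)]
        for j in range(len(e)):
            for k in range(j + 1, len(e)):
                num, pairs = num + e[j] * e[k], pairs + 1
    return num / pairs

def sandwich(clusters, beta, rho):
    """Model-based [sum D V^-1 D^T]^-1 and robust (empirical) sandwich covariance."""
    p = len(beta)
    info, meat = [[0.0] * p for _ in range(p)], [[0.0] * p for _ in range(p)]
    for xs, ys in clusters:
        inf_i, sc_i, _, _ = cluster_terms(xs, ys, beta, rho)
        info = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(info, inf_i)]
        # D_i V_i^-1 (Y_i - mu_i)(Y_i - mu_i)^T V_i^-1 D_i^T = outer product of the cluster score
        meat = [[meat[r][c] + sc_i[r] * sc_i[c] for c in range(p)] for r in range(p)]
    model = inverse(info)
    return model, matmul(matmul(model, meat), model)

def gee(clusters, p, rounds=20):
    """IWM fit, then alternate moment estimation of rho and re-solving the GEE."""
    beta = solve_gee(clusters, [0.0] * p, 0.0)  # independence working model (IWM)
    rho = estimate_rho(clusters, beta)
    for _ in range(rounds):
        beta = solve_gee(clusters, beta, rho)
        rho = estimate_rho(clusters, beta)
    model, robust = sandwich(clusters, beta, rho)
    return beta, rho, model, robust
```

Each cluster is a pair `(rows, responses)`. As a check on a made-up intercept-only example with four pairs, `gee([([[1.0], [1.0]], [1, 1]), ([[1.0], [1.0]], [1, 0]), ([[1.0], [1.0]], [0, 0]), ([[1.0], [1.0]], [1, 1])], 1)` returns an intercept equal to $\log(5/3)$, the logit of the overall proportion $5/8$, whatever the estimated $\rho$; the robust variance also does not depend on $\rho$ in that case, while the model-based variance does.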
/* Fit a model assuming completely independent responses */
proc genmod data=set1 descending;
  class patient;
  model y = x1 x2 / dist=binomial link=logit itprint robust aggregate=patient;
run;

Criteria For Assessing Goodness Of Fit
  Criterion             DF   Value       Value/DF
  Deviance              131  163.8863    1.2510
  Scaled Deviance       131  163.8863    1.2510
  Pearson Chi-Square    131  133.5123    1.0192
  Scaled Pearson X2     131  133.5123    1.0192
  Log Likelihood             -81.9432

Analysis Of Parameter Estimates
                       Standard   Wald 95%
  Parameter  DF  Estimate  Error  Confidence Limits   ChiSquare  Pr>ChiSq
  Intercept  1    0.6604  0.3293   0.0151   1.3057      4.02      0.0449
  x1         1    0.5582  0.3781  -0.1829   1.2992      2.18      0.1399
  x2         1   -0.2743  0.3765  -1.0122   0.4636      0.53      0.4662
  Scale      0    1.0000  0.0000   1.0000   1.0000

Respiratory Infection in Indonesian Preschool Children

• Sommer et al. (1984) Amer. Jour. of Clinical Nutrition, 40, 1090-1095.
• DHLZ, Example 8.4
• Is prevalence of respiratory infection higher among children who suffer from xerophthalmia, a manifestation of chronic vitamin A deficiency?
• Does prevalence of respiratory infection change with age?
• 275 preschool children
• Examined in up to six consecutive quarters

Cross-sectional Analysis

• Use only data measured at entry into the study
• Fit a logistic regression model (standard errors in parentheses)
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \underset{(0.36)}{-1.47} - \underset{(0.44)}{0.66}(\text{sex}) - \underset{(0.041)}{0.11}(\text{height}) + \underset{(1.15)}{0.44}(\text{xerophthalmia}) - \underset{(0.027)}{0.089}(\text{age}) - \underset{(0.0011)}{0.0026}(\text{age}^2)$$
• The sex and xerophthalmia effects are not significant

Longitudinal and cross-sectional analysis

• Separate time into two components
  – $\text{age}_{i1}$ = age at entry into the study
  – $(\text{age}_{ij} - \text{age}_{i1})$ = time since entry
• Fit the model (standard errors in parentheses)
$$\begin{aligned}
\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = {} & \underset{(0.32)}{-2.21} - \underset{(0.24)}{0.53}(\text{sex}_i) - \underset{(0.024)}{0.048}(\text{height}_i) + \underset{(0.45)}{0.53}(\text{xerophthalmia}_i) \\
& - \underset{(0.013)}{0.053}(\text{age}_{i1}) - \underset{(0.0005)}{0.0013}(\text{age}_{i1}^2) - \underset{(0.071)}{0.19}(\text{age}_{ij} - \text{age}_{i1}) + \underset{(0.004)}{0.013}(\text{age}_{ij} - \text{age}_{i1})^2
\end{aligned}$$
• No significant effect of presence of xerophthalmia on incidence of respiratory infection
• Incidence of respiratory infection increases up to about 20 months, then it declines
• The quadratic follow-up time effect can be explained by seasonality (higher incidence of respiratory infection in summer)
• Association among repeated measures
$$\log(\hat\gamma) = \underset{(0.26)}{0.49}$$
  where
$$\gamma = \frac{\Pr(Y_j = 1, Y_k = 1)\Pr(Y_j = 0, Y_k = 0)}{\Pr(Y_j = 0, Y_k = 1)\Pr(Y_j = 1, Y_k = 0)} \quad \text{for all } (j, k)$$

Quail Egg Predation

• Examine effects of local environmental conditions on predation rates
• Sampling units: 136 transects
  – Along Iowa roads
  – Each transect is divided into two sub-units
    ∗ foreslope
    ∗ backslope

Transect factors:
• $x_1$ = 0 paved road; 1 unpaved road
• $x_2$ = 0 row crops; 1 other
• $x_3$ = 0 fence or trees; 1 no fence or trees
• $x_4$ = 0 trees; 1 no trees
Sub-transect factor:
• $Z$ = 0 foreslope; 1 backslope

Binary Response
$$Y_{ijk} = \begin{cases} 1 & \text{nest was disturbed} \\ 0 & \text{otherwise} \end{cases}$$
A logistic regression model for the $k$-th nest in the $j$-th sub-transect of the $i$-th transect:
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i} + \beta_3X_{3i} + \beta_4X_{4i} + \beta_5X_{1i}X_{2i} + \beta_6X_{2i}X_{4i} + \beta_7X_{1i}X_{3i} + \beta_8Z_j$$
The conditional expectation given the values of the transect and sub-transect variables is $E(Y_{ijk}) = \pi_{ij}$, the conditional probability that a nest is disturbed. This is the same for all nests in a particular sub-transect.
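The nest model uses three product terms and a sub-transect indicator, and its probability $\pi_{ij}$ depends only on transect and sub-transect covariates. A minimal sketch of how one design row and $\pi_{ij}$ are formed, assuming the coefficient order Intercept, $x_1, \ldots, x_4$, $x_1x_2$, $x_2x_4$, $x_1x_3$, $Z$ used by the SAS program for these data; the numbers in `BETA` are the rounded IWM estimates printed in the nest-predation GENMOD output later in these notes, and the function names are hypothetical.

```python
import math

# Rounded IWM estimates (Prm1..Prm9) from the nest-predation GENMOD output.
BETA = [-1.5487, 0.0323, -1.0051, 0.1522, -0.2727, 0.6562, 1.3506, -1.1172, 0.7856]


def design_row(x1, x2, x3, x4, z):
    """Covariate vector (1, x1, x2, x3, x4, x1*x2, x2*x4, x1*x3, z) in model order."""
    return [1, x1, x2, x3, x4, x1 * x2, x2 * x4, x1 * x3, z]


def disturb_prob(x1, x2, x3, x4, z, beta=BETA):
    """pi_ij = exp(eta) / (1 + exp(eta)); shared by every nest in the sub-transect."""
    eta = sum(b * x for b, x in zip(beta, design_row(x1, x2, x3, x4, z)))
    return 1.0 / (1.0 + math.exp(-eta))


def backslope_odds_ratio(beta=BETA):
    """exp(beta_8): backslope vs foreslope odds ratio with the transect factors held fixed."""
    return math.exp(beta[8])
```

Because $Z$ enters without interactions, the ratio of the odds from `disturb_prob(..., z=1)` and `disturb_prob(..., z=0)` equals `backslope_odds_ratio()` for any setting of the transect factors, which is the conditional odds-ratio interpretation of $\exp(\beta_8)$.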
Distributional assumptions

• Number of disturbed nests in the $j$-th sub-transect of the $i$-th transect is
$$Y_{ij+} = Y_{ij1} + Y_{ij2} + \cdots + Y_{ijn_{ij}}$$
• Transects are far enough apart for the vectors $(Y_{i1+}, Y_{i2+})^T$, $i = 1, 2, \ldots, m$, to be independent
• Positive correlations
  – If a predator finds one nest it will look for more
  – It will attract the attention of other predators
• Variance
$$\begin{aligned}
\operatorname{Var}(Y_{ij+}) &= \sum_{k=1}^{n_{ij}}\operatorname{Var}(Y_{ijk}) + 2\sum_{k<\ell}\operatorname{Cov}(Y_{ijk}, Y_{ij\ell}) \\
&= n_{ij}\pi_{ij}(1 - \pi_{ij}) + 2\sum_{k<\ell}\rho_{k\ell}\,\pi_{ij}(1 - \pi_{ij}) \\
&= n_{ij}\pi_{ij}(1 - \pi_{ij})\left(1 + \frac{2}{n_{ij}}\sum_{k<\ell}\rho_{k\ell}\right) \\
&= \theta_{ij}\,n_{ij}\pi_{ij}(1 - \pi_{ij}) \;\ge\; \text{binomial variance (when the } \rho_{k\ell} \ge 0\text{)}
\end{aligned}$$
•
$$\operatorname{Var}\begin{bmatrix} Y_{i1+} \\ Y_{i2+} \end{bmatrix} = V_i^{1/2}\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}V_i^{1/2}$$
  where
$$V_i = \begin{bmatrix} \theta_1 n_{i1}\pi_{i1}(1 - \pi_{i1}) & 0 \\ 0 & \theta_2 n_{i2}\pi_{i2}(1 - \pi_{i2}) \end{bmatrix}$$

/* This program uses features in PROC GENMOD in SAS to fit logistic
   regression models to nest predation data from a split-plot
   experiment. The code is stored in the file nestgee.sas
   The data are stored in a different file we currently do not have
   permission to give to you. */

data set1;
  infile 'c:\st557\nestall.dat';
  input wshed $ loc $ round roadside $ roadtype $ adjhab $ rdepth rwidth
        foreback percip mtemp ftotal floss btotal bloss ctotal closs;
  if (roadtype='grav') then x1=1; else x1=0;
  if (adjhab='nrc') then x2=1; else x2=0;
  if (roadside='nfen') then x3=1; else x3=0;
  if (roadside='wd') then x4=1; else x4=0;
  x5=x1*x2;
  x6=x2*x4;
  x7=x1*x3;
  /* one record for the foreslope (slope=0) and one for the backslope (slope=1) */
  slope=0; loss=floss; total=ftotal; output;
  slope=1; loss=bloss; total=btotal; output;
  keep wshed loc x1 x2 x3 x4 x5 x6 x7 slope loss total;
run;

/* Sort data with respect to the transects and sub-transects */
proc sort data=set1;
  by loc slope;
run;

/* Compute parameter estimates and the covariance matrix for the IWM
   model using GENMOD */
proc genmod data=set1;
  class loc;
  model loss/total = x1 x2 x3 x4 x5 x6 x7 slope / dist=binomial link=logit
        itprint converge=1e-8 maxit=50;
run;

The GENMOD Procedure

Model Information
  Data Set                      WORK.SET1
  Distribution                  Binomial
  Link Function                 Logit
  Response Variable (Events)    loss
  Response Variable (Trials)    total
  Observations Used             272
  Number Of Events              307
  Number Of Trials              1332

Class Level Information
  Class   Levels   Values
  loc     136      1 10 100 101 102 103 104 105 106 107 108 109 11 110 111 112 113 114 115
                   116 117 118 119 12 120 121 122 123 124 125 126 127 128 129 13 130 131
                   132 133 134 135 136 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29
                   3 30 31 32 33 34 35 36 37 38 39 4 40 41 ...

Parameter Information
  Parameter   Effect
  Prm1        Intercept
  Prm2        x1
  Prm3        x2
  Prm4        x3
  Prm5        x4
  Prm6        x5
  Prm7        x6
  Prm8        x7
  Prm9        slope

Iteration History For Parameter Estimates
  Iter  Ridge  Log Likelihood   Prm1        Prm2        Prm3
                                Prm4        Prm5        Prm6
                                Prm7        Prm8        Prm9
  0     0      -704.50723       -0.954212   -0.164353   -0.66138
                                -0.057336   0.139895    0.4733128
                                0.5526403   -0.364248   0.5854818
  1     0      -680.23768       -1.468752   0.0036198   -0.890005
                                0.1193446   -0.195381   0.5827854
                                1.2046592   -0.910456   0.7281776
  2     0      -679.60783       -1.546383   0.0315938   -0.99817
                                0.1514569   -0.269986   0.6511028
                                1.3454385   -1.102685   0.7832247
  3     0      -679.60595       -1.548711   0.0323312   -1.005069
                                0.152238    -0.272661   0.6561795
                                1.3505862   -1.117141   0.7856011
  4     0      -679.60595       -1.548714   0.0323321   -1.005088
                                0.1522385   -0.272664   0.6561945
                                1.3505943   -1.117218   0.7856059
  5     0      -679.60595       -1.548714   0.0323321   -1.005088
                                0.1522385   -0.272664   0.6561945
                                1.3505943   -1.117218   0.7856059

Criteria For Assessing Goodness Of Fit
  Criterion             DF   Value       Value/DF
  Deviance              263  546.8136    2.0791
  Scaled Deviance       263  546.8136    2.0791
  Pearson Chi-Square    263  491.7968    1.8699
  Scaled Pearson X2     263  491.7968    1.8699
  Log Likelihood             -679.6059

Last Evaluation Of The Negative Of The Gradient and Hessian
         Gradient    Prm1        Prm2        Prm3        Prm4        Prm5
  Prm1   3.1443E-8   222.26001   179.74131   81.501383   23.896581   49.417896
  Prm2   3.0052E-8   179.74131   179.74131   70.05557    13.147433   45.679179
  Prm3   7.2257E-9   81.501383   70.05557    81.501383   3.2291792   33.959366
  Prm4   2.9764E-8   23.896581   13.147433   3.2291792   23.896581   -3.55E-15
  Prm5   2.929E-10   49.417896   45.679179   33.959366   -3.55E-15   49.417896

         Prm6        Prm7        Prm8        Prm9
  Prm1   70.05557    33.959366   13.147433   132.96838
  Prm2   70.05557    30.220649   13.147433   106.98113
  Prm3   70.05557    33.959366   2.1569729   48.22479
  Prm4   2.1569729   -1.07E-14   13.147433   14.886677
  Prm5   30.220649   33.959366   -1.42E-14   28.289297

         Gradient    Prm1        Prm2
  Prm6   5.8884E-9   70.05557    70.05557
  Prm7   2.188E-10   33.959366   30.220649
  Prm8   2.9604E-8   13.147433   13.147433
  Prm9   1.76E-8     132.96838   106.98113

Analysis Of Parameter Estimates
                       Standard   Wald 95%
  Parameter  DF  Estimate  Error  Confidence Limits   ChiSquare  Pr > ChiSq
  Intercept  1   -1.5487  0.2277  -1.9950  -1.1024     46.26      <.0001
  x1         1    0.0323  0.2380  -0.4342   0.4989      0.02      0.8920
  x2         1   -1.0051  0.3618  -1.7143  -0.2959      7.72      0.0055
  x3         1    0.1522  0.3621  -0.5574   0.8619      0.18      0.6741
  x4         1   -0.2727  0.2766  -0.8147   0.2694      0.97      0.3242
  x5         1    0.6562  0.3897  -0.1076   1.4200      2.84      0.0922
  x6         1    1.3506  0.3561   0.6527   2.0485     14.39      0.0001
  x7         1   -1.1172  0.4646  -2.0279  -0.2066      5.78      0.0162
  slope      1    0.7856  0.1371   0.5169   1.0543     32.84      <.0001
  Scale      0    1.0000  0.0000   1.0000   1.0000

/* Compute GEE parameter estimates for an unstructured covariance
   structure. With only two sub-plots this is equivalent to the
   exchangeable correlation model */
proc genmod data=set1;
  class loc;
  model loss/total = x1 x2 x3 x4 x5 x6 x7 slope / dist=binomial link=logit
        itprint converge=1e-8 maxit=50;
  repeated subject=loc / type=un modelse covb corrw;
run;

Analysis Of Initial Parameter Estimates: identical to the Analysis Of Parameter Estimates table for the IWM fit above.
NOTE: The scale parameter was held fixed.
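The variance derivation for the nest counts says that within-sub-transect correlation inflates the binomial variance by $\theta_{ij} = 1 + \frac{2}{n_{ij}}\sum_{k<\ell}\rho_{k\ell}$; with a common correlation $\rho$ for every pair of nests this reduces to $\theta_{ij} = 1 + (n_{ij} - 1)\rho$. A small sketch of that arithmetic and of the $2 \times 2$ covariance for $(Y_{i1+}, Y_{i2+})$, assuming the nest-level correlations are supplied through a callable `corr(k, l)`; the function names and the numbers in the check are illustrative only.

```python
import math
from itertools import combinations


def theta(n, corr):
    """Variance inflation 1 + (2/n) * sum_{k<l} rho_kl for n nests; corr(k, l) returns rho_kl."""
    return 1.0 + 2.0 / n * sum(corr(k, l) for k, l in combinations(range(n), 2))


def var_total(n, pi, corr):
    """Var(Y_ij+) = theta_ij * n_ij * pi_ij * (1 - pi_ij)."""
    return theta(n, corr) * n * pi * (1.0 - pi)


def transect_cov(n1, pi1, theta1, n2, pi2, theta2, rho):
    """Covariance of (Y_i1+, Y_i2+): V_i^{1/2} [[1, rho], [rho, 1]] V_i^{1/2}."""
    v1 = theta1 * n1 * pi1 * (1.0 - pi1)
    v2 = theta2 * n2 * pi2 * (1.0 - pi2)
    c = rho * math.sqrt(v1 * v2)
    return [[v1, c], [c, v2]]
```

For example, five nests with a common correlation of 0.2 give `theta(5, lambda k, l: 0.2)` equal to $1 + 4(0.2) = 1.8$ (up to floating-point rounding), so the count's variance is 80% larger than the binomial variance.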
GEE Model Information
  Correlation Structure           Unstructured
  Subject Effect                  loc (136 levels)
  Number of Clusters              136
  Correlation Matrix Dimension    2
  Maximum Cluster Size            2
  Minimum Cluster Size            2

Iteration History For GEE Parameter Estimates
  Iter   Prm1        Prm2        Prm3        Prm4        Prm5
         Prm6        Prm7        Prm8        Prm9
  0      -1.548714   0.0323321   -1.005088   0.1522385   -0.272664
         0.6561945   1.3505943   -1.117218   0.7856059
  1      -1.548991   0.0334912   -1.033577   0.1479605   -0.24497
         0.7019773   1.3096132   -1.133245   0.7843777
  2      -1.548988   0.0334707   -1.033953   0.1479469   -0.245076
         0.7019642   1.3102313   -1.133288   0.7843802
  3      -1.548989   0.033471    -1.033956   0.1479475   -0.245078
         0.70197     1.3102316   -1.133289   0.7843806

Last Evaluation Of The Generalized Gradient And Hessian
         Gradient    Prm1        Prm2        Prm3        Prm4
  Prm1   -9.089E-7   176.97859   143.34113   64.883073   18.897032
  Prm2   -8.395E-7   143.34113   143.34113   55.922266   10.399388
  Prm3   -6.848E-7   64.883073   55.922266   64.883073   2.5661978
  Prm4   -6.721E-8   18.897032   10.399388   2.5661978   18.897032
  Prm5   4.9742E-7   39.396613   36.484802   26.788608   7.105E-15
  Prm6   -6.316E-7   55.922266   55.922266   55.922266   1.7283638
  Prm7   6.9117E-7   26.788608   23.876798   26.788608   1.421E-14
  Prm8   -6.623E-8   10.399388   10.399388   1.728363    10.399388
  Prm9   -6.46E-7    111.95357   90.062312   40.460711   12.591011

         Prm5        Prm6        Prm7        Prm8        Prm9
  Prm1   39.396613   55.922266   26.788608   10.399388   111.95357
  Prm2   36.484802   55.922266   23.876798   10.399388   90.062312
  Prm3   26.788608   55.922266   26.788608   1.7283638   40.460711
  Prm4   7.105E-15   1.7283638   1.421E-14   10.399388   12.591011
  Prm5   39.396613   23.876798   26.788608   7.105E-15   23.549996
  Prm6   23.876798   55.922266   23.876798   1.7283638   34.328207
  Prm7   26.788608   23.876798   26.788608   1.421E-14   14.994503
  Prm8   7.105E-15   1.7283638   1.421E-14   10.399388   7.2474518
  Prm9   23.549996   34.328207   14.994503   7.2474518   143.3098

Covariance Matrix (Model-Based)
         Prm1        Prm2        Prm3        Prm4        Prm5
  Prm1   0.11642     -0.10598    -0.09483    -0.09654    0.0009203
  Prm2   -0.10598    0.13328     0.09637     0.09610     -0.02771
  Prm3   -0.09483    0.09637     0.31102     0.06537     -0.000239
  Prm4   -0.09654    0.09610     0.06537     0.30971     0.0000137
  Prm5   0.000920    -0.02771    -0.000239   0.0000137   0.17605
  Prm6   0.09609     -0.12360    -0.29506    -0.06766    0.02679
  Prm7   -0.004322   0.02835     -0.03778    0.005461    -0.17510
  Prm8   0.09756     -0.11929    -0.06823    -0.30932    0.02337
  Prm9   -0.01674    0.000612    -0.001912   0.000641    -0.001448

         Prm6        Prm7        Prm8        Prm9
  Prm1   0.09609     -0.004322   0.09756     -0.01674
  Prm2   -0.12360    0.02835     -0.11929    0.000612
  Prm3   -0.29506    -0.03778    -0.06823    -0.001912
  Prm4   -0.06766    0.005461    -0.30932    0.000641
  Prm5   0.02679     -0.17510    0.02337     -0.001448
  Prm6   0.36020     -0.03929    0.08361     0.001057
  Prm7   -0.03929    0.29445     -0.01955    0.004114
  Prm8   0.08361     -0.01955    0.50996     -0.002421
  Prm9   0.001057    0.004114    -0.002421   0.02590

Covariance Matrix (Empirical)
         Prm1        Prm2        Prm3        Prm4        Prm5
  Prm1   0.13142     -0.12181    -0.11250    -0.10806    0.002421
  Prm2   -0.12181    0.14966     0.11228     0.10285     -0.02860
  Prm3   -0.11250    0.11228     0.20404     0.06787     0.0003693
  Prm4   -0.10806    0.10285     0.06787     0.39579     -0.000521
  Prm5   0.002421    -0.02860    0.000369    -0.000521   0.22449
  Prm6   0.10950     -0.14174    -0.19443    -0.07099    0.02818
  Prm7   -0.003849   0.03128     -0.03371    0.009810    -0.22502
  Prm8   0.10795     -0.13426    -0.06493    -0.39647    0.03198
  Prm9   -0.01571    0.001040    0.0000132   0.008370    -0.003726

         Prm6        Prm7        Prm8        Prm9
  Prm1   0.10950     -0.003849   0.10795     -0.01571
  Prm2   -0.14174    0.03128     -0.13426    0.001040
  Prm3   -0.19443    -0.03371    -0.06493    0.0000132
  Prm4   -0.07099    0.009810    -0.39647    0.008370
  Prm5   0.02818     -0.22502    0.03198     -0.003726
  Prm6   0.26392     -0.03766    0.10646     0.005864
  Prm7   -0.03766    0.35906     -0.04859    0.002353
  Prm8   0.10646     -0.04859    0.57248     -0.008068
  Prm9   0.005864    0.002353    -0.008068   0.02399

Working Correlation Matrix
         Col1     Col2
  Row1   1.0000   0.2681
  Row2   0.2681   1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept  -1.5490    0.3625     -2.2595  -0.8385     -4.27   <.0001
  x1          0.0335    0.3869     -0.7248   0.7917      0.09   0.9311
  x2         -1.0340    0.4517     -1.9193  -0.1486     -2.29   0.0221
  x3          0.1479    0.6291     -1.0851   1.3810      0.24   0.8141
  x4         -0.2451    0.4738     -1.1737   0.6836     -0.52   0.6050
  x5          0.7020    0.5137     -0.3049   1.7089      1.37   0.1718
  x6          1.3102    0.5992      0.1358   2.4847      2.19   0.0288
  x7         -1.1333    0.7566     -2.6162   0.3497     -1.50   0.1342
  slope       0.7844    0.1549      0.4808   1.0880      5.06   <.0001

Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept  -1.5490    0.3412     -2.2177  -0.8802     -4.54   <.0001
  x1          0.0335    0.3651     -0.6821   0.7490      0.09   0.9269
  x2         -1.0340    0.5577     -2.1270   0.0591     -1.85   0.0637
  x3          0.1479    0.5565     -0.9428   1.2387      0.27   0.7904
  x4         -0.2451    0.4196     -1.0675   0.5773     -0.58   0.5592
  x5          0.7020    0.6002     -0.4743   1.8783      1.17   0.2421
  x6          1.3102    0.5426      0.2467   2.3738      2.41   0.0158
  x7         -1.1333    0.7141     -2.5329   0.2663     -1.59   0.1125
  slope       0.7844    0.1609      0.4690   1.0998      4.87   <.0001
  Scale       1.3673    .           .        .           .      .
NOTE: The scale parameter for GEE estimation was computed as the square root of the normalized Pearson's chi-square.

Poisson Regression

Suppose $Y_{ij} \sim \text{Poisson}(\mu_{ij})$ is a count provided by the $i$-th subject at the $j$-th inspection time
• Number of skin tumors on a mouse
• Number of epileptic seizures in successive two-week periods (Example 1.6 and Example 8.4 in DHLZ)
• Number of birds found at different locations along a transect

Specify a Poisson distribution for each observed count, $Y_{ij} \sim \text{Poisson}(\mu_{ij})$. Then
$$\Pr(Y_{ij} = r) = \frac{\mu_{ij}^{r}e^{-\mu_{ij}}}{r!}, \qquad r = 0, 1, 2, \ldots$$
$$E(Y_{ij}) = \mu_{ij}, \qquad \operatorname{Var}(Y_{ij}) = \mu_{ij}$$
Specify a log-linear model for the mean response
$$\log(\mu_{ij}) = \beta_0 + \beta_1X_{1ij} + \cdots + \beta_kX_{kij} = X_{ij}^T\beta$$
so that
$$\mu_{ij} = e^{\beta_0 + \beta_1X_{1ij} + \cdots + \beta_kX_{kij}}$$
The standard assumption is that the $Y_{ij}$ are all independent. In longitudinal studies, counts observed on the same subject at different points in time may be correlated.
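The Poisson facts above (the probability function, $E(Y_{ij}) = \operatorname{Var}(Y_{ij}) = \mu_{ij}$, and the log link $\mu_{ij} = \exp(X_{ij}^T\beta)$) can be checked numerically with a short sketch. It assumes nothing beyond the formulas on this slide; the helper names and the coefficient values used in the check are hypothetical. Under the log link, increasing $X_r$ by one unit multiplies the mean by $\exp(\beta_r)$, the rate-ratio analogue of the odds-ratio interpretation in logistic regression.

```python
import math


def poisson_pmf(r, mu):
    """Pr(Y = r) = mu^r e^{-mu} / r!, computed on the log scale for stability."""
    return math.exp(r * math.log(mu) - mu - math.lgamma(r + 1))


def loglinear_mean(x, beta):
    """mu = exp(x^T beta) under the log link."""
    return math.exp(sum(a * b for a, b in zip(x, beta)))


def moments(mu, r_max=200):
    """Sum the pmf up to r_max and return (total probability, mean, variance)."""
    probs = [poisson_pmf(r, mu) for r in range(r_max + 1)]
    mean = sum(r * p for r, p in enumerate(probs))
    var = sum((r - mean) ** 2 * p for r, p in enumerate(probs))
    return sum(probs), mean, var
```

For instance, `moments(4.5)` returns a total probability, mean and variance that agree with 1, 4.5 and 4.5 to within rounding error, and with hypothetical coefficients `beta = [0.2, 0.5]` the ratio `loglinear_mean([1, 3], beta) / loglinear_mean([1, 2], beta)` equals $e^{0.5}$.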
Epileptic seizure data

• Thall and Vail (1990), Biometrics 46, 657-671
• Table 1.5 and Example 8.5 in DHLZ
• The subjects were 59 epileptics suffering from partial seizures who were randomized to treatment
  – anti-epileptic drug progabide ($m_1 = 31$)
  – placebo ($m_2 = 28$)

• Explanatory variables
  $X_1$ = 0 for placebo, 1 for progabide treatment
  $X_2$ = (Age in years) − 29
  Time = 0, 2, 4, 6, 8 weeks
• Repeated measures on the $i$-th patient
  $Y_{i0}$ = 0.25 × (Number of seizures during the 8 weeks prior to start of treatment)
  $Y_{i1}$ = Number of seizures during weeks 1 and 2 after start of treatment
  $Y_{i2}$ = Number of seizures during weeks 3 and 4 after start of treatment
  $Y_{i3}$ = Number of seizures during weeks 5 and 6 after start of treatment
  $Y_{i4}$ = Number of seizures during weeks 7 and 8 after start of treatment
  The factor 0.25 puts the baseline count on the same two-week scale as $Y_{i1}, \ldots, Y_{i4}$.

/* Use the GEE option in PROC GENMOD to fit a Poisson regression model
   to the epileptic seizure data from Thall and Vail (1990). */

data set1;
  infile 'c:\stat565\seizures.dat';
  input y1-y4 trt base age;
  patient = _N_;
  age = (age-29);
  y0 = base/4;
  z = 1;
run;

proc format;
  value trt 0 = 'Placebo'
            1 = 'Progabide';
run;

/* Modify the data file to put repeated measures on different lines.
   Also create a time variable. */
data set2;
  set set1;
  y = y0; time=0; xt=time; output;
  y = y1; time=2; xt=time; output;
  y = y2; time=4; xt=time; output;
  y = y3; time=6; xt=time; output;
  y = y4; time=8; xt=time; output;
run;

proc sort data=set2;
  by trt time;
run;

proc means data=set2 n mean stderr noprint;
  by trt time;
  var y;
  output out=means mean=y;
run;

axis1 label=(f=swiss h=1.2 a=90 r=0 "Seizures per 2 weeks")
      order=5 to 10 by 1 length=4.5in value=(f=swiss h=1.0);
axis2 label=(f=swiss h=1.2 "Time(weeks)")
      order=0 to 8 by 2 value=(f=swiss h=1.0) w=4.0 length=6.0in;
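The DATA step for `set2` turns one record per patient into five records (one per time point), rescaling the 8-week baseline to a two-week count, and PROC MEANS then averages `y` within each treatment by time combination. A rough Python analogue of those two steps, assuming each patient is a dict with the same fields as the SAS `input` statement; the function names and the two patients in the check are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def to_long(patients):
    """One record per patient and time; baseline rescaled to two weeks, age centered at 29."""
    rows = []
    for pid, rec in enumerate(patients, start=1):
        counts = [rec["base"] / 4, rec["y1"], rec["y2"], rec["y3"], rec["y4"]]
        for time, y in zip((0, 2, 4, 6, 8), counts):
            rows.append({"patient": pid, "trt": rec["trt"], "age": rec["age"] - 29,
                         "time": time, "y": y})
    return rows


def group_means(rows):
    """Mean two-week seizure count for each (trt, time) combination."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["trt"], r["time"])].append(r["y"])
    return {key: mean(vals) for key, vals in sorted(groups.items())}
```

With all 59 patients, `to_long` would produce the 295 records that GENMOD reports as observations used.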
SYMBOL1 V=dot H=1.5 w=4 l=1 i=join cv=black;
SYMBOL2 V=circle H=1.5 w=4 l=3 i=join cv=black;
PROC GPLOT DATA=means;
  PLOT y*time=trt / vaxis=axis1 haxis=axis2;
  TITLE1 ls=0.4 H=2.0 F=swiss "Two-Week Seizure Rates";
  legend frame across=2 down=3;
  footnote h=2 " ";
  format trt trt.;
RUN;

/* Use PROC GENMOD in SAS to fit a log-linear model with no correlation
   among repeated measurements. Standard errors are based on
   independent Poisson counts. Because z = 1 on every record, z plays
   the role of the intercept under the NOINT option. */
proc genmod data=set2;
  class patient;
  model y = z time trt time*time age trt*age / noint dist=poisson link=log
        covb itprint converge=.0000001 maxit=50;
run;

The GENMOD Procedure

Model Information
  Data Set              WORK.SET2
  Distribution          Poisson
  Link Function         Log
  Dependent Variable    y
  Observations Used     295

Class Level Information
  Class      Levels   Values
  patient    59       1 2 3 ... 59

Parameter Information
  Parameter   Effect
  Prm1        Intercept
  Prm2        z
  Prm3        time
  Prm4        trt
  Prm5        time*time
  Prm6        age
  Prm7        trt*age

Iteration History For Parameter Estimates
  Iter  Ridge  Log Likelihood   Prm1        Prm2        Prm3
                                Prm4        Prm5        Prm6
                                Prm7
  0     0      2139.1864        0           2.2871859   0.2123583
                                0.0592947   -0.023962   0.0290219
                                -0.078664
  1     0      2623.44379       0           2.0638572   0.1298421
                                -0.017065   -0.016116   0.0247138
                                -0.065618
  2     0      2654.9526        0           2.0551993   0.0814555
                                -0.070177   -0.011475   0.0222975
                                -0.057241
  3     0      2655.22079       0           2.0595697   0.0754608
                                -0.07897    -0.01089    0.0220365
                                -0.056049
  4     0      2655.22082       0           2.0596283   0.0754051
                                -0.079092   -0.010885   0.0220349
                                -0.056035
  5     0      2655.22082       0           2.0596283   0.0754051
                                -0.079092   -0.010885   0.0220349
                                -0.056035

Criteria For Assessing Goodness Of Fit
  Criterion             DF   Value        Value/DF
  Deviance              289  2661.2980    9.2086
  Scaled Deviance       289  2661.2980    9.2086
  Pearson Chi-Square    289  4154.2822    14.3747
  Scaled Pearson X2     289  4154.2822    14.3747
  Log Likelihood             2655.2208

Analysis Of Parameter Estimates
                       Standard   Wald 95%
  Parameter  DF  Estimate  Error  Confidence Limits   ChiSquare  Pr > ChiSq
  Intercept  0    0.0000  0.0000   0.0000   0.0000       .          .
  z          1    2.0596  0.0488   1.9640   2.1552    1783.69     <.0001
  time       1    0.0754  0.0257   0.0251   0.1257       8.62     0.0033
  trt        1   -0.0791  0.0422  -0.1618   0.0036       3.52     0.0608
  time*time  1   -0.0109  0.0031  -0.0170  -0.0048      12.34     0.0004
  age        1    0.0220  0.0050   0.0122   0.0319      19.34     <.0001
  trt*age    1   -0.0560  0.0063  -0.0684  -0.0437      79.00     <.0001
  Scale      0    1.0000  0.0000   1.0000   1.0000

/* Use PROC GENMOD in SAS to obtain GEE estimates of coefficients in a
   log-linear model with an exchangeable correlation structure for
   repeated measurements. Standard errors are based on this correlation
   structure. Results for a robust covariance estimator are also
   provided. */
proc genmod data=set2;
  class patient;
  model y = z time trt time*time age trt*age / noint dist=poisson link=log
        covb itprint converge=.0000001 maxit=50;
  repeated subject=patient / type=exch modelse covb corrw;
run;

Criteria For Assessing Goodness Of Fit and Analysis Of Initial Parameter Estimates: identical to the tables for the independence fit above.

GEE Model Information
  Correlation Structure           Exchangeable
  Subject Effect                  patient (59 levels)
  Number of Clusters              59
  Correlation Matrix Dimension    5
  Maximum Cluster Size            5
  Minimum Cluster Size            5

Iteration History For GEE Parameter Estimates
  Iter   Prm1   Prm2        Prm3        Prm4        Prm5
         Prm6        Prm7
  0      0      2.0596283   0.0754051   -0.079092   -0.010885
         0.0220349   -0.056035
  1      0      2.0570934   0.0754012   -0.079228   -0.010884
         0.0270354   -0.062262
  2      0      2.0566815   0.0754012   -0.078869   -0.010884
         0.0270001   -0.062231
  3      0      2.0566817   0.0754012   -0.078869   -0.010884
         0.0269995   -0.06223

Covariance Matrix (Model-Based)
         Prm2        Prm3        Prm4        Prm5        Prm6        Prm7
  Prm2   0.05570     -0.004227   -0.04823    0.0004383   -0.000861   0.0008610
  Prm3   -0.004227   0.002676    -3.86E-18   -0.000310   5.325E-19   -4.72E-19
  Prm4   -0.04823    -3.86E-18   0.09807     4.374E-19   0.0008610   0.001240
  Prm5   0.0004383   -0.000310   4.374E-19   0.0000391   -6.44E-20   5.496E-20
  Prm6   -0.000861   5.325E-19   0.0008610   -6.44E-20   0.001362    -0.001362
  Prm7   0.0008610   -4.72E-19   0.001240    5.496E-20   -0.001362   0.002173

Covariance Matrix (Empirical)
         Prm2        Prm3        Prm4        Prm5        Prm6        Prm7
  Prm2   0.02772     -0.000767   -0.03698    0.0000715   0.0007528   -0.000076
  Prm3   -0.000767   0.002171    0.002781    -0.000225   0.0002396   -0.000645
  Prm4   -0.03698    0.002781    0.08632     -0.000284   -0.001074   -0.001856
  Prm5   0.0000715   -0.000225   -0.000284   0.0000254   -0.000031   0.0000760
  Prm6   0.0007528   0.0002396   -0.001074   -0.000031   0.0007463   -0.000746
  Prm7   -0.000076   -0.000645   -0.001856   0.0000760   -0.000746   0.001410

Working Correlation Matrix
         Col1     Col2     Col3     Col4     Col5
  Row1   1.0000   0.7212   0.7212   0.7212   0.7212
  Row2   0.7212   1.0000   0.7212   0.7212   0.7212
  Row3   0.7212   0.7212   1.0000   0.7212   0.7212
  Row4   0.7212   0.7212   0.7212   1.0000   0.7212
  Row5   0.7212   0.7212   0.7212   0.7212   1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z
  Intercept   0.0000    0.0000      0.0000   0.0000       .
  z           2.0567    0.1665      1.7304   2.3830     12.35
  time        0.0754    0.0466     -0.0159   0.1667      1.62
  trt        -0.0789    0.2938     -0.6547   0.4970     -0.27
  time*time  -0.0109    0.0050     -0.0208  -0.0010     -2.16
  age         0.0270    0.0273     -0.0265   0.0805      0.99
  trt*age    -0.0622    0.0376     -0.1358   0.0114     -1.66

Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
                        Standard   95%
  Parameter  Estimate   Error      Confidence Limits      Z      Pr>|Z|
  Intercept   0.0000    0.0000      0.0000   0.0000       .      .
  z           2.0567    0.2360      1.5941   2.5192      8.71   <.0001
  time        0.0754    0.0517     -0.0260   0.1768      1.46   0.1450
  trt        -0.0789    0.3132     -0.6927   0.5349     -0.25   0.8012
  time*time  -0.0109    0.0063     -0.0231   0.0014     -1.74   0.0819
  age         0.0270    0.0369     -0.0453   0.0993      0.73   0.4644
  trt*age    -0.0622    0.0466     -0.1536   0.0291     -1.34   0.1819
  Scale       3.7927    .           .        .           .      .
NOTE: The scale parameter for GEE estimation was computed as the square root of the normalized Pearson's chi-square.

/* Use PROC GENMOD in SAS to obtain GEE estimates of coefficients in a
   log-linear model with a first-order autoregressive (AR(1)) correlation
   structure for repeated measurements. Standard errors are based on the
   model and also on a robust covariance estimator. */
proc genmod data=set2;
  class patient;
  model y = z time trt time*time age trt*age / noint dist=poisson link=log
        covb obstats itprint converge=.0000001 maxit=50;
  repeated subject=patient / type=ar(1) modelse covb corrw;
run;
<.0001 0.1056 0.7884 0.0307 0.3230 0.0975 GEE Model Information Correlation Structure Subject Effect Number of Clusters Correlation Matrix Dimension Maximum Cluster Size Minimum Cluster Size AR(1) patient (59 levels) 59 5 5 5 The GENMOD Procedure Covariance Matrix (Model-Based) Prm2 Prm3 Prm4 Prm5 Prm6 Prm7 Iteration History For GEE Parameter Estimates Iter Prm1 Prm6 Prm2 Prm7 0 0 0.0220349 0 0.0174423 0 0.0179254 0 0.0179171 0 0.0179172 2.0596283 -0.056035 2.044588 -0.051091 2.0447496 -0.05163 2.0447376 -0.051621 2.0447377 -0.05162 Prm3 Prm4 Prm5 0.0754051 -0.079092 -0.010885 0.0836068 -0.068481 -0.011834 0.0835014 -0.069229 -0.011822 0.0835029 -0.069209 -0.011822 0.0835029 -0.069209 -0.011822 Prm2 Prm3 Prm4 0.05623 -0.004009 -0.04661 0.0002306 -0.000429 0.0004291 -0.004009 0.002898 -1.05E-17 -0.000293 1E-19 -1.58E-19 -0.04661 -1.05E-17 0.09429 1.179E-18 0.0004291 0.001503 Covariance Matrix (Model-Based) 1 2 3 4 Prm2 Prm3 Prm4 Prm5 Prm6 Prm7 Prm5 Prm6 Prm7 0.0002306 -0.000293 1.179E-18 0.0000372 -9.19E-21 1.653E-20 -0.000429 1E-19 0.0004291 -9.19E-21 0.001353 -0.001353 0.0004291 -1.58E-19 0.001503 1.653E-20 -0.001353 0.002124 651 652 Working Correlation Matrix Covariance Matrix (Empirical) Prm2 Prm3 Prm4 Prm5 Prm6 Prm7 Prm2 Prm3 Prm4 0.02811 -0.001517 -0.03616 0.0001492 0.0005895 0.0001351 -0.001517 0.002852 0.004495 -0.000303 0.0003495 -0.000852 -0.03616 0.004495 0.07461 -0.000415 -0.000902 -0.001433 Row1 Row2 Row3 Row4 Row5 Col1 Col2 Col3 1.0000 0.8122 0.6597 0.5359 0.4353 0.8122 1.0000 0.8122 0.6597 0.5359 0.6597 0.8122 1.0000 0.8122 0.6597 Col4 Col5 0.5359 0.6597 0.8122 1.0000 0.8122 0.4353 0.5359 0.6597 0.8122 1.0000 Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Covariance Matrix (Empirical) Prm2 Prm3 Prm4 Prm5 Prm6 Prm7 Prm5 Prm6 Prm7 0.0001492 -0.000303 -0.000415 0.0000334 -0.000043 0.0000923 0.0005895 0.0003495 -0.000902 -0.000043 0.0006165 -0.000617 0.0001351 -0.000852 -0.001433 0.0000923 -0.000617 0.001190 653 Parameter Estimate 
Intercept z time trt time*time age trt*age 0.0000 2.0447 0.0835 -0.0692 -0.0118 0.0179 -0.0516 Standard Error 0.0000 0.1676 0.0534 0.2731 0.0058 0.0248 0.0345 95% Confidence Limits 0.0000 1.7162 -0.0212 -0.6046 -0.0232 -0.0307 -0.1192 0.0000 2.3733 0.1882 0.4662 -0.0005 0.0666 0.0160 Z . 12.20 1.56 -0.25 -2.04 0.72 -1.50 654 Pr>|Z| . <.0001 0.1179 0.8000 0.0409 0.4705 0.1346 Random effects models for discrete data • Natural extension of models for Gaussian continuous responses Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Parameter Estimate Intercept z time trt time*time age trt*age Scale Standard Error 0.0000 2.0447 0.0835 -0.0692 -0.0118 0.0179 -0.0516 3.7975 0.0000 0.2371 0.0538 0.3071 0.0061 0.0368 0.0461 . 95% Confidence Limits 0.0000 1.5800 -0.0220 -0.6711 -0.0238 -0.0542 -0.1420 . • Now, denote random effects as Ui (instead of bi) Z 0.0000 2.5095 0.1890 0.5326 0.0001 0.0900 0.0387 . . 8.62 1.55 -0.23 -1.94 0.49 -1.12 . Pr>|Z| . <.0001 0.1208 0.8217 0.0526 0.6262 0.2627 . General model • responses yij have conditional density f (yij |Ui) = exp[{(yij θij −ψ(θij ))}/φ+c(yij , φ)] • with conditional moments NOTE: The scale parameter for GEE estimation was computed as the square root of the normalized Pearson’s chi-square. 
  $\mu_{ij} = E[Y_{ij} \mid U_i] = \psi'(\theta_{ij})$ and $\nu_{ij} = \mathrm{Var}(Y_{ij} \mid U_i) = \psi''(\theta_{ij})\,\phi$
• $h(\mu_{ij}) = x_{ij}^T\beta + d_{ij}^T U_i$ and $\nu_{ij} = \nu(\mu_{ij})\,\phi$

Interpretation
• The random effects $U_i$ are independent with common distribution $F$ (typically normal): this accounts for heterogeneity across individuals
• Correlation and extra-variation are introduced by random effects
• Can make inferences about individuals (as opposed to the population)

Example: logistic model for binary data
  $h(\mu_{ij}) = \log\left(\frac{\mu_{ij}}{1-\mu_{ij}}\right) = \beta_0 + \beta_1 X_{ij} + U_i$
  $\nu_{ij} = \mathrm{Var}(Y_{ij} \mid U_i) = \mu_{ij}(1-\mu_{ij})$

• For linear links with Gaussian responses, coefficients in marginal and random effects models are interpreted in the same way
• For nonlinear links, this is no longer the case
• To obtain the marginal model from the conditional model, the following integral must be evaluated (for binary data with a random intercept):
  $P(Y_{ij}=1) = \int P(Y_{ij}=1 \mid U_i)\,dF(U_i) = \int \frac{\exp(\beta_{0,re} + U_i + \beta_{1,re}x_{ij})}{1+\exp(\beta_{0,re} + U_i + \beta_{1,re}x_{ij})}\, f(U_i; v^2)\,dU_i$
• Let $\beta_m$ and $\beta_{re}$ denote the coefficients from the marginal and random effects models, respectively
• $|\beta_{k,m}| \le |\beta_{k,re}|$ for $k = 1, \ldots, p$
• This discrepancy increases with $\mathrm{Var}(U_i)$
• Example: logistic regression with a random intercept, $U_i \sim N(0, v^2)$; then $\beta_m \approx (c^2 v^2 + 1)^{-1/2}\,\beta_{re}$, where $c = 16\sqrt{3}/(15\pi)$, so $c^2 \approx 0.346$

Model fitting for random effects models
• Can use ML or REML (see Laird and Ware, 1982)
• Integrate out the random effects to obtain the marginal likelihood; this usually cannot be done analytically
• Typical approaches
  – Approximate likelihood [Laplace approximation]: Breslow and Clayton, 1993, JASA; Goldstein and Rasbash, 1996, JRSS-A
  – Gibbs sampling: Zeger and Karim, 1991, JASA; Daniels and Gatsonis, 1999, JASA
• Alternatives: marginalized multi-level models (Heagerty and Zeger, 2000, Statistical Science; Heagerty, 2000, Biometrics): marginal regression coefficients with random effects

/* Use GLIMMIX in SAS to fit a Poisson regression model with random
   subject effects to the epileptic seizure data from Thall and Vail (1990).
   This program is stored in the file seizglmm.sas */
data set1;
  infile 'c:\stat565\seizures.dat';
  input y1-y4 trt base age;
  patient = _N_;
  age = (age-29);   /* center age at 29 */
  y0 = base/4;      /* rescale the baseline count to a two-week rate */
  z = 1;
run;

proc format;
  value trt 0 = 'Placebo'
            1 = 'Progabide';
run;

proc print data=set1; run;

/* Modify the data file to put repeated measures on different lines.
   Also create a time variable. */
data set2;
  set set1;
  y = y0; time=0; xt=time; output;
  y = y1; time=2; xt=time; output;
  y = y2; time=4; xt=time; output;
  y = y3; time=6; xt=time; output;
  y = y4; time=8; xt=time; output;
run;

/* Use the GLIMMIX macro to include random patient effects */
%inc 'c:\stat565\sas\glmm800.sas' / nosource;
run;

/* Log-linear models with fixed and random effects */
%glimmix(data=set2,
  stmts=%str(
    class patient;
    model y = trt time time*time age age*trt / solution;
    random intercept / type=un subject=patient;),
  error=poisson,
  link=log,
  converge=1e-8,
  maxit=20,
  out=setp
)
run;
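The %glimmix call above fits the model by pseudo-likelihood: the macro repeatedly linearizes the model and passes a working response (_z) with weights (_w) to PROC MIXED, which is why the Model Information output lists Dependent Variable _z and Weight Variable _w. Below is a minimal Python sketch of one linearization step for a Poisson count with a log link, assuming the usual IRLS-style definitions $z = \eta + (y - \mu)\,d\eta/d\mu$ and $w = (d\mu/d\eta)^2/V(\mu)$. The function name is hypothetical and this is not code from glmm800.sas. Applied to the first printed observation of setp (y = 2.75, Pred = 1.26022), it agrees with the printed mu, deta, Resid, resraw, _w and _z columns up to rounding of the printed linear predictor.

```python
import math


def poisson_log_working_values(y, eta):
    """Working quantities for one Poisson observation with a log link.

    Hypothetical helper illustrating a single pseudo-likelihood (IRLS-style)
    linearization step; it is not taken from the GLIMMIX macro.
    """
    mu = math.exp(eta)            # inverse link: mu = exp(eta)
    deta = 1.0 / mu               # d(eta)/d(mu) for the log link
    z = eta + (y - mu) * deta     # working (linearized) response
    w = mu                        # (dmu/deta)^2 / V(mu) = mu^2 / mu
    resraw = y - mu               # raw residual on the count scale
    resid = (y - mu) * deta       # residual on the linear-predictor scale
    return {"mu": mu, "deta": deta, "z": z, "w": w,
            "resraw": resraw, "resid": resid}


# First observation printed from setp: y = 2.75, eta (Pred) = 1.26022
vals = poisson_log_working_values(2.75, 1.26022)
for name, value in vals.items():
    print(f"{name:7s} {value: .5f}")
```

For the log link the working weight is simply $\mu$, so observations with larger expected counts carry more weight in each PROC MIXED pass.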
Two-Week Seizure Rates

The Mixed Procedure

Model Information
  Data Set                     WORK._DS
  Dependent Variable           _z
  Weight Variable              _w
  Covariance Structure         Unstructured
  Subject Effect               patient
  Estimation Method            REML
  Residual Variance Method     Profile
  Fixed Effects SE Method      Model-Based
  Degrees of Freedom Method    Containment

Class Level Information
  Class     Levels   Values
  patient   59       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
                     31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59

Parameter Search
  CovP1    CovP2    Variance   Res Log Like
  0.6749   2.1112   2.1112     -355.0676

Iteration History
  Iteration   Evaluations   -2 Res Log Like   Criterion
  1           1             710.13514506      0.00000000
Convergence criteria met.

Covariance Parameter Estimates
  Cov Parm   Subject   Estimate
  UN(1,1)    patient   0.6749
  Residual             2.1112

Fit Statistics
  -2 Res Log Likelihood      710.1
  AIC (smaller is better)    714.1
  AICC (smaller is better)   714.2
  BIC (smaller is better)    718.3

Solution for Fixed Effects
  Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
  Intercept    1.7740    0.1739            55   10.20     <.0001
  trt         -0.1650    0.2282           234   -0.72     0.4706
  time         0.07541   0.03732          234    2.02     0.0445
  time*time   -0.01088   0.004501         234   -2.42     0.0164
  age          0.009338  0.02871          234    0.33     0.7453
  trt*age     -0.02939   0.03395          234   -0.87     0.3875

Type 3 Tests of Fixed Effects
  Effect      Num DF   Den DF   F Value   Pr > F
  trt         1        234      0.52      0.4706
  time        1        234      4.08      0.0445
  time*time   1        234      5.85      0.0164
  age         1        234      0.11      0.7453
  trt*age     1        234      0.75      0.3875

GLIMMIX Model Statistics
  Description                  Value
  Deviance                     556.5463
  Scaled Deviance              263.6195
  Pearson Chi-Square           507.7219
  Scaled Pearson Chi-Square    240.4929
  Extra-Dispersion Scale       2.1112

proc print data=setp (obs=5); run;

  Obs  _y    _offset  _wght  _orig  y     time  age  trt  patient
  1    2.75  0        1      y      2.75  0     2    0    1
  2    5.00  0        1      y      5.00  2     2    0    1
  3    3.00  0        1      y      3.00  4     2    0    1
  4    3.00  0        1      y      3.00  6     2    0    1
  5    3.00  0        1      y      3.00  8     2    0    1

  Obs  Pred     StdErrPred  DF   Alpha  Lower    Upper    Resid
  1    1.26022  0.31910     234  0.05   0.63153  1.88890  -0.22012
  2    1.36749  0.31521     234  0.05   0.74648  1.98850   0.27373
  3    1.38768  0.31591     234  0.05   0.76529  2.01008  -0.25104
  4    1.32080  0.31528     234  0.05   0.69965  1.94195  -0.19923
  5    1.16684  0.31966     234  0.05   0.53706  1.79661  -0.06595

  Obs  etam     stderreta  lowereta  uppereta  mu       dmu      stderrmu  lowermu  uppermu
  1    1.26022  0.3191     0.63153   1.88890   3.52619  3.52619  1.12522   1.88049  6.61211
  2    1.36749  0.3152     0.74648   1.98850   3.92548  3.92548  1.23735   2.10956  7.30457
  3    1.38768  0.3159     0.76529   2.01008   4.00556  4.00556  1.26541   2.14961  7.46391
  4    1.32080  0.3153     0.69965   1.94195   3.74641  3.74641  1.18117   2.01305  6.97232
  5    1.16684  0.3197     0.53706   1.79661   3.21182  3.21182  1.02668   1.71098  6.02917

  Obs  var      resraw    reschi    deta     _w       _z
  1    3.52619  -0.77619  -0.28448  0.28359  3.52619  1.04010
  2    3.92548   1.07452   0.37325  0.25475  3.92548  1.64122
  3    4.00556  -1.00556  -0.34579  0.24965  4.00556  1.13664
  4    3.74641  -0.74641  -0.26540  0.26692  3.74641  1.12156
  5    3.21182  -0.21182  -0.08134  0.31135  3.21182  1.10089

/* This program uses features in the SAS GLIMMIX macro to fit logistic
   regression models to nest predation data from a split-plot experiment.
   The code is stored in the file nestglmm.sas */
data set1;
  infile 'c:\stat565\nestall.dat';
  input wshed $ loc $ round roadside $ roadtype $ adjhab $ rdepth rwidth
        foreback percip mtemp ftotal floss btotal bloss ctotal closs;
  if(roadtype='grav') then x1=1; else x1=0;
  if(adjhab='nrc') then x2=1; else x2=0;
  if(roadside='nfen') then x3=1; else x3=0;
  if(roadside='wd') then x4=1; else x4=0;
  x5=x1*x2;
  x6=x2*x4;
  x7=x1*x3;
  slope=0; loss=floss; total=ftotal; output;
  slope=1; loss=bloss; total=btotal; output;
  keep wshed loc x1 x2 x3 x4 x5 x6 x7 slope loss total;
run;

/* Fit a model using GLIMMIX to include random location effects */
%inc 'c:\stat565\sas\glmm800.sas' / nosource;
run;

/* Logistic regression with random effects for transect and sub-transects */
%glimmix(data=set1,
  stmts=%str(
    class loc;
    model loss/total = x1 x2 x3 x4 x5 x6 x7 slope / solution ddfm=kr;
    random intercept slope / subject=loc g gcorr;),
  error=binomial,
  link=logit,
  converge=1e-8,
  maxit=20,
  out=setp
)
run;

proc print data=setp (obs=5); run;

Model Information
  Data Set                     WORK._DS
  Dependent Variable           _z
  Weight Variable              _w
  Covariance Structure         Variance Components
  Subject Effect               loc
  Estimation Method            REML
  Residual Variance Method     Profile
  Fixed Effects SE Method      Prasad-Rao-Jeske-Kackar-Harville
  Degrees of Freedom Method    Kenward-Roger

Class Level Information
  Class   Levels   Values
  loc     136      1 10 100 101 102 103 104 105 106 107 108 109 11 110 111 112 ... 92 93 94 95 96 97 98 99

Dimensions
  Covariance Parameters        3
  Columns in X                 9
  Columns in Z Per Subject     2
  Subjects                     136
  Max Obs Per Subject          2
  Observations Used            272
  Observations Not Used        0
  Total Observations           272

Parameter Search
  CovP1    CovP2    CovP3    Variance   Res Log Like   -2 Res Log Like
  0.9868   1.2259   0.7644   0.7644     -521.3338      1042.6675

Iteration History
  Iteration   Evaluations   -2 Res Log Like   Criterion
  1           1             1042.66752512     0.00000000
Convergence criteria met.

Estimated G Matrix
  Row   Effect      loc   Col1     Col2
  1     Intercept   1     0.9868
  2     slope       1              1.2259

Covariance Parameter Estimates
  Cov Parm    Subject   Estimate
  Intercept   loc       0.9868
  slope       loc       1.2259
  Residual              0.7644

Fit Statistics
  -2 Res Log Likelihood      1042.7
  AIC (smaller is better)    1048.7
  AICC (smaller is better)   1048.8
  BIC (smaller is better)    1057.4

Solution for Fixed Effects
  Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
  Intercept   -1.7023    0.3948           125   -4.31     <.0001
  x1           0.04349   0.4363           113    0.10     0.9208
  x2          -0.9033    0.6029           140   -1.50     0.1363
  x3           0.1095    0.6688           119    0.16     0.8702
  x4          -0.3174    0.4968           126   -0.64     0.5241
  x5           0.4538    0.6623           133    0.69     0.4944
  x6           1.5878    0.6449           115    2.46     0.0153
  x7          -1.0348    0.8064           135   -1.28     0.2016
  slope        0.7507    0.1664           144    4.51     <.0001

GLIMMIX Model Statistics
  Description                  Value
  Deviance                     163.1343
  Scaled Deviance              213.4204
  Pearson Chi-Square           107.3120
  Scaled Pearson Chi-Square    140.3909
  Extra-Dispersion Scale       0.7644

  Obs  _y   _offset  _wght  _orig  loss  total  x1  x2  x3  x4  x5
  1    0.0  0        5      y      0     5      1   1   1   0   1
  2    0.0  0        5      y      0     5      1   1   1   0   1
  3    0.2  0        5      y      1     5      1   0   1   0   0
  4    0.5  0        4      y      2     4      1   0   1   0   0
  5    0.2  0        5      y      1     5      1   0   0   0   0

  Obs  x6  x7  slope  loc  Pred      StdErrPred  DF       Alpha  Lower     Upper
  1    0   1   0      1    -2.83406  0.50758     263.000  0.05   -3.83350  -1.83461
  2    0   1   1      1    -2.04892  0.50026     263.000  0.05   -3.03393  -1.06390
  3    0   1   0      2    -2.46998  0.44506     263.000  0.05   -3.34632  -1.59365
  4    0   1   1      2    -1.68485  0.43891     263.000  0.05   -2.54907  -0.82062
  5    0   0   0      3    -1.51964  0.18972     257.500  0.05   -1.89324  -1.14603

  Obs  Resid     eta       stderreta  lowereta  uppereta  mu       dmu
  1    -1.05877  -2.83406  0.50758    -3.83350  -1.83461  0.05551  0.05243
  2    -1.12887  -2.04892  0.50026    -3.03393  -1.06390  0.11416  0.10113
  3     1.69678  -2.46998  0.44506    -3.34632  -1.59365  0.07799  0.07191
  4     2.60307  -1.68485  0.43891    -2.54907  -0.82062  0.15645  0.13198
  5     0.13908  -1.51964  0.18972    -1.89324  -1.14603  0.17951  0.14729

  Obs  stderrmu  lowermu  uppermu  var      resraw
  1    0.026612  0.02118  0.13769  0.05243  -0.05551
  2    0.050590  0.04592  0.25656  0.10113  -0.11416
  3    0.032003  0.03402  0.16887  0.07191   0.12201
  4    0.057926  0.07249  0.30563  0.13198   0.34355
  5    0.027944  0.13088  0.24121  0.14729   0.02049

/* Logistic regression with heterogeneous extra-binomial variance
   adjustments for the foreslope and backslope */
%glimmix(data=set1,
  stmts=%str(
    class loc;
    model loss/total = x1 x2 x3 x4 x5 x6 x7 slope / solution ddfm=kr;
    repeated / type=un subject=loc r rcorr;),
  error=binomial,
  link=logit,
  converge=1e-8,
  maxit=20,
  out=setp
)
run;

proc print data=setp (obs=5); run;

Model Information
  Data Set                     WORK._DS
  Dependent Variable           _z
  Weight Variable              _w
  Covariance Structure         Unstructured
  Subject Effect               loc
  Estimation Method            REML
  Residual Variance Method     None
  Fixed Effects SE Method      Prasad-Rao-Jeske-Kackar-Harville
  Degrees of Freedom Method    Kenward-Roger

Class Level Information
  Class   Levels   Values
  loc     136      1 10 100 101 102 103 104 105 106 107 108 109 11 110 111 112 113 114 115 116 117 118 119 12
                   ... 92 93 94 95 96 97 98 99

Dimensions
  Covariance Parameters        3
  Columns in X                 9
  Columns in Z                 0
  Subjects                     136
  Max Obs Per Subject          2
  Observations Used            272
  Observations Not Used        0
  Total Observations           272

Parameter Search
  CovP1    CovP2    CovP3    Res Log Like   -2 Res Log Like
  1.6352   0.5326   2.1321   -499.0835      998.1670

Iteration History
  Iteration   Evaluations   -2 Res Log Like   Criterion
  1           1             998.16698466      0.00000000
Convergence criteria met.

Estimated R Matrix for loc 1 / Weighted by _w
  Row   Col1     Col2
  1     6.2378   1.4630
  2     1.4630   4.2165

Estimated R Correlation Matrix for loc 1 / Weighted by _w
  Row   Col1     Col2
  1     1.0000   0.2853
  2     0.2853   1.0000

Covariance Parameter Estimates
  Cov Parm   Subject   Estimate
  UN(1,1)    loc       1.6352
  UN(2,1)    loc       0.5326
  UN(2,2)    loc       2.1321

Fit Statistics
  -2 Res Log Likelihood      998.2
  AIC (smaller is better)    1004.2
  AICC (smaller is better)   1004.3
  BIC (smaller is better)    1012.9

Solution for Fixed Effects
  Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
  Intercept   -1.5433    0.3442           191   -4.48     <.0001
  x1           0.0237    0.3741           158    0.06     0.9496
  x2          -1.0199    0.5694           251   -1.79     0.0745
  x3           0.1615    0.5682           154    0.28     0.7766
  x4          -0.2280    0.4304           181   -0.53     0.5969
  x5           0.6558    0.6130           222    1.07     0.2858
  x6           1.3425    0.5561           165    2.41     0.0169
  x7          -1.1119    0.7288           218   -1.53     0.1285
  slope        0.7851    0.1572           183    4.99     <.0001

GLIMMIX Model Statistics
  Description                  Value
  Deviance                     546.8712
  Scaled Deviance              546.8712
  Pearson Chi-Square           491.4987
  Scaled Pearson Chi-Square    491.4987
  Extra-Dispersion Scale       1.0000

  Obs  _y   _offset  _wght  _orig  loss  total  x1  x2  x3  x4  x5
  1    0.0  0        5      y      0     5      1   1   1   0   1
  2    0.0  0        5      y      0     5      1   1   1   0   1
  3    0.2  0        5      y      1     5      1   0   1   0   0
  4    0.5  0        4      y      2     4      1   0   1   0   0
  5    0.2  0        5      y      1     5      1   0   0   0   0

  Obs  x6  x7  slope  loc  Pred      StdErrPred  DF       Alpha  Lower     Upper
  1    0   1   0      1    -2.83406  0.50758     263.000  0.05   -3.83350  -1.83461
  2    0   1   1      1    -2.04892  0.50026     263.000  0.05   -3.03393  -1.06390
  3    0   1   0      2    -2.46998  0.44506     263.000  0.05   -3.34632  -1.59365
  4    0   1   1      2    -1.68485  0.43891     263.000  0.05   -2.54907  -0.82062
  5    0   0   0      3    -1.51964  0.18972     257.500  0.05   -1.89324  -1.14603

  Obs  Resid     eta       stderreta  lowereta  uppereta  mu       dmu
  1    -1.05877  -2.83406  0.50758    -3.83350  -1.83461  0.05551  0.05243
  2    -1.12887  -2.04892  0.50026    -3.03393  -1.06390  0.11416  0.10113
  3     1.69678  -2.46998  0.44506    -3.34632  -1.59365  0.07799  0.07191
  4     2.60307  -1.68485  0.43891    -2.54907  -0.82062  0.15645  0.13198
  5     0.13908  -1.51964  0.18972    -1.89324  -1.14603  0.17951  0.14729

  Obs  stderrmu  lowermu  uppermu  var      resraw
  1    0.026612  0.02118  0.13769  0.05243  -0.05551
  2    0.050590  0.04592  0.25656  0.10113  -0.11416
  3    0.032003  0.03402  0.16887  0.07191   0.12201
  4    0.057926  0.07249  0.30563  0.13198   0.34355
  5    0.027944  0.13088  0.24121  0.14729   0.02049

What if you ignore correlation?
• Incorrect inferences about $\beta$'s
• Inefficient estimates of $\beta$'s
• IF YOU DON'T MODEL IT CORRECTLY, AT LEAST 'ADJUST' FOR IT

Should I spend much effort on modelling the covariance structure?
• Situation 1: regression of $y$ on $x$ is the main focus and $m \gg n$: invest the majority of time in modelling the mean and less time on the correlation (use robust methods)
• Situation 2: correlation is of prime interest, or $m$ is small, or subject-specific prediction is very important (for example, subject-specific curves): the mean and covariance models need to be approximately correct
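The "Empirical" covariance matrices that PROC GENMOD prints next to the model-based ones are robust (sandwich) estimators of the kind Situation 1 recommends: the inverse of the model-based information $H = X^T V X$ forms the "bread", and the outer products of the cluster-level scores $X_i^T(Y_i - \mu_i)$ form the "meat". Below is a minimal NumPy sketch, assuming a Poisson log-linear model fitted under working independence ($V_i$ diagonal with entries $\mu_{ij}$, $\phi = 1$). The GEE fits above additionally use an exchangeable or AR(1) working correlation, so this illustrates the idea rather than reproducing that output. The function name and the toy data are hypothetical.

```python
import numpy as np


def poisson_independence_sandwich(X, y, cluster, tol=1e-10, max_iter=50):
    """Poisson log-linear fit that ignores within-cluster correlation.

    Returns (beta, model-based covariance, robust sandwich covariance).
    Hypothetical helper for illustration; not the PROC GENMOD algorithm.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)               # Q = X^T (Y - mu)
        info = X.T @ (mu[:, None] * X)       # H = X^T V X with V = diag(mu)
        step = np.linalg.solve(info, score)  # Fisher scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))  # model-based covariance
    meat = np.zeros_like(bread)
    for g in np.unique(cluster):
        rows = cluster == g
        s_i = X[rows].T @ (y[rows] - mu[rows])      # cluster score X_i^T (Y_i - mu_i)
        meat += np.outer(s_i, s_i)
    return beta, bread, bread @ meat @ bread        # H^{-1} B H^{-1}


# Toy data (made up): three subjects with two counts each, intercept only.
X = np.ones((6, 1))
y = np.array([1, 3, 4, 6, 0, 2])
subject = np.array([0, 0, 1, 1, 2, 2])
beta, cov_model, cov_robust = poisson_independence_sandwich(X, y, subject)
print("beta:", beta)
print("model-based SE:", np.sqrt(np.diag(cov_model)))
print("robust SE:", np.sqrt(np.diag(cov_robust)))
```

With only three subjects the robust estimate is itself very noisy; the sandwich estimator is most trustworthy when the number of subjects $m$ is large, which is the setting of Situation 1.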