I. Overview - Survey, Statistics & Psychometrics

Department of Statistics
Introduction to Modeling Change Over Time with Generalized Mixed Models using SAS PROC GLIMMIX
A Short Course, 14 May 2007
Instructor: Walt Stroup, Ph.D., Professor & Chair, UNL Department of Statistics

Outline of Short Course (G/C = Growth/Change Model)
1. Introduction
   a. motivating examples
   b. Social Science HLM-speak vs. BioStat GLMM-speak
2. GLMM / HLM
   a. essential background
   b. recurring modeling issues
3. SAS / GLIMMIX syntax
4. G/C Models, 1st part of the picture: factorial treatment designs
   a. with various error structures & distributions
   b. with repeated measures & correlated errors
5. G/C Models, 2nd part of the picture: random effects issues
   a. random coefficients
   b. prediction vs. estimation
6. G/C Models, 3rd part of the picture: GLM issues (binary, count, rate, zero-inflated models)
7. Power & Planning
8. Nonlinear mixed models

14 May 2007, SSP Core Facility

Recurring Themes
• “Mixed Model” issues
  − fixed or random?
  − error terms: which one, and are they correlated?
  − standard errors & d.f.
  − prediction or estimate? (“inference space”)
• “GLM” issues
  − what distribution? (including “is it really a distribution, and does it matter?”)
  − what link: “data” vs. “model” scale?
  − overdispersion
  − computational issues

Recurring Themes
• George Bernard Shaw: “America and England are two peoples separated by a common language.”
• Generalized mixed models have
  − AgStat-speak
  − BioStat-speak
  − Social/Behavioral Science Stat (HLM) speak
• One goal: serve as translator

I.
Introduction
• General considerations for modeling
• Several examples illustrating generalized and mixed models
• Typology of models
• Background theory
• Decision chart to match model with software available in SAS

General Model Considerations
• A model is a description of the components of an observation
• observation = systematic + random
• Nelder: random = ephemeral + noise, or random = random model + random error
• Alternative: random = design components + remaining variation
• “All models are wrong but some are useful” (G.E.P. Box)

General Mixed Model Setting
• Y is the vector of responses (observable)
• u is the vector of random (design-induced) effects (not directly observable)
• Relevant distributions (inexact but useful labels):
  − $Y \mid u \sim f_C(\mu, R)$: HLM level 1; BioStat “subject-specific”
  − $u \sim f_R(0, G)$: HLM level 2
• The model is of the conditional mean of Y|u: $E(Y \mid u) = h(X, \beta, Z, u)$

Typology of Models
    Type   Mean model       Distribution
    NLMM   h(X, β, Z, u)    y|u general, u normal **
    GLMM   h(Xβ + Zu)       y|u general, u normal *
    LMM    Xβ + Zu          u, y|u normal
    NLM    h(X, β)          y normal
    GLM    h(Xβ)            y general
    LM     Xβ               y normal
    * for PROC GLIMMIX; ** for this course. (G/N)LMM can be more general.
19-20 Oct 2006, GLIMMIX Short Course for Procter & Gamble

Example 1: Random Effects Model
• Data: Output 4.1, p. 94, SAS for Linear Models, 4th ed.
• 20 packages of ground beef
• 3 samples per package
• 2 counts per sample
• Response variable: microbial count
• response = mean + package + sample(package) + error, i.e., observation = systematic + random model + error

Model for Example 1
$$y_{ijk} = \mu + p_i + s(p)_{ij} + e_{ijk}, \quad i = 1, 2, \ldots, 20;\ j = 1, 2, 3;\ k = 1, 2$$
$$p_i \overset{iid}{\sim} N(0, \sigma_P^2); \quad s(p)_{ij} \overset{iid}{\sim} N(0, \sigma_S^2); \quad e_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$$
• $y_{ijk}$ is the observation [log(count)]
• $\mu$ is the overall mean (systematic / fixed)
• $p_i$, $s(p)_{ij}$ are random model effects
• $e_{ijk}$ is random error
• Convention: fixed effects Greek; random effects Latin

Hierarchical Levels
[Figure: nested hierarchy of school (Level 3), classroom (Level 2) and students (Level 1); symbol size indicates level: small = 1, medium = 2, large = 3]

Hierarchical Level to Statistical Model
• $y_{ijk}$ = kth student, jth classroom, ith school
• y = mean + school + classroom + student
• GLIMMIX-speak: $y_{ijk} = \mu + s_i + c(s)_{ij} + e_{ijk}$
• HLM-speak:
  − Level 1 (student): $y_{ijk} = \beta_{0ij} + e_{ijk}$, with $\beta_{0ij} = \mu + s_i + c(s)_{ij}$
  − Level 2 (classroom): $y_{ijk} = \beta_{0i} + c(s)_{ij} + e_{ijk}$
  − Level 3 (school): $\beta_{0i} = \mu + s_i$

Modeling Issues
1. Estimate the variance components $\sigma_i^2$
2. Estimate, standard error, and interval estimate of $\mu$
3. Estimates of package and sample effects
4. a.k.a. estimates of school and classroom effects

Singer: HLM to MIXED
• Unconditional means model (Raudenbush & Bryk, 2002), “HLM-speak”:
  $y_{ij} = \beta_{0j} + r_{ij}$, $\beta_{0j} = \gamma_{00} + u_{0j}$; $r_{ij} \sim N(0, \sigma^2)$, $u_{0j} \sim N(0, \tau_{00})$
• “GLIMMIX-speak” (one-way random effects model):
  $y_{ij} = \mu + a_i + e_{ij}$; $a_i \sim N(0, \sigma_A^2)$, $e_{ij} \sim N(0, \sigma^2)$
• Include a Level 2 covariate:
  − HLM-speak: $\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_j + u_{0j}$, so $y_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{MEANSES}_j + u_{0j} + r_{ij}$
  − GLIMMIX-speak: $y_{ij} = \mu + \beta_1 X_j + s_j + e_{ij}$

Example 2: Blocking & Multi-Location
• Data: SAS for Linear Models, Output 3.7 (discussed as a mixed model in Section 4.3) and Output 11.30; SAS for Mixed Models, 2nd ed., Section 6.6
• Output 11.30 discussed here
• 3 treatments
• 8 locations
• locations represent a population
• 3-12 blocks, depending on location
• response = trt + loc + blk(loc) + trt×loc + error, i.e.
  observation = systematic + random model + error

Example 2 Framed by Extending the School / Classroom Example
[Figure: school / classroom / students hierarchy, shown first without and then with a treatment factor]

Model with Treatment
• y = μ + trt + school(trt) + classroom(school) + student:
  $y_{ijkl} = \mu + \tau_i + s(\tau)_{ij} + c(s,\tau)_{ijk} + e_{ijkl}$
• Level 1: $y_{ijkl} = \beta_{0ijk} + e_{ijkl}$
• Level 2: $y_{ijkl} = \beta_{0ij} + c(s,\tau)_{ijk} + e_{ijkl}$
• Level 3: between-school model + trt, as above

Modeling Issues
1. Appropriate error term to test treatment
2. Standard error of treatment mean (inference space)
3. Intra-block vs. inter-block analysis

ANOVA (ignoring block)
If locations are random:
    Source      d.f.   Expected mean square
    Treatment   2      $\sigma^2 + k_1\sigma_{LT}^2 + Q_{TRT}$
    Location    7      $\sigma^2 + k_1\sigma_{LT}^2 + k_2\sigma_L^2$
    Loc × Trt   14     $\sigma^2 + k_1\sigma_{LT}^2$
    error       dfe    $\sigma^2$
• The test of TRT is affected: with random locations, MS(Trt) is tested against MS(Loc × Trt); with fixed locations, against MS(error)
If locations are fixed:
    Source      d.f.   Expected mean square
    Treatment   2      $\sigma^2 + Q_{TRT}$
    Location    7      $\sigma^2 + Q_{LOC}$
    Loc × Trt   14     $\sigma^2 + Q_{LT}$
    error       dfe    $\sigma^2$

Inference Space
• Assuming locations are fixed: $\mathrm{Var}(\text{trt mean}) = \dfrac{\sigma^2}{\#\text{obs/trt}}$, so $\text{Std. error(trt mean)} = \sqrt{\dfrac{MS(\text{error})}{\#\text{obs/trt}}} = 0.91$
• HOWEVER, if locations are random: $\mathrm{Var}(\text{trt mean}) = \dfrac{\sigma^2 + k(\sigma_L^2 + \sigma_{LT}^2)}{\#\text{obs/trt}}$, so $\text{Std. error(trt mean)} = \sqrt{\dfrac{\hat\sigma^2 + k(\hat\sigma_L^2 + \hat\sigma_{LT}^2)}{\#\text{obs/trt}}} = 3.62$

Where Does Uncertainty Arise?
[Figure: data from Locations 1, 2, ..., 7, 8]
• Only from variation among observations within locations? → locations fixed
• Or does variation among locations also contribute? → locations random
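The two standard errors on the Inference Space slide differ only in which variance components enter Var(trt mean). The minimal Python sketch below reproduces that arithmetic. It assumes a balanced layout in which k is the number of observations per treatment in each location. The variance components are hypothetical values, not the estimates behind 0.91 and 3.62.

```python
import math


def se_trt_mean_fixed(sigma2_e: float, obs_per_trt: int) -> float:
    """Locations fixed: only within-location error enters Var(trt mean)."""
    return math.sqrt(sigma2_e / obs_per_trt)


def se_trt_mean_random(sigma2_e: float, sigma2_loc: float, sigma2_loc_trt: float,
                       k: int, obs_per_trt: int) -> float:
    """Locations random: Var(trt mean) = [sigma2_e + k*(sigma2_L + sigma2_LT)] / (#obs/trt)."""
    return math.sqrt((sigma2_e + k * (sigma2_loc + sigma2_loc_trt)) / obs_per_trt)


# Hypothetical balanced layout: 8 locations, k = 6 observations per treatment per location
n_loc, k = 8, 6
obs_per_trt = n_loc * k
sigma2_e, sigma2_loc, sigma2_loc_trt = 4.0, 10.0, 2.0   # hypothetical variance components

se_fixed = se_trt_mean_fixed(sigma2_e, obs_per_trt)
se_random = se_trt_mean_random(sigma2_e, sigma2_loc, sigma2_loc_trt, k, obs_per_trt)

# Same random-location variance, written as a mean over n_loc sampled locations
se_direct = math.sqrt(sigma2_loc / n_loc + sigma2_loc_trt / n_loc + sigma2_e / obs_per_trt)

print(f"fixed: {se_fixed:.3f}  random: {se_random:.3f}  direct: {se_direct:.3f}")
# fixed: 0.289  random: 1.258  direct: 1.258
```

The "direct" form shows why the random-location standard error is so much larger. The location and location × treatment variances are divided only by the number of locations, not by the total number of observations per treatment.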
Intra- vs. Inter-block Analysis
• Intra-block (fixed block) analysis is based only on within-block treatment differences
• Inter-block analysis also accounts for variance among blocks (a random-block analysis combines inter- and intra-block information)
• The two lead to equivalent tests when all treatments appear equally often in each block
• Not equivalent otherwise
• In most cases, the combined inter-/intra-block analysis is more efficient

Example 3: Repeated Measures / Longitudinal
• Data: SAS for Linear Models, Output 8.1; SAS for Mixed Models, Chapter 5
• 3 treatments (2 test drugs + placebo)
• $n_i$ patients per treatment
• 8 times of measurement (1, 2, 3, ..., 8 hours post-treatment)
• baseline measurement at time 0
• response = trt + hour + trt×hour + pat(trt) + error, i.e., observation = systematic + random model + error
• Variations on this theme are “latent growth models”

Growth Models: Singer HLM-speak to GLIMMIX-speak
HLM unconditional linear growth model:
• Level 1 (within subjects): $y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{time}_{ij} + r_{ij}$, $r_{ij} \sim N(0, \sigma^2)$
• Level 2 (between subjects): $\beta_{0j} = \gamma_{00} + u_{0j}$, $\beta_{1j} = \gamma_{10} + u_{1j}$, with
  $\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim MVN\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}\right)$
GLIMMIX (combined) form:
$$y_{ij} = (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})\,\mathrm{time}_{ij} + r_{ij} = \underbrace{(\gamma_{00} + \gamma_{10}\,\mathrm{time}_{ij})}_{\text{population-averaged (PA)}} + \underbrace{(u_{0j} + u_{1j}\,\mathrm{time}_{ij})}_{\text{between subjects}} + \underbrace{r_{ij}}_{\text{within subjects}}$$
• The PA part plus the between-subject part gives the subject-specific (SS) mean

Singer (1998)
• Excellent paper translating HLM-speak to PROC MIXED
• Uses Raudenbush & Bryk examples
• Fair warning to readers, however: it is dated
  − new features & output revisions in SAS
  − some of the output encouraged confusion or poor practice
  − specifics:
    revised output of Fit Statistics
    misleading output for variance estimates deleted
    Kenward-Roger procedure for d.f.
    & standard errors
• I’ll update & make the switch to PROC GLIMMIX

Modeling Issues
1. Errors may be correlated
   a. may affect conclusions
   b. how to select the covariance model
2. Denominator degrees of freedom
3. Bias in standard errors and test statistics

Impact of Correlated Errors
    Covariance model                             Den DF      F value       Pr > F
    errors independent                           483         7.11          <0.0001
    errors correlated:
      no structure (bias corrected in parens)    69 (98.1)   4.06 (3.66)   <0.0001
      AR(1)                                      483         3.93          <0.0001
      AR(1), bias corrected                      424         3.89          <0.0001

Example 4
• Data: SAS for Mixed Models, Section 14.5
• 2 treatments (test drug, control)
• 8 clinics
• clinics represent a population
• $n_{ij}$ subjects at the jth clinic on the ith treatment
• response: favorable or unfavorable ($f_{ij}$ = # favorable)
• response = trt + clinic + trt×clinic + error, i.e., observation = systematic + random model + error

Modeling Issues
1. Response ($f_{ij}/n_{ij}$) is binomial, not normal
2. Response may not be linear in the model parameters
3. Errors may not be additive
4. Variances of the binomial and normal differ:
   a. heterogeneous
   b. depends on the location parameter

Generalized Linear Mixed Model, e.g. Logistic Mixed Model
• Let $\pi_{ij} = \Pr\{\text{favorable response} \mid \text{trt } i, \text{clinic } j\}$
• Model: $\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) = \eta + \tau_i + c_j + (tc)_{ij}$; $c_j \overset{iid}{\sim} N(0, \sigma_C^2)$; $(tc)_{ij} \overset{iid}{\sim} N(0, \sigma_{TC}^2)$
• Observations = proportion = $f_{ij}/n_{ij}$, with $(f_{ij} \mid c_j, (tc)_{ij}) \sim \mathrm{Binomial}(n_{ij}, \pi_{ij})$
• $\pi_{ij}$ is modeled by $\dfrac{\exp[\eta + \tau_i + c_j + (tc)_{ij}]}{1 + \exp[\eta + \tau_i + c_j + (tc)_{ij}]}$

Example 5
• SAS for Linear Models, Output 10.39
• 2 treatments
• $n_i$ persons per treatment
• 4 times of measurement
• response = number of seizures (count)
• baseline and age observations
• response = trt + hour + trt×hour + baseline & age + pat(trt) + error, i.e.
  observation = systematic + random model + error

Modeling Issues
• Counts are typically not normal; Poisson (or negative binomial) is more likely
• Generalized linear model issues
  − a linear model is not a good direct model of the mean
  − the variance depends on the mean
• Repeated measures issues
  − observations within subjects are correlated over time
  − between-subject variance

Example 6
• SAS for Mixed Models, Section 1.5.6
• 5 treatments observed in each of 4 randomized blocks
• several measurements on days between 130 and 180 growing degree days
• response = (trt, day) + block + block×trt + error, i.e., observation = systematic + random model + error

Emergence over Time by Treatment
[Figure: emergence vs. time by treatment. Black: NoTill; Red: SumBlade (summer); Cyan: SB&SD; Green: SpDisk (spring); Blue: SpPlow]

Modeling Issues
• “Usual” mixed model and repeated measures issues, plus
• a linear model is a poor model of the trt×day means

Nonlinear Mixed Model
• Mixed model: $y_{ijk} = \mu_{ij} + b_k + w_{ik} + e_{ijk}$
  − $\mu_{ij}$ is the trt×day mean; $b_k$ is the block effect; $w_{ik}$ is the between-subject error
• Gompertz model: $\mu_{ij} = \alpha_i \exp\{-\exp[\beta_i(\gamma_i - \mathrm{date}_j)]\}$
  − $\alpha_i$ is the asymptote of the ith treatment
  − $\beta_i$ is the “slope” of the ith treatment
  − $\gamma_i$ is the inflection point of the ith treatment

Generalized Mixed Model SAS Software Decision Table
Normal response:
    Errors        Random effects   Mean model linear?   SAS PROC
    independent   no               yes                  GLM, MIXED, GLIMMIX
    independent   no               no                   NLIN
    independent   yes              yes                  MIXED, GLIMMIX
    independent   yes              no                   NLMIXED, %NLINMIX
    correlated                     yes                  MIXED, GLIMMIX
    correlated                     no                   NLMIXED, %NLINMIX
Non-normal response:
    Errors        Random effects   Mean model linear?   SAS PROC
    independent   no               yes                  GENMOD, GLIMMIX
    independent   no               no                   (none listed)
    independent   yes              yes                  GLIMMIX, NLMIXED
    independent   yes              no                   NLMIXED
    correlated                     yes                  GLIMMIX (GENMOD)
    correlated                     no                   (none listed)

Essential GLMM Background

First: How Do I Run a SAS Program?
• ???????
• It’s easier than the urban legends would have you believe

Basic Parts of a SAS Program
• DATA step:
    Data your_choice_of_name;
      Input list of variables;   /* $ after character (alphanumeric) variables */
      Datalines;
    data: one line per obs, one column per variable
    ;
• PROC step:
    Proc GLIMMIX Data=your_choice_of_name;
      CLASS block group & trt var;
      MODEL response=block trt covar / options;
      ...
    Run;
• Modify an existing data set (Data __; Set __;):
    Data new_data_set_name;
      Set your_choice_of_name;   /* the old (existing) data set */
      LogY=Log(Y);               /* program & data manipulation statements go here */
    Run;

Example of a SAS Program
    /* DATA step */
    data demo1;
      input classroom trt $ time count;
      sc=sqrt(count);              /* square-root transform of the count */
      datalines;
    1 std 1 12
    1 std 2 16
    1 std 4 17
    1 std 8 24
    2 exper 1 17
    2 exper 2 24
    2 exper 4 30
    2 exper 8 32
    11 std 1 16
    11 std 2 15
    11 std 4 22
    11 std 8 23
    8 exper 1 15
    8 exper 2 20
    8 exper 4 24
    8 exper 8 27
    ;
    run;

    /* PROC step */
    proc glimmix data=demo1;
      class classroom trt time;
      model sc=trt time trt*time / dist=normal ddfm=kr;
      random classroom(trt);       /* classrooms nested within treatment */
      lsmeans trt*time;
      ods output lsmeans=lsm;      /* save the LS-means to data set LSM */
    run;

    /* Data; Set; + new PROC: plot the LS-means against log2(time) */
    data plot_growth;
      set lsm;
      log_time=log2(time);
    run;
    symbol i=join value=circle;
    proc gplot data=plot_growth;
      plot estimate*log_time=trt;
    run;
    quit;

II. Generalized Mixed Model Theory
• Clarify fixed vs. random effects
• Linear models
  − LM to LMM + GLM to GLMM
• Estimation and inference for
  − LMM
  − GLM
  − GLMM
• For GLMM:
  − what follows naturally from GLM and LMM
  − special issues

Fixed vs. Random Effects?
• Fixed effect?
  − levels observed = population of interest (except regression)
  − levels deliberately chosen
  − inference: systematic relationship between y and the effect
• Random effect?
  − observed levels represent a target population
  − random sample?
    (ideal, but seldom perfectly realized)
  − makes sense to conceptualize a probability distribution
• Bottom line: do the observed levels of the effect plausibly represent a probability distribution?
  − yes → random effect
  − no → fixed effect

General Structure of the Model
• Nelder: observation = systematic + random
• General approach:
  − the likelihood consists of two parts: the observation (y | u) and the random effects u
  − the model is a mathematical description of $\mu = E(y \mid u)$
• Distribution:
  − observation: $y \mid u \sim f(\mu, R)$
  − random effects: $u \sim MVN(0, G)$
• Model: $\mu = h(X, \beta, Z, u)$; $h(\cdot)$ is called the “inverse link”

Linear Model (LM)
• No random effects
  − simple ANOVA (one error term)
  − multiple regression
• Assumption: $y \sim MVN(\mu, R)$
• LM: model $\mu$ by $X\beta$, usually represented as $y = X\beta + e$; $e \sim N(0, R)$
• Alternative representation (helpful for the transition to GLMM): $y \sim MVN(X\beta, R)$

Generalizations of LM
    fixed effects only, obs ~ normal:       LM (linear model)
    fixed effects only, obs ~ non-normal:   GLM (generalized linear model)
    random effects, obs ~ normal:           LMM (linear mixed model)
    random effects, obs ~ non-normal:       GLMM (generalized linear mixed model)

GLM: Generalized Linear Model
• Binomial: logistic regression; probit models
• Poisson: log-linear models
• Assumption: $y \sim \mathrm{dist}(\mu, R)$; R is a function of $V(\mu)$, called the “variance function” (more later)
• GLM: model $\eta = g(\mu)$ by $X\beta$, where g is called the “link function”; alternatively, model $\mu$ by $h(X\beta)$, the “inverse link”
• Note: here $y = \mu + e$ or $g(\mu) = X\beta + e$ makes no sense. Instead: $y \sim \mathrm{dist}(h(X\beta), R)$

LMM: Linear Mixed Model
• Multi-error models: split-plot, multi-location
• Repeated measures, a.k.a.
  longitudinal data
• Assume: $y \mid u \sim MVN(\mu, R)$; $u \sim MVN(0, G)$
• LMM: model $\mu$ by $X\beta + Zu$
• Familiar notation: $y = X\beta + Zu + e$; $\begin{pmatrix} u \\ e \end{pmatrix} \sim MVN\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix}\right)$
  − “G-side” concerns V(u); “R-side” concerns V(e)
• Alternatively: $y \mid u \sim MVN(X\beta + Zu, R)$; $u \sim MVN(0, G)$
• Or (marginal model): $y \sim MVN(X\beta, V)$; $V = ZGZ' + R$

GLMM: Generalized Linear Mixed Model
• Assume: $y \mid u \sim \mathrm{dist}(\mu, R)$, where, as with GLM, R depends on $V(\mu)$; $u \sim MVN(0, G)$
• GLMM models $\mu = E(y \mid u)$ by the link function $\eta = g(\mu) = X\beta + Zu$, or by the inverse link $\mu = h(X\beta + Zu)$
• GLMM: $y \mid u \sim \mathrm{dist}(h(X\beta + Zu), R)$
• Marginal model: $f(y) = \int f(y \mid u)\, f(u)\, du$ (more later)
• Modeling will involve: distribution; link (or inverse link); G-side; R-side

Some Grounding Before Moving On
• “Hessian Fly” example, Gotway & Stroup (1997, JABES)
• “Hessian Fly” itself is not so important, but the design & data structure are
• 16 treatments, 4 replications in a 4×4 lattice
  − 16 incomplete blocks organized into 4 complete blocks
• Response: $y_{ij}/n_{ij}$ (damaged / obs per trt × block unit)
[Figure: field layout of the 4×4 lattice, showing entries 1-16 in each of the 4 replications]

Linear Model (LM)
• Randomized complete block: $y_{ij} = \mu + \rho_i + \tau_j + e_{ij}$; $e_{ij} \overset{iid}{\sim} N(0, \sigma^2)$; $\rho_i$ = block effect; $\tau_j$ = treatment effect
    proc glimmix;
      class block entry;
      model pct=block entry;
    run;
• Incomplete block model, intra-block analysis: the incomplete block replaces the complete block in denoting $\rho_i$
    proc glimmix;
      class inc_block entry;
      model pct=inc_block entry;
    run;

Linear Mixed Model (LMM)
• Randomized complete block, random block effects: $y_{ij} = \mu + r_i + \tau_j + e_{ij}$; $r_i \overset{iid}{\sim} N(0, \sigma_R^2)$; $e_{ij} \overset{iid}{\sim} N(0, \sigma^2)$; $r_i$ = block effect; $\tau_j$ = treatment effect
    proc glimmix;
      class block entry;
      model pct=entry;
      random block;      /* G-side modeling of the block effect */
    run;
• Incomplete block (recovery of inter-block information): replace “block” by “inc_block”

LMM: G-side / R-side
• Two alternative “G-side” specifications:
    proc glimmix;
      class block entry;
      model pct=entry;
      random block;
    run;

    proc glimmix;
      class block entry;
      model pct=entry;
      random intercept / subject=block;
    run;
• R-side specification:
    proc glimmix;
      class block entry;
      model pct=entry;
      random _residual_ / type=cs subject=block;
    run;
• Here it doesn’t matter (all are equivalent), but for more complex models the distinctions will matter

Generalized Linear Model (GLM)
• $y_{ij} \sim \mathrm{Binomial}(n_{ij}, \pi_{ij})$
• GLM (“logit ANOVA” model): $\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) = \eta + \rho_i + \tau_j$
    proc glimmix;
      class block entry;
      model y/n = block entry;
    run;
• Or replace “block” by “inc_block” for the intra-block logit ANOVA
• More on GLIMMIX syntax later. Here, note that y/n makes the binomial distribution & logit link the default (same as GENMOD)

Generalized Linear Mixed Model (GLMM)
• $y_{ij} \mid \text{block effects} \sim \mathrm{Binomial}(n_{ij}, \pi_{ij})$; block effects $r_i \overset{iid}{\sim} N(0, \sigma_R^2)$
• GLMM (“logit ANOVA” mixed model): $\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) = \eta + r_i + \tau_j$
    proc glimmix;
      class block entry;
      model y/n = entry;
      random block;
    run;

    proc glimmix;
      class block entry;
      model y/n = entry;
      random intercept / subject=block;
    run;
• Marginal model (not equivalent):
    proc glimmix;
      class block entry;
      model y/n = entry;
      random _residual_ / type=cs subject=block;
    run;

II. Inference in LM, GLM, LMM, and GLMM
• Inference for fixed effects is based on estimable functions
• In LM theory, $K'\beta$ is estimable if it can be expressed as $A'E(y)$, i.e., $K' = A'X$
• OLS: $\hat\beta = (X'X)^- X'y$
• Theorem: $K'\beta$ is estimable iff $K' = K'(X'X)^-(X'X)$
• Main advantage: $K'\hat\beta$ is invariant to the choice of $(X'WX)^-$. When X is not of full rank, $\beta$ has no intrinsic interpretation, but $K'\beta$ does (e.g., a treatment difference or a marginal (least squares) mean)

II. Examples of Estimable Functions
• One-way model: $y_{ij} = \mu + \tau_i + e_{ij}$; i = 1, 2, 3, 4; j = 1, ..., n
• Estimable functions include:
  − treatment marginal (“least squares”) mean (LSMean) $\mu + \tau_i$, e.g. $K' = [1\ 1\ 0\ 0\ 0]$ for i = 1
  − treatment differences, e.g. $\tau_1 - \tau_2$: $K' = [0\ 1\ {-1}\ 0\ 0]$
  − SS(trt): K such that all $\tau_i$ are equal, e.g.
    $K' = \begin{bmatrix} 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 \end{bmatrix}$

II. Common Inference Results for GLM
• $K'\hat\beta \,\dot\sim\, MVN\left(K'\beta,\ K'(X'WX)^- K\right)$ approximately (exact for LM)
• Wald statistic, to test $H_0: K'\beta = 0$:
  $\mathrm{Wald} = (K'\hat\beta)'\left[K'(X'WX)^- K\right]^{-1}(K'\hat\beta) \,\dot\sim\, \chi^2_{\mathrm{rank}(K)}$
• Note: in OLS, $\mathrm{Wald} = SS(H_0)/\sigma^2$

II. GLM: Inference with Unknown Scale Parameter
• Recall, in OLS, $\mathrm{Wald} = SS(H_0)/\sigma^2$. But what if $\sigma^2$ is unknown?
• Think ANOVA: use $\hat\sigma^2 = MSE$. Thus
  $\dfrac{\mathrm{Wald}}{\mathrm{rank}(K)} = \dfrac{SS(H_0)/\mathrm{rank}(K)}{MSE} = \dfrac{SS(H_0)/df_h}{MSE} \sim F(df_h, df_e)$
• Generalization: in GLM, estimate the scale parameter $\phi$ by Pearson $\chi^2/df_e$ or Deviance$/df_e$

II. Extension of GLM Scale Parameter
• Quasi-likelihood: overdispersion
  − counts: Poisson implies $E(y) = \mathrm{Var}(y)$, but in practice $\mathrm{Var}(y) > E(y)$
  − quasi-likelihood: you specify $E(y) = \mu$ and $\mathrm{Var}(y) = \phi V(\mu)$
• “Working correlation”: repeated measures
  − the assumed distribution gives $\mathrm{Var}(y) = \mathrm{diag}[V(\mu)]$
  − but in reality errors are correlated, so model the variance as $\mathrm{Var}(y) = R^{1/2} A R^{1/2}$, where $R^{1/2} = \mathrm{diag}[V(\mu)]^{1/2}$
  − A is the working correlation: its structure is analogous to the true R-side in an LMM
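The estimability theorem and the Wald-to-F relationship above can be checked numerically. The minimal Python/NumPy sketch below is not GLIMMIX output. It assumes a hypothetical one-way data set (4 treatments, 3 observations each) and uses the Moore-Penrose inverse as one choice of $(X'X)^-$. It then confirms that Wald/rank(K), with $\hat\sigma^2 = MSE$, reproduces the usual one-way ANOVA F.

```python
import numpy as np

# Hypothetical one-way layout: y_ij = mu + tau_i + e_ij, t = 4 treatments, n = 3 each
t, n = 4, 3
trt = np.repeat(np.arange(t), n)
X = np.column_stack([np.ones(t * n), (trt[:, None] == np.arange(t)).astype(float)])  # 12 x 5, rank 4
y = np.array([10.1, 9.8, 10.4, 11.2, 11.9, 11.5, 9.0, 9.6, 9.3, 10.7, 10.2, 10.9])

XtX = X.T @ X
XtX_g = np.linalg.pinv(XtX)   # Moore-Penrose inverse: one generalized inverse of X'X
beta_hat = XtX_g @ X.T @ y    # one (non-unique) solution of the normal equations


def is_estimable(K) -> bool:
    """K'beta is estimable iff K' = K'(X'X)^-(X'X)."""
    K = np.atleast_2d(np.asarray(K, dtype=float))
    return bool(np.allclose(K @ XtX_g @ XtX, K))


def ss_h0(K) -> float:
    """SS(H0) = (K'b)'[K'(X'X)^- K]^{-1}(K'b) for H0: K'beta = 0."""
    K = np.atleast_2d(np.asarray(K, dtype=float))
    Kb = K @ beta_hat
    return float(Kb @ np.linalg.solve(K @ XtX_g @ K.T, Kb))


# K' for "all tau_i equal" (rank 3)
K_equal = np.array([[0, 1, 0, 0, -1],
                    [0, 0, 1, 0, -1],
                    [0, 0, 0, 1, -1]], dtype=float)

resid = y - X @ beta_hat
dfe = t * n - np.linalg.matrix_rank(X)      # 12 - 4 = 8
mse = float(resid @ resid) / dfe
F_wald = ss_h0(K_equal) / np.linalg.matrix_rank(K_equal) / mse

# Classical one-way ANOVA F, for comparison
means = y.reshape(t, n).mean(axis=1)
F_anova = (n * np.sum((means - y.mean()) ** 2) / (t - 1)) / mse

print(is_estimable([0, 1, -1, 0, 0]), is_estimable([0, 1, 0, 0, 0]))  # True False
print(np.isclose(F_wald, F_anova))                                     # True
```

Swapping `pinv` for any other generalized inverse changes `beta_hat` but not $K'\hat\beta$ for estimable K. This is the invariance property stated on the inference slide.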
II. GLM: Deviance and Likelihood Ratio Test
• Full model: $X\beta$, i.e., $\mu = h(X\beta)$
• Decompose as $X\beta = X_1\beta_1 + X_2\beta_2$; suppose we want to test $H_0: \beta_2 = 0$
1. Fit the full model: $\mathrm{Dev}(X\beta) = -2[\ell(X\hat\beta) - \ell(y)]$
2. Fit the reduced model $X_1\beta_1$: $\mathrm{Dev}(X_1\beta_1) = -2[\ell(X_1\hat\beta_1) - \ell(y)]$
3. LR statistic: $\mathrm{Dev}(X_1\beta_1) - \mathrm{Dev}(X\beta)$, approximately $\chi^2$ with $\mathrm{rank}(X) - \mathrm{rank}(X_1)$ d.f.
   (here $\ell(\cdot)$ is the log-likelihood and $\ell(y)$ is that of the saturated model)

II. LMM: The “Mixed Model Equations”
• $\ell(y) = -\tfrac{1}{2}\left[(y - X\beta - Zu)'R^{-1}(y - X\beta - Zu) + u'G^{-1}u\right]$ (up to a constant)
• $\dfrac{\partial \ell(y)}{\partial \beta} = X'R^{-1}(y - X\beta - Zu)$ and $\dfrac{\partial \ell(y)}{\partial u} = Z'R^{-1}(y - X\beta - Zu) - G^{-1}u$
• Setting these to zero yields the mixed model equations (mixed model solution):
  $$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}\begin{bmatrix} \beta \\ u \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$
• Note, marginal model solution: $\hat\beta = (X'V^{-1}X)^- X'V^{-1}y$ and $\hat u = GZ'V^{-1}(y - X\hat\beta)$

II. LMM Inference: G and R Known
• Inference is based on predictable functions: $K'\beta + M'u$ is “predictable” if $K'\beta$ is estimable (this reduces to the estimable function $K'\beta$ if the focus is on fixed effects only)
1. $\mathrm{Var}[K'\hat\beta + M'(\hat u - u)] = [K'\ M']\, C\, [K'\ M']'$, where $C = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^-$
2. Let $L' = [K'\ M']$ and $\theta = \begin{bmatrix} \beta \\ u \end{bmatrix}$. The Wald statistic for tests on $L'\theta$ is $(L'\hat\theta)'[L'CL]^{-1}(L'\hat\theta) \sim \chi^2_{\mathrm{rank}(L)}$

II. LMM Inference: G and R Unknown
1. Replace G and R by $\hat G$ and $\hat R$, i.e., estimate the variance and covariance components
2. Denote by $\hat C$ the matrix C with estimated variance/covariance components
3. “Naive” $\mathrm{Var}[L'(\hat\theta - \theta)] = L'\hat C L$, but $E(L'\hat C L) \ne L'CL$ (the naive estimate tends to be too small); hence the Kenward-Roger adjustment
4. Approximate F: $F = \dfrac{\mathrm{Wald}}{\mathrm{rank}(L)} = \dfrac{(L'\hat\theta)'[L'\hat C L]^{-1}(L'\hat\theta)}{\mathrm{rank}(L)} \,\dot\sim\, F_{\mathrm{rank}(L),\,\nu}$; F may be biased, and $\nu$ often must be approximated

II. LMM: Variance Component Estimation
Several methods:
1. For variance-component-only models: use EMS from ANOVA
2. Maximum likelihood (ML); problem: biased
3. Restricted maximum likelihood (REML)
4.
   Several computational approaches:
   a. Newton-Raphson
   b. Fisher scoring
   c. EM

What’s Wrong with ML?
• An example to illustrate
• SAS for Mixed Models, Data Set 1.5.1
• Incomplete block design from Cochran & Cox, Experimental Designs, p. 456
• 15 treatments
• 15 blocks
• 4 treatments observed per block

C&C Example: ML and Two Alternatives
• Intra-block (fixed block) analysis, equivalent to PROC GLM:
    proc glimmix data=cc456;
      class trt bloc;
      model y=trt bloc;
    run;
• Inter/intra-block (random block) analysis, default (the PROC MIXED default gives the same result):
    proc glimmix data=cc456;
      class trt bloc;
      model y=trt;
      random bloc;
    run;
• Inter/intra-block (random block) analysis, ML (same as PROC MIXED METHOD=ML):
    proc glimmix data=cc456 method=mspl;
      class trt bloc;
      model y=trt;
      random bloc;
    run;

ML vs. Alternative Results: Which Is Right?
Type III tests of fixed effects (effect: trt):
    Analysis                              Variance estimates                                    Num DF   Den DF   F value   Pr > F
    intra-block (fixed block)             $\hat\sigma^2 = 8.62$                                 14       31       1.23      0.3012
    intra/inter-block (random), default   $\hat\sigma_R^2 = 4.65$, $\hat\sigma^2 = 8.56$        14       36.2     1.48      0.1676
    intra/inter-block (random), ML        $\hat\sigma_R^2 = 4.50$, $\hat\sigma^2 = 6.04$        14       49.04    2.02      0.0352

Simulation: ML or REML?
• 1000 simulated data sets using the C&C p. 456 design
• $\sigma_B^2/\sigma^2 = 0.5$
• Recorded the Type I error rate for $F_{trt}$ (rejection at 0.05):
    Analysis               Variable      N      Mean
    intra-block (fixed)    fixd_rej05    1000   0.0590000
    REML, random block     REML_rej05    1000   0.0610000
    ML, random block       ML_rej05      1000   0.2140000
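The mechanism behind the inflated ML rejection rate is easiest to see in the simplest fixed-effects case. There, the REML estimate of $\sigma^2$ is SSE/(N − t) and the ML estimate is SSE/N. The Python/NumPy sketch below is not a re-run of the C&C simulation. The 15 treatments × 4 observations and $\sigma^2 = 1$ are hypothetical choices that only echo that design's size. ML ignores the degrees of freedom spent on the fixed effects, so it is biased downward by the factor (N − t)/N. A too-small variance estimate then makes F too large.

```python
import numpy as np

rng = np.random.default_rng(2007)
t, n, sigma2 = 15, 4, 1.0        # hypothetical: 15 treatments, 4 observations each
N = t * n
reps = 2000

# Simulate under H0 (no treatment effect): reps data sets, each t x n
y = rng.normal(0.0, np.sqrt(sigma2), size=(reps, t, n))
sse = ((y - y.mean(axis=2, keepdims=True)) ** 2).sum(axis=(1, 2))

ml = sse / N            # ML: divides by N, ignoring the t d.f. spent on treatment means
reml = sse / (N - t)    # REML: divides by the residual d.f.

print(f"mean ML estimate:   {ml.mean():.3f}  (theory {sigma2 * (N - t) / N:.3f})")
print(f"mean REML estimate: {reml.mean():.3f}  (theory {sigma2:.3f})")
```

With 15 treatment means estimated from 60 observations, ML loses a quarter of its denominator. This is the same direction of bias as the smaller ML $\hat\sigma^2$ (6.04 vs. 8.56) in the C&C results.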
II. LMM with Estimated G and R: Bias in Standard Errors and Test Statistics
• Kenward & Roger (Biometrics, 1997)
• Consider the estimable function $K'\beta$. When $\sigma$ (the vector of variance components) is unknown, estimates are used to obtain $\hat V$
• “Naive” estimate: $\widehat{\mathrm{Var}}(K'\hat\beta) = K'(X'\hat V^{-1}X)^- K$
• Using a Taylor series expansion, one can show
  $$E\left[K'(X'\hat V^{-1}X)^- K\right] \approx K'(X'V^{-1}X)^- K + \frac{1}{2}\sum_{i,j} \mathrm{cov}(\hat\sigma_i, \hat\sigma_j)\, K' \frac{\partial^2 (X'V^{-1}X)^{-1}}{\partial\sigma_i\,\partial\sigma_j} K$$

II. LMM: Degrees of Freedom, Simple Case
• Model: $y_{ijk} = \mu + \alpha_i + b_j + (ab)_{ij} + e_{ijk}$; i = 1, ..., a; j = 1, ..., b; k = 1, ..., n; $b_j \sim N(0, \sigma_B^2)$; $(ab)_{ij} \sim N(0, \sigma_{AB}^2)$; $e_{ijk} \sim N(0, \sigma^2)$
    ANOVA source   EMS
    A              $\sigma^2 + n\sigma_{AB}^2 + Q_A$
    B              $\sigma^2 + n\sigma_{AB}^2 + na\sigma_B^2$
    AB             $\sigma^2 + n\sigma_{AB}^2$
    error          $\sigma^2$

II. Degrees of Freedom (2)
• Treatment difference $\alpha_1 - \alpha_2$: $\mathrm{Var}(\hat\alpha_1 - \hat\alpha_2) = \dfrac{2}{nb}(\sigma^2 + n\sigma_{AB}^2)$, estimated by $\dfrac{2}{nb}MS(AB)$; denominator d.f. = df(AB)
• Treatment mean $\mu + \alpha_i$: $\mathrm{Var}(\hat\mu + \hat\alpha_i) = \dfrac{1}{nb}(\sigma^2 + n\sigma_{AB}^2 + n\sigma_B^2)$, estimated by $\dfrac{1}{nb}\left[\dfrac{a-1}{a}MS(AB) + \dfrac{1}{a}MS(B)\right]$, whose d.f. must be approximated via Satterthwaite’s procedure

II. Satterthwaite Approximation for a Linear Combination of Mean Squares
• For $MS = \sum_i c_i MS_i$, the approximate d.f. for MS is
  $$df = \frac{\left(\sum_i c_i MS_i\right)^2}{\sum_i (c_i MS_i)^2 / df_i}$$
• e.g.
  $$df = \frac{\left[\frac{a-1}{a}MS_{AB} + \frac{1}{a}MS_B\right]^2}{\left[\frac{a-1}{a}MS_{AB}\right]^2 / df(AB) + \left[\frac{1}{a}MS_B\right]^2 / df(B)}$$

II. Satterthwaite Approximation in LMM
• Approximation: $\nu = \dfrac{2\left[E\left(K'(X'V^{-1}X)^- K\right)\right]^2}{\mathrm{Var}\left(K'(X'V^{-1}X)^- K\right)}$, estimated by $\dfrac{2\left(K'(X'\hat V^{-1}X)^- K\right)^2}{\widehat{\mathrm{Var}}\left(K'(X'\hat V^{-1}X)^- K\right)}$
• For vector K (e.g., a treatment contrast): approximate $\mathrm{Var}\left(K'(X'V^{-1}X)^- K\right)$ by $g'Ag$, where
  − $g = \partial\left[K'(X'V^{-1}X)^- K\right]/\partial\sigma$ and $\sigma$ is the vector of (co)variance components
  − $A = 2\left\{\mathrm{trace}\left[P\,\dfrac{\partial V}{\partial\sigma_i}\,P\,\dfrac{\partial V}{\partial\sigma_j}\right]\right\}^{-1}$, with $V = ZGZ' + R$
  − $P = V^{-1} - V^{-1}XCX'V^{-1}$, with $C = (X'V^{-1}X)^-$

II. GLMM Estimation
• GLMM is a model of $E(y \mid u)$
  − link form: $g[E(y \mid u)] = \eta = X\beta + Zu$
  − inverse link form: $E(y \mid u) = \mu = h(X\beta + Zu)$
• More general expression of the distribution of y | u: $\mathrm{Var}(y \mid u) = R^{1/2} A R^{1/2}$, where $R^{1/2} = \mathrm{diag}[V(\mu_i)]^{1/2}$ and A is the “working correlation matrix”
• Estimation: as with LMM, we may choose to focus on
  1. $\beta$ only: GLS equations in LMM; generalized estimating equations (GEE) with GLMM
  2. $\beta$ and u: several approaches

II. Working Correlation
• Recall the Gotway & Stroup (1997) Hessian fly example: Gotway and Stroup considered spatial variation among experimental units
    proc glimmix;
      class block entry;
      model y/n=entry;
      random intercept / subject=block;
      random _residual_ / type=sp(sph)(row col) subject=block;   /* row, col: plot coordinates */
    run;
[Figure: field layout of the 4×4 lattice]
• MODEL sets up a binomial GLM with logit link
• RANDOM _RESIDUAL_ sets up a working correlation based on a spherical semivariogram
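Satterthwaite's formula is simple to compute once the mean squares are available. The minimal Python sketch below applies it to the treatment-mean example in the degrees-of-freedom slides. It assumes a hypothetical balanced design (a = 3 treatments, b = 8 blocks, n = 2) with made-up mean squares. The function name `satterthwaite_df` is illustrative only and is not part of any SAS interface.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate d.f. of sum_i c_i*MS_i: (sum c_i MS_i)^2 / sum[(c_i MS_i)^2 / df_i]."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    return sum(terms) ** 2 / sum(term ** 2 / df for term, df in zip(terms, dfs))


# Hypothetical balanced design and mean squares
a, b, n = 3, 8, 2
ms_ab, ms_b = 6.0, 40.0
df_ab, df_b = (a - 1) * (b - 1), b - 1

# [(a-1)/a] MS(AB) + (1/a) MS(B) estimates sigma^2 + n*sigma_AB^2 + n*sigma_B^2
coefs = [(a - 1) / a, 1 / a]
est = sum(c * ms for c, ms in zip(coefs, [ms_ab, ms_b]))
var_trt_mean = est / (n * b)
df = satterthwaite_df(coefs, [ms_ab, ms_b], [df_ab, df_b])

print(f"Var(trt mean) = {var_trt_mean:.4f}, approx d.f. = {df:.2f}")
# Var(trt mean) = 1.0833, approx d.f. = 11.32
```

The approximate d.f. (about 11) falls between df(B) = 7 and df(AB) + df(B) = 21, as it must when all coefficients are positive. Because MS(B) dominates the combination here, the result sits closer to df(B).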
II. Marginal (PA) vs. Subject-Specific (SS) Inference
• Marginal mean $E(y)$: population-averaged (PA)
• Conditional mean $E(y \mid u)$: subject-specific (SS) (true GLMM)
• Note: $E(y) = E[E(y \mid u)] = E[h(X\beta + Zu)]$; in general this cannot be simplified further
• Example: log link, u ~ normal:
  $E(y \mid u) = \exp(X\beta + Zu) \Rightarrow E(y) = E[\exp(X\beta + Zu)] = \exp(X\beta)\, M_u(Z)$, where $M_u(Z)$ is the moment generating function of u evaluated at Z
  − for a single random effect with variance $\sigma_u^2$ (Z a vector of 1s): $E(y) = \exp(X\beta)\exp\left(\dfrac{\sigma_u^2}{2}\right)$, so $\log[E(y)] = X\beta + \dfrac{\sigma_u^2}{2}$

II. More on PA (Marginal) vs. SS
• Probit-normal model: $\Pr(y_i = 1 \mid u) = \Phi(x_i'\beta + z_i'u)$; $u \sim N(0, G)$. One can show
  $E(y_i) = \Phi\left(\dfrac{x_i'\beta}{\sqrt{1 + z_i' G z_i}}\right)$
• In LMM, the models
  $X\beta + Zu + e$; $u \sim N(0, I\sigma_u^2)$; $e \sim N(0, I\sigma_e^2)$
  and
  $X\beta + e$; $e \sim N(0, R)$, $R = \sigma^2\begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}$
  are equivalent (with $\sigma^2 = \sigma_u^2 + \sigma_e^2$ and $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$). However, in GLMM they are not: they yield different estimates, std. errors, etc.

II. Estimation of GLMM
• Model $E(y \mid u)$:
  − inverse link: $E(y \mid u) = h(X\beta + Zu)$
  − link: $g[E(y \mid u)] = \eta = X\beta + Zu$
• To estimate $\beta$ and u, we need to evaluate f(y) and f(y | u):
  − approximate, e.g. by Taylor series expansion: penalized quasi-likelihood (SAS %GLIMMIX macro); SAS PROC GLIMMIX (next slides)
  − numerically integrate the joint density: Gauss-Hermite quadrature (PROC NLMIXED)
  − stochastically evaluate the integral: Markov chain Monte Carlo (WinBUGS; not in this course)

II. Computational Method Comparison
• GEE
  − computationally easy
  − meaning of marginal results in GLM?
• Linearized GLMM (current PROC GLIMMIX)
  - uses familiar LMM analogs (but many are ad hoc & need further research)
  - allows considerable R-side flexibility
  - adequate for many GLMMs; breaks down for certain cases (binary data)
• Integral approximation (PROC NLMIXED)
  - better approximation than linearized GLMM
  - BUT: ML only, simple G-side models only, no R-side
• Laplace
  - computationally less demanding than integral approximation but often “accurate enough”; same limitations as integral approximation
• MCMC
  - simple models only; limited & temperamental software
  - but in extreme cases, the only way to get accurate results

Modeling Considerations

Basic Parts of SAS Program
• DATA step
  data your_choice_of_name;
    input list of variables;   /* put $ after an alphanumeric (character) variable */
    datalines;
  (data: one line per obs, one column per variable)
  ;
• PROC step
  proc glimmix data=demo1;
    class classroom trt time;
    model sc=trt time trt*time / dist=normal ddfm=kr;
    random classroom(trt);
    lsmeans trt*time;
    ods output lsmeans=lsm;
  run;

III. Modeling Considerations
• Overdispersion
• Marginal (PA) vs Conditional (SS) models
• “Data” vs “Model” scale

III. Model Considerations
• Variance model & overdispersion
• Choice of link function
• Choice of distribution
• Choice of model effects
• Correlated errors?
Any of the above could show up as “overdispersion”

III. GLMM: Model Considerations
• Common dilemma
• Design, e.g.
  the “Hessian fly” example
• BINOMIAL data
• Recover interblock information: BLOCK random
Model (logit GLMM):
  $\pi_{ij} = \frac{\exp(\mu + r_i + \tau_j)}{1 + \exp(\mu + r_i + \tau_j)}$ or, equivalently, $\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + r_i + \tau_j$
[Figure: Hessian fly field layout of entries within blocks]
Analysis reveals that the data are overdispersed.

III. Hessian Fly Example
  proc glimmix data=HessianFly;
    class block entry;
    model y/n = entry;
    random block;
  run;
Fit Statistics
  -2 Res Log Pseudo-Likelihood    182.21
  Generalized Chi-Square          107.96
  Gener. Chi-Square / DF            2.25
Evidence of overdispersion when Gener. Chi-Square / DF >> 1

III. Overdispersion
• Observed variance > variance under the presumed model
• Symptom: Deviance/DFE or chi-square/DFE >> 1
• Uniquely a GLM / GLMM issue
  - not a consideration with LM, LMM
  - y|u ~ normal implies the variance is not a function of the mean
• When is it an issue?
  - if Var(y) = f[E(y)], and
  - using a scale adjustment requires unrealistic assumptions

III. Common Fix for Overdispersion
Multiply the variance by a scale parameter $\phi$; here the binomial variance function $\pi(1 - \pi)$ becomes $\phi\,\pi(1 - \pi)$.
  proc glimmix data=HessianFly;
    class block entry;
    model y/n = entry;
    random block;
    random _residual_;
  run;
Covariance Parameter Estimates
  Cov Parm        Subject    Estimate    Standard Error
  Intercept       block      0           .
  Residual (VC)              2.2668      0.4627
• Residual (VC) estimates $\phi$
• Issue: not a true likelihood
Covariance parameter estimates with $\hat\phi$ (above) vs.
w/o $\hat\phi$:
  Cov Parm     Subject    Estimate    Standard Error
  Intercept    block      0.01116     0.03116

Impact of Scale Parameter on Inference
Type III Tests of Fixed Effects
                                    Effect   Num DF   Den DF   F Value   Pr > F
  no scale parameter                entry    15       45       6.90      <.0001
  with scale parameter adjustment   entry    15       45       3.03      0.0020
Failure to account for overdispersion tends to increase the Type I error rate, but is this the best way to address the problem?

III. Mean-Variance Overdispersion Models
$\mathrm{Var}(y) = f(\mu, \phi)$
• No scale parameter, e.g. $\mu(1 - \mu)$, $\mu$: binomial, Poisson
• Nonlinear scale parameter, e.g. $\mu(1 + \phi\mu)$: negative binomial, generalized Poisson, beta
• Linear scale parameter, e.g. $\phi\mu^2$: gamma, inverse Gaussian
• No mean parameter, $\phi$: normal

III. Marginal or Conditional Formulation
• For many models (notably LMM) there are equivalent forms
  - conditional (mixed, SS) model
  - marginal (PA) model
  - both lead to the same marginal log-likelihood
• The distinction results from
  - G-side model: random model effects
  - R-side model: marginal model

III. Example: Variance Component (G-side) vs. Compound Symmetry (R-side)
$y_{ij} = \mu + r_i + \tau_j + e_{ij}$; $r_i$ i.i.d. $N(0, \sigma_R^2)$; $e_{ij}$ i.i.d. $N(0, \sigma^2)$
$$\mathrm{Var}(Y_i) = \sigma_R^2 J + \sigma^2 I = \begin{bmatrix}\sigma_R^2 + \sigma^2 & \sigma_R^2 & \cdots & \sigma_R^2\\ \sigma_R^2 & \sigma_R^2 + \sigma^2 & \cdots & \sigma_R^2\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_R^2 & \sigma_R^2 & \cdots & \sigma_R^2 + \sigma^2\end{bmatrix}$$

III. Compound Symmetry Equivalent
Let $\sigma_C^2 = \sigma_R^2 + \sigma^2$ and $\rho = \frac{\sigma_R^2}{\sigma_R^2 + \sigma^2}$.
Model: $y_{ij} = \mu + \tau_j + E_{ij}$; $\mathrm{Var}(E_{ij}) = \sigma_C^2$; $\mathrm{Corr}(E_{ij}, E_{kl}) = \rho$ if $i = k$ (same block), 0 otherwise
$$\mathrm{Var}(Y_i) = \sigma_C^2\begin{bmatrix}1 & \rho & \cdots & \rho\\ \rho & 1 & \cdots & \rho\\ \vdots & \vdots & \ddots & \vdots\\ \rho & \rho & \cdots & 1\end{bmatrix}$$
Models equivalent if $\rho \ge 0$

III.
G-side / R-side
G-side model (these two programs fit the same model):
  proc glimmix;
    class block entry;
    model y/n=entry;
    random block;
  run;

  proc glimmix;
    class block entry;
    model y/n=entry;
    random intercept / subject=block;
  run;
R-side model:
  proc glimmix;
    class block entry;
    model y/n=entry;
    random _residual_ / type=CS subject=block;
  run;
PROC MIXED analog of the R-side model (normal response):
  proc mixed;
    class block entry;
    model y=entry;
    repeated / type=CS subject=block;
  run;

III. Variance Component vs CS in GLMM
• Variance component model is a GLMM
• CS model is a GEE
• They are not equivalent
Conditional model: $y_{ij}|u_i \sim \mathrm{Binomial}(n_{ij}, \pi_{ij})$ with $\mathrm{logit}(\pi_{ij}) = \mu + r_i + \tau_j$, i.e. $\pi_{ij} = \frac{\exp(\mu + r_i + \tau_j)}{1 + \exp(\mu + r_i + \tau_j)}$
  the marginal distribution is $p(y_{ij}) = \int p(y_{ij}|u_i)\,p(u_i)\,du_i$
Marginal model: $\mathrm{logit}(\pi_{ij}) = \mu + \tau_j$, with working correlation matrix defined by the CS form
  $y_{ij}$ is NOT binomial; it merely borrows a binomial-like quasi-likelihood form
  Does such a distribution actually exist?

III. Conditional vs. Marginal Results
                                Conditional (VC)                 Marginal (CS)
  Gener. Chi-Square / DF        2.27                             2.30
  Covariance parameters         Intercept (block): 0             CS (block): -0.03247
                                Residual (VC): 2.2668            Residual: 2.2992
  Type III test of entry        F(15, 45) = 3.03, Pr > F 0.0020  F(15, 45) = 2.99, Pr > F 0.0023
Which is right?
• fit statistic?
• can you simulate data using the mechanism implied by the model?

III. Marginal or Conditional?
• How to choose?
  - Conditional: G-side; Marginal: R-side
  - Fit statistic?
    (may help; may deceive)
• General recommendation
  - G-side formulation preferred for non-normal data
  - G-side effects operate inside the link function and hence always lead to valid conditional and marginal distributions
  - R-side effects operate outside the link function
  - for non-normal data, models implied by R-side effects may be vacuous

III. Impact of Model Effects
• Back to the Hessian fly data
• Incomplete block design
• Try a more appropriate model
  proc glimmix;
    class inc_block entry;
    model y/n=entry;
    random intercept / subject=inc_block;
  run;
Fit Statistics
  Gener. Chi-Square / DF    1.41
Covariance Parameter Estimates
  Cov Parm     Subject      Estimate
  Intercept    inc_block    0.4971
Type III Tests of Fixed Effects
  Effect   Num DF   Den DF   F Value   Pr > F
  entry    15       33       6.33      <.0001

III. Inference
• After model fit & estimation, inference begins
• We also want at least some of the following
• Comparisons among groups (trt, entry...)
  - test hypotheses
  - obtain confidence intervals
  - obtain predictions
  - further model checking

III. Scale Issue for GLM, GLMM
• For GLM, GLMM there are two “natural scales”
  - linear (or model) scale (e.g. logit)
  - data scale
• There may be other scales, depending on context
  - odds
  - odds ratio

III. Choosing the Scale
• Example: Hessian fly; binomial distribution, logit link
• Data: measured as 0/1; per e.u. as Y/N
• Main focus: entry effect on P{indiv resp = 1}
  Link: $\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \eta_{ij} = \mu + r_i + \tau_j$
  Inverse link: $\pi_{ij} = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})}$

III. Scale and Inference
Main tool of inference: estimable functions, e.g.
  - entry “LS mean”: $\hat\mu + \hat\tau_j$
  - entry difference: $\hat\tau_j - \hat\tau_{j'}$
  These are estimated on the “linear” or “model” scale; we can denote them $\hat\eta_j$ or $\hat\eta_j - \hat\eta_{j'}$.
Main focus of inference: on the data scale, e.g.
  - $P\{\text{resp} = 1 \mid \text{entry } j\} = \hat\pi_j$
  - entry difference between probabilities: $\hat\pi_j - \hat\pi_{j'}$
These require “inverse linking”: $\hat\pi_j = \frac{\exp(\hat\eta_j)}{1 + \exp(\hat\eta_j)}$

III. Inverse Linking
• Estimation occurs on the model scale
• But reporting typically must occur on the data scale
  - Estimate: $\hat\eta = k'\hat\beta$
  - Std error: $\mathrm{s.e.}(\hat\eta) = \sqrt{k'\,\mathrm{Var}(\hat\beta)\,k}$
  - Confidence interval: $\hat\eta \pm z_{\alpha/2}\,\mathrm{s.e.}(\hat\eta)$
  - Inverse linked estimate: $\hat\pi = h(\hat\eta)$, e.g. $\frac{\exp(\hat\eta)}{1 + \exp(\hat\eta)}$
  - Inverse linked std error (“delta rule”): $\mathrm{s.e.}(\hat\pi) = \left.\frac{\partial h(\eta)}{\partial\eta}\right|_{\hat\eta}\mathrm{s.e.}(\hat\eta)$
  - Inverse linked confidence interval: $[h(\text{LowerB}), h(\text{UpperB})]$

III. Model & Data Scale – Hessian Fly Example
Solutions for Fixed Effects
  Effect      entry   Estimate   Standard Error   DF   t Value   Pr > |t|
  Intercept           -1.9057    0.4886           15   -3.90     0.0014
  entry       1        3.8001    0.6327           33    6.01     <.0001
  entry       2        3.4821    0.6186           33    5.63     <.0001
Estimates
                   Linear (model) scale                      Data scale
  Label            Estimate  Std Error  Lower     Upper    Mean     Std Error Mean  Lower Mean  Upper Mean
  entry 1          1.8944    0.4608      0.9568   2.8319   0.8693   0.05237         0.7225      0.9444
  entry 2          1.5765    0.4321      0.6974   2.4555   0.8287   0.06133         0.6676      0.9210
  diff entry 1-2   0.3179    0.5793     -0.8607   1.4965   0.5788   0.1412          0.2972      0.8171
Which of these data scale values make NO sense?

on to GLIMMIX

IV. GLIMMIX Syntax
• SAS software for GLMs & mixed models
• Basic GLIMMIX syntax
• Similarities & differences vs existing SAS procs
• New features

IV.
SAS Software for Linear Models
• LM
  - PROC GLM, MIXED
  - PROC GLIMMIX
• GLM
  - PROC GENMOD
  - PROC GLIMMIX
  - PROC NLMIXED
• LMM
  - PROC MIXED
  - PROC GLIMMIX
• GLMM
  - PROC GLIMMIX
  - PROC NLMIXED

IV. PROC GLIMMIX Syntax
• What's familiar (from MIXED & GENMOD)
  - CLASS
  - MODEL
  - DIST and LINK options in MODEL (like GENMOD)
  - RANDOM (for G-side)
  - ESTIMATE, CONTRAST, LSMEANS
  - ODS
• What's new or different
  - RANDOM _RESIDUAL_ (replaces REPEATED for R-side)
  - LSMESTIMATE
  - new options in LSMEANS (e.g. better options for factorial experiments)
  - NLOPTIONS
  - model diagnostics

IV. Relation between GLMM Structure and GLIMMIX Code
GLMM: $y|u \sim \text{dist}(\mu, R)$; $\mathrm{Var}(u) = G$; $g(\mu|u) = X\beta + Zu$; $\mathrm{Var}(y|u) = V_\mu^{1/2} P V_\mu^{1/2}$ ($V_\mu$: diagonal matrix of variance functions; P: working correlation)
  proc glimmix;
    class variables;
    model <resp>=<fixed effects> / dist= link= ;
    random <G-side effects> / <options>;
    random _residual_ / type= subject= ;
  run;

IV. NLOPTIONS Statement
• New statement in GLIMMIX
• Controls optimization technique, line-search method, number of iterations, etc.
  proc glimmix;
    class id a b;
    model y=a b a*b;
    random _residual_ / type=cs subject=id(a);
    nloptions tech=nrridg maxiter=100;
  run;
TECH=NRRIDG causes GLIMMIX to use the MIXED computing algorithm (good for comparison...)

IV. Programming Statements
• Similar to GENMOD, NLIN, NLMIXED
• GLIMMIX supports statements using DATA step syntax
• Use them to transform variables, define quantities to output, user-defined link, variance, etc.
• For example:
  proc glimmix;
    class block entry;
    pct=y/n;
    model pct=entry;
    random intercept / subject=block;
  run;

IV.
Some GLIMMIX Defaults Useful to Know
• In MODEL statement
  - response y: NORMAL distribution & IDENTITY link
  - response y/n: BINOMIAL distribution & LOGIT link
• For distributions without a scale parameter in the variance function (e.g. binomial, Poisson)
  - no scale parameter assumed (unlike the %GLIMMIX macro)
  - obtain a scale parameter with RANDOM _RESIDUAL_
• Optimization method automatically matched based on DISTRIBUTION & LINK

IV. Estimation Methods in PROC GLIMMIX
• Defaults depend on model, distribution, and link
• May be altered with the METHOD= option in the PROC statement
• METHOD= options: variations on pseudo-likelihood
  - restricted objective function (like REML): RSPL, RMPL
  - unrestricted objective function (like ML): MSPL, MMPL
  - RSPL, MSPL: subject-specific (conditional or mixed) model; RMPL, MMPL: population-averaged (marginal) model

IV. Defaults & Methods (continued)
• GLMM default method is RSPL
• For LMM, this is REML
  - GLIMMIX uses a different algorithm than MIXED; TECH=NRRIDG uses the MIXED algorithm
  - you can get slightly different numbers with MIXED and GLIMMIX
  - METHOD=MSPL yields ML estimates
• Methods appear in the literature as MPL, PQL
• Gaussian adaptive quadrature and Laplace algorithms will be added in SAS 9.2
  - not available yet & not discussed here

IV. Examples
  /* Poisson regression, log link */
  proc glimmix;
    class id;
    model y=x / dist=poisson;
  run;

  /* Poisson regression, log link; add scale parameter */
  proc glimmix;
    class id;
    model y=x / dist=poisson;
    random _residual_;
  run;

  /* Poisson regression, log link; change variance function */
  proc glimmix;
    class id;
    _variance_=_mu_*_mu_;
    model y=x / dist=poisson;
  run;

IV.
“GLM-mode” vs “GLMM-mode”
• Use the following trick to get a GLM (GENMOD)-type model via pseudo-likelihood
  /* "GLM-mode": maximum likelihood */
  proc glimmix;
    class id;
    model y=x / dist=poisson;
    random _residual_;
  run;

  /* "GLMM-mode": pseudo-likelihood; this is a GEE with independence working correlation */
  proc glimmix;
    class id;
    model y=x / dist=poisson;
    random _residual_ / subject=id;
  run;

IV. Distributions Supported by GLIMMIX
  Discrete                  Continuous
  Binary                    Beta
  Binomial                  Normal
  Poisson                   Lognormal
  Geometric                 Gamma
  Negative binomial         Exponential
  Multinomial               Inverse Gaussian
    - nominal               Shifted t
    - ordinal

IV. MIXED to GLIMMIX – R-side
  proc mixed;
    class loc id trt time;
    model y=trt | time;
    random loc;
    repeated / type=ar(1) subject=id(loc);
  run;

  proc glimmix;
    class loc id trt time;
    model y=trt | time;
    random intercept / subject=loc;
    random _residual_ / type=ar(1) subject=id(loc);
  run;
When you use GLIMMIX, you will notice it is much fussier about the SUBJECT= option when a nested subject structure is present (MIXED is more likely to let you get away with ignoring SUBJECT=).

IV. More on R-side
  proc mixed;
    class loc id trt time;
    model y=trt | time;
    random loc;
    repeated time / type=ar(1) subject=id(loc);
  run;
Alternative form of RANDOM _RESIDUAL_, e.g. when time points are missing, unsorted, etc.:
  proc glimmix;
    class loc id trt time;
    model y=trt | time;
    random intercept / subject=loc;
    random time / type=ar(1) subject=id(loc) residual;
    /* vs. random _residual_ / type=ar(1) subject=id(loc); */
  run;

IV.
MIXED to GLIMMIX - Estimate
• MIXED: single-row ESTIMATE statements
  proc mixed;
    class trt;
    model y=trt a x trt*a trt*x;
    estimate '10 3' trt 1 -1 trt*a 10 -10 trt*x 3 -3;
    estimate '20 3' trt 1 -1 trt*a 20 -20 trt*x 3 -3;
    estimate '30 3' trt 1 -1 trt*a 30 -30 trt*x 3 -3;
  run;
• GLIMMIX: multi-row, with multiplicity adjustment
  proc glimmix;
    class trt;
    model y=trt a x trt*a trt*x;
    estimate '10 3' trt 1 -1 trt*a 10 -10 trt*x 3 -3,
             '20 3' trt 1 -1 trt*a 20 -20 trt*x 3 -3,
             '30 3' trt 1 -1 trt*a 30 -30 trt*x 3 -3 / adjust=scheffe;
  run;

IV. MIXED vs. GLIMMIX - LSMEANS
• Example: factorial
  proc mixed;
    class A B;
    model y=A|B;
    lsmeans A B / diff;
    /* DIFF gives a table of all possible differences; SLICE= tests (but does not estimate) simple effects of A given B, and vice versa */
    lsmeans A*B / diff slice=(A B);
  run;

  proc glimmix;
    class A B;
    model y=A|B;
    /* LINES gives the multiple range display users love */
    lsmeans A B / diff lines;
    /* SLICEDIFF= restricts A*B diffs to actual simple effects, e.g. A1-A2|Bj */
    lsmeans A*B / slice=(A B) slicediff=(A B);
  run;

IV. GLIMMIX – LSMEANS (1) Main Effects
  proc glimmix data=AxB_example;
    class block A B;
    model y=A|B / ddfm=satterth;
    random block block*B;
    lsmeans A B / diff lines;
    lsmeans A*B / slicediff=(A B);
  run;
B Least Squares Means
  B   Estimate   Standard Error   DF      t Value   Pr > |t|
  1   18.5300    1.3226           13.69   14.01     <.0001
  2   26.5200    1.3226           13.69   20.05     <.0001
  4   28.2800    1.3226           13.69   21.38     <.0001
  8   25.3000    1.3226           13.69   19.13     <.0001
T Grouping for B Least Squares Means
(LS-means with the same letter are not significantly different.)
  B   Estimate
  4   28.2800    A
  2   26.5200    A
  8   25.3000    A
  1   18.5300      B

IV.
GLIMMIX – LSMEANS (2) Simple Effects
  proc glimmix data=AxB_example;
    class block A B;
    model y=A|B / ddfm=satterth;
    random block block*B;
    lsmeans A B / diff lines;
    lsmeans A*B / slicediff=(A B);
  run;
A*B Least Squares Means
  A   B   Estimate   Standard Error
  r   1   20.0000    1.4769
  r   2   27.8400    1.4769
  r   4   28.1800    1.4769
  r   8   24.8000    1.4769
  s   1   17.0600    1.4769
  s   2   25.2000    1.4769
  s   4   28.3800    1.4769
  s   8   25.8000    1.4769
Simple Effect Comparisons of A*B Least Squares Means By B
  Simple Effect Level   A   _A   Estimate   Standard Error   DF   t Value   Pr > |t|
  B 1                   r   s     2.9400    1.3144           16    2.24     0.0399
  B 2                   r   s     2.6400    1.3144           16    2.01     0.0618
  B 4                   r   s    -0.2000    1.3144           16   -0.15     0.8810
  B 8                   r   s    -1.0000    1.3144           16   -0.76     0.4578

IV. GLIMMIX – LSMEANS (3)
• lsmeans a*b / diff; gave you this:
Differences of A*B Least Squares Means
  A   B   _A   _B   Estimate   Standard Error   DF      t Value   Pr > |t|
  r   1   r    2    -7.8400    1.8796           19.49   -4.17     0.0005
  r   1   r    4    -8.1800    1.8796           19.49   -4.35     0.0003
  r   1   r    8    -4.8000    1.8796           19.49   -2.55     0.0192
  r   1   s    1     2.9400    1.3144           16       2.24     0.0399
  r   1   s    2    -5.2000    1.8796           19.49   -2.77     0.0121
  r   1   s    4    -8.3800    1.8796           19.49   -4.46     0.0003
  r   1   s    8    -5.8000    1.8796           19.49   -3.09     0.0060
  r   2   r    4    -0.3400    1.8796           19.49   -0.18     0.8583
  r   2   r    8     3.0400    1.8796           19.49    1.62     0.1219
  r   2   s    1    10.7800    1.8796           19.49    5.74     <.0001
  etc.

IV. GLIMMIX – LSMESTIMATE
Example: simple effect in a 2-factor factorial
Model: $y_{ijk} = \mu_{ij} + e_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
Simple effect, e.g. A|B: $\mu_{ij} - \mu_{i'j} = \alpha_i - \alpha_{i'} + (\alpha\beta)_{ij} - (\alpha\beta)_{i'j}$
  estimate 'A|B' a*b 1 0 0 0 -1 0 0 0;          /* not estimable */
must write
  estimate 'A|B' a 1 -1 a*b 1 0 0 0 -1 0 0 0;
New GLIMMIX alternative:
  lsmestimate a*b 'A|B' 1 0 0 0 -1 0 0 0;
• Defined on $\mu_{ij}$, not on model effects
• Allows multiple LSMESTIMATEs & ADJUST= for multiplicity

IV.
ODS Graphics With GLIMMIX
• Not available with MIXED
  ods html;
  ods graphics on;
  ods select MeanPlot;
  proc glimmix data=AxB_example;
    class block A B;
    model y=A|B / ddfm=satterth;
    random block block*B;
    lsmeans A*B / plot=MeanPlot(sliceby=A join cl);
  run;
  ods graphics off;
  ods html close;

Factorial Treatment Design
• Treatment design vs. experiment (or study) design
• Factorial is a type of treatment design
• Factor A, a levels; Factor B, b levels; etc.
• Main inference tools:
  - simple effects, e.g. method effect | variety j
  - interaction, i.e. are the simple effects equal for all j?
  - main effects

Model: $y_{ijk} = \mu_{ij} + E_{ijk}$
• $y_{ijk}$ = kth obs on the ij-th A × B combination
• $\mu_{ij}$ = ij-th A × B mean
• $E_{ijk}$ is a generic random structure; its specific form depends on the design
Simple effect: A | $B_j$: $\mu_{ij} - \mu_{i'j}$; B | $A_i$: $\mu_{ij} - \mu_{ij'}$
Interaction: equal simple effects ⇔ no interaction, e.g. $\mu_{ij} - \mu_{i'j} = \mu_{ij'} - \mu_{i'j'}$
Main effect: $\bar\mu_{i\cdot} - \bar\mu_{i'\cdot}$ or $\bar\mu_{\cdot j} - \bar\mu_{\cdot j'}$

GLIMMIX Features
• Can estimate / test
  - simple effects
  - main effects
  - depending on which is appropriate
• ODS graphics can graph / plot effects of interest
• SLICE can focus on simple effects in the presence of interaction
• SLICEDIFF can estimate simple effects of interest

Modeling & Design

But My Study is not a Designed Experiment!
• Comparative study: any study whose purpose is to compare treatments or conditions (includes assessing change over time). Includes “quasi-experiments” & surveys with comparative objectives, as well as designed experiments. Design principles apply to all!
• Most modeling issues are study design issues
• Most modeling errors result from poor understanding of design principles

If you are modeling, you need to understand design principles!!

Key Terms in Design
• Treatment design: factors and levels & how they are structured in the study, e.g. factorial, planned obs over time
• Experiment design: organization of experimental units (e.g. into matched pairs, blocks, strata, clusters); plan by which they are assigned to treatment levels
• Experimental unit (e.u.): smallest entity to which treatment levels (or treatment combinations) are independently assigned. E.u.s are legitimate units of replication
• Sampling unit (s.u.): unit on which the measurement is taken. May be the e.u. itself or a subset of the e.u.; a.k.a. pseudo-replicate
• Pseudo-replication: use of s.u.s as units of replication; a common form of inappropriate design & analysis

Factorial & Experiment Designs
• Idea: the experimental unit is the smallest entity to which a treatment level is independently applied
• The e.u. may be a different size for different factors
• E.g. from SAS for Mixed Models, Section 4.6
  - 2 type × 3 dose example
  - dose applied to cage; type to animal in cage
  - e.u. for dose: cage with 2 animals
  - e.u. for type (and dose × type): animal
  - ⇒ split-plot
• Many variations (including repeated measures)

Adding to Model
[Figure: schools assigned to Treatment (Participate in Prof Devel / Do Not Participate); within each school, classrooms assigned to std or exp curriculum; students within classrooms]

V.
Factorial Treatment Designs
• Basic features
• Come in many (many, many) design forms
• Experiment design & “quasi-experiment” or survey “study design”
  - key to deciding what's random & what's fixed
  - non-mixed (LM and GLM only) software is UNACCEPTABLE for these types of problems
• Includes repeated measures (change... growth)
• Normal and non-normal data

Type x Dose Design
[Figure: Type × Dose layouts; units for Dose 1, Dose 2, Dose 3, each containing Type 1 and Type 2, or other arrangements]
Dose = Professional Development Trt; Type = Curriculum

Figure 4.1 Possible design layouts for 2 × 2 factorial experiment (from SAS for Mixed Models)
• Treatment design: 2 × 2 factorial; treatment codes A1B1, A1B2, A2B1, A2B2
• Experiment design: many, many variations. Here are 7 (seven):
  a. Completely randomized
  b. Randomized complete block (Blk 1 to Blk 4)
  c. Row-column (Latin square)
  d. Split-plot 1, whole plot completely randomized
  e. Split-plot 2, whole plot in randomized complete blocks
  f. Split-block, a.k.a. strip-split-plot
  g. Split-plot 3, whole plot in row-column (2 Latin squares)
[Figure: layouts a-g]
Even with a 2 × 2 factorial these seven are not all; we're just getting started!

Split Block Example
[Figure: microchip wafer; Side (L, R) × Position (same meaning on both sides)]

Choosing the Right Model – Step 1
What is the experimental unit?
  figure         4.1.a     4.1.b     4.1.c     4.1.d           4.1.e            4.1.f         4.1.g
  design         CRD       RCB       LS        split-plot CR   split-plot RCB   split-block   split-plot LS
  block?         no        yes       row col   no              yes              yes           row col
  e.u. for A     eu(A*B)   blk*A*B   row*col   eu(A)           blk*A            blk*A         row*col
  e.u. for B     eu(A*B)   blk*A*B   row*col   B*eu(A)         blk*A*B          blk*B         row*col*B
  e.u. for A*B   eu(A*B)   blk*A*B   row*col   B*eu(A)         blk*A*B          blk*A*B       row*col*B

Common Models in PROC MIXED/GLIMMIX
  Design                       SAS class, model and random statements
  CRD (Fig 4.1.a)              class eu a b; model y=a b a*b;
  RCB (Fig 4.1.b)              class block a b; model y=a b a*b; random block;
                               (or random intercept / subject=block;)
  Latin square (Fig 4.1.c)     class row col a b; model y=a b a*b; random row col;
  Split-plot CR (Fig 4.1.d)    class eu a b; model y=a b a*b; random eu(a);
  Split-plot RCB (Fig 4.1.e)   class block a b; model y=a b a*b; random block block*a;
  Split-block (Fig 4.1.f)      class block a b; model y=a b a*b; random block block*a block*b;
  Split-plot LS (Fig 4.1.g)    class row col a b; model y=a b a*b; random row col row*col;
                               (or, equivalently, random row col row*col*a;)
MODEL ⇔ treatment design; RANDOM ⇔ experiment (study) design

Model for Split-Plot: School-Classroom Example
Strategy:
  1. list factor effects
  2. list the e.u. for that effect
  3. each e.u. ⇒ a random model effect
  Effect            e.u.
  prof dev trt      school
  curriculum        classroom(school)
  p.d. × curr       classroom(school)
Model: $y_{ijk} = \mu_{ij} + s(t)_{ik} + e_{ijk}$, $\mu_{ij} = \mu + p_i + c_j + pc_{ij}$
or alternative expression: $E_{ijk} = \text{school(trt)}_{ik} + e_{ijk}$
Note! Student is the sampling unit (not an e.u.)

Model for Split-Plot – Dose x Type Example
Strategy:
  1. list factor effects
  2. list the e.u. for that effect
  3. each e.u. ⇒ a random model effect
  Effect         e.u.
  dose           block × dose
  type           block × dose × type
  dose × type    block × dose × type
Model: $y_{ijk} = \mu_{ij} + \text{block}_k + (b \times d)_{ik} + e_{ijk}$, $\mu_{ij} = \mu + d_i + t_j + dt_{ij}$
or alternative expression: $E_{ijk} = \text{block}_k + (b \times d)_{ik} + e_{ijk}$
Note! Block × type is NOT in the model (not an e.u.)
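To make the “each e.u. ⇒ a random model effect” rule concrete, here is a minimal Python sketch, not part of the course materials, of the covariance between two observations implied by the Dose × Type split-plot model above. It assumes independent normal block, block × dose and residual effects, with one observation per (dose, type, block) cell. The function name `implied_cov` and the variance values are hypothetical.

```python
from collections.abc import Hashable
from typing import Tuple

# An observation is labelled (dose, type, block), matching y_ijk in
#   y_ijk = mu_ij + block_k + (b x d)_ik + e_ijk
# with i = dose, j = type, k = block.
Obs = Tuple[Hashable, Hashable, Hashable]


def implied_cov(obs1: Obs, obs2: Obs,
                var_block: float, var_bd: float, var_e: float) -> float:
    """Covariance of two observations implied by the split-plot model.

    Hypothetical helper; assumes independent normal effects and one
    observation per (dose, type, block) cell.
    """
    dose1, type1, block1 = obs1
    dose2, type2, block2 = obs2
    cov = 0.0
    if block1 == block2:
        cov += var_block              # shared block effect
        if dose1 == dose2:
            cov += var_bd             # shared whole plot: block x dose effect
            if type1 == type2:
                cov += var_e          # same split-plot unit (same observation)
    # There is no block x type term, so two observations of the same type in
    # the same block but at different doses share only the block effect.
    return cov


if __name__ == "__main__":
    v = {"var_block": 1.0, "var_bd": 2.0, "var_e": 3.0}  # hypothetical values
    print(implied_cov((1, "r", 1), (1, "s", 1), **v))  # same whole plot: 3.0
    print(implied_cov((1, "r", 1), (2, "r", 1), **v))  # same block only: 1.0
    print(implied_cov((1, "r", 1), (1, "r", 2), **v))  # different blocks: 0.0
```

Two types in the same whole plot share both the block and block × dose effects. Their difference therefore involves only the split-plot (residual) variance, which is the error the type and dose × type comparisons use.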
Conventional ANOVA
  Source                      EMS
  bloc
  dose                        $\sigma_S^2 + t\sigma_W^2 + Q_D$
  w.p. error† (bloc × dose)   $\sigma_S^2 + t\sigma_W^2$
  type                        $\sigma_S^2 + Q_T$
  dose × type                 $\sigma_S^2 + Q_{DT}$
  s.p. error††                $\sigma_S^2$
† a.k.a. between-subjects error; †† a.k.a. within-subjects error

Standard Errors of Various Terms
  Main effects of dose: $\bar\mu_{i\cdot} - \bar\mu_{i'\cdot}$, Var $= \frac{2}{rt}(\sigma_S^2 + t\sigma_W^2)$
  Main effects of type: $\bar\mu_{\cdot j} - \bar\mu_{\cdot j'}$, Var $= \frac{2}{rd}\sigma_S^2$
  Simple effects, type | dose$_i$: $\mu_{ij} - \mu_{ij'}$, Var $= \frac{2}{r}\sigma_S^2$
  Simple effects, dose | type$_j$: $\mu_{ij} - \mu_{i'j}$, Var $= \frac{2}{r}(\sigma_S^2 + \sigma_W^2)$
Note: you can use MS() directly except for dose | type$_j$

Programming in PROC GLIMMIX
  proc glimmix;
    class bloc type dose;
    model y=type|dose;
    random intercept dose / subject=bloc;   /* i.e. random bloc bloc*dose; */
    lsmeans type*dose / diff lines slicediff=(type dose) slice=(type dose);
    ods output lsmeans=lsm;
  run;
In the LSMEANS statement: DIFF gives all possible mean differences; LINES gives the “MRT lines” display; SLICEDIFF= gives simple-effect differences only; SLICE= gives simple-effect tests only.
You can use ODS to output LSMEANS and GPLOT for interaction plots, or use ODS graphics directly.

Type x Dose: Selected Output
Covariance Parameter Estimates
  Cov Parm    Subject   Estimate   Standard Error
  Intercept   block     2.0735     2.7320
  dose        block     4.5132     2.8291
  Residual              4.3189     1.5270
Type III Tests of Fixed Effects
  Effect      Num DF   Den DF   F Value   Pr > F
  type        1        16        2.78     0.1151
  dose        3        12       13.63     0.0004
  type*dose   3        16        2.29     0.1176

Type x Dose LSMeans
type*dose Least Squares Means
  type   dose   Estimate   Standard Error   DF      t Value   Pr > |t|
  r      1      20.0000    1.4769           20.23   13.54     <.0001
  r      2      27.8400    1.4769           20.23   18.85     <.0001
  r      4      28.1800    1.4769           20.23   19.08     <.0001
  r      8      24.8000    1.4769           20.23   16.79     <.0001
  s      1      17.0600    1.4769           20.23   11.55     <.0001
  s      2      25.2000    1.4769           20.23   17.06     <.0001
  s      4      28.3800    1.4769           20.23   19.22     <.0001
  s      8      25.8000    1.4769           20.23   17.47     <.0001
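As a numerical check on the standard-error formulas, the short Python sketch below plugs the covariance parameter estimates from the Type × Dose output into them. One assumption is needed: the slides do not print the number of blocks r. The value r = 5 is inferred from the denominator degrees of freedom, since 12 = (d − 1)(r − 1) for dose and 16 = d(t − 1)(r − 1) for type. The variable names and the LS-mean variance line are illustrative additions.

```python
import math

# Covariance parameter estimates from the Type x Dose output
var_block = 2.0735   # Intercept, subject = block
var_w = 4.5132       # dose, subject = block: whole-plot error, sigma_W^2
var_s = 4.3189       # Residual: split-plot error, sigma_S^2

# Design sizes. r = 5 blocks is an ASSUMPTION inferred from the Den DF
# (12 = (d-1)(r-1) for dose, 16 = d(t-1)(r-1) for type); the slides do not state it.
r, t, d = 5, 2, 4    # blocks, types, doses

# Standard error of one type*dose LS mean: (sigma_B^2 + sigma_W^2 + sigma_S^2) / r
se_lsmean = math.sqrt((var_block + var_w + var_s) / r)

# Simple effects (i indexes dose, j indexes type)
se_type_given_dose = math.sqrt(2 * var_s / r)              # mu_ij - mu_ij'
se_dose_given_type = math.sqrt(2 * (var_s + var_w) / r)    # mu_ij - mu_i'j

# Main effects
se_dose_main = math.sqrt(2 * (var_s + t * var_w) / (r * t))
se_type_main = math.sqrt(2 * var_s / (r * d))

for label, value in [("LS mean", se_lsmean),
                     ("type | dose", se_type_given_dose),
                     ("dose | type", se_dose_given_type),
                     ("dose main effect", se_dose_main),
                     ("type main effect", se_type_main)]:
    print(f"{label:<18} {value:.4f}")
```

With these inputs, the first three values round to 1.4769, 1.3144 and 1.8796, which match the LS-means and simple-effect tables. The dose | type variance $\frac{2}{r}(\sigma_S^2 + \sigma_W^2)$ is not a multiple of any single expected mean square in the ANOVA table. This is why MS() cannot be used directly for it, and why its degrees of freedom in the output (19.49) are a non-integer, Satterthwaite-type value.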
Type x Dose: “MRT Lines”
T Grouping for type*dose Least Squares Means
(LS-means with the same letter are not significantly different.)
  type   dose   Estimate
  s      4      28.3800   A
  r      4      28.1800   A
  r      2      27.8400   A
  s      8      25.8000   A
  s      2      25.2000   A
  r      8      24.8000   A
  r      1      20.0000     B
  s      1      17.0600       C
however ...

A Factorial Inference Flowchart
The Prime Directive: Interactions first!!!!!
• Interaction?
  - Non-ignorable ⇒ interpret simple effects
  - Negligible ⇒ interpret main effects
[Figure: full wheelbarrow]

Plots of Differences between Means
• LSMEANS allows various plots of mean differences
• DiffPlot: plots interval estimates of mean differences
• AnomPlot (ANalysis of Means): plots the difference between each treatment and the overall mean
• ControlPlot: plots each treatment vs. control (e.g. like a Dunnett test)

SAS for Mean Difference Plots
• From Type x Dose example
  ods html;
  ods graphics on;
  ods select AnomPlot DiffPlot;
  proc glimmix data=variety_eval;
    class block type dose;
    model y=type|dose / ddfm=satterth;
    random block block*dose;
    lsmeans dose / plot=DiffPlot;
    lsmeans dose / plot=AnomPlot;
    *lsmeans type*dose / plot=DiffPlot;
    *lsmeans type*dose / plot=AnomPlot;
  run;
  ods graphics off;
  ods html close;

SAS for Mean Difference Plots: DIFFPLOT
[Figure: DiffPlot for dose]

SAS for Mean Difference Plots: ANoMPLOT
[Figure: AnomPlot for dose]

Mean Difference Plots – Control Plots
• From SAS for Linear Models, Output 3.17-3.22
• Randomized complete block
• 5 irrigation treatments: Flood (control), Basin, Spray, Sprinkler, Trickle
  ods html;
  ods graphics on;
  ods select ControlPlot;
  proc glimmix order=data;
    class bloc irrig;
    model fruitwt=irrig;
    random bloc;
    lsmeans
      irrig / diff=control('flood') plot=controlplot adjust=dunnett;
  run;
  ods graphics off;
  ods html close;

Dunnett-style Control Plot
[Figure: control plot of each irrigation treatment vs. Flood]

Back to Type x Dose Data: Interaction Plot
[Figure: type × dose interaction plot]

Type x Dose: Simple Effects
SLICE: tests only
Tests of Effect Slices for type*dose Sliced By type
  type   Num DF   Den DF   F Value   Pr > F
  r      3        19.49     8.12     0.0010
  s      3        19.49    13.58     <.0001
Tests of Effect Slices for type*dose Sliced By dose
  dose   Num DF   Den DF   F Value   Pr > F
  1      1        16       5.00      0.0399
  2      1        16       4.03      0.0618
  4      1        16       0.02      0.8810
  8      1        16       0.58      0.4578
SLICEDIFF: estimates, etc.
Simple Effect Comparisons of type*dose Least Squares Means By dose
  Simple Effect Level   type   _type   Estimate   Standard Error   DF   t Value   Pr > |t|
  dose 1                r      s        2.9400    1.3144           16    2.24     0.0399
  dose 2                r      s        2.6400    1.3144           16    2.01     0.0618
  dose 4                r      s       -0.2000    1.3144           16   -0.15     0.8810
  dose 8                r      s       -1.0000    1.3144           16   -0.76     0.4578

Type x Dose: Simple Effect Estimates by Type
Simple Effect Comparisons of type*dose Least Squares Means By type
  Simple Effect Level   dose   _dose   Estimate   Standard Error   DF      t Value   Pr > |t|
  type r                1      2        -7.8400   1.8796           19.49   -4.17     0.0005
  type r                1      4        -8.1800   1.8796           19.49   -4.35     0.0003
  type r                1      8        -4.8000   1.8796           19.49   -2.55     0.0192
  type r                2      4        -0.3400   1.8796           19.49   -0.18     0.8583
  type r                2      8         3.0400   1.8796           19.49    1.62     0.1219
  type r                4      8         3.3800   1.8796           19.49    1.80     0.0876
  type s                1      2        -8.1400   1.8796           19.49   -4.33     0.0003
  type s                1      4       -11.3200   1.8796           19.49   -6.02     <.0001
  type s                1      8        -8.7400   1.8796           19.49   -4.65     0.0002
  type s                2      4        -3.1800   1.8796           19.49   -1.69     0.1066
  type s                2      8        -0.6000   1.8796           19.49   -0.32     0.7530
  type s                4      8         2.5800   1.8796           19.49    1.37     0.1855

Effect of dose?
Log(Dose) scale:
contrast 'logdose linear' dose -3 -1 1 3;
contrast 'logdose quad' dose 1 -1 -1 1;
contrast 'logdose cubic' dose -1 3 -3 1;
contrast 'type x linear' dose*type -3 -1 1 3 3 1 -1 -3;
contrast 'type x quad' dose*type 1 -1 -1 1 -1 1 1 -1;
contrast 'type x cubic' dose*type -1 3 -3 1 1 -3 3 -1;
otherwise (dose on its original scale).....
contrast 'dose linear' dose -11 -7 1 17;
contrast 'dose quad' dose 20 -4 -29 13;
contrast 'dose cubic' dose -8 14 -7 1;
contrast 'type x linear' dose*type -11 -7 1 17 11 7 -1 -17;
contrast 'type x quad' dose*type 20 -4 -29 13 -20 4 29 -13;
contrast 'type x cubic' dose*type -8 14 -7 1 8 -14 7 -1;

LogDose contrast results
Contrasts
Label           Num DF  Den DF  F Value  Pr > F
logdose linear  1       12      18.25    0.0011
logdose quad    1       12      22.54    0.0005
logdose cubic   1       12       0.08    0.7780
type x linear   1       16       6.22    0.0240
type x quad     1       16       0.04    0.8515
type x cubic    1       16       0.61    0.4472

Direct Regression – borrow from ANCOVA
proc glimmix data=variety_eval;
   class block type dose;
   model y=type logdose(type) ld_sq(type) / noint ddfm=satterth solution;
   random intercept dose / subject=block;
   contrast 'equal quad by type?' ld_sq(type) 1 -1;
run;
Solutions for Fixed Effects
Effect         type  Estimate  Standard Error  DF     t Value
type           r     20.1890   1.4204          19.62  14.21
type           s     17.0200   1.4204          19.62  11.98
logdose(type)  r      9.8890   2.0181          21.45   4.90
logdose(type)  s     10.9800   2.0181          21.45   5.44
ld_sq(type)    r     -2.8050   0.6447          21.45  -4.35
ld_sq(type)    s     -2.6800   0.6447          21.45  -4.16
Contrasts
Label                Num DF  Den DF  F Value  Pr > F
equal quad by type?  1       17      0.04     0.8497
• can re-fit with LD_SQ common to both types

Example 3
• From SAS for Mixed Models, Section 4.7
• 4 “conditions”
• 3 diets
• Condition applied in incomplete block design
• 2 conditions per block
• Diet applied to cages within condition
• Condition is whole plot, diet is split-plot

“Plot plan”
diet 1  diet 2  diet 3 | diet 2  diet 1  diet 3 | diet 2  diet 1  diet 3 | diet 1  diet 3  diet 2

Model?
• blocking? yes
• e.u. with respect to condition: “1/2 block”
• e.u. with respect to diet: “1/3 condition e.u.”
• e.u. w.r.t. cond x diet: same as diet
Model: $y_{ijk} = \mu_{ij} + \text{blk}_k + w_{ik} + e_{ijk}$

SAS Program
proc glimmix data=fix2;
   class cage condition diet;
   model gain=condition diet condition*diet/ddfm=satterth;
   random intercept condition / subject=cage;
run;
data & program: file ch4-ex3.sas

Selected Output
Covariance Parameter Estimates
Cov Parm   Subject  Estimate  Standard Error
Intercept  cage      3.0376   5.0791
condition  cage      0        .
Residual            27.8429   8.7672
Type III Tests of Fixed Effects
Effect          Num DF  Den DF  F Value  Pr > F
condition       3       23.61   2.71     0.0677
diet            2       20.17   0.93     0.4090
condition*diet  6       20.17   1.73     0.1661
How should one deal with a negative variance component estimate?
• revert to ANOVA via PROC GLM?
• in MIXED, use NOBOUND option?
• in GLIMMIX, use LowerB
• alternatively, redefine model
• may be CS with plots in block negatively correlated

Comparison with SAS Proc GLM
proc glm data=fix2;
   class cage condition diet;
   model gain=cage condition cage*condition diet condition*diet;
   random cage cage*condition/test;
   lsmeans condition diet condition*diet;
run;
Tests of Hypotheses for Mixed Model Analysis of Variance
Source       DF  Type III SS  Mean Square  F Value  Pr > F
cage         5   198.277778   39.655556    2.73     0.2185
* condition  3   171.666667   57.222222    3.95     0.1446
Error        3    43.500000   14.500000
Error: MS(cage*condition)
* This test assumes one or more other fixed effects are zero.

Source            DF  Type III SS  Mean Square  F Value  Pr > F
cage*condition    3    43.500000   14.500000    0.46     0.7144
diet              2    52.055556   26.027778    0.82     0.4561
condition*diet    6   288.388889   48.064815    1.52     0.2333
Error: MS(Error)  16  504.888889   31.555556

More GLM output
• non-estimability results from an inappropriate definition of estimability (based on fixed & random effects)
• inescapable consequence of Proc GLM with a mixed model
Least Squares Means
condition  gain LSMEAN
1          Non-est
2          Non-est
3          Non-est
4          Non-est
diet       gain LSMEAN
normal     57.9166667
restrict   55.5000000
suppleme   58.1666667
condition*diet (all 12 combinations of condition 1-4 with diet normal, restrict, suppleme): Non-est
DON’T use Proc GLM with mixed models!

GLM vs MIXED issues
• REML default: negative variance component estimates set to 0
   − if BLOCK variance affected, type I error rate changes
   − if error term affected, power may change
   − better to allow negative estimates
   − In MIXED: NOBOUND or METHOD=TYPE3
   − In GLIMMIX: LowerB
• By contrast, GLM uses the implied mean squares regardless
• GLM: inappropriate NON-EST, an artifact of the incomplete block design
• Standard errors for means and many simple effects (including SLICE) are incorrect in GLM (no fix!!)

GLIMMIX Option (1) – Like NOBOUND in MIXED
proc glimmix data=fix2;
   class cage condition diet;
   model gain=condition|diet/ddfm=kr;
   random intercept condition / subject=cage;
   parms / lowerb=(1e-4,-10,1e-4);
run;
Covariance Parameter Estimates
Cov Parm   Subject  Estimate  Standard Error
Intercept  cage      5.0288   4.7149
condition  cage     -6.2404   4.8693
Residual            31.5556   11.1566
Type III Tests of Fixed Effects
Effect          Num DF  Den DF  F Value  Pr > F
condition       3       4.718   4.31     0.0798
diet            2       16      0.82     0.4561
condition*diet  6       16      1.52     0.2333

GLIMMIX Option (2) – is it really correlation?
proc glimmix data=fix2;
   class cage condition diet;
   model gain=condition|diet/ddfm=kr;
   random intercept / subject=cage;
   random _residual_ / type=cs subject=condition*cage;
run;
Covariance Parameter Estimates
Cov Parm   Subject         Estimate
Intercept  cage             5.0271
CS         cage*condition  -6.2402
Residual                   31.5567
Interblock correlation: $\rho = \sigma^2_{CC} / (\sigma^2_{CC} + \sigma^2) = -0.2466$
Type III Tests of Fixed Effects
Effect          Num DF  Den DF  F Value  Pr > F
condition       3       4.717   4.31     0.0798
diet            2       16      0.82     0.4561
condition*diet  6       16      1.52     0.2334

Modeling Change over Time
• Regression over time
• Latent growth / change models
• Random coefficients over time
• Repeated measures experiment
• Longitudinal Data

From Acock – BMI Data
[Figure: bmi (10 to 50) vs. yearfrm, 1997-2003]
Note – my sample differs from Acock’s, so the numbers won’t match

Basic Growth Model
• Simplest model involves slope & intercept
• In “Stat-speak”: $y_{ij} = \beta_0 + \beta_1 \times \text{time}_i + e_{ij}$
   obs = intercept + slope × time + error; this is just linear regression
• $(e_{1j}, e_{2j}, \ldots, e_{tj})$ may be independent $N(0, \sigma^2)$ or may be correlated (more later)

Basic Growth Model in SAS
in PROC GLM
proc glm;
   model bmi=year;
run;
Parameter  Estimate     Standard Error  t Value  Pr > |t|
Intercept  21.38349324  0.55631931      38.44    <.0001
year        0.68444085  0.15429522       4.44    <.0001
regression equation: $\hat{y} = 21.38 + 0.684 \times \text{Year}$
Source           DF   Sum of Squares  Mean Square  F Value  Pr > F
Model            1     432.856378     432.856378   19.68    <.0001
Error            229  5037.468822      21.997680
Corrected Total  230  5470.325200
R-Square  Coeff Var  Root MSE  bmi Mean
0.079128  20.01197   4.690168  23.43682
very deceptive – more shortly

Growth Model in SAS - II
in PROC GLIMMIX
proc glimmix;
   class id;
   model bmi=year/solution;
   random _residual_ /subject=id;
   estimate 'y-hat in 1997' intercept 1 year 0 / cl;
   estimate 'y-hat in 2000' intercept 1 year 3 / cl;
   estimate 'y-hat in 2003' intercept 1 year 6 / cl;
run;
selected output follows

Basic Growth Model – Selected GLIMMIX Output
Covariance Parameter Estimates
Cov Parm       Estimate  Standard Error
Residual (VC)  21.9977   2.0558
Note: residual VC estimate = MSE from GLM ANOVA
Solutions for Fixed Effects
Effect     Estimate  Standard Error  DF   t Value  Pr > |t|
Intercept  21.3835   0.5563          32   38.44    <.0001
year        0.6844   0.1543          197   4.44    <.0001
Estimates
Label          Estimate  Standard Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
y-hat in 1997  21.3835   0.5563          197  38.44    <.0001    0.05   20.2864  22.4806
y-hat in 2000  23.4368   0.3086          197  75.95    <.0001    0.05   22.8283  24.0454
y-hat in 2003  25.4901   0.5563          197  45.82    <.0001    0.05   24.3930  26.5872

G/C Model – Issue I – Account for ID
• Recall R² for Basic Growth Model very low
• You must account for variation among subjects (ID)
okay:
proc glm;
   class id;
   model bmi=id year;
run;
better:
proc glimmix;
   class id;
   model bmi=year/solution;
   random id;   /* or: random intercept / subject=id; */
run;

Selected Output from GLM vs. GLIMMIX
GLM: R-Square 0.815282 (vs. 0.079 without id)
Covariance Parameter Estimates from GLIMMIX
Cov Parm   Subject  Estimate  Standard Error
Intercept  id       17.2449   4.4950
Residual             5.1293   0.5168   (vs. 21.998)
Solutions for Fixed Effects: estimates don’t change, std errors do
Effect     Estimate  Standard Error  DF   t Value  Pr > |t|
Intercept  21.3835   0.7712          32   27.73    <.0001
year        0.6844   0.07451         197   9.19    <.0001

Growth Change Modeling Issue - II
• Correlated Errors
Recall: in the model $y_{ij} = \beta_0 + \beta_1 \times \text{year}_i + e_{ij}$, $(e_{1j}, e_{2j}, \ldots, e_{tj})$ may be independent $N(0, \sigma^2)$ or may be correlated
• Correlation modeled by covariance model
• Failure to model correlation increases P{type I error}
• Over-modeling correlation decreases power

Covariance models
Indep: $\Sigma = I\sigma^2$ (identical to split-plot)
CS:
$$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \rho & \rho \\ & 1 & \rho & \rho \\ & & 1 & \rho \\ & & & 1 \end{bmatrix}$$
NOTE: CS is a reparameterization of Indep (with the random subject effect of the split-plot model)
AR(1):
$$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ & 1 & \rho & \rho^2 \\ & & 1 & \rho \\ & & & 1 \end{bmatrix}$$

More covariance models
Toep:
$$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ & 1 & \rho_1 & \rho_2 \\ & & 1 & \rho_1 \\ & & & 1 \end{bmatrix}$$
ANTE(1):
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_1\rho_2 & \sigma_1\sigma_4\rho_1\rho_2\rho_3 \\ & \sigma_2^2 & \sigma_2\sigma_3\rho_2 & \sigma_2\sigma_4\rho_2\rho_3 \\ & & \sigma_3^2 & \sigma_3\sigma_4\rho_3 \\ & & & \sigma_4^2 \end{bmatrix}$$
UN:
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ & & \sigma_3^2 & \sigma_{34} \\ & & & \sigma_4^2 \end{bmatrix}$$

Issues in Repeated Measures
• Impact of covariance structure?
• Selection of appropriate covariance?
• Bias in std errors, test statistics
• Degrees of freedom
• Nonlinear models over time
• Non-normal errors

Basic G/C Model with Covariance Model
• Also known as Autocorrelation
proc glimmix;
   class id;
   model bmi=year/solution ddfm=kr;
   random intercept / subject=id;
   random _residual_ /subject=id type=ar(1);
run;
• Degree of freedom and std error bias must be dealt with (more later)
• Competing covariance models compared via fit statistics: AICC, BIC, HQIC, CAIC

Selected Output for G/C Model w/ Autocorrelation
Covariance Parameter Estimates (variance, covariance & correlation estimates)
Cov Parm   Subject  Estimate  Standard Error
Intercept  id       14.8587   4.6202
AR(1)      id        0.5623   0.1144
Residual             7.7165   1.8981
Fit Statistics (used to assess cov model)
-2 Res Log Likelihood     1111.69
AIC (smaller is better)   1117.69
AICC (smaller is better)  1117.79
BIC (smaller is better)   1122.18
CAIC (smaller is better)  1125.18
HQIC (smaller is better)  1119.20
Generalized Chi-Square    1767.07
Gener. Chi-Square / DF       7.72
Solutions for Fixed Effects
Effect     Estimate  Standard Error  DF   t Value  Pr > |t|
Intercept  21.3238   0.8042          32   26.52    <.0001
year        0.6896   0.1102          197   6.26    <.0001
estimate – slight effect; std error – bigger effect

• random coeff
• correl errors
• prediction
• add Gender
• add emotional prob

Repeated Measure Experiments a.k.a. Longitudinal Data
• Assign e.u. to treatments
• May use any design (completely random, blocked, row-column, split-plot ....)
• Observations at planned times
• Objectives
   1. assess changes in response over time
   2. assess treatment effect on (1)

Typical Repeated Measures Data
from SAS for Linear Models, Chapter 8; SAS for Mixed Models, 2nd ed, Chapter 5

From BMI Data: Are G/C Curves Equal by Gender?
[Figure: interaction plot of G/C curve by gender]

FYI – SAS Code to Get Interaction Plot
ods html;
ods graphics on;
ods select MeanPlot;
proc glimmix data=bmi_uni_anc;
   class gender id year;
   model bmi=gender|year / solution ddfm=kr;
   random intercept / subject=id(gender);
   random _residual_ / type=ar(1) subject=id(gender);
   lsmeans gender*year / plot=MeanPlot (sliceby=gender join cl);
run;
ods graphics off;
ods html close;

Model
Model: $y_{ijk} = \mu_{ij} + \text{id(gender)}_{ik} + e_{ijk}$, where $\mu_{ij}$ is the gender$_i$ × year$_j$ mean; can express as $\mu_{ij} = \mu + g_i + yr_j + (g \times yr)_{ij}$
• $\text{id(gender)}_{ik}$ is between-subjects error, NI$(0, \sigma_B^2)$, like whole-plot error
• $e_{ijk}$ is within-subjects error, like split-plot error, except...
Let $\mathbf{e}_{ik} = (e_{i1k}, e_{i2k}, \ldots, e_{iTk})'$; $\mathbf{e}_{ik} \sim \text{MVN}(\mathbf{0}, \Sigma)$
translates to:
proc glimmix data=bmi_uni_anc;
   class gender id year;
   model bmi=gender|year / solution ddfm=kr;
   random intercept / subject=id(gender);
   random _residual_ / type=ar(1) subject=id(gender);
run;

Back to SAS for Mixed Models Example
Model: $y_{ijk} = \mu_{ij} + s(\text{trt})_{ik} + e_{ijk}$, where $\mu_{ij}$ is the trt$_i$ × time$_j$ mean
• $s(\text{trt})_{ik}$ is between-subjects error, NI$(0, \sigma_B^2)$, like whole-plot error
• $e_{ijk}$ is within-subjects error, like split-plot error, except...
Let $\mathbf{e}_{ik} = (e_{i1k}, e_{i2k}, \ldots, e_{iTk})'$; $\mathbf{e}_{ik} \sim \text{MVN}(\mathbf{0}, \Sigma)$
Hence $\text{Var}(\mathbf{y}_{ik}) = \mathbf{V}_{ik} = \mathbf{Z}_S\mathbf{Z}_S'\sigma_B^2 + \Sigma$; typically $\mathbf{J}_T\sigma_B^2 + \Sigma$
$\mathbf{V} = \text{Var}(\mathbf{y}) = \mathbf{I}_{AK} \otimes \mathbf{V}_{ik}$, where A = # trt's, K = # subj/trt

Middle Ground between MANOVA and Split-Plot in Time via Proc GLIMMIX
PROC GLIMMIX;
   CLASSES SUBJ TRT TIME;
   MODEL Y= TRT TIME TRT*TIME;
   RANDOM INTERCEPT / SUBJECT=SUBJ(TRT);
   RANDOM TIME / TYPE=AR(1) SUBJECT=SUBJ(TRT) RESIDUAL;
   *LSMEANS TRT TIME TRT*TIME;
   TITLE 'MIXED - AR(1) ERRORS';
RUN;
• RANDOM specifies between-subjects effects (G-side)
• RANDOM...RESIDUAL specifies within-subjects effects (R-side)
• in many models, G- and R-side effects are not identifiable

Modeling Covariance among Repeated Measures
PROC MIXED DATA=univ;
   CLASSES SUBJ TRT TIME;
   MODEL Y= TRT TIME TRT*TIME;
   REPEATED TIME / TYPE=UN SSCP SUBJECT=SUBJ(TRT);
   ODS OUTPUT CovParms=cp;
run;
data times;
   do time1=1 to 8;
      do time2=1 to time1;
         dist=time1-time2;
         output;
      end;
   end;
data covplot;
   merge times cp;
proc gplot data=covplot;
   plot adjcorr*dist=time1;
• Computes the covariance between pairs of measurements (same subject, different times), based on the sum of squares & cross-products matrix
• then plots them by distance

Plot of Covariance by Distance

Idealized Plots: CS = Subj(Trt), AR(1), AR(1) + Subj(Trt)
[Figure panels: AR(1) + Subj(Trt); CS = random Subj(Trt); AR(1) only]

Model Fitting Criteria in Version 8
1. Compound Symmetry
proc glimmix;
   classes subj trt time;
   model y= trt time trt*time;
   random time / residual type=cs subject=subj(trt);
   title 'mixed - compound symmetry';
run;
Fit Statistics
-2 Res Log Likelihood     839.39
AIC (smaller is better)   843.39
AICC (smaller is better)  843.47
BIC (smaller is better)   845.75
CAIC (smaller is better)  847.75
HQIC (smaller is better)  844.02
Generalized Chi-Square    767.61
Gener. Chi-Square / DF      4.80

Comparison of Models (smaller is better)
Model                            Neg2LogLike  Parms  AIC    AICC   HQIC   BIC    CAIC
Compound Symmetry                839.4        2      843.4  843.5  844.0  845.7  847.7
AR(1) + Subj(TRT) random effect  788.7        3      794.7  794.8  795.6  798.2  801.2
Unstructured                     760.5        36     832.5  854.1  843.7  874.9  910.9
ANTE(1)                          780.7        15     810.7  814.0  815.3  828.3  843.3
TOEP                             784.9        8      800.9  801.9  803.4  810.4  818.4

How do Model Fitting Criteria Compare?
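All five criteria are the same −2 Res Log Likelihood plus a penalty that grows with the number of covariance parameters d; they differ only in how hard they penalize. The Python sketch below assumes the REML forms listed in the SAS/STAT documentation for PROC MIXED: AIC = −2ℓ + 2d, AICC = −2ℓ + 2dn*/(n* − d − 1), HQIC = −2ℓ + 2d log log s, BIC = −2ℓ + d log s and CAIC = −2ℓ + d(log s + 1), with n* = n − rank(X) and s the number of subjects. For the repeated measures example, n* = 160 and s = 24 are inferred from the degrees of freedom shown (24 subjects in 4 treatments, 8 times, 4 × 8 = 32 fixed-effect columns) rather than printed by SAS; with those values the compound symmetry fit statistics are reproduced to rounding. Function and constant names are illustrative.

```python
import math


def info_criteria(neg2_res_loglik, d, n_star, subjects):
    """REML information criteria, assuming the formulas listed for PROC MIXED.

    neg2_res_loglik : -2 Res Log Likelihood
    d               : number of covariance parameters
    n_star          : n - rank(X)  (used only by AICC)
    subjects        : number of subjects (used by HQIC, BIC, CAIC)
    """
    m2ll = neg2_res_loglik
    log_s = math.log(subjects)
    return {
        "AIC": m2ll + 2 * d,
        "AICC": m2ll + 2 * d * n_star / (n_star - d - 1),
        "HQIC": m2ll + 2 * d * math.log(log_s),
        "BIC": m2ll + d * log_s,
        "CAIC": m2ll + d * (log_s + 1),
    }


# (-2 Res Log Likelihood, covariance parameters) from the comparison table
FITS = {
    "Compound Symmetry": (839.4, 2),
    "AR(1) + Subj(TRT)": (788.7, 3),
    "Unstructured": (760.5, 36),
    "ANTE(1)": (780.7, 15),
    "TOEP": (784.9, 8),
}
N_STAR, SUBJECTS = 160, 24  # inferred from the d.f., not printed by SAS


def best_model(criterion):
    """Name of the covariance model with the smallest value of `criterion`."""
    return min(FITS, key=lambda name: info_criteria(*FITS[name], N_STAR, SUBJECTS)[criterion])


if __name__ == "__main__":
    for name, (m2ll, d) in FITS.items():
        crit = info_criteria(m2ll, d, N_STAR, SUBJECTS)
        print(f"{name:18s}", {k: round(v, 1) for k, v in crit.items()})
    for c in ("AIC", "AICC", "HQIC", "BIC", "CAIC"):
        print(c, "prefers", best_model(c))
```

With s = 24 the per-parameter penalties are 2 (AIC), about 2.31 (HQIC), 3.18 (BIC) and 4.18 (CAIC), which is why the criteria line up from complex to simple; for these five fits every criterion still picks AR(1) + Subj(TRT).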
• Guerin & Stroup (2000) compared AIC, BIC, HQIC, CAIC for simulated AR(1) and ARH(1) data
• CAIC tends to select simpler models
• AIC tends to select the most complex models*
• complex – AIC > HQIC > BIC > CAIC – simple
• Model too simple (correlation model not adequate) → Type I error rate too high
• Model too complex (correlation over-modeled) → Type I error control not affected, but power suffers
• *Since 2000, SAS has added AICC to address the AIC issue
• Best choice depends on the severity of Type I vs. Type II error

An Inference Issue
CS: Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
TRT       3       20        0.74   0.5425
TIME      7       140     109.04   <.0001
TRT*TIME  21      140       1.98   0.0106

AR(1)+between subj: Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
TRT       3       20        0.75   0.5344
TIME      7       140      60.55   <.0001
TRT*TIME  21      140       1.48   0.0921

UN: Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
TRT       3       20        0.74   0.5425
TIME      7       20      101.31   <.0001
TRT*TIME  21      20        1.37   0.2450
UN similar to MANOVA, but MANOVA Trt*Time p-value was 0.50

Bias & Options for Adjusting
• SAS default uses the estimated (co)variance components in V: std errors are biased downward; t- and F-statistics are biased upward
• “Robust” (a.k.a. “sandwich”) estimate of K′V⁻¹K available using the EMPIRICAL option in MIXED
• Kenward & Roger (Biometrics, 1997) proposed an adjustment; available using the DDFM=KR option in the MODEL statement of MIXED
• Guerin & Stroup (2000) evaluated the KR option of SAS Version 8 with simulated AR(1) and ARH(1) data
• Biased F resulted in inflated Type I error rates unless the KR option was used (for α=0.05, rejection rates >0.10 for TYPE=AR(1), up to 0.20 with TYPE=ANTE(1), UN)

Sandwich (“Robust”) Estimator
OLS estimate of β: $\hat{\beta}_{OLS} = (X'X)^{-}X'y$
$\text{Var}(\hat{\beta}_{OLS}) = (X'X)^{-}X'\,\text{Var}(y)\,X(X'X)^{-} = (X'X)^{-}X'VX(X'X)^{-}$
GLS estimate is $\hat{\beta}_{GLS} = (X'\hat{V}^{-1}X)^{-}X'\hat{V}^{-1}y$, with
$\text{Var}(\hat{\beta}_{GLS}) = (X'\hat{V}^{-1}X)^{-}X'\hat{V}^{-1}V\hat{V}^{-1}X(X'\hat{V}^{-1}X)^{-}$
Let $\hat{V}_0$ = estimate of V based on residuals $\hat{e} = y - X\hat{\beta}$; yields $\hat{V}^{-1}\hat{V}_0\hat{V}^{-1} = \hat{V}^{-1}\hat{e}\hat{e}'\hat{V}^{-1}$
(in practice $\hat{e}\hat{e}'$ is formed subject by subject, i.e. $\hat{V}_0$ is block-diagonal with blocks $\hat{e}_i\hat{e}_i'$; the full $\hat{e}\hat{e}'$ would give $X'\hat{V}^{-1}\hat{e} = 0$)
"Sandwich" estimator: $\widehat{\text{Var}}(\hat{\beta}_{GLS}) = (X'\hat{V}^{-1}X)^{-}X'\hat{V}^{-1}\hat{e}\hat{e}'\hat{V}^{-1}X(X'\hat{V}^{-1}X)^{-}$

How does the sandwich estimator perform?
proc mixed empirical;
   classes subj trt time;
   model y=trt time trt*time;
   random intercept / subject=subj(trt);
   repeated time / type=ar(1) subject=subj(trt);
run;
Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
TRT       3       20        1.31   0.2981
TIME      7       140     121.57   <.0001
TRT*TIME  20      140       9.04   <.0001
vs. F=1.48; p=0.0921 using default

Kenward and Roger
proc glimmix;
   classes subj trt time;
   model y= trt time trt*time/ddfm=kr;
   random intercept / subject=subj(trt);
   random time / type=ar(1) subject=subj(trt) residual;
run;
Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
TRT       3       20.5     0.77    0.5219
TIME      7       109     50.90    <.0001
TRT*TIME  21      117      1.24    0.2330

Alternative KR adjustment
• in SAS, the KR adjustment uses the Hessian matrix by default
• you can cause it to use the Information matrix instead
• no documented advantage one way or another
PROC glimmix scoremod scoring=51;
   CLASSES SUBJ TRT TIME;
   MODEL Y= TRT TIME TRT*TIME/ddfm=kr;
   RANDOM intercept / subject=SUBJ(TRT);
   RANDOM _residual_ / TYPE=AR(1) SUBJECT=SUBJ(TRT);
   nloptions technique=nrridg;
run;
Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
TRT       3       20.5     0.77    0.5264
TIME      7       112     54.18    <.0001
TRT*TIME  21      119      1.28    0.2010
vs. F=1.24, p=0.2330 using Hessian

Alternative Model for Change in BMI by Gender
Level 1: $y_{tj} = \beta_{0j} + \beta_{1j}\,\text{yr}_t + e_{tj}$
Level 2: $\beta_{0j} = \beta_0 + \text{gender}_i + \text{id(gender)}_{ij}$; $\beta_{1j} = \beta_1 + g1_i$
Repeated Measures ANCOVA Model:
$y_{ijk} = \beta_0 + \text{gender}_i + \text{id(gender)}_{ij} + (\beta_1 + g1_i)\,\text{yr}_k + e_{ijk} = \beta_{0i} + \beta_{1i}\,\text{yr}_k + \text{id(gender)}_{ij} + e_{ijk}$, with $\beta_{0i} = \beta_0 + \text{gender}_i$ and $\beta_{1i} = \beta_1 + g1_i$
proc glimmix data=bmi_uni_anc;
   class gender id year;
   model bmi=gender yr(gender) / noint solution ddfm=kr;
   random intercept / subject=id(gender);
   random _residual_ / type=ar(1) subject=id(gender);
   contrast 'male vs female intercept' gender 1 -1;
   contrast 'male vs female slope' yr(gender) 1 -1;
run;

Selected Output
Contrasts
Label                     Num DF  Den DF  F Value  Pr > F
male vs female intercept  1       165.9   3.89     0.0501
male vs female slope      1       204.5   1.57     0.2111
Covariance Parameter Estimates
Cov Parm   Subject     Estimate
Intercept  id(gender)  15.1933
AR(1)      id(gender)   0.2928
Residual                7.8871
Solutions for Fixed Effects
Effect      gender  Estimate  Standard Error  DF     t Value  Pr > |t|
gender      0       20.1988   0.6084          165.9  33.20    <.0001
gender      1       21.8298   0.5596          165.9  39.01    <.0001
yr(gender)  0        0.7860   0.08207         204.5   9.58    <.0001
yr(gender)  1        0.6462   0.07549         204.5   8.56    <.0001

Alternative Model
proc glimmix data=bmi_uni;
   class gender id;
   model bmi=gender year(gender) / noint solution ddfm=kr;
   random intercept year(gender) / subject=id type=un;
   contrast 'male vs female intercept' gender 1 -1;
   contrast 'male vs female slope' year(gender) 1 -1;
run;
This is a random coefficient model (next section)

Response Surface Split Plot with Repeated Measures
• 4 treatment factors (A, B, C, D)
   − 2 levels each
• 3 factors (A, B, C) applied to P (subject)
• treatment design: central composite design
• subjects split into 2 sub-units
• level of D randomly assigned to each sub-unit
• observations at 3 planned times (H)

Central Composite Design

Model for Central Composite Split-Split Plot
Effect                                e.u.
A, B, C main effects & interactions   P(A B C)
D                                     D × P(A B C)
D × (A, B, C)                         D × P(A B C)
H and all interactions involving H    H × D × P(A B C)
$$y_{hijklm} = f(X_{Ai}, X_{Bj}, X_{Ck}) + d_l + f_l(X_{Ai}, X_{Bj}, X_{Ck}) + h_m + dh_{lm} + f_m(X_{Ai}, X_{Bj}, X_{Ck}) + p(abc)_{hijk} + dp(abc)_{hijkl} + e_{hijklm}$$

SAS Statements
proc glimmix;
   class ca cb cc p d u;
   *model y=a b c a*a b*b c*c a*b a*c b*c d d*a d*b d*c t t*t t*a t*b t*c t*d/htype=1 htype=3 ddfm=kr;
   model y=d a(d) b(d) c(d) a*a b*b c*c a*b a*c b*c t(d) t*t t*a t*b t*c /noint solution htype=1 ddfm=kr;
   random p(ca cb cc) d*p(ca cb cc);
run;

Key output
Solutions for Fixed Effects
Effect  d  Estimate  Standard Error
d       0  53.5687   2.3344
d       1  31.7168   2.3344
a(d)    0  16.8226   1.8101
a(d)    1  11.2226   1.8101
b(d)    0  19.5049   1.8101
b(d)    1  12.3715   1.8101
c(d)    0   4.4019   1.8101
c(d)    1   3.5352   1.8101
a*a         0.4980   3.2427
b*b        -2.5020   3.2427
c*c         5.1647   3.2427
a*b         6.2083   1.8872
a*c        -2.8333   1.8872
b*c         1.2083   1.8872
t(d)    0   9.4200   0.5504
t(d)    1   0.02442  0.5504
t*t        -0.1487   1.1114
a*t         0.1160   0.5078
b*t         1.7331   0.5078
c*t         0.3513   0.5078
Covariance Parameter Estimates
Cov Parm   Subject      Estimate
Intercept  p(ca*cb*cc)  24.3200
d          p(ca*cb*cc)   4.5151
Residual                11.4944
Fit Statistics: AICC (smaller is better) 573.40

Complex Split-split-plot revisited
• Recall A, B, C applied to units P
• P split in two, levels of D to each half
• Measured at 3 times
• Previous analysis assumed split on time
• Actually repeated measures
• Split-plot + repeated measures

CCD Split-plot + repeated measures
proc glimmix data=CCD_SpltPlt;
   class ca cb cc p d u;
   *model y=a b c a*a b*b c*c a*b a*c b*c d d*a d*b d*c t t*t t*a t*b t*c t*d/htype=1 htype=3 ddfm=kr;
   model y=d a(d) b(d) c(d) a*a b*b c*c a*b a*c b*c t(d) t*t t*a t*b t*c / noint solution htype=1 ddfm=kr;
   random intercept / subject=p(ca cb cc);
   random _residual_ / type=sp(pow)(t) subject=d*p(ca cb cc);
run;
AICC: 573.4 as split-split-plot vs. 551.1 as repeated measures using SP(POW)
note: SP(POW) is a generalization of AR(1) for unequally spaced times

Unreplicated Split-Plot
• SAS for Mixed Models, Section 16.7
• Quilt divided in half
• Each “half sheet” received a 2 x 2 x 3 factorial
   − 2 pH levels (low, high)
   − 2 temp (cold, hot)
   − 3 dry cycles (air, machine-delicate, machine-normal)
• Material cut from each unit
   − washed 10, 20, 30, 40, 50 times
• Breaking strength monitored
• Materials observed so reps by sheet lost

Model for Breaking Strength Experiment
$y_{ijklm} = \mu_{ijkl} + r_m + w_{ijkm} + e_{ijklm}$, where
• $\mu_{ijkl}$ is the mean of the ijk-th pH × water temperature × dry cycle (i = 8, 10; j = 35, 55; k = air, delicate, normal) at the l-th time of washing (l = 10, 20, 30, 40, 50)
• $r_m$ is the effect of the m-th block (m = 1, 2 in the design, but m = 1 only in the data)
• $w_{ijkm}$ is the ijkm-th between-subjects (or whole-plot) error effect, assumed NID$(0, \sigma_W^2)$
• $e_{ijklm}$ is the within-subjects (or split-plot) error effect, assumed NID$(0, \sigma^2)$

ANOVA for Breaking Strength Experiment
Source of Variation     d.f.
block                   1
pH (P)                  1
wash temp (T)           1
dry cycle (D)           2
P × T                   1
P × D                   2
T × D                   2
P × T × D               2
between subject error   11
no. of washes (W)       4
W × P                   4
W × T                   4
W × D                   8
W × P × T               4
W × P × D               8
W × T × D               8
W × P × T × D           8
within subjects error   48
but the block and error d.f. become 0 when the “half quilt” blocking distinction is lost

Breaking Strength vs # Washes by pH

Breaking Strength vs # Washes by Temp

Breaking Strength vs # Washes by Dry Cycle

Revised ANOVA
Pool negligible effects to get between & within error
Source of Variation                      d.f.
pH (P)                                   1
wash temp (T)                            1
dry cycle (D)                            2
between subject error                    7
linear effect of no. of washes (W Lin)   1
W Lin × P                                1
W Lin × T                                1
W Lin × D                                2
within subjects error                    43

GLIMMIX Program for Breaking Strength Experiment
proc glimmix data=shellie;
   class pH water_temp dry_cycle;
   model breaking_strength=pH water_temp dry_cycle w w*pH w*water_temp w*dry_cycle / solution;
   random pH*water_temp*dry_cycle;
   contrast 'air vs dryer effect on wear' w*dry_cycle 2 -1 -1;
   contrast 'delicate v normal effect on wear' w*dry_cycle 0 1 -1;
run;

Revised GLIMMIX - Estimate Regression over # of Washes
proc glimmix data=shellie;
   class pH water_temp dry_cycle;
   model breaking_strength= w(pH) w(water_temp) w(dry_cycle)/noint solution;
   random pH*water_temp*dry_cycle;
   estimate 'slope: ph 8, cold, air' w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 1 0 0;
   estimate 'slope: ph 8, cold, delicate' w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 0 1 0;
   estimate 'slope: ph 8, cold, normal' w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 0 0 1;
   estimate 'slope: ph 8, hot, air' w(ph) 1 0 w(water_temp) 0 1 w(dry_cycle) 1 0 0;
   estimate 'slope: ph 8, hot, delicate' w(ph) 1 0 w(water_temp) 0 1 w(dry_cycle) 0 1 0;
   /* etc. for all pH - temp - dry cycle combinations */
run;

Regression – Selected Output
Label                         Estimate  Standard Error
slope: ph 8, cold, air        -0.00024  0.000077
slope: ph 8, cold, delicate   -0.00047  0.000077
slope: ph 8, cold, normal     -0.00050  0.000077
slope: ph 8, hot, air         -0.00050  0.000077
slope: ph 8, hot, delicate    -0.00073  0.000077
slope: ph 8, hot, normal      -0.00076  0.000077
slope: ph 10, cold, air       -0.00082  0.000077
slope: ph 10, cold, delicate  -0.00105  0.000077
slope: ph 10, cold, normal    -0.00108  0.000077
slope: ph 10, hot, air        -0.00108  0.000077
slope: ph 10, hot, delicate   -0.00131  0.000077
slope: ph 10, hot, normal     -0.00134  0.000077
avg slope: ph 8               -0.00053  0.000054
avg slope: ph 10              -0.00111  0.000054
avg slope: cold water         -0.00069  0.000054
avg slope: hot water          -0.00095  0.000054
avg slope: air dry            -0.00066  0.000063
avg slope: delicate dry       -0.00089  0.000063
avg slope: normal dry         -0.00092  0.000063
Solution for Fixed Effects (columns: Effect, water temp, Dry cycle, pH, Estimate, Standard Error)
Intercept  0.1070  0.001895

Prediction & Inference Space

VI. Prediction, “BLUP” and Inference Space
• Estimation vs. Prediction
• When “BLUP” is a good thing
• Inference Space
   − what is it?
   − how can we use it?
• Performance evaluation issues
• Multi-location issues

Estimation, Prediction, and Inference Space
• Estimation is based on estimable functions $K'\beta$
• Estimation applies to fixed effects only; inference is to the entire population
• Prediction is based on “predictable functions” $K'\beta + M'u$
• Prediction applies to fixed & random effects; it narrows the scope of inference to the specific subset defined by $M'u$
• Examples: locations, workers, teachers, patients...
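The difference between estimating and predicting shows up most simply in a random-intercept model. Assuming a balanced design, independent errors and variance components treated as known, the BLUP of subject i's random effect is its mean residual from the fixed part, shrunk toward zero by σ²_u/(σ²_u + σ²/n); plugging in REML estimates gives the EBLUP that GLIMMIX reports. The sketch below reuses the random-intercept BMI estimates from earlier (17.2449 for id, 5.1293 residual) with n = 7 observations per subject, a figure inferred from 231 observations on 33 subjects rather than stated on the slides; the mean residual of 2.0 is purely hypothetical, as are the function names.

```python
def shrinkage_factor(var_u, var_e, n):
    """Weight given to a subject's own data when predicting its random intercept."""
    return var_u / (var_u + var_e / n)


def blup_random_intercept(mean_residual, var_u, var_e, n):
    """BLUP of subject i's random intercept in a balanced random-intercept model.

    mean_residual is the average of y_ij - x_ij' beta_hat over the subject's n
    observations. Errors are assumed independent and the variance components
    known; plugging in REML estimates turns this into an EBLUP.
    """
    return shrinkage_factor(var_u, var_e, n) * mean_residual


if __name__ == "__main__":
    VAR_ID, VAR_RESID = 17.2449, 5.1293  # random-intercept BMI fit
    for n in (7, 3, 1):
        k = shrinkage_factor(VAR_ID, VAR_RESID, n)
        pred = blup_random_intercept(2.0, VAR_ID, VAR_RESID, n)  # 2.0 is hypothetical
        print(f"n={n}: shrinkage {k:.4f}, predicted subject effect {pred:.3f}")
```

With 7 observations per subject the subject's own data get about 96% of the weight; with a single observation the weight drops to about 77%, so predictions for thinly observed subjects are pulled harder toward the population average.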
Prediction Example 1: Growth/Change Modeling Issue III
- Random coefficients
- Recall the basic growth model:
  Level 1: $y_{ij} = \beta_{0i} + \beta_{1i}\,\text{year}_{ij} + e_{ij}$
  Level 2: $\beta_{0i} = \beta_0 + b_{0i}$, $\beta_{1i} = \beta_1 + b_{1i}$, with
  $\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim MVN\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\right)$

    proc glimmix data=bmi_uni;
      class id;
      model bmi=year / solution ddfm=kr;
      random intercept year / subject=id type=un solution;
      random _residual_ / subject=id type=ar(1);
    run;

Selected Output

Covariance Parameter Estimates
Cov Parm   Subject   Estimate
UN(1,1)    id        10.8070
UN(2,1)    id         0.5873
UN(2,2)    id         0.2676
AR(1)      id         0.3024
Residual              4.6021

Solutions for Fixed Effects
Effect      Estimate   Standard Error   t Value
Intercept   21.3577    0.6480           32.96
year         0.6870    0.1212            5.67

Solution for Random Effects (partial listing)
Effect      Subject   Estimate   Std Err Pred   DF
Intercept   id 73      2.1023    1.3487         165
year        id 73     -0.1608    0.3118         165
Intercept   id 281    -1.3178    1.3487         165
year        id 281    -0.1353    0.3118         165
Intercept   id 496    -1.8137    1.3487         165
year        id 496    -0.07237   0.3118         165

You Can Obtain Subject-Specific Estimates

    proc glimmix data=bmi_uni;
      class id;
      model bmi=year / solution ddfm=kr;
      random intercept year / subject=id type=un solution;
      random _residual_ / subject=id type=ar(1);
      estimate 'popn avg slope' year 1 / cl;
      estimate 'id (73) specific slope' year 1 | year 1 / subject 1 0 cl e;
      estimate 'id (496) specific slope' year 1 | year 1 / subject 0 0 1 0 cl;
      estimate 'popn avg intercept' intercept 1 / cl;
      estimate 'predicted bmi in 1997' intercept 1 year 0 / cl;
      estimate 'id (73) specific intercept' intercept 1 | intercept 1 / subject 1 0 cl e;
      estimate 'id (496) specific intercept' intercept 1 | intercept 1 / subject 0 0 1 0 cl;
      estimate 'predicted bmi in 2000' intercept 1 year 3 / cl;
      estimate 'id (73) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject 1 0 cl;
      estimate 'id (496) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject 0 0 1 0 cl;
      estimate 'predicted bmi in 2003' intercept 1 year 6 / cl;
      estimate 'id (73) specific 2003 bmi' intercept 1 year 6 | intercept 1 year 6 / subject 1 0 cl;
      estimate 'id (496) specific 2003 bmi' intercept 1 year 6 | intercept 1 year 6 / subject 0 0 1 0 cl;
    run;

Best Linear Unbiased Prediction
- Look more closely at the ESTIMATE statements:

    estimate 'popn avg slope' year 1 / cl;
    estimate 'id (73) specific slope' year 1 | year 1 / subject 1 0 cl e;
    estimate 'id (496) specific slope' year 1 | year 1 / subject 0 0 1 0 cl;
    estimate 'predicted bmi in 2000' intercept 1 year 3 / cl;
    estimate 'id (73) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject 1 0 cl;
    estimate 'id (496) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject 0 0 1 0 cl;

- Coefficients to the right of the vertical bar ( | ) apply to the random effects; this is a new idea
- BLUP: estimation (prediction) of random effects

Selected Estimates from the Random-Coefficient BMI Model
Label                          Estimate   Standard Error   DF      Lower     Upper
popn avg slope                  0.6870    0.1214           31.57    0.4396    0.9344
id (73) specific slope          0.5262    0.3833           18.35   -0.2779    1.3303
id (496) specific slope         0.6146    0.3833           18.35   -0.1895    1.4187
popn avg intercept             21.3577    0.6459           31.5    20.0413   22.6742
predicted bmi in 1997          21.3577    0.6459           31.5    20.0413   22.6742
id (73) specific intercept     23.4601    1.4916           33.36   20.4266   26.4935
id (496) specific intercept    19.5440    1.4916           33.36   16.5105   22.5775
predicted bmi in 2000          23.4186    0.7330           31.99   21.9255   24.9117
id (73) specific 2000 bmi      25.0387    0.9928            9.56   22.8127   27.2646
id (496) specific 2000 bmi     21.3878    0.9928            9.56   19.1618   23.6138
predicted bmi in 2003          25.4795    0.9605           31.84   23.5226   27.4365
id (73) specific 2003 bmi      26.6173    1.5462           20.15   23.3936   29.8410
id (496) specific 2003 bmi     23.2316    1.5462           20.15   20.0079   26.4553
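As a rough check on what the vertical bar does: each subject-specific estimate above is the fixed-effect solution plus that subject's BLUP from the random-effects solution. The short Python sketch below rebuilds several of those ESTIMATE results from the printed, rounded solutions, so it agrees only to about the third decimal. Python is used here only for illustration; all fitting in this course is done in SAS. As the ESTIMATE statements imply, the sketch assumes year is coded 0 in 1997.

```python
# Fixed-effect (population-average) solutions from the BMI random-coefficient fit.
BETA0, BETA1 = 21.3577, 0.6870

# Random-effect solutions (BLUPs): (intercept deviation, slope deviation) per subject.
BLUPS = {
    73: (2.1023, -0.1608),
    496: (-1.8137, -0.07237),
}


def predict_bmi(year, subject=None):
    """Predicted BMI at `year` (0 = 1997).

    subject=None gives the population-average estimate (K'beta only);
    otherwise the subject's BLUPs are added (K'beta + M'u), as with
    coefficients to the right of the vertical bar in ESTIMATE.
    """
    b0, b1 = BLUPS[subject] if subject is not None else (0.0, 0.0)
    return (BETA0 + b0) + (BETA1 + b1) * year


for label, subject in [("population", None), ("id 73", 73), ("id 496", 496)]:
    preds = [round(predict_bmi(year, subject), 3) for year in (0, 3, 6)]
    print(label, preds)
```

For example, the slope for id 73 is 0.6870 - 0.1608 = 0.5262, which matches 'id (73) specific slope'. The standard errors differ because the BLUP part adds prediction error.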
Inference Space Example II: Workers and Machines
- From McLean, Sanders & Stroup (1991, American Statistician)
- Also Chapter 6, Example 2, SAS for Mixed Models
- 2 machines
- 3 operators (a sample from a population)
- Inference can apply to the population of workers or to a specific worker
- KEY CONCEPT: inference space

Worker-Machine Example: Fixed Effect Inference

    proc glimmix;
      class machine operator;
      model y=machine / ddfm=kr;
      random operator machine*operator;
      lsmeans machine / diff;
      estimate 'BLUE - machine 1' intercept 1 machine 1 0;
      estimate 'BLUE - diff' machine 1 -1;
    run;

Type III Tests of Fixed Effects (F based on MS(machine) / MS(machine*operator))
Effect    Num DF   Den DF   F Value   Pr > F
machine   1        2        20.26     0.0460

machine Least Squares Means
machine   Estimate   Std Error   DF      t Value   Pr > |t|
1         50.9483    0.2467      2.973   206.50    <.0001
2         51.9567    0.2467      2.973   210.59    <.0001

Differences of machine Least Squares Means
machine   _machine   Estimate   Std Error   DF   t Value   Pr > |t|
1         2          -1.0083    0.2240      2    -4.50     0.0460

- These ESTIMATE statements give the same results as LSMEANS

Worker-Machine Example: Prediction
These statements apply inference to specific workers or worker-machine combinations:
- machine 1 averaged over ONLY THE WORKERS IN THE STUDY
- difference between machines for the workers in the study ONLY
- operator 1 averaged over machines; operator 1 with machine 1 only; operator-specific difference between machines

    estimate 'BLUP - m1 narrow' intercept 3 machine 3 0
             | operator 1 1 1 machine*operator 1 1 1 0 0 0 / divisor=3;
    estimate 'BLUP - diff nrw' machine 3 -3
             | machine*operator 1 1 1 -1 -1 -1 / divisor=3;
    estimate 'BLUP - oper 1' intercept 2 machine 1 1
             | operator 2 0 0 machine*operator 1 0 0 1 0 0 / divisor=2;
    estimate 'BLUP - m1 op1' intercept 1 machine 1 0
             | operator 1 0 0 machine*operator 1 0 0 0 0 0;
    estimate 'BLUP - diff op1' machine 1 -1
             | machine*operator 1 0 0 -1 0 0;
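The 'BLUP - oper 1' prediction can be reproduced by hand as a shrinkage of operator 1's raw mean toward the grand mean. This Python sketch is illustrative only. It uses the variance components and means that GLIMMIX and PROC GLM report for this example.

The sketch assumes a balanced layout with 2 observations per machine-operator cell. That layout is inferred, not stated. The narrow machine-1 standard error, 0.0899 ≈ sqrt(0.04852/6), implies 6 observations per machine.

'BLUP - oper 1' also carries the machine*operator effects averaged over machines, so its covariance term includes half the interaction variance. Setting include_interaction=False gives the shrinkage factor for the operator effect alone.

```python
# Variance components reported by GLIMMIX for the worker-machine data.
VAR_OPER, VAR_MO, VAR_RES = 0.1073, 0.05100, 0.04852
N_MACHINES = 2
N_REPS = 2  # assumption: 2 observations per machine-operator cell (inferred, see text)

GRAND_MEAN = (50.9483 + 51.9567) / 2  # average of the two machine LS-means
OPER1_MEAN = 51.7625                  # PROC GLM LS-mean for operator 1 (raw mean)


def operator_prediction(oper_mean, include_interaction=True):
    """Return (BLUP-based prediction, shrinkage factor) for one operator, balanced data."""
    # Variance of an operator's raw mean over N_MACHINES * N_REPS observations.
    var_mean = VAR_OPER + VAR_MO / N_MACHINES + VAR_RES / (N_MACHINES * N_REPS)
    # Covariance of that raw mean with the random part being predicted:
    # o_j alone, or o_j plus the machine*operator effects averaged over machines.
    cov = VAR_OPER + (VAR_MO / N_MACHINES if include_interaction else 0.0)
    shrink = cov / var_mean
    return GRAND_MEAN + shrink * (oper_mean - GRAND_MEAN), shrink


pred, k = operator_prediction(OPER1_MEAN)
print(round(pred, 4), round(k, 3))
_, k_oper = operator_prediction(OPER1_MEAN, include_interaction=False)
print(round(k_oper, 3))
```

The first prediction reproduces the 51.7366 reported for 'BLUP - oper 1'. The raw GLM mean of 51.7625 is pulled part of the way toward 51.4525. The smaller the operator variance relative to the noise in an operator mean, the stronger the pull.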
Worker-Machine Example: Prediction (2)

Estimates
Label              Estimate   Standard Error   DF      t Value   Pr > |t|
BLUE - machine 1   50.9483    0.2467           2.973   206.50    <.0001
BLUE - diff        -1.0083    0.2240           2       -4.50     0.0460
BLUP - m1 narrow   50.9483    0.08993          6       566.53    <.0001
BLUP - diff nrw    -1.0083    0.1272           6       -7.93     0.0002
BLUP - oper 1      51.7366    0.1151           6.698   449.30    <.0001
BLUP - m1 op1      51.2979    0.1724           7.885   297.48    <.0001
BLUP - diff op1    -0.8773    0.2567           7.976   -3.42     0.0092

- BLUE: inference to the population of workers
- BLUP: inference to a specific worker or set of workers
- Note the impact on the standard error

BLUP a.k.a. "Shrinkage Estimator"

Covariance Parameter Estimates
Cov Parm           Estimate
operator           0.1073
machine*operator   0.05100
Residual           0.04852

e.g., the operator BLUP is
  $E(o_j \mid y) = \frac{Cov(o_j, \bar y_{\cdot j\cdot})}{Var(\bar y_{\cdot j\cdot})}\left(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}\right)$
- BLUP is regressed toward the mean
- BLUP is E(u|Y)
- The degree of shrinkage depends on the variance component estimates

Relationship to PROC GLM

    proc glm;
      class machine operator;
      model y=machine|operator;
      random operator machine*operator / test;
      lsmeans machine operator machine*operator / stderr;
      lsmeans machine / stderr e=machine*operator;
      estimate 'diff' machine 1 -1 / e;
    run;

operator   y LSMEAN     Standard Error
1          51.7625000   0.1101420        vs. GLIMMIX BLUP 51.74, SE 0.1151

machine   operator   y LSMEAN     Standard Error
1         1          51.3550000   0.1557642    vs. GLIMMIX BLUP 51.30, SE 0.1724

machine   y LSMEAN     Standard Error
1         50.9483333   0.0899305    (default error term)
1         50.9483333   0.1583947    (e=machine*operator)

- The e=machine*operator standard error (0.1584) is neither the mixed-model broad (0.2467) nor the narrow (0.0899) standard error
- The default standard error (0.0899) is the same as the BLUP specific to the workers in the study in GLIMMIX ('BLUP - m1 narrow')

Prediction Example II: Multi-Location Data
- From SAS for Mixed Models
- 9 locations
- 3 blocks per location
- 4 treatments
- Major issues:
  - are locations fixed or random?
  - if random, how does one estimate location-specific treatment effects?

ANOVA (ignoring blocks)

If locations are random:
Source      d.f.   Expected Mean Square
Treatment   3      $\sigma^2 + k_1\sigma^2_{LT} + Q(TRT)$
Location    8      $\sigma^2 + k_1\sigma^2_{LT} + k_2\sigma^2_L$
Loc x Trt   24     $\sigma^2 + k_1\sigma^2_{LT}$
error       dfe    $\sigma^2$
- The test of TRT is affected: MS(Trt) must be tested against MS(Loc x Trt)

If locations are fixed:
Source      d.f.   Expected Mean Square
Treatment   3      $\sigma^2 + Q(TRT)$
Location    8      $\sigma^2 + Q(LOC)$
Loc x Trt   24     $\sigma^2 + Q(LT)$
error       dfe    $\sigma^2$

Inference Space
Assuming locations are fixed:
  $Var(\text{trt mean}) = \frac{\sigma^2}{\#\text{obs/trt}}$,  $\text{Std. error(trt mean)} = \sqrt{\frac{MS(\text{error})}{\#\text{obs/trt}}}$
HOWEVER... if locations are random:
  $Var(\text{trt mean}) = \frac{\sigma^2 + k(\sigma^2_L + \sigma^2_{LT})}{\#\text{obs/trt}}$,  $\text{Std. error(trt mean)} = \sqrt{\frac{\hat\sigma^2 + k(\hat\sigma^2_L + \hat\sigma^2_{LT})}{\#\text{obs/trt}}}$
  where k is the number of observations per treatment per location

Where Does Uncertainty Arise?
[Figure: schematic of locations Loc 1, Loc 2, ..., Loc 7, Loc 8]
- Only from variation among observations within locations? (locations fixed)
- Or does variation among locations also contribute? (locations random)

Location-Specific Effects: BLUP
In a multi-location trial, the location-specific effect is, e.g.,
trt 1 vs trt 2 | location j $= (\tau_1 - \tau_2) + \left[(L\tau)_{1j} - (L\tau)_{2j}\right]$
- This implies a linear combination of fixed and random effects (a predictable function, estimated by BLUP)

Basic SAS Programs

For fixed locations:

    proc glimmix data=MultiCenter;
      class location block treatment;
      model response=location treatment location*treatment;
      random block(location);
      lsmeans treatment;
      lsmeans location*treatment / slice=location slicediff=location;
    run;

For random locations:

    proc glimmix data=MultiCenter;
      class location block treatment;
      model response=treatment / ddfm=KR;
      random location block(location) location*treatment;
      lsmeans treatment / diff;
      estimate 'trt1 vs trt2' treatment 1 -1 0;
      estimate 'loc A vs loc B' | location 1 -1 0;
      estimate 'trt 1 BLUP' intercept 8 treatment 8 | location 1 1 1 1 1 1 1 1 / divisor=8;
      estimate 'trt1 at loc A blup' intercept 1 treatment 1 0 0 0 | location 1 0 location*treatment 1 0;
      /* etc. */
    run;

(see ch6 MultiCenter.sas for the program in detail)

"Take Home" Points
- Inference space usually implies random locations
- "Broad" inference on treatments applies to the entire population
- Location-specific inference may be of interest
  - requires BLUP
  - Hans Peter Piepho has proposed mixed-model-based measures of commonality among locations
- Making locations fixed to maximize error d.f. for testing TRT is inappropriate

GLM Issues

VII. "GLM" Issues
- Bernoulli data
  - as a binomial
  - special problems with BINARY data
- Counts
- Rates

Common Non-Normal Models
- Bernoulli (binary) observations
- Categorical data
  - binomial
  - multinomial
  - contingency tables
- Counts
  - Poisson
  - overdispersed (e.g.
    negative binomial)
- Rates
- Survival times
  - Gamma, Weibull
- Dispersion measures
  - variance

Elements of GLM (Generalized Linear Model)
- Systematic model $X\beta$
- Assumed distribution
  - implied variance structure
- Link function
- Examples:
  - $y \sim \text{Bernoulli}(p)$: $p = \Phi(X\beta)$ or $\text{logit}(p) = X\beta$
  - $y \sim \text{Poisson}(\lambda)$: $\log(\lambda) = X\beta$

GLM Example
- From SAS for Linear Models, Output 10.1, re-expressed in Output 10.5
- Challenger space shuttle data
- Relate Prob{failure} to temperature at launch
- DATA: TEMP, TD (number of times thermal distress occurred in an O-ring), NO_TD

Approach to Modeling
- Assess the relationship between TEMP and Prob{TD=1}, i.e. the probability that the O-rings show thermal distress
- Distribution: Bernoulli
- Natural parameter: logit = log[p/(1-p)]
- Model: logit(Pr{TD}) = a + b(Temp)
- Inverse link form: Pr{TD} = exp[a + b(Temp)] / {1 + exp[a + b(Temp)]}

SAS Program: PROC GLIMMIX

    proc glimmix data=Challenger;
      model td/total=temp / solution;
      estimate 'logit at 50 deg'   intercept 1 temp 50   / ilink;
      estimate 'logit at 60 deg'   intercept 1 temp 60   / ilink;
      estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
      estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
      estimate 'logit at 70 deg'   intercept 1 temp 70   / ilink;
      estimate 'logit at 80 deg'   intercept 1 temp 80   / ilink;
    run;

Relevant Output

Fit Statistics
Pearson Chi-Square        11.13
Pearson Chi-Square / DF    0.80   (no evidence of overdispersion)

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   15.0429    7.3786           14    2.04     0.0608
temp        -0.2322    0.1082           14   -2.14     0.0500

Fitted model: $\text{logit}(\hat\pi) = 15.04 - 0.23X$, where $\pi = \Pr\{TD = 1\}$ and $X$ = temp (F)

Relevant Output (2)

Estimates
Label   Estimate   Standard Error   DF   t Value   Pr > |t|   Mean   Standard Error Mean
logit at 50 deg      3.4348    2.0232   14    1.70   0.1117   0.9688    0.06121
logit at 60 deg      1.1131    1.0259   14    1.09   0.2962   0.7527    0.1909
logit at 64.7 deg    0.02197   0.6576   14    0.03   0.9738   0.5055    0.1644
logit at 64.8 deg   -0.00125   0.6518   14   -0.00   0.9985   0.4997    0.1630
logit at 70 deg     -1.2085    0.5953   14   -2.03   0.0618   0.2300    0.1054
logit at 80 deg     -3.5301    1.4140   14   -2.50   0.0256   0.02847   0.03911

- Estimate through Pr > |t| are on the logit scale; Mean and Standard Error Mean are on the data scale

Alternatives
- Express the data in binomial form
  - SAS for Linear Models, 4th ed., Output 10.5
- Probit link: $\Phi(\eta) = \int_{-\infty}^{\eta} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz$ (standard normal c.d.f.)
  - link function is $\Phi^{-1}(\pi) = X\beta$
  - inverse link is $\pi = \Phi(X\beta)$

Logit vs Probit
[Figure: inverse-link curves; red = probit, blue = logit]

Probit Model
(ESTIMATE labels are carried over from the logit program; the linear predictor is now on the probit scale.)

    proc glimmix data=Challenger;
      model td/total=temp / link=probit solution;
      estimate 'logit at 50 deg'   intercept 1 temp 50   / ilink;
      estimate 'logit at 60 deg'   intercept 1 temp 60   / ilink;
      estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
      estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
      estimate 'logit at 70 deg'   intercept 1 temp 70   / ilink;
      estimate 'logit at 80 deg'   intercept 1 temp 80   / ilink;
    run;

Probit Output

Fit Statistics
Pearson Chi-Square        10.98
Pearson Chi-Square / DF    0.78

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    8.7750    4.0286           14    2.18     0.0470
temp        -0.1351    0.05839          14   -2.31     0.0364

Estimates
Label               Estimate   Standard Error   DF   t Value   Pr > |t|   Mean      Standard Error Mean
logit at 50 deg      2.0201    1.1413           14    1.77     0.0985     0.9783    0.05917
logit at 60 deg      0.6692    0.6024           14    1.11     0.2854     0.7483    0.1921
logit at 64.7 deg    0.03421   0.3960           14    0.09     0.9324     0.5136    0.1579
logit at 64.8 deg    0.02070   0.3925           14    0.05     0.9587     0.5083    0.1566
logit at 70 deg     -0.6818    0.3244           14   -2.10     0.0541     0.2477    0.1026
logit at 80 deg     -2.0328    0.7277           14   -2.79     0.0144     0.02104   0.03678

Option 3: Use Binary Data

    proc glimmix data=O_Ring;
      /* Careful!! Without DIST=, GLIMMIX assumes a normal distribution: */
      /*   model td_bin=temp / solution;                                 */
      model td_bin=temp / dist=binomial link=logit solution;
      estimate 'logit at 50 deg'   intercept 1 temp 50   / ilink;
      estimate 'logit at 60 deg'   intercept 1 temp 60   / ilink;
      estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
      estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
      estimate 'logit at 70 deg'   intercept 1 temp 70   / ilink;
      estimate 'logit at 80 deg'   intercept 1 temp 80   / ilink;
    run;

Binary Output

Fit Statistics
Pearson Chi-Square        23.17
Pearson Chi-Square / DF    1.10   (no evidence of overdispersion)

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   15.0429    7.3786           21    2.04     0.0543
temp        -0.2322    0.1082           21   -2.14     0.0438

Estimates
Label               Estimate   Standard Error   DF   t Value   Pr > |t|   Mean      Standard Error Mean
logit at 50 deg      3.4348    2.0232           21    1.70     0.1043     0.9688    0.06121
logit at 60 deg      1.1131    1.0259           21    1.09     0.2902     0.7527    0.1909
logit at 64.7 deg    0.02197   0.6576           21    0.03     0.9737     0.5055    0.1644
logit at 64.8 deg   -0.00124   0.6518           21   -0.00     0.9985     0.4997    0.1630
logit at 70 deg     -1.2085    0.5953           21   -2.03     0.0552     0.2300    0.1054
logit at 80 deg     -3.5301    1.4140           21   -2.50     0.0209     0.02847   0.03911

Binary Data + Random Effects
- Binary data in a GLM with random effects can be troublesome
- Pseudo-likelihood tends to produce biased variance/covariance component estimates, e.g.
  variance estimates biased downward for small cluster sizes
- Larger sample sizes tend to be required
- No overdispersion estimate

Binary GLMM Example
- Courtesy of Oliver Schabenberger
- 200 subjects
- Random intercept
- Logistic link

    data binary;
      do subject = 1 to 200;
        ranint = rannor(&seed);
        do i = 1 to &n;
          linp = &b0 + ranint;
          pi = 1/(1 + exp(-linp));
          y = ranbin(0,1,pi);
          output;
        end;
      end;
      drop i;
    run;

Binary GLMM
- Schabenberger used two programs:

    proc glimmix data=binary;
      class subject;
      model y(event='1') = / dist=binary link=logit s;
      random intercept / subject=subject;
      ods select ParameterEstimates CovParms;
    run;

    proc nlmixed data=binary;
      parms s2 1 intercept -1;
      model y ~ binary(1/(1+exp(-(intercept+gamma))));
      random gamma ~ normal(0,s2) subject=subject;
      ods select Dimensions ParameterEstimates;
    run;

GLIMMIX vs NLMIXED Binary Results
                                cluster size n=4        cluster size n=20
                                Estimate   Std Error    Estimate   Std Error   DF
GLIMMIX
  Intercept variance (subject)   0.5251    0.1699        0.9905    0.1373
  Intercept (fixed effect)      -0.7159    0.09211      -0.9239    0.08020     199
NLMIXED
  s2                             0.8159    0.2718        1.1512    0.1659      199
  intercept                     -0.8092    0.1085       -0.9854    0.08691     199

Diagnostics & Alternative Models
- Example using count data: SAS for Linear Models, Output 10.24
- Historically, count data were assumed to be Poisson
- This implies mean = variance
- In practice, often variance > mean: overdispersion
- Requires modification:
  - scale to correct
    std errors and test statistics for overdispersion
  - use a different distribution

Basic Analysis + Model Checking

    proc glimmix data=a;
      class BLOCK CTL_TRT a b;
      model count=CTL_TRT a b a*b / dist=poisson;
      random intercept / subject=BLOCK;
      output out=check pred=xbeta pred(ilink)=pred residual=r pearson=resid_pearson;
    run;

    data plot;
      set check;
      adjlamda=2*sqrt(pred);          /* standardized predicted scale, 2*sqrt(mu) */
      ystar=xbeta+(count-pred)/pred;  /* linearized (working) response for the log link */
      absres=abs(resid_pearson);
    run;

    proc gplot data=plot;
      plot resid_pearson*(pred xbeta);
      plot resid_pearson*adjlamda;
      plot ystar*xbeta;
      plot absres*adjlamda;
    run;

Model checking plots:
1. Residuals vs. predicted
   a. use standardized residuals
   b. or deviance residuals
   c. standardize the predicted scale
   Look for unequal scatter (wrong distribution or variance function) and for a pattern in the residuals (wrong model or link)
2. y* vs. $\hat\eta$ (xbeta): should be linear; otherwise, wrong link

Evidence of Overdispersion

Fit Statistics
-2 Res Log Pseudo-Likelihood   124.06
Generalized Chi-Square         100.15
Gener. Chi-Square / DF           3.34

- Gener. Chi-Square / DF should be about 1
  - > 1 indicates overdispersion
  - < 1 indicates underdispersion

Example: Plot of Residuals x adjlamda
[Figure: Pearson residuals plotted against adjlamda]

Another Look: Absolute Value of Residuals vs adjlamda
[Figure: absolute Pearson residuals plotted against adjlamda]

Link?
[Figure: ystar plotted against xbeta]
The plot of ystar x xbeta should be linear; there is no strong evidence of a problem.

Strategy 1: Adjust Using a Scale Parameter
- The Poisson log-likelihood is $y\log(\lambda) - \lambda - \log(y!)$, with $E(y) = Var(y) = \lambda$
- Quasi-likelihood allows a scale parameter $\phi$:
  $Q(\lambda; y) = \int_{y}^{\lambda} \frac{y - t}{\phi\, t}\,dt$, so $q = \frac{y\log(\lambda) - \lambda}{\phi}$ (dropping terms free of $\lambda$)
- Now $E(y) = \lambda$ and $Var(y) = \phi\lambda$

Implementation with GLIMMIX

    proc glimmix data=a;
      class BLOCK CTL_TRT a b;
      model count=CTL_TRT a b a*b / dist=poisson htype=1,3;
      random intercept / subject=BLOCK;
      random _residual_;
    run;

- SCALE is estimated from RANDOM _RESIDUAL_: $\hat\phi = \frac{\text{Generalized } \chi^2}{N - \text{rank}(X)}$
- Alternatively, one can use $\hat\phi = \frac{\text{deviance}}{N - \text{rank}(X)}$

Selected Output

Unscaled
Type I Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   1        27       55.83     <.0001

Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   0        .        .         .
A         2        27       9.19      0.0009
B         2        27       0.06      0.9402
A*B       4        27       3.11      0.0315

Scaled
Type I Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   1        27       16.23     0.0004

Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   0        .        .         .
A         2        27       2.67      0.0875
B         2        27       0.02      0.9822
A*B       4        27       0.90      0.4753

- Note the discrepancy for CTL_TRT and the A main effect

Alternative 2: Different Distribution, e.g. Negative Binomial
Standard math-stat text form:
  $\frac{(N-1)!}{y!\,(N-y-1)!}\,\pi^{y}(1-\pi)^{N-y}$
More useful form: let $N = y + k$ and $\pi = \frac{\lambda}{\lambda + k}$; this yields the p.d.f.
  $\frac{(y+k-1)!}{y!\,(k-1)!}\left(\frac{\lambda}{\lambda+k}\right)^{y}\left(\frac{k}{\lambda+k}\right)^{k}$
  $\log L = y\log\left(\frac{\lambda}{\lambda+k}\right) + k\log\left(\frac{k}{\lambda+k}\right) + \log\left[\frac{(y+k-1)!}{y!\,(k-1)!}\right]$
- The kernel $y\log\left(\frac{\lambda}{\lambda+k}\right) + k\log\left(\frac{k}{\lambda+k}\right)$ has exponential-family form, but is used as a quasi-likelihood
- $E(y) = \lambda$, $Var(y) = \lambda + \frac{\lambda^2}{k}$, natural parameter $= \log\left(\frac{\lambda}{\lambda+k}\right)$
- $\lambda$ is the mean and $k$ is the aggregation parameter: small $k$ means aggregation; $k \to \infty$ gives the Poisson

Negative Binomial with GLIMMIX

    proc glimmix data=a;
      class BLOCK CTL_TRT a b;
      model count=CTL_TRT a b a*b / dist=negbin htype=1,3;
      random intercept / subject=BLOCK;
    run;

Fit Statistics
-2 Res Log Pseudo-Likelihood   84.48
Generalized Chi-Square         28.32
Gener. Chi-Square / DF          0.94

Type I Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   1        27       10.08     0.0037

Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   0        .        .         .
A         2        27       3.53      0.0436
B         2        27       0.03      0.9753
A*B       4        27       1.02      0.4139

Modeling with Offsets
- There are cases when modeling the count alone is naive
- This occurs when counts are "per unit":
  - number of plants per plot
  - number of patients per county
  - number of students per district
  - number of boating accidents per year per lake
  - number of defects per lot
- An accurate model must take units into account
- Essentially, it is based on log(count/unit)
- Log(count) is the link; log(unit) is the "offset"

Offset Defined
- Idea: the raw count may be an artifact of unit size
- Count/unit is more informative
- An offset:
  - adjusts for size
  - is a regressor whose coefficient is assumed to be 1.0
  - is used especially in conjunction with Poisson models with a log link
  - accounts for heterogeneity in rates resulting from differences in size

Modeling with Offsets
  $y_i \sim \text{Poisson}(\lambda_i)$, $\lambda_i = \text{size}_i \cdot \exp(\eta_i)$
  $\log E(y_i) = \log(\lambda_i) = \eta_i + \log(\text{size}_i) = X\beta + \text{offset}$
  where $\exp(\eta_i)$ is the rate per unit size

Example: Courtesy of Oliver Schabenberger
- Some of
  the data:
- X is the predictor variable
- SIZE is the "unit" to be taken into account

Obs   size   x       count
1     5001   4.597    4
2     7550   4.245   76
3     1744   3.918    2
4     1451   3.273    2
5     5313   4.140   12
6     3687   3.438    4
7     3022   4.763    2
8     8809   4.445    9
9     4436   4.191    3
10    2621   4.835    6

Naive Modeling (not accounting for SIZE)

    proc glimmix data=test;
      model count = x / s dist=poisson;
      ods select FitStatistics ParameterEstimates;
    run;

Fit Statistics
-2 Log Likelihood           647.12
AIC (smaller is better)     651.12
AICC (smaller is better)    651.45
BIC (smaller is better)     654.50
CAIC (smaller is better)    656.50
HQIC (smaller is better)    652.35
Pearson Chi-Square         1078.66
Pearson Chi-Square / DF      28.39

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    2.0978    0.4143           38    5.06     <.0001
x           -0.01619   0.1002           38   -0.16     0.8725

Poisson Model with Offset

    proc glimmix data=test;
      offs = log(size);
      model count = x / s dist=poisson offset=offs;
      ods select FitStatistics ParameterEstimates;
    run;

Fit Statistics
-2 Log Likelihood          318.41
AIC (smaller is better)    322.41
AICC (smaller is better)   322.73
BIC (smaller is better)    325.79
CAIC (smaller is better)   327.79
HQIC (smaller is better)   323.63
Pearson Chi-Square         347.09
Pearson Chi-Square / DF      9.13

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   -7.3168    0.5052           38   -14.48    <.0001
x            0.2247    0.1225           38     1.83    0.0746

Alternative to Offset??
- Could count/size be treated as binomial?
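Before comparing the two fits, it helps to see how the offset enters the fitted model. The predicted count is size × exp(intercept + slope·x), with log(size) carrying a coefficient fixed at 1.0. The Python sketch below is illustrative only. It plugs in the rounded Poisson-with-offset estimates printed above, so it matches GLIMMIX predictions to only about three decimals.

The sketch also hints at why a binomial model for count/size could work here. With rates near 0.002 per unit, logit(p) and log(p) differ only by -log(1 - p), which is roughly p.

```python
import math

# Poisson-with-offset estimates printed above (rounded).
B0, B1 = -7.3168, 0.2247


def offset_predicted_count(size, x):
    """Predicted count: exp(B0 + B1*x + offset) with offset = log(size)."""
    xbeta = B0 + B1 * x            # log rate per unit size (_xbeta_)
    offs = math.log(size)          # offset, coefficient fixed at 1.0
    return math.exp(xbeta + offs)  # same as size * exp(xbeta)


mu1 = offset_predicted_count(5001, 4.597)  # observation 1
rate1 = mu1 / 5001                         # per-unit rate, about 0.0019

# For small rates, logit(p) and log(p) nearly coincide: the gap is -log(1 - p).
gap = math.log(rate1 / (1 - rate1)) - math.log(rate1)
print(round(mu1, 3), round(rate1, 6), round(gap, 6))
```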
    proc glimmix data=test;
      offs = log(size);
      model count = x / s dist=poisson offset=offs;
      output out=gmxout1 pred(ilink)=mu;
      id _xbeta_ offs _linp_;
      ods exclude all;
    run;

    proc glimmix data=test;
      model count/size = x / s dist=binomial;
      output out=gmxout2 pred(ilink)=prob;
      ods exclude all;
    run;

    data gmxout2;
      set gmxout2;
      predcount = prob * size;
    run;

Compare Poisson/Offset vs Binomial Results

Poisson results (mu = predicted count)
Obs   _xbeta_    offs      _linp_    mu
1     -6.28394   8.51739   2.23346    9.3321
2     -6.36302   8.92930   2.56628   13.0173
3     -6.43649   7.46394   1.02745    2.7939
4     -6.58140   7.28001   0.69860    2.0109
5     -6.38661   8.57791   2.19130    8.9468
6     -6.54433   8.21257   1.66823    5.3028
7     -6.24664   8.01367   1.76703    5.8535
8     -6.31809   9.08353   2.76544   15.8860
9     -6.37516   8.39751   2.02235    7.5561
10    -6.23047   7.87131   1.64085    5.1595

Binomial results
Obs   size   x       count   prob         predcount
1     5001   4.597    4      .001866023    9.3320
2     7550   4.245   76      .001724158   13.0174
3     1744   3.918    2      .001602034    2.7939
4     1451   3.273    2      .001385890    2.0109
5     5313   4.140   12      .001683963    8.9469
6     3687   3.438    4      .001438241    5.3028
7     3022   4.763    2      .001936911    5.8533
8     8809   4.445    9      .001803387   15.8860
9     4436   4.191    3      .001703368    7.5561
10    2621   4.835    6      .001968487    5.1594

- Predicted counts are nearly identical

ZIP and Hurdle Models
- Mixture models for count data
  - ZIP = "zero-inflated Poisson"
  - ZINB = "zero-inflated negative binomial"
  - in principle, other zero-inflated models, limited only by imagination
- Accommodate excess zeros
  - excess zeros cause overdispersion
- Are not in the exponential family
- Cannot be fit with PROC GLIMMIX
- Can be fit using PROC NLMIXED

ZIP Model
  $z_i \sim \text{Poisson}(\lambda_i)$
  $\Pr(y_i = j) = \begin{cases} \omega_i + (1-\omega_i)\Pr(z_i = 0) = \omega_i + (1-\omega_i)e^{-\lambda_i}, & j = 0 \\ (1-\omega_i)\Pr(z_i = j) = (1-\omega_i)\dfrac{\lambda_i^{j}e^{-\lambda_i}}{j!}, & j > 0 \end{cases}$
  where $\omega_i$ is the probability of a 0 from the Bernoulli process and $e^{-\lambda_i}$ is the probability of a zero from the Poisson process

Hurdle Model
- Two-part model
  - one process generates zeros
  - another process generates non-zeros
  $\Pr(y_i = j) = \begin{cases} \Pr(z_i = 0), & j = 0 \\ \left[1 - \Pr(z_i = 0)\right]\dfrac{\Pr(u_i = j)}{1 - \Pr(u_i = 0)}, & j > 0 \end{cases}$
  Zeros come from the Z process; non-zero observations come from the U distribution truncated at zero

ZIP or Hurdle?
- Number of doctor visits per year
- Number of fish caught by sport fishermen
- Cancer mortality

From SAS for Mixed Models, 2nd ed., Ch 15 (credit: Oliver Schabenberger)

    %let pi = 0.27;
    data zip;
      do s = 1 to 100;
        u = rannor(556712);
        do i = 1 to 20;
          x = int(ranuni(0)*100);
          y = int(rannor(0)*100);
          if (ranuni(0) < &pi) then do;
            count = 0;
            lambda = .;
          end;
          else do;
            lambda = exp(-2 + 0.01*x + 0.01*y + u);
            count = ranpoi(0,lambda);
          end;
          output;
        end;
      end;
      drop i u lambda;
    run;

ZIP Model with Random Effects

    proc nlmixed data=zip;
      parameters b0=0 b1=0 b2=0 a0=0 s2u=1;
      /* linear predictor for the inflation probability */
      linpinfl = a0;
      /* infprob = inflation probability for zeros      */
      /*         = logistic transform of the linear predictor */
      infprob = 1/(1+exp(-linpinfl));
      /* Poisson mean */
      lambda = exp(b0 + b1*x + b2*y + u);
      /* Build the ZIP log likelihood */
      if count=0 then
        ll = log(infprob + (1-infprob)*exp(-lambda));
      else
        ll = log((1-infprob)) + count*log(lambda) - lgamma(count+1) - lambda;
      model count ~ general(ll);
      random u ~ normal(0,s2u) subject=s;
      estimate "inflation probability" infprob;
    run;

ZIP NLMIXED Selected Results
True parameter values: b0 = -2, b1 = b2 = 0.01, a0 = -0.9946, s2u = 1

Fit Statistics
-2 Log Likelihood          2803.6
AIC (smaller is better)    2813.6
AICC (smaller is better)   2813.7
BIC (smaller is better)    2826.7
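Two pieces of arithmetic connect the simulation to the NLMIXED program:
- The true a0 = -0.9946 is just logit(0.27), where 0.27 is the inflation probability &pi used in the DATA step.
- infprob in the program is the inverse logit of a0.

The Python sketch below is illustrative only. It also codes the ZIP probabilities defined earlier and checks that they sum to 1 with mean (1 - omega)·lambda. In the sketch, omega plays the role of infprob, and lambda = 2.5 is an arbitrary value for the check.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))


def zip_pmf(j, lam, omega):
    """Zero-inflated Poisson Pr(y = j); omega is the inflation probability (infprob)."""
    poisson = lam ** j * math.exp(-lam) / math.factorial(j)
    if j == 0:
        return omega + (1 - omega) * poisson  # Bernoulli zero or Poisson zero
    return (1 - omega) * poisson


true_a0 = logit(0.27)                # DATA step used %let pi = 0.27
fitted_infprob = inv_logit(-1.0934)  # a0 estimate from NLMIXED

# Sanity checks on the pmf for lambda = 2.5 and omega = 0.27.
probs = [zip_pmf(j, 2.5, 0.27) for j in range(60)]
total = sum(probs)
mean = sum(j * p for j, p in enumerate(probs))  # should equal (1 - omega) * lambda
print(round(true_a0, 4), round(fitted_infprob, 4), round(total, 6), round(mean, 4))
```

The fitted inflation probability from a0 = -1.0934 agrees with the 0.2510 that NLMIXED reports under Additional Estimates.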
Parameter Estimates
Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower      Upper      Gradient
b0          -1.9979    0.1530           99   -13.06    <.0001     0.05    -2.3014    -1.6944    -0.00224
b1           0.01011   0.001299         99     7.78    <.0001     0.05     0.007535   0.01269   -0.15649
b2           0.01016   0.000394         99    25.78    <.0001     0.05     0.009378   0.01094   -0.0434
a0          -1.0934    0.1594           99    -6.86    <.0001     0.05    -1.4097    -0.7771    -0.00034
s2u          1.0828    0.2095           99     5.17    <.0001     0.05     0.6671     1.4985    -0.00145

Additional Estimates
Label                   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower    Upper
inflation probability   0.2510     0.02997          99   8.38      <.0001     0.05    0.1915   0.3104

GLMM Multi-Clinic Binomial Data
- SAS for Linear Models, Output 10.9
- Also SAS for Mixed Models, Ch 14
- From Beitler & Landis, Biometrics, 1985
- 2 treatments (drug, cntl)
- 8 clinics, representing a population
- $n_{ij}$ patients observed on trt i at clinic j
- $y_{ij}$ have a favorable response

GLMM for Beitler-Landis Data
  $\pi_{ij} = \Pr\{\text{favorable} \mid \text{trt} = i, \text{clinic} = j\}$
  Model: $\log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \mu + \tau_i + c_j + (c\tau)_{ij}$
  $c_j \text{ iid } N(0, \sigma^2_C)$; $(c\tau)_{ij} \text{ iid } N(0, \sigma^2_{CT})$

    proc glimmix data=a;
      class clinic trt;
      model fav/nij= trt / dist=binomial link=logit;
      random intercept trt / subject=clinic;
      lsmeans trt / odds;
      estimate 'lsm - cntl' intercept 1 trt 1 0 / ilink;
      estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
      estimate 'diff' trt 1 -1;
      contrast 'diff' trt 1 -1;
    run;

Covariance Parameter Estimates
Cov Parm    Subject   Estimate
Intercept   clinic    2.0103
trt         clinic    0.06057

If You Drop Clinic x Trt

    proc glimmix data=a;
      class clinic trt;
      model fav/nij= trt / dist=binomial link=logit;
      random intercept / subject=clinic;
      lsmeans trt / odds;
      estimate 'lsm - cntl' intercept 1 trt 1 0 / ilink;
      estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
      estimate 'diff' trt 1 -1;
      contrast 'diff' trt 1 -1;
    run;

    proc glimmix data=a;
      class clinic trt;
      model
        fav/nij= trt / dist=binomial link=logit;
      random _residual_ / type=cs subject=clinic;
      lsmeans trt / odds;
      estimate 'lsm - cntl' intercept 1 trt 1 0 / ilink;
      estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
      estimate 'diff' trt 1 -1;
      contrast 'diff' trt 1 -1;
    run;

- The first program fits the conditional (subject-specific, SS) model; the second fits the marginal (population-averaged, PA) model

Selected Output: Conditional Model

Covariance Parameter Estimates
Cov Parm   Estimate   Standard Error
clinic     2.0327     1.2637

Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
trt      1        7        5.98      0.0444

Estimates
Label        Estimate   Standard Error   DF   t Value   Pr > |t|   Mean     Standard Error Mean
lsm - cntl   -1.1464    0.5586           7    -2.05     0.0793     0.2411   0.1022
lsm - drug   -0.4220    0.5552           7    -0.76     0.4720     0.3960   0.1328
diff         -0.7244    0.2963           7    -2.45     0.0444

trt Least Squares Means
trt    Estimate   Standard Error   DF   t Value   Pr > |t|   Odds
cntl   -1.1464    0.5586           7    -2.05     0.0793     0.3178
drug   -0.4220    0.5552           7    -0.76     0.4720     0.6557

GLMM with NLMIXED
1. Use a DATA step to define an indicator for Trt = drug (because NLMIXED lacks a CLASS statement):

    data a;
      input clinic trt $ fav unfav;
      nij=fav+unfav;
      t1=(trt='drug');

2. Then run NLMIXED:

    proc nlmixed;
      parms mu=1 tau=0 s2c=2;
      eta=mu+tau*t1+cj;
      pij=exp(eta)/(1+exp(eta));
      model fav~binomial(nij,pij);
      random cj~normal(0,s2c) subject=clinic;
      estimate 'trt effect' tau;
      estimate 'ctl p_hat' exp(mu)/(1+exp(mu));
      estimate 'drug p_hat' exp(mu+tau)/(1+exp(mu+tau));
      estimate 'diff on p_hat scale' exp(mu+tau)/(1+exp(mu+tau)) - exp(mu)/(1+exp(mu));
    run;

NLMIXED with CxT Term Included
First, also define a Trt = 2 indicator, here denoted t2.

    proc nlmixed;
      parms mu=1 tau=0 s2c=2 s2ct=0.08;
      eta=mu+tau*t1+cj+c1j*t1+c2j*t2;
      pij=exp(eta)/(1+exp(eta));
      model fav~binomial(nij,pij);
      random cj c1j c2j~normal([0,0,0],[s2c,0,s2ct,0,0,s2ct]) subject=clinic;
      estimate 'trt effect' tau;
      estimate 'ctl p_hat' exp(mu)/(1+exp(mu));
      estimate 'drug p_hat' exp(mu+tau)/(1+exp(mu+tau));
      estimate 'diff on p_hat scale' exp(mu+tau)/(1+exp(mu+tau)) - exp(mu)/(1+exp(mu));
    run;

Binary Repeated Measures
- 2 treatments
- 20 subjects (animals) per trt
- 5 times of measurement
- Response at each measurement is 0/1
- Suggested by companion animal vaccine trials

Several Approaches
- GEE using GENMOD
- PQL using %GLIMMIX
  - random subj(trt), or
  - CS
- Gauss-Hermite quadrature using NLMIXED
- (not shown) but you could use MIXED
- Type I error control of PQL + random subj(trt) is not acceptable
- Power of PQL/CS or NLMIXED > GEE

Various SAS Programs for Binary Repeated-Measures Data

GEE:

    proc genmod;
      class trt animal day;
      model y=trt|day / dist=bin type1 type3;
      repeated subject=animal(trt) / type=exch;
    run;

PQL, random animal(trt):

    proc glimmix;
      class trt animal day;
      model y=trt|day / dist=binomial link=logit;
      random animal(trt);
    run;

PQL, CS:

    proc glimmix;
      class trt animal day;
      model y=trt|day / dist=binomial link=logit;
      random day / rside type=cs subject=animal(trt);
    run;

NLMIXED: see the next page
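The ESTIMATE expressions in the NLMIXED programs for the Beitler-Landis data apply the inverse logit to the fixed-effect parameters:
- 'ctl p_hat' is exp(mu)/(1+exp(mu)).
- 'diff on p_hat scale' is the drug probability minus the control probability.

NLMIXED results are not listed here. This Python sketch (illustrative only) therefore plugs in the conditional-model GLIMMIX estimates instead, and reproduces the Mean and Odds columns of that output. With the random clinic effect set to 0, these are probabilities for a typical clinic, not population-averaged probabilities.

```python
import math


def inv_logit(eta):
    """Same form as exp(mu)/(1+exp(mu)) in the NLMIXED ESTIMATE statements."""
    return math.exp(eta) / (1 + math.exp(eta))


# Conditional-model GLIMMIX estimates on the logit scale (Beitler-Landis data).
LSM_CNTL, LSM_DRUG = -1.1464, -0.4220

p_cntl = inv_logit(LSM_CNTL)  # analogue of 'ctl p_hat'
p_drug = inv_logit(LSM_DRUG)  # analogue of 'drug p_hat'
diff_p = p_drug - p_cntl      # analogue of 'diff on p_hat scale'

odds_cntl, odds_drug = math.exp(LSM_CNTL), math.exp(LSM_DRUG)  # Odds column of LSMEANS
odds_ratio = odds_cntl / odds_drug  # exp(-0.7244), the 'diff' estimate on the odds scale

print(round(p_cntl, 4), round(p_drug, 4), round(diff_p, 4), round(odds_ratio, 4))
```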
NLMIXED

    data nlmx;
      set univar;
      t1=(trt=1); t2=(trt=2);
      d1=(day=1); d2=(day=2); d3=(day=3); d4=(day=4); d5=(day=5);
    run;

    proc nlmixed;
      parms mu=1 a1=1 b1=1 b2=1 b3=1 b4=1
            ab11=1 ab12=1 ab13=1 ab14=1 sb2=1;
      eta=mu+a1*t1+b1*d1+b2*d2+b3*d3+b4*d4
          +ab11*t1*d1+ab12*t1*d2+ab13*t1*d3+ab14*t1*d4;
      pi=exp(eta+bse)/(1+exp(eta+bse));
      model y~binary(pi);
      random bse~normal(0,sb2) subject=id;
      contrast 'trt' a1;
      contrast 'day' b1,b2,b3,b4;
      contrast 'trt x day' ab11,ab12,ab13,ab14;
    run;

Poisson Repeated Measures
• Output 10.39, SAS for Linear Models
• Leppik, et al. (1985); Thall & Vail (1990)
• 2 treatments
• 28 patients on trt=0; 31 on trt=1
• 4 times of measurement
• epilepsy: # seizures in 4 test periods
• baseline & age covariates

Model for Seizure Data
Denote $\lambda_{ij}$ = mean count (# seizures) for trt i, time j. The GL model is
$\log(\lambda_{ij}) = \mu + \alpha_i + \tau_j + (\alpha\tau)_{ij} + \beta_{1i}\,(\text{log\_base}) + \beta_2\,(\text{log\_age})$
Assume a CS working correlation structure among repeated measures, using GEE:

    proc genmod data=seizure;
      class id trt time;
      /* this model first */
      *model y=trt time trt*time log_base trt*log_base log_age / dist=poisson link=log type1 type3;
      /* then this model */
      model y=trt time log_base(trt) log_age / dist=poisson link=log type1 type3;
      repeated subject=id / type=exch corrw;
    run;

See SAS file for %GLIMMIX approach.

GENMOD to GLIMMIX Using GEE

    proc genmod data=seizure;
      class id trt time;
      model y=trt time log_base(trt) log_age / dist=poisson link=log type1 type3;
      repeated subject=id / type=exch corrw;
    run;

Equivalent GLIMMIX (marginal, R-side-only model; EMPIRICAL requests the sandwich standard errors that GEE reports):

    proc glimmix data=seizure empirical;
      class id trt time;
      model y=trt time log_base(trt) log_age / dist=poisson link=log;
      random time / type=cs subject=id residual;
    run;

Degrees of Freedom & Standard Errors
• Recall Satterthwaite approximation & Kenward-Roger bias
  adjustment in LMM
• Same issues exist with GLMM
• But not nearly as well researched
• You can use the DDFM=SATTERTH and DDFM=KR options in GLIMMIX with non-normal data & a non-identity link
• But what do they do?

Power

VIII. Power
• Many software packages for power & sample size
  − e.g., SAS PROC POWER
  − for FIXED-effect models only
• What if you have "Mixed Model Issues"?
  − random effects
  − split-plot structure
  − errors potentially correlated: longitudinal or spatial data
  − any other non-standard model structure
• Methods based on PROC GLIMMIX
  − adapted from Stroup (2002, JABES)

Mixed Model Background: G, R Unknown
To test $H_0: K'\beta = 0$, use
$F = \frac{(K'\hat{\beta})'\,[K'\hat{C}K]^{-1}\,(K'\hat{\beta})}{\operatorname{rank}(K)}$
where $\hat{C}$ is the estimate of C using estimated components of G and R.
F ~ approx. $F[\operatorname{rank}(K), \nu, \phi]$, where
• ν may be obvious from the design or may need to be approximated, e.g., Satterthwaite, Kenward-Roger
• $\phi = (K'\beta)'\,[K'CK]^{-1}\,(K'\beta)$

Computing Power Using SAS
• Create a data set like the proposed design (O'Brien: "exemplary data set")
• Run PROC GLIMMIX with covariance components fixed
• $\phi$ = (F computed by GLIMMIX) × rank(K) [or the chi-square statistic with a GLM]
• Use GLIMMIX to compute $\phi$
• Critical F ($F_{crit}$) is the value such that $P\{F(\operatorname{rank}(K), \nu, 0) > F_{crit}\} = \alpha$
• Power = $P\{F[\operatorname{rank}(K), \nu, \phi] > F_{crit}\}$
• SAS functions can compute $F_{crit}$ & power [or chi-square]

Compute Power with GLIMMIX - CRD Example

    /* step 1 - create data set with same structure as proposed design;
       use MU (expected mean) instead of observed Y_ij values */
    /* this example shows power for 5, 10, and 15 e.u.
       per trt */
    data crdpwrx1;
      input trt mu;
      do n=5 to 15 by 5;
        do eu=1 to n;
          output;
        end;
      end;
      cards;
    1 100
    2 94
    3 90
    ;

Compute Power with GLIMMIX - CRD Example (continued)

    /* step 2 - use PROC GLIMMIX to compute non-centrality parameters
       for ANOVA tests & contrasts; ODS statements output them to new data sets */
    proc sort data=crdpwrx1;
      by n;
    run;
    proc glimmix data=crdpwrx1;
      by n;
      class trt;
      model mu=trt;
      parms (100) / hold=1;
      contrast 'et1 v et2' trt 0 1 -1;
      contrast 'c vs et' trt 2 -1 -1;
      ods output tests3=b;
      ods output contrasts=c;
    run;

Output for n=5:

    Type III Tests of Fixed Effects
    Effect     Num DF   Den DF   F Value   Pr > F
    trt             2       12      1.27   0.3169

    Contrasts
    Label      Num DF   Den DF   F Value   Pr > F
    et1 v et2       1       12      0.40   0.5390
    c vs et         1       12      2.13   0.1698

    /* step 3 - combine ANOVA & contrast n-c parameter data sets;
       use SAS functions PROBF and FINV to compute power */
    data power;
      set b c;
      alpha=0.05;
      ncparm=numdf*fvalue;
      fcrit=finv(1-alpha,numdf,dendf,0);
      power=1-probf(fcrit,numdf,dendf,ncparm);
    run;
    proc print;
    run;

    Obs  n  Effect  NumDF  DenDF  FValue  ProbF   Label      alpha  ncparm   fcrit    power
      1  5  trt         2     12    1.27  0.3169             0.05   2.53333  3.88529  0.22361
      2  5              1     12    0.40  0.5390  et1 v et2  0.05   0.40000  4.74723  0.08980
      3  5              1     12    2.13  0.1698  c vs et    0.05   2.13333  4.74723  0.26978

More Advanced Example
• Plots in 8 x 3 grid
• Main variation along 8 "rows"
• 3 x 2 treatment design
• Alternative designs
  − randomized complete block (4 blocks, size 6)
  − incomplete block (8 blocks, size 3)
  − split plot
• RCBD "easy" but ignores natural variation

Picture the 8 x 3 Grid
[Figure: 8 x 3 grid of plots, with the gradient indicated]

SAS Programs to Compare 8 x 3 Designs: Split-Plot

    data a;
      input bloc trtmnt @@;
      do s_plot=1 to 3;
        input dose @@;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
      cards;
    1 1 1 2 3
    1 2 1 2 3
    2 1 1 2 3
    2 2 1 2 3
    3 1 1 2 3
    3 2 1 2 3
    4 1 1 2 3
    4 2 1 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=bloc trtmnt|dose;
      random trtmnt / subject=bloc;
      parms (4) (6) / hold=1,2;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

8 x 3: Incomplete Block

    data a;
      input bloc @@;
      do eu=1 to 3;
        input trtmnt dose @@;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
      cards;
    1 1 1 1 2 1 3
    2 1 1 1 2 2 2
    3 1 1 1 3 2 3
    4 1 1 2 1 2 2
    5 1 2 1 3 2 2
    6 1 2 2 1 2 3
    7 1 3 2 1 2 3
    8 2 1 2 2 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=trtmnt|dose;
      random intercept / subject=bloc;
      parms (4) (6) / hold=1,2;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

8 x 3 Example: RCBD

    data a;
      input trtmnt dose @@;
      do bloc=1 to 4;
        mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
        output;
      end;
      cards;
    1 1 1 2 1 3 2 1 2 2 2 3
    ;
    proc glimmix data=a noprofile;
      class bloc trtmnt dose;
      model mu=bloc trtmnt|dose;
      parms (10) / hold=1;
      lsmeans trtmnt*dose / diff;
      contrast 'trt x lin' trtmnt*dose 1 0 -1 -1 0 1;
      ods output diffs=b;
      ods output contrasts=c;
    run;

Power for GLMs
• 2 treatments
• P{favorable outcome}: for trt 1, p=0.30; for trt 2, p=0.25
• power if n1=300, n2=600?

    data a;
      input trt y n;
      datalines;
    1 90 300
    2 150 600
    ;
    proc glimmix;
      class trt;
      model y/n=trt / chisq;
      ods output tests3=pwr;
    run;
    data power;
      set pwr;
      alpha=0.05;
      ncparm=chisq;   /* Wald chi-square = numdf x F, so it is already the noncentrality */
      fcrit=cinv(1-alpha,numdf,0);
      power=1-probchi(fcrit,numdf,ncparm);
    run;
    proc print;
    run;

Power for GLMM
• Same trt and sample size per location as before
• 10 locations
• Var(Location)=0.25; Var(Trt*Loc)=0.125
• Variance Components:
  variation in log(OddsRatio)
• Power?

    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
      datalines;
    1 90 300
    2 150 600
    ;
    proc glimmix data=a initglm;
      class trt loc;
      model y/n = trt / oddsratio;
      random intercept trt / subject=loc;
      random _residual_;
      parms (0.25) (0.125) (1) / hold=1,2,3;
      ods output tests3=pwr;
    run;

GLMM Power Analysis Results

    Odds Ratio Estimates
    trt  _trt  Estimate  DF  95% Confidence Limits
    1    2     1.286      9  0.884   1.871

This gives the expected confidence limits for the number of locations & N per location contemplated.

    Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
    1    trt         1      9  0.05   2.29868  5.11736  0.27370

This gives the power of the test of the TRT effect on prob(favorable).

GLMM Power: Impact of Sample Size?
• N of subjects per trt per location?
• N of locations?
Three cases:
1. n=300/600, 10 loc
2. n=600/1200, 10 loc
3. n=300/600, 20 loc

Case 1 (as above):

    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
      datalines;
    1 90 300
    2 150 600
    ;

Case 2:

    data a;
      input trt y n;
      do loc=1 to 10;
        output;
      end;
      datalines;
    1 180 600
    2 300 1200
    ;

Case 3:

    data a;
      input trt y n;
      do loc=1 to 20;
        output;
      end;
      datalines;
    1 90 300
    2 150 600
    ;

GLMM Power: Impact of Sample Size? (results)
Recall that for 10 locations and N=300/600, the CI for the odds ratio was (0.884, 1.871) and power was 0.274.

For 10 locations, N=600/1200 (N alone has almost no impact):

    Odds Ratio Estimates
    trt  _trt  Estimate  DF  95% Confidence Limits
    1    2     1.286      9  0.891   1.855

    Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
    1    trt         1      9  0.05   2.40715  5.11736  0.28421

For 20 locations, N=300/600:

    Odds Ratio Estimates
    trt  _trt  Estimate  DF  95% Confidence Limits
    1    2     1.286     19  1.006   1.643

    Obs  Effect  NumDF  DenDF  alpha  ncparm   fcrit    power
    1    trt         1     19  0.05   4.59736  4.38075  0.53003

Spatial Data

Example 5: Spatial, from SAS for Mixed Models, Sect. 11.7
"Alliance" data from Stroup, Baenziger, and Mulitze (1994), in GLIMMIX-speak:

    data two;
      set alliance;
      obs = _n_;
    run;
    proc glimmix data=two;
      class Entry Rep obs;
      model Yield=Entry / ddfm=kr;
      random intercept / subject=rep;
      random obs / type=sp(sph)(latitude longitude);
      parms (0.1) (43.4) (27.5) (11.5);
      lsmeans entry;

(The step is completed by the additional statements shown below.)

IX. Spatial Data
• Example from SAS for Mixed Models
  − spatial errors in treatment comparison studies only
  − no spatial mapping, kriging
• Standard parametric models from geostatistics
• RSMOOTH alternative
• Issues

From Stroup, Baenziger & Mulitze (Crop Science, 1994)
• 56 varieties, 4 blocks, e.u. = 4.3 × 1.2 m plots
[Figure: field layout by LAT (latitude) and LNG (longitude), plots coded by rep 1-4]

Contour Plot of Response
[Figure: contour plot of response with plots of two entries marked; B = Buckskin, N = NE86503]

Additional GLIMMIX Code to Plot Spatial Variability

      output out=gmxout2 pred=p;
      ods output lsmeans=lsm2;
      id entry latitude longitude _zgamma_;
    run;
    proc means data=gmxout2;
      var _zgamma_;
    run;
    proc print data=gmxout2(obs=20);
    run;
    proc g3d data=gmxout2;
      plot latitude*longitude=_zgamma_ / grid;
    run;

Plot of Spherical Covariance
[Figure: plot of the fitted spherical covariance surface]

Alternative Using RSMOOTH
• Advantage in theory: RSMOOTH does not require a parametric model of spatial variation, which can be unrealistic
  − e.g., in the Alliance data, spatial variation is from winter kill

    proc glimmix data=alliance;
      class Entry Rep;
      model Yield=Entry / ddfm=kr;
      *model Yield=Entry latitude longitude / ddfm=kr;
      random intercept / subject=rep;
      random latitude longitude / type=rsmooth;
    run;

RSMOOTH?
• From penalized spline
  − Ruppert, Wand, and Carroll (2003, Semiparametric Regression, Cambridge)
Prediction: $\hat{y} = B(x)\hat{\theta}$
Objective function: $Q^*(\theta;\lambda) = (y - B(x)\theta)'(y - B(x)\theta) + \lambda^2\,\theta' D\,\theta$

RSMOOTH (2)
• Rewrite the model: $y_i = \beta_0 + \beta_1 x_i + \sum_j \gamma_j (x_i - \kappa_j)_+ + e_i$, where $\kappa_j$ is a "knot," a.k.a. "join point"
• Re-express: $y = X\beta + Z\gamma + e$; then $Q^*(\beta,\gamma;\lambda) = \|y - X\beta - Z\gamma\|^2 + \lambda^2\|\gamma\|^2$

RSMOOTH (3)
• Spline: $(y - X\beta - B\gamma)'(y - X\beta - B\gamma) + \lambda^2\,\gamma' D\,\gamma$
• LMM: $(y - X\beta - Zu)'(y - X\beta - Zu) + \frac{\sigma^2}{\sigma_u^2}\,u'u$
• So with Z = B, D = I, and $\lambda^2 = \sigma^2/\sigma_u^2$, the spline fit is the mixed-model BLUP

RSMOOTH Yields the Following Spatial Plot
[Figure: spatial surface estimated by RSMOOTH]

RSMOOTH vs SP(SPH)

    Type III Tests of Fixed Effects
    Model     Effect  Num DF  Den DF  F Value  Pr > F
    SP(SPH)   Entry       55   138.1     1.85  0.0021
    RSMOOTH   Entry       55   148.2     1.77  0.0038

However...
[Figure: plot of LSMeans from the two approaches]
• LSM_RSMOOTH average: 31.06
• LSM_SP_SPH average: 24.40
• ????
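The penalized-spline criterion on the RSMOOTH slides can be illustrated in one dimension. The Python sketch below is an illustration under simplifying assumptions, not GLIMMIX's implementation: as we understand it, TYPE=RSMOOTH builds a radial basis over the listed coordinates, whereas this sketch uses the truncated-line basis $(x - \kappa_j)_+$ from the slides. It minimizes $\|y - X\beta - Z\gamma\|^2 + \text{lam}\,\|\gamma\|^2$, so `lam` plays the role of $\lambda^2$; in the mixed-model form $\gamma \sim N(0, \sigma_u^2 I)$ the BLUP solves the same equations with lam = $\sigma^2/\sigma_u^2$, which is why the amount of smoothing can be estimated as a variance component. The data, knots, and function names are hypothetical.

```python
def truncated_line_basis(x, knots):
    """Rows [1, x_i, (x_i - k_1)_+, ..., (x_i - k_K)_+]: X columns, then Z columns."""
    return [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]


def solve_linear(A, b):
    """Solve A t = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * t[c] for c in range(r + 1, n))
        t[r] = s / M[r][r]
    return t


def penalized_spline(x, y, knots, lam):
    """Minimize ||y - X b - Z g||^2 + lam * ||g||^2; only knot coefficients g are penalized."""
    C = truncated_line_basis(x, knots)
    p = len(C[0])
    # Normal equations (C'C + lam * D) theta = C'y with D = diag(0, 0, 1, ..., 1).
    A = [[sum(row[i] * row[j] for row in C) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(C, y)) for i in range(p)]
    for j in range(2, p):
        A[j][j] += lam
    theta = solve_linear(A, rhs)
    fitted = [sum(c * t for c, t in zip(row, theta)) for row in C]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return theta, fitted, rss


# Hypothetical 1-D example: a curved trend observed at 21 equally spaced points.
xs = [i / 10 for i in range(21)]
ys = [xi ** 2 for xi in xs]
for lam in (0.0, 1.0, 100.0):
    _, _, rss = penalized_spline(xs, ys, knots=[0.5, 1.0, 1.5], lam=lam)
    print(f"lambda={lam:>6}: residual SS = {rss:.5f}")
```

Increasing `lam` trades fidelity for smoothness: the residual sum of squares never decreases as `lam` grows, and for very large `lam` the fit approaches the straight line $X\beta$.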
Some NLMM Issues
• Consulting problem at UNL
• Why a nonlinear mixed model (NLMM) seemed appropriate
• Problems in implementation
• NLMM issues
• Alternatives whose implications are not adequately understood

Wheat Sawfly Study
• Gary Hein, Research Entomologist, Scottsbluff, NE RREC
• Sawflies inhabit/damage wheat
• 5 tillage treatments: impact on sawflies
• Experimental design used 4 randomized blocks
• Sawfly emergence measured at planned times during the growing season

Emergence over TIME by TRT
[Figure: emergence over date by treatment. Black: NoTill; Red: SumBlade (summer); Cyan: SB&SD; Green: SpDisk (spring); Blue: SpPlow]

"Conventional" Analysis
Emerge = μ + TRT + blk + blk*trt + DATE + TRT*DATE + date*blk(trt)
• blk*trt a.k.a. between-subjects or "whole-plot" error
• date*blk(trt) = within-subjects or "split-plot" error

ANOVA:

    Source               df
    blk                   3
    TRT                   4
    betw subj error      12
    DATE                 12
    TRT*DATE             48
    within subj error   180

Standard ANOVA model: emerge = μ + blk + TRT + w.p. error + TIME + TRT*TIME + s.p. error
The MIXED procedure; CS covariance fit adequately.

    Covariance Parameter Estimates
    Cov Parm   Estimate
    blk        0.002177
    blk*trt    0.005199
    Residual   0.01845

    Type 3 Tests of Fixed Effects
    Effect     Num DF  Den DF  F Value  Pr > F
    trt             4      12    13.18  0.0002
    date           12     180   157.38  <.0001
    trt*date       48     180     5.18  <.0001

Break Out TRT*DATE Effect

    Type 1 Tests of Fixed Effects
    Effect     Num DF  Den DF  F Value  Pr > F
    trt             4      12    15.62  0.0001
    lin             1     177  2273.39  <.0001
    quad            1     177     7.24  0.0078
    cubic           1     177   161.10  <.0001
    date            9     177     2.95  0.0027
    lin*trt         4     177     0.59  0.6716
    quad*trt        4     177    26.69  <.0001
    cubic*trt       4     177     2.13  0.0792
    trt*date       36     177     3.08  <.0001

Alternative Modeling Considerations
Basic form of model: $y_{ijk} = \mu_{ij} + \text{blk}_k + w_{ik} + e_{ijk}$
• $\mu_{ij}$ = mean of ith trt at jth time
• $w_{ik}$ = whole-plot error ~ i.i.d. $N(0, \sigma_W^2)$
• $e_{ijk}$ = split-plot error ~ i.i.d. $N(0, \sigma^2)$
Modeling $\mu_{ij}$:
1. Decompose $\mu_{ij}$ in "standard ANOVA": μ + Trt + Time + Trt*Time
2. Further decompose via polynomial regression
3. Nonlinear decomposition, e.g., Gompertz
4. Transform $y_{ijk}$ to "linearize" the response profile over date
  a. logit or probit (assumes the sigmoid profile is symmetric)
  b. complementary log-log (allows asymmetry)

Gompertz Model: $\mu_{ij} = \alpha_i \exp\{-\exp[\beta_i(\gamma_i - \text{date}_j)]\}$
• $\alpha_i$ is the asymptote of the ith treatment
• $\beta_i$ is the "slope" of the ith treatment
• $\gamma_i$ is the inflection point of the ith treatment

Parameter Estimates

    Parameter  Estimate  Standard Error  DF  t Value  Pr > |t|
    a1         0.9949    0.03629         19    27.42  <.0001
    a2         0.9666    0.03793         19    25.48  <.0001
    a3         0.9868    0.04609         19    21.41  <.0001
    a4         1.0037    0.06284         19    15.97  <.0001
    a5         0.9236    0.04390         19    21.04  <.0001
    b1         0.5435    0.08104         19     6.71  <.0001
    b2         0.4822    0.08743         19     5.52  <.0001
    b3         0.4506    0.09845         19     4.58  0.0002
    b4         0.3431    0.06859         19     5.00  <.0001
    b5         0.8544    0.1810          19     4.72  0.0001
    c1         0.3615    0.05388         19     6.71  <.0001
    c2         0.3224    0.05841         19     5.52  <.0001
    c3         0.2940    0.06370         19     4.62  0.0002
    c4         0.2186    0.04360         19     5.01  <.0001
    c5         0.5319    0.1125          19     4.73  0.0001
    s2w        0.002926  0.001355
    s2s        0.01598   0.001462

These are ML estimates. Bias?

Fit of Gompertz
[Figure: fitted Gompertz curves over date by treatment]

Trt Comparisons with NLMIXED

    Contrasts
    Label                 Num DF  Den DF  F Value  Pr > F
    among a                    4      19     0.50  0.7383
    among b                    4      19     2.19  0.1085
    among c                    4      19     2.30  0.0966
    a: nt vs sum bld           1      19     0.29  0.5956
    a: nt+sb vs sb&sd          1      19     0.01  0.9108
    a: sp dsk vs sp plow       1      19     1.09  0.3089
    a: nt+sb vs sp d+p         1      19     0.14  0.7169
    b: nt vs sum bld           1      19     0.26  0.6132
    b: nt+sb vs sb&sd          1      19     0.29  0.5950
    b: sp dsk vs sp plow       1      19     6.97  0.0161
    b: nt+sb vs sp d+p         1      19     0.57  0.4590
    c: nt vs sum bld           1      19     0.24  0.6279
    c: nt+sb vs sb&sd          1      19     0.41  0.5305
    c: sp dsk vs sp plow       1      19     6.74  0.0177
    c: nt+sb vs sp d+p         1      19     0.21  0.6497

Issues with Test Results
• denominator degrees of freedom?
  − DF in NLMIXED based on simple N−1 rule
  − MIXED uses Satterthwaite/KR
  − NLMIXED analog?
• bias in test statistics?
  − in MIXED, ML variance estimates are biased
  − ⇒ test statistics are biased
  − excessive Type I error rates are familiar in MIXED
  − same in NLMIXED?

Alternative NLMIXED Analysis
1. Use MIXED to obtain REML estimates of $\sigma_W^2$ and $\sigma_S^2$
2. Include the REML variance component estimates in NLMIXED as known
3. NLMIXED will compute std errors and test statistics using the REML estimates

NLMIXED REML Tests
MLE: $\sigma_W^2$ = 0.002926, $\sigma_S^2$ = 0.01598
REML: $\sigma_W^2$ = 0.005199, $\sigma_S^2$ = 0.01845

    Label                 Num DF  Den DF  F Value  Pr > F   vs. ML
    among a                    4      19     0.38  0.8188
    among b                    4      19     1.81  0.1690   .1085
    among c                    4      19     1.89  0.1537   .0966
    a: nt vs sum bld           1      19     0.26  0.6138
    a: nt+sb vs sb&sd          1      19     0.00  0.9796
    a: sp dsk vs sp plow       1      19     0.77  0.3918
    a: nt+sb vs sp d+p         1      19     0.15  0.7046
    b: nt vs sum bld           1      19     0.22  0.6419
    b: nt+sb vs sb&sd          1      19     0.18  0.6737
    b: sp dsk vs sp plow       1      19     5.88  0.0255   .0161
    b: nt+sb vs sp d+p         1      19     0.52  0.4788
    c: nt vs sum bld           1      19     0.21  0.6555
    c: nt+sb vs sb&sd          1      19     0.27  0.6114
    c: sp dsk vs sp plow       1      19     5.68  0.0277   .0177
    c: nt+sb vs sp d+p         1      19     0.20  0.6586

Hein: "What if we transform the data to linearize it, then use MIXED?"
Denote the response variable emerge by y; then $y = \alpha\exp\{-\exp[\beta(\gamma - \text{date})]\}$.
If we assume α = 1, then $\log[-\log(y)] = \beta(\gamma - \text{date})$.

Plot of CLogLog over Date by Trt
[Figure: transformed emergence over date by treatment]

MIXED Analysis of CLogLog

    Type 1 Tests of Fixed Effects
    Effect     Num DF  Den DF  F Value  Pr > F
    trt             4      12    15.69  0.0001
    lin             1     180  1402.85  <.0001
    lin*trt         4     180     3.58  0.0077
    trt*date       55     180     7.02  <.0001

Tests of lin and lin*trt correspond to equality of $\beta_i$ and $\gamma_i$ for all treatments in the Gompertz NLMM.

Decomposing Contrasts

    Label                Num DF  Den DF  F Value  Pr > F   vs. NLMM
    trt (b)                   4      15     6.12  0.0040   .169
    c                         4     120     3.62  0.0080   .154
    b: nt v sum bld           1      15     2.15  0.1631
    b: nt&sb vs sb&sd         1      15     4.37  0.0541   .674
    b: sp d v p               1      15     2.27  0.1526   .026
    b: nt&sb v sp d&p         1      15    19.96  0.0005
    c: nt v sum bld           1     120     2.11  0.1491
    c: nt&sb vs sb&sd         1     120     3.49  0.0644   .611
    c: sp d v p               1     120     0.99  0.3214   .028
    c: nt&sb v sp d&p         1     120    11.08  0.0012

NLMM too conservative? Or is the linearized LMM too liberal?

Unresolved Issues

Unresolved NLMIXED Issues
• REML vs. ML variance component estimates
• Degrees of freedom
• Starting values and convergence
• Are NLMIXED tests too conservative?
• Implications for standard errors?
• Correlated-error repeated measures?
• When are linearized models analyzed using LMM (e.g., PROC MIXED) preferable?
• Design

GLIMMIX vs MIXED/GENMOD
• GLIMMIX has very useful mean comparison options not available in MIXED
  − especially for factorial simple effects
• GLIMMIX can model true GLMMs
• GLIMMIX is "touchy" (e.g., use of SUBJECT=)
• Many research issues
  − RSMOOTH
  − properties of non-normal KR, working correlation, DDF, etc.
  − computational methods

Does GLIMMIX Replace MIXED/GENMOD?
• For GLMMs: no question
• For GLMs / LMMs
  − for the most part, YES
• Most GENMOD & MIXED programs can be duplicated in GLIMMIX
  − mean comparison features
  − no need to "trick" GENMOD into a GLMM with a marginal model (e.g., split-plot, repeated measures)
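Returning to Hein's question: his linearization of the Gompertz profile rests on the α = 1 assumption. The Python sketch below, with hypothetical parameter values rather than the sawfly estimates (and helper names of our own), evaluates the Gompertz mean and applies the slide's log[−log(y)] transform. With α = 1 the transformed means fall on a straight line in date with slope −β, so successive differences over equally spaced dates are all −β; with α = 0.9 they are not. Note that the ML asymptote estimates a1 through a5 above range from 0.9236 to 1.0037, and observed values of 0 or 1 cannot be transformed at all.

```python
import math


def gompertz_mean(date, alpha, beta, gamma):
    """Gompertz profile mu = alpha * exp(-exp(beta * (gamma - date)))."""
    return alpha * math.exp(-math.exp(beta * (gamma - date)))


def loglog(y):
    """log(-log(y)), the linearizing transform on Hein's slide; needs 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("log(-log(y)) is undefined unless 0 < y < 1")
    return math.log(-math.log(y))


def successive_differences(values):
    """Differences between consecutive values; constant for a straight line in date."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical parameters (not the sawfly estimates) and equally spaced dates.
dates = [0, 1, 2, 3, 4]
beta, gamma = 0.5, 2.0
for alpha in (1.0, 0.9):
    z = [loglog(gompertz_mean(d, alpha, beta, gamma)) for d in dates]
    steps = [round(s, 4) for s in successive_differences(z)]
    print(f"alpha={alpha}: successive differences {steps}")
```

When α < 1 the transformed profile bends, so a straight-line (lin, lin*trt) analysis on the transformed scale is only an approximation to the Gompertz NLMM.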
