I. Overview - Survey, Statistics & Psychometrics

Department of Statistics
Introduction to
Modeling Change Over Time
with
Generalized Mixed Models
using
SAS PROC GLIMMIX
A Short Course – 14 May 2007
Instructor: Walt Stroup, Ph.D.
Professor & Chair, UNL Department of Statistics
Outline of Short Course (G/C = Growth/Change Model)
1. Introduction
a. motivating examples
b. Social Science HLM-speak vs. BioStat GLMM-speak
2. GLMM / HLM
a. essential background
b. recurring modeling issues
3. SAS / GLIMMIX syntax
4. G/C Models - 1st part of the picture: Factorial trt designs
a. with various error structures & distributions
b. with repeated measures & correlated errors
5. G/C Models - 2nd part of the picture: Random Effects issues
a. random coefficients
b. prediction vs. estimation
6. G/C Models – 3rd part of the picture - GLM issues:
 Binary, count, rate, zero-inflated models
7. Power & Planning
8. Nonlinear mixed models
SSP Core Facility
Recurring Themes
 “Mixed Model” Issues
− fixed or random?
− error terms – which one & are they correlated?
− std error & d.f.
− prediction or estimate? (“inference space”)
 “GLM” Issues
− what distribution?
 incl “is it really a distribution & does it matter”?
− what link – “data” vs “model” scale?
− overdispersion
− computational issues
Recurring Themes
 George Bernard Shaw:
“America and England are two peoples separated by a common language.”
[Picture of G. B. Shaw]
 Generalized Mixed Models have
− AgStat-speak
− BioStat-speak
− Social/Behavioral Science Stat (HLM) speak
 One goal: serve as translator
I. Introduction
 General considerations for modeling
 Several examples illustrating generalized and
mixed models
 Typology of models
 Background theory
 Decision chart to match model with software
available in SAS
General Model considerations
 A Model is a description of the components of an
observation
 observation = systematic + random
 Nelder: random = ephemeral + noise or
random=random model + random error
 Alternative: random = design components +
remaining variation
 “All models are wrong but some are useful” – G.E.P Box
General Mixed Model Setting
 Y is vector of responses (observable)
 u is vector of random (design induced) effects
[not (directly) observable]
 relevant distributions
o $Y \mid u \sim f_C(\mu, R)$ (inexact but useful analogy: HLM level 1; Biostat "subject-specific")
o $u \sim f_R(0, G)$ (inexact but useful analogy: HLM level 2)
• Model is of the conditional mean of $Y \mid u$:
$\mu = E(Y \mid u) = h(X, \beta, Z, u)$
Typology of Models
Type    Mean Model              Distribution
NLMM    $h(X, \beta, Z, u)$     y|u general, u normal **
GLMM    $h(X\beta + Zu)$        y|u general, u normal *
LMM     $X\beta + Zu$           u, y|u normal
NLM     $h(X, \beta)$           y normal
GLM     $h(X\beta)$             y general
LM      $X\beta$                y normal
* for PROC GLIMMIX; ** for this course
(G/N)LMM can be more general
Example 1
Random Effects Model
• Data: Output 4.1, p. 94, SAS for Linear Models, 4th ed.
• 20 packages of ground beef
• 3 samples per package
• 2 counts per sample
• response variable: microbial count
• response = mean + package + sample(package) + error, i.e. observation = systematic + random model + error
Model for Example 1
$y_{ijk} = \mu + p_i + s(p)_{ij} + e_{ijk}$
$i = 1, 2, \ldots, 20$; $j = 1, 2, 3$; $k = 1, 2$
$p_i$ i.i.d. $N(0, \sigma_P^2)$; $s(p)_{ij}$ i.i.d. $N(0, \sigma_S^2)$; $e_{ijk}$ i.i.d. $N(0, \sigma^2)$
$y_{ijk}$ is observation [ log(count) ]
$\mu$ is overall mean (systematic / fixed)
$p_i$, $s(p)_{ij}$ are random model effects
$e_{ijk}$ is random error
Convention: fixed Greek; random Latin
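A minimal sketch of how the Example 1 model might be fit in PROC GLIMMIX. The data set name `beef` and the variable names `package`, `sample`, and `logcount` are assumptions made for illustration; they do not come from the course materials.

```sas
/* Hypothetical data set BEEF: one row per count, with variables
   package (1-20), sample (1-3 within package), logcount = log(count). */
proc glimmix data=beef;
   class package sample;
   /* Fixed part is the overall mean only, so the MODEL right-hand side is empty */
   model logcount = / solution;
   /* Random model effects p_i and s(p)_ij; SOLUTION prints their predictors */
   random package sample(package) / solution;
run;
```

The covariance parameter estimates table would then hold the estimates of the package, sample-within-package, and residual variances.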
Hierarchical Levels
Level 1: students
Level 2: classroom
Level 3: school
Size of unit vs. level: small = 1, medium = 2, large = 3
[Diagram: students nested within classrooms, classrooms nested within schools]
Hierarchical Level to Statistical Model
[Diagram: students within classroom within school]
$y_{ijk}$ = $k$th student, $j$th classroom, $i$th school
$y_{ijk}$ = mean + school + classroom + student
GLIMMIX-speak:
$y_{ijk} = \mu + s_i + c(s)_{ij} + e_{ijk}$
HLM-speak:
Level 1 (student): $y_{ijk} = \beta_{0ij} + e_{ijk}$, where $\beta_{0ij} = \mu + s_i + c(s)_{ij}$
Level 2 (classroom): $y_{ijk} = \beta_{0i} + c(s)_{ij} + e_{ijk}$
Level 3 (school): $\beta_{0i} = \mu + s_i$
Modeling Issues
1. Estimate the variance components (the $\sigma_i^2$'s)
2. Estimate, standard error, and interval estimate of $\mu$
3. Estimates of package, sample effects
4. a.k.a. Estimates of school and classroom effects
Singer: HLM to MIXED
• Unconditional means model (one-way random effects model)
"HLM-speak", Raudenbush & Bryk (2002):
$y_{ij} = \beta_{0j} + r_{ij}$
$\beta_{0j} = \gamma_{00} + u_{0j}$
$r_{ij} \sim N(0, \sigma^2)$; $u_{0j} \sim N(0, \tau_{00})$
"GLIMMIX-speak":
$y_{ij} = \mu + a_i + e_{ij}$
$a_i \sim N(0, \sigma_A^2)$; $e_{ij} \sim N(0, \sigma^2)$
• Include Level 2 Covariate
"HLM-speak":
$\beta_{0j} = \gamma_{00} + \gamma_{01}\,MEANSES_j + u_{0j}$
$\Rightarrow y_{ij} = \gamma_{00} + \gamma_{01}\,MEANSES_j + u_{0j} + r_{ij}$
"GLIMMIX-speak":
$y_{ij} = \mu + \beta_1 X_j + s_j + e_{ij}$
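The two models above could be written in PROC GLIMMIX roughly as follows. This is a sketch only: the data set name `hsb` and the variables `school` and `mathach` are assumed for illustration, and only `MEANSES` appears on the slide itself.

```sas
/* Hypothetical data set HSB: one row per student, with variables
   school (level-2 unit), mathach (response), meanses (school-level covariate). */

/* Unconditional means model: y = mu + school effect + error */
proc glimmix data=hsb;
   class school;
   model mathach = / solution;
   random intercept / subject=school;
run;

/* Add the level-2 covariate as a fixed effect */
proc glimmix data=hsb;
   class school;
   model mathach = meanses / solution ddfm=kr;
   random intercept / subject=school;
run;
```

In both runs the intercept variance plays the role of $\tau_{00}$ (HLM-speak) or the school variance (GLIMMIX-speak), and the residual variance is $\sigma^2$.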
Example 2
Blocking & Multi-Location
• Data: SAS for Linear Models, Output 3.7, discussed as a mixed model in Section 4.3; Output 11.30; SAS for Mixed Models, 2nd ed., Section 6.6
• Output 11.30 discussed here
• 3 treatments
• 8 locations
• locations represent a population
• 3-12 blocks depending on location
• response = trt + loc + blk(loc) + trt×loc + error, i.e. observation = systematic + random model + error
Example 2 framed by Extending School / Classroom Example
[Diagram: the students-within-classroom-within-school hierarchy, shown without and then with Treatment added above the school level]
Model with Treatment
[Diagram: students within classroom within school within Treatment]
$y_{ijkl}$ = $\mu$ + trt + school(trt) + classroom(school) + student
$y_{ijkl} = \mu + \tau_i + s(\tau)_{ij} + c(s,\tau)_{ijk} + e_{ijkl}$
Level 1: $y_{ijkl} = \beta_{0ijk} + e_{ijkl}$
Level 2: $y_{ijkl} = \beta_{0ij} + c(s,\tau)_{ijk} + e_{ijkl}$
Level 3: between school model + trt as above
Modeling Issues
1. Appropriate error term to test treatment
2. Standard error of treatment mean
− (inference space)
3. Intra-block vs. inter-block analysis
ANOVA (ignoring block)
If Location random:
Source      d.f.   Expected Mean Square
Treatment   2      $\sigma^2 + k_1\sigma_{LT}^2 + Q_{TRT}$
Location    7      $\sigma^2 + k_1\sigma_{LT}^2 + k_2\sigma_L^2$
Loc × Trt   14     $\sigma^2 + k_1\sigma_{LT}^2$
error       dfe    $\sigma^2$
Test of TRT affected

If Location fixed:
Source      d.f.   Expected Mean Square
Treatment   2      $\sigma^2 + Q_{TRT}$
Location    7      $\sigma^2 + Q_{LOC}$
Loc × Trt   14     $\sigma^2 + Q_{LT}$
error       dfe    $\sigma^2$
Department of Statistics
Inference Space
Assuming Locations are Fixed
Var(trt mean) = $\sigma^2 / (\#\,\text{obs/trt})$
⇒ Std. error(trt mean) = $\sqrt{MS(\text{error}) / (\#\,\text{obs/trt})} \cong 0.91$
HOWEVER... if Locations are Random
Var(trt mean) = $[\sigma^2 + k(\sigma_L^2 + \sigma_{LT}^2)] / (\#\,\text{obs/trt})$
⇒ Std. error(trt mean) = $\sqrt{[\hat\sigma^2 + k(\hat\sigma_L^2 + \hat\sigma_{LT}^2)] / (\#\,\text{obs/trt})} \cong 3.62$
Where does Uncertainty Arise?
[Diagram: observations grouped within locations Loc 1, Loc 2, ..., Loc 7, Loc 8]
Only from variation among obs within locations? Then Locations fixed
Or does variation among locations also contribute? Then Locations random
Intra- vs. Inter-block analysis
 Intra- (fixed) block analysis based only on within block
treatment differences
 Inter-block analysis also accounts for variance among blocks
(random combines inter- and intra-)
 Lead to equivalent tests when all treatments appear equally in
each block
 Not equivalent otherwise
 In most cases, combined inter-/intra-block analysis is more
efficient
Example 3
Repeated Measures/Longitudinal
• Data: SAS for Linear Models, Output 8.1; SAS for Mixed Models, Chapter 5
• 3 treatments (2 test drugs + placebo)
• $n_i$ patients per treatment
• 8 times of measurement (1, 2, 3, ..., 8 hours post trt)
• baseline measurement at time 0
• response = trt + hour + trt×hour + pat(trt) + error, i.e. observation = systematic + random model + error
• Variations on this theme are “latent growth models”
Growth Models – Singer
HLM-speak to GLIMMIX-speak
Unconditional Linear Growth Model
Level 1 (HLM: within individual; GLIMMIX: within subjects):
$y_{ij} = \beta_{0j} + \beta_{1j}\,time_{ij} + r_{ij}$, $r_{ij} \sim N(0, \sigma^2)$
Level 2 (between subjects):
$\beta_{0j} = \gamma_{00} + u_{0j}$
$\beta_{1j} = \gamma_{10} + u_{1j}$
$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim MVN\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix} \right)$
Combined:
$y_{ij} = (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})\,time_{ij} + r_{ij}$
$= [\gamma_{00} + \gamma_{10}\,time_{ij}] + [u_{0j} + u_{1j}\,time_{ij} + r_{ij}]$
= [between subject] + [within subjects]
= [population-averaged (PA)] + [subject-specific (SS)]
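One way the unconditional linear growth model might be specified in PROC GLIMMIX is sketched below. The data set `growth` and the variables `id`, `time`, and `y` are hypothetical names chosen for illustration.

```sas
/* Hypothetical data set GROWTH: one row per subject x occasion,
   with variables id (subject), time (numeric), y (response). */
proc glimmix data=growth;
   class id;
   /* Fixed part: population-averaged intercept and slope */
   model y = time / solution ddfm=kr;
   /* Subject-specific intercept and slope deviations;
      TYPE=UN requests the full 2x2 covariance matrix of (u0j, u1j) */
   random intercept time / subject=id type=un;
run;
```

Here `time` is deliberately left out of the CLASS statement so that it enters as a regression variable rather than as a factor.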
Singer (1998)
• Excellent paper translating HLM-speak to Proc Mixed
• Uses Raudenbush & Bryk examples
• Fair warning to readers, however: it’s dated
− new features & output revisions in SAS
− some of the output encouraged confusion or poor practice
− specifics
  ◦ revised output of Fit Statistics
  ◦ misleading output for variance estimates deleted
  ◦ Kenward-Roger procedure for d.f. & std errors
• I’ll update & make switch to Proc GLIMMIX
Modeling Issues
1. Errors may be correlated
a. May affect conclusions
b. How to select covariance model
2. Denominator degrees of freedom
3. Bias in standard errors and test statistics
Impact of Correlated Errors
Covariance Model                          den df      F-value       Pr>F
errors independent                        483         7.11          <0.0001
errors correlated, no structure           69          4.06          <0.0001
  (bias corrected)                        (98.1)      (3.66)
AR(1)                                     483         3.93          <0.0001
AR(1) bias corrected                      424         3.89          <0.0001
Example 4
• Data: SAS for Mixed Models, Section 14.5
• 2 treatments (Test Drug, Control)
• 8 clinics
• clinics represent a population
• $n_{ij}$ subjects at $j$th location (clinic) on $i$th treatment
• response: favorable or unfavorable ($f_{ij}$ = # favorable)
• response = trt + clinic + trt×clinic + error, i.e. observation = systematic + random model + error
Modeling Issues
1. Response ($f_{ij} / n_{ij}$) is binomial, not normal
2. Response may not be linear in model parameters
3. Errors may not be additive
4. Variances of binomial & normal are different; the binomial variance is
a. heterogeneous
b. dependent on the location parameter
Generalized Linear Mixed Model
e.g. Logistic mixed model
let $\pi_{ij} = \Pr\{\text{favorable response} \mid \text{trt } i, \text{clinic } j\}$
Model: $\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + \tau_i + c_j + (tc)_{ij}$
$c_j$ i.i.d. $N(0, \sigma_C^2)$; $(tc)_{ij}$ i.i.d. $N(0, \sigma_{TC}^2)$
observations = proportion = $f_{ij} / n_{ij}$
$(f_{ij} \mid c_j, (tc)_{ij}) \sim \text{Binomial}(\pi_{ij}, n_{ij})$
$\pi_{ij}$ modeled by $\dfrac{\exp[\mu + \tau_i + c_j + (tc)_{ij}]}{1 + \exp[\mu + \tau_i + c_j + (tc)_{ij}]}$
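A hedged sketch of the logistic mixed model above in PROC GLIMMIX. The data set `clinics` and the variables `clinic`, `trt`, `fav` (number favorable), and `n` (number of subjects) are assumed names, not taken from the source data set.

```sas
/* Hypothetical data set CLINICS: one row per clinic x treatment,
   with fav = number of favorable responses out of n subjects. */
proc glimmix data=clinics;
   class clinic trt;
   /* events/trials syntax implies binomial distribution and logit link */
   model fav/n = trt / solution;
   /* c_j and (tc)_ij: random clinic and clinic-by-treatment effects */
   random intercept trt / subject=clinic;
   /* ILINK reports treatment means on the probability (data) scale as well */
   lsmeans trt / ilink;
run;
```

The `random intercept trt / subject=clinic` form is one of several equivalent ways to request the two variance components; `random clinic clinic*trt;` would be another.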
Example 5
• Data: SAS for Linear Models, Output 10.39
• 2 treatments
• $n_i$ persons per treatment
• 4 times of measurement
• response = number of seizures (count)
• baseline and age observations
• response = trt + hour + trt×hour + baseline + age + pat(trt) + error, i.e. observation = systematic + random model + error
Modeling Issues
 Count typically not ~ normal
 Poisson (or negative binomial) more likely
 Generalized Linear Model Issues
− Linear model not good direct model of mean
− Variance depends on mean
 Repeated Measures Issues
− Observations within subjects correlated over time
− Between subject variance
Example 6
• Data: SAS for Mixed Models, Section 1.5.6
• 5 treatments
• observed in each of 4 randomized blocks
• several measurement days between 130 and 180 growing degree days
• response = (trt,day) + block + blk×trt + error, i.e. observation = systematic + random model + error
Emergence over TIME by TRT
[Figure: emergence over time, one curve per treatment. Legend: black = NoTill; red = SumBlade (summer); cyan = SB&SD; green = SpDisk (spring); blue = SpPlow]
Modeling Issues
• “Usual” mixed model and repeated measures issues, plus
• Linear model is poor model of trt×day means
Nonlinear Mixed Model
Mixed Model: $y_{ijk} = \mu_{ij} + b_k + w_{ik} + e_{ijk}$
$\mu_{ij}$ is trt × day mean; $b_k$ is block effect
$w_{ik}$ is between subject error
Gompertz Model: $\mu_{ij} = \alpha_i \exp\{-\exp[\beta_i \times (\gamma_i - date_j)]\}$
$\alpha_i$ is asymptote of $i$th treatment
$\beta_i$ is "slope" of $i$th treatment
$\gamma_i$ is inflection point of $i$th treatment
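To get a feel for the three Gompertz parameters, the curve can be tabulated in a DATA step. This sketch assumes the form $\mu = \alpha \exp\{-\exp[\beta(\gamma - date)]\}$ as read from the slide; the parameter values and all variable names are made up for illustration.

```sas
/* Hypothetical parameter values for a single treatment */
data gompertz_curve;
   asym    = 100;    /* alpha: asymptote */
   slope   = 0.15;   /* beta: "slope" */
   inflect = 150;    /* gamma: inflection point, in growing degree days */
   do date = 130 to 180 by 10;
      mu = asym * exp(-exp(slope * (inflect - date)));
      output;
   end;
run;

proc print data=gompertz_curve noobs;
   var date mu;
run;
```

At `date = inflect` the inner exponent is zero, so the mean equals `asym * exp(-1)`, about 36.8 here; for large `date` the mean approaches `asym`.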
Generalized Mixed Model
SAS Software Decision Table
Response
Errors
Random Effects
Mean Model
Linear?
SAS
Proc
Response
Errors
Random
Effects
Mean Model
Linear?
SAS
Proc
Normal
Indep
Corr
no
yes
yes
no
yes
no
yes
no
GLM
MIXED
GLIMMIX
NLIN
MIXED
NLMIXED
%NLINMIX
MIXED
GLIMMIX
NLMIXED
%NLINMIX
GLIMMIX
Non-Normal
Indep
Correl
no
yes
GENMOD
GLIMMIX
yes
no
yes
no
yes
GLIMMIX
NLMIXED
NLMIXED
GLIMMIX
(GENMOD)
no
Essential GLMM Background
First: How do I run a SAS Program?
It’s easier than the urban legends would have you believe
Basic Parts of SAS Program
• DATA Step
Data your_choice_of_name;
   Input list of variables;   /* comment: $ after alphameric var */
   Datalines;
   data – one line / obs, one column per variable
   ;
• PROC Step
Proc GLIMMIX Data=your_choice_of_name;
   CLASS block group & trt var;
   MODEL response=block trt covar / options;
   ...
Run;
• Modify existing data set (Data __; Set __;)
Data new_data_set_name;
   Set [old – e.g.] your_choice_of_name;
   program & data manipulation statements, e.g.
   LogY=Log(Y);
Example of SAS Program
DATA Step
data demo1;
   input classroom trt $ time count;
   sc=sqrt(count);
   datalines;
1 std 1 12
1 std 2 16
1 std 4 17
1 std 8 24
2 exper 1 17
2 exper 2 24
2 exper 4 30
2 exper 8 32
11 std 1 16
11 std 2 15
11 std 4 22
11 std 8 23
8 exper 1 15
8 exper 2 20
8 exper 4 24
8 exper 8 27
;
PROC Step
proc glimmix data=demo1;
   class classroom trt time;
   model sc=trt time trt*time / dist=normal ddfm=kr;
   random classroom(trt);
   lsmeans trt*time;
   ods output lsmeans=lsm;
run;
Data; Set; + new PROC
data plot_growth;
   set lsm;
   log_time=log2(time);
symbol i=join value=circle;
proc gplot data=plot_growth;
   plot estimate*log_time=trt;
run;
II. Generalized Mixed Model Theory
 Clarify Fixed vs Random effects
 Linear Models
− LM to LMM + GLM to GLMM
 Estimation and Inference for
− LMM
− GLM
− GLMM
 For GLMM:
− what follows naturally from GLM and LMM
− Special Issues
Fixed vs. Random Effects?
 Fixed Effect?
− levels observed = population of interest (except regression)
− levels deliberately chosen
− inference: systematic relationship between y and 
 Random Effect?
− observed levels represent target population
− random sample? -- ideal (but seldom perfectly realized)
− makes sense to conceptualize probability distribution
 Bottom Line: do observed levels of effect plausibly
represent a probability distribution?
− yes → random effect
− no → fixed effect
General Structure of Model
• Nelder: observation = systematic + random
• General approach:
− likelihood consists of two parts
  ◦ observation (y | u)
  ◦ random effects u
− model is mathematical description of $\mu = E(y \mid u)$
• Distribution:
− observation $y \mid u \sim f(\mu, R)$
− random effects $u \sim MVN(0, G)$
• Model: $\mu = h(X, \beta, Z, u)$
• $h(\cdot)$ called “inverse link”
Linear Model (LM)
 No random effects
 simple ANOVA (one error term)
 multiple regression
Assumption: $y \sim MVN(\mu, R)$
LM: Model $\mu$ by $X\beta$, usually represented as
$y = X\beta + e$; $e \sim N(0, R)$
alternative representation (helpful for transition to GLMM):
$y \sim MVN(X\beta, R)$
Generalizations of LM
LM (Linear Model): obs ~ normal, fixed effects only
GLM (Generalized Linear Model): obs ~ non-normal, fixed effects only
LMM (Linear Mixed Model): obs ~ normal, random effects
GLMM (Generalized Linear Mixed Model): obs ~ non-normal, random effects
GLM: Generalized Linear Model
 Binomial: Logistic regression; Probit models
 Poisson: Log-linear models
Assumption: $y \sim \text{dist}(\mu, R)$
R is a function of $\mu$: $V(\mu)$, called the "variance function" (more later)
GLM: model $\eta = g(\mu)$ by $X\beta$; $g(\cdot)$ is called the "link function"
alternatively, model $\mu$ by $h(X\beta)$, the "inverse link"
Note: here $y = X\beta + e$ or $g(\mu) = X\beta + e$ makes no sense
Instead: $y \sim \text{dist}(h(X\beta), R)$
LMM: Linear Mixed Model
 Multi-error models; split-plot, multi-location
 Repeated measures a.k.a. Longitudinal data
Assume: $y \mid u \sim MVN(\mu, R)$; $u \sim MVN(0, G)$
LMM: Model $\mu$ by $X\beta + Zu$
Familiar notation:
$y = X\beta + Zu + e$; $\begin{bmatrix} u \\ e \end{bmatrix} \sim MVN\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \right)$
More vocabulary:
“G-side” concerns V(u)
“R-side” concerns V(e)
alternatively:
$y \mid u \sim MVN(X\beta + Zu, R)$; $u \sim MVN(0, G)$
or (marginal model):
$y \sim MVN(X\beta, V)$; $V = ZGZ' + R$
GLMM: Generalized Linear Mixed Model
Assume: $y \mid u \sim \text{dist}(\mu, R)$; as with GLM, R depends on $V(\mu)$
$u \sim MVN(0, G)$
GLMM models $\mu = E(y \mid u)$ by
link function: $\eta = g(\mu) = X\beta + Zu$
inverse link: $\mu = h(X\beta + Zu)$
GLMM: $y \mid u \sim \text{dist}(h(X\beta + Zu), R)$
Marginal Model: $\int f(y \mid u)\, f(u)\, du$ (more later)
Modelling will involve
• Distribution
• Link (or inv link)
• G-side
• R-side
Some Grounding Before Moving On
• “Hessian Fly” example, Gotway & Stroup (1997, JABES)
• “Hessian Fly” not so important, but design & data structure are
• 16 treatments, 4 replications: 4x4 Lattice
− 16 incomplete blocks organized into 4 complete blocks
• Response: $Y_{ij}/n_{ij}$ (damaged / obs per trt x block unit)
[Field layout diagram: 4x4 lattice showing the assignment of treatments 1-16 to plots in each of the 4 replications]
Linear Model (LM)
Randomized Complete Block
$y_{ij} = \mu + \rho_i + \tau_j + e_{ij}$; $e_{ij}$ i.i.d. $N(0, \sigma^2)$
$\rho_i$ = block effect; $\tau_j$ = treatment effect
proc glimmix;
   class block entry;
   model pct=block entry;
run;
Incomplete Block Model - Intra-block analysis
incomplete block replaces complete block in denoting $\rho_i$
proc glimmix;
   class inc_block entry;
   model pct=inc_block entry;
run;
Linear Mixed Model (LMM)
Randomized Complete Block - Random block effects
$y_{ij} = \mu + r_i + \tau_j + e_{ij}$
$r_i$ i.i.d. $N(0, \sigma_R^2)$; $e_{ij}$ i.i.d. $N(0, \sigma^2)$
$r_i$ = block effect; $\tau_j$ = treatment effect
proc glimmix;
   class block entry;
   model pct=entry;
   random block;     /* G-side: modeling block effect */
run;
Incomplete block (recovery of interblock information):
replace “block” by “inc_block”
LMM
G-side / R-side
Two alternative “G-side” specifications:
proc glimmix;
   class block entry;
   model pct=entry;
   random block;
proc glimmix;
   class block entry;
   model pct=entry;
   random intercept / subject=block;
R-side specification:
proc glimmix;
   class block entry;
   model pct=entry;
   random _residual_ / type=cs subject=block;
Here, it doesn’t matter (all equivalent), but for more complex models the distinctions will matter
Generalized Linear Model (GLM)
$y_{ij} \sim \text{Binomial}(n_{ij}, \pi_{ij})$
GLM ("Logit ANOVA" model): $\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + \rho_i + \tau_j$
proc glimmix;
   class block entry;
   model y/n = block entry;
or replace “block” by “inc_block” for intra-block logit ANOVA
More on GLIMMIX syntax later
Here, note Y/N causes default to Binomial distribution & Logit link (same as GENMOD)
Generalized Linear Mixed Model (GLMM)
$y_{ij} \mid \text{block effects} \sim \text{Binomial}(n_{ij}, \pi_{ij})$
block effects $r_i$ i.i.d. $N(0, \sigma_R^2)$
GLMM ("Logit ANOVA" mixed model): $\log\left(\dfrac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + r_i + \tau_j$
Two equivalent G-side specifications:
proc glimmix;
   class block entry;
   model y/n = entry;
   random block;
proc glimmix;
   class block entry;
   model y/n = entry;
   random intercept / subject=block;
Marginal model (not equivalent):
proc glimmix;
   class block entry;
   model y/n = entry;
   random _residual_ / type=cs subject=block;
II. Inference in LM, GLM, LMM, and GLMM
Inference for fixed effects based on estimable functions
In LM theory, $K'\beta$ is estimable if it can be expressed as $A\,E(y)$, i.e. $K'\beta = AX\beta$
OLS: $\hat\beta = (X'X)^{-}X'y$
theorem: $K'\beta$ estimable iff $K' = K'(X'X)^{-}(X'X)$
Main advantage: $K'\hat\beta$ is invariant to choice of $(X'WX)^{-}$
i.e. when X is not full rank, $\beta$ has no intrinsic interpretation; $K'\beta$ does
(e.g. treatment difference, marginal (least squares) mean)
II. Examples of Estimable Functions
e.g. one way model:
$y_{ij} = \mu + \tau_i + e_{ij}$; $i = 1, 2, 3, 4$; $j = 1, \ldots, n$
Estimable functions include
• Trt marginal ("Least Squares") mean (LSMean): $\mu + \tau_i$, e.g. $k' = [1\ 1\ 0\ 0\ 0]$ for $i = 1$
• Trt differences, e.g. $\tau_1 - \tau_2$: $k' = [0\ 1\ {-1}\ 0\ 0]$
• SS(trt): $K$ such that $K'\beta = 0$ when all $\tau_i$ are equal, e.g.
$K' = \begin{bmatrix} 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 \end{bmatrix}$
II. Common Inference Results for GLM
$K'\hat\beta \sim$ approx $MVN(K'\beta,\ K'(X'WX)^{-}K)$ (exact for LM)
Wald statistic:
purpose: test $H_0: K'\beta = 0$
$\text{Wald} = (K'\hat\beta)'[K'(X'WX)^{-}K]^{-1}(K'\hat\beta) \sim$ approx $\chi^2_{\text{rank}(K)}$
Note: in OLS, $\text{Wald} = SS(H_0)/\sigma^2$
II. GLM: Inference with Unknown Scale Parameter
Recall, in OLS: $\text{Wald} = SS(H_0)/\sigma^2$
But what if $\sigma^2$ unknown? Use $\hat\sigma^2 = MSE$
Think ANOVA: Wald becomes $SS(H_0)/MSE$
Thus, $\dfrac{\text{Wald}}{\text{rank}(K)} = \dfrac{SS(H_0)/df_h}{MSE} \sim F(df_h, df_e)$
Generalization: in GLM, the scale parameter $\phi$ is estimated by $\text{Pearson } \chi^2 / df_e$ or $\text{Deviance} / df_e$
II. Extension of GLM Scale Parameter
Quasi-Likelihood
• Overdispersion
Counts: Poisson ⇒ $E(y) = Var(y)$, but in practice $E(y) < Var(y)$
Quasi-likelihood: you specify $E(y) = \mu$, $Var(y) = \phi\mu$
• “Working Correlation”
Repeated Measures: assumed distribution ⇒ $Var(y) = \text{diag}[V(\mu)]$
But in reality, errors are correlated, so model variance as
$Var(y) = R^{1/2} A R^{1/2}$ where $R^{1/2} = \text{diag}[\sqrt{V(\mu)}]$
A is working correlation; structure analogous to true R-side in LMM
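In PROC GLIMMIX both ideas are requested through `RANDOM _RESIDUAL_`. The sketch below assumes a hypothetical data set `seizures` with variables `id`, `trt`, `visit`, and `count`; none of these names come from the course data.

```sas
/* Hypothetical data set SEIZURES: one row per patient x visit */

/* (1) Overdispersed Poisson: RANDOM _RESIDUAL_ adds a scale parameter phi,
       so the working variance is phi*mu rather than mu */
proc glimmix data=seizures;
   class trt;
   model count = trt / dist=poisson link=log solution;
   random _residual_;
run;

/* (2) Working correlation: compound-symmetry R-side structure among
       the repeated counts on the same patient */
proc glimmix data=seizures;
   class id trt visit;
   model count = trt visit trt*visit / dist=poisson link=log;
   random _residual_ / subject=id type=cs;
run;
```

Both runs give marginal (quasi-likelihood) models; neither adds a random effect to the linear predictor.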
II. GLM: Deviance and Likelihood Ratio Test
Full model: $X\beta$, i.e. $\mu = h(X\beta)$
Decompose as $X_1\beta_1 + X_2\beta_2$
Suppose we want to test $H_0: \beta_2 = 0$
1. Fit full model:
$\text{Dev}(X\beta) = -2\log[\ell(X\beta) / \ell(y)]$, where $\ell(\cdot)$ is the likelihood and $\ell(y)$ is its value when the fitted means equal the data
2. Fit reduced model $X_1\beta_1$:
$\text{Dev}(X_1\beta_1) = -2\log[\ell(X_1\beta_1) / \ell(y)]$
3. LR statistic:
$\text{Dev}(X_1\beta_1) - \text{Dev}(X\beta)$
II. LMM: The “Mixed Model Equations”
$\ell(y) = -\tfrac{1}{2}\left[(y - X\beta - Zu)'R^{-1}(y - X\beta - Zu) + u'G^{-1}u\right] + \text{const}$
$\dfrac{\partial \ell(y)}{\partial \beta} = X'R^{-1}(y - X\beta - Zu)$
and
$\dfrac{\partial \ell(y)}{\partial u} = Z'R^{-1}(y - X\beta - Zu) - G^{-1}u$
setting both to zero and solving yields the Mixed Model Solution
$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat\beta \\ \hat u \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$
note: Marginal Model Solution
$\hat u = GZ'V^{-1}(y - X\hat\beta)$ and $\hat\beta = (X'V^{-1}X)^{-}X'V^{-1}y$
II. LMM Inference – G and R known
Inference based on Predictable functions
$K'\beta + M'u$ is "predictable" if $K'\beta$ is estimable
(reduces to estimable function $K'\beta$ if focus on fixed effects only)
1. $Var[K'\hat\beta + M'(\hat u - u)] = [K'\ M']\, C \begin{bmatrix} K \\ M \end{bmatrix}$
where $C = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-}$
2. Let $L' = [K'\ M']$ and $\psi' = [\beta'\ u']$
Wald statistic for tests on $L'\psi$ is
$(L'\hat\psi)'[L'CL]^{-1}(L'\hat\psi) \sim \chi^2_{\text{rank}(L)}$
II. LMM Inference – G and R unknown
1. Replace G and R by $\hat G$ and $\hat R$
⇒ estimate variance and covariance components
2. Denote $\hat C$ as C with estimated var/cov components
3. "Naive" $Var[L'(\hat\psi - \psi)] = L'\hat C L$, where $\psi' = [\beta'\ u']$
but $E(L'\hat C L) \ne L'CL$
⇒ Kenward-Roger adjustment
4. Approximate F:
$F = \dfrac{\text{Wald}}{\text{rank}(L)} = \dfrac{(L'\hat\psi)'[L'\hat C L]^{-1}(L'\hat\psi)}{\text{rank}(L)} \sim$ approx $F_{\text{rank}(L),\,\nu}$
F may be biased; $\nu$ often must be approximated
II. LMM: Variance Component Estimation
Several methods
1. For variance-component-only models: use
EMS from ANOVA
2. Maximum likelihood
− problem: biased
3. Restricted maximum likelihood
4. Several computational approaches
a. Newton Raphson
b. Fisher Scoring
c. EM
What’s Wrong with ML?
 An example to illustrate
 SAS for Mixed Models, Data Set 1.5.1
 Incomplete Block design from Cochran & Cox,
Experimental Designs, p 456
 15 treatments
 15 blocks
 4 treatments observed per block
C&C Example: ML and two alternatives
Intrablock (fixed block) analysis (equivalent to PROC GLM)
proc glimmix data=cc456;
   class trt bloc;
   model y=trt bloc;
run;
Inter/Intra-block (random block) analysis, default (PROC MIXED default gives same result)
proc glimmix data=cc456;
   class trt bloc;
   model y=trt;
   random bloc;
run;
Inter/Intra-block (random block) analysis, ML (same as Proc MIXED METHOD=ML)
proc glimmix data=cc456 method=mspl;
   class trt bloc;
   model y=trt;
   random bloc;
run;
ML vs Alternative Results: Which is Right?
Intrablock (fixed block): $\hat\sigma^2 = 8.62$
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
trt      14       31       1.23      0.3012

Intra/inter-block (random block), default: $\hat\sigma_R^2 = 4.65$, $\hat\sigma^2 = 8.56$
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
trt      14       36.2     1.48      0.1676

Intra/inter-block (random block), ML: $\hat\sigma_R^2 = 4.50$, $\hat\sigma^2 = 6.04$
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
trt      14       49.04    2.02      0.0352
Simulation
• ML or REML
• 1000 simulated data sets using C & C, p. 456 design
• $\sigma_B^2 / \sigma^2 = 0.5$
• Recorded type I error rate for $F_{trt}$
− intrablock
− REML random block
− ML random block
Variable      N      Mean
fixd_rej05    1000   0.0590000
REML_rej05    1000   0.0610000
ML_rej05      1000   0.2140000
II. LMM with estimated G and R
Bias in std error and test statistics
Kenward & Roger (Biometrics, 1997)
Consider estimable function $K'\beta$
When the variance components $\sigma$ are unknown, estimates are used to obtain $\hat V$
⇒ "naive" estimate $Var(K'\hat\beta) = K'(X'\hat V^{-1}X)^{-}K$
Using Taylor series expansion, can show
$E[K'(X'\hat V^{-1}X)^{-}K] \cong K'(X'V^{-1}X)^{-}K + \dfrac{1}{2}\sum_{i,j} \text{cov}(\hat\sigma_i, \hat\sigma_j)\, K'\,\dfrac{\partial^2 (X'V^{-1}X)^{-}}{\partial\sigma_i\,\partial\sigma_j}\,K$
II. LMM: Degrees of Freedom
Simple Case
model: $y_{ijk} = \mu + \alpha_i + b_j + (ab)_{ij} + e_{ijk}$
$b_j \sim N(0, \sigma_B^2)$; $(ab)_{ij} \sim N(0, \sigma_{AB}^2)$; $e_{ijk} \sim N(0, \sigma^2)$
ANOVA Source   EMS
A              $\sigma^2 + n\sigma_{AB}^2 + Q_A$
B              $\sigma^2 + n\sigma_{AB}^2 + na\sigma_B^2$
AB             $\sigma^2 + n\sigma_{AB}^2$
error          $\sigma^2$
II. Degrees of Freedom (2)
Trt diff: $\alpha_1 - \alpha_2$
$Var(\hat\alpha_1 - \hat\alpha_2) = \dfrac{2}{nb}(\sigma^2 + n\sigma_{AB}^2)$, estimated by $\dfrac{2}{nb}\,MS(AB)$
⇒ denominator d.f. $\nu = df(AB)$
Trt mean: $\mu + \alpha_i$
$Var(\hat\mu + \hat\alpha_i) = \dfrac{1}{nb}(\sigma^2 + n\sigma_{AB}^2 + n\sigma_B^2)$, estimated by $\dfrac{1}{nb}\left[\dfrac{a-1}{a}\,MS(AB) + \dfrac{1}{a}\,MS(B)\right]$
⇒ $\nu$ approximated via Satterthwaite's procedure
II. Satterthwaite Approximation
for linear combination of MS: $MS^* = \sum_i c_i MS_i$
approximate d.f. for $MS^*$ is
$\nu \cong \dfrac{\left(\sum_i c_i MS_i\right)^2}{\sum_i \dfrac{(c_i MS_i)^2}{df_i}}$
e.g.
$\nu \cong \dfrac{\left[\dfrac{a-1}{a}\,MSAB + \dfrac{1}{a}\,MSB\right]^2}{\dfrac{\left[\dfrac{a-1}{a}\,MSAB\right]^2}{df(AB)} + \dfrac{\left[\dfrac{1}{a}\,MSB\right]^2}{df(B)}}$
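A small numeric illustration of the Satterthwaite formula, written as a DATA step. The design sizes and mean squares are invented for the example, and the weights (a-1)/a and 1/a are the ones that reproduce the expected value of the treatment-mean variance from the EMS table, with a the number of levels of A.

```sas
/* All values below are hypothetical, chosen only to make the arithmetic easy */
data satterthwaite;
   a = 3;  b = 4;                 /* levels of A (treatments) and B (blocks) */
   ms_ab = 6;  ms_b = 12;         /* mean squares for AB and B */
   df_ab = (a-1)*(b-1);           /* 6 */
   df_b  = b-1;                   /* 3 */
   c_ab  = (a-1)/a;               /* weight on MS(AB) */
   c_b   = 1/a;                   /* weight on MS(B) */
   ms_star = c_ab*ms_ab + c_b*ms_b;
   nu = ms_star**2 / ((c_ab*ms_ab)**2/df_ab + (c_b*ms_b)**2/df_b);
   put ms_star= nu=;              /* written to the SAS log */
run;
```

With these inputs each weighted mean square equals 4, so the combined mean square is 8 and the approximate degrees of freedom work out to 64 / (16/6 + 16/3) = 8.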
II. Satterthwaite Approximation in LMM
Approximation:
$\nu \cong \dfrac{2\,[E(K'(X'\hat V^{-1}X)^{-}K)]^2}{Var(K'(X'\hat V^{-1}X)^{-}K)}$ or $\dfrac{2\,(K'(X'\hat V^{-1}X)^{-}K)^2}{Var(K'(X'\hat V^{-1}X)^{-}K)}$
For vector K (e.g. treatment contrast):
Approximate $Var(K'(X'\hat V^{-1}X)^{-}K)$ by $g'Ag$
$g = \dfrac{\partial\,(K'(X'V^{-1}X)^{-}K)}{\partial \sigma}$, where $\sigma$ = vector of (co)variance components
$A \cong 2\left\{\text{trace}\left(P\,\dfrac{\partial V}{\partial\sigma_i}\,P\,\dfrac{\partial V}{\partial\sigma_j}\right)\right\}^{-1}$
$V = ZGZ' + R$, $P = V^{-1} - V^{-1}XCX'V^{-1}$
II. GLMM Estimation
GLMM is model of $E(y \mid u)$
Link form: $g[E(y \mid u)] = \eta = X\beta + Zu$
Inverse link form: $E(y \mid u) = \mu = h(X\beta + Zu)$
More general expression of distribution of $(y \mid u)$:
$Var(y \mid u) = R = R_\mu^{1/2} A R_\mu^{1/2}$
$R_\mu^{1/2} = \text{diag}[\sqrt{V(\mu_i)}]$; A is "working correlation matrix"
Estimation: as with LMM, may choose to focus on
1. $\beta$ only: GLS equations in LMM; Generalized Estimating Equations with GLMM
2. $\beta$ and u: several approaches
II. Working Correlation
Recall Gotway & Stroup (1997) Hessian Fly Example
Gotway and Stroup considered spatial variation among e.u.
proc glimmix;
   class block entry;
   model y/n=entry;
   random intercept / subject=block;
   random _residual_ / type=sp(sph)(row col) subject=block;
[Field layout diagram: 4x4 lattice, treatments 1-16 in each of 4 replications]
MODEL sets up Binomial GLM, Logit link
RANDOM _RESIDUAL_ sets up a working correlation based on SPHERICAL semivariogram
II. Marginal (PA) vs Subject-Specific Inference
Marginal Mean: $E(y)$, Population Averaged (PA)
Conditional Mean: $E(y \mid u)$, SS (true GLMM)
Note: $E(y) = E[E(y \mid u)] = E[h(X\beta + Zu)]$
In general, cannot be further simplified
Example: log link, u ~ normal
$\mu = E(y \mid u) = \exp(X\beta + Zu)$
⇒ $E(y) = E[\exp(X\beta + Zu)] = \exp(X\beta)\,M_u(Z)$
$M_u(Z)$ is the moment generating function of u evaluated at Z
⇒ $E(y) = \exp(X\beta)\exp\left(\dfrac{\sigma_u^2}{2}\right)$
⇒ $\log[E(y)] = X\beta + \dfrac{\sigma_u^2}{2}$
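A quick worked case may help fix the size of the PA vs. SS gap under a log link. It assumes a single random intercept $u \sim N(0, \sigma_u^2)$ and made-up values $\beta_0 = 1$, $\sigma_u^2 = 1$; neither value comes from the course examples.

```latex
\begin{align*}
E(y \mid u) &= \exp(\beta_0 + u), \qquad u \sim N(0, \sigma_u^2) \\
E(y) &= \exp(\beta_0)\, E\!\left[e^{u}\right] = \exp\!\left(\beta_0 + \sigma_u^2 / 2\right) \\
\beta_0 = 1,\ \sigma_u^2 = 1: \quad
E(y \mid u = 0) &= e^{1} \approx 2.72, \qquad E(y) = e^{1.5} \approx 4.48
\end{align*}
```

So the mean for a "typical" subject (u = 0) and the population-averaged mean differ by the factor $\exp(\sigma_u^2/2) \approx 1.65$ here, even though both come from the same fitted $\beta_0$.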
II. More on PA (marginal) vs. SS
Probit-normal model:
$\Pr(y = 1 \mid u) = \Phi(X\beta + Zu)$; $u \sim N(0, G)$
can show
$E(y) = \Phi\left(\dfrac{X\beta}{\sqrt{ZGZ' + 1}}\right) \ne \Phi(X\beta)$
in LMM, the models
$X\beta + Zu + e$; $u \sim N(0, I\sigma_u^2)$; $e \sim N(0, I\sigma_e^2)$
and
$X\beta + e$; $e \sim N(0, R)$; $R = \sigma^2 \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}$
are equivalent. However, in GLMM, they are not: they yield different estimates, std. errors, etc.
II. Estimation of GLMM
• model $E(y \mid u)$
• inverse link: $E(y \mid u) = h(X\beta + Zu)$
• link: $g[E(y \mid u)] = \eta = X\beta + Zu$
• to estimate $\beta$ and u need to evaluate f(y), f(y|u)
− approximate, e.g. by Taylor series expansion
  ◦ Penalized Quasi-Likelihood (SAS %GLIMMIX)
  ◦ SAS PROC GLIMMIX (next slides)
− numerically integrate joint density
  ◦ Gauss-Hermite Quadrature (Proc NLMIXED)
− stochastically evaluate integral
  ◦ Markov chain Monte Carlo (WinBUGS; not in this course)
II. Computational Method Comparison
 GEE
− Computationally easy
− Meaning of marginal results in GLM?
 Linearized GLMM (current PROC GLIMMIX)
− uses familiar LMM analogs (but many are ad hoc & need further research)
− allows considerable R-side flexibility
− adequate for many GLMM; breaks down for certain cases (binary data)
 Integral Approximation (PROC NLMIXED)
− better approximation than Linearized GLMM
− BUT: ML only, simple G-side models only, no R-side
 Laplace
− computationally less demanding than Integral Approximation but often
“accurate enough”; same limitations as Integral Approximation
 MCMC
− simple models only; limited & temperamental software
− but in extreme cases, only way to get accurate results
Modeling
Considerations
Basic Parts of SAS Program
 DATA Step
data your_choice_of_name;
input list of variables; /* put $ after each character (alphanumeric) variable */
datalines;
data: one line per observation, one column per variable
;
 PROC Step
proc glimmix data=demo1;
class classroom trt time;
model sc=trt time trt*time
/ dist=normal ddfm=kr;
random classroom(trt);
lsmeans trt*time;
ods output lsmeans=lsm;
run;
III. Modeling Considerations
 Overdispersion
 Marginal (PA) vs Conditional (SS) models
 “Data” vs “Model” Scale
III. Model Considerations
 Variance Model & Overdispersion
 Choice of Link Function
 Choice of Distribution
 Choice of Model Effects
 Correlated Errors?
 Any of the above could show up as “overdispersion”
III. GLMM: Model Considerations
 Common dilemma
 Design, e.g. like “Hessian fly” example
 BINOMIAL data
 Recover interblock information: BLOCK random
[Figure: field layout showing the positions of entries 1-16 within blocks]
Model (Logit GLMM):
$\pi_{ij} = \frac{\exp(\eta + r_i + \tau_j)}{1 + \exp(\eta + r_i + \tau_j)}$
or equivalently
$\log\left( \frac{\pi_{ij}}{1 - \pi_{ij}} \right) = \eta + r_i + \tau_j$
Analysis reveals that the data are overdispersed
III. Hessian Fly Example
proc glimmix data=HessianFly;
class block entry;
model y/n = entry;
random block;
Fit Statistics
-2 Res Log Pseudo-Likelihood    182.21
Generalized Chi-Square          107.96
Gener. Chi-Square / DF            2.25
Evidence of overdispersion when Gener. Chi-Square / DF >> 1
III. Overdispersion
 Observed variance > variance under presumed model
 Symptom: Deviance/DFE or chi-square/DFE >> 1
 Uniquely a GLM / GLMM issue
− not a consideration with LM, LMM
− y|u ~ normal implies variance not a function of mean
 When is there an issue
− If Var(y) = f[E(y)] and
− using scale adjustment requires unrealistic assumptions
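The chi-square/DFE symptom can be illustrated with a plain GLM-style Pearson statistic. This is a sketch on made-up binomial counts, not the Hessian fly data, and it is not the pseudo-data computation GLIMMIX performs; the function name `pearson_chisq_per_df` is hypothetical.

```python
def pearson_chisq_per_df(successes, trials, fitted_probs, n_params):
    """Pearson chi-square for binomial counts and its ratio to residual DF."""
    chisq = 0.0
    for y, n, p in zip(successes, trials, fitted_probs):
        chisq += (y - n * p) ** 2 / (n * p * (1.0 - p))
    df = len(successes) - n_params
    return chisq, df, chisq / df

# Hypothetical data: 6 units, 10 trials each, far more spread than binomial allows
successes = [1, 9, 2, 8, 0, 10]
trials = [10] * 6
pooled = sum(successes) / sum(trials)   # intercept-only fit: common probability
fitted = [pooled] * 6

chisq, df, ratio = pearson_chisq_per_df(successes, trials, fitted, n_params=1)
```

Here the ratio is 8, far above 1. For comparison, the Hessian fly fit statistics shown earlier (107.96 and 2.25) are consistent with roughly 48 residual degrees of freedom.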
III. Common fix for Overdispersion
Multiply variance by scale parameter. Here: $\phi\,\mu(1 - \mu)$
proc glimmix data=HessianFly;
class block entry;
model y/n= entry;
random block;
random _residual_;

Covariance Parameter Estimates (with scale parameter)
Cov Parm        Subject   Estimate   Standard Error
Intercept       block     0          .
Residual (VC)             2.2668     0.4627
Residual (VC) estimates $\phi$
Issue: not a true likelihood

vs. Covariance Parameter Estimates w/o $\hat\phi$
Cov Parm        Subject   Estimate   Standard Error
Intercept       block     0.01116    0.03116
Impact of Scale Parameter on Inference
Type III Tests of Fixed Effects (no scale parameter)
Effect   Num DF   Den DF   F Value   Pr > F
entry    15       45       6.90      <.0001

Type III Tests of Fixed Effects (with scale parameter adjustment)
Effect   Num DF   Den DF   F Value   Pr > F
entry    15       45       3.03      0.0020

failure to account for overdispersion tends to
increase type I error rate
but is this the best way to address the problem?
III. Mean – Variance Overdispersion Models
$Var(y) = f(\mu, \phi)$
No scale parameter: $\mu(1 - \mu)$, $\mu$ (binomial, Poisson)
Nonlinear scale parameter: e.g. $\mu(1 - \mu)/(1 + \phi)$ (negative binomial, gen. Poisson, beta)
Linear scale parameter: $\phi\mu^2$ (gamma, inverse Gaussian)
No mean parameter: $\phi$ (normal)
III. Marginal or Conditional Formulation
 For many models (notably LMM) there are
equivalent forms
− conditional (mixed, SS) model
− marginal (PA) model
− lead to the same marginal log-likelihood
 Distinction results from
− G-side model; random model effects
− R-side model; marginal model
III. Example: variance component (G-side) vs.
Compound symmetry (R-side)
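$y_{ij} = \mu + r_i + \tau_j + e_{ij}$
$r_i$ i.i.d. $N(0, \sigma_R^2)$
$e_{ij}$ i.i.d. $N(0, \sigma^2)$
$$Var(Y_i) = \sigma_R^2 J + \sigma^2 I = \begin{bmatrix} \sigma_R^2 + \sigma^2 & \sigma_R^2 & \cdots & \sigma_R^2 \\ \sigma_R^2 & \sigma_R^2 + \sigma^2 & \cdots & \sigma_R^2 \\ \vdots & & \ddots & \vdots \\ \sigma_R^2 & \sigma_R^2 & \cdots & \sigma_R^2 + \sigma^2 \end{bmatrix}$$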
III. Compound Symmetry Equivalent
Let $\sigma_C^2 = \sigma_R^2 + \sigma^2$ and $\rho = \frac{\sigma_R^2}{\sigma_R^2 + \sigma^2}$
Model: $y_{ij} = \mu + \tau_j + E_{ij}$
$Var(E_{ij}) = \sigma_C^2$
$Corr(E_{ij}, E_{kl}) = \rho$ if $i = k$ (same block), 0 otherwise
$$Var(Y_i) = \sigma_C^2 \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}$$
Models equivalent if $\rho \geq 0$
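The equivalence of the two covariance forms in the normal-errors case can be verified directly. The sketch below builds both matrices for one block of four observations; the variance values 2.0 and 3.0 are hypothetical, as are the function names.

```python
def variance_component_cov(n, var_block, var_resid):
    """G-side form: sigma_R^2 * J + sigma^2 * I."""
    return [[var_block + (var_resid if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def compound_symmetry_cov(n, var_total, rho):
    """R-side form: sigma_C^2 times a matrix with 1 on the diagonal, rho elsewhere."""
    return [[var_total * (1.0 if i == j else rho) for j in range(n)]
            for i in range(n)]

var_block, var_resid = 2.0, 3.0          # hypothetical sigma_R^2 and sigma^2
var_total = var_block + var_resid        # sigma_C^2
rho = var_block / var_total              # intraclass correlation

g_side = variance_component_cov(4, var_block, var_resid)
r_side = compound_symmetry_cov(4, var_total, rho)
max_abs_diff = max(abs(g_side[i][j] - r_side[i][j])
                   for i in range(4) for j in range(4))
```

The mapping only runs one way: a variance component cannot be negative, so a CS fit with negative rho has no G-side counterpart.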
III. G-side / R-side
G-side (both forms specify the same model)
proc glimmix;
class block entry;
model y/n=entry;
random block;
proc glimmix;
class block entry;
model y/n=entry;
random intercept /
subject=block;
R-side model
proc glimmix;
class block entry;
model y/n=entry;
random _residual_ / type=CS subject=block;
proc mixed;
class block entry;
model y=entry;
repeated / type=CS subject=block;
III. Variance Component vs CS in GLMM
 Variance component model is GLMM
 CS model is GEE
 They are not equivalent
Conditional model:
logit$(\pi_{ij}) = \eta + r_i + \tau_j$
$y_{ij} \mid u_i \sim$ Binomial with probability $\pi_{ij} = \frac{\exp(\eta + r_i + \tau_j)}{1 + \exp(\eta + r_i + \tau_j)}$
marginal distribution is $p(y_{ij}) = \int p(y_{ij} \mid u_i)\, p(u_i)\, du_i$
Marginal model:
logit$(\pi_{ij}) = \eta + \tau_j$
with working correlation matrix defined by CS form
$y_{ij}$ is NOT Binomial; it merely borrows the Binomial-like quasi-likelihood form
Does such a distribution actually exist?
III. Conditional vs. Marginal Results
Conditional
Fit Statistics
Gener. Chi-Square / DF    2.27
Covariance Parameter Estimates
Cov Parm        Subject   Estimate
Intercept       block     0
Residual (VC)             2.2668
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
entry    15       45       3.03      0.0020

Marginal
Fit Statistics
Gener. Chi-Square / DF    2.30
Covariance Parameter Estimates
Cov Parm        Subject   Estimate
CS              block     -0.03247
Residual                  2.2992
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
entry    15       45       2.99      0.0023

which is right?
• fit statistic?
• can you simulate data using mechanism implied by model?
III. Marginal or Conditional?
 How to choose?
− Conditional: G-side; Marginal: R-side
− Fit statistic? (may help; may deceive)
 General recommendation
− G-side formulation preferred for non-normal data
− G-side effects operate inside the link function & hence
always lead to valid conditional & marginal distributions
− R-side effects operate outside the link function
− for non-normal data, models implied by R-side effects
may be vacuous
III. Impact of Model Effects
 Back to Hessian Fly Data
 Incomplete Block Design
 Try more appropriate model
proc glimmix;
class inc_block entry;
model y/n=entry;
random intercept /
subject=inc_block;

Fit Statistics
Gener. Chi-Square / DF    1.41
Covariance Parameter Estimates
Cov Parm    Subject     Estimate
Intercept   inc_block   0.4971
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
entry    15       33       6.33      <.0001
III. Inference
 After model fit & estimation, inference begins
 Also want at least some of following
 comparisons among groups (trt, entry...)
− test hypotheses
− obtain confidence intervals
− obtain predictions
− further model checking
III. Scale issue for GLM, GLMM
 For GLM, GLMM there are two “natural scales”
− linear (or model) scale (e.g. logit)
− data scale
 May be other scales, depending on context
− odds
− odds ratio
III. Choosing the Scale
 Example: Hessian Fly – binomial dist, logit link
 Data: measured as 0/1; per e.u. as Y/N
 Main focus: entry effect on P{indiv resp = 1}
Link: $\eta_{ij} = \log\left( \frac{\pi_{ij}}{1 - \pi_{ij}} \right) = \eta + r_i + \tau_j$
Inverse Link: $\hat\pi_{ij} = \frac{\exp(\hat\eta_{ij})}{1 + \exp(\hat\eta_{ij})}$
III. Scale and Inference
Main tool of inference: estimable functions
e.g. entry "LS Mean": $\hat\eta + \hat\tau_j$
entry difference: $\hat\tau_j - \hat\tau_{j'}$
These are estimated on the "linear" or "model" scale
can denote: $\hat\eta_j$ or $\hat\eta_j - \hat\eta_{j'}$
Main focus of inference: on data scale
e.g. $P(\text{resp} = 1 \mid \text{entry } j) = \hat\pi_j$
entry difference between probabilities: $\hat\pi_j - \hat\pi_{j'}$
Require "inverse linking": $\hat\pi_j = \frac{\exp(\hat\eta_j)}{1 + \exp(\hat\eta_j)}$
III. Inverse Linking
 Estimation occurs on model scale
 But reporting typically must occur on data scale
Estimate: $\hat\eta = k'\hat\beta$
Std error: s.e.$(\hat\eta) = \sqrt{k'\, Var(\hat\beta)\, k}$
Confidence interval: $\hat\eta \pm z_{\alpha/2} \cdot$ s.e.$(\hat\eta)$
Inverse linked estimate: $\hat\pi = h(\hat\eta)$, e.g. $\frac{\exp(\hat\eta)}{1 + \exp(\hat\eta)}$
Inverse linked std error (“delta” rule): s.e.$(\hat\pi) = \frac{\partial h(\hat\eta)}{\partial \eta} \cdot$ s.e.$(\hat\eta)$
Inverse linked confidence interval: $[h(\text{LowerB}), h(\text{UpperB})]$
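The inverse-linking steps can be written out as a small function. This is a sketch for the logit link only; the inputs (estimate 0.0, standard error 0.5, critical value 1.96) are illustrative and the function names are hypothetical.

```python
import math

def inverse_logit(eta):
    """Inverse link h(eta) for the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def to_data_scale(eta_hat, se_eta, critical_value):
    """Move a model-scale estimate, s.e., and interval to the data scale."""
    mean = inverse_logit(eta_hat)
    # delta rule: for the logit link, dh/d(eta) = pi * (1 - pi)
    se_mean = mean * (1.0 - mean) * se_eta
    # interval endpoints are inverse linked, not built from se_mean
    lower = inverse_logit(eta_hat - critical_value * se_eta)
    upper = inverse_logit(eta_hat + critical_value * se_eta)
    return mean, se_mean, lower, upper

mean, se_mean, lower, upper = to_data_scale(0.0, 0.5, 1.96)
```

Because the interval is transformed endpoint by endpoint, it always stays inside (0, 1) and is generally not symmetric about the inverse-linked estimate.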
III. Model & Data Scale – Hessian Fly Example
Solutions for Fixed Effects
Effect      entry   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept           -1.9057    0.4886           15   -3.90     0.0014
entry       1        3.8001    0.6327           33    6.01     <.0001
entry       2        3.4821    0.6186           33    5.63     <.0001

Estimates
                 linear or model scale                        data scale
Label            Estimate   Standard Error   Lower     Upper    Mean     Standard Error Mean   Lower Mean   Upper Mean
entry 1          1.8944     0.4608            0.9568   2.8319   0.8693   0.05237               0.7225       0.9444
entry 2          1.5765     0.4321            0.6974   2.4555   0.8287   0.06133               0.6676       0.9210
diff entry 1-2   0.3179     0.5793           -0.8607   1.4965   0.5788   0.1412                0.2972       0.8171

which of these data scale values make NO sense?
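The data-scale columns can be reproduced from the model-scale columns, which also shows what happens in the difference row. The sketch below uses the rounded estimates from the Estimates table, so agreement is to about four decimals; variable names are hypothetical.

```python
import math

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# (model-scale estimate, model-scale standard error) from the Estimates table
rows = {
    "entry 1": (1.8944, 0.4608),
    "entry 2": (1.5765, 0.4321),
    "diff entry 1-2": (0.3179, 0.5793),
}

data_scale = {}
for label, (estimate, std_err) in rows.items():
    mean = inverse_logit(estimate)
    data_scale[label] = (mean, mean * (1.0 - mean) * std_err)  # delta rule

# Difference between the two inverse-linked probabilities
prob_difference = data_scale["entry 1"][0] - data_scale["entry 2"][0]
# Exponentiating a difference of logits gives an odds ratio
odds_ratio = math.exp(rows["diff entry 1-2"][0])
```

The inverse-linked "Mean" of the difference row (0.5788) is neither an entry probability nor the difference between the two probabilities, which is about 0.04. On the logit scale a difference is more naturally reported as an odds ratio, here about 1.37.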
on to GLIMMIX
IV. GLIMMIX Syntax
 SAS software for GLMs & Mixed models
 Basic GLIMMIX syntax
 Similarities & Differences vs existing SAS Procs
 New features
IV. SAS Software for Linear Models
 LM
− Proc GLM, MIXED
− Proc GLIMMIX
 GLM
− Proc GENMOD
− Proc GLIMMIX
− Proc NLMIXED
 LMM
− Proc MIXED
− Proc GLIMMIX
 GLMM
− Proc GLIMMIX
− Proc NLMIXED
IV. PROC GLIMMIX Syntax
 What’s familiar (from MIXED & GENMOD)
− CLASS
− MODEL
− DIST and LINK options in MODEL (like GENMOD)
− RANDOM (for G-side)
− ESTIMATE, CONTRAST, LSMEANS
− ODS
 What’s new or different
− RANDOM _RESIDUAL_ (replaces REPEATED for R-side)
− LSMESTIMATE
− new options in LSMEANS (e.g. better options for factorial exp)
− NLOPTIONS
− Model diagnostics
IV. Relation between GLMM Structure and
GLIMMIX Code
GLMM: $g(\mu \mid u) = \eta = X\beta + Zu$
$y \mid u \sim$ dist$(\mu, R)$
$Var(u) = G$
$Var(y \mid u) = V^{1/2} P V^{1/2}$
proc glimmix;
class variables;
model <resp>=<fixed effects> /dist= link= ;
random <g-side effects> / <options>;
random _residual_ / type= subject= ;
run;
IV. NLOPTIONS Statement
 New Statement in GLIMMIX
 Controls Optimization technique, Line Search
Method, number of Iterations, etc
proc glimmix;
class id a b;
model y=a b a*b;
random _residual_ / type=cs subject=id(a);
nloptions tech=nrridg maxiter=100;
TECH=NRRIDG causes GLIMMIX to use
MIXED computing algorithm (good for comparison...)
IV. Programming Statements
 Similar to GENMOD, NLIN, NLMIXED
 GLIMMIX supports statements using DATA step syntax
 Use to transform variables, define quantities to output,
user-defined link, variance, etc.
 For example....
proc glimmix;
class block entry;
pct=y/n;
model pct=entry;
random intercept / subject=block;
IV. Some GLIMMIX Defaults Useful to Know
 In MODEL statement
− response Y= NORMAL distribution & IDENTITY link
− response Y/N= BINOMIAL distribution and LOGIT link
 For distributions without scale parameter in
variance function (e.g. Binomial, Poisson)
− no scale parameter assumed (unlike %GLIMMIX macro)
− obtain scale parameter with RANDOM _RESIDUAL_
 Optimization method automatically matched
based on DISTRIBUTION & LINK
IV. Estimation Methods in PROC GLIMMIX
 Defaults depend on model, distribution, and link
 May be altered with METHOD= option
− in PROC statement
 METHOD= options
− variations on pseudo-likelihood
                                   subject specific              population averaged
                                   (conditional or mixed) model  (marginal) model
Restricted obj fct (like REML)     RSPL                          RMPL
Unrestricted obj fct (like ML)     MSPL                          MMPL
IV. Defaults & Methods (continued)
 GLMM Default Method is RSPL
 For LMM, this is REML
− GLIMMIX uses different algorithm than MIXED, TECH=NRRIDG
uses MIXED algorithm
− you can get slightly different numbers with MIXED/GLIMMIX
 METHOD=MSPL yields ML estimates
 Methods appear in literature as MPL, PQL
 Gaussian adaptive quadrature and Laplace
algorithms will be added in V 9.2
− not available yet & not discussed here
IV. Examples
Poisson regression, Log link
proc glimmix;
class id;
model y=x / dist=poisson;
run;
Poisson regression, Log link, add scale parameter
proc glimmix;
class id;
model y=x / dist=poisson;
random _residual_;
run;
Poisson regression, Log link, change variance function
proc glimmix;
class id;
_variance_=_mu_*_mu_;
model y=x / dist=poisson;
run;
IV. “GLM-mode” vs “GLMM-mode”
 Use following trick to get GLM (GENMOD) type
model via pseudo-likelihood
“GLM-mode”: max likelihood
proc glimmix;
class id;
model y=x / dist=poisson;
random _residual_;
“GLMM-mode”: pseudo likelihood; this is a GEE with indep working corr
proc glimmix;
class id;
model y=x / dist=poisson;
random _residual_ / subject=id;
IV. Distributions supported by GLIMMIX
Discrete
− Binary
− Binomial
− Poisson
− Geometric
− Negative Binomial
− Multinomial (Nominal, Ordinal)
Continuous
− Beta
− Normal
− Lognormal
− Gamma
− Exponential
− Inverse Gaussian
− Shifted T
IV. MIXED to GLIMMIX – R-side
proc mixed;
class loc id trt time;
model y=trt | time;
random loc;
repeated / type=ar(1) subject=id(loc);
proc glimmix;
class loc id trt time;
model y=trt | time;
random intercept / subject=loc;
random _residual_ / type=ar(1) subject=id(loc);
when you use GLIMMIX, you will notice it is much
fussier about SUBJECT= statement when nested
subject structure is present (MIXED more likely to
let you get away with ignoring SUBJECT)
IV. More on R-side
proc mixed;
class loc id trt time;
model y=trt | time;
random loc;
repeated time / type=ar(1) subject=id(loc);
alternative form of random residual, e.g. when time points are missing, unsorted, etc.
proc glimmix;
class loc id trt time;
model y=trt | time;
random intercept / subject=loc;
random time / type=ar(1) subject=id(loc) residual;
** vs random _residual_ / type=ar(1) subject=id(loc);
IV. MIXED to GLIMMIX - Estimate
 MIXED: single row ESTIMATE statements
proc mixed;
class trt;
model y=trt a x trt*a trt*x;
estimate '10 3' trt 1 -1 trt*a 10 -10 trt*x 3 -3;
estimate '20 3' trt 1 -1 trt*a 20 -20 trt*x 3 -3;
estimate '30 3' trt 1 -1 trt*a 30 -30 trt*x 3 -3;
 GLIMMIX: multi-row with multiplicity adjustment
proc glimmix;
class trt;
model y=trt a x trt*a trt*x;
estimate '10 3' trt 1 -1 trt*a 10 -10 trt*x 3 -3,
'20 3' trt 1 -1 trt*a 20 -20 trt*x 3 -3,
'30 3' trt 1 -1 trt*a 30 -30 trt*x 3 -3 / adjust=scheffe;
IV. MIXED vs. GLIMMIX - LSMEANS
 Example: Factorial
PROC MIXED;
class A B;
model y=A|B;
lsmeans A B/diff;
lsmeans A*B/diff slice=(A B);
− DIFF gives you table of all possible differences
− SLICE tests, but does not estimate, simple effects: A given B, vice versa
PROC GLIMMIX;
class A B;
model y=A|B;
lsmeans A B/diff lines;
lsmeans A*B / slice=(A B) slicediff=(A B);
− LINES gives multiple range display users love
− SLICEDIFF restricts A*B diffs to actual simple effects, e.g. A1-A2|Bj
IV. GLIMMIX – LSMEANS (1) Main Effects
proc glimmix data=AxB_example;
class block A B;
model y=A|B/ddfm=satterth;
random block block*B;
lsmeans A B/diff lines;
lsmeans A*B/slicediff=(A B);
run;

B Least Squares Means
B   Estimate   Standard Error   DF      t Value   Pr > |t|
1   18.5300    1.3226           13.69   14.01     <.0001
2   26.5200    1.3226           13.69   20.05     <.0001
4   28.2800    1.3226           13.69   21.38     <.0001
8   25.3000    1.3226           13.69   19.13     <.0001

T Grouping for B Least Squares Means
LS-means with the same letter are not significantly different.
B   Estimate   Grouping
4   28.2800    A
2   26.5200    A
8   25.3000    A
1   18.5300    B
IV. GLIMMIX – LSMEANS (2) Simple Effects
proc glimmix data=AxB_example;
class block A B;
model y=A|B/ddfm=satterth;
random block block*B;
lsmeans A B/diff lines;
lsmeans A*B/slicediff=(A B);
run;

A*B Least Squares Means
A   B   Estimate   Standard Error
r   1   20.0000    1.4769
r   2   27.8400    1.4769
r   4   28.1800    1.4769
r   8   24.8000    1.4769
s   1   17.0600    1.4769
s   2   25.2000    1.4769
s   4   28.3800    1.4769
s   8   25.8000    1.4769

Simple Effect Comparisons of A*B Least Squares Means By B
Simple Effect Level   A   _A   Estimate   Standard Error   DF   t Value   Pr > |t|
B 1                   r   s     2.9400    1.3144           16    2.24     0.0399
B 2                   r   s     2.6400    1.3144           16    2.01     0.0618
B 4                   r   s    -0.2000    1.3144           16   -0.15     0.8810
B 8                   r   s    -1.0000    1.3144           16   -0.76     0.4578
IV. GLIMMIX – LSMEANS (3)
 lsmeans a*b / diff; gave you this
Differences of A*B Least Squares Means
A   B   _A   _B   Estimate   Standard Error   DF      t Value   Pr > |t|
r   1   r    2    -7.8400    1.8796           19.49   -4.17     0.0005
r   1   r    4    -8.1800    1.8796           19.49   -4.35     0.0003
r   1   r    8    -4.8000    1.8796           19.49   -2.55     0.0192
r   1   s    1     2.9400    1.3144           16       2.24     0.0399
r   1   s    2    -5.2000    1.8796           19.49   -2.77     0.0121
r   1   s    4    -8.3800    1.8796           19.49   -4.46     0.0003
r   1   s    8    -5.8000    1.8796           19.49   -3.09     0.0060
r   2   r    4    -0.3400    1.8796           19.49   -0.18     0.8583
r   2   r    8     3.0400    1.8796           19.49    1.62     0.1219
r   2   s    1    10.7800    1.8796           19.49    5.74     <.0001
etc
IV. GLIMMIX -- LSMESTIMATE
Example: Simple Effect in 2-Factor Factorial
Model: $y_{ijk} = \mu_{ij} + e_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + e_{ijk}$
Simple Effect, e.g. A|B: $\mu_{ij} - \mu_{i'j} = \alpha_i - \alpha_{i'} + \alpha\beta_{ij} - \alpha\beta_{i'j}$
not estimable:
estimate 'A|B' a*b 1 0 0 0 -1 0 0 0;
must write
estimate 'A|B' a 1 -1 a*b 1 0 0 0 -1 0 0 0;
new GLIMMIX alternative
lsmestimate a*b 'A|B' 1 0 0 0 -1 0 0 0;
 Defined on $\mu_{ij}$, not on model effects
 Allows multiple LSMESTIMATES
& ADJUST= for multiplicity
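The difference between writing coefficients on model effects and on cell means can be seen with a toy 2 x 4 factorial. All parameter values below are invented for illustration, and the sketch only mimics the arithmetic of the coefficient vectors, not what SAS does internally.

```python
# Hypothetical effects for a 2 x 4 factorial: mu + alpha_i + beta_j + alphabeta_ij
mu = 10.0
alpha = [1.0, -1.0]
beta = [0.5, 0.0, -0.5, 2.0]
alpha_beta = [[0.3, -0.2, 0.1, 0.0],
              [-0.4, 0.6, 0.0, 0.2]]

def cell_mean(i, j):
    return mu + alpha[i] + beta[j] + alpha_beta[i][j]

# Coefficients in the order used on the slide: a*b 1 0 0 0 -1 0 0 0
ab_coef = [[1, 0, 0, 0], [-1, 0, 0, 0]]
a_coef = [1, -1]

# ESTIMATE with a*b coefficients only: applied to the interaction effects alone
estimate_ab_only = sum(ab_coef[i][j] * alpha_beta[i][j]
                       for i in range(2) for j in range(4))

# ESTIMATE with both a and a*b coefficients
estimate_full = (sum(a_coef[i] * alpha[i] for i in range(2))
                 + estimate_ab_only)

# LSMESTIMATE: the same 1 0 0 0 -1 0 0 0 applied directly to the cell means
lsmestimate = sum(ab_coef[i][j] * cell_mean(i, j)
                  for i in range(2) for j in range(4))
```

The full ESTIMATE and the LSMESTIMATE both give the simple effect of A at the first level of B; the a*b-only version drops the main-effect part of that difference.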
IV. ODS Graphics With GLIMMIX
 Not available with MIXED
ods html;
ods graphics on;
ods select MeanPlot;
proc glimmix
data=AxB_example;
class block A B;
model y=A|B/ddfm=satterth;
random block block*B;
lsmeans A*B/plot=MeanPlot
(sliceby=A join cl);
run;
ods graphics off;
ods html close;
run;
Factorial Treatment Design
 Treatment Design vs Experiment (or study)
Design
 Factorial is type of treatment design
 Factor A, a levels; Factor B, b levels; etc
 Main inference tools:
− simple effects; e.g. method effect | variety j
− interaction; i.e. simple effects equal for all j
− main effects
Model: $y_{ijk} = \mu_{ij} + E_{ijk}$
$y_{ijk}$ = kth obs on ijth A × B combination
$\mu_{ij}$ = ijth A × B mean
$E_{ijk}$ is generic random structure; specific form depends on design
Simple effect:
A | B_j: $\mu_{ij} - \mu_{i'j}$
B | A_i: $\mu_{ij} - \mu_{ij'}$
Interaction:
equal simple effects ⇔ no interaction
e.g. $\mu_{ij} - \mu_{i'j} = \mu_{ij'} - \mu_{i'j'}$
Main effect: $\bar\mu_{i\cdot} - \bar\mu_{i'\cdot}$ or $\bar\mu_{\cdot j} - \bar\mu_{\cdot j'}$
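These definitions can be applied to the A*B least squares means reported earlier in the GLIMMIX LSMEANS output (A = r, s; B = 1, 2, 4, 8). The sketch below is plain arithmetic on those cell means; the variable names are hypothetical.

```python
# A*B least squares means from the earlier output table
cell_means = {
    ("r", 1): 20.00, ("r", 2): 27.84, ("r", 4): 28.18, ("r", 8): 24.80,
    ("s", 1): 17.06, ("s", 2): 25.20, ("s", 4): 28.38, ("s", 8): 25.80,
}
b_levels = [1, 2, 4, 8]

# Simple effect of A within each level of B: mu_rj - mu_sj
simple_a_given_b = {b: cell_means[("r", b)] - cell_means[("s", b)] for b in b_levels}

# Main-effect (marginal) means of B: average over the levels of A
b_main_means = {b: (cell_means[("r", b)] + cell_means[("s", b)]) / 2 for b in b_levels}

# Main effect of A: the average of the simple effects
a_main_difference = sum(simple_a_given_b.values()) / len(b_levels)

# One interaction contrast: does the A effect at B=1 equal the A effect at B=8?
interaction_contrast = simple_a_given_b[1] - simple_a_given_b[8]
```

The simple effects reproduce the SLICEDIFF estimates (2.94, 2.64, -0.20, -1.00) and the B means reproduce the B least squares means (18.53, 26.52, 28.28, 25.30).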
GLIMMIX Features
 Can estimate / test
− simple effects
− main effect
− depending on which is appropriate
 ODS graphics can graph / plot effects of interest
 SLICE can focus on simple effects in presence
of interaction
 SLICEDIFF can estimate simple effects of
interest
Modeling & Design
But My Study is not a Designed Experiment!
 Comparative Study: any study whose purpose is
to compare treatments or conditions (includes
assessing change over time). Includes “quasi-experiments” and
surveys with comparative objectives, as well as designed
experiments. Design principles apply to all!
 Most modeling issues are study design issues
 Most modeling errors result from poor
understanding of design principles
If you are
modeling, you
need to
understand
design
principles!!
Key Terms in Design
 Treatment Design: factors and levels & how they are
structured in the study. E.g factorial, planned obs over time
 Experiment Design: Organization of experimental units
(e.g into matched pairs, blocks, strata, clusters); plan by which
they are assigned to treatment levels.
 Experimental Unit: (e.u.) Smallest entity to which
treatment levels (or treatment combinations) are independently
assigned. E.U.s are legitimate units of replication
 Sampling Unit: Unit on which measurement is taken. May
be e.u. itself or subset of e.u. A.k.a. pseudo-replicate
 Pseudo-replication: use of S.U.s as units of replication;
common form of inappropriate design & analysis
Factorial & Experiment Designs
 idea: experimental unit is smallest entity to which
treatment level independently applied
 e.u. may be different size for different factors
 e.g. from SAS for Mixed Models, Section 4.6
− 2 type × 3 dose example
 dose applied to cage; type to animal in cage
 e.u. for dose: cage with 2 animals
 e.u. for type (and dose × type): animal
 ⇒ split-plot
 many variations (including repeated measures)
Adding to Model
[Figure: two schools, one per Treatment (Participate in Prof Devel; Do Not Participate); within each school, classrooms receive the exp or std curriculum; students are within classrooms]
V. Factorial Treatment Designs
 Basic Features
 Come in Many (many, many) design forms
 Experiment design & “quasi-experiment” or survey
“study design”
− key to deciding what’s random & what’s fixed
− non-mixed (LM and GLM only) software is
UNACCEPTABLE for these types of problems
 Includes repeated measures (change... growth)
 Normal and non-normal data
Type x Dose Design
[Figure: layout with Dose 1, Dose 2, Dose 3, each containing Type 1 and Type 2]
or...
Dose = Professional Development Trt
Type = Curriculum
Figure 4.1 Possible design layouts for 2×2 factorial experiment
From SAS for Mixed Models
Treatment design: 2 x 2 factorial
Treatment codes: A1B1, A1B2, A2B1, A2B2
Experiment design: many, many variations. Here are 7 (seven):
a. Completely Randomized
b. Randomized complete block (Blk 1 to Blk 4)
c. Row-Column (Latin Square) (row 1 to row 4, col 1 to col 4)
d. Split-plot 1, whole plot completely randomized
e. Split-plot 2, whole plot in randomized complete blocks (Blk 1 to Blk 4)
f. Split-block, a.k.a. strip-split-plot (Blk 1 to Blk 4)
g. Split-plot 3, whole plot in row-column (2 Latin squares)
[Figure: layout diagrams for designs a to g]
Even with 2 x 2 factorial
these seven are not all
we’re just getting started!
Split Block Example
[Figure: microchip wafer, with Side (L, R) and Position (same meaning both sides)]
Choosing right model – step 1
What is the experimental unit?
figure   design           block?     e.u. for A   e.u. for B   e.u. for A*B
4.1.a    CRD              no         eu(A*B)      eu(A*B)      eu(A*B)
4.1.b    RCB              yes        blk*A*B      blk*A*B      blk*A*B
4.1.c    LS               row col    row*col      row*col      row*col
4.1.d    split plot CR    no         eu(A)        B*eu(A)      B*eu(A)
4.1.e    split plot RCB   yes        blk*A        blk*A*B      blk*A*B
4.1.f    split-block      yes        blk*A        blk*B        blk*A*B
4.1.g    split-plot LS    row col    row*col      row*col*B    row*col*B
Common Models in PROC MIXED/GLIMMIX
Design: SAS class, model and random statements
CRD (Figure 4.1.a)
class eu a b;
model y=a b a*b;
RCB (Fig 4.1.b)
class block a b;
model y=a b a*b;
random block; or random intercept / subject=block;
Latin Square (4.1.c)
class row col a b;
model y= a b a*b;
random row col;
Split-plot CR (4.1.d)
class eu a b;
model y=a b a*b;
random eu(a);
Split-plot RCB (4.1.e)
class block a b;
model y=a b a*b;
random block block*a;
Split-block (4.1.f)
class block a b;
model y=a b a*b;
random block block*a block*b;
Split-plot LS (4.1.g)
class row col a b;
model y=a b a*b;
random row col row*col;
(or, equivalently random row col row*col*a;)
MODEL ⇔ treatment design; RANDOM ⇔ experiment (study) design
Model for split-plot: school-classroom example
Strategy:
1. list factor effects
2. list e.u. for that effect
3. each e.u. ⇔ a random model effect
e.g.
Effect          e.u.
prof dev trt    school
curriculum      classroom(school)
p.d × curr      classroom(school)
⇒ model: $y_{ijk} = \mu_{ij} + s(t)_{ik} + e_{ijk}$
$\mu_{ij} = \mu + p_i + c_j + pc_{ij}$ or alternative expression
$E_{ijk} = school(trt)_{ik} + e_{ijk}$
note! student is sampling unit (not an e.u.)
Model for split-plot – Dose x Type example
Strategy:
1. list factor effects
2. list e.u. for that effect
3. each e.u. ⇔ a random model effect
e.g.
Effect         e.u.
dose           block × dose
type           block × dose × type
dose × type    block × dose × type
⇒ model: $y_{ijk} = \mu_{ij} + block_k + (b \times d)_{ik} + e_{ijk}$
$\mu_{ij} = \mu + d_i + t_j + dt_{ij}$ or alternative expression
$E_{ijk} = block_k + (b \times d)_{ik} + e_{ijk}$
note! bloc × type NOT in model (not an e.u.)
Conventional ANOVA
Source                        EMS
bloc
dose                          $\sigma_S^2 + t\sigma_W^2 + Q_D$
bloc × dose (w.p. error †)    $\sigma_S^2 + t\sigma_W^2$
type                          $\sigma_S^2 + Q_T$
dose × type                   $\sigma_S^2 + Q_{DT}$
s.p. error ††                 $\sigma_S^2$
† a.k.a. between subjects error
†† a.k.a. within subjects error
Standard errors of various terms
Main effects
of dose: $\bar\mu_{i\cdot} - \bar\mu_{i'\cdot}$, Var $= 2(\sigma_S^2 + t\sigma_W^2)/(rt)$
of type: $\bar\mu_{\cdot j} - \bar\mu_{\cdot j'}$, Var $= 2\sigma_S^2/(rd)$
Simple effects
type|dose_i: $\mu_{ij} - \mu_{ij'}$, Var $= 2\sigma_S^2/r$
dose|type_j: $\mu_{ij} - \mu_{i'j}$, Var $= 2(\sigma_S^2 + \sigma_W^2)/r$
Note: you can use MS() directly except for dose|type_j
Programming in Proc GLIMMIX
proc glimmix;
class bloc type dose;
model y=type|dose;
random intercept dose / subject=bloc;
** i.e. random bloc bloc*dose;
lsmeans type*dose /
diff lines slicediff=(type dose) slice=(type dose);
ods output lsmeans=lsm;
run;
− DIFF: all possible mean differences
− LINES: with “MRT lines”
− SLICEDIFF: simple effect differences only
− SLICE: simple effect tests only
You can use ODS to output LSMEANS and GPLOT
for interaction plots, or use ODS graphics directly
Type x Dose: Selected Output
Covariance Parameter Estimates
Cov Parm    Subject   Estimate   Standard Error
Intercept   block     2.0735     2.7320
dose        block     4.5132     2.8291
Residual              4.3189     1.5270

Type III Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
type        1        16        2.78     0.1151
dose        3        12       13.63     0.0004
type*dose   3        16        2.29     0.1176
Type x Dose LSMeans
type*dose Least Squares Means
type   dose   Estimate   Standard Error   DF      t Value   Pr > |t|
r      1      20.0000    1.4769           20.23   13.54     <.0001
r      2      27.8400    1.4769           20.23   18.85     <.0001
r      4      28.1800    1.4769           20.23   19.08     <.0001
r      8      24.8000    1.4769           20.23   16.79     <.0001
s      1      17.0600    1.4769           20.23   11.55     <.0001
s      2      25.2000    1.4769           20.23   17.06     <.0001
s      4      28.3800    1.4769           20.23   19.22     <.0001
s      8      25.8000    1.4769           20.23   17.47     <.0001
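The standard errors in the Type x Dose output can be recovered from the covariance parameter estimates and the variance formulas for the split plot. The sketch below assumes r = 5 blocks, which is inferred from the denominator degrees of freedom (12 for dose, 16 for type) rather than stated on the slides; variable names are hypothetical.

```python
import math

# Covariance parameter estimates from the Type x Dose output
var_block = 2.0735        # Intercept, subject=block
var_whole_plot = 4.5132   # dose, subject=block (block*dose, whole-plot error)
var_split_plot = 4.3189   # Residual (split-plot error)
r = 5                     # number of blocks (assumption inferred from DF)

# s.e. of a type*dose LS mean: all three variance components contribute
se_cell_mean = math.sqrt((var_block + var_whole_plot + var_split_plot) / r)

# simple effect type | dose: both means share a whole plot, so only split-plot error remains
se_type_within_dose = math.sqrt(2.0 * var_split_plot / r)

# simple effect dose | type: different whole plots, so both error terms contribute
se_dose_within_type = math.sqrt(2.0 * (var_split_plot + var_whole_plot) / r)
```

These match the reported values 1.4769, 1.3144, and 1.8796 to rounding. The last one mixes two variance components, which is why it carries Satterthwaite-type fractional degrees of freedom (19.49) in the output.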
Type x Dose: “MRT Lines”
T Grouping for type*dose Least Squares Means
LS-means with the same letter are not significantly different.
type   dose   Estimate   Grouping
s      4      28.3800    A
r      4      28.1800    A
r      2      27.8400    A
s      8      25.8000    A
s      2      25.2000    A
r      8      24.8000    A
r      1      20.0000    B
s      1      17.0600    C
however ...
A Factorial Inference Flowchart
The Prime Directive: Interactions first!!!!!
Interaction?
− Non-ignorable: Interpret Simple Effects
− Negligible: Interpret Main Effects
Full Wheelbarrow
Plots of Differences between Means
 LSMEANS allows various plots of mean differences
 DIFFPlot: plots interval estimates of mean differences
 ANoMPlot: (ANalysis of Means) plots difference
between each treatment and the overall mean
 ControlPlot: Plots each treatment vs control (e.g. like
Dunnett test)
SAS for Mean Difference Plots
 From Type x Dose example
ods html;
ods graphics on;
ods select Anomplot DiffPlot;
proc glimmix data=variety_eval;
class block type dose;
model y=type|dose/ddfm=satterth;
random block block*dose;
lsmeans dose/plot=DiffPlot;
lsmeans dose/plot=AnomPlot;
*lsmeans type*dose/plot=DiffPlot;
*lsmeans type*dose/plot=AnomPlot;
run;
ods graphics off;
ods html close;
run;
SAS for Mean Difference Plots: DIFFPLOT
SAS for Mean Difference Plots: ANoMPLOT
Mean Difference Plots – Control Plots
 From SAS for Linear Models – Output 3.17-3.22
 Randomized Complete Block
 5 Irrigation Treatments: Flood (control), Basin, Spray,
Sprinkler, Trickle
ods html;
ods graphics on;
ods select ControlPlot;
proc glimmix order=data;
class bloc irrig;
model fruitwt=irrig;
random bloc;
lsmeans irrig/diff=control('flood')
plot=controlplot adjust=dunnett;
run;
ods graphics off;
ods html close;
Dunnett-style Control Plot
Back to Type x Dose Data: Interaction Plot
Type x Dose: Simple Effects
SLICE: test only
Tests of Effect Slices for type*dose Sliced By type
type   Num DF   Den DF   F Value   Pr > F
r      3        19.49     8.12     0.0010
s      3        19.49    13.58     <.0001

Tests of Effect Slices for type*dose Sliced By dose
dose   Num DF   Den DF   F Value   Pr > F
1      1        16       5.00      0.0399
2      1        16       4.03      0.0618
4      1        16       0.02      0.8810
8      1        16       0.58      0.4578

SLICEDIFF: estimates etc
Simple Effect Comparisons of type*dose Least Squares Means By dose
Simple Effect Level   type   _type   Estimate   Standard Error   DF   t Value   Pr > |t|
dose 1                r      s        2.9400    1.3144           16    2.24     0.0399
dose 2                r      s        2.6400    1.3144           16    2.01     0.0618
dose 4                r      s       -0.2000    1.3144           16   -0.15     0.8810
dose 8                r      s       -1.0000    1.3144           16   -0.76     0.4578
Type x Dose: Simple Effect Estimates by Type
Simple Effect Comparisons of type*dose Least Squares Means By type
Simple Effect Level   dose   _dose   Estimate   Standard Error   DF      t Value   Pr > |t|
type r                1      2       -7.8400    1.8796           19.49   -4.17     0.0005
type r                1      4       -8.1800    1.8796           19.49   -4.35     0.0003
type r                1      8       -4.8000    1.8796           19.49   -2.55     0.0192
type r                2      4       -0.3400    1.8796           19.49   -0.18     0.8583
type r                2      8       3.0400     1.8796           19.49   1.62      0.1219
type r                4      8       3.3800     1.8796           19.49   1.80      0.0876
type s                1      2       -8.1400    1.8796           19.49   -4.33     0.0003
type s                1      4       -11.3200   1.8796           19.49   -6.02     <.0001
type s                1      8       -8.7400    1.8796           19.49   -4.65     0.0002
type s                2      4       -3.1800    1.8796           19.49   -1.69     0.1066
type s                2      8       -0.6000    1.8796           19.49   -0.32     0.7530
type s                4      8       2.5800     1.8796           19.49   1.37      0.1855
Effect of dose?
Log(Dose):
contrast 'logdose linear' dose -3 -1 1 3;
contrast 'logdose quad' dose 1 -1 -1 1;
contrast 'logdose cubic' dose -1 3 -3 1;
contrast 'type x linear' dose*type -3 -1 1 3 3 1 -1 -3;
contrast 'type x quad' dose*type 1 -1 -1 1 -1 1 1 -1;
contrast 'type x cubic' dose*type -1 3 -3 1 1 -3 3 -1;
otherwise.....
contrast 'dose linear' dose -11 -7 1 17;
contrast 'dose quad' dose 20 -4 -29 13;
contrast 'dose cubic' dose -8 14 -7 1;
contrast 'type x linear' dose*type -11 -7 1 17 11 7 -1 -17;
contrast 'type x quad' dose*type 20 -4 -29 13 -20 4 29 -13;
contrast 'type x cubic' dose*type -8 14 -7 1 8 -14 7 -1;
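The two sets of coefficients differ only in the spacing assumed for the dose levels. As a rough check, the following Python sketch (not part of the original course materials) derives orthogonal polynomial contrast coefficients by Gram-Schmidt orthogonalization in exact rational arithmetic. It assumes the doses are 1, 2, 4, 8, so that log2(dose) is equally spaced at 0, 1, 2, 3; the function name `orthogonal_contrasts` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def orthogonal_contrasts(levels, max_degree):
    """Return integer orthogonal polynomial contrasts for the given levels.

    Illustrative helper: Gram-Schmidt on 1, x, x^2, ... evaluated at the
    levels, with each result rescaled to the smallest integer coefficients.
    """
    xs = [Fraction(x) for x in levels]
    basis = [[Fraction(1)] * len(xs)]  # degree-0 (constant) vector
    contrasts = []
    for degree in range(1, max_degree + 1):
        v = [x ** degree for x in xs]
        # Remove the components along all lower-degree basis vectors.
        for b in basis:
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
        # Clear denominators, then divide out the common factor.
        lcm = reduce(lambda a, b: a * b // gcd(a, b), (vi.denominator for vi in v), 1)
        ints = [int(vi * lcm) for vi in v]
        common = reduce(gcd, (abs(i) for i in ints))
        contrasts.append([i // common for i in ints])
    return contrasts


# Equally spaced log-dose scale versus the original dose scale.
print(orthogonal_contrasts([0, 1, 2, 3], 3))
print(orthogonal_contrasts([1, 2, 4, 8], 3))
```

The first call should reproduce the 'logdose' coefficients and the second the 'dose' coefficients used in the CONTRAST statements; the interaction contrasts repeat each set with the sign reversed for the second type.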
LogDose contrast results
Contrasts
Label            Num DF   Den DF   F Value   Pr > F
logdose linear   1        12       18.25     0.0011
logdose quad     1        12       22.54     0.0005
logdose cubic    1        12       0.08      0.7780
type x linear    1        16       6.22      0.0240
type x quad      1        16       0.04      0.8515
type x cubic     1        16       0.61      0.4472
Direct Regression – borrow from ANCOVA
proc glimmix data=variety_eval;
class block type dose;
model y=type logdose(type) ld_sq(type) /
noint ddfm=satterth solution;
random intercept dose / subject=block;
contrast 'equal quad by type?' ld_sq(type) 1 -1;
run;
Solutions for Fixed Effects
Effect          type   Estimate   Standard Error   DF      t Value
type            r      20.1890    1.4204           19.62   14.21
type            s      17.0200    1.4204           19.62   11.98
logdose(type)   r      9.8890     2.0181           21.45   4.90
logdose(type)   s      10.9800    2.0181           21.45   5.44
ld_sq(type)     r      -2.8050    0.6447           21.45   -4.35
ld_sq(type)     s      -2.6800    0.6447           21.45   -4.16

Contrasts
Label                 Num DF   Den DF   F Value   Pr > F
equal quad by type?   1        17       0.04      0.8497

can re-fit with LD_SQ common to both types
Example 3







From SAS for Mixed Models, Section 4.7
4 “conditions”
3 diets
Condition applied in incomplete block design
2 conditions per block
Diet applied to cages within condition
Condition is whole plot, diet is split-plot
“Plot plan”
diet 1
diet 2
diet 3
diet 2
diet 1
diet 3
diet 2
diet 1
diet 3
diet 1
diet 3
diet 2
Model?




blocking? yes
e.u. with respect to condition: “1/2 block”
e.u. with respect to diet: “1/3 condition e.u.”
e.u. w.r.t. cond x diet: same as diet
Model:
$y_{ijk} = \mu_{ij} + blk_k + w_{ik} + e_{ijk}$
SAS Program
proc glimmix data=fix2;
class cage condition diet;
model gain=condition diet condition*diet/ddfm=satterth;
random intercept condition / subject=cage;
run;
data & program: file ch4-ex3.sas
Selected Output
Covariance Parameter Estimates
Cov Parm    Subject   Estimate   Standard Error
Intercept   cage      3.0376     5.0791
condition   cage      0          .
Residual              27.8429    8.7672

Type III Tests of Fixed Effects
Effect           Num DF   Den DF   F Value   Pr > F
condition        3        23.61    2.71      0.0677
diet             2        20.17    0.93      0.4090
condition*diet   6        20.17    1.73      0.1661
how should one deal with negative variance component estimate?
• revert to ANOVA via PROC GLM ?
• in MIXED, use NOBOUND option ?
• in GLIMMIX, use the LOWERB= option (PARMS statement)
• alternatively, redefine model
• may be CS with plots in block negatively correlated
Comparison with SAS Proc GLM
proc glm data=fix2;
class cage condition diet;
model gain=cage condition cage*condition diet condition*diet;
random cage cage*condition/test;
lsmeans condition diet condition*diet;
Tests of Hypotheses for Mixed Model Analysis of Variance
Source        DF   Type III SS   Mean Square   F Value   Pr > F
cage          5    198.277778    39.655556     2.73      0.2185
* condition   3    171.666667    57.222222     3.95      0.1446
Error         3    43.500000     14.500000
Error: MS(cage*condition)
* This test assumes one or more other fixed effects are zero.

Source              DF   Type III SS   Mean Square   F Value   Pr > F
cage*condition      3    43.500000     14.500000     0.46      0.7144
diet                2    52.055556     26.027778     0.82      0.4561
condition*diet      6    288.388889    48.064815     1.52      0.2333
Error: MS(Error)    16   504.888889    31.555556
More GLM output
non-estimability results from inappropriate definition of estimability (based on fixed & random eff)
inescapable consequence of Proc GLM with mixed model

Least Squares Means
condition   gain LSMEAN
1           Non-est
2           Non-est
3           Non-est
4           Non-est

diet       gain LSMEAN
normal     57.9166667
restrict   55.5000000
suppleme   58.1666667

condition   diet       gain LSMEAN
1           normal     Non-est
1           restrict   Non-est
1           suppleme   Non-est
2           normal     Non-est
2           restrict   Non-est
2           suppleme   Non-est
3           normal     Non-est
3           restrict   Non-est
3           suppleme   Non-est
4           normal     Non-est
4           restrict   Non-est
4           suppleme   Non-est

DON’T use Proc GLM with mixed models!
GLM vs MIXED issues
 REML default: variance component estimates set to 0
− if BLOCK affected, type I error rate 
− if error term affected, power may 
− better to allow negative estimates
− In MIXED: NOBOUND or METHOD=TYPE3
− In GLIMMIX: LowerB
 vs. GLM uses implied MS regardless
 GLM: inappropriate NON-EST artifact of incomplete
block design
 Standard errors for means, many simple effects
(including SLICE) incorrect in GLM (no fix!!)
GLIMMIX Option (1) – Like NOBOUND in MIXED
proc glimmix data=fix2;
class cage condition diet;
model gain=condition|diet/ddfm=kr;
random intercept condition / subject=cage;
parms / lowerb=(1e-4,-10,1e-4);
run;

Covariance Parameter Estimates
Cov Parm    Subject   Estimate   Standard Error
Intercept   cage      5.0288     4.7149
condition   cage      -6.2404    4.8693
Residual              31.5556    11.1566

Type III Tests of Fixed Effects
Effect           Num DF   Den DF   F Value   Pr > F
condition        3        4.718    4.31      0.0798
diet             2        16       0.82      0.4561
condition*diet   6        16       1.52      0.2333
GLIMMIX Option (2) – is it really correlation?
proc glimmix data=fix2;
class cage condition diet;
model gain=condition|diet/ddfm=kr;
random intercept / subject=cage;
random _residual_ / type=cs subject=condition*cage;
run;

Covariance Parameter Estimates
Cov Parm    Subject          Estimate
Intercept   cage             5.0271
CS          cage*condition   -6.2402
Residual                     31.5567

Interblock correlation: $\dfrac{\sigma^2_{CC}}{\sigma^2_{CC} + \sigma^2} = -0.2466$

Type III Tests of Fixed Effects
Effect           Num DF   Den DF   F Value   Pr > F
condition        3        4.717    4.31      0.0798
diet             2        16       0.82      0.4561
condition*diet   6        16       1.52      0.2334
Modeling Change over Time





Regression over time
Latent growth / change models
Random coefficients over time
Repeated measures experiment
Longitudinal Data
From Acock – BMI Data
[Figure: plot of bmi (vertical axis, ticks at 10 to 50) against yearfrm (1997 to 2003)]
Note – my sample differs from Acock’s, so the numbers won’t match
Basic Growth Model
 Simplest model involves slope & intercept
 In “Stat-speak”
$y_{ij} = \beta_0 + \beta_1 \times \mathrm{time}_i + e_{ij}$
obs = intercept + slope × time + error
this is just linear regression
$(e_{1j}, e_{2j}, \ldots, e_{tj})$ may be independent $N(0, \sigma^2)$
or may be correlated (more later)
Basic Growth Model in SAS
in PROC GLM
proc glm;
model bmi=year;
run;

Parameter   Estimate      Standard Error   t Value   Pr > |t|
Intercept   21.38349324   0.55631931       38.44     <.0001
year        0.68444085    0.15429522       4.44      <.0001

regression equation: $\hat{y} = 21.38 + 0.684 \times \mathrm{Year}$

Source            DF    Sum of Squares   Mean Square   F Value   Pr > F
Model             1     432.856378       432.856378    19.68     <.0001
Error             229   5037.468822      21.997680
Corrected Total   230   5470.325200

R-Square   Coeff Var   Root MSE   bmi Mean
0.079128   20.01197    4.690168   23.43682
very deceptive – more shortly
Growth Model in SAS - II
in PROC GLIMMIX
proc glimmix;
class id;
model bmi=year/solution;
random _residual_ /subject=id;
estimate 'y-hat in 1997' intercept 1 year 0 / cl;
estimate 'y-hat in 2000' intercept 1 year 3 / cl;
estimate 'y-hat in 2003' intercept 1 year 6 / cl;
run;
selected output next page
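Each ESTIMATE statement above requests a linear combination of the fixed-effect solutions, with year coded as years since 1997 (0, 3, 6). As a minimal illustration in Python (not part of the original materials), the sketch below evaluates those combinations using the intercept and slope reported by PROC GLM for these data; the helper name `linear_combination` is hypothetical.

```python
def linear_combination(coefficients, estimates):
    """Return the sum of coefficient * estimate, i.e. an estimable function."""
    return sum(c * b for c, b in zip(coefficients, estimates))


# Intercept and slope for year, as reported in the PROC GLM output.
beta_hat = [21.38349324, 0.68444085]

for label, years_since_1997 in [("1997", 0), ("2000", 3), ("2003", 6)]:
    # Coefficients mirror 'intercept 1 year <k>' in the ESTIMATE statements.
    y_hat = linear_combination([1, years_since_1997], beta_hat)
    print(f"y-hat in {label}: {y_hat:.4f}")
```

This only reproduces the point estimates; the standard errors and confidence limits requested by the CL option depend on the covariance matrix of the estimates, which the sketch does not attempt.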
Basic Growth Model – Selected GLIMMIX Output
Covariance Parameter Estimates
Cov Parm        Estimate   Standard Error
Residual (VC)   21.9977    2.0558
Note: residual VC est = MSE from GLM ANOVA

Solutions for Fixed Effects
Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   21.3835    0.5563           32    38.44     <.0001
year        0.6844     0.1543           197   4.44      <.0001

Estimates
Label           Estimate   Standard Error   DF    t Value   Pr > |t|   Alpha   Lower     Upper
y-hat in 1997   21.3835    0.5563           197   38.44     <.0001     0.05    20.2864   22.4806
y-hat in 2000   23.4368    0.3086           197   75.95     <.0001     0.05    22.8283   24.0454
y-hat in 2003   25.4901    0.5563           197   45.82     <.0001     0.05    24.3930   26.5872
G/C Model – Issue I – Account for ID
 Recall R2 for Basic Growth Model very low
 You must account for variation among subjects (ID)
okay:
proc glm;
class id;
model bmi=id year;
run;
better:
proc glimmix;
class id;
model bmi=year/solution;
random id;
/* or random intercept / subject = id; */
run;
Selected Output
from GLM:
R-Square
0.815282   (vs. 0.079)

from GLIMMIX:
Covariance Parameter Estimates
Cov Parm    Subject   Estimate   Standard Error
Intercept   id        17.2449    4.4950
Residual              5.1293     0.5168           (vs. 21.998)

Solutions for Fixed Effects (estimates don’t change, std errors do)
Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   21.3835    0.7712           32    27.73     <.0001
year        0.6844     0.07451          197   9.19      <.0001
Growth Change Modeling Issue - II
 Correlated Errors
Recall:
In Model $y_{ij} = \beta_0 + \beta_1 \times \mathrm{year}_i + e_{ij}$
$(e_{1j}, e_{2j}, \ldots, e_{tj})$ may be independent $N(0, \sigma^2)$
or may be correlated
Correlation Modeled by Covariance Model
• Failure to model correlation increases P{type I error}
• Over-modeling correlation decreases Power
Covariance models
Indep
$\Sigma = I\sigma^2$
identical to split-plot

CS
$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \rho & \rho \\ & 1 & \rho & \rho \\ & & 1 & \rho \\ & & & 1 \end{bmatrix}$
NOTE: CS is reparameterization of Indep

AR(1)
$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ & 1 & \rho & \rho^2 \\ & & 1 & \rho \\ & & & 1 \end{bmatrix}$
More covariance models
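Toep
$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ & 1 & \rho_1 & \rho_2 \\ & & 1 & \rho_1 \\ & & & 1 \end{bmatrix}$

ANTE(1)
$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_1\rho_2 & \sigma_1\sigma_4\rho_1\rho_2\rho_3 \\ & \sigma_2^2 & \sigma_2\sigma_3\rho_2 & \sigma_2\sigma_4\rho_2\rho_3 \\ & & \sigma_3^2 & \sigma_3\sigma_4\rho_3 \\ & & & \sigma_4^2 \end{bmatrix}$

UN
$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ & & \sigma_3^2 & \sigma_{34} \\ & & & \sigma_4^2 \end{bmatrix}$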
Issues in Repeated Measures






Impact of covariance structure?
Selection of appropriate covariance?
Bias in std errors, test statistics
Degrees of freedom
Nonlinear models over time
Non-normal errors
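To make the covariance structures concrete, here is a minimal Python sketch (not from the original slides) that builds the CS and AR(1) matrices for T time points. It also checks one reading of the note that CS is a reparameterization of Indep, assuming the note refers to independent errors plus a random subject intercept. The function names are hypothetical; the variance values are the intercept and residual estimates from the BMI random-intercept fit.

```python
def cs_matrix(t, sigma2, rho):
    """Compound symmetry: sigma2 on the diagonal, sigma2 * rho elsewhere."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(t)] for i in range(t)]


def ar1_matrix(t, sigma2, rho):
    """AR(1): covariance decays as rho ** |i - j| with distance in time."""
    return [[sigma2 * rho ** abs(i - j) for j in range(t)] for i in range(t)]


def random_intercept_plus_indep(t, sigma2_b, sigma2_e):
    """J * sigma2_b + I * sigma2_e: random subject effect plus independent errors."""
    return [[sigma2_b + (sigma2_e if i == j else 0.0) for j in range(t)] for i in range(t)]


# Estimates from the BMI fit: between-subject 17.2449, residual 5.1293.
sigma2_b, sigma2_e = 17.2449, 5.1293
total = sigma2_b + sigma2_e
implied_rho = sigma2_b / total  # correlation implied by the random intercept

v_mixed = random_intercept_plus_indep(4, sigma2_b, sigma2_e)
v_cs = cs_matrix(4, total, implied_rho)
same = all(abs(v_mixed[i][j] - v_cs[i][j]) < 1e-9 for i in range(4) for j in range(4))
print("random intercept + Indep equals CS:", same)

# AR(1) correlations for rho = 0.5 along the first row.
print(ar1_matrix(4, 1.0, 0.5)[0])
```

Under this reading the two parameterizations give the same marginal covariance whenever the between-subject variance is non-negative; CS additionally permits a negative covariance, which a variance component bounded at zero cannot represent.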
Basic G/C Model with Covariance Model
 Also known as Autocorrelation
proc glimmix;
class id;
model bmi=year/solution ddfm=kr;
random intercept / subject=id;
random _residual_ /subject=id type=ar(1);
run;
degree of freedom and std error bias must be dealt with (more later)
Competing Covariance Models compared via Fit Statistics
• AICC
• BIC
• HQIC
• CAIC
Selected Output for G/C Model w/ Autocorrelation
Covariance Parameter Estimates (variance, covariance & correlation estimates)
Cov Parm    Subject   Estimate   Standard Error
Intercept   id        14.8587    4.6202
AR(1)       id        0.5623     0.1144
Residual              7.7165     1.8981

Fit Statistics (used to assess cov model)
-2 Res Log Likelihood      1111.69
AIC (smaller is better)    1117.69
AICC (smaller is better)   1117.79
BIC (smaller is better)    1122.18
CAIC (smaller is better)   1125.18
HQIC (smaller is better)   1119.20
Generalized Chi-Square     1767.07
Gener. Chi-Square / DF     7.72

Solutions for Fixed Effects
Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   21.3238    0.8042           32    26.52     <.0001
year        0.6896     0.1102           197   6.26      <.0001
estimate – slight effect
std error – bigger effect





random coeff
correl errors
prediction
add Gender
add emotional prob




Repeated Measure Experiments
a.k.a. Longitudinal Data
Assign e.u. to treatments
May use any design (completely random,
blocked, row-column, split-plot ....)
Observations at planned times
Objectives
1. assess changes in response over time
2. assess treatment effect on (1)
Typical repeated Measures Data
from SAS for Linear Models, Chapter 8
SAS for Mixed Models, 2nd ed, Chapter 5
From BMI Data: Are G/C Curves Equal by Gender?
interaction
plot of G/C
curve
by
gender
FYI – SAS Code to Get Interaction Plot
ods html;
ods graphics on;
ods select MeanPlot;
proc glimmix data=bmi_uni_anc;
class gender id year;
model bmi=gender|year / solution ddfm=kr;
random intercept / subject=id(gender);
random _residual_ / type=ar(1) subject=id(gender);
lsmeans gender*year /
plot=MeanPlot (sliceby=gender join cl);
run;
ods graphics off;
ods html close;
run;
Model
Model:
$y_{ijk} = \mu_{ij} + id(gender)_{ik} + e_{ijk}$
where $\mu_{ij}$ = gender$_i$ × year$_j$ mean
can express as: $\mu_{ij} = \mu + g_i + yr_j + (g \times yr)_{ij}$
$id(gender)_{ik}$ is between subjects error, $NI(0, \sigma_B^2)$,
like whole-plot error
$e_{ijk}$ is within subjects error, like split-plot error, except...
Let $e_{ik} = [\, e_{i1k} \;\; e_{i2k} \;\; \ldots \;\; e_{iTk} \,]'$
$e_{ik} \sim MVN(0, \Sigma)$
translates to:
proc glimmix data=bmi_uni_anc;
class gender id year;
model bmi=gender|year / solution ddfm=kr;
random intercept / subject=id(gender);
random _residual_ / type=ar(1) subject=id(gender);
Back to SAS for Mixed Models Example
Model:
$y_{ijk} = \mu_{ij} + s(trt)_{ik} + e_{ijk}$
where $\mu_{ij}$ = trt$_i$ × time$_j$ mean
$s(trt)_{ik}$ is between subjects error, $NI(0, \sigma_B^2)$,
like whole-plot error
$e_{ijk}$ is within subjects error, like split-plot error, except...
Let $e_{ik} = [\, e_{i1k} \;\; e_{i2k} \;\; \ldots \;\; e_{iTk} \,]'$
$e_{ik} \sim MVN(0, \Sigma)$
Hence $Var(y_{ik}) = V_{ik} = Z_S Z_S' \sigma_B^2 + \Sigma$; typically $J_T \sigma_B^2 + \Sigma$
$V = Var(y) = I_{AK} \otimes V_{ik}$, A = # trt's, K = # subj/trt
Middle Ground between MANOVA and
Split-Plot in Time via Proc GLIMMIX
PROC GLIMMIX;
CLASSES SUBJ TRT TIME;
MODEL Y= TRT TIME TRT*TIME;
RANDOM INTERCEPT / SUBJECT=SUBJ(TRT);
RANDOM TIME / TYPE=AR(1) SUBJECT=SUBJ(TRT) RESIDUAL;
*LSMEANS TRT TIME TRT*TIME;
TITLE 'MIXED - AR(1) ERRORS';
RUN;
RANDOM specifies between subjects effects (G-side)
RANDOM...RESIDUAL specifies within subjects effect (R-side)
in many models, G- and R-side effects are not identifiable
Modeling Covariance among Repeated Measures
PROC MIXED DATA=univ;
CLASSES SUBJ TRT TIME;
MODEL Y= TRT TIME TRT*TIME;
REPEATED TIME / TYPE=UN SSCP SUBJECT=SUBJ(TRT);
ODS OUTPUT CovParms=cp;
run;
data times;
do time1=1 to 8;
do time2=1 to time1;
dist=time1-time2;
output;
end;
end;
data covplot;
merge times cp;
proc gplot data=covplot;
plot adjcorr*dist=time1;
Computes covariance between pairs of measurements (same subject, different times) based on Sum of squares & cross-products matrix, then plots them by distance
Plot of Covariance by Distance
Idealized Plots
CS=Subj(Trt), AR(1), AR(1)+Subj(Trt)
AR(1) + Subj(Trt)
CS
= random Subj(Trt)
AR(1) only
Model Fitting
Criteria in Version 8
1. Compound Symmetry
proc glimmix;
classes subj trt time;
model y= trt time trt*time;
random time / residual type=cs subject=subj(trt);
title 'mixed - compound symmetry';
Fit Statistics
-2 Res Log Likelihood      839.39
AIC (smaller is better)    843.39
AICC (smaller is better)   843.47
BIC (smaller is better)    845.75
CAIC (smaller is better)   847.75
HQIC (smaller is better)   844.02
Generalized Chi-Square     767.61
Gener. Chi-Square / DF     4.80
Comparison of Models
Smaller is Better
Model                              Neg2LogLike   Parms   AIC     AICC    HQIC    BIC     CAIC
Compound Symmetry                  839.4         2       843.4   843.5   844.0   845.7   847.7
AR(1) + Subj(TRT) random effect    788.7         3       794.7   794.8   795.6   798.2   801.2
Unstructured                       760.5         36      832.5   854.1   843.7   874.9   910.9
ANTE(1)                            780.7         15      810.7   814.0   815.3   828.3   843.3
TOEP                               784.9         8       800.9   801.9   803.4   810.4   818.4
How do Model Fitting Criteria Compare?
 Guerin & Stroup (2000) compared AIC, BIC, HQIC, CAIC for
simulated AR(1) and ARH(1) data
 CAIC tends to select simpler models
 AIC tends to select most complex models *
 complex -- AIC > HQIC > BIC > CAIC -- simple
 Model too simple (correlation model not adequate)  Type I error
rate too high
 Model too complex (correlation over-modeled)  Type I error control
not affected, but power suffers
 *Since 2000, SAS added AICC to address AIC issue
 Best choice depends on severity of Type I vs II error
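As a hedged illustration of how these criteria relate to one another, the Python sketch below recomputes the fit statistics for the compound-symmetry fit from its -2 Res Log Likelihood. The penalty formulas are the usual definitions; the counts used (24 subjects for the BIC, CAIC and HQIC penalties, and 160 as the effective sample size in AICC) are assumptions inferred from the output, chosen because they reproduce the reported values. The function name `fit_statistics` is hypothetical.

```python
import math


def fit_statistics(neg2_loglik, n_cov_parms, n_subjects, n_effective):
    """Information criteria from -2 (restricted) log likelihood.

    n_subjects and n_effective are assumptions inferred from the output,
    not values documented in the slides.
    """
    d = n_cov_parms
    log_s = math.log(n_subjects)
    return {
        "AIC": neg2_loglik + 2 * d,
        "AICC": neg2_loglik + 2 * d * n_effective / (n_effective - d - 1),
        "HQIC": neg2_loglik + 2 * d * math.log(log_s),
        "BIC": neg2_loglik + d * log_s,
        "CAIC": neg2_loglik + d * (log_s + 1),
    }


# Compound symmetry: -2 Res Log Likelihood = 839.39 with 2 covariance parameters.
cs_stats = fit_statistics(839.39, 2, n_subjects=24, n_effective=160)
for name, value in cs_stats.items():
    print(f"{name}: {value:.2f}")
```

The ordering of the per-parameter penalties (2 for AIC, about 2.3 for HQIC, about 3.2 for BIC and about 4.2 for CAIC with 24 subjects) matches the complex-to-simple ranking listed above.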
An Inference Issue
CS:
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
TRT        3        20       0.74      0.5425
TIME       7        140      109.04    <.0001
TRT*TIME   21       140      1.98      0.0106

AR(1)+between subj:
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
TRT        3        20       0.75      0.5344
TIME       7        140      60.55     <.0001
TRT*TIME   21       140      1.48      0.0921

UN:
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
TRT        3        20       0.74      0.5425
TIME       7        20       101.31    <.0001
TRT*TIME   21       20       1.37      0.2450
UN similar to MANOVA but MANOVA Trt*Time p-value was 0.50
Bias & Options for Adjusting
 SAS Default uses estimated (co)variance components in
V, so std errors are biased downward and t-, F-statistics are biased upward
 “Robust” (a.k.a. “sandwich”) estimate of K’V-1K available
using EMPIRICAL option in MIXED
 Kenward & Roger (Biometrics, 1997) proposed
adjustment; available using DDFM=KR option in
MODEL statement of MIXED
 Guerin & Stroup (2000) evaluated KR option of SAS
Version 8 with simulated AR(1) and ARH(1) data
 Biased F resulted in inflated Type I error rates unless
KR option used (for α=0.05, rejection rates >0.10 for
TYPE=AR(1), up to 0.20 with TYPE=ANTE(1), UN)
Sandwich (“Robust”) Estimator
OLS estimate of $\beta$: $\hat{\beta}_{OLS} = (X'X)^{-1} X'y$
$Var(\hat{\beta}_{OLS}) = (X'X)^{-1} X' \, Var(y) \, X (X'X)^{-1} = (X'X)^{-1} X'VX (X'X)^{-1}$
GLS estimate is: $\hat{\beta}_{GLS} = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}^{-1} y$
$Var(\hat{\beta}_{GLS}) = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}_0 X (X'\hat{V}^{-1}X)^{-1}$
Let $\hat{V}_0$ be based on residuals $\hat{e} = y - X\hat{\beta}$
Yields $\hat{V}_0 = \hat{V}^{-1} \hat{e}\hat{e}' \hat{V}^{-1}$
"Sandwich" estimator: $Var(\hat{\beta}_{GLS}) = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}^{-1} \hat{e}\hat{e}' \hat{V}^{-1} X (X'\hat{V}^{-1}X)^{-1}$
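The bread-meat-bread structure of the sandwich estimator can be seen in the simplest case. The Python sketch below is an illustration only: it uses made-up data, ordinary least squares with a straight-line model, and treats every observation as its own independent subject, so the "meat" is the sum over observations of squared residual times x x'. This is a simplification of the GLS form above, not the computation performed by the EMPIRICAL option; the function name `ols_with_sandwich` is hypothetical.

```python
def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def ols_with_sandwich(x, y):
    """OLS fit of y = b0 + b1 * x with a sandwich covariance estimate."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    # "Bread": inverse of X'X for the design matrix with columns [1, x].
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    xty = [sum(y), sum(a * b for a, b in zip(x, y))]
    beta = [bread[0][0] * xty[0] + bread[0][1] * xty[1],
            bread[1][0] * xty[0] + bread[1][1] * xty[1]]
    resid = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]
    # "Meat": sum of e_i^2 * x_i x_i', assuming independent observations.
    m00 = sum(e * e for e in resid)
    m01 = sum(e * e * xi for e, xi in zip(resid, x))
    m11 = sum(e * e * xi * xi for e, xi in zip(resid, x))
    meat = [[m00, m01], [m01, m11]]
    return beta, matmul2(matmul2(bread, meat), bread)


# Hypothetical data, for illustration only.
beta, cov = ols_with_sandwich([0, 1, 2, 3], [1, 3, 2, 5])
print("estimates:", beta)
print("sandwich covariance:", cov)
```

With correlated repeated measures the residual products would be accumulated subject by subject rather than observation by observation.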
How does the sandwich estimator perform?
proc mixed empirical;
classes subj trt time;
model y=trt time trt*time;
random intercept/ subject=subj(trt);
repeated time / type=ar(1) subject=subj(trt);
run;

Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
TRT        3        20       1.31      0.2981
TIME       7        140      121.57    <.0001
TRT*TIME   20       140      9.04      <.0001
vs. F=1.48; p=0.0921 using default
Kenward and Roger
proc glimmix;
classes subj trt time;
model y= trt time trt*time/ddfm=kr;
random intercept / subject=subj(trt);
random time / type=ar(1) subject=subj(trt) residual;
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
TRT        3        20.5     0.77      0.5219
TIME       7        109      50.90     <.0001
TRT*TIME   21       117      1.24      0.2330
Alternative KR adjustment
• in SAS, KR adjustment uses Hessian matrix by default
• you can cause it to use the Information matrix instead
• no documented advantage one way or another
PROC glimmix scoremod scoring=51;
CLASSES SUBJ TRT TIME;
MODEL Y= TRT TIME TRT*TIME/ddfm=kr;
RANDOM intercept / subject=SUBJ(TRT);
Random _resid_ / TYPE=AR(1) SUBJECT=SUBJ(TRT);
nloptions technique=nrridg;
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
TRT        3        20.5     0.77      0.5264
TIME       7        112      54.18     <.0001
TRT*TIME   21       119      1.28      0.2010
vs. F=1.24, p=0.2330 using Hessian
Alternative Model for Change in BMI by Gender
Level 1: $y_{tj} = \beta_{0j} + \beta_{1j} \times yr_t + e_{tj}$
Level 2: $\beta_{0j} = \beta_0 + gender_i + id(gender)_{ij}$
$\beta_{1j} = \beta_1 + g_{1i}$
Repeated Measures ANCOVA Model:
$y_{ijk} = \beta_0 + gender_i + id(gender)_{ij} + (\beta_1 + g_{1i}) \times yr_t + e_{ijk}$
$= \beta_{0i} + \beta_{1i} \times yr_t + id(gender)_{ij} + e_{ijk}$
proc glimmix data=bmi_uni_anc;
class gender id year;
model bmi=gender yr(gender) / noint solution ddfm=kr;
random intercept / subject=id(gender);
random _residual_ / type=ar(1) subject=id(gender);
contrast 'male vs female intercept' gender 1 -1;
contrast 'male vs female slope' yr(gender) 1 -1;
run;
Selected Output
Contrasts
Label                      Num DF   Den DF   F Value   Pr > F
male vs female intercept   1        165.9    3.89      0.0501
male vs female slope       1        204.5    1.57      0.2111

Covariance Parameter Estimates
Cov Parm    Subject      Estimate
Intercept   id(gender)   15.1933
AR(1)       id(gender)   0.2928
Residual                 7.8871

Solutions for Fixed Effects
Effect       gender   Estimate   Standard Error   DF      t Value   Pr > |t|
gender       0        20.1988    0.6084           165.9   33.20     <.0001
gender       1        21.8298    0.5596           165.9   39.01     <.0001
yr(gender)   0        0.7860     0.08207          204.5   9.58      <.0001
yr(gender)   1        0.6462     0.07549          204.5   8.56      <.0001
Alternative Model
proc glimmix data=bmi_uni;
class gender id;
model bmi=gender year(gender) / noint solution ddfm=kr;
random intercept year(gender) / subject=id type=un;
contrast 'male vs female intercept' gender 1 -1;
contrast 'male vs female slope' year(gender) 1 -1;
run;
This is a random coefficient model
Next section
Response Surface Split Plot with Repeated Measures
 4 treatment factors (A, B, C, D)
− 2 levels each





3 factors (A, B, C) applied to P( subject)
treatment design: central composite design
subjects split into 2 sub-units
level of D randomly assigned to each sub-unit
observations at 3 planned times (H)
Central Composite Design
Model for Central Composite Split-Split Plot
Effect                                    e.u.
A, B, C main effects & interactions       P(A B C)
D                                         D × P(A B C)
D × (A, B, C)                             D × P(A B C)
H and all interactions involving H        H × D × P(A B C)

$y_{hijklm} = f(X_{Ai}, X_{Bj}, X_{Ck}) + d_l + f_l(X_{Ai}, X_{Bj}, X_{Ck}) + h_m + dh_{lm} + f_m(X_{Ai}, X_{Bj}, X_{Ck}) + p(abc)_{hijk} + dp(abc)_{hijkl} + e_{hijklm}$
SAS Statements
proc glimmix;
class ca cb cc p d u;
*model y=a b c a*a b*b c*c a*b a*c b*c d d*a d*b d*c
t t*t t*a t*b t*c t*d/htype=1 htype=3 ddfm=kr;
model y=d a(d) b(d) c(d) a*a b*b c*c a*b a*c b*c
t(d) t*t t*a t*b t*c
/noint solution htype=1 ddfm=kr;
random p(ca cb cc) d*p(ca cb cc);
Key output
Covariance Parameter Estimates
Cov Parm    Subject       Estimate
Intercept   p(ca*cb*cc)   24.3200
d           p(ca*cb*cc)   4.5151
Residual                  11.4944

Fit Statistics
AICC (smaller is better)   573.40

Solutions for Fixed Effects
Effect   d   Estimate   Standard Error
d        0   53.5687    2.3344
d        1   31.7168    2.3344
a(d)     0   16.8226    1.8101
a(d)     1   11.2226    1.8101
b(d)     0   19.5049    1.8101
b(d)     1   12.3715    1.8101
c(d)     0   4.4019     1.8101
c(d)     1   3.5352     1.8101
a*a          0.4980     3.2427
b*b          -2.5020    3.2427
c*c          5.1647     3.2427
a*b          6.2083     1.8872
a*c          -2.8333    1.8872
b*c          1.2083     1.8872
t(d)     0   9.4200     0.5504
t(d)     1   0.02442    0.5504
t*t          -0.1487    1.1114
a*t          0.1160     0.5078
b*t          1.7331     0.5078
c*t          0.3513     0.5078
Complex Split-split-plot revisited






Recall A, B, C applied to units P
P split in two, levels of D to each half
Measured a 3 times
Previous analysis assumed split on time
Actually repeated measures
Split-plot + repeated measures
CCD Split-plot + repeated measures
proc glimmix data=CCD_SpltPlt;
class ca cb cc p d u;
*model y=a b c a*a b*b c*c a*b a*c b*c d d*a d*b d*c
t t*t t*a t*b t*c t*d/htype=1 htype=3 ddfm=kr;
model y=d a(d) b(d) c(d) a*a b*b c*c a*b a*c b*c
t(d) t*t t*a t*b t*c /
noint solution htype=1 ddfm=kr;
random intercept / subject=p(ca cb cc);
random _residual_ / type=sp(pow)(t) subject=d*p(ca cb cc);
run;
AICC: 573.4 as split-split-plot
551.1 as repeated measures using SP(POW)
note SP(POW) is generalization of AR(1)
for unequally spaced times
Unreplicated Split-Plot
 SAS for Mixed Models, Section 16.7
 Quilt divided in half
 Each “half sheet” received 2 x 2 x 3 factorial
− 2 pH levels (low high)
− 2 temp (cold hot)
− 3 dry cycles (air, machine-delicate, machine-normal)
 Material cut from each unit
− washed 10, 20, 30, 40, 50 times
 Breaking strength monitored
 Materials observed so reps by sheet lost
Model for Breaking Strength Experiment
$y_{ijklm} = \mu_{ijkl} + r_m + w_{ijkm} + e_{ijklm}$
where
$\mu_{ijkl}$ is the mean of the ijkth pH × water temperature × dry cycle
(i=8,10; j=35,55; k=air, delicate, normal) at the lth
time of washing (l=10, 20, 30, 40, 50),
$r_m$ is the effect of the mth block (m=1,2 in the design, but m=1
only in the data)
$w_{ijkm}$ is the ijkmth between subjects (or whole-plot) error effect,
assumed $NID(0, \sigma_W^2)$
$e_{ijklm}$ is the within subjects (or split-plot) error effect,
assumed $NID(0, \sigma^2)$
ANOVA for Breaking Strength Experiment
Source of Variation       d.f.
block                     1
pH (P)                    1
wash temp (T)             1
dry cycle (D)             2
P × T                     1
P × D                     2
T × D                     2
P × T × D                 2
between subject error     11
no. of washes (W)         4
W × P                     4
W × T                     4
W × D                     8
W × P × T                 4
W × P × D                 8
W × T × D                 8
W × P × T × D             8
within subjects error     48
but these (error d.f.) become 0 when blocking by “half quilt” distinction lost
Breaking Strength vs # Washes by pH
Breaking Strength vs # Washes by Temp
Breaking Strength vs # Washes by Dry Cycle
Revised ANOVA
Pool negligible effects to get between & within error
Source of Variation                        d.f.
pH (P)                                     1
wash temp (T)                              1
dry cycle (D)                              2
between subject error                      7
linear effect of no. of washes (W Lin)     1
W Lin × P                                  1
W Lin × T                                  1
W Lin × D                                  2
within subjects error                      43
GLIMMIX Program for Breaking Strength Experiment
proc glimmix data=shellie;
class pH water_temp dry_cycle;
model breaking_strength=pH water_temp dry_cycle
w w*pH w*water_temp w*dry_cycle / solution;
random pH*water_temp*dry_cycle;
contrast 'air vs dryer effect on wear' w*dry_cycle 2 -1 -1;
contrast 'delicate v normal effect on wear' w*dry_cycle 0 1 -1;
run;
Revised GLIMMIX - Estimate Regression over # of Washes
proc glimmix data=shellie;
class pH water_temp dry_cycle;
model breaking_strength= w(pH) w(water_temp) w(dry_cycle)/noint solution;
random pH*water_temp*dry_cycle;
estimate 'slope: ph 8, cold, air'
w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 1 0 0;
estimate 'slope: ph 8, cold, delicate'
w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 0 1 0;
estimate 'slope: ph 8, cold, normal'
w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 0 0 1;
estimate 'slope: ph 8, hot, air'
w(ph) 1 0 w(water_temp) 0 1 w(dry_cycle) 1 0 0;
estimate 'slope: ph 8, hot, delicate'
w(ph) 1 0 w(water_temp) 0 1 w(dry_cycle) 0 1 0;
etc for all pH – temp – dry cycle combinations
Regression – Selected Output
Solution for Fixed Effects
Effect      pH   water temp   Dry cycle   Estimate   Standard Error
Intercept                                 0.1070     0.001895

Label                          Estimate   Standard Error
slope: ph 8, cold, air         -0.00024   0.000077
slope: ph 8, cold, delicate    -0.00047   0.000077
slope: ph 8, cold, normal      -0.00050   0.000077
slope: ph 8, hot, air          -0.00050   0.000077
slope: ph 8, hot, delicate     -0.00073   0.000077
slope: ph 8, hot, normal       -0.00076   0.000077
slope: ph 10, cold, air        -0.00082   0.000077
slope: ph 10, cold, delicate   -0.00105   0.000077
slope: ph 10, cold, normal     -0.00108   0.000077
slope: ph 10, hot, air         -0.00108   0.000077
slope: ph 10, hot, delicate    -0.00131   0.000077
slope: ph 10, hot, normal      -0.00134   0.000077
avg slope: ph 8                -0.00053   0.000054
avg slope: ph 10               -0.00111   0.000054
avg slope: cold water          -0.00069   0.000054
avg slope: hot water           -0.00095   0.000054
avg slope: air dry             -0.00066   0.000063
avg slope: delicate dry        -0.00089   0.000063
avg slope: normal dry          -0.00092   0.000063
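The "avg slope" rows are marginal averages of the twelve combination-specific slopes. As a quick check, and only as an illustrative Python sketch outside the original SAS workflow, the averages can be recomputed from the rounded slopes reported above; the dictionary layout and the helper name `average_slope` are hypothetical.

```python
# Reported slopes, keyed by (pH, water temperature, dry cycle).
slopes = {
    ("8", "cold", "air"): -0.00024, ("8", "cold", "delicate"): -0.00047,
    ("8", "cold", "normal"): -0.00050, ("8", "hot", "air"): -0.00050,
    ("8", "hot", "delicate"): -0.00073, ("8", "hot", "normal"): -0.00076,
    ("10", "cold", "air"): -0.00082, ("10", "cold", "delicate"): -0.00105,
    ("10", "cold", "normal"): -0.00108, ("10", "hot", "air"): -0.00108,
    ("10", "hot", "delicate"): -0.00131, ("10", "hot", "normal"): -0.00134,
}


def average_slope(position, level):
    """Average the slopes over all combinations sharing one factor level.

    position: 0 for pH, 1 for water temperature, 2 for dry cycle.
    """
    selected = [s for key, s in slopes.items() if key[position] == level]
    return sum(selected) / len(selected)


for position, level in [(0, "8"), (0, "10"), (1, "cold"), (1, "hot"),
                        (2, "air"), (2, "delicate"), (2, "normal")]:
    print(level, f"{average_slope(position, level):.5f}")
```

Because the inputs are rounded to five decimals, agreement with the reported averages is only to that precision; the standard errors of the averages cannot be recovered this way.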
Prediction &
Inference Space
VI. Prediction, “BLUP” and Inference Space
 Estimation vs. Prediction
 When “BLUP” is a good thing
 Inference Space
− what is it?
− how can we use it?
 Performance evaluation issues
 Multi-location issues
Estimation, Prediction, and Inference Space
 Estimation based on estimable functions $K'\beta$
 Estimation applies to fixed effects only, inference is to
entire population
 Prediction based on “predictable functions” $K'\beta + M'u$
 Prediction applies to fixed & random effects, narrows
scope of inference to specific subset defined by $M'u$
 Examples: locations, workers, teachers, patients...
Prediction Example 1
Growth Change Modeling Issue - III
 Random Coefficients
 Recall Basic Growth Model $y_{ij} = \beta_0 + \beta_1 \times \mathrm{year}_i + e_{ij}$
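Level 2: $\beta_{0j} = \beta_0 + b_{0j}$
$\beta_{1j} = \beta_1 + b_{1j}$
$\begin{bmatrix} b_{0j} \\ b_{1j} \end{bmatrix} \sim MVN\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix} \right)$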
proc glimmix data=bmi_uni;
class id;
model bmi=year/solution ddfm=kr;
random intercept year / subject=id type=un solution;
random _residual_ /subject=id type=ar(1);
Selected Output
Covariance Parameter Estimates
Cov Parm   Subject   Estimate
UN(1,1)    id        10.8070
UN(2,1)    id        0.5873
UN(2,2)    id        0.2676
AR(1)      id        0.3024
Residual             4.6021

Solutions for Fixed Effects
Effect      Estimate   Standard Error   t Value
Intercept   21.3577    0.6480           32.96
year        0.6870     0.1212           5.67

Solution for Random Effects (partial listing)
Effect      Subject   Estimate   Std Err Pred   DF
Intercept   id 73     2.1023     1.3487         165
year        id 73     -0.1608    0.3118         165
Intercept   id 281    -1.3178    1.3487         165
year        id 281    -0.1353    0.3118         165
Intercept   id 496    -1.8137    1.3487         165
year        id 496    -0.07237   0.3118         165
You can obtain Subject-Specific Estimates
proc glimmix data=bmi_uni;
class id;
model bmi=year/solution ddfm=kr;
random intercept year / subject=id type=un solution;
random _residual_ /subject=id type=ar(1);
estimate 'popn avg slope' year 1 / cl;
estimate 'id (73) specific slope' year 1 | year 1 / subject 1 0 cl e;
estimate 'id (496) specific slope' year 1 | year 1 / subject 0 0 1 0 cl;
estimate 'popn avg intercept' intercept 1 / cl;
estimate 'predicted bmi in 1997' intercept 1 year 0 / cl;
estimate 'id (73) specific intercept' intercept 1 | intercept 1 / subject 1 0 cl e;
estimate 'id (496) specific intercept' intercept 1 | intercept 1 / subject 0 0 1 0 cl;
estimate 'predicted bmi in 2000' intercept 1 year 3 / cl;
estimate 'id (73) specific 2000 bmi' intercept 1 year 3 |
intercept 1 year 3/ subject 1 0 cl;
estimate 'id (496) specific 2000 bmi' intercept 1 year 3 |
intercept 1 year 3/ subject 0 0 1 0 cl;
estimate 'predicted bmi in 2003' intercept 1 year 6 / cl;
estimate 'id (73) specific 2003 bmi' intercept 1 year 6 |
intercept 1 year 6/ subject 1 0 cl;
estimate 'id (496) specific 2003 bmi' intercept 1 year 6 |
intercept 1 year 6/ subject 0 0 1 0 cl;
run;
Best Linear Unbiased Prediction
 Look closer at Estimate statement
estimate 'popn avg slope' year 1 / cl;
estimate 'id (73) specific slope' year 1 |
year 1 / subject 1 0 cl e;
estimate 'id (496) specific slope' year 1 |
year 1 / subject 0 0 1 0 cl;
estimate 'predicted bmi in 2000' intercept 1 year 3 / cl;
estimate 'id (73) specific 2000 bmi' intercept 1 year 3 |
intercept 1 year 3/ subject 1 0 cl;
estimate 'id (496) specific 2000 bmi' intercept 1 year 3 |
intercept 1 year 3/ subject 0 0 1 0 cl;
Coefficients to right of vertical bar ( | ) apply to
random effects – this is a new idea
BLUP - - - estimation (prediction) of random effects
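The arithmetic behind the vertical-bar syntax can be sketched in a few lines of Python (not part of the course, which uses SAS). The numbers are the fixed-effect estimates and the random-effect solutions for ids 73 and 496 reported in the Selected Output; the function names are hypothetical. Coefficients left of the bar give $K'\beta$; adding the coefficients right of the bar gives $K'\beta + M'u$.

```python
# Fixed-effect estimates and random-effect solutions copied from the output.
fixed = {"intercept": 21.3577, "year": 0.6870}
blup = {
    73: {"intercept": 2.1023, "year": -0.1608},
    496: {"intercept": -1.8137, "year": -0.07237},
}


def population_average(year):
    """K'beta: uses only the coefficients left of the vertical bar."""
    return fixed["intercept"] + fixed["year"] * year


def subject_specific(subject, year):
    """K'beta + M'u: adds the subject's predicted intercept and slope deviations."""
    u = blup[subject]
    return population_average(year) + u["intercept"] + u["year"] * year


def subject_slope(subject):
    """Population slope plus the subject's predicted slope deviation."""
    return fixed["year"] + blup[subject]["year"]


for year, label in [(0, "1997"), (3, "2000"), (6, "2003")]:
    print(label,
          round(population_average(year), 4),
          round(subject_specific(73, year), 4),
          round(subject_specific(496, year), 4))
```

Up to rounding of the printed estimates, these sums should agree with the ESTIMATE results for the corresponding labels; the standard errors cannot be obtained this way because they depend on the full mixed-model equations.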
Selected Estimates from Random Coeff BMI Model
Estimates
Label                         Estimate   Standard Error   DF      Lower     Upper
popn avg slope                 0.6870    0.1214           31.57    0.4396    0.9344
id (73) specific slope         0.5262    0.3833           18.35   -0.2779    1.3303
id (496) specific slope        0.6146    0.3833           18.35   -0.1895    1.4187
popn avg intercept            21.3577    0.6459           31.5    20.0413   22.6742
predicted bmi in 1997         21.3577    0.6459           31.5    20.0413   22.6742
id (73) specific intercept    23.4601    1.4916           33.36   20.4266   26.4935
id (496) specific intercept   19.5440    1.4916           33.36   16.5105   22.5775
predicted bmi in 2000         23.4186    0.7330           31.99   21.9255   24.9117
id (73) specific 2000 bmi     25.0387    0.9928            9.56   22.8127   27.2646
id (496) specific 2000 bmi    21.3878    0.9928            9.56   19.1618   23.6138
predicted bmi in 2003         25.4795    0.9605           31.84   23.5226   27.4365
id (73) specific 2003 bmi     26.6173    1.5462           20.15   23.3936   29.8410
id (496) specific 2003 bmi    23.2316    1.5462           20.15   20.0079   26.4553
Inference Space Example II:
 Workers and machines
 From McLean, Sanders & Stroup (1991,
American Statistician)
 Also Chapter 6, ex 2, SAS for Mixed Models
 2 machines
 3 operators (sample from population)
 inference can apply to population of workers or
specific worker
 KEY CONCEPT: Inference Space
Worker-Machine Example: Fixed Effect Inference
proc glimmix;
class machine operator;
model y=machine/ddfm=kr;
random operator machine*operator;
lsmeans machine / diff;
estimate 'BLUE - machine 1'
intercept 1 machine 1 0;
estimate 'BLUE - diff' machine 1 -1;
Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
machine   1        2        20.26     0.0460
(based on MS(mach) / MS(mach*oper))

machine Least Squares Means
machine   Estimate   Std Error   DF      t Value   Pr > |t|
1         50.9483    0.2467      2.973   206.50    <.0001
2         51.9567    0.2467      2.973   210.59    <.0001

Differences of machine Least Squares Means
machine   _machine   Estimate   Std Error   DF   t Value   Pr > |t|
1         2          -1.0083    0.2240      2    -4.50     0.0460

The ESTIMATE statements give the same results.
Worker-Machine Example: Prediction
These statements apply inference to specific workers or worker-machine combinations:
• machine 1 averaged over ONLY THE WORKERS IN THE STUDY
• difference between machines for the workers in the study ONLY
• operator 1 averaged over machines; operator 1 with machine 1 only; operator-specific difference between machines
estimate 'BLUP - m1 narrow' intercept 3 machine 3 0 | operator 1 1 1
machine*operator 1 1 1 0 0 0/divisor=3;
estimate 'BLUP - diff nrw' machine 3 -3 | machine*operator 1 1 1 -1 -1 -1/divisor=3;
estimate 'BLUP - oper 1' intercept 2 machine 1 1 | operator 2 0 0
machine*operator 1 0 0 1 0 0/divisor=2;
estimate 'BLUP - m1 op1' intercept 1 machine 1 0 | operator 1 0 0
machine*operator 1 0 0 0 0 0;
estimate 'BLUP - diff op1' machine 1 -1 | machine*operator 1 0 0 -1 0 0;
Worker-Machine Example: Prediction (2)
Estimates
Label              Estimate   Standard Error   DF      t Value   Pr > |t|
BLUE - machine 1   50.9483    0.2467           2.973   206.50    <.0001
BLUE - diff        -1.0083    0.2240           2        -4.50    0.0460
BLUP - m1 narrow   50.9483    0.08993          6       566.53    <.0001
BLUP - diff nrw    -1.0083    0.1272           6        -7.93    0.0002
BLUP - oper 1      51.7366    0.1151           6.698   449.30    <.0001
BLUP - m1 op1      51.2979    0.1724           7.885   297.48    <.0001
BLUP - diff op1    -0.8773    0.2567           7.976    -3.42    0.0092
BLUE: inference to the population of workers
BLUP: inference to a specific worker or set of workers
Note the impact on the standard error
BLUP a.k.a. “Shrinkage Estimator”
Covariance Parameter Estimates
Cov Parm           Estimate
operator           0.1073
machine*operator   0.05100
Residual           0.04852
• BLUP is regressed toward the mean
• BLUP is E(u|Y)
• Degree of shrinkage depends on the variance component estimates
e.g. the operator BLUP is
$$\mathrm{E}(o_i) + \mathrm{Cov}(o_i, \bar{y}_{j\cdot})\,\left[\mathrm{Var}(\bar{y}_{j\cdot})\right]^{-1}\left(\bar{y}_{j\cdot} - \bar{y}_{\cdot\cdot}\right)$$
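A rough numeric illustration of the shrinkage idea, as a Python sketch rather than anything GLIMMIX prints. It uses the three variance component estimates from the slide and assumes a balanced layout with 2 machines and 2 replicates per machine-operator cell; the replicate count is an assumption inferred from the residual degrees of freedom, and the function name is hypothetical. It covers the operator main effect only, so it is not the same quantity as the 'BLUP - oper 1' estimate, which also includes machine*operator terms.

```python
def shrinkage_factor(var_operator, var_mo, var_resid, n_machines, n_reps):
    """Cov(o_i, ybar_i) / Var(ybar_i) for an operator mean in a balanced design.

    Assumes every operator is observed n_reps times on each of n_machines machines.
    """
    var_of_operator_mean = (
        var_operator
        + var_mo / n_machines
        + var_resid / (n_machines * n_reps)
    )
    return var_operator / var_of_operator_mean


# Variance component estimates from the slide; n_reps=2 is an assumption.
w = shrinkage_factor(0.1073, 0.05100, 0.04852, n_machines=2, n_reps=2)

# Raw deviation of operator 1's mean (GLM LSMEAN 51.7625) from the overall mean,
# taken here as the average of the two machine least squares means.
raw_deviation = 51.7625 - (50.9483 + 51.9567) / 2
shrunken_deviation = w * raw_deviation

print(round(w, 4), round(raw_deviation, 4), round(shrunken_deviation, 4))
```

The weight lies strictly between 0 and 1: a larger operator variance relative to the other components moves it toward 1 (little shrinkage), a smaller one moves it toward 0.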
Relationship to Proc GLM
proc glm;
class machine operator;
model y=machine|operator;
random operator machine*operator/test;
lsmeans machine operator machine*operator/stderr;
lsmeans machine/stderr e=machine*operator;
estimate 'diff' machine 1 -1/e;
run;

operator   y LSMEAN      Standard Error
1          51.7625000    0.1101420
vs. 51.74, 0.1151

machine   operator   y LSMEAN      Standard Error
1         1          51.3550000    0.1557642
vs 51.30, 0.1724

machine   y LSMEAN      Standard Error
1         50.9483333    0.1583947
std error is neither the Mixed broad nor the Mixed narrow one; produced by
estimate “m1” intercept 3 machine 3 0 |
operator 1 1 1 machine*operator 0 / divisor=3

machine   y LSMEAN      Standard Error
1         50.9483333    0.0899305
Standard Error same as BLUP specific to workers in GLIMMIX
Prediction Example II: Multi-Location Data
• From SAS for Mixed Models, 9 Locations
• 3 blocks per location
• 4 treatments
• Major issues
− are blocks fixed or random?
− if random, how does one estimate location-specific treatment effects?
ANOVA (ignoring block)
Location random:
Source      d.f.   Expected Mean Square
Treatment   3      $\sigma^2 + k_1\sigma^2_{LT} + Q_{TRT}$
Location    8      $\sigma^2 + k_1\sigma^2_{LT} + k_2\sigma^2_{L}$
Loc × Trt   24     $\sigma^2 + k_1\sigma^2_{LT}$
error       dfe    $\sigma^2$
Test of TRT affected

If Location fixed:
Source      d.f.   Expected Mean Square
Treatment   3      $\sigma^2 + Q_{TRT}$
Location    8      $\sigma^2 + Q_{LOC}$
Loc × Trt   24     $\sigma^2 + Q_{LT}$
error       dfe    $\sigma^2$
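Inference Space
Assuming Locations are Fixed
$$\mathrm{Var}(\text{trt mean}) = \frac{\sigma^2}{\#\,\text{obs/trt}} \;\Rightarrow\; \text{Std. error}(\text{trt mean}) = \sqrt{\frac{\mathrm{MS(error)}}{\#\,\text{obs/trt}}}$$
HOWEVER... if Locations are Random
$$\mathrm{Var}(\text{trt mean}) = \frac{\sigma^2 + k\,(\sigma_L^2 + \sigma_{LT}^2)}{\#\,\text{obs/trt}} \;\Rightarrow\; \text{Std. error}(\text{trt mean}) = \sqrt{\frac{\hat\sigma^2 + k\,(\hat\sigma_L^2 + \hat\sigma_{LT}^2)}{\#\,\text{obs/trt}}}$$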
Where does Uncertainty Arise?
[Figure: observations shown by location, panels Loc 1, Loc 2, ..., Loc 7, Loc 8]
Only from variation among obs within locations? Then locations are fixed.
Or does variation among locations also contribute? Then locations are random.
Location-Specific Effects: BLUP
In a multi-location trial, a location-specific effect is, e.g., trt 1 vs trt 2 | location j
$$= \tau_1 - \tau_2 + (\tau L)_{1j} - (\tau L)_{2j}$$
• Implies a linear combination of fixed and random effects (predictable function = BLUP)
Basic SAS Programs
for fixed location:
proc glimmix data=MultiCenter;
class location block treatment;
model response=location treatment location*treatment;
random block(location);
lsmeans treatment;
lsmeans location*treatment/slice=location slicediff=location;
run;
for random locations
proc glimmix data=MultiCenter;
class location block treatment;
model response=treatment/ddfm=KR;
random location block(location) location*treatment;
lsmeans treatment/diff;
estimate 'trt1 vs trt2' treatment 1 -1 0;
estimate 'loc A vs loc B' | location 1 -1 0;
estimate 'trt 1 BLUP' intercept 8 treatment 8
| location 1 1 1 1 1 1 1 1/divisor=8;
estimate 'trt1 at loc A blup' intercept 1 treatment 1 0 0 0
| location 1 0 location*treatment 1 0;
etc – see ch6 MultiCenter.sas for program in detail
“Take Home” points
 Inference space usually implies random locations
 “Broad” inference on treatments applies to entire
population
 Location-specific inference may be of interest
 Requires BLUP
 Hans Peter Piepho has proposed mixed-model based
measures of commonality among locations
 Making locations fixed to maximize error d.f.
to test TRT is inappropriate
GLM Issues
VII. “GLM” Issues
 Bernoulli data
− as a binomial
− special problems with BINARY data
 Counts
 Rates
Common Non-Normal Models
 Bernoulli (binary) observations
 Categorical data
− Binomial
− multinomial
 Counts
Contingency tables
− Poisson
− Over dispersed (e.g. negative binomial)
 Rates
 Survival times
− Gamma, Weibull
 Dispersion measures
− variance
Elements of GLM
(Generalized Linear Model)
• Systematic model $X\beta$
• Assumed distribution
− implied variance structure
• Link function
• Examples
□ y ~ Bernoulli(p): $p = \Phi(X\beta)$ or $\mathrm{logit}(p) = X\beta$
□ Y ~ Poisson($\lambda$): $\log(\lambda) = X\beta$
GLM Example
• From SAS for Linear Models
• Output 10.1, re-expressed in 10.5
• Challenger space shuttle data
• relate prob{failure} to temperature at launch
• DATA: TEMP, TD (# times thermal distress in O-ring), NO_TD
Approach to modeling
• Assess relationship between TEMP and Prob{TD=1}, i.e., O-rings show thermal distress
• Distribution: Bernoulli
• Natural parameter: logit = log[p/(1-p)]
• Model: logit(Pr{TD})=a+b(Temp)
• Inverse link form:
Pr{TD}=exp[a+b(Temp)]/{1+exp[a+b(Temp)]}
SAS Program: Proc GLIMMIX
proc glimmix data=Challenger;
model td/total=temp;
estimate 'logit at 50 deg' intercept 1 temp 50 / ilink;
estimate 'logit at 60 deg' intercept 1 temp 60 / ilink;
estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
estimate 'logit at 70 deg' intercept 1 temp 70 / ilink;
estimate 'logit at 80 deg' intercept 1 temp 80 / ilink;
run;
Relevant Output
Fit Statistics
Pearson Chi-Square        11.13
Pearson Chi-Square / DF    0.80   (no evidence of overdispersion)

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   15.0429    7.3786           14    2.04     0.0608
temp        -0.2322    0.1082           14   -2.14     0.0500

$$\mathrm{logit}(\hat\pi) = 15.04 - 0.23X, \qquad \pi = \Pr\{TD = 1\}, \quad X = \text{temp (°F)}$$
Relevant Output (2)
Estimates
                    |------------------ logit scale ------------------|  |--- data scale ---|
Label               Estimate   Standard Error   DF   t Value   Pr > |t|   Mean      Standard Error Mean
logit at 50 deg      3.4348    2.0232           14    1.70     0.1117     0.9688    0.06121
logit at 60 deg      1.1131    1.0259           14    1.09     0.2962     0.7527    0.1909
logit at 64.7 deg    0.02197   0.6576           14    0.03     0.9738     0.5055    0.1644
logit at 64.8 deg   -0.00125   0.6518           14   -0.00     0.9985     0.4997    0.1630
logit at 70 deg     -1.2085    0.5953           14   -2.03     0.0618     0.2300    0.1054
logit at 80 deg     -3.5301    1.4140           14   -2.50     0.0256     0.02847   0.03911
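The step from the logit scale to the data scale is just the inverse link. A small Python sketch (illustrative only; the course itself uses the ILINK option in SAS) applies it to the six logit-scale estimates from the Challenger output. The standard error line uses the delta method, SE(mean) ≈ p(1 − p) × SE(logit), which is stated here as an assumption about how the data-scale standard error is approximated; the function names are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse logit link: probability corresponding to a linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


def delta_method_se(eta, se_eta):
    """Approximate data-scale standard error: |d p / d eta| * SE(eta)."""
    p = inv_logit(eta)
    return p * (1.0 - p) * se_eta


# (estimate, standard error) on the logit scale, keyed by launch temperature.
logit_scale = {
    50: (3.4348, 2.0232),
    60: (1.1131, 1.0259),
    64.7: (0.02197, 0.6576),
    64.8: (-0.00125, 0.6518),
    70: (-1.2085, 0.5953),
    80: (-3.5301, 1.4140),
}

data_scale = {
    temp: (inv_logit(eta), delta_method_se(eta, se))
    for temp, (eta, se) in logit_scale.items()
}

for temp, (p, se) in data_scale.items():
    print(temp, round(p, 4), round(se, 4))
```

The printed probabilities can be compared with the Mean column: the estimated probability of thermal distress crosses 0.5 between 64.7 and 64.8 degrees, where the logit changes sign.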
Alternatives
• Express data in binomial form
− SAS for Linear Models, 4th ed., output 10.5
• Probit link
$$\Phi(\eta) = \int_{-\infty}^{\eta} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \qquad \text{(std normal c.d.f.)}$$
• link function is $\Phi^{-1}(\pi) = \eta = X\beta$
• inverse link is $\pi = \Phi(X\beta)$
Logit vs Probit
[Figure: logit and probit curves compared. Red: probit; Blue: logit]
Probit Model
proc glimmix data=Challenger;
model td/total=temp/link=probit solution;
estimate 'logit at 50 deg' intercept 1 temp 50 / ilink;
estimate 'logit at 60 deg' intercept 1 temp 60 / ilink;
estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
estimate 'logit at 70 deg' intercept 1 temp 70 / ilink;
estimate 'logit at 80 deg' intercept 1 temp 80 / ilink;
run;
Probit Output
Fit Statistics
Pearson Chi-Square        10.98
Pearson Chi-Square / DF    0.78

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    8.7750    4.0286           14    2.18     0.0470
temp        -0.1351    0.05839          14   -2.31     0.0364

Estimates (labels retain the word "logit" from the program; values are on the probit scale)
Label               Estimate   Standard Error   DF   t Value   Pr > |t|   Mean      Standard Error Mean
logit at 50 deg      2.0201    1.1413           14    1.77     0.0985     0.9783    0.05917
logit at 60 deg      0.6692    0.6024           14    1.11     0.2854     0.7483    0.1921
logit at 64.7 deg    0.03421   0.3960           14    0.09     0.9324     0.5136    0.1579
logit at 64.8 deg    0.02070   0.3925           14    0.05     0.9587     0.5083    0.1566
logit at 70 deg     -0.6818    0.3244           14   -2.10     0.0541     0.2477    0.1026
logit at 80 deg     -2.0328    0.7277           14   -2.79     0.0144     0.02104   0.03678
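For the probit link the inverse link is the standard normal c.d.f. The Python sketch below (an illustration, not the course's SAS code) evaluates it with the error function from the standard library and applies it to the probit-scale estimates reported in the Probit Output; `std_normal_cdf` is a hypothetical helper name.

```python
import math


def std_normal_cdf(z):
    """Standard normal c.d.f. written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probit-scale estimates from the output, keyed by launch temperature.
probit_scale = {50: 2.0201, 60: 0.6692, 64.7: 0.03421,
                64.8: 0.02070, 70: -0.6818, 80: -2.0328}

probit_probs = {temp: std_normal_cdf(eta) for temp, eta in probit_scale.items()}

for temp, p in probit_probs.items():
    print(temp, round(p, 4))
```

Comparing these probabilities with the logit-model means at the same temperatures shows how little the choice between the two links matters for these data.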
Option 3: Use Binary Data
proc glimmix data=O_Ring;
/* Careful!! The default distribution is normal, so this MODEL statement fits a normal model: */
/*   model td_bin=temp / solution; */
/* State the distribution and link explicitly instead: */
model td_bin=temp /dist=binomial link=logit solution;
estimate 'logit at 50 deg' intercept 1 temp 50 / ilink;
estimate 'logit at 60 deg' intercept 1 temp 60 / ilink;
estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
estimate 'logit at 70 deg' intercept 1 temp 70 / ilink;
estimate 'logit at 80 deg' intercept 1 temp 80 / ilink;
run;
Binary Output
Fit Statistics
Pearson Chi-Square        23.17
Pearson Chi-Square / DF    1.10   (no evidence of overdispersion)

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   15.0429    7.3786           21    2.04     0.0543
temp        -0.2322    0.1082           21   -2.14     0.0438

Estimates
Label               Estimate   Standard Error   DF   t Value   Pr > |t|   Mean      Standard Error Mean
logit at 50 deg      3.4348    2.0232           21    1.70     0.1043     0.9688    0.06121
logit at 60 deg      1.1131    1.0259           21    1.09     0.2902     0.7527    0.1909
logit at 64.7 deg    0.02197   0.6576           21    0.03     0.9737     0.5055    0.1644
logit at 64.8 deg   -0.00124   0.6518           21   -0.00     0.9985     0.4997    0.1630
logit at 70 deg     -1.2085    0.5953           21   -2.03     0.0552     0.2300    0.1054
logit at 80 deg     -3.5301    1.4140           21   -2.50     0.0209     0.02847   0.03911
Binary Data + Random Effects
 Binary data in GLM with random effect can be
troublesome
 Pseudo-likelihood tends to produce biased
variance / covariance component estimates
 e.g. variance estimates biased down for small
cluster size
 Larger sample sizes tend to be required
 No overdispersion estimate
Binary GLMM example
• courtesy of Oliver Schabenberger
• 200 subjects
• random intercept
• logistic link
data binary;
do subject = 1 to 200;
ranint = rannor(&seed);
do i = 1 to &n;
linp = &b0 + ranint;
pi = 1/(1 + exp(-linp));
y = ranbin(0,1,pi);
output;
end;
end;
drop i;
run;
Binary GLMM
 Schabenberger used two programs
proc glimmix data=binary;
class subject;
model y(event='1') = / dist=binary link=logit s;
random intercept / subject=subject;
ods select ParameterEstimates CovParms;
run;
proc nlmixed data=binary;
parms s2 1 intercept -1;
model y ~ binary(1/(1+exp(-(intercept+gamma))));
random gamma ~ normal(0,s2) subject=subject;
ods select Dimensions ParameterEstimates;
run;
GLIMMIX vs NLMIXED Binary Results
Entries are Estimate (Standard Error); DF = 199 where reported.

                                   cluster size n=4      cluster size n=20
GLIMMIX
Cov Parm: Intercept (subject)      0.5251 (0.1699)       0.9905 (0.1373)
Fixed effect: Intercept           -0.7159 (0.09211)     -0.9239 (0.08020)
NLMIXED
Parameter: s2                      0.8159 (0.2718)       1.1512 (0.1659)
Parameter: intercept              -0.8092 (0.1085)      -0.9854 (0.08691)
Diagnostics & Alternative Models
• Example using count data
• SAS Linear Models, Output 10.24
• Historically, count data assumed ~ Poisson
• Implies mean = variance
• In practice, often variance > mean: overdispersion
• Requires modification
− scale to correct std errors and test statistics for overdispersion
− use a different distribution
Basic analysis + model checking
proc glimmix data=a;
class BLOCK CTL_TRT a b;
model count=CTL_TRT a b a*b/dist=poisson;
random intercept / subject=BLOCK;
output out=check pred=xbeta pred(ilink)=pred
residual=r pearson=resid_pearson;
run;
data plot;
merge check;
adjlamda=2*sqrt(pred);
ystar=xbeta+(count-pred)/pred;
absres=abs(resid_pearson);
proc gplot;
plot resid_pearson*(pred xbeta);
plot (resid_pearson)*adjlamda;
plot ystar*xbeta;
plot absres*adjlamda;
run;
Model checking plots:
1. Residuals vs pred
a. use std resid
b. or deviance res
c. std’ize pred scale
look for unequal scatter (wrong dist or var fct) or a pattern in the resid (wrong model or link)
2. y* vs. η (xbeta): should be linear; otherwise wrong link
Evidence of Overdispersion
Fit Statistics
-2 Res Log Pseudo-Likelihood   124.06
Generalized Chi-Square         100.15
Gener. Chi-Square / DF           3.34
Gener. chi-square / DF should be ≈ 1
>1 indicates overdispersion
<1 indicates underdispersion
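The overdispersion check is a one-line ratio. The Python sketch below (illustrative, with a hypothetical function name) recovers the degrees of freedom implied by the reported fit statistics and shows the factor by which standard errors would be inflated if that ratio were used as a scale parameter. Treating the ratio as the scale estimate is an approximation stated here as an assumption; refitting with a scale parameter re-estimates it.

```python
import math


def dispersion_ratio(chi_square, df):
    """Chi-square divided by its degrees of freedom; about 1 when there is no overdispersion."""
    return chi_square / df


gen_chi_square = 100.15   # Generalized Chi-Square from the fit statistics
reported_ratio = 3.34     # Gener. Chi-Square / DF from the fit statistics

implied_df = gen_chi_square / reported_ratio   # about 30
se_inflation = math.sqrt(reported_ratio)       # standard errors grow by this factor
f_deflation = reported_ratio                   # F statistics shrink by roughly this factor

print(round(implied_df, 1), round(se_inflation, 3))
print(round(dispersion_ratio(11.13, 14), 2))   # Challenger logit model: close to 1
```

A ratio of 3.34 therefore means standard errors from the unscaled Poisson fit are too small by a factor of roughly 1.8.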
Example: plot of residuals x adjlamda
Another look – absolute value resid vs adjlamda
Link? Plot ystar x XBeta
should be linear – no strong evidence of problem
Strategy 1: Adjust using scale parameter
Poisson log-likelihood is $y\log(\lambda) - \lambda - \log(y!)$
• $E(y) = \mathrm{Var}(y) = \lambda$
Quasi-likelihood allows a scale parameter $\phi$:
$$Q = \int_{y}^{\lambda} \frac{y - t}{\phi\, t}\, dt$$
which, up to terms not involving $\lambda$, gives
$$q = \frac{y\log(\lambda) - \lambda}{\phi}$$
Now, $E(y) = \lambda$ and $\mathrm{Var}(y) = \phi\lambda$
Implementation with GLIMMIX
proc glimmix data=a;
class BLOCK CTL_TRT a b;
model count=CTL_TRT a b a*b/dist=poisson htype=1,3;
random intercept / subject=BLOCK;
random _residual_;
run;
SCALE estimated from RANDOM _RESIDUAL_:
$$\hat\phi = \frac{\text{Generalized } \chi^2}{N - \mathrm{rank}(X)}$$
alternatively can use
$$\hat\phi = \frac{\text{deviance}}{N - \mathrm{rank}(X)}$$
Selected Output
UnScaled

Type I Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   1        27       55.83     <.0001

Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   0        .        .         .
A         2        27       9.19      0.0009
B         2        27       0.06      0.9402
A*B       4        27       3.11      0.0315

Scaled

Type I Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   1        27       16.23     0.0004

Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   0        .        .         .
A         2        27       2.67      0.0875
B         2        27       0.02      0.9822
A*B       4        27       0.90      0.4753

Note discrepancy for CTL_TRT and A main effect
Alternative 2: different distribution
e.g. Negative Binomial
Standard math-stat text form:
$$\frac{(N-1)!}{y!\,(N-y-1)!}\,\pi^{y}(1-\pi)^{N-y}$$
More useful form: let $N - y = k$ and $\pi = \dfrac{\lambda}{\lambda + k}$, which yields the p.d.f.
$$\frac{(y+k-1)!}{y!\,(k-1)!}\left(\frac{\lambda}{\lambda+k}\right)^{y}\left(\frac{k}{\lambda+k}\right)^{k}$$
$$\log L = y\log\left(\frac{\lambda}{\lambda+k}\right) + k\log\left(\frac{k}{\lambda+k}\right) + \log\left[\frac{(y+k-1)!}{y!\,(k-1)!}\right]$$
$y\log\left(\frac{\lambda}{\lambda+k}\right) + k\log\left(\frac{k}{\lambda+k}\right)$: not exponential family, but is a quasi-likelihood
$$E(y) = \lambda, \qquad \mathrm{Var}(y) = \lambda + \frac{\lambda^2}{k}, \qquad \text{natural param } \theta = \log\left(\frac{\lambda}{\lambda+k}\right)$$
$\lambda$ is the mean and $k$ is the aggregation parameter
small $k$: aggregation; $k \to \infty$: Poisson
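The mean-variance relationship and the Poisson limit of the negative binomial can be checked numerically. This Python sketch (an illustration with hypothetical function names, not course code) writes the factorials through the log-gamma function so that a non-integer aggregation parameter k is allowed, which is an extension assumed here.

```python
import math


def negbin_logpmf(y, lam, k):
    """Log p.m.f. in the (mean lam, aggregation k) form, factorials via lgamma."""
    return (math.lgamma(y + k) - math.lgamma(y + 1) - math.lgamma(k)
            + y * math.log(lam / (lam + k))
            + k * math.log(k / (lam + k)))


def negbin_variance(lam, k):
    """Var(y) = lam + lam^2 / k: always larger than the Poisson variance lam."""
    return lam + lam ** 2 / k


def poisson_logpmf(y, lam):
    """Poisson log-likelihood: y log(lam) - lam - log(y!)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


# Numerical check of the moments for lam = 3, k = 2 (sum truncated at 500).
lam, k = 3.0, 2.0
probs = [math.exp(negbin_logpmf(y, lam, k)) for y in range(501)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(total, 6), round(mean, 6), round(var, 6), negbin_variance(lam, k))

# As k grows the negative binomial approaches the Poisson.
gap = abs(math.exp(negbin_logpmf(2, lam, 1e6)) - math.exp(poisson_logpmf(2, lam)))
print(gap)
```

Small k inflates the variance well beyond the mean (strong aggregation), while very large k makes the two distributions practically indistinguishable.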
Negative Binomial with GLIMMIX
proc glimmix data=a;
class BLOCK CTL_TRT a b;
model count=CTL_TRT a b
a*b/dist=negbin htype=1,3;
random intercept / subject=BLOCK;
run;
Fit Statistics
-2 Res Log Pseudo-Likelihood   84.48
Generalized Chi-Square         28.32
Gener. Chi-Square / DF          0.94

Type I Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   1        27       10.08     0.0037

Type III Tests of Fixed Effects
Effect    Num DF   Den DF   F Value   Pr > F
CTL_TRT   0        .        .         .
A         2        27       3.53      0.0436
B         2        27       0.03      0.9753
A*B       4        27       1.02      0.4139
Modeling with Offsets
• There are cases when modeling count alone is naive
• This occurs when counts are “per unit”
− Number of plants per plot
− Number of patients per county
− Number of students per district
− Number of boating accidents per year per lake
− Number of defects per lot
• Accurate model must take units into account
• Essentially, based on log(count/unit)
• Log(count) is link; log(unit) is “offset”
Offset defined
• Idea: raw count may be an artifact of unit size
• Count / unit is more informative
• Offset
− adjusts for size
− is a regressor whose coefficient is assumed to be 1.0
− used especially in conjunction with Poisson models with log link
− accounts for heterogeneity in rates resulting from differences in size
Modeling with Offsets
$$y_i \sim \mathrm{Poisson}(\lambda_i)$$
$$\lambda_i = \nu \times \mathrm{size}_i = \exp(\eta_i)$$
$$\log\left[E(y_i)\right] = \eta_i = \log(\nu) + \log(\mathrm{size}_i) = X\beta + \text{offset}$$
$\nu$ = rate per unit size
Example: Courtesy of Oliver Schabenberger
• Some of the data
• X is predictor variable
• SIZE is the “unit” to be taken into account
Obs   size   x       count
1     5001   4.597    4
2     7550   4.245   76
3     1744   3.918    2
4     1451   3.273    2
5     5313   4.140   12
6     3687   3.438    4
7     3022   4.763    2
8     8809   4.445    9
9     4436   4.191    3
10    2621   4.835    6
Naive Modeling (not accounting for SIZE)
proc glimmix data=test;
model count = x / s dist=poisson;
ods select FitStatistics ParameterEstimates;
run;
Fit Statistics
-2 Log Likelihood           647.12
AIC (smaller is better)     651.12
AICC (smaller is better)    651.45
BIC (smaller is better)     654.50
CAIC (smaller is better)    656.50
HQIC (smaller is better)    652.35
Pearson Chi-Square         1078.66
Pearson Chi-Square / DF      28.39

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    2.0978    0.4143           38    5.06     <.0001
x           -0.01619   0.1002           38   -0.16     0.8725
Poisson Model with Offset
proc glimmix data=test;
offs = log(size);
model count = x /s dist=poisson offset=offs;
ods select FitStatistics ParameterEstimates;
run;
Fit Statistics
-2 Log Likelihood           318.41
AIC (smaller is better)     322.41
AICC (smaller is better)    322.73
BIC (smaller is better)     325.79
CAIC (smaller is better)    327.79
HQIC (smaller is better)    323.63
Pearson Chi-Square          347.09
Pearson Chi-Square / DF       9.13

Parameter Estimates
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   -7.3168    0.5052           38   -14.48    <.0001
x            0.2247    0.1225           38     1.83    0.0746
Alternative to Offset??
 Could count/size be treated as binomial?
proc glimmix data=test;
offs = log(size);
model count = x /s dist=poisson offset=offs;
output out=gmxout1 pred(ilink)=mu;
id _xbeta_ offs _linp_;
ods exclude all;
run;
proc glimmix data=test;
model count/size = x /s dist=binomial;
output out=gmxout2 pred(ilink)=prob;
ods exclude all;
run;
data gmxout2; set gmxout2;
predcount= prob * size;
Compare Poisson/Offset vs Binomial Results
Poisson results (MU = pred count)
Obs   _xbeta_    offs      _linp_    mu
1     -6.28394   8.51739   2.23346    9.3321
2     -6.36302   8.92930   2.56628   13.0173
3     -6.43649   7.46394   1.02745    2.7939
4     -6.58140   7.28001   0.69860    2.0109
5     -6.38661   8.57791   2.19130    8.9468
6     -6.54433   8.21257   1.66823    5.3028
7     -6.24664   8.01367   1.76703    5.8535
8     -6.31809   9.08353   2.76544   15.8860
9     -6.37516   8.39751   2.02235    7.5561
10    -6.23047   7.87131   1.64085    5.1595

Binomial results
Obs   size   x       count   prob         predcount
1     5001   4.597    4      .001866023    9.3320
2     7550   4.245   76      .001724158   13.0174
3     1744   3.918    2      .001602034    2.7939
4     1451   3.273    2      .001385890    2.0109
5     5313   4.140   12      .001683963    8.9469
6     3687   3.438    4      .001438241    5.3028
7     3022   4.763    2      .001936911    5.8533
8     8809   4.445    9      .001803387   15.8860
9     4436   4.191    3      .001703368    7.5561
10    2621   4.835    6      .001968487    5.1594

predicted counts nearly identical
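How the offset enters the prediction can be traced by hand. The Python sketch below (illustrative only, hypothetical function names) rebuilds the first few predicted counts from the reported offset-model estimates (intercept -7.3168, slope 0.2247) and compares them with the binomial `prob * size` values from the comparison table. Small differences from the printed `mu` column are expected because the coefficients are rounded.

```python
import math

B0, B1 = -7.3168, 0.2247   # Poisson-with-offset estimates from the output


def rate_per_unit(x):
    """exp(X beta): expected count per unit of size."""
    return math.exp(B0 + B1 * x)


def predicted_count(x, size):
    """exp(X beta + offset), with offset = log(size)."""
    return math.exp(B0 + B1 * x + math.log(size))


# (size, x, binomial prob) for the first three observations in the table.
rows = [(5001, 4.597, 0.001866023),
        (7550, 4.245, 0.001724158),
        (1744, 3.918, 0.001602034)]

poisson_preds = [predicted_count(x, size) for size, x, _ in rows]
binomial_preds = [prob * size for size, _, prob in rows]

for p, b in zip(poisson_preds, binomial_preds):
    print(round(p, 3), round(b, 3))
print(round(rate_per_unit(4.597), 6))
```

The rate per unit from the Poisson model is nearly the binomial probability because, for probabilities this small, the logit and log links are almost the same function.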
ZIP and Hurdle Models
 Mixture models for count data
− ZIP = “zero-inflated Poisson”
− ZINB = “zero-inflated Negative Binomial”
− in principle, other zero-inflated models limited only by
imagination
 Accommodate excess zeros
− Excess zeros cause overdispersion
 Are not in exponential family
 Cannot be fit with PROC GLIMMIX
 Can be fit using PROC NLMIXED
ZIP Model
$$z_i \sim \mathrm{Poisson}(\lambda_i)$$
$$\Pr(y_i = j) = \begin{cases} \pi_i + (1-\pi_i)\Pr(z_i = 0) & j = 0 \\ (1-\pi_i)\Pr(z_i = j) & j > 0 \end{cases} \;=\; \begin{cases} \pi_i + (1-\pi_i)\,e^{-\lambda_i} & j = 0 \\ (1-\pi_i)\,\dfrac{\lambda_i^{\,j} e^{-\lambda_i}}{j!} & j > 0 \end{cases}$$
$y_i$: observation
$\pi_i$: prob of 0 from Bernoulli process
$e^{-\lambda_i}$: prob of zero from Poisson process
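A direct transcription of the ZIP probabilities into Python (an illustrative sketch with a hypothetical function name) makes it easy to confirm that the two pieces form a proper distribution and to see how much the zero class is inflated. The values pi = 0.27 and lambda = 2 are arbitrary illustration values.

```python
import math


def zip_pmf(j, pi, lam):
    """Zero-inflated Poisson probability of observing count j.

    pi  : probability of a structural zero (Bernoulli process)
    lam : mean of the Poisson process
    """
    poisson = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    if j == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


pi, lam = 0.27, 2.0
zip_probs = [zip_pmf(j, pi, lam) for j in range(101)]
zip_total = sum(zip_probs)
zip_mean = sum(j * p for j, p in enumerate(zip_probs))

print(round(zip_probs[0], 4), round(math.exp(-lam), 4))  # ZIP zero prob vs plain Poisson
print(round(zip_total, 6), round(zip_mean, 4))
```

The mean drops to (1 − pi) × lambda while the zero probability rises, which is why excess zeros show up as overdispersion when a plain Poisson model is fit.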
Hurdle Model
• Two part model
− One process generates zeros
− Another process generates non-zeros
$$\Pr(y_i = j) = \begin{cases} \Pr(z_i = 0) & j = 0 \\ \left[1 - \Pr(z_i = 0)\right]\dfrac{\Pr(u_i = j)}{1 - \Pr(u_i = 0)} & j > 0 \end{cases}$$
$y_i$: observation
$\Pr(z_i = 0)$: zeros from Z process
$\Pr(u_i = j) / [1 - \Pr(u_i = 0)]$: truncated at zero distribution
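The hurdle probabilities can be sketched the same way. In this Python illustration the count process u is taken to be Poisson, which is an assumption (the slide only says "truncated at zero distribution"), and the function name and parameter values are hypothetical.

```python
import math


def hurdle_pmf(j, p_zero, lam):
    """Hurdle probability of count j.

    p_zero : Pr(z = 0), the probability the hurdle is not crossed
    lam    : mean of the untruncated Poisson used for the positive counts (assumed)
    """
    if j == 0:
        return p_zero
    poisson_j = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    truncated = poisson_j / (1.0 - math.exp(-lam))   # Poisson truncated at zero
    return (1.0 - p_zero) * truncated


p_zero, lam = 0.40, 2.0
hurdle_probs = [hurdle_pmf(j, p_zero, lam) for j in range(101)]
hurdle_total = sum(hurdle_probs)

print(round(hurdle_probs[0], 4), round(hurdle_total, 6))
```

Unlike the ZIP model, every zero here comes from a single process, so the zero probability is exactly `p_zero` regardless of the count distribution.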
ZIP or Hurdle?
 Number of doctor visits per year
 Number of fish caught by sport fishermen
 Cancer mortality
From SAS for Mixed Models, 2nd ed, Ch 15
Credit: Oliver Schabenberger
%let pi = 0.27;
data zip;
do s = 1 to 100;
u = rannor(556712);
do i = 1 to 20;
x = int(ranuni(0)*100);
y = int(rannor(0)*100);
if (ranuni(0) < &pi) then do;
count = 0;
lambda = .;
end; else do;
lambda = exp(-2 + 0.01*x + 0.01*y + u);
count = ranpoi(0,lambda);
end;
output;
end;
end;
drop i u lambda;
run;
ZIP Model with Random Effects
proc nlmixed data=zip;
parameters b0=0 b1=0 b2=0 a0=0 s2u=1;
/* linear predictor for the inflation probability */
linpinfl = a0;
/* infprob = inflation probability for zeros */
/*         = logistic transform of the linear predictor */
infprob = 1/(1+exp(-linpinfl));
/* Poisson mean */
lambda = exp(b0 + b1*x + b2*y + u);
/* Build the ZIP log likelihood */
if count=0 then
ll = log(infprob + (1-infprob)*exp(-lambda));
else ll = log((1-infprob)) + count*log(lambda)-lgamma(count+1)-lambda;
model count ~ general(ll);
random u ~ normal(0,s2u) subject=s;
estimate "inflation probability" infprob;
run;
ZIP NLMIXED Selected Results
true parameter values: b0=-2, b1=b2=0.01, a0=-0.9946, s2u=1

Fit Statistics
-2 Log Likelihood          2803.6
AIC (smaller is better)    2813.6
AICC (smaller is better)   2813.7
BIC (smaller is better)    2826.7

Parameter Estimates
Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower      Upper     Gradient
b0          -1.9979    0.1530           99   -13.06    <.0001     0.05    -2.3014    -1.6944   -0.00224
b1           0.01011   0.001299         99     7.78    <.0001     0.05     0.007535   0.01269  -0.15649
b2           0.01016   0.000394         99    25.78    <.0001     0.05     0.009378   0.01094  -0.0434
a0          -1.0934    0.1594           99    -6.86    <.0001     0.05    -1.4097    -0.7771   -0.00034
s2u          1.0828    0.2095           99     5.17    <.0001     0.05     0.6671     1.4985   -0.00145

Additional Estimates
Label                   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower    Upper
inflation probability   0.2510     0.02997          99   8.38      <.0001     0.05    0.1915   0.3104
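The link between `a0` and the inflation probability is the logistic transform used in the NLMIXED program. A short Python check (illustrative; helper names are hypothetical) reproduces both the "true" a0 implied by the simulation setting pi = 0.27 and the estimated inflation probability implied by the fitted a0.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Logistic transform of a linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


true_a0 = logit(0.27)                  # simulation used &pi = 0.27
estimated_infprob = inv_logit(-1.0934) # fitted a0 from the parameter estimates

print(round(true_a0, 4), round(estimated_infprob, 4))
```

Both values can be compared with the slide: the true a0 listed with the true parameter values and the "inflation probability" row of the additional estimates.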
GLMM Multi-Clinic Binomial Data
• SAS for Linear Models, Output 10.9
• also SAS for Mixed Models, Ch 14
• from Beitler & Landis, Biometrics, 1985
• 2 treatments (drug, cntl)
• 8 clinics, represent population
• nij patients observed on trt i at clinic j
• yij have favorable response
GLMM for Beitler Landis Data
$$\pi_{ij} = \Pr\{\text{favorable} \mid \text{trt } i, \text{clinic } j\}$$
Model:
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \mu + \tau_i + c_j + (ct)_{ij}$$
$$c_j \ \text{iid}\ N(0, \sigma_C^2); \qquad (ct)_{ij} \ \text{iid}\ N(0, \sigma_{CT}^2)$$
proc glimmix data=a;
class clinic trt;
model fav/nij= trt/dist=binomial link=logit;
random intercept trt / subject=clinic;
lsmeans trt/odds;
estimate 'lsm - cntl' intercept 1 trt 1 0 /ilink;
estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
estimate 'diff' trt 1 -1;
contrast 'diff' trt 1 -1;
run;
Covariance Parameter Estimates
Cov Parm    Subject   Estimate
Intercept   clinic    2.0103
trt         clinic    0.06057
If you drop Clinic x Trt
conditional (SS) model:
proc glimmix data=a;
class clinic trt;
model fav/nij= trt/dist=binomial link=logit;
random intercept / subject=clinic;
lsmeans trt/odds;
estimate 'lsm - cntl' intercept 1 trt 1 0 /ilink;
estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
estimate 'diff' trt 1 -1;
contrast 'diff' trt 1 -1;
run;
marginal (PA) model:
proc glimmix data=a;
class clinic trt;
model fav/nij= trt/dist=binomial link=logit;
random _residual_ / type=cs subject=clinic;
lsmeans trt/odds;
estimate 'lsm - cntl' intercept 1 trt 1 0 /ilink;
estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
estimate 'diff' trt 1 -1;
contrast 'diff' trt 1 -1;
run;
Selected Output – Conditional Model
Covariance Parameter Estimates
Cov Parm   Estimate   Standard Error
clinic     2.0327     1.2637

Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
trt      1        7        5.98      0.0444

Estimates
Label        Estimate   Standard Error   DF   t Value   Pr > |t|   Mean     Standard Error Mean
lsm - cntl   -1.1464    0.5586           7    -2.05     0.0793     0.2411   0.1022
lsm - drug   -0.4220    0.5552           7    -0.76     0.4720     0.3960   0.1328
diff         -0.7244    0.2963           7    -2.45     0.0444

trt Least Squares Means
trt    Estimate   Standard Error   DF   t Value   Pr > |t|   Odds
cntl   -1.1464    0.5586           7    -2.05     0.0793     0.3178
drug   -0.4220    0.5552           7    -0.76     0.4720     0.6557
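The Odds and Mean columns are both simple transforms of the logit-scale least squares means. A Python sketch (illustrative only, with hypothetical variable names) makes the relationships explicit using the conditional-model estimates; the odds ratio line follows from the 'diff' estimate.

```python
import math

# Logit-scale least squares means and their difference from the output.
lsmeans = {"cntl": -1.1464, "drug": -0.4220}
diff = -0.7244   # cntl minus drug on the logit scale

odds = {trt: math.exp(eta) for trt, eta in lsmeans.items()}
prob = {trt: o / (1.0 + o) for trt, o in odds.items()}
odds_ratio = math.exp(diff)   # odds(cntl) / odds(drug)

for trt in lsmeans:
    print(trt, round(odds[trt], 4), round(prob[trt], 4))
print(round(odds_ratio, 4), round(odds["cntl"] / odds["drug"], 4))
```

Because this is the conditional model, these probabilities refer to a clinic whose random effect is zero, not to an average over the population of clinics.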
GLMM with NLMIXED
1. data step to define indicator for Trt=1 (because NLMIXED
lacks CLASS statement)
data a;
input clinic trt $ fav unfav;
nij=fav+unfav; t1=(trt='drug');
2. then, run NLMIXED
proc nlmixed;
parms mu=1 tau=0 s2c=2;
eta=mu+tau*t1+cj;
pij=exp(eta)/(1+exp(eta));
model fav~binomial(nij,pij);
random cj~normal(0,s2c) subject=clinic;
estimate 'trt effect' tau;
estimate 'ctl p_hat' exp(mu)/(1+exp(mu));
estimate 'drug p_hat' exp(mu+tau)/(1+exp(mu+tau));
estimate 'diff on p_hat scale'
exp(mu+tau)/(1+exp(mu+tau)) - exp(mu)/(1+exp(mu));
run;
NLMIXED with CxT term included
first, also define Trt=2 indicator, here denoted t2
proc nlmixed;
parms mu=1 tau=0 s2c=2 s2ct=0.08;
eta=mu+tau*t1+cj+c1j*t1+c2j*t2;
pij=exp(eta)/(1+exp(eta));
model fav~binomial(nij,pij);
random cj c1j c2j~normal([0,0,0],[s2c,0,s2ct,0,0,s2ct])
subject=clinic;
estimate 'trt effect' tau;
estimate 'ctl p_hat' exp(mu)/(1+exp(mu));
estimate 'drug p_hat' exp(mu+tau)/(1+exp(mu+tau));
estimate 'diff on p_hat scale'
exp(mu+tau)/(1+exp(mu+tau)) - exp(mu)/(1+exp(mu));
run;
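The extra indicator t2 needed for the CxT version of the NLMIXED program could be created as sketched below. This assumes the control level of trt is coded 'cntl', as the LS-means labels suggest; the data set name a2 is hypothetical.

```sas
/* Hypothetical data step: add the Trt=2 (control) indicator */
data a2;
  set a;
  t2 = (trt='cntl');   /* assumed coding of the control level */
run;
```

PROC NLMIXED would then be run on this data set (for example with data=a2).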
Binary Repeated Measures
 2 treatments
 20 subjects (animals) per trt
 5 times of measurement
 response at each measurement 0/1
 suggested by companion animal vaccine trials
Several approaches
 GEE using GENMOD
 PQL using %GLIMMIX
− random subj(trt), or
− CS
 Gauss-Hermite (G-H) quadrature using NLMIXED
 (not shown) but you could use MIXED
 Type I error control of PQL + random subj(trt) is not acceptable
 power of PQL/CS or NLMIXED > GEE
Various SAS programs for binary repeated-measures data
/* GEE */
proc genmod;
class trt animal day;
model y=trt|day / dist=bin type1 type3;
repeated subject=animal(trt) / type=exch;
run;
/* PQL with random animal(trt) */
proc glimmix;
class trt animal day;
model y=trt|day / dist=binomial link=logit;
random animal(trt);
run;
/* PQL with compound symmetry (CS) */
proc glimmix;
class trt animal day;
model y=trt|day / dist=binomial link=logit;
random day / rside type=cs subject=animal(trt);
run;
The NLMIXED program follows.
NLMixed
data nlmx;
set univar;
t1=(trt=1); t2=(trt=2);
d1=(day=1); d2=(day=2);
d3=(day=3); d4=(day=4);
d5=(day=5);
proc nlmixed;
parms mu=1 a1=1 b1=1 b2=1 b3=1 b4=1
ab11=1 ab12=1 ab13=1 ab14=1 sb2=1;
eta=mu+a1*t1+b1*d1+b2*d2+b3*d3+b4*d4+
ab11*t1*d1+ab12*t1*d2+ab13*t1*d3+ab14*t1*d4;
pi=exp(eta+bse)/(1+exp(eta+bse));
model y~binary(pi);
random bse~normal(0,sb2) subject=id;
contrast 'trt' a1;
contrast 'day' b1,b2,b3,b4;
contrast 'trt x day' ab11,ab12,ab13,ab14;
Poisson Repeated Measures
 Output 10.39 SAS for Linear Models
 Leppik, et al (1985); Thall & Vail (1990)
 2 treatments
 28 patients on trt=0; 31 on trt=1
 4 times of measurement
 epilepsy: # seizures in 4 test periods
 baseline & age covariates
Model for seizure data
Denote by $\lambda_{ij}$ the mean count (# seizures) for trt i, time j.
The generalized linear model is:
$$\log(\lambda_{ij}) = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + \beta_{1i}(\text{log\_base}) + \beta_2(\text{log\_age})$$
Assume CS working correlation structure among repeated measures
using GEE
proc genmod data=seizure;
class id trt time;
/* this model first */
*model y=trt time trt*time log_base trt*log_base log_age/
dist=poisson link=log type1 type3;
/* then this model */
model y=trt time log_base(trt)log_age/
dist=poisson link=log type1 type3;
repeated subject=id / type=exch corrw;
see SAS file for %GLIMMIX approach
GENMOD to GLIMMIX
using GEE
proc genmod data=seizure;
class id trt time;
model y=trt time log_base(trt)log_age/
dist=poisson link=log type1 type3;
repeated subject=id / type=exch corrw;
equivalent GLIMMIX
proc glimmix data=seizure;
class id trt time;
model y=trt time log_base(trt)log_age/
dist=poisson link=log;
random time / type=cs subject=id residual;
Degrees of Freedom & Standard Errors
 Recall Satterthwaite approximation & Kenward-Roger bias adjustment in LMM
 Same issues exist with GLMM
 But not nearly as well researched
 You can use the DDFM=SATTERTH and DDFM=KR options in GLIMMIX with non-normal data & non-identity link
 But what do they do?
Power
VIII. Power
 Many software packages for power & sample size
− e.g., SAS PROC POWER
− for FIXED effect models only
 What if you have “Mixed Model Issues”?
− random effects
− split-plot structure
− errors potentially correlated: longitudinal or spatial data
− any other non-standard model structure
 Methods based on PROC GLIMMIX
− adapted from Stroup (2002, JABES)
Mixed Model Background – G, R unknown
$$F(K'\beta = 0) = \frac{(K'\hat{\beta})'\,[L'\hat{C}L]^{-1}\,(K'\hat{\beta})}{\operatorname{rank}(K)}$$
$\hat{C}$ is estimate of $C$ using estimated components of $G$ and $R$
$$F \sim \text{approx. } F[\operatorname{rank}(K),\ \nu,\ \phi]$$
$\nu$ may be obvious from design or may need to be approximated, e.g. Satterthwaite, Kenward-Roger
$$\phi = (K'\beta)'\,[L'CL]^{-1}\,(K'\beta)$$
Computing Power using SAS
 create data set like proposed design (O’Brien: “exemplary data set”)
 run PROC GLIMMIX with covariance components fixed
 $\phi$ = (F computed by GLIMMIX) × rank(K) [or chi-sq with GLM]
 use GLIMMIX to compute $\nu$
 critical F ($F_{crit}$) is value s.t. $P\{F(\operatorname{rank}(K), \nu, 0) > F_{crit}\} = \alpha$
 Power = $P\{F[\operatorname{rank}(K), \nu, \phi] > F_{crit}\}$ [or chi-square]
 SAS functions can compute $F_{crit}$ & Power
Compute Power with GLIMMIX – CRD example
/* step 1 - create data set with same structure
as proposed design
use MU (expected mean) instead of
observed Y_ij values
*/
/* this example shows power for 5, 10, and 15
e.u. per trt
*/
data crdpwrx1;
input trt mu;
do n=5 to 15 by 5;
do eu=1 to n;
output;
end;
end;
cards;
1 100
2 94
3 90
;
/* step 2 - use PROC GLIMMIX to compute non-centrality parameters
for ANOVA tests & contrasts
ODS statements output them to new data sets
*/
proc sort data=crdpwrx1;
by n;
proc glimmix data=crdpwrx1;
by n;
class trt;
model mu=trt;
parms (100)/hold=1;
contrast 'et1 v et2' trt 0 1 -1;
contrast 'c vs et' trt 2 -1 -1;
ods output tests3=b;
ods output contrasts=c;
run;
Type III Tests of Fixed Effects
Effect    Num DF    Den DF    F Value    Pr > F
trt            2        12       1.27    0.3169

Contrasts
Label        Num DF    Den DF    F Value    Pr > F
et1 v et2         1        12       0.40    0.5390
c vs et           1        12       2.13    0.1698

/* step 3: combine ANOVA & contrast n-c parameter data sets
use SAS functions PROBF and FINV to compute power
*/
data power;
set b c;
alpha=0.05;
ncparm=numdf*fvalue;
fcrit=finv(1-alpha,numdf,dendf,0);
power=1-probf(fcrit,numdf,dendf,ncparm);
proc print;

Obs   n   Effect   Label       NumDF   DenDF   FValue   ProbF    alpha   ncparm    fcrit     power
  1   5   trt                      2      12     1.27   0.3169   0.05    2.53333   3.88529   0.22361
  2   5            et1 v et2       1      12     0.40   0.5390   0.05    0.40000   4.74723   0.08980
  3   5            c vs et         1      12     2.13   0.1698   0.05    2.13333   4.74723   0.26978
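The F value that GLIMMIX reports for an exemplary data set can be checked by hand. The sketch below works through the 'et1 v et2' contrast for n = 5 per treatment, using the means 94 and 90 from the CARDS data and the residual variance of 100 held fixed by the PARMS statement; "ncparm" is the name used in the step-3 data step.

```latex
F_{\text{et1 v et2}}
  = \frac{(94 - 90)^2}{100\left(\tfrac{1}{5} + \tfrac{1}{5}\right)}
  = \frac{16}{40} = 0.40,
\qquad
\text{ncparm} = \text{NumDF} \times F = 1 \times 0.40 = 0.40
```

This agrees with the F Value and ncparm shown for 'et1 v et2' in the printed output.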
More Advanced Example
 Plots in 8 x 3 grid
 Main variation along 8 “rows”
 3 x 2 treatment design
 Alternative designs
− randomized complete block (4 blocks, size 6)
− incomplete block (8 blocks, size 3)
− split plot
 RCBD “easy” but ignores natural variation
Picture the 8 x 3 Grid
[Figure: the 8 x 3 grid of plots, with the direction of the gradient indicated]
SAS Programs to Compare 8 x 3 Design
Split-Plot
data a;
input bloc trtmnt @@;
do s_plot=1 to 3;
input dose @@;
mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
output;
end;
cards;
1 1 1 2 3
1 2 1 2 3
2 1 1 2 3
2 2 1 2 3
3 1 1 2 3
3 2 1 2 3
4 1 1 2 3
4 2 1 2 3
;
proc glimmix data=a noprofile;
class bloc trtmnt dose;
model mu=bloc trtmnt|dose;
random trtmnt/subject=bloc;
parms (4) (6) / hold=1,2;
lsmeans trtmnt*dose / diff;
contrast 'trt x lin'
trtmnt*dose 1 0 -1 -1 0 1;
ods output diffs=b;
ods output contrasts=c;
run;
8 x 3 – Incomplete Block
data a;
input bloc @@;
do eu=1 to 3;
input trtmnt dose @@;
mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
output;
end;
cards;
1  1 1  1 2  1 3
2  1 1  1 2  2 2
3  1 1  1 3  2 3
4  1 1  2 1  2 2
5  1 2  1 3  2 2
6  1 2  2 1  2 3
7  1 3  2 1  2 3
8  2 1  2 2  2 3
;
proc glimmix data=a noprofile;
class bloc trtmnt dose;
model mu=trtmnt|dose;
random intercept / subject=bloc;
parms (4) (6) / hold=1,2;
lsmeans trtmnt*dose / diff;
contrast 'trt x lin'
trtmnt*dose 1 0 -1 -1 0 1;
ods output diffs=b;
ods output contrasts=c;
run;
8 x 3 Example - RCBD
data a;
input trtmnt dose @@;
do bloc=1 to 4;
mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
output;
end;
cards;
1 1 1 2 1 3 2 1 2 2 2 3
;
proc glimmix data=a noprofile;
class bloc trtmnt dose;
model mu=bloc trtmnt|dose;
parms (10) / hold=1;
lsmeans trtmnt*dose / diff;
contrast 'trt x lin'
trtmnt*dose 1 0 -1 -1 0 1;
ods output diffs=b;
ods output contrasts=c;
run;
Power for GLMs
 2 treatments
 P{favorable outcome}: for trt 1 p=0.30; for trt 2 p=0.25
 power if n1=300; n2=600
data a;
input trt y n;
datalines;
1 90 300
2 150 600
;
proc glimmix;
class trt;
model y/n=trt / chisq;
ods output tests3=pwr;
run;
data power;
set pwr;
alpha=0.05;
ncparm=numdf*chisq;
fcrit=cinv(1-alpha,numdf,0);
power=1-probchi(fcrit,numdf,ncparm);
proc print; run;
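With one numerator degree of freedom, as in the two-treatment GLM example, the noncentral chi-square probability that PROBCHI evaluates has a closed form in terms of the standard normal CDF $\Phi$. The identity below is a standard result, not something taken from the course code; $c$ stands for the critical value returned by CINV and "ncparm" for the noncentrality parameter.

```latex
\text{Power}
  = P\{\chi^2_1(\text{ncparm}) > c\}
  = 1 - \Phi\!\left(\sqrt{c} - \sqrt{\text{ncparm}}\right)
      + \Phi\!\left(-\sqrt{c} - \sqrt{\text{ncparm}}\right),
\qquad
c = \chi^2_{1,\,1-\alpha} \approx 3.84 \ \text{for } \alpha = 0.05
```

The second $\Phi$ term is the probability of rejecting in the "wrong" tail and is usually negligible unless ncparm is small.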
Power for GLMM
 Same trt and sample size per location as before
 10 locations
 Var(Location)=0.25; Var(Trt*Loc)=0.125
 Variance Components: variation in log(OddsRatio)
 Power?
data a;
input trt y n;
do loc=1 to 10;
output;
end;
datalines;
1 90 300
2 150 600
;
proc glimmix data=a initglm;
class trt loc;
model y/n = trt / oddsratio;
random intercept trt / subject=loc;
random _residual_;
parms (0.25) (0.125) (1) / hold=1,2,3;
ods output tests3=pwr;
run;
GLMM Power Analysis Results

Odds Ratio Estimates
trt   _trt   Estimate   DF   95% Confidence Limits
1     2         1.286    9   0.884    1.871
Gives you expected Conf Limits for # Locations & N / Loc contemplated

Obs   Effect   NumDF   DenDF   alpha   ncparm    fcrit     power
  1   trt          1       9   0.05    2.29868   5.11736   0.27370
Gives you the power of the test of TRT effect on prob(favorable)
GLMM Power: Impact of Sample Size?
 N of subjects per trt per location?
 N of Locations?
Three cases
1. n=300/600, 10 loc
2. n=600/1200, 10 loc
3. n=300/600, 20 loc
/* case 1 */
data a;
input trt y n;
do loc=1 to 10;
output;
end;
datalines;
1 90 300
2 150 600
;
/* case 2 */
data a;
input trt y n;
do loc=1 to 10;
output;
end;
datalines;
1 180 600
2 300 1200
;
/* case 3 */
data a;
input trt y n;
do loc=1 to 20;
output;
end;
datalines;
1 90 300
2 150 600
;
GLMM Power: Impact of Sample Size?
Recall, for 10 locations, N=300/600,
CI for OddsRatio was (0.884, 1.871); Power was 0.274

For 10 locations, N=600 / 1200 (N alone has almost no impact)
Odds Ratio Estimates
trt   _trt   Estimate   DF   95% Confidence Limits
1     2         1.286    9   0.891    1.855
Obs   Effect   NumDF   DenDF   alpha   ncparm    fcrit     power
  1   trt          1       9   0.05    2.40715   5.11736   0.28421

For 20 locations, N=300 / 600
Odds Ratio Estimates
trt   _trt   Estimate   DF   95% Confidence Limits
1     2         1.286   19   1.006    1.643
Obs   Effect   NumDF   DenDF   alpha   ncparm    fcrit     power
  1   trt          1      19   0.05    4.59736   4.38075   0.53003
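The results above show that going from 10 to 20 locations exactly doubles ncparm (2.29868 to 4.59736) while DenDF moves from 9 to 19. The data step below is a rough sketch of how that pattern could be used to scan other numbers of locations without rerunning GLIMMIX. It assumes that ncparm scales linearly with the number of locations and that DenDF equals locations minus 1, which holds for the two cases shown but is only an assumption beyond them; the data set name locpower and the variable nloc are hypothetical.

```sas
/* Hypothetical scan of power over number of locations, n=300/600 per location */
data locpower;
  alpha    = 0.05;
  ncparm10 = 2.29868;               /* ncparm from the 10-location run above   */
  do nloc = 10 to 40 by 10;
    ncparm = ncparm10 * nloc / 10;  /* assumed linear scaling with locations   */
    dendf  = nloc - 1;              /* assumed: DenDF = locations - 1          */
    fcrit  = finv(1-alpha, 1, dendf, 0);
    power  = 1 - probf(fcrit, 1, dendf, ncparm);
    output;
  end;
run;
proc print data=locpower; run;
```

The rows for nloc=10 and nloc=20 should agree with the GLIMMIX-based results above; the other rows are extrapolations and would be worth confirming with an actual exemplary data set.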
Spatial Data
Example 5 - Spatial
from SAS for Mixed Models, Sect. 11.7
“Alliance” Data from Stroup, Baenziger, and Mulitze (1994)
in GLIMMIX-speak:
data two; set alliance;
obs = _n_;
proc glimmix data=two;
class Entry Rep obs;
model Yield=Entry/ddfm=kr;
random intercept/subject=rep;
random obs / type=sp(sph)(latitude longitude);
parms (0.1) (43.4) (27.5) (11.5);
lsmeans entry;
IX. Spatial Data
 Example from SAS for Mixed Models
− Spatial errors in Treatment Comparison studies only
− No spatial mapping, Kriging
 Standard parametric models from Geostatistics
 RSMOOTH alternative
 Issues
From Stroup, Baenziger & Mulitze (Crop Science, 1994)
56 varieties, 4 blocks, e.u. = 4.3 × 1.2 m plots
[Figure: field map of the plots by latitude (LAT, 4.30 to 47.30) and longitude (LNG, 1.2 to 26.4), with reps 1-4 marked]
Contour Plot of Response
[Figure: contour plot of response, with plots of two entries marked: B = Buckskin, N = NE86503]
Additional GLIMMIX Code to
Plot Spatial Variability
output out=gmxout2 pred=p;
ods output lsmeans=lsm2;
id entry latitude longitude _zgamma_;
run;
proc means data=gmxout2; var _zgamma_; run;
proc print data=gmxout2(OBS=20); run;
proc g3d data=gmxout2;
plot latitude*longitude=_zgamma_ /grid;
[Figure: Plot of Spherical Covariance]
Alternative Using RSMOOTH
 Advantage in Theory: RSMOOTH does not require
parametric model of spatial variation, which can be
unrealistic
 e.g. Alliance data spatial variation is from winter
kill
proc glimmix data=alliance;
class Entry Rep;
model Yield=Entry /ddfm=kr;
*model Yield=Entry latitude longitude/ddfm=kr;
random intercept/subject=rep;
random latitude longitude / type=rsmooth;
RSMOOTH?
 From Penalized Spline
− Ruppert, Wand, and Carroll (2003, Semiparametric Regression, Cambridge)
Prediction: $\hat{y} = B(x)\hat{\beta}$
Objective function: $Q^*(\beta;\lambda) = (y - B(x)\beta)'(y - B(x)\beta) + \lambda\,\beta' D \beta$
RSMOOTH (2)
 Rewrite the model
$$y = \beta_0 + \beta_1 x_i + \sum_j \gamma_j (x_i - \kappa_j)_+ + e$$
$\kappa_j$ is "knot" a.k.a. "join point"
Re-express:
$$y = X\beta + Z\gamma + e$$
then
$$Q^*(\beta;\gamma) = \|y - X\beta - Z\gamma\|^2 + \lambda\,\|\gamma\|^2$$
RSMOOTH (2)
Spline:
$$(y - X\beta - B\gamma)'(y - X\beta - B\gamma) + \lambda\,\gamma' D \gamma$$
LMM:
$$(y - X\beta - Zu)'(y - X\beta - Zu) + \frac{\sigma^2}{\sigma_u^2}\,u'u$$
[Figure: spatial plot obtained from the RSMOOTH fit]
RSMOOTH vs SP(SPH)

Sp(SPH): Type III Tests of Fixed Effects
Effect    Num DF    Den DF    F Value    Pr > F
Entry         55     138.1       1.85    0.0021

RSMOOTH: Type III Tests of Fixed Effects
Effect    Num DF    Den DF    F Value    Pr > F
Entry         55     148.2       1.77    0.0038
However...
[Figure: plot of LSMeans from the two approaches]
LSM_RSMOOTH average 31.06
LSM_SP_SPH average 24.40
????
Some NLMM Issues
 Consulting problem at UNL
 Why nonlinear mixed model (NLMM) seemed appropriate
 Problems in implementation
 ⇒ NLMM issues
 Alternatives whose implications are not adequately understood
Wheat Sawfly Study
 Gary Hein, Research Entomologist, Scottsbluff,
NE RREC
 Sawflies inhabit/damage wheat
 5 tillage treatments: impact on sawflies
 Exp design used 4 randomized blocks
 Sawfly emergence measured at planned times
during growing season
Emergence over TIME by TRT
[Figure: sawfly emergence over time by treatment. Legend: black = NoTill; red = SumBlade (summer); cyan = SB&SD; green = SpDisk (spring); blue = SpPlow]
“Conventional” Analysis
Emerge = μ + TRT + blk + blk*trt + DATE + TRT*DATE + date*blk(trt)
• blk*trt a.k.a. between subjects or “whole-plot” error
• date*blk(trt) = within subjects or “split-plot” error
ANOVA:
Source               df
blk                   3
TRT                   4
betw subj error      12
DATE                 12
TRT*DATE             48
within subj error   180
Standard ANOVA
model: emerge = μ + blk + TRT + w.p. error + TIME + TRT*TIME + s.p. error
The Mixed Procedure (CS covariance fit adequately)

Covariance Parameter Estimates
Cov Parm    Estimate
blk         0.002177
blk*trt     0.005199
Residual    0.01845

Type 3 Tests of Fixed Effects
Effect      Num DF    Den DF    F Value    Pr > F
trt              4        12      13.18    0.0002
date            12       180     157.38    <.0001
trt*date        48       180       5.18    <.0001
Break out TRT*DATE effect

Type 1 Tests of Fixed Effects
Effect       Num DF    Den DF    F Value    Pr > F
trt               4        12      15.62    0.0001
lin               1       177    2273.39    <.0001
quad              1       177       7.24    0.0078
cubic             1       177     161.10    <.0001
date              9       177       2.95    0.0027
lin*trt           4       177       0.59    0.6716
quad*trt          4       177      26.69    <.0001
cubic*trt         4       177       2.13    0.0792
trt*date         36       177       3.08    <.0001
Alternative Modeling Considerations
Basic form of model: $y_{ijk} = \mu_{ij} + blk_k + w_{ik} + e_{ijk}$
$\mu_{ij}$ = mean of i-th trt at j-th time
$w_{ik}$ = whole-plot error ~ i.i.d. $N(0, \sigma_W^2)$
$e_{ijk}$ = split-plot error ~ i.i.d. $N(0, \sigma^2)$
Modeling $\mu_{ij}$:
1. Decompose $\mu_{ij}$ as in “standard ANOVA”: $\mu$ + Trt + Time + Trt*Time
2. Further decompose via polynomial regression
3. Nonlinear decomposition, e.g. Gompertz
4. Transform $y_{ijk}$ to “linearize” response profile over date
a. logit or probit (assume sigmoid profile is symmetric)
b. complementary log-log (allows asymmetry)
Gompertz Model:
$$\mu_{ij} = \alpha_i \exp\{-\exp[\beta_i - (\gamma_i \times \text{date}_j)]\}$$
$\alpha_i$ is asymptote of i-th treatment
$\gamma_i$ is "slope" of i-th treatment
$\beta_i$ is inflection point of i-th treatment
Parameter Estimates
Parameter    Estimate    Standard Error    DF    t Value    Pr > |t|
a1             0.9949           0.03629    19      27.42      <.0001
a2             0.9666           0.03793    19      25.48      <.0001
a3             0.9868           0.04609    19      21.41      <.0001
a4             1.0037           0.06284    19      15.97      <.0001
a5             0.9236           0.04390    19      21.04      <.0001
b1             0.5435           0.08104    19       6.71      <.0001
b2             0.4822           0.08743    19       5.52      <.0001
b3             0.4506           0.09845    19       4.58      0.0002
b4             0.3431           0.06859    19       5.00      <.0001
b5             0.8544           0.1810     19       4.72      0.0001
c1             0.3615           0.05388    19       6.71      <.0001
c2             0.3224           0.05841    19       5.52      <.0001
c3             0.2940           0.06370    19       4.62      0.0002
c4             0.2186           0.04360    19       5.01      <.0001
c5             0.5319           0.1125     19       4.73      0.0001
s2w            0.002926         0.001355
s2s            0.01598          0.001462
The s2w and s2s values are ML estimates. Bias?
[Figure: Fit of Gompertz]
Trt Comparisons with NLMIXED

Contrasts
Label                    Num DF    Den DF    F Value    Pr > F
among a                       4        19       0.50    0.7383
among b                       4        19       2.19    0.1085
among c                       4        19       2.30    0.0966
a: nt vs sum bld              1        19       0.29    0.5956
a: nt+sb vs sb&sd             1        19       0.01    0.9108
a: sp dsk vs sp plow          1        19       1.09    0.3089
a: nt+sb vs sp d+p            1        19       0.14    0.7169
b: nt vs sum bld              1        19       0.26    0.6132
b: nt+sb vs sb&sd             1        19       0.29    0.5950
b: sp dsk vs sp plow          1        19       6.97    0.0161
b: nt+sb vs sp d+p            1        19       0.57    0.4590
c: nt vs sum bld              1        19       0.24    0.6279
c: nt+sb vs sb&sd             1        19       0.41    0.5305
c: sp dsk vs sp plow          1        19       6.74    0.0177
c: nt+sb vs sp d+p            1        19       0.21    0.6497
Issues with Test Results
 denominator degrees of freedom?
− DF in NLMIXED based on simple N-1 rule
− MIXED uses Satterthwaite/KR
− NLMIXED analog?
 bias in test statistics?
− In MIXED, ML variance estimates biased downward
− Test statistics biased upward
− Excessive type I error rates familiar in MIXED
− Same in NLMIXED?
Alternative NLMIXED Analysis
1. Use MIXED to obtain REML estimates of $\sigma_W^2$ and $\sigma_S^2$
2. Include REML variance component estimates in NLMIXED as known
3. NLMIXED will compute std errors and test statistics using REML estimates
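One way to carry out step 2 is to assign the variance components as constants in programming statements rather than listing them in PARMS, so that NLMIXED does not estimate them. The sketch below is a simplified illustration, not the course's program: it fits a single common Gompertz curve rather than one per treatment, and the data set name sawfly and the variables emerge, date, and plot are hypothetical. The two variance values are the REML estimates from the MIXED output shown earlier.

```sas
/* Hypothetical sketch: Gompertz NLMM with variance components held at REML values */
proc nlmixed data=sawfly;             /* assumed sorted by plot                   */
  parms a=1 b=0.5 c=0.3;              /* one common curve, for brevity            */
  s2w = 0.005199;                     /* REML whole-plot variance, held as known  */
  s2s = 0.01845;                      /* REML split-plot variance, held as known  */
  mu  = a*exp(-exp(b - c*date));      /* Gompertz mean                            */
  model emerge ~ normal(mu + w, s2s);
  random w ~ normal(0, s2w) subject=plot;
run;
```

Because s2w and s2s are assigned values and do not appear in PARMS, only a, b, and c are estimated, and the reported standard errors are computed conditional on the supplied variances.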
NLMIXED REML Tests
MLE:  $\sigma_W^2$ = 0.002926   $\sigma_S^2$ = 0.01598
REML: $\sigma_W^2$ = 0.005199   $\sigma_S^2$ = 0.01845

Label                    Num DF    Den DF    F Value    Pr > F    Vs. ML
among a                       4        19       0.38    0.8188
among b                       4        19       1.81    0.1690    .1085
among c                       4        19       1.89    0.1537    .0966
a: nt vs sum bld              1        19       0.26    0.6138
a: nt+sb vs sb&sd             1        19       0.00    0.9796
a: sp dsk vs sp plow          1        19       0.77    0.3918
a: nt+sb vs sp d+p            1        19       0.15    0.7046
b: nt vs sum bld              1        19       0.22    0.6419
b: nt+sb vs sb&sd             1        19       0.18    0.6737
b: sp dsk vs sp plow          1        19       5.88    0.0255    .0161
b: nt+sb vs sp d+p            1        19       0.52    0.4788
c: nt vs sum bld              1        19       0.21    0.6555
c: nt+sb vs sb&sd             1        19       0.27    0.6114
c: sp dsk vs sp plow          1        19       5.68    0.0277    .0177
c: nt+sb vs sp d+p            1        19       0.20    0.6586
Hein: “What if we transform the data to linearize it, then use MIXED?”
Denote response variable emerge by y
then:
$$y = \alpha \exp\{-\exp[\beta - (\gamma \times \text{date})]\}$$
if we assume $\alpha = 1$
then
$$\log[-\log(y)] = \beta - (\gamma \times \text{date})$$
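The linearization follows from taking logs twice. The steps are spelled out below, writing $\beta$ and $\gamma$ for the two parameters inside the inner exponential (symbol names assumed here) and requiring $0 < y < 1$ so that both logarithms are defined.

```latex
\begin{aligned}
y &= \exp\{-\exp[\beta - (\gamma \times \text{date})]\}
  && (\text{asymptote set to } 1,\ 0 < y < 1) \\
-\log(y) &= \exp[\beta - (\gamma \times \text{date})] \\
\log[-\log(y)] &= \beta - (\gamma \times \text{date})
\end{aligned}
```

The transformed response is therefore linear in date, with intercept $\beta$ and slope $-\gamma$, which is what allows a linear mixed model to be fitted to it.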
[Figure: Plot of CLogLog over Date by Trt]
MIXED Analysis of CLogLog

Type 1 Tests of Fixed Effects
Effect      Num DF    Den DF    F Value    Pr > F
trt              4        12      15.69    0.0001
lin              1       180    1402.85    <.0001
lin*trt          4       180       3.58    0.0077
trt*date        55       180       7.02    <.0001

Tests of Lin and Lin*Trt correspond to equality of $\beta_i$ and $\gamma_i$ for all treatments in Gompertz NLMM
Decomposing Contrasts

Label                  Num DF    Den DF    F Value    Pr > F    Vs NLMM
trt (b)                     4        15       6.12    0.0040    .169
c                           4       120       3.62    0.0080    .154
b: nt v sum bld             1        15       2.15    0.1631
b: nt&sb vs sb&sd           1        15       4.37    0.0541    .674
b: sp d v p                 1        15       2.27    0.1526    .026
b: nt&sb v sp d&p           1        15      19.96    0.0005
c: nt v sum bld             1       120       2.11    0.1491
c: nt&sb vs sb&sd           1       120       3.49    0.0644    .611
c: sp d v p                 1       120       0.99    0.3214    .028
c: nt&sb v sp d&p           1       120      11.08    0.0012

NLMM too conservative?
or is Linearized LMM too liberal?
Unresolved Issues
Unresolved NLMIXED Issues
 REML vs. ML variance component estimates
 Degrees of Freedom
 Starting Values and Convergence
 Are NLMIXED tests too conservative?
 Implications for standard errors??
 Correlated error repeated measures?
 When are linearized models analyzed using LMM
(e.g. Proc Mixed) preferable?
 Design
GLIMMIX vs MIXED/GENMOD
 GLIMMIX has very useful mean comparison
options not available in MIXED
− especially for Factorial Simple Effects
 GLIMMIX can model true GLMM’s
 GLIMMIX is “touchy” (e.g. use of
SUBJECT=)
 Many Research Issues
− RSMOOTH
− Properties of NonNormal KR, working
correlation, DDF, etc.
− Computational Methods
Does GLIMMIX replace MIXED/GENMOD?
 For GLMMs – no question
 For GLMs / LMMs
− for the most part – YES
 Most GENMOD & MIXED programs can be
duplicated in GLIMMIX
− Mean Comparison features
− no need to “trick” GENMOD into GLMM with
marginal model (e.g. split-plot, rpt measures)