GEE and Generalized Linear Mixed Models

Tom Greene
Outline
• Subject specific and population average
inference in generalized linear models
• Review of classical generalized linear models
with independent observations
• Generalized Estimating Equations
• Contrasts of GLMMs with GEEs
• GEE example
Classes of Generalized Linear Models
• Linear Models (linear regression, ANOVA, ANCOVA)
– E(Y) = X β
– Responses independent
• Generalized Linear Models (logistic regression, Poisson regression, etc.)
– g(E(Y)) = X β
– Responses independent
• Linear Mixed Models
– E(Y|b) = X β + Z b
– Responses correlated
– Correlation modeled in part by “random effects”
• Generalized Linear Mixed Models (GLMM)
– g(E(Y|b)) = X β + Z b
– Responses correlated
– Correlation modeled in part by “random effects”
• Generalized Estimating Equations Approach (GEE)
– g(E(Y)) = X β
– Responses correlated
Classes of Generalized Linear Models
for Correlated Data
• Linear Mixed Models
– E(Y|b) = X β + Z b
– Responses correlated
– Correlation modeled in part by “random effects”
• Generalized Estimating Equations Approach (GEE): Population Average Inference
– g(E(Y)) = X β
– Responses correlated
– Analysis describes differences in the mean of Y across the entire population
– Analysis is informative from a population perspective; most relevant to policy makers and to providers desiring to optimize outcomes across the entire population
• Generalized Linear Mixed Models (GLMM): Subject Specific Inference
– g(E(Y|b)) = X β + Z b
– Responses correlated
– Correlation modeled in part by “random effects”
– Analysis describes differences in the mean of Y conditional on the patient’s specific random effect b
– Most relevant from an individual patient’s perspective
– Often b represents a dimension of frailty; hence X β describes the relationship of Y to X among patients with the same frailty
Extreme Example
Subject specific effects of X on Pr(Death), OR = 20 per 1 unit increase in X
Population average effect of X on Pr(Death), OR = 2.7 per 1 unit increase in X
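The gap between a subject-specific and a population-average odds ratio can be illustrated numerically. The sketch below assumes a logistic model with a normally distributed random intercept b ~ N(0, σ²). The intercept of -1.5 and the values of σ are hypothetical choices, not taken from the slide, so the printed values are not expected to reproduce the 2.7 shown above.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma^2) with a simple Riemann sum."""
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    norm = 0.0
    for k in range(n_grid):
        b = lo + k * step
        w = math.exp(-0.5 * (b / sigma) ** 2)  # unnormalized normal density
        total += w * expit(eta + b)
        norm += w
    return total / norm

def odds(p):
    return p / (1.0 - p)

def population_average_or(beta0, beta1, sigma):
    """Odds ratio for a 1-unit increase in X after averaging over b."""
    p0 = marginal_prob(beta0, sigma)
    p1 = marginal_prob(beta0 + beta1, sigma)
    return odds(p1) / odds(p0)

# Subject-specific OR of 20 per unit of X, as on the slide.
beta1 = math.log(20.0)
for sigma in (0.0, 2.0, 5.0):  # hypothetical random-intercept SDs
    pa = population_average_or(beta0=-1.5, beta1=beta1, sigma=sigma)
    print(f"sigma = {sigma}: population-average OR = {pa:.2f}")
```

With σ = 0 there is no heterogeneity and the two odds ratios coincide. As σ grows, the population-average odds ratio shrinks toward 1 while the subject-specific odds ratio stays at 20.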
Example: Toenail Data
Toenail Dermatophyte Onychomycosis:
Common toenail infection, difficult to treat, affecting more
than 2% of population.
Design: Randomized, double-blind, parallel group, multicenter
study for the comparison of two new compounds (A and B) for
oral treatment.
2 × 189 patients randomized, 36 centers
48 weeks of total follow up (12 months)
12 weeks of treatment (3 months)
Measurements at months 0, 1, 2, 3, 6, 9, 12.
Research question: How does the severity of TDO depend on treatment?
Review of Generalized Linear Models
(Independent Responses)
• Independent responses Yi, i = 1, 2, …, N
– Yi, with distribution from exponential family
– f(y; θ, ø) = exp{ [yθ − b(θ)] / a(ø) + c(y, ø) }
• Mean model
– μi = E(Yi|Xi1,Xi2,…,Xip)
– g(μi) = β0 + β1Xi1 + β2Xi2 + … + βpXip
• Variance function
– Var(Yi) = øV(μi)
– V(μi) is a known function determined by the assumed
distribution of Y within the exponential family
Review of Generalized Linear Models
(Independent Responses)
• Same setup: exponential-family responses Yi, mean model g(μi) = β0 + β1Xi1 + β2Xi2 + … + βpXip, variance function Var(Yi) = øV(μi)
• The mean model is the only part we have to get right for valid large-sample inference!
Extension to GEE for Longitudinal Data
GEE: Generalized Estimating Equations (Liang & Zeger, 1986;
Zeger & Liang, 1986)
• Method is semi-parametric
– estimating equations are derived without full specification
of the joint distribution of a subject’s observations
• Instead, the analyst specifies
1. The mean model for the marginal distributions of the yij
2. The variance function of yij given µij
3. The “working” correlation matrix for the vector of
repeated observations from each subject
• Relies on the independence across subjects (or clusters) to
estimate consistently the variance of the regression
coefficients
GEE Method Outline
1. Relate the marginal response μij = E(yij) to a linear
combination of the covariates: g(μij) = Xijᵀβ
• yij is the response for subject i at time j, j = 1, 2, …, n
• Xij is a p × 1 vector of covariates
• β is a p × 1 vector of regression coefficients
• g(·) is the link function
2. Describe the variance of yij as a function of the mean
V(yij) = v(μij)ø
• ø is possibly unknown scale parameter
• v(·) is a known variance function
Link and Variance Functions
• Normally-distributed response
g(μij) = μij “Identity link”
v(μij) = 1
V(yij) = ø
• Binary response (Bernoulli)
g(μij) = log[μij/(1 − μij)] “Logit link”
v(μij) = μij(1 − μij)
ø=1
• Poisson response
g(μij) = log(μij) “Log link”
v(μij) = μij
ø =1
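The three link and variance pairs above can be written down directly. The sketch below is only an illustration of the definitions; the dictionary layout and the names `FAMILIES`, `link`, `inverse_link` and `variance` are assumptions made for this example, not part of any library.

```python
import math

# Link g, inverse link, and variance function v for the three responses above.
FAMILIES = {
    "normal": {
        "link": lambda mu: mu,                        # identity link
        "inverse_link": lambda eta: eta,
        "variance": lambda mu: 1.0,                   # v(mu) = 1
    },
    "binary": {
        "link": lambda mu: math.log(mu / (1.0 - mu)),  # logit link
        "inverse_link": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "variance": lambda mu: mu * (1.0 - mu),       # v(mu) = mu(1 - mu)
    },
    "poisson": {
        "link": lambda mu: math.log(mu),              # log link
        "inverse_link": lambda eta: math.exp(eta),
        "variance": lambda mu: mu,                    # v(mu) = mu
    },
}

def describe(family, mu):
    """Return (g(mu), v(mu)) for a given family name and mean."""
    spec = FAMILIES[family]
    return spec["link"](mu), spec["variance"](mu)

for name, mu in (("normal", 2.0), ("binary", 0.25), ("poisson", 3.0)):
    eta, var = describe(name, mu)
    print(f"{name}: g(mu) = {eta:.4f}, v(mu) = {var:.4f}")
```

In each case the variance of yij is v(μij)ø, with ø = 1 for the binary and Poisson responses.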
GEE Method Outline
3. Choose the form of an n × n “working” correlation
matrix Ri for each Yi
Working Correlation Structures
(Slides showing example working correlation matrices, including AR(1); the matrices were not recoverable.)
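Two structures named in this deck, exchangeable (`type=exch` in the GENMOD program for the toenail data) and AR(1), can be built as follows. This is a minimal sketch using nested lists; the function names and the value α = 0.4 are assumptions made for illustration.

```python
def independence(n):
    """Identity matrix: repeated measures treated as uncorrelated."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]

def exchangeable(n, alpha):
    """Every pair of repeated measures shares the same correlation alpha."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def ar1(n, alpha):
    """Correlation decays as alpha^|j - k| with the separation between visits."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]

def show(name, matrix):
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:.4f}" for value in row))

alpha = 0.4  # hypothetical working correlation parameter
show("Independence", independence(4))
show("Exchangeable", exchangeable(4, alpha))
show("AR(1)", ar1(4, alpha))
```

The exchangeable structure has one parameter regardless of n, while AR(1) lets measurements that are further apart be less correlated.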
GEE Estimation
• Define Ai = n × n diagonal matrix with v(μij) as the jth
diagonal element
• Define Ri(α) = n × n “working” correlation matrix (of
the n repeated measures)
Working variance–covariance matrix for Yi equals
Vi(α) = ø Ai^(1/2) Ri(α) Ai^(1/2)
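As a worked instance of the working variance–covariance matrix, the sketch below assembles Vi(α) element by element for a binary response with an exchangeable working correlation. The means, α = 0.4, and the function names are hypothetical values chosen for the example.

```python
import math

def exchangeable(n, alpha):
    """n x n working correlation with a common off-diagonal value alpha."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def working_covariance(mu, corr, phi=1.0, variance=lambda m: m * (1.0 - m)):
    """Return phi * A^(1/2) R A^(1/2), where A = diag(v(mu_1), ..., v(mu_n)).

    The default variance function is the Bernoulli one, v(mu) = mu(1 - mu).
    """
    n = len(mu)
    sd = [math.sqrt(variance(m)) for m in mu]  # diagonal of A^(1/2)
    return [[phi * sd[j] * corr[j][k] * sd[k] for k in range(n)] for j in range(n)]

# Hypothetical marginal means for one subject at three visits.
mu_i = [0.5, 0.2, 0.1]
R_i = exchangeable(len(mu_i), alpha=0.4)
V_i = working_covariance(mu_i, R_i)

for row in V_i:
    print("  ".join(f"{value:.4f}" for value in row))
```

The diagonal of Vi reproduces øv(μij), and each off-diagonal element is the working correlation scaled by the two standard deviations.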
GEE vs. GLMM
1) Target of Inference:
• GEE:
Population Average
• GLMM: Subject Specific
Note: Recent work addresses how to perform population average inference
under GLMMs
GEE vs. GLMM
2) Outputs:
• GEE:
– Coefficients relating Y to X
• GLMM:
– Coefficients relating Y to X conditional on b
– Estimates of subject specific random effects
– Variance of subject specific random effects
GEE vs. GLMM
3) Robustness:
• GEE (with robust variance estimates):
– Inference valid in large samples even if distribution of Y
and/or variance of Y are incorrectly specified
• GLMM (with model-based estimates)
– Valid inference generally requires correct specification of
distribution of Y and of variance of Y
Notes:
1) Recent proposals for robust variance estimates under GLMM
2) Inference for Linear Mixed Models remains valid if Y is not normal for
large N
3) Caveat to GEE robustness: GEE can be biased if time-dependent
covariates are used, unless an independence working correlation matrix is
used
GEE vs. GLMM
4) Efficiency (power and width of confidence
intervals)
• GEE:
– Usually fairly efficient if variance function is
correctly specified
– Between subject comparisons are nearly efficient
if an independence covariance structure is used
for balanced data
• GLMM:
– Maximum likelihood estimates are asymptotically
efficient as long as the model is correctly specified
GEE vs. GLMM
5) Missing Data:
• “Classical” GEE (with robust variance estimates)
– Valid inference if data are Missing Completely At
Random (MCAR) even if variance model is wrong
– If variance model is correct, estimate of β is still
consistent if data are MAR but not MCAR (but
standard errors are not correct)
• GLMM (with model-based estimates)
– Valid inference if data are Missing At Random (MAR)
Notes:
1) Various strategies for valid GEE inference if data are MAR
Missing data
• Three general approaches to dealing with missing data
under GEE which assume MAR but not MCAR
1. Inverse probability weighting (Robins, Rotnitzky and
Zhao, JASA, 1995)
2. Multiple imputation
3. Inverse probability weighting with augmentation, or
doubly robust estimation
• Each method can incorporate covariate information not
included in the GEE model itself. This can make the
MAR assumption much more plausible.
• Methods 2 and 3 can be considerably more efficient
than standard inverse probability weighting
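The idea behind approach 1 can be seen in a toy calculation. The sketch below is not a weighted GEE fit: it only estimates a mean, and the strata, counts, and observation probabilities are invented so that missingness depends on an observed covariate z (MAR) and the arithmetic is exact.

```python
# Assumed (hypothetical) probability that the outcome is observed, by stratum z.
OBS_PROB = {0: 0.9, 1: 0.5}

def build_records():
    """Return (z, y, observed) records with exact, hand-picked counts."""
    records = []
    # Stratum z = 0: 100 subjects, 20 events, 90% observed.
    records += [(0, 1, True)] * 18 + [(0, 1, False)] * 2
    records += [(0, 0, True)] * 72 + [(0, 0, False)] * 8
    # Stratum z = 1: 100 subjects, 60 events, 50% observed.
    records += [(1, 1, True)] * 30 + [(1, 1, False)] * 30
    records += [(1, 0, True)] * 20 + [(1, 0, False)] * 20
    return records

def full_data_mean(records):
    """Mean of y if nothing were missing (the target)."""
    return sum(y for _, y, _ in records) / len(records)

def complete_case_mean(records):
    """Unweighted mean over observed records only."""
    observed = [y for _, y, seen in records if seen]
    return sum(observed) / len(observed)

def ipw_mean(records, obs_prob):
    """Weight each observed record by 1 / Pr(observed | z)."""
    num = sum(y / obs_prob[z] for z, y, seen in records if seen)
    den = sum(1.0 / obs_prob[z] for z, _, seen in records if seen)
    return num / den

records = build_records()
print(f"full data:     {full_data_mean(records):.4f}")
print(f"complete case: {complete_case_mean(records):.4f}")
print(f"IPW:           {ipw_mean(records, OBS_PROB):.4f}")
```

Because the high-event stratum is observed less often, the complete-case mean is too low, while the weighted mean recovers the full-data value. In practice the observation probabilities are unknown and must themselves be modeled.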
GEE vs. GLMM
6) Small to Moderate Samples:
• GEE (with robust variance estimates):
– Estimated standard errors are unstable and biased
downwards
• Inefficient estimating equation for estimating variance
• Effectively uses fully unstructured variance model
– “Sample size” means the number of independent
units
– Various corrections have been proposed (available in
PROC GLIMMIX)
• GLMM (with model-based estimates)
– Large-sample approximations are often invoked, but
performance usually better than GEE with small to
moderate N if model is correctly specified.
More Toenail Data
• Multicenter trial comparing active vs. control
oral treatments for toenail infection
• Repeated measurements of binary outcome:
– 0 = none or mild separation
– 1 = severe separation
• 1908 observations in 294 patients, mostly
over 1 year
**** Standard GENMOD GEE program using Robust SEs *****;
**** Binary outcome leads to default logistic link function ****;
proc genmod descending;
  class id;
  model outcome = treatment month treatment*month / dist=bin;
  * Exchangeable working correlation, print covariance and working correlation;
  repeated subject=id / type=exch covb corrw;
  estimate 'Control Slope' month 1 / exp;
  estimate 'Treatment Slope' month 1 treatment*month 1 / exp;
run;
Working Correlation Matrix
        Col1    Col2    Col3    Col4    Col5    Col6    Col7
Row1  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212
Row2  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212
Row3  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212
Row4  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212
Row5  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212
Row6  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212
Row7  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter        Estimate  Standard Error  95% Confidence Limits      Z  Pr > |Z|
Intercept         -0.5819          0.1720   -0.9191    -0.2446    -3.38    0.0007
treatment          0.0072          0.2595   -0.5013     0.5157     0.03    0.9779
month             -0.1713          0.0300   -0.2301    -0.1125    -5.71    <.0001
treatment*month   -0.0777          0.0541   -0.1838     0.0283    -1.44    0.1509
Contrast Estimate Results
Note: the Mean Estimate and Mean Confidence Limits columns can be ignored in this case (they apply the inverse logit to a slope).
Label                 Mean Estimate  Mean Confidence Limits  L'Beta Estimate  Standard Error
Control Slope                0.4573     0.4427     0.4719          -0.1713          0.0300
Exp(Control Slope)                                                  0.8426          0.0253
Treatment Slope              0.4381     0.4165     0.4599          -0.2490          0.0450
Exp(Treatment Slope)                                                0.7796          0.0351

Label                 Alpha  L'Beta Confidence Limits  Chi-Square  Pr > ChiSq
Control Slope          0.05    -0.2301     -0.1125          32.60      <.0001
Exp(Control Slope)     0.05     0.7945      0.8936
Treatment Slope        0.05    -0.3373     -0.1607          30.57      <.0001
Exp(Treatment Slope)   0.05     0.7137      0.8515
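The Exp(...) rows are the slope estimates and their confidence limits moved to the odds-ratio scale. The sketch below redoes that arithmetic from the printed GEE estimates. The standard error of the treatment slope is copied from the output because it depends on a covariance term not shown here, and the function name `slope_summary` is made up for this example.

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile, for 95% Wald limits

def slope_summary(estimate, std_err):
    """Odds ratio per month and Wald 95% limits from a log-odds slope."""
    lower = estimate - Z_975 * std_err
    upper = estimate + Z_975 * std_err
    return {
        "log_or": estimate,
        "or": math.exp(estimate),
        "or_lower": math.exp(lower),
        "or_upper": math.exp(upper),
    }

# Estimates from the GEE output above.
month = -0.1713
interaction = -0.0777

control = slope_summary(month, 0.0300)
# Slope in the treatment arm = month + treatment*month.
treatment = slope_summary(month + interaction, 0.0450)

for label, res in (("Control", control), ("Treatment", treatment)):
    print(f"{label}: OR per month = {res['or']:.4f} "
          f"({res['or_lower']:.4f}, {res['or_upper']:.4f})")
```

Read this way, the population-average odds of severe separation fall by roughly 16% per month in the control arm and roughly 22% per month in the treatment arm.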
**** GLIMMIX GLMM Estimating Subject Specific Effects ****;
**** Binary outcome leading to default logistic link function ****;
proc glimmix method=RSPL data=toenail;
  class id;
  model outcome (event="1") = treatment month treatment*month / s dist=binary;
  * Random intercept for each patient;
  random int / subject=id;
  estimate 'Control Slope' month 1 / or;
  estimate 'Treatment Slope' month 1 treatment*month 1 / or cl;
run;
Solutions for Fixed Effects
Effect           Estimate  Standard Error    DF  t Value  Pr > |t|
Intercept         -0.7204          0.2370   292    -3.04    0.0026
treatment        -0.02594          0.3360  1612    -0.08    0.9385
month             -0.2782         0.03222  1612    -8.64    <.0001
treatment*month  -0.09583         0.05105  1612    -1.88    0.0607
*** Small Sample;
data small;
  set toenail;
  if id <= 20;
run;
** Standard GENMOD GEE with Robust SEs: 17 Patients Only ***;
** Binary outcome leading to default logistic link function **;
proc genmod data=small descending;
  class id;
  model outcome = treatment month treatment*month / dist=bin;
  repeated subject=id / type=exch covb corrw;
run;
Parameter        Estimate  Standard Error  95% Confidence Limits      Z  Pr > |Z|
Intercept         -0.3558          0.6272   -1.5851     0.8736    -0.57    0.5706
treatment          0.0527          0.9679   -1.8444     1.9497     0.05    0.9566
month             -0.1543          0.0991   -0.3485     0.0400    -1.56    0.1196
treatment*month    0.0272          0.1725   -0.3109     0.3654     0.16    0.8746
**** GLIMMIX GEE program using Robust SEs;
**** Binary outcome leads to default logistic link function;
**** Restricted to 17 patients;
**** Small N Adjustment of Morel, Bokossa, and Neerchal (2003);
proc glimmix method=RSPL empirical=mbn data=small;
  class id;
  model outcome (event="1") = treatment month treatment*month / s dist=binary ddfm=kenwardroger;
  random _residual_ / subject=id type=cs;
run;
Solutions for Fixed Effects
Effect           Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept         -0.3605          0.7369  15    -0.49    0.6317
treatment         0.05762          1.1209  15     0.05    0.9597
month             -0.1530          0.1197  94    -1.28    0.2043
treatment*month   0.02560          0.1984  94     0.13    0.8976
THAT’s ALL