productplots package
• an updated version is now available on GitHub
• install it with
library(devtools)
install_github("hadley/productplots")
Presentations
• 8 min for each person on the team
• + 2 min intro/conclusions for each project
Generalized Linear
Mixed Effects Models
Stat 557
Heike Hofmann
Outline
Two case studies
• Beat the Blues: using glmm for normal response
• Respiratory Symptoms: logistic regression with random effects
Beat the Blues
Longitudinal data from a clinical trial of an interactive,
multimedia program known as "Beat the Blues" designed to
deliver cognitive behavioral therapy to depressed patients via
a computer terminal. Patients with depression recruited in
primary care were randomized to either the Beating the
Blues program, or to "Treatment as Usual (TAU)".
Data
variable                          description
drug                              did the patient take anti-depressant drugs (No or Yes).
length                            the length of the current episode of depression, a factor with levels <6m (less than six months) and >6m (more than six months).
treatment                         treatment group, a factor with levels TAU (treatment as usual) and BtheB (Beat the Blues).
bdi.pre                           Beck Depression Inventory II before treatment.
bdi.2m, bdi.3m, bdi.5m, bdi.8m    Beck Depression Inventory II at x months.
Beat the Blues
Beck Depression Inventory (BDI)
Twenty-one multiple choice questions on how the subject has been
feeling in the last week.
Each question is scored on a scale of 0 - 3. Total score is BDI, as an
indicator of the depression's severity.
Higher total scores correspond to more severe depressive symptoms.
For example:
■ (0) I do not feel sad.
■ (1) I feel sad.
■ (2) I am sad all the time and I can't snap out of it.
■ (3) I am so sad or unhappy that I can't stand it.
The cutoffs used: 0–13: minimal depression; 14–19: mild depression;
20–28: moderate depression; and 29–63: severe depression.
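The cutoffs above can be applied to total scores with base R's cut(). This is only a small sketch; the helper name bdi_severity is hypothetical and not part of the original slides.

```r
# Map total BDI scores (0 to 63) to the severity bands listed above.
# bdi_severity is a hypothetical helper name used for illustration only.
bdi_severity <- function(score) {
  cut(score,
      breaks = c(-1, 13, 19, 28, 63),   # intervals (-1,13], (13,19], (19,28], (28,63]
      labels = c("minimal", "mild", "moderate", "severe"))
}

bdi_severity(c(0, 13, 14, 28, 29, 63))
```

Scores outside 0 to 63 fall outside all intervals and come back as NA.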
Reshape data first
library(ggplot2)
library(reshape2)   # provides melt()
# install.packages("HSAUR2")
data("BtheB", package="HSAUR2")
BtheB$subject <- 1:nrow(BtheB)
# one row per subject and follow-up measurement
btheb <- melt(BtheB, id.vars=c("drug","length","treatment","subject","bdi.pre"))
table(btheb$variable)
# extract the month from variable names such as bdi.2m
btheb$time <- gsub("bdi\\.([0-9]+)m","\\1", as.character(btheb$variable))
Treatment has a positive effect
qplot(time, value, geom="boxplot", data=btheb, facets=~treatment)
[Figure: boxplots of value (0 to 50) against time (2, 3, 5, 8 months), one panel per treatment group (TAU, BtheB)]
Variability between and within individuals
[Figure: value (0 to 50) for each subject, coloured by treatment (TAU, BtheB, NA); x-axis: reorder(factor(subject), value, median)]
Idea
• Try out different routines for glmm using a normal family for the response.
• They should all result in the same answers.
Solving the GLMM problem
• for linear models: the REML approach side-steps a complex integral in the maximization of the likelihood
• for generalized linear mixed effects models we don't have a shortcut for ML. We have to evaluate an integral over the random effects u, for a model with fixed effects β, random effects u and link l(μ)
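The integral itself did not survive on this slide. A standard way to write the marginal likelihood of a GLMM is given below as a sketch, not as the slide's exact formula. The design vectors x_i and z_i, the variance parameters θ, the random-effects covariance D(θ) and the conditional density f are notation assumed here for illustration.

```latex
L(\beta, \theta \mid y)
  = \int \prod_{i=1}^{n} f\bigl(y_i \mid \beta, u\bigr)\,
    \phi\bigl(u;\, 0, D(\theta)\bigr)\, du,
\qquad
l(\mu_i) = x_i^\top \beta + z_i^\top u .
```

For a normal response with identity link this integral has a closed form; for other families it has to be approximated numerically.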
Numerical integration
• approximate data: Penalized quasi-likelihood
• approximate integrand: Laplace method
• approximate integral: adaptive Gaussian
quadrature
• EM algorithm (not particularly fast
compared to the other methods)
PQL approach
• use inverse link h to come back to model form
• a first-order Taylor expansion of h around the current estimates gives an updating scheme that uses the left-hand side of the expanded equation as 'data'
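The two formulas on this slide were lost in extraction. A common textbook presentation of the PQL step is sketched below; it is assumed here and is not necessarily the slide's notation. It writes the linear predictor as η = Xβ + Zu, its current estimate as η̂, and the error as ε.

```latex
\begin{aligned}
y &= h(\eta) + \varepsilon, \qquad \eta = X\beta + Zu, \\
h(\eta) &\approx h(\hat\eta) + h'(\hat\eta)\,(\eta - \hat\eta), \\
\hat\eta + \frac{y - h(\hat\eta)}{h'(\hat\eta)}
  &\approx X\beta + Zu + \frac{\varepsilon}{h'(\hat\eta)} .
\end{aligned}
```

The left-hand side of the last line plays the role of the 'data': it is treated as the response of a linear mixed model, which is refit until the estimates stop changing.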
Laplace
• Interpret integral as a posterior mean; approximate by Taylor expansion of the log-integrand.
• Hessian h'' and h*'' come from data
• integration replaced by differentiation: faster and numerically more stable
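A rough base-R illustration of the Laplace idea follows. It uses a simpler single-integral version rather than the posterior-mean form with h and h* on the slide, and approximates one cluster's likelihood contribution in a random-intercept logistic model. The data y and the values of beta0 and sigma are made up for the example.

```r
# Toy cluster: 6 binary responses sharing one random intercept u ~ N(0, sigma^2).
# y, beta0 and sigma are hypothetical values chosen for illustration.
y     <- c(1, 1, 0, 1, 1, 0)
beta0 <- 0.3
sigma <- 1.2

# log of the integrand: log f(y | u) + log f(u), vectorised over u
log_integrand <- function(u) {
  sapply(u, function(ui) {
    sum(dbinom(y, size = 1, prob = plogis(beta0 + ui), log = TRUE)) +
      dnorm(ui, mean = 0, sd = sigma, log = TRUE)
  })
}

# Reference value by brute-force numerical integration
exact <- integrate(function(u) exp(log_integrand(u)), -Inf, Inf)$value

# Laplace: locate the mode, then use the curvature (second derivative) there
opt   <- optimize(log_integrand, interval = c(-10, 10), maximum = TRUE)
u_hat <- opt$maximum
eps   <- 1e-4
hess  <- (log_integrand(u_hat + eps) - 2 * log_integrand(u_hat) +
            log_integrand(u_hat - eps)) / eps^2   # finite-difference Hessian
laplace <- exp(opt$objective) * sqrt(2 * pi / -hess)

c(exact = exact, laplace = laplace)
```

The approximation needs only an optimisation and a second derivative, which is the sense in which integration is replaced by differentiation.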
Gaussian quadrature
• replace integral by weighted sum
• for normal density this can be expanded in Hermite polynomials with Gaussian weights w_i
• adaptive quadrature optimizes placement and number of x_i and weights w_i
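The weighted-sum idea can be sketched in base R. The code below builds (non-adaptive) Gauss-Hermite nodes x_i and weights w_i from the eigen-decomposition of the Jacobi matrix (the Golub-Welsch construction) and uses them to approximate expectations under a normal density. The function names gauss_hermite and normal_expectation are hypothetical helpers for this example; adaptive quadrature would additionally recentre and rescale the nodes around the mode of the integrand.

```r
# Nodes and weights for the weight function exp(-x^2), n >= 2 points.
# gauss_hermite and normal_expectation are hypothetical helper names.
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- sqrt(k / 2)   # off-diagonals of the Jacobi matrix
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(x = e$values, w = sqrt(pi) * e$vectors[1, ]^2)
}

# E[f(u)] for u ~ N(0, sigma^2), written as a weighted sum over the nodes
normal_expectation <- function(f, sigma = 1, n = 10) {
  gh <- gauss_hermite(n)
  sum(gh$w * f(sqrt(2) * sigma * gh$x)) / sqrt(pi)
}

normal_expectation(function(u) u^2)   # variance of N(0, 1)
normal_expectation(exp)               # E[exp(u)] = exp(1/2)
```

With more points the sum is exact for higher-degree polynomials, which is why the number of points can matter for the fitted model.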
Solutions in R
• lmer / glmer in lme4
  uses Laplace by default; the parameter nAGQ, if specified, sets the number of points for adaptive Gaussian quadrature (in current lme4 versions nAGQ is an argument of glmer)
• lme in nlme
  uses EM algorithm (slow)
• glmmPQL in MASS
  uses penalized quasi-likelihood
• glmmML in glmmML
  uses Laplace (default) or Gauss-Hermite quadrature
package lme4
• use glmer to fit BDI with fixed main effects
treatment, time, drug, length, bdi.pre
and random intercept for subjects
• extend the above model to additionally
incorporate a random effect of time for
subjects.
• Use anova to decide between the two
models
package lme4
• cftest in the multcomp package can be
used to get univariate p-values for fixed
effects based on asymptotic normality
• get fitted values and plot against observed.
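One possible way through these exercises is sketched below, under some assumptions: time is treated as numeric (months), base R's reshape() stands in for melt() so that fewer packages are needed, and lmer is called directly because current lme4 versions redirect glmer with a Gaussian family to lmer. Both models are fitted by ML (REML = FALSE) so that anova compares likelihoods on the same footing.

```r
library(lme4)
library(multcomp)

data("BtheB", package = "HSAUR2")
BtheB$subject <- factor(seq_len(nrow(BtheB)))

# Long format with numeric time; reshape() is used here instead of melt()
btheb <- reshape(BtheB, direction = "long", idvar = "subject",
                 varying = c("bdi.2m", "bdi.3m", "bdi.5m", "bdi.8m"),
                 v.names = "value", timevar = "time", times = c(2, 3, 5, 8))
btheb <- na.omit(btheb)   # drop missed follow-ups so fitted and observed align

# Random intercept for subjects
m1 <- lmer(value ~ treatment + time + drug + length + bdi.pre + (1 | subject),
           data = btheb, REML = FALSE)
# Additional random effect of time for subjects
m2 <- lmer(value ~ treatment + time + drug + length + bdi.pre + (time | subject),
           data = btheb, REML = FALSE)

anova(m1, m2)   # likelihood ratio test between the two models
cftest(m1)      # univariate p-values based on asymptotic normality

# Fitted against observed values
plot(fitted(m1), btheb$value, xlab = "fitted", ylab = "observed BDI")
abline(0, 1, lty = 2)
```

The second model may trigger a singular-fit or convergence message; that is a hint that the random slope adds little, which the anova comparison should confirm or refute.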
Other packages
• Install and load the following packages:
  • MASS for glmmPQL
  • repeated (Google it!) for glmm
    Download available from http://www.commanster.eu/rcode.html
  • glmmML for glmmML
glmm with normal
response
• Use functions
glmm, glmmPQL, and
glmmML to fit the same model to the
BtheB data.
Are there any differences?
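As a starting point for the comparison, the sketch below fits the random-intercept model with glmmPQL (MASS) and with lmer (lme4) and puts the fixed-effect estimates side by side. It assumes numeric time and uses base R's reshape() in place of melt(); the glmm function from the repeated package is left out because that package is not on CRAN.

```r
library(MASS)
library(lme4)

data("BtheB", package = "HSAUR2")
BtheB$subject <- factor(seq_len(nrow(BtheB)))

# Long format with numeric time; reshape() is used here instead of melt()
btheb <- reshape(BtheB, direction = "long", idvar = "subject",
                 varying = c("bdi.2m", "bdi.3m", "bdi.5m", "bdi.8m"),
                 v.names = "value", timevar = "time", times = c(2, 3, 5, 8))
btheb <- na.omit(btheb)

# Penalized quasi-likelihood: iterated calls to nlme::lme
fit_pql <- glmmPQL(value ~ treatment + time + drug + length + bdi.pre,
                   random = ~ 1 | subject, family = gaussian,
                   data = btheb, verbose = FALSE)

# REML fit of the same model
fit_lmer <- lmer(value ~ treatment + time + drug + length + bdi.pre +
                   (1 | subject), data = btheb, REML = TRUE)

# Fixed effects side by side
coefs <- cbind(glmmPQL = nlme::fixef(fit_pql), lmer = lme4::fixef(fit_lmer))
round(coefs, 4)
```

Any differences should be small for a normal response, since both routines target the same linear mixed model.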
REML based fit in lme4
Simultaneous Tests for General Linear Hypotheses

Fit: lmer(formula = value ~ treatment + time + drug + length + bdi.pre +
    (1 | subject), data = btheb, REML = TRUE)

Linear Hypotheses:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept) == 0     5.57379    2.29942   2.424   0.0154 *
treatmentBtheB == 0 -2.31514    1.71505  -1.350   0.1771
time == 0           -0.70161    0.14694  -4.775 1.80e-06 ***
drugYes == 0        -2.81602    1.77282  -1.588   0.1122
length>6m == 0       0.17906    1.68154   0.106   0.9152
bdi.pre == 0         0.64035    0.07991   8.013 1.11e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Univariate p values reported)
glmmPQL
                   glmmPQL       lmer
(Intercept)      5.5923934  5.5737906
treatmentBtheB  -2.3290826 -2.3151384
time            -0.7047648 -0.7016057
drugYes         -2.8249524 -2.8160156
length>6m        0.1970795  0.1790583
bdi.pre          0.6396762  0.6403492
glmmPQL is based on multiple calls of lme (in nlme); its estimates differ only slightly from the lmer results.
glmm with 20 quadrature points

Simultaneous Tests for General Linear Hypotheses

Fit: glmm(value ~ treatment + time + drug + length + bdi.pre,
    nest = subject,
    data = na.omit(btheb), points = 20)

Linear Hypotheses:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept) == 0     5.76927    1.00878   5.719 1.07e-08 ***
treatmentBtheB == 0 -2.31742    0.64274  -3.606 0.000312 ***
time == 0           -0.70249    0.13816  -5.084 3.69e-07 ***
drugYes == 0        -2.76536    0.66925  -4.132 3.60e-05 ***
length>6m == 0       0.04278    0.65103   0.066 0.947604
bdi.pre == 0         0.63371    0.03213  19.722  < 2e-16 ***
sd == 0              7.08240    0.30613  23.135  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Univariate p values reported)

glmm with 10 quadrature points

Simultaneous Tests for General Linear Hypotheses

Fit: glmm(value ~ treatment + time + drug + length + bdi.pre,
    nest = subject,
    data = na.omit(btheb), points = 10)

Linear Hypotheses:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept) == 0     5.32919    0.98459   5.413 6.21e-08 ***
treatmentBtheB == 0 -0.77265    0.63399  -1.219    0.223
time == 0           -0.70058    0.13467  -5.202 1.97e-07 ***
drugYes == 0        -3.71481    0.65148  -5.702 1.18e-08 ***
length>6m == 0      -0.17861    0.63554  -0.281    0.779
bdi.pre == 0         0.64320    0.03135  20.516  < 2e-16 ***
sd == 0              6.95510    0.28938  24.035  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Univariate p values reported)

A different number of quadrature points results in qualitative differences: the treatment effect is significant with 20 points (p = 0.000312) but not with 10 points (p = 0.223).
> fit.glmmML <- glmmML(value ~ treatment + time + drug + length + bdi.pre,
+                      cluster = subject, family = gaussian, data = btheb)
Error in glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :
  Unknown family; only 'binomial' and 'poisson' implemented
In addition: Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'time' converted to a factor
glmmML is only implemented for binomial and Poisson responses
Respiratory Symptoms
In each of two centers, eligible patients were randomly
assigned to active treatment or placebo. During the
treatment, the respiratory status (categorized poor or
good) was determined at each of four monthly visits. The
trial recruited 111 participants (54 in the active group, 57 in
the placebo group) and there were no missing data for
either the responses or the covariates. The question of
interest is to assess whether the treatment is effective and
to estimate its effect.
Data
variable     description
centre       the study center, a factor with levels 1 and 2.
treatment    the treatment arm, a factor with levels placebo and treatment.
gender       a factor with levels female and male.
age          the age of the patient.
status       the respiratory status (response variable), a factor with levels poor and good.
month        the month, each patient was examined at months 0, 1, 2, 3 and 4.
subject      the patient ID, a factor with levels 1 to 111.
respiratory
• use glmer to fit status with fixed main
effects treatment, month, age, gender,
centre, status.0 and random intercept for
subjects
• is treatment significant? (use cftest)
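A possible setup for this exercise is sketched below. It assumes the respiratory data come from the HSAUR2 package with the column names listed on the Data slide, builds status.0 from the month 0 record of each subject, treats month as numeric, and codes the response as a 0/1 indicator named good (a name introduced here for illustration).

```r
library(lme4)
library(multcomp)

data("respiratory", package = "HSAUR2")

# Month 0 status becomes the baseline covariate status.0
base <- respiratory[respiratory$month == "0", ]
resp <- respiratory[respiratory$month != "0", ]
resp$status.0 <- base$status[match(resp$subject, base$subject)]

# Numeric month and 0/1 response (1 = good); both are choices made for this sketch
resp$month <- as.numeric(as.character(resp$month))
resp$good  <- as.numeric(resp$status == "good")

# Logistic regression with a random intercept for subjects
fit <- glmer(good ~ treatment + month + age + gender + centre + status.0 +
               (1 | subject),
             data = resp, family = binomial)

cftest(fit)   # univariate tests for the fixed effects, including treatment
```

Adding nAGQ to the glmer call switches from the Laplace default to adaptive Gauss-Hermite quadrature, which is one way to check how sensitive the treatment estimate is to the approximation.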
respiratory
• find model fits for glmmPQL, glmm, and
glmmML and compare
• Where are the biggest differences?