productplots package • now updated version available on github • use library(devtools) install_github(“productplots”) Presentations • 8 min for each person on the team • + 2 min intro/conclusions for each project Generalized Linear Mixed Effects Models Stat 557 Heike Hofmann Outline Two case studies • Beat the Blues: using glmm for normal response • Respiratory Syndrom: Logistic Regression with Random Effects Beat the Blues Longitudinal data from a clinical trial of an interactive, multimedia program known as "Beat the Blues" designed to deliver cognitive behavioral therapy to depressed patients via a computer terminal. Patients with depression recruited in primary care were randomized to either the Beating the Blues program, or to "Treatment as Usual (TAU)". Data variable drug description did the patient take anti-depressant drugs (No or Yes). length the length of the current episode of depression, a factor with levels <6m (less than six months) and >6m (more than six months). treatment treatment group, a factor with levels TAU (treatment as usual) and BtheB (Beat the Blues) bdi.pre bdi2m, bdi3m, bdi.5m, bdi.8m Beck Depression Inventory II before treatment. Beck Depression Inventory II at x months Beat the Blues Beck Depression Inventory (BDI) Twenty-one multiple choice questions on how the subject has been feeling in the last week. Each question is scored on a scale of 0 - 3. Total score is BDI, as an indicator of the depression's severity. Higher total scores correspond to more severe depressive symptoms. For example: ■ (0) I do not feel sad. ■ (1) I feel sad. ■ (2) I am sad all the time and I can't snap out of it. ■ (3) I am so sad or unhappy that I can't stand it. The cutoffs used: 0–13: minimal depression; 14–19: mild depression; 20–28: moderate depression; and 29–63: severe depression. Reshape data first library(ggplot2) # install.packages("HSAUR2") data("BtheB", package="HSAUR2") BtheB$subject <- 1:nrow(BtheB) btheb <- melt(BtheB, id.vars=c ("drug","length","treatment","subject","bdi.pre")) table(btheb$variable) btheb$time <- gsub("bdi.([0-9]*)m","\\1", as.character(btheb$variable)) Treatment has a positive effect TAU qplot(time, value, geom="boxplot", BtheB 50 data=btheb, facets=~treatment) value 40 30 20 10 0 2 3 5 8 2 time 3 5 8 Variability between and within individuals 50 40 treatment 30 value TAU BtheB NA 20 10 0 6 88 29 43 67 16 90 94 71 95 30 56 99 33 10 84 7 45 18 78 32 20 96 31 37 38 98 77 89 83 4 15 9 86 76 22 11 61 2 42 8 62 47 75 35 80 40 81 14 19 53 50NA reorder(factor(subject), value, median) Idea • Try out different routines for glmm using a normal family as response. Should result in the same answers. Solving the GLMM problem • for linear models: REML approach side-steps a complex integral in the maximization of the likelihood • for generalized linear mixed effects we don’t have a shortcut for ML. We have to evaluate integral of the form for a model with fixed effects β, random effects u and link l(μ) Numerical integration • approximate data: Penalized quasi-likelihood • approximate integrand: Laplace method • approximate integral: adaptive Gaussian quadrature • EM algorithm (not particularly fast compared to the other methods) PQL approach • use inverse link h to come back to model form • Taylor expansion of h results in which gives us an updating scheme, using the LHS as ‘data’ Laplace • Interpret integral as a posterior mean; approximate by Taylor expansion of the log-integrand. • • Hessian h’’ and h*’’ come from data integration replaced by differentiation: faster and numerically more stable Gaussian quadrature • replace integral by weighted sum • for normal density this can be expanded in Hermite polynomials with Gaussian weights wi • adaptive quadrature optimizes placement and number of xi and weights wi Solutions in R • lmer in lme4 uses Laplace by default; has parameter nAGQ - if it is specified, the number of points for Gaussian quadrature. • lme in nlme uses EM algorithm (slow) • glmmPQL in MASS • glmmML in glmmML uses Laplace (default) or Gaussian-Hermite package lme4 • use glmer to fit BDI with fixed main effects treatment, time, drug, length, bdi.pre and random intercept for subjects • extend the above model to additionally incorporate a random effect of time for subjects. • Use anova to decide between the two models package lme4 • cftest in the multcomp package can be used to get univariate p-values for fixed effects based on asymptotic normality • get fitted values and plot against observed. Other packages • Install and load the following packages: • MASS for glmmPQL • repeated (Google it!) for glmm Download available from http:// www.commanster.eu/rcode.html • glmmML for glmmML glmm with normal response • Use functions glmm, glmmPQL, and glmmML to fit the same model to the BtheB data. Are there any differences? REML based fit in lme4 ! Simultaneous Tests for General Linear Hypotheses Fit: lmer(formula = value ~ treatment + time + drug + length + bdi.pre + (1 | subject), data = btheb, REML = TRUE) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) (Intercept) == 0 5.57379 2.29942 2.424 0.0154 * treatmentBtheB == 0 -2.31514 1.71505 -1.350 0.1771 time == 0 -0.70161 0.14694 -4.775 1.80e-06 *** drugYes == 0 -2.81602 1.77282 -1.588 0.1122 length>6m == 0 0.17906 1.68154 0.106 0.9152 bdi.pre == 0 0.64035 0.07991 8.013 1.11e-15 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Univariate p values reported) glmmPQL (Intercept) treatmentBtheB 5.5923934 -2.3290826 time -0.7047648 drugYes -2.8249524 length>6m 0.1970795 bdi.pre 0.6396762 time -0.7016057 drugYes -2.8160156 length>6m 0.1790583 bdi.pre 0.6403492 lmer (Intercept) treatmentBtheB 5.5737906 -2.3151384 Based on multiple calls of lme (in nlme), only slightest difference to glmer results ! Simultaneous Tests for General Linear Hypotheses Fit: glmm(value ~ treatment + time + drug + length + bdi.pre, nest = subject, data = na.omit(btheb), points = 20) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) (Intercept) == 0 5.76927 1.00878 5.719 1.07e-08 *** treatmentBtheB == 0 -2.31742 0.64274 -3.606 0.000312 *** time == 0 -0.70249 0.13816 -5.084 3.69e-07 *** drugYes == 0 -2.76536 0.66925 -4.132 3.60e-05 *** length>6m == 0 0.04278 0.65103 0.066 0.947604 bdi.pre == 0 0.63371 0.03213 19.722 < 2e-16 *** sd == 0 7.08240 0.30613 23.135 < 2e-16 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Univariate p values reported) different number of points results in qualitative differences ! Simultaneous Tests for General Linear Hypotheses Fit: glmm(value ~ treatment + time + drug + length + bdi.pre, nest = subject, data = na.omit(btheb), points = 10) Linear Hypotheses: Estimate Std. Error z value Pr(>|z|) (Intercept) == 0 5.32919 0.98459 5.413 6.21e-08 *** treatmentBtheB == 0 -0.77265 0.63399 -1.219 0.223 time == 0 -0.70058 0.13467 -5.202 1.97e-07 *** drugYes == 0 -3.71481 0.65148 -5.702 1.18e-08 *** length>6m == 0 -0.17861 0.63554 -0.281 0.779 bdi.pre == 0 0.64320 0.03135 20.516 < 2e-16 *** sd == 0 6.95510 0.28938 24.035 < 2e-16 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Univariate p values reported) different number of points results in qualitative differences > fit.glmmML<-glmmML(value~ treatment + time + drug+ length+ bdi.pre, cluster=subject, family=gaussian, data=btheb) Error in glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, : Unknown family; only 'binomial' and 'poisson' implemented In addition: Warning message: In model.matrix.default(mt, mf, contrasts) : variable 'time' converted to a factor glmmML only implemented for binomial and poisson Respiratory Symptoms In each of two centers, eligible patients were randomly assigned to active treatment or placebo. During the treatment, the respiratory status (categorized poor or good) was determined at each of four, monthly visits. The trial recruited 111 participants (54 in the active group, 57 in the placebo group) and there were no missing data for either the responses or the covariates. The question of interest is to assess whether the treatment is effective and to estimate its effect. Data variable centre treatment gender age description the study center, a factor with levels 1 and 2. the treatment arm, a factor with levels placebo and treatment. a factor with levels female and male. the age of the patient. status the respiratory status (response variable), a factor with levels poor and good. month the month, each patient was examined at months 0, 1, 2, 3 and 4. subject the patient ID, a factor with levels 1 to 111. respiratory • use glmer to fit status with fixed main effects treatment, month, age, gender, centre, status.0 and random intercept for subjects • is treatment significant (use cftest) respiratory • find model fits for glmmPQL, glmm, and glmmML and compare • Where are the biggest differences?