Analysis of Binomial Response Data with Generalized Linear Mixed Effects Models

Andrew Lithio

Program of Study Committee:
Dr. Dan Nettleton, Major Professor
Dr. Jarad Niemi
Dr. Vivekananda Roy

May 6, 2013

Chapter 1 Introduction

Generalized Linear Mixed Models (GLMMs) are a flexible extension of Generalized Linear Models that can model random effects, correlations between observations, and overdispersion in observed data. GLMMs are often well suited to the analysis of data in biostatistics, where response data are commonly modeled as Poisson or binomial random variables and experimental designs may contain blocks that induce correlation in the observed data [Bolker et al., 2008]. However, even when random effects are assumed to be multivariate normal, the full likelihood is often an intractable integral of such high dimension that numerical integration is not feasible. To estimate the parameters of these models, it is common either to compute pseudo-data that approximately follow a Linear Mixed Model, or to approximate the likelihood directly. Two popular computational routines for estimating the parameters of GLMMs are PROC GLIMMIX of SAS and glmer of the lme4 R package by Douglas Bates. GLIMMIX defaults to the former method, while glmer uses the latter. As a third possibility, Bayesian methods avoid integrating the likelihood altogether by sampling from the posterior density.

Throughout this paper we consider data from a functional assay experiment on the adaptation of flatworms to hypoxia, the deprivation of oxygen. Researchers used mutated worms in which the expression of one of 13 different genes was inhibited in each worm, with the goal of identifying which genes help worms adapt to hypoxia. For each gene in the experiment, 20 adults laid eggs that were put into a hypoxia chamber, while the eggs of 20 other adults were kept in a normoxia state.
The total number of eggs put into the hypoxia chamber and the total number kept in normoxia were recorded for each gene. For each combination of gene and treatment there were well over 60 eggs, with a median of 155.5. After 72 hours, the total number of worms that had hatched from eggs and grown to adulthood was counted and recorded for each state. In this paper, we refer to the proportion of eggs that hatch and grow to adulthood as simply the proportion hatched or the hatch rate. This process was carried out for all genes of interest in a single batch, then repeated two more times on different days. Researchers also indicated that we should expect to observe overdispersion in the data because of suspected nonuniform genetic backgrounds among the specimens. When wild type worms (worms with no gene expression inhibited) were used, 100% of eggs hatched and grew to adulthood when kept in either normoxia or hypoxia. We wish to identify genes whose suppression leads to significantly lower hatch rates under hypoxia than under normoxia. As an example, data for two genes are given in Table 1.

Table 1: Observed Data for mmcm-1 and hif-1

Gene     Treatment   Day   Total Eggs   Eggs Hatched   Hatch Rate
mmcm-1   Normoxia    1     95           71             0.747
mmcm-1   Hypoxia     1     111          67             0.604
mmcm-1   Normoxia    2     155          90             0.581
mmcm-1   Hypoxia     2     125          69             0.552
mmcm-1   Normoxia    3     185          126            0.681
mmcm-1   Hypoxia     3     368          196            0.533
hif-1    Normoxia    1     127          127            1.00
hif-1    Hypoxia     1     159          22             0.139
hif-1    Normoxia    2     80           80             1.00
hif-1    Hypoxia     2     211          44             0.209
hif-1    Normoxia    3     95           95             1.00
hif-1    Hypoxia     3     116          24             0.207

A possible model for data from a single gene is as follows. Let $j$ index treatment, with $j = 1$ for normoxia and $j = 2$ for hypoxia, and let $k$ index days ($k = 1, 2, 3$). Let $H_{jk}$ and $n_{jk}$ denote eggs hatched and total eggs, respectively, for the $j$th treatment and $k$th day. Assume
\[
H_{jk} \sim \mathrm{Binomial}(n_{jk}, \pi_{jk}) \quad \text{and let} \quad Y_{jk} = \frac{H_{jk}}{n_{jk}}.
\]
Furthermore, suppose $\mathrm{logit}(\pi_{jk}) = \eta_{jk}$, and that
\[
\eta = (\eta_{11}, \eta_{21}, \eta_{12}, \eta_{22}, \eta_{13}, \eta_{23})^T = X\beta + Z\gamma,
\]
where
\[
X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad
Z = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix},
\]
\[
\gamma = \begin{pmatrix} r \\ e \end{pmatrix} = (r_1, r_2, r_3, e_{11}, e_{21}, e_{12}, e_{22}, e_{13}, e_{23})^T,
\qquad r \sim N(0, \sigma_r^2 I_3) \quad \text{and} \quad e \sim N(0, \sigma_e^2 I_6).
\]
The vector of random effects, $\gamma$, is partitioned into $(r^T, e^T)^T$, where $r$ has 3 elements representing block effects and $e$, independent of $r$, has 6 elements, one for each binomial observation, to account for overdispersion. The fixed effects are contained in $\beta = (\beta_0, \beta_1)^T$, and consist of one intercept term and one term representing the effect of hypoxia. We interpret $\beta_0$ as the expected logit of the probability of hatching under normoxia, and $\beta_0 + \beta_1$ as the expected logit of the probability under hypoxia.

A model for the full dataset may be parameterized similarly, with the 13 genes indexed by $i$. Fixed effects in the full model consist of one intercept term for each gene and another term representing the effect of hypoxia for that gene, for a total of 26 elements in $\beta$. The specification of $r$ and $e$ is analogous: we assume a unique $e_{ijk}$ for each observation and a unique $r_{ik}$ for each combination of gene and day, giving $e$ 78 elements and $r$ 39 elements, for a total of 117 elements in $\gamma$.

We are interested in the performance of the pseudo-data, likelihood approximation, and Bayesian methods of estimation on data structured similarly to ours: data with binomial responses, random block effects, and overdispersion, with a relatively high number of trials for each binomial observation. We will also investigate whether there is a benefit to building a model that includes all 13 genes, or whether separate models for each gene perform just as well. In Section 2.4 we introduce the concept of separation, which is an additional complication in estimating the parameters of models with binomial responses.
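To make the single-gene structure concrete, the following minimal sketch builds the design matrices $X$ and $Z$ in the row order $(\eta_{11}, \eta_{21}, \eta_{12}, \eta_{22}, \eta_{13}, \eta_{23})$ and evaluates $\eta = X\beta + Z\gamma$ for one random draw of $\gamma$. The helper names, the seed, and the particular values of $\beta$, $\sigma_r$ and $\sigma_e$ are illustrative assumptions (the values happen to match one of the simulation settings used later), not part of the original analysis.

```python
import math
import random

# Row order: treatment j (1 = normoxia, 2 = hypoxia) varies fastest within day k.
rows = [(j, k) for k in (1, 2, 3) for j in (1, 2)]

# Fixed-effects design: intercept column and hypoxia indicator.
X = [[1, 1 if j == 2 else 0] for j, k in rows]

# Random-effects design: three day (block) columns followed by a 6 x 6 identity.
Z = [[1 if k == day else 0 for day in (1, 2, 3)] +
     [1 if i == obs else 0 for obs in range(6)]
     for i, (j, k) in enumerate(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(1)             # arbitrary seed (assumption)
beta = [1.0, -0.5]                 # illustrative (beta_0, beta_1)
sigma_r, sigma_e = 0.5, 0.3        # illustrative standard deviations

# gamma = (r_1, r_2, r_3, e_11, e_21, e_12, e_22, e_13, e_23)
gamma = ([rng.gauss(0.0, sigma_r) for _ in range(3)] +
         [rng.gauss(0.0, sigma_e) for _ in range(6)])

eta = [xb + zg for xb, zg in zip(matvec(X, beta), matvec(Z, gamma))]
pi = [inv_logit(value) for value in eta]   # hatch probabilities pi_jk
```

Each row of $Z$ selects exactly one block effect and one observation-level effect, which is how the model induces within-day correlation and overdispersion at the same time.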
The remainder of the paper is structured as follows. Chapter 2 details the algorithms used by GLIMMIX and glmer, and specifies two possible hierarchical models for use with Bayesian inferential methods. Section 2.4 defines separation, describes its effects on our estimation techniques, and discusses possible ways to negate those effects. Chapter 3 reports the results of three simulation studies and applies the three methods to the observed data. We conclude with a discussion of the results in Chapter 4.

Chapter 2 Methods

2.1 Restricted Pseudo Likelihood

The first approach we discuss is based on pseudo likelihood, and is also called a linearization method. The restricted pseudo likelihood (RPL) approach uses a Taylor series approximation to compute pseudo-data that approximately follow a linear mixed model, whose parameters are then estimated in the traditional manner. The following discussion is founded upon Wolfinger and O'Connell [Wolfinger and O'Connell, 1993] and the GLIMMIX manual. In each of the following sections we focus on the case of binomial response data, noting where we are able to generalize.

2.1.1 Notation and Model Statement

We begin with an $n \times 1$ vector of observed data, $Y = (Y_1, \ldots, Y_n)^T$, with
\[
E[Y \mid \gamma] = g^{-1}(X\beta + Z\gamma) = \pi = (\pi_1, \ldots, \pi_n)^T,
\]
where $\gamma$ is a $q \times 1$ vector of random effects, $g$ is a differentiable, monotonic, componentwise function taken throughout this paper to be the logit function, $X$ is an $n \times m$ matrix, and $Z$ is an $n \times q$ design matrix. Here we take $Y$ to be a random vector of observed proportions of binomial random variables instead of counts and, for simplicity, use a single subscript $i$ to indicate the $i$th element of $Y$ or $\pi$. Furthermore, $\gamma \sim N(0, G)$, where each element of $G$ is a known function of a vector of variance components, and
\[
\mathrm{Var}[Y \mid \gamma] = A^{1/2} R A^{1/2},
\]
where $A$ is a diagonal matrix of the conditional variance functions, in our case with $i$th diagonal entry $\pi_i(1 - \pi_i)/n_i$. The $R$ matrix can be used to induce non-diagonal correlation or variance structures. We set $R$ to the identity, but it is often taken to be some unknown overdispersion parameter, $\phi$, times the identity. In our case, $\mathrm{Var}[Y \mid \gamma]$ is a diagonal matrix with entries $\pi_i(1 - \pi_i)/n_i$. To simplify notation where possible, we set $\eta = X\beta + Z\gamma$, so that $g^{-1}(\eta) = E[Y \mid \gamma] = \pi$.

2.1.2 Taylor Series Approximation

We begin with a first-order Taylor series approximation about initial estimates $\hat\beta$ and $\hat\gamma$,
\[
g^{-1}(\eta) \approx g^{-1}(\hat\eta) + \hat\Delta X(\beta - \hat\beta) + \hat\Delta Z(\gamma - \hat\gamma),
\]
where
\[
\hat\Delta = \left. \frac{\partial g^{-1}(\eta)}{\partial \eta} \right|_{\hat\beta, \hat\gamma}
\]
is a diagonal matrix of derivatives of the mean evaluated at the initial estimates. For the logit function, the diagonal terms are $e^{\hat\eta_i} / (1 + e^{\hat\eta_i})^2$. We rearrange the Taylor series approximation to get
\[
\hat\Delta^{-1}\left(\pi - g^{-1}(\hat\eta)\right) + X\hat\beta + Z\hat\gamma \approx X\beta + Z\gamma. \tag{2.1}
\]

2.1.3 Pseudo-Model

Now define the pseudo-data
\[
P = \hat\Delta^{-1}\left(Y - g^{-1}(\hat\eta)\right) + X\hat\beta + Z\hat\gamma.
\]
With $\hat\beta$, $\hat\gamma$, $\hat\eta$, and $\hat\Delta^{-1}$ fixed, the expected value of $P$ given $\gamma$ is the left side of (2.1), and
\[
\widehat{\mathrm{Var}}[P \mid \gamma] = \hat\Delta^{-1} \hat{A}^{1/2} R \hat{A}^{1/2} \hat\Delta^{-1}.
\]
In our case, this simplifies to an $n \times n$ diagonal matrix with terms
\[
\left( \frac{(1 + e^{\hat\eta_i})^2}{e^{\hat\eta_i}} \right)^2 \frac{\hat\pi_i(1 - \hat\pi_i)}{n_i}.
\]
We then take
\[
P = X\beta + Z\gamma + \epsilon
\]
as a linear mixed model with pseudo-response $P$, fixed effects $\beta$ and random effects $\gamma$ as previously defined, and $\mathrm{Var}[\epsilon] = \mathrm{Var}[P \mid \gamma]$. We assume $\epsilon$ is normally distributed with mean 0. Additionally, define $\theta$ as the $q \times 1$ vector of unknowns in $G$ and $R$. We can write the marginal variance of $P$ as
\[
V(\theta) = Z G Z^T + \hat\Delta^{-1} A^{1/2} R A^{1/2} \hat\Delta^{-1}.
\]

2.1.4 Optimization

The restricted log pseudo-likelihood of our linear mixed model is known to be, up to an additive constant,
\[
l(\theta, p) = -\frac{1}{2} \log|V(\theta)| - \frac{1}{2} r^T V(\theta)^{-1} r - \frac{1}{2} \log|X^T V(\theta)^{-1} X|,
\]
where $r = p - X(X^T V^{-1} X)^{-} X^T V^{-1} p$. Note that $\beta$ has been profiled out.
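As a concrete illustration of the pseudo-data defined in Section 2.1.3, the sketch below computes $P_i$ and its estimated conditional variance elementwise for the logit link with $R$ set to the identity. The function name `pseudo_data` and the choice of $\hat\eta = 0$ as a starting value are assumptions made for illustration; this is not the GLIMMIX implementation. The data are the mmcm-1 rows of Table 1.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def pseudo_data(y, n, eta_hat):
    """Return pseudo-responses P_i and their estimated conditional variances.

    P_i   = (y_i - pi_i) / delta_i + eta_i
    var_i = a_i / delta_i**2, with a_i = pi_i (1 - pi_i) / n_i
    where delta_i = e^eta_i / (1 + e^eta_i)^2 = pi_i (1 - pi_i) for the logit link.
    """
    pseudo, variances = [], []
    for y_i, n_i, eta_i in zip(y, n, eta_hat):
        pi_i = inv_logit(eta_i)
        delta_i = pi_i * (1.0 - pi_i)
        a_i = pi_i * (1.0 - pi_i) / n_i
        pseudo.append((y_i - pi_i) / delta_i + eta_i)
        variances.append(a_i / delta_i ** 2)
    return pseudo, variances


# mmcm-1 observations from Table 1, in the order (eta_11, eta_21, ..., eta_23).
hatched = [71, 67, 90, 69, 126, 196]
eggs = [95, 111, 155, 125, 185, 368]
y = [h / m for h, m in zip(hatched, eggs)]

eta_start = [0.0] * 6                      # illustrative starting value (assumption)
P, var_P = pseudo_data(y, eggs, eta_start)
```

Because $\hat\Delta_{ii} = \hat\pi_i(1 - \hat\pi_i)$ under the logit link, the variance term reduces to $1 / (n_i \hat\pi_i (1 - \hat\pi_i))$, which grows without bound as $\hat\pi_i$ approaches 0 or 1.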
The elements of $\theta$ are estimated using numerical optimization methods; the fixed and random effects are then estimated as usual for linear mixed models, yielding
\[
\hat\beta = (X^T V(\hat\theta)^{-1} X)^{-} X^T V(\hat\theta)^{-1} p, \qquad
\hat\gamma = \hat{G} Z^T V(\hat\theta)^{-1} \hat{r}.
\]
Using these estimates, the pseudo-data are recomputed and the parameters are re-estimated, continuing until the estimates converge.

2.1.5 Standard Errors

We now compute an estimate of the variance-covariance matrix of $\hat\beta$. Denote the estimated conditional variability of $P$ by $\widehat{\mathrm{Var}}[P \mid \hat\pi] = S$. Then the mixed model equations are
\[
\begin{pmatrix} X^T S^{-1} X & X^T S^{-1} Z \\ Z^T S^{-1} X & Z^T S^{-1} Z + G(\hat\theta)^{-1} \end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat\gamma \end{pmatrix}
=
\begin{pmatrix} X^T S^{-1} p \\ Z^T S^{-1} p \end{pmatrix},
\]
and
\[
C = \begin{pmatrix} X^T S^{-1} X & X^T S^{-1} Z \\ Z^T S^{-1} X & Z^T S^{-1} Z + G(\hat\theta)^{-1} \end{pmatrix}^{-}
= \begin{pmatrix}
\hat\Omega & -\hat\Omega X^T V(\hat\theta)^{-1} Z G(\hat\theta) \\
-G(\hat\theta) Z^T V(\hat\theta)^{-1} X \hat\Omega & M + G(\hat\theta) Z^T V(\hat\theta)^{-1} X \hat\Omega X^T V(\hat\theta)^{-1} Z G(\hat\theta)
\end{pmatrix}
\]
is known to provide an estimate of the covariance matrix of $[\hat\beta^T, \hat\gamma^T - \gamma^T]^T$, where $\hat\Omega = (X^T V(\hat\theta)^{-1} X)^{-}$ and $M = (Z^T S^{-1} Z + G(\hat\theta)^{-1})^{-1}$. The standard errors of the elements of $\hat\beta$ are the square roots of the diagonal terms of $\hat\Omega$.

2.2 Laplace Approximation

Our second approach to estimating GLMMs uses Laplace's integral approximation (LA). It was applied to GLMMs by Breslow and Clayton [Breslow and Clayton, 1993], and it is the default method of the glmer function in Douglas Bates's lme4 package of R for estimating GLMMs. Here we closely follow Breslow and Clayton, Schelldorfer [Schelldorfer et al., 2013], and Bates [Bates, 2012].

2.2.1 Likelihood Definition

We maintain the same notation as in the previous section, with the exception of introducing a linear transformation of $\gamma$, $\gamma = \Lambda_\theta u$, where $u \sim N(0, I_q)$, so that $\gamma \sim N(0, \Lambda_\theta \Lambda_\theta^T)$. Our full likelihood can be written as
\[
L(\beta, \theta \mid y) = \int p(y \mid \beta, \Lambda_\theta, u) \, f(u \mid \Lambda_\theta) \, du,
\]
where $p(y \mid \beta, \Lambda_\theta, u)$ is the conditional pmf of $y$ and $f(u \mid \Lambda_\theta)$ is the density of $u$. As stated before, in our case this integral has no closed-form solution, but we will use Laplace's method to approximate this integral directly.
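The idea behind Laplace's method can be illustrated in one dimension. The sketch below considers a single binomial count $h$ out of $n$ with linear predictor $\beta_0 + \lambda u$ and one standard normal random effect $u$, writes the integrand as $\exp(-S(u))$ with $S(u) = -\log p(h \mid u) + u^2/2$ (dropping constants), locates the minimizer of $S$ by Newton's method, and compares the resulting Gaussian approximation with a brute-force trapezoid rule. The values of `beta0` and `lam`, the integration range, and all function names are assumptions chosen for illustration; this is a toy version, not the multivariate algorithm used by glmer.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def S(u, h, n, beta0, lam):
    """Negative log integrand, omitting the binomial coefficient and normal constant."""
    eta = beta0 + lam * u
    log_p = h * eta - n * math.log1p(math.exp(eta))
    return -log_p + 0.5 * u * u


def laplace_approx(h, n, beta0, lam, iters=25):
    """Gaussian approximation exp(-S(u~)) * sqrt(2 pi / S''(u~)) to the integral."""
    u = 0.0
    for _ in range(iters):                     # Newton steps toward the minimizer of S
        p = inv_logit(beta0 + lam * u)
        grad = -lam * (h - n * p) + u
        hess = lam * lam * n * p * (1.0 - p) + 1.0
        u -= grad / hess
    p = inv_logit(beta0 + lam * u)
    hess = lam * lam * n * p * (1.0 - p) + 1.0
    return math.exp(-S(u, h, n, beta0, lam)) * math.sqrt(2.0 * math.pi / hess)


def quadrature(h, n, beta0, lam, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid-rule value of the same integral, used as a reference."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-S(lo + i * width, h, n, beta0, lam))
    return total * width


# One observation from Table 1 (mmcm-1, hypoxia, day 1); beta0 and lam are made up.
ratio = laplace_approx(67, 111, 0.4, 0.5) / quadrature(67, 111, 0.4, 0.5)
```

With over a hundred trials in the observation the integrand is close to Gaussian in $u$, so the two values should agree closely; in the actual models the integral is over all $q$ random effects at once, where brute-force quadrature is not practical.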
Since $f(u \mid \Lambda_\theta)$ is multivariate normal, we substitute the density to get
\[
L(\beta, \theta \mid y) \propto \int \exp\left\{ \log p(y \mid \beta, \Lambda_\theta, u) - \frac{1}{2} u^T u \right\} du
= \int \exp(-S(u)) \, du,
\]
where $S(u) = -\log p(y \mid \beta, \Lambda_\theta, u) + \frac{1}{2} u^T u$. Laplace's method utilizes a second-order Taylor series approximation about $\tilde{u} = \arg\max_u [-S(u)]$, so we first show how to find $\tilde{u}$ given $\beta$ and $\theta$ using a penalized iteratively reweighted least squares (PIRLS) algorithm.

2.2.2 PIRLS

Our goal is to minimize $S(u)$, which can be done using the Newton-Raphson algorithm. We first find the derivatives of $S(u)$,
\[
S'(u) = -(Z\Lambda_\theta)^T B (y - \pi) + u
\]
and
\[
S''(u) = (Z\Lambda_\theta)^T W (Z\Lambda_\theta) + I_q,
\]
where $W$ is a diagonal matrix with elements $(v(\pi_i) g'(\pi_i)^2)^{-1}$, $B$ is a diagonal matrix with elements $(v(\pi_i) g'(\pi_i))^{-1}$, and $v(\pi_i)$ is the conditional variance function. Our updates of $u$ can then be found by solving
\[
S''(u^{(j)}) \, u^{(j+1)} = S''(u^{(j)}) \, u^{(j)} - S'(u^{(j)}),
\]
\[
\left[ (Z\Lambda_\theta)^T W^{(j)} (Z\Lambda_\theta) + I_q \right] u^{(j+1)}
= \left[ (Z\Lambda_\theta)^T W^{(j)} (Z\Lambda_\theta) + I_q \right] u^{(j)} + (Z\Lambda_\theta)^T B^{(j)} (y - \pi^{(j)}) - u^{(j)},
\]
\[
\left[ (Z\Lambda_\theta)^T W^{(j)} (Z\Lambda_\theta) + I_q \right] u^{(j+1)} = (Z\Lambda_\theta)^T W^{(j)} z^{(j)},
\]
where $z^{(j)} = (Z\Lambda_\theta) u^{(j)} + (W^{(j)})^{-1} B^{(j)} (y - \pi^{(j)})$. We continue iterations until convergence, usually determined by the relative change in the linear predictor, $\eta$.

2.2.3 Likelihood Approximation

Laplace's approximation uses the second-order Taylor series approximation about the $\tilde{u}$ found from PIRLS to form the kernel of a normal distribution in order to carry out the integration:
\[
L(\beta, \theta \mid y) \propto \int \exp(-S(u)) \, du
\approx \int \exp\left( -S(\tilde{u}) - \frac{1}{2} (u - \tilde{u})^T S''(\tilde{u}) (u - \tilde{u}) \right) du
\]
\[
= \exp(-S(\tilde{u})) \int \exp\left( -\frac{1}{2} (u - \tilde{u})^T S''(\tilde{u}) (u - \tilde{u}) \right) du
\propto \exp(-S(\tilde{u})) \, |L|^{-1},
\]
where $L L^T = S''(\tilde{u})$. This expression is then maximized with respect to $\beta$ and $\theta$ using numerical methods.

2.2.4 Standard Errors

The approximate covariance matrix of $\hat\beta$ is calculated in the same manner as for RPL.
Following Section 2.1.5, we calculate
\[
\widehat{\mathrm{Var}}(\hat\beta) = \hat\Omega = (X^T V(\hat\theta)^{-1} X)^{-},
\]
where
\[
V(\hat\theta) = Z G(\hat\theta) Z^T + \hat\Delta^{-1} \hat{A} \hat\Delta^{-1},
\]
$\hat\Delta$ is the diagonal matrix of derivatives of the mean evaluated at $\hat\beta$ and $\hat\gamma$, and $\hat{A}$ is the diagonal matrix of the conditional variance functions evaluated at convergence.

2.3 Bayesian Methods

As an alternative to RPL and LA, we introduce two models which allow for the application of Bayesian methods. The first is an adapted beta-binomial model intended for analysis of data from a single gene, while the second is a multi-gene hierarchical model which makes use of the entire data set. We use JAGS [Plummer, 2011] to approximate the posteriors for each of these models. Initial values are chosen to be dispersed relative to the priors, and we use the Gelman-Rubin statistic to check for a lack of convergence.

2.3.1 Adapted Beta-Binomial Model

Let $j$ index treatment, with $j = 1$ for normoxia and $j = 2$ for hypoxia, and let $k$ index days ($k = 1, 2, 3$). Let $H_{jk}$ and $n_{jk}$ denote eggs hatched and total eggs, respectively, for the $j$th treatment and $k$th day. Assume
\[
H_{jk} \sim \mathrm{Binomial}(n_{jk}, \pi_{jk}) \quad \text{and let} \quad Y_{jk} = \frac{H_{jk}}{n_{jk}}.
\]
Furthermore, suppose
\[
\mathrm{logit}(\pi_{jk}) = \mathrm{logit}(p_j) + r_k + e_{jk}, \qquad
r_k \overset{ind}{\sim} N(0, \sigma_r^2), \qquad
e_{jk} \overset{ind}{\sim} N(0, \sigma_e^2),
\]
where $p_j$ represents $E[Y_{jk} \mid r_k = 0, e_{jk} = 0]$. We assign the following priors:
\[
p_j \overset{ind}{\sim} \mathrm{Beta}(1, 1), \qquad \sigma_r \sim \mathrm{Unif}(0, 1), \qquad \text{and} \qquad \sigma_e \sim \mathrm{Unif}(0, 1).
\]
The use of uniform priors on the variance components is recommended by Gelman [Gelman et al., 2008]. After performing an exploratory analysis of the data we do not expect $\sigma_r$ or $\sigma_e$ to exceed one, and a repeated analysis using Unif(0, 10) priors instead reveals no change. The prior on $p_j$ is an attempt to be relatively noninformative, but it can also be thought of as similar to adding 1 success and 1 failure to each binomial observation.
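The link between the Beta(1, 1) prior and "adding 1 success and 1 failure" can be made explicit in a simplified setting. The sketch below ignores the random effects $r_k$ and $e_{jk}$, which is an assumption that departs from the model above: in that case a Beta(1, 1) prior combined with $H$ successes in $n$ trials gives a Beta$(1 + H, 1 + n - H)$ posterior by conjugacy, whose mean equals the proportion obtained after adding one success and one failure. The function names are hypothetical.

```python
def posterior_mean_beta11(hatched, eggs):
    """Posterior mean of p under a Beta(1, 1) prior and a plain binomial likelihood."""
    alpha = 1 + hatched           # prior "success" plus observed successes
    beta = 1 + eggs - hatched     # prior "failure" plus observed failures
    return alpha / (alpha + beta)


def add_one_proportion(hatched, eggs):
    """Observed proportion after adding one success and one failure."""
    return (hatched + 1) / (eggs + 2)


# hif-1 under normoxia on day 1 (Table 1): all 127 eggs hatched.
separated = posterior_mean_beta11(127, 127)   # strictly below 1
adjusted = add_one_proportion(127, 127)       # the same value
```

In the full model the random effects break this exact conjugacy, so the correspondence is only approximate there, which is consistent with the wording "similar to" above.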
2.3.2 Hierarchical Model

Let $i$ index gene ($i = 1, \ldots, 13$), let $j$ index treatment, with $j = 1$ for normoxia and $j = 2$ for hypoxia, and let $k$ index days ($k = 1, 2, 3$). Let $H_{ijk}$ and $n_{ijk}$ denote eggs hatched and total eggs, respectively, for the $i$th gene, $j$th treatment, and $k$th day. Assume
\[
H_{ijk} \sim \mathrm{Binomial}(n_{ijk}, \pi_{ijk}), \quad \text{and let} \quad Y_{ijk} = \frac{H_{ijk}}{n_{ijk}}.
\]
Furthermore, suppose $\mathrm{logit}(\pi_{ijk}) = \eta_{ijk}$, and
\[
\eta_{ijk} = \mu_i + (-1)^j \tau_i + r_{ik} + e_{ijk}, \qquad
r_{ik} \overset{ind}{\sim} N(0, \sigma_r^2), \qquad
e_{ijk} \overset{ind}{\sim} N(0, \sigma_e^2),
\]
with priors
\[
\mu_i \overset{ind}{\sim} N(\gamma, \psi), \quad \gamma \sim N(0, 1.5^2), \quad \psi \sim \mathrm{IG}(3, 2), \quad
\tau_i \overset{ind}{\sim} N(0, \xi), \quad \xi \sim \mathrm{IG}(3, 2), \quad
\sigma_r \sim \mathrm{Unif}(0, 1), \quad \text{and} \quad \sigma_e \sim \mathrm{Unif}(0, 1).
\]
As in the previous model, the use of uniform priors on the variance components follows the recommendation of Gelman [Gelman et al., 2008]. As before, a repeated analysis using Unif(0, 10) priors instead reveals no change. Hyperparameters for $\gamma$, $\psi$, and $\xi$ were chosen to induce reasonable priors on the inverse logit of $\eta_{ijk}$, with the goal of having nearly uniform weight on the (0, 1) interval.

Here, $\mu_i - \tau_i$ represents the expected logit of the hatch probability of gene $i$ under normoxia, and $\mu_i + \tau_i$ represents the same quantity under hypoxia. Thus $\mu_i$ represents a center for gene $i$, with a treatment difference of $2\tau_i$. The parameter $\gamma$ acts as the location for the distribution of the $\mu_i$, and $\psi$ controls the scale of the $\mu_i$, indicating how closely the data from different genes agree. Likewise, $\xi$ is the scale for the treatment effects, indicating whether there is a wide distribution of treatment effects or whether they all take similar values.

2.4 Separation

Data with binomial responses with particularly high or low probabilities can exhibit quasi-complete separation, in which a predictor or a combination of predictor variables perfectly separates, or determines, an outcome. Seven of the thirteen genes in our dataset display this property; for each of these genes we observe 100% hatch rates under normoxia.
Intuitively, this gives us cause for concern when recalling our use of the logit link function and likelihood-based methods. In fact, both frequentist methods discussed return unstable parameter estimates with extraordinarily large standard errors, although GLIMMIX simply returns an error message indicating a lack of convergence. In the remainder of this section, we briefly describe why this occurs and explore possible solutions.

2.4.1 Breakdown Under Separation

Recall that the variance of our observed binomial proportions conditional on the random effects is $\pi(1 - \pi)/n$. But as $\pi \to 1$ (or $\pi \to 0$), $\pi(1 - \pi) \to 0$. In the RPL algorithm this leads to very small values in our $\hat{A}$ matrix, and consequently inflates the $V(\hat\theta)^{-1}$ matrix, inflating both our estimates of $\beta$ and the standard errors. Similarly, in the LA algorithm the convergence of $\pi(1 - \pi)$ towards 0 begins to dominate the weighting matrix $W$, again leading to unstable estimates.

2.4.2 Analysis Under Separation

While parameter estimates and standard errors from LA are unreliable, we might still use the maximized likelihood to test the significance of a predictor or set of predictors via a likelihood ratio test (this is not valid under RPL because it uses a likelihood for pseudo-data instead of an approximation to the likelihood of the data). However, there are no readily available methods for further inference. Firth's penalized likelihood approach can be applied to logistic regression with fixed effects [Heinze and Schemper, 2002], but an extension to mixed effects has not been derived.

One ad hoc solution for analysis under separation is to add one success and one failure to the binomial observations. This process pulls the "observed" probabilities away from 0 or 1 and allows analysis to proceed as usual, while hopefully not adversely affecting the resulting inferences. Another alternative is to take a Bayesian approach.
The priors put on the parameters of our model can naturally restrict our estimation to a reasonable range and allow inference without artificially altering the data. In the following chapter we investigate the performance of Bayesian methods and of the technique of adding one success and one failure.

Chapter 3 Simulation and Application

In this chapter, we apply the frequentist methods discussed in Chapter 2 to the model discussed in Chapter 1, and use Bayesian methods to estimate the parameters of the adapted beta-binomial model and hierarchical model specified in Section 2.3. Note that our parameterization of fixed effects for the frequentist methods is different from that of the models for Bayesian analysis defined in Section 2.3. The models for Bayesian analysis are parameterized to ensure that the induced priors on the normoxia and hypoxia probabilities are the same. The results produced from the Bayesian analysis will be interpreted in terms of the frequentist parameterization above to facilitate comparison across methods.

In the remainder of this chapter we report the results of simulation studies comparing the performance of the above methods for the single gene models and the full models. We then apply these methods to the observed data for select individual genes as well as the full data set and compare the resulting inferences. Special attention is paid to the analysis of data displaying separation. Throughout the chapter we highlight two genes that are representative of the dataset: the mmcm-1 gene and the hif-1 gene. The observed hatch proportions of each of these genes are reported in Table 2.

Table 2: Observed Data

Gene     Normoxia   Hypoxia   Effect
mmcm-1   0.670      0.5627    -0.1093
hif-1    1.000      0.185     -0.815

3.1 Simulation Study

We report the results of three simulation studies below. The first two entail data from a single gene, with the second study simulating genes with separated data.
For the second study we add one success and one failure to each observation for the RPL and LA methods before estimation to allow for inference. The third study simulates 13 genes and estimates full models, where some genes display separation and some do not. Again, the genes with separation have one success and one failure added before the RPL and LA analyses are performed. Each simulation consists of 1000 independent repetitions. In each simulation we set $\sigma_r = 0.5$ and $\sigma_e = 0.3$. Results are reported on the logit scale. We interpret $\beta_0$ as the estimated log odds of hatching under normoxia and $\exp(\beta_1)$ as the multiplicative change in odds associated with hypoxia. Following the default reports of glmer and GLIMMIX, we use Wald-type intervals for LA and t intervals for RPL. For our Bayesian analysis we use posterior means as point estimates and equal-tailed credible intervals.

3.1.1 Single Gene

We first compare each method's performance in the absence of separation. We set $\beta_0$, the log odds of hatching under normoxia, to 1, which corresponds to a probability of 0.731. We then vary $\beta_1$, the treatment effect, from 0 to $-2$, which corresponds to ranging from no effect to decreasing the odds by 86.5%. Boxplots of the parameter estimates are shown in Figures 1-4, and mean squared errors are given in Table 3. Boxplots of the lengths of the confidence/credible intervals for the fixed effects are drawn in Figures 5 and 6, and Table 4 lists the coverage of those intervals.
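The simulation settings above can be checked and exercised with a short sketch. It verifies the two conversions quoted in the text, generates one single-gene data set from the model of Chapter 1, and applies the adjustment of adding one success and one failure. The number of eggs per observation (155 here), the seed, and the function names are assumptions; the text does not state the egg counts used in the simulations, and this is not the code used to produce the reported results.

```python
import math
import random


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_gene(beta0, beta1, sigma_r, sigma_e, n_eggs, rng):
    """Simulate (treatment j, day k, total eggs, eggs hatched) for one gene."""
    data = []
    for k in (1, 2, 3):
        r_k = rng.gauss(0.0, sigma_r)              # day (block) effect
        for j in (1, 2):
            e_jk = rng.gauss(0.0, sigma_e)         # observation-level effect
            eta = beta0 + (beta1 if j == 2 else 0.0) + r_k + e_jk
            p = inv_logit(eta)
            hatched = sum(1 for _ in range(n_eggs) if rng.random() < p)
            data.append((j, k, n_eggs, hatched))
    return data


def add_one(data):
    """Add one success and one failure to every binomial observation."""
    return [(j, k, n + 2, h + 1) for j, k, n, h in data]


prob_normoxia = inv_logit(1.0)           # about 0.731 when beta_0 = 1
odds_reduction = 1.0 - math.exp(-2.0)    # about 0.865 when beta_1 = -2

rng = random.Random(2013)                # arbitrary seed (assumption)
sample = simulate_gene(1.0, -2.0, 0.5, 0.3, 155, rng)
adjusted = add_one(sample)
```

Repeating `simulate_gene` 1000 times and fitting each data set would mirror the structure of the study described here, though the fitting step itself relies on GLIMMIX, glmer, and JAGS and is not reproduced in this sketch.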
Table 3: MSE of Parameter Estimates in Simulation 1

                                        Method
Parameter   True Value of β1   RPL      LA       Bayesian
β0           0                 0.1224   0.1228   0.1194
β0          -0.5               0.1316   0.1321   0.1197
β0          -1                 0.1176   0.1179   0.1036
β0          -2                 0.1279   0.1284   0.0970
β1           0                 0.0807   0.0811   0.0720
β1          -0.5               0.0839   0.0841   0.0751
β1          -1                 0.0806   0.0808   0.0791
β1          -2                 0.0792   0.0795   0.0864
σr           0                 0.0980   0.0783   0.0036
σr          -0.5               0.0984   0.0798   0.0041
σr          -1                 0.0966   0.0774   0.0043
σr          -2                 0.0978   0.0798   0.0042
σe           0                 0.0399   0.0639   0.0072
σe          -0.5               0.0422   0.0645   0.0071
σe          -1                 0.0369   0.0591   0.0062
σe          -2                 0.0388   0.0620   0.0074

Table 4: Coverage Rates for 95% Intervals in Simulation 1

                                        Method
Parameter   True Value of β1   RPL     LA      Bayesian
β0           0                 0.990   0.807   1.000
β0          -0.5               0.990   0.799   1.000
β0          -1                 0.984   0.823   0.998
β0          -2                 0.991   0.801   0.999
β1           0                 0.992   0.812   1.000
β1          -0.5               0.987   0.804   1.000
β1          -1                 0.992   0.797   1.000
β1          -2                 0.994   0.840   1.000

It is apparent from the boxplots in Figures 1-4 that LA and RPL return very similar estimates for the fixed effects, while the Bayesian estimates of the fixed effects are shrunken towards 0. However, since the Bayesian estimates are less variable, their mean squared errors are comparable to, and often better than, those of the frequentist methods. Furthermore, Figures 5 and 6 show that the 95% intervals produced in the Bayesian analyses and by RPL are much wider than those from LA, although the coverage rates of the latter are much lower than desired and the coverage rates of the Bayesian analyses and RPL are higher than desired. On the other hand, if interest is focused on estimating the variance parameters, the Bayesian estimates are less variable and tend to be closer to the true generating values than those produced by LA and RPL. Note that variance components estimated by LA tend to be a little higher and more variable than those from RPL, but confidence intervals for fixed effects also tend to be shorter.
Table A.1 in the appendix lists the probabilities of rejecting the null hypothesis $\beta_1 = 0$ at the 0.05 level. It is clear that in this scenario LA gives us more power, although it also has a low coverage rate and an unacceptably high type I error rate.

The second simulation has a similar structure, but we set the probability of hatching under normoxia at 1, and again vary only the treatment effects. The values of $\sigma_r$ and $\sigma_e$ remain the same. Estimates of the log odds of hatching under normoxia all exceed 5, which corresponds to a probability of over 0.99 and renders the summaries given in the first simulation rather uninformative. Figure 7 plots the estimates of $\beta_0$ and $\beta_1$, but the true values of $\beta_0$ and $\beta_1$ are both undefined since the probability of hatching under normoxia is set at 1. Instead, we report the estimated log odds of hatching under hypoxia as $\rho$, together with the coverage of its 95% intervals; since we are still interested in testing $\beta_1 = 0$, we also report the power and the lengths of 95% intervals corresponding to $\beta_1$. We generate data using $\rho = 2.2$ and $\rho = 0$, which correspond to probabilities of 0.9 and 0.5 of hatching under hypoxia. Table A.2 gives the power of testing $\beta_1 = 0$. Estimates are plotted in Figures 7-11, with interval lengths drawn in Figure 12. Mean squared errors are recorded in Table 5, and coverage rates of 95% intervals can be found in Table 6.

Table 5: MSE of Parameter Estimates in Simulation 2

                                       Method
Parameter   True Value of ρ   RPL      LA       Bayesian
ρ           2.2               0.0250   0.1220   0.3433
ρ           0                 0.0103   0.7095   0.4728
σr          2.2               0.2224   0.1693   0.0005
σr          0                 0.2272   0.1528   0.0053
σe          2.2               0.0760   0.1081   0.0274
σe          0                 0.0782   0.2508   0.0418

Table 6: Coverage Rates for 95% Intervals in Simulation 2

                                       Method
Parameter   True Value of ρ   RPL     LA      Bayesian
ρ           2.2               1.000   0.776   0.986
ρ           0                 1.000   0.757   0.960

We observe some parallels to results from the first simulation, but the similarities between LA and RPL begin to break down.
Notably, RPL allows accurate estimation of $\rho$, with substantially lower MSE, but with wider confidence intervals than LA. We also see that LA estimates the block variance more closely than RPL, but its estimates of the subject-level variation are much more variable. LA also estimates $\rho$ poorly, especially when the effect is larger, and displays a much lower coverage rate than desired. Comparatively, in the presence of separation, the adapted beta-binomial model still performs well in estimating variance components. Estimates of $\rho$ from the adapted beta-binomial model are not as accurate as those from RPL, and while our credible intervals for $\beta_1$ are still wide, the coverage rates for $\rho$ approach 0.95. Table A.2 in the Appendix lists the powers for testing $\beta_1 = 0$. Interestingly, our power for $\rho = 2.2$ is slightly higher than for $\rho = 0$, due to the presence of the outliers plotted in Figure 12. However, we maintain powers near 1 for all methods and each value of $\rho$.

3.1.2 Full Model

For our simulation study involving 13 genes, we again set $\sigma_r = 0.5$ and $\sigma_e = 0.3$. Three genes were set to display separation with varying magnitudes of treatment effect; the 10 other genes were set to have varying means and treatment effects. The values used to simulate data are reported in Table 7, but we focus only on the first 3 genes, which are representative of our dataset. We again report MSE in Table 8 and coverage rates in Table 9. We report point estimates of $\rho$ for gene 1 as we did in the second simulation, but report lengths of 95% intervals for $\beta_1$ in Figure 18 as well as the probability of rejecting the null hypothesis $\beta_1 = 0$ in Table A.3.
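The relationship between probability-scale settings and logit-scale true values in these simulations can be sketched as follows. The sketch assumes that the probability-scale "effect" is an additive change in hatch probability from normoxia to hypoxia; this reading is an inference, though it reproduces values quoted in the text such as $\rho = 2.2$ for a hatch probability of 0.9 and the true value $\rho = -1.39$ listed for gene 1. The function names are hypothetical.

```python
import math


def logit(p):
    """Log odds, returning +/- infinity at the boundaries of [0, 1]."""
    if p >= 1.0:
        return math.inf
    if p <= 0.0:
        return -math.inf
    return math.log(p / (1.0 - p))


def logit_scale(p_normoxia, effect):
    """Return (beta_0, beta_1, rho) implied by probability-scale settings.

    beta_0 is the normoxia log odds, rho the hypoxia log odds, and
    beta_1 = rho - beta_0 (minus infinity when the normoxia probability is 1).
    """
    beta0 = logit(p_normoxia)
    rho = logit(p_normoxia + effect)
    return beta0, rho - beta0, rho


gene1 = logit_scale(1.0, -0.8)     # separated gene: beta_0 infinite, rho finite
gene3 = logit_scale(0.6, -0.1)     # beta_0 near 0.41, beta_1 near -0.41
gene6 = logit_scale(0.90, -0.05)   # beta_0 near 2.20, beta_1 near -0.46
```

The infinite $\beta_0$ for a separated gene is the reason the hypoxia log odds $\rho$, rather than $\beta_0$ and $\beta_1$, is used to summarize estimation accuracy for such genes.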
Table 7: True Values

        Probability            Logit
Gene    Normoxia   Effect      Normoxia   Effect
1       1          -0.8        ∞          -∞
2       0.95        0          2.94        0
3       0.6        -0.1        0.41       -0.41
4       1          -0.01       ∞          -∞
5       1          -0.10       ∞          -∞
6       0.90       -0.05       2.20       -0.46
7       0.85       -0.00       1.73        0
8       0.80       -0.01       1.39       -0.06
9       0.75       -0.00       1.10        0
10      0.70       -0.05       0.85       -0.23
11      0.65       -0.00       0.62        0
12      0.55       -0.00       0.20        0
13      0.50       -0.05       0          -0.20

Table 8: MSE of Parameter Estimates in Simulation 3

                                Method
Parameter     True Value   RPL      LA       Hierarchy
ρ  Gene 1     -1.39        0.1235   0.1244   0.1356
β0 Gene 2      2.94        0.1602   0.1644   0.2046
β1 Gene 2      0           0.1606   0.1648   0.1611
β0 Gene 3      0.41        0.1095   0.1098   0.0870
β1 Gene 3     -0.41        0.0763   0.0765   0.0817
σr             0.5         0.0090   0.0183   0.0115
σe             0.3         0.0052   0.0144   0.0068

Table 9: Coverage Rates in Simulation 3

                   Method
Parameter     RPL     LA      Hierarchy
β0 Gene 2     0.953   0.900   0.954
β1 Gene 2     0.955   0.917   0.944
β0 Gene 3     0.960   0.878   0.966
β1 Gene 3     0.950   0.849   0.967

Interestingly, the hierarchical model performs very similarly to RPL and LA in estimating fixed effects, but retains its advantage in estimating the variance components only relative to LA. We observe LA producing lower estimates of variance components than RPL and the hierarchical model, as well as shorter confidence intervals for fixed effects and lower coverage. The rejection rates for testing $\beta_1 = 0$ for each gene in Table A.3 reveal that our Type I error rates have decreased to much more acceptable levels, although they are still substantially higher than desired for LA, and we maintain powers of 1 for gene 1. The powers for gene 3 do differ between methods, but because of the low coverage rate of LA, we would be better served deferring to RPL or the hierarchical model.

Fitting a model that includes all the genes results in coverage rates for RPL and the hierarchical model that are in line with, or very close to, our specified 0.95 rate.
We are also able to build a hierarchical model for analysis with Bayesian methods that estimates fixed effects with essentially the same precision as the frequentist methods, while estimating variance components more accurately than LA and without requiring ad hoc adjustment of the data. While the fitting of fixed effects is not improved in the frequentist methods, we do see improvements in the estimation of variance components, especially using RPL.

3.2 Single Gene Results

We now report the estimates from the pertinent methods for the mmcm-1 gene and hif-1 gene. Corresponding 95% intervals are given below the point estimates.

Table 10: Estimates for mmcm-1

                                      Method
Parameter   RPL                LA                  Bayesian
β0          0.692              0.683               0.615
            (-0.002, 1.386)    (0.434, 0.931)      (-0.516, 1.731)
β1          -0.460             -0.468              -0.450
            (-1.217, 0.296)    (-0.739, -0.197)    (-1.603, 0.703)
σr          0.165              0.120               0.414
σe          0.136              0.049               0.330

Table 11: Estimates for hif-1

                                      Method
Parameter   RPL                LA                  Bayesian
β0          4.621              4.623               4.318
            (2.093, 7.149)     (3.483, 5.763)      (2.746, 5.982)
β1          -6.082             -6.081              -5.754
            (-8.690, -3.475)   (-7.245, -4.917)    (-7.451, -4.114)
σr          0                  0.000               0.430
σe          0.158              0.057               0.413

These results mirror what we observed in our simulation studies. The Bayesian method estimates higher variance components than LA and RPL do, and its fixed effects estimates are also not in close agreement with those of the other two methods. The effect of hypoxia is significant at the 0.05 level in all tests for hif-1, but only the test from LA is significant for mmcm-1. However, if we were interested in estimating and testing the fixed effects, we would be better served estimating the parameters of a full model, since simulation 1 showed that in this situation tests using LA have a very high type I error rate and tests using RPL or Bayesian methods have very low power.

3.3 Full Model Results

We now report results from the full dataset model.
To facilitate comparisons, Table 12 only includes variance components and parameters from mmcm-1 and hif-1.

Table 12: Estimates of the Full Model

                                      Method
Parameter   RPL                LA                  Hierarchical
β0 mmcm     0.692              0.693               0.753
            (0.235, 1.150)     (0.348, 1.039)      (0.030, 1.508)
β1 mmcm     -0.450             -0.448              -0.457
            (-1.088, 0.185)    (-0.889, -0.007)    (-1.342, 0.413)
β0 hif      4.618              4.635               7.991
            (3.408, 5.823)     (3.454, 5.816)      (4.965, 13.110)
β1 hif      -6.087             -6.106              -9.452
            (-7.383, -4.790)   (-7.327, -4.884)    (-14.615, -6.364)
σr          0.046              0.115               0.276
σe          0.358              0.218               0.494

Note that the RPL and LA estimates of the fixed effects are nearly identical to those from the single gene models. Although the confidence intervals do differ, they are not uniformly wider or shorter. As we expected from the simulations, the hierarchical model performs comparably to the other methods. Note that the treatment effect of gene mmcm is significant only under the LA test, which simulations showed had a high type I error rate. We are therefore unable to declare a significant effect of hypoxia for the mmcm gene. The parameter estimates produced by the hierarchical model for hif-1 are not in line with the estimates of the other methods, which is not completely unexpected, since we observed Bayesian methods producing more extreme estimates of $\beta_0$ and $\beta_1$ in simulation 2. However, the 95% credible interval produced by the hierarchical model for $\beta_1$ of the hif-1 gene still clearly indicates that there is a statistically significant effect. Estimates of variance components from RPL and LA have increased from the single gene case, but are still fairly distinct from the estimates produced by Bayesian methods.

Chapter 4 Discussion

Our goal was to evaluate three different common methods for fitting GLMMs, while addressing how to fit models when the data display separation.
Furthermore, we wished to determine how the dataset in question should be modeled, either on a gene-by-gene basis or by building a full model using all of the available data.

Simulation studies indicate that the RPL and LA methods for fitting GLMMs perform nearly identically in estimating fixed effects, with the hierarchical model proposed in Chapter 2 performing similarly. However, if there is interest in estimating variance components, Bayesian methods may provide more accurate estimates in the single gene case, with RPL performing similarly under the full model.

We fitted models to data with separation both by proposing Bayesian constructions and by using the ad hoc solution of adding one success and one failure to the data before analysis. While not an ideal solution, this fix leads to better estimates of fixed effects in the single gene models, where there are only six total observations. It also does not seem to negatively affect estimates in the full model, but if one wants to avoid such a workaround, the hierarchical construction performs similarly without the need to alter the data, while simultaneously estimating variance components accurately.

Finally, for RPL and LA, we see marginal gains at best in point estimates when specifying a full model construction. This is perhaps not very surprising, as the full model does not facilitate much borrowing of information across genes. On the other hand, the hierarchical model does make use of this paradigm, and correspondingly we see marked improvements in estimation over the performance of simple single-gene Bayesian models.

Although it was not of interest here, it is worth noting that the RPL method is more flexible than the LA method in the model structures it can estimate.

Bibliography

[gli, 2008] (2008). SAS/STAT 9.2 User’s Guide, chapter The GLIMMIX Procedure. SAS Institute Inc.

[Bates, 2012] Bates, D. (2012). Linear mixed model implementation in lme4.
[Bates et al., 2011] Bates, D., Maechler, M., and Bolker, B. (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42.

[Bolker et al., 2008] Bolker, B., Brooks, M., Clark, C., Geange, S., Poulsen, J., Stevens, H., and White, J.-S. (2008). Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology and Evolution, 23(3):127–135.

[Breslow and Clayton, 1993] Breslow, N. and Clayton, D. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88(421):9–25.

[Gelman et al., 2008] Gelman, A., Jakulin, A., Pittau, M., and Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4):1360–1383.

[Heinze and Schemper, 2002] Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21:2409–2419.

[Plummer, 2011] Plummer, M. (2011). JAGS Version 3.1.

[R Development Core Team, 2011] R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. ISBN 3-900051-07-0.

[Schelldorfer et al., 2013] Schelldorfer, J., Meier, L., and Bühlmann, P. (2013). GLMMLasso: An algorithm for high-dimensional generalized linear mixed models using ℓ1-penalization. Journal of Computational and Graphical Statistics.

[Wolfinger and O’Connell, 1993] Wolfinger, R. and O’Connell, M. (1993). Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation, 48:233–243.
Chapter 5 Appendix

Table A.1: Rejection Rate for β1 = 0, Simulation 1

                              Method
True Value of β1    RPL      LA       Bayesian
 0                  0.008    0.188    0.000
−0.5                0.133    0.606    0.006
−1                  0.478    0.954    0.126
−2                  0.938    1.000    0.993

Table A.2: Rejection Rate for β1 = 0, Simulation 2

                              Method
True Value of ρ     RPL      LA       Bayesian
2.2                 0.981    0.999    1.000
0                   1.000    0.983    0.997

Table A.3: Rejection Rates for β1 = 0, Simulation 3

                                      Method
Gene      True Value of β1    RPL      LA       Hierarchy
Gene 1    −∞                  1.000    1.000    1.000
Gene 2     0                  0.045    0.083    0.056
Gene 3    −0.41               0.286    0.480    0.249
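
The rejection rates tabulated in this appendix can be read as the fraction of simulated datasets in which the test of β1 = 0 rejects at the 0.05 level. The following Python sketch illustrates that calculation under the assumption that each test is carried out by checking whether a 95% interval for β1 excludes zero; the tests in the simulations may have been computed differently. The function names and the example intervals are hypothetical and are not output from the simulations reported here.

```python
from math import sqrt


def rejection_rate(intervals):
    """Return the fraction of (lower, upper) intervals that exclude zero."""
    if not intervals:
        raise ValueError("need at least one interval")
    rejections = sum(1 for lower, upper in intervals if lower > 0 or upper < 0)
    return rejections / len(intervals)


def monte_carlo_se(rate, n_sims):
    """Binomial standard error of a rejection rate estimated from n_sims datasets."""
    return sqrt(rate * (1 - rate) / n_sims)


# Hypothetical 95% intervals for beta_1 from four simulated datasets
# (illustrative values only, not results from the thesis).
example_intervals = [(-1.1, 0.4), (-0.9, -0.1), (-0.2, 1.3), (-2.5, -0.6)]

rate = rejection_rate(example_intervals)
se = monte_carlo_se(rate, len(example_intervals))
print(f"rejection rate = {rate:.3f}, Monte Carlo SE = {se:.3f}")
```

When the true value of β1 is 0, this fraction estimates the type I error rate; for nonzero true values it estimates power. The standard error gives a rough sense of how much of the difference between two tabulated rates could be simulation noise, provided the number of simulated datasets is known.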