Analysis of Binomial Response Data with Generalized Linear Mixed Effects Models

Andrew Lithio
Program of Study Committee:
Dr. Dan Nettleton, Major Professor
Dr. Jarad Niemi
Dr. Vivekananda Roy
May 6, 2013
Chapter 1
Introduction
Generalized Linear Mixed Models (GLMMs) are a flexible extension of Generalized Linear Models that can model random effects, correlations between observations, and overdispersion in observed data. GLMMs
are often well suited for analysis of data in biostatistics, where response data
are commonly modeled as Poisson or binomial random variables, and experimental designs may contain blocks which induce correlation in observed data
[Bolker et al., 2008]. However, even when random effects are assumed to be
multivariate normal, the full likelihood is often an intractable integral of such
high dimension that numerical integration is not feasible.
To estimate parameters of these models, it is common to either compute
pseudo-data that approximately follows a Linear Mixed Model, or to approximate the likelihood directly. Two popular computational routines used for
estimating parameters of GLMMs are PROC GLIMMIX of SAS, and glmer
of the lme4 R package by Douglas Bates. GLIMMIX defaults to utilizing the
former method, while glmer uses the latter. As a third possibility, Bayesian
methods avoid integrating the likelihood altogether by sampling from the
posterior density.
Throughout this paper we consider data from a functional assay experiment on the adaptation of flatworms to hypoxia, which is the deprivation
of oxygen. Researchers used mutated worms where the expression of
one of 13 different genes was inhibited in each worm, with the goal of
identifying which genes help worms adapt to hypoxia. For each gene in
the experiment, 20 adults laid eggs that were put into a hypoxia chamber,
while the eggs of 20 other adults were kept in a normoxia state. The total
number of eggs put into the hypoxia chamber and kept in normoxia were
recorded for each gene. For each combination of gene and treatment there
were well over 60 eggs, with a median of 155.5. After 72 hours, the total
number of worms that had hatched from eggs and grown to adulthood
were counted and recorded for each state. In this paper, we will refer
to the proportion of eggs that hatch and grow to adulthood as simply
the proportion hatched or the hatch rate. This process was carried out
for all genes of interest in a single batch, then repeated two more times
on different days. Researchers also indicated that we should expect to
observe overdispersion in the data due to suspicions of nonuniform genetic
backgrounds of the specimens. When wild type worms (worms with no
gene expression inhibited) were used, 100% of eggs hatched and grew to
adulthood when kept in either normoxia or hypoxia. We wish to identify
genes whose suppression leads to significantly lower hatch rates under hypoxia
than under normoxia. As an example, data for two genes are given in Table 1.
Table 1: Observed Data for mmcm-1 and hif-1
Gene    Treatment  Day  Total Eggs  Eggs Hatched  Hatch Rate
mmcm-1  Normoxia   1    95          71            0.747
mmcm-1  Hypoxia    1    111         67            0.604
mmcm-1  Normoxia   2    155         90            0.581
mmcm-1  Hypoxia    2    125         69            0.552
mmcm-1  Normoxia   3    185         126           0.681
mmcm-1  Hypoxia    3    368         196           0.533
hif-1   Normoxia   1    127         127           1.00
hif-1   Hypoxia    1    159         22            0.139
hif-1   Normoxia   2    80          80            1.00
hif-1   Hypoxia    2    211         44            0.209
hif-1   Normoxia   3    95          95            1.00
hif-1   Hypoxia    3    116         24            0.207
A possible model for data from a single gene is as follows. Let j index treatment with j = 1 for normoxia and j = 2 for hypoxia, and let k
index days (k = 1, 2, 3). Let Hjk and njk denote eggs hatched and total
eggs, respectively, for the jth treatment and kth day. Assume
\[ H_{jk} \sim \mathrm{Binomial}(n_{jk}, \pi_{jk}) \]
and let
\[ Y_{jk} = \frac{H_{jk}}{n_{jk}}. \]
Furthermore, suppose
\[ \mathrm{logit}(\pi_{jk}) = \eta_{jk}, \]
and that
\[ \eta = (\eta_{11}, \eta_{21}, \eta_{12}, \eta_{22}, \eta_{13}, \eta_{23})^T = X\beta + Z\gamma, \]
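where
\[
X = \begin{pmatrix}
1 & 0 \\
1 & 1 \\
1 & 0 \\
1 & 1 \\
1 & 0 \\
1 & 1
\end{pmatrix},
\qquad
Z = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix},
\]
\[
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},
\qquad
\gamma = \begin{pmatrix} r \\ e \end{pmatrix}
= (r_1, r_2, r_3, e_{11}, e_{21}, e_{12}, e_{22}, e_{13}, e_{23})^T,
\]
\[ r \sim N(0, \sigma_r^2 I_3) \]
and
\[ e \sim N(0, \sigma_e^2 I_6). \]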
The vector of random effects, γ, is partitioned into (rT, eT)T, where r has 3
elements representing block effects and e, independent of r, has 6 elements,
one for each binomial observation, to account for overdispersion. The fixed
effects are contained in β = (β0, β1)T, and consist of one intercept term and one
term representing the effect of hypoxia. We interpret β0 as the expected logit
of the probability of hatching under normoxia, and β0 + β1 as the expected
logit of the probability under hypoxia.
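As a concrete illustration of the single-gene design, the following Python sketch builds X and Z with NumPy and evaluates the linear predictor. The parameter values, the random seed, and the variable names are assumptions made purely for illustration; they are not estimates from this paper.

```python
import numpy as np

# Rows follow the ordering of eta in the text:
# (eta_11, eta_21, eta_12, eta_22, eta_13, eta_23), so treatment varies fastest.
n_days, n_trt = 3, 2

# Fixed effects: intercept plus an indicator for hypoxia (j = 2).
X = np.column_stack([np.ones(n_days * n_trt), np.tile([0.0, 1.0], n_days)])

# Random effects: one block effect per day, then one effect per observation.
Z_block = np.kron(np.eye(n_days), np.ones((n_trt, 1)))   # 6 x 3
Z_obs = np.eye(n_days * n_trt)                            # 6 x 6
Z = np.hstack([Z_block, Z_obs])                           # 6 x 9

# Illustrative (made-up) parameter values, not estimates from the paper.
beta = np.array([1.0, -0.5])
rng = np.random.default_rng(0)
gamma = np.concatenate([
    rng.normal(0.0, 0.5, n_days),           # r_1, r_2, r_3
    rng.normal(0.0, 0.3, n_days * n_trt),   # e_11, ..., e_23
])

eta = X @ beta + Z @ gamma          # linear predictor on the logit scale
pi = 1.0 / (1.0 + np.exp(-eta))     # hatch probabilities
```

Each row of Z picks out exactly one block effect and one observation-level effect, which is why every row sums to 2.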
A model for the full dataset may be parameterized similarly, with the 13
genes indexed by i. Fixed effects in the full model consist of one intercept
term for each gene and another term representing the effect of hypoxia for
that gene, for a total of 26 elements in β. The specification of r and e is
analogous, where we assume a unique eijk for each observation and a unique
rik for each combination of gene and day, giving e 78 elements and r 39
elements, for a total of 117 elements in γ.
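The dimension counts for the full model can be checked with a short Python sketch. It assumes the observations are sorted by gene, then day, then treatment; that ordering is an assumption made for illustration, since the text does not state one.

```python
import numpy as np

n_genes, n_trt, n_days = 13, 2, 3

# Single-gene fixed-effects design: intercept and hypoxia indicator,
# with treatment varying fastest within each day.
X_gene = np.column_stack([np.ones(n_trt * n_days), np.tile([0.0, 1.0], n_days)])

# Assumed ordering of the 78 observations: gene, then day, then treatment.
X_full = np.kron(np.eye(n_genes), X_gene)                       # one (beta0, beta1) pair per gene
Z_r = np.kron(np.eye(n_genes * n_days), np.ones((n_trt, 1)))    # one r_ik per gene-day
Z_e = np.eye(n_genes * n_trt * n_days)                          # one e_ijk per observation
Z_full = np.hstack([Z_r, Z_e])

print(X_full.shape, Z_full.shape)
```

The column counts of X_full and Z_full reproduce the 26 fixed effects and 117 random effects described in the text.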
We are interested in the performance of the pseudo-data, likelihood approximation, and Bayesian methods of estimation on data structured similarly to ours: data with binomial responses, random block effects, and overdispersion, with a relatively high number of trials for each binomial observation.
We will also investigate whether there is a benefit to building a model that
includes all 13 genes, or if separate models for each gene perform just as
well. In Section 2.4 we introduce the concept of separation, which is an additional complication in estimating the parameters of models with binomial
responses.
The remainder of the paper is structured as follows: Chapter 2 details the
algorithms used by GLIMMIX and glmer, as well as specifies two possible
hierarchical models for use with Bayesian inferential methods. Section 2.4
defines separation, its effects on our estimation techniques, and possible ways
to negate those effects. Chapter 3 reports the results of three simulation
studies and applies the three methods to the observed data. We conclude
with a discussion of the results in Chapter 4.
Chapter 2
Methods
2.1 Restricted Pseudo Likelihood
The first approach we will discuss is based on pseudo likelihood, and is also
called a linearization method. The restricted pseudo likelihood approach
(RPL) uses a Taylor series approximation to compute pseudo-data which
approximately follows a linear mixed model, whose parameters are then estimated in the traditional manner. The following discussion is founded upon
Wolfinger and O’Connell [Wolfinger and O’Connell, 1993] and the GLIMMIX manual. In each of the following sections we will focus on the case
of binomial response data, making note when we are able to generalize.
2.1.1 Notation and Model Statement
We begin with an n × 1 vector of observed data, Y = (Y1, ..., Yn)T, with
\[ E[Y \mid \gamma] = g^{-1}(X\beta + Z\gamma) = \pi = (\pi_1, \ldots, \pi_n)^T, \]
where γ is a q × 1 vector of random effects, g is a differentiable monotonic
componentwise function taken throughout this paper to be the logit function,
X is an n × m design matrix, and Z is an n × q design matrix. Here we take Y
to be a random vector of observed proportions of binomial random variables
instead of counts, and for simplicity, use a single subscript i to indicate the
ith element of Y or π. Furthermore, γ ∼ N(0, G), where each element of G
is a known function of a vector of variance components, and
\[ \mathrm{Var}[Y \mid \gamma] = A^{1/2} R A^{1/2}, \]
where A is a diagonal matrix of the conditional variance functions, in our case
with ith diagonal entry πi(1 − πi)/ni. The R matrix can be used to induce
non-diagonal correlation or variance structures. We will set R to the identity, but
it is often taken to be some unknown overdispersion parameter, φ, times the
identity. In our case, Var[Y|γ] is a diagonal matrix with entries πi(1 − πi)/ni.
To simplify notation where possible, we set η = Xβ + Zγ so that g−1(η) =
E[Y|γ] = π.
2.1.2 Taylor Series Approximation
We begin with a first-order Taylor series approximation about initial estimates β̂ and γ̂,
\[ g^{-1}(\eta) \approx g^{-1}(\hat\eta) + \hat\Delta X(\beta - \hat\beta) + \hat\Delta Z(\gamma - \hat\gamma), \]
where
\[ \hat\Delta = \left. \frac{\partial g^{-1}(\eta)}{\partial \eta} \right|_{\hat\beta, \hat\gamma}, \]
which is a diagonal matrix of derivatives of the mean evaluated at the initial
estimates. For the logit function, the diagonal terms are
\[ \frac{e^{\hat\eta_i}}{(1 + e^{\hat\eta_i})^2}. \]
We rearrange the Taylor series approximation to get
\[ \hat\Delta^{-1}(\pi - g^{-1}(\hat\eta)) + X\hat\beta + Z\hat\gamma \approx X\beta + Z\gamma. \tag{2.1} \]
2.1.3 Pseudo-Model
Now define the pseudo-data
\[ P = \hat\Delta^{-1}(Y - g^{-1}(\hat\eta)) + X\hat\beta + Z\hat\gamma. \]
With β̂, γ̂, η̂, and Δ̂−1 fixed, the expected value of P is the left side of
(2.1), and
\[ \widehat{\mathrm{Var}}[P \mid \gamma] = \hat\Delta^{-1} \hat A^{1/2} R \hat A^{1/2} \hat\Delta^{-1}. \]
In our case, this simplifies to an n × n diagonal matrix with terms of
\[ \left( \frac{(1 + e^{\hat\eta_i})^2}{e^{\hat\eta_i}} \right)^2 \frac{\hat\pi_i (1 - \hat\pi_i)}{n_i}. \]
We then take
\[ P = X\beta + Z\gamma + \epsilon \]
as a linear mixed model with pseudo-response P, fixed effects β and random
effects γ as previously defined, and Var[ε] = Var[P|γ]. We assume ε is normally distributed with mean 0. Additionally, define θ as the q × 1 vector of
unknowns in G and R. We can write the marginal variance of P as
\[ V(\theta) = Z G Z^T + \hat\Delta^{-1} A^{1/2} R A^{1/2} \hat\Delta^{-1}. \]
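The pseudo-data step can be sketched in a few lines of Python for the logit link with R set to the identity. The function name `pseudo_data` and the choice of starting value are assumptions made for illustration; the counts are the day 1 observations for mmcm-1 from Table 1.

```python
import numpy as np

def pseudo_data(y, n, eta_hat):
    """Return the pseudo-response P and its conditional variance for a logit link.

    y       : observed proportions H / n
    n       : binomial trial counts
    eta_hat : current linear predictor X beta_hat + Z gamma_hat
    """
    pi_hat = 1.0 / (1.0 + np.exp(-eta_hat))
    delta = pi_hat * (1.0 - pi_hat)        # diagonal of Delta-hat: e^eta / (1 + e^eta)^2
    p = (y - pi_hat) / delta + eta_hat     # Delta^{-1} (Y - g^{-1}(eta_hat)) + eta_hat
    var_p = (pi_hat * (1.0 - pi_hat) / n) / delta**2   # Delta^{-1} A Delta^{-1} with R = I
    return p, var_p

# Day 1 observations for mmcm-1 from Table 1 (normoxia, hypoxia).
hatched = np.array([71.0, 67.0])
total = np.array([95.0, 111.0])
y = hatched / total

# Hypothetical starting value: the empirical logits of the observed proportions.
eta0 = np.log(y / (1.0 - y))
p, var_p = pseudo_data(y, total, eta0)
```

When the linear predictor is started at the empirical logits, the residual term vanishes and the pseudo-data equal those logits, with variance 1/(ni πi (1 − πi)).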
2.1.4 Optimization
The restricted log pseudo-likelihood of our linear mixed model is known to
be, up to an additive constant,
\[ l(\theta, p) = -\frac{1}{2} \log|V(\theta)| - \frac{1}{2} r^T V(\theta)^{-1} r - \frac{1}{2} \log|X^T V(\theta)^{-1} X|, \]
where r = p − X(XT V(θ)−1 X)− XT V(θ)−1 p. Note that β has been profiled out.
The elements of θ are estimated using numerical optimization methods, then
fixed and random effects are estimated as usual for linear mixed models,
yielding
\[ \hat\beta = (X^T V(\hat\theta)^{-1} X)^{-} X^T V(\hat\theta)^{-1} p, \]
\[ \hat\gamma = \hat G Z^T V(\hat\theta)^{-1} \hat r. \]
Using these estimates, the pseudo-data are recomputed and parameters
are re-estimated, continuing until the estimates converge.
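A minimal Python sketch of the objective evaluated inside the optimizer is given below. It uses the standard restricted likelihood form with XT V(θ)−1 X in the last determinant, assumes X has full column rank so that an ordinary inverse can stand in for the generalized inverse, and uses made-up pseudo-data and variance values. The function name `restricted_loglik` is hypothetical.

```python
import numpy as np

def restricted_loglik(V, X, p):
    """Restricted log pseudo-likelihood, up to an additive constant.

    V : marginal covariance matrix V(theta) of the pseudo-data (n x n)
    X : fixed-effects design matrix (n x m), assumed to have full column rank
    p : pseudo-data vector (length n)
    """
    V_inv = np.linalg.inv(V)
    XtVinvX = X.T @ V_inv @ X
    beta_hat = np.linalg.solve(XtVinvX, X.T @ V_inv @ p)   # profiled-out fixed effects
    r = p - X @ beta_hat
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    value = -0.5 * logdet_V - 0.5 * r @ V_inv @ r - 0.5 * logdet_XVX
    return value, beta_hat

# Single-gene layout: 3 days x 2 treatments, treatment varying fastest.
X = np.column_stack([np.ones(6), np.tile([0.0, 1.0], 3)])
Z_block = np.kron(np.eye(3), np.ones((2, 1)))

sigma_r2 = 0.25                                  # hypothetical block variance
s = np.full(6, 0.05)                             # hypothetical pseudo-data variances
V = sigma_r2 * Z_block @ Z_block.T + np.diag(s)
p = np.array([1.1, 0.4, 0.3, 0.2, 0.8, 0.1])     # made-up pseudo-data

ll, beta_hat = restricted_loglik(V, X, p)
```

Because β is profiled out, the returned value does not change when Xb is added to the pseudo-data for any vector b, which gives a convenient check on an implementation.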
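2.1.5 Standard Errors
We now compute an estimate of the variance-covariance matrix of β̂. Denote the
estimated conditional variability of P by S, that is,
\[ \widehat{\mathrm{Var}}[P \mid \hat\pi] = S. \]
Then the mixed model equations are
\[
\begin{pmatrix}
X^T S^{-1} X & X^T S^{-1} Z \\
Z^T S^{-1} X & Z^T S^{-1} Z + G(\hat\theta)^{-1}
\end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat\gamma \end{pmatrix}
=
\begin{pmatrix} X^T S^{-1} p \\ Z^T S^{-1} p \end{pmatrix},
\]
and
\[
C =
\begin{pmatrix}
X^T S^{-1} X & X^T S^{-1} Z \\
Z^T S^{-1} X & Z^T S^{-1} Z + G(\hat\theta)^{-1}
\end{pmatrix}^{-}
=
\begin{pmatrix}
\hat\Omega & -\hat\Omega X^T V(\hat\theta)^{-1} Z G(\hat\theta) \\
-G(\hat\theta) Z^T V(\hat\theta)^{-1} X \hat\Omega & M + G(\hat\theta) Z^T V(\hat\theta)^{-1} X \hat\Omega X^T V(\hat\theta)^{-1} Z G(\hat\theta)
\end{pmatrix}
\]
is known to provide an estimate of the covariance matrix of [β̂T, γ̂T − γT]T,
where
\[ \hat\Omega = (X^T V(\hat\theta)^{-1} X)^{-} \]
and
\[ M = (Z^T S^{-1} Z + G(\hat\theta)^{-1})^{-1}. \]
The standard errors of the elements of β̂ are the square roots of the diagonal
terms of Ω̂.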
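The relationship between the mixed model equations and Ω̂ can be checked numerically. The Python sketch below uses hypothetical values in place of the converged S and G(θ̂), and ordinary inverses in place of generalized inverses, which assumes the matrices involved are nonsingular.

```python
import numpy as np

# Single-gene layout: 3 days x 2 treatments, treatment varying fastest.
X = np.column_stack([np.ones(6), np.tile([0.0, 1.0], 3)])
Z = np.hstack([np.kron(np.eye(3), np.ones((2, 1))), np.eye(6)])

# Hypothetical values standing in for the converged quantities.
S = np.diag(np.full(6, 0.05))              # conditional variance of the pseudo-data
G = np.diag([0.25] * 3 + [0.09] * 6)       # G(theta_hat): sigma_r^2 and sigma_e^2

# Marginal route: Omega = (X' V^{-1} X)^{-1} with V = Z G Z' + S.
V = Z @ G @ Z.T + S
V_inv = np.linalg.inv(V)
Omega = np.linalg.inv(X.T @ V_inv @ X)

# Mixed-model-equations route: invert the coefficient matrix and read off the beta block.
S_inv = np.linalg.inv(S)
mme = np.block([
    [X.T @ S_inv @ X, X.T @ S_inv @ Z],
    [Z.T @ S_inv @ X, Z.T @ S_inv @ Z + np.linalg.inv(G)],
])
C = np.linalg.inv(mme)

se_beta = np.sqrt(np.diag(Omega))          # standard errors of beta_hat
```

The upper-left 2 × 2 block of C agrees with Omega, and the upper-right block agrees with −Ω̂ XT V−1 Z G, so either route yields the same standard errors for the fixed effects.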
2.2 Laplace Approximation
Our second approach to estimating GLMMs uses Laplace's integral approximation (LA). It was applied to GLMMs by Breslow and Clayton
[Breslow and Clayton, 1993], and it is the default method of the glmer function in Douglas Bates's lme4 package for R. Here we
closely follow Breslow and Clayton, Schelldorfer [Schelldorfer et al., 2013],
and Bates [Bates, 2012].
2.2.1 Likelihood Definition
We maintain the same notation as in the previous section, with the exception
of introducing a linear transformation of γ,
\[ \gamma = \Lambda_\theta u, \]
where u ∼ N(0, Iq) so that γ ∼ N(0, Λθ ΛθT).
Our full likelihood can be written as
\[ L(\beta, \theta \mid y) = \int p(y \mid \beta, \Lambda_\theta, u) \, f(u \mid \Lambda_\theta) \, du, \]
where p(y|β, Λθ, u) is the conditional pmf of y and f(u|Λθ) is the density of
u. As stated before, in our case this integral has no closed-form solution, but
we will use Laplace's method to approximate this integral directly. Since
f(u|Λθ) is multivariate normal, we substitute the density to get
\[ L(\beta, \theta \mid y) \propto \int \exp\left( \log p(y \mid \beta, \Lambda_\theta, u) - \frac{1}{2} u^T u \right) du = \int \exp(-S(u)) \, du, \]
where
\[ S(u) = -\log p(y \mid \beta, \Lambda_\theta, u) + \frac{1}{2} u^T u. \]
Laplace's method will utilize a
second order Taylor series approximation about ũ = argmaxu [−S(u)], so we
first show how to find ũ given β and θ using a penalized iteratively reweighted
least squares algorithm (PIRLS).
2.2.2 PIRLS
Our goal is to minimize S(u), which can be done using the Newton-Raphson
algorithm. We first find the derivatives of S(u),
\[ S'(u) = -(Z\Lambda_\theta)^T B (y - \pi) + u \]
and
\[ S''(u) = (Z\Lambda_\theta)^T W (Z\Lambda_\theta) + I_q, \]
where W is a diagonal matrix with elements (v(πi) g′(πi)2)−1, B is a diagonal
matrix with elements (v(πi) g′(πi))−1, and v(πi) is the conditional variance
function. Our updates of u can then be found by solving
\[ S''(u^{(j)}) u^{(j+1)} = S''(u^{(j)}) u^{(j)} - S'(u^{(j)}), \]
\[ \left[ (Z\Lambda_\theta)^T W^{(j)} (Z\Lambda_\theta) + I_q \right] u^{(j+1)} = \left[ (Z\Lambda_\theta)^T W^{(j)} (Z\Lambda_\theta) + I_q \right] u^{(j)} + (Z\Lambda_\theta)^T B^{(j)} (y - \pi^{(j)}) - u^{(j)}, \]
\[ \left[ (Z\Lambda_\theta)^T W^{(j)} (Z\Lambda_\theta) + I_q \right] u^{(j+1)} = (Z\Lambda_\theta)^T W^{(j)} z^{(j)}, \]
where
\[ z^{(j)} = (Z\Lambda_\theta) u^{(j)} + (W^{(j)})^{-1} B^{(j)} (y - \pi^{(j)}). \]
We continue iterations until
convergence, usually determined by relative change in the linear predictor,
η.
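The PIRLS update can be sketched in Python for the binomial model with logit link. For this link, v(πi) g′(πi) = 1/ni, so B has diagonal entries ni and W has diagonal entries ni πi (1 − πi). The function name `pirls`, the values of β and Λθ, and the stopping rule (absolute change in u rather than relative change in η) are assumptions made for illustration; the counts are the mmcm-1 data from Table 1.

```python
import numpy as np

def pirls(H, n, X, Z, beta, Lambda, tol=1e-10, max_iter=100):
    """Minimize S(u) for a binomial-logit model with beta and Lambda held fixed."""
    y = H / n
    ZL = Z @ Lambda
    q = ZL.shape[1]
    u = np.zeros(q)
    for _ in range(max_iter):
        eta = X @ beta + ZL @ u
        pi = 1.0 / (1.0 + np.exp(-eta))
        w = n * pi * (1.0 - pi)           # diagonal of W = (v g'^2)^{-1}
        b = n                             # diagonal of B = (v g')^{-1}
        z = ZL @ u + b * (y - pi) / w     # working response (Z Lambda) u + W^{-1} B (y - pi)
        lhs = ZL.T @ (w[:, None] * ZL) + np.eye(q)
        u_new = np.linalg.solve(lhs, ZL.T @ (w * z))
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    # Gradient S'(u) at the returned value, as a convergence diagnostic.
    pi = 1.0 / (1.0 + np.exp(-(X @ beta + ZL @ u)))
    grad = -ZL.T @ (n * (y - pi)) + u
    return u, grad

# mmcm-1 counts from Table 1, ordered (normoxia, hypoxia) within each day.
H = np.array([71.0, 67.0, 90.0, 69.0, 126.0, 196.0])
n = np.array([95.0, 111.0, 155.0, 125.0, 185.0, 368.0])
X = np.column_stack([np.ones(6), np.tile([0.0, 1.0], 3)])
Z = np.hstack([np.kron(np.eye(3), np.ones((2, 1))), np.eye(6)])

beta = np.array([0.7, -0.45])                 # hypothetical fixed effects
Lambda = np.diag([0.5] * 3 + [0.3] * 6)       # hypothetical sigma_r and sigma_e

u_tilde, grad = pirls(H, n, X, Z, beta, Lambda)
```

At convergence the gradient S′(ũ) should be numerically zero, which is a simple way to confirm that the returned value is the conditional mode.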
2.2.3 Likelihood Approximation
Laplace's approximation uses the second order Taylor series approximation
about ũ found from PIRLS to form the kernel of a normal distribution in
order to carry out the integration.
\[
\begin{aligned}
L(\beta, \theta \mid y) &\propto \int \exp(-S(u)) \, du \\
&\approx \int \exp\left( -S(\tilde u) - \frac{1}{2} (u - \tilde u)^T S''(\tilde u) (u - \tilde u) \right) du \\
&= \exp(-S(\tilde u)) \int \exp\left( -\frac{1}{2} (u - \tilde u)^T S''(\tilde u) (u - \tilde u) \right) du \\
&\propto \exp(-S(\tilde u)) \, |L|^{-1},
\end{aligned}
\]
where LLT = S′′(ũ). This expression is then maximized with respect to β
and θ using numerical methods.
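To see how close the approximation can be when each binomial observation has many trials, the Python sketch below compares the Laplace value with a brute-force numerical integral in a deliberately simplified one-dimensional case: a single random effect u shared by three observations. This toy model, the parameter values, and the integration grid are assumptions made for illustration and are not the model fitted in this paper; the counts are the mmcm-1 normoxia data from Table 1.

```python
import numpy as np

# Normoxia counts for mmcm-1 from Table 1.
H = np.array([71.0, 90.0, 126.0])
n = np.array([95.0, 155.0, 185.0])
beta0, sigma = 0.7, 0.5            # hypothetical parameter values

def S(u):
    """S(u) = -log p(y | u) + u^2 / 2, dropping the binomial coefficients."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    eta = beta0 + sigma * u[:, None]
    loglik = np.sum(H * eta - n * np.log1p(np.exp(eta)), axis=1)
    return -loglik + 0.5 * u**2

def S_derivs(u):
    """First and second derivatives of S at a scalar u."""
    pi = 1.0 / (1.0 + np.exp(-(beta0 + sigma * u)))
    d1 = -sigma * np.sum(H - n * pi) + u
    d2 = sigma**2 * np.sum(n * pi * (1.0 - pi)) + 1.0
    return d1, d2

# Newton-Raphson for the minimizer u-tilde.
u_tilde = 0.0
for _ in range(50):
    d1, d2 = S_derivs(u_tilde)
    u_tilde -= d1 / d2

# Laplace: log of the integral of exp(-S(u)) / sqrt(2 pi) is about -S(u~) - 0.5 log S''(u~).
laplace = -S(u_tilde)[0] - 0.5 * np.log(S_derivs(u_tilde)[1])

# Brute-force check on a fine grid, feasible only because q = 1 here.
grid = np.linspace(-10.0, 10.0, 20001)
neg_S = -S(grid)
step = grid[1] - grid[0]
m = neg_S.max()
brute = m + np.log(np.sum(np.exp(neg_S - m)) * step) - 0.5 * np.log(2.0 * np.pi)

print(laplace, brute)
```

In one dimension |L| is simply the square root of S′′(ũ), so −S(ũ) − log|L| is the quantity computed as `laplace`. With several random effects the grid integral becomes infeasible, which is the situation the approximation is designed for.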
2.2.4 Standard Errors
The approximate covariance matrix of β̂ is calculated in the same manner as
for RPL. Following Section 2.1.5, we calculate
\[ \widehat{\mathrm{Var}}(\hat\beta) = \hat\Omega = (X^T V(\hat\theta)^{-1} X)^{-}, \]
where
\[ V(\hat\theta) = Z G(\hat\theta) Z^T + \hat\Delta^{-1} \hat A \hat\Delta^{-1}, \]
where Δ̂ is the diagonal matrix of derivatives of the mean evaluated at β̂
and γ̂, and Â is the diagonal matrix of the conditional variance functions
evaluated at convergence.
2.3 Bayesian Methods
As an alternative to RPL and LA, we introduce two models which allow for
the application of Bayesian methods. The first is an adapted beta-binomial
model intended for analysis of data from a single gene, while the second
involves a multi-gene hierarchical model which makes use of the entire data
set. We will use JAGS [Plummer, 2011] to approximate the posteriors for
each of these models. Initial values will be chosen to be dispersed relative to
the priors and we will use the Gelman-Rubin statistic to check for a lack of
convergence.
2.3.1 Adapted Beta-Binomial Model
Let j index treatment with j = 1 for normoxia and j = 2 for hypoxia, and
let k index days (k = 1, 2, 3). Let Hjk and njk denote eggs hatched and total
eggs, respectively, for the jth treatment and kth day. Assume
\[ H_{jk} \sim \mathrm{Binomial}(n_{jk}, \pi_{jk}) \]
and let
\[ Y_{jk} = \frac{H_{jk}}{n_{jk}}. \]
Furthermore, suppose
\[ \mathrm{logit}(\pi_{jk}) = \mathrm{logit}(p_j) + r_k + e_{jk}, \]
\[ r_k \overset{ind}{\sim} N(0, \sigma_r^2), \]
\[ e_{jk} \overset{ind}{\sim} N(0, \sigma_e^2), \]
where pj represents E[Yjk | rk = 0, ejk = 0]. We assign the following priors:
\[ p_j \overset{ind}{\sim} \mathrm{Beta}(1, 1), \]
\[ \sigma_r \sim \mathrm{Unif}(0, 1), \text{ and} \]
\[ \sigma_e \sim \mathrm{Unif}(0, 1). \]
The use of uniform priors on the variance components is recommended by
Gelman [Gelman et al., 2008]. After performing an exploratory analysis of
the data we do not expect σr or σe to exceed one, and a repeated analysis
using Unif(0, 10) priors instead reveals no change. The prior on pj is an
attempt to be relatively noninformative, but it can also be thought of as
similar to adding 1 success and 1 failure to each binomial observation.
2.3.2 Hierarchical Model
Let i index gene (i = 1, ..., 13), j index treatment with j = 1 for normoxia
and j = 2 for hypoxia, and let k index days (k = 1, 2, 3). Let Hijk and
nijk denote eggs hatched and total eggs, respectively, for the ith gene, jth
treatment, and kth day. Assume
\[ H_{ijk} \sim \mathrm{Binomial}(n_{ijk}, \pi_{ijk}), \]
and let
\[ Y_{ijk} = \frac{H_{ijk}}{n_{ijk}}. \]
Furthermore, suppose
\[ \mathrm{logit}(\pi_{ijk}) = \eta_{ijk}, \]
and
\[ \eta_{ijk} = \mu_i + (-1)^j \tau_i + r_{ik} + e_{ijk}, \]
\[ r_{ik} \overset{ind}{\sim} N(0, \sigma_r^2), \]
\[ e_{ijk} \overset{ind}{\sim} N(0, \sigma_e^2), \]
with priors
\[ \mu_i \overset{ind}{\sim} N(\gamma, \psi), \]
\[ \gamma \sim N(0, 1.5^2), \]
\[ \psi \sim \mathrm{IG}(3, 2), \]
\[ \tau_i \overset{ind}{\sim} N(0, \xi), \]
\[ \xi \sim \mathrm{IG}(3, 2), \]
\[ \sigma_r \sim \mathrm{Unif}(0, 1), \text{ and} \]
\[ \sigma_e \sim \mathrm{Unif}(0, 1). \]
As in the previous model, the use of uniform priors on the variance components follows the recommendation of Gelman [Gelman et al., 2008]. As
before, repeated analysis using Unif(0, 10) priors instead reveals no change.
Hyperparameters for γ, ψ, and ξ were chosen to induce reasonable priors on
the inverse logit of ηijk, with the goal of having nearly uniform weight on the
(0, 1) interval.
Here, µi − τi represents the expected logit of the hatch probability of gene
i under normoxia, with µi + τi representing the same quantity under hypoxia.
Then µi represents a center for gene i, with a treatment difference of 2τi . The
parameter γ acts as the location for the distribution of µi , and ψ controls the
scale of the µi , indicating how closely the data from different genes agree.
Likewise, ξ is the scale for the treatment effects, indicating if there is a wide
distribution of treatment effects or if they all take similar values.
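The prior induced on the hatch probabilities can be examined by simulation. The Python sketch below draws from the priors of the hierarchical model and pushes the draws through the inverse logit. It rests on two assumptions about notation that the text does not spell out: that N(a, b) is parameterized by its variance b, and that IG(3, 2) denotes an inverse gamma distribution with shape 3 and scale 2. The helper `inv_gamma`, the seed, and the number of draws are also illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 200_000

def inv_gamma(shape, scale, size):
    """Inverse gamma draws as reciprocals of Gamma(shape, rate = scale) draws."""
    return 1.0 / rng.gamma(shape, 1.0 / scale, size)

# Hyperpriors and priors (see the stated parameterization assumptions).
gamma_loc = rng.normal(0.0, 1.5, n_draws)
psi = inv_gamma(3.0, 2.0, n_draws)
xi = inv_gamma(3.0, 2.0, n_draws)
mu = rng.normal(gamma_loc, np.sqrt(psi))
tau = rng.normal(0.0, np.sqrt(xi))
sigma_r = rng.uniform(0.0, 1.0, n_draws)
sigma_e = rng.uniform(0.0, 1.0, n_draws)

# One block effect shared by both treatments, one observation effect each.
r = rng.normal(0.0, sigma_r)
e_normoxia = rng.normal(0.0, sigma_e)
e_hypoxia = rng.normal(0.0, sigma_e)

eta_normoxia = mu - tau + r + e_normoxia     # j = 1: (-1)^1 = -1
eta_hypoxia = mu + tau + r + e_hypoxia       # j = 2: (-1)^2 = +1
pi_normoxia = 1.0 / (1.0 + np.exp(-eta_normoxia))
pi_hypoxia = 1.0 / (1.0 + np.exp(-eta_hypoxia))

# Share of prior mass in each tenth of (0, 1); a uniform prior would give 0.1 everywhere.
bins = np.linspace(0.0, 1.0, 11)
share_normoxia = np.histogram(pi_normoxia, bins=bins)[0] / n_draws
share_hypoxia = np.histogram(pi_hypoxia, bins=bins)[0] / n_draws
print(np.round(share_normoxia, 3))
print(np.round(share_hypoxia, 3))
```

Because τi is symmetric about zero, the two treatments receive the same induced prior, so the two printed rows should agree up to simulation noise.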
2.4 Separation
Data with binomial responses with particularly high or low probabilities can
exhibit quasi-complete separation, in which a predictor or a combination of predictor variables can perfectly separate (or determine) an outcome.
Seven of the thirteen genes in our dataset display this property; for
each of these genes we observe 100% hatch rates under normoxia. Intuitively, this
gives us cause for concern when recalling our use of the logit link function
and likelihood based methods. In fact, both frequentist methods discussed
yield unstable parameter estimates with extraordinarily large standard errors, although GLIMMIX simply returns an error message indicating a
lack of convergence. In the remainder of this section, we briefly describe why
this occurs, and explore possible solutions.
2.4.1 Breakdown Under Separation
Recall that the variance of our observed binomial proportions conditional on
the random effects is π(1 − π)/n. But as π → 1 (or π → 0), π(1 − π) → 0.
In the RPL algorithm this leads to very small values in our Â matrix, and
consequently inflates the V(θ̂)−1 matrix, inflating both our estimates of β
and the standard errors. Similarly, in the LA algorithm the convergence of
π(1 − π) towards 0 begins to dominate the weighting matrix W, again leading
to unstable estimates.
2.4.2 Analysis Under Separation
While parameter estimates and standard errors from LA are unreliable, we
might still use the maximized likelihood to test the significance of a predictor
or set of predictors via a likelihood ratio test (this is not valid under RPL
because it uses a likelihood for pseudo-data instead of an approximation to
the likelihood of the data). However, there are no readily available methods
for further inference. Firth’s penalized likelihood approach can be applied
to logistic regression with fixed effects [Heinze and Schemper, 2002], but an
extension to mixed effects has not been derived. One ad hoc solution for
analysis under separation is to add one success and one failure to the binomial
observations. This process pulls the “observed” probabilities away from 0
or 1 and allows analysis to proceed as usual, while hopefully not adversely
affecting the resulting inferences.
Another alternative is to take a Bayesian approach. The priors put on
the parameters of our model can naturally restrict our estimation to a reasonable range and allow inference without artificially altering the data. In
the following chapter we investigate the performance of Bayesian methods
and the add one success and one failure technique.
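The effect of the add one success and one failure adjustment can be seen directly on the hif-1 normoxia counts from Table 1. The Python sketch below is illustrative only, and its variable names are arbitrary.

```python
import numpy as np

# hif-1 normoxia counts from Table 1: every egg hatched on all three days.
hatched = np.array([127.0, 80.0, 95.0])
total = np.array([127.0, 80.0, 95.0])

raw_rate = hatched / total
raw_variance = raw_rate * (1.0 - raw_rate) / total     # pi (1 - pi) / n collapses to 0

# Ad hoc adjustment: add one success and one failure to each observation.
adj_hatched = hatched + 1.0
adj_total = total + 2.0
adj_rate = adj_hatched / adj_total
adj_logit = np.log(adj_rate / (1.0 - adj_rate))        # finite, unlike logit(1)
adj_variance = adj_rate * (1.0 - adj_rate) / adj_total

print(adj_rate, adj_logit)
```

After the adjustment, the observed proportion for day 1 is 128/129, with odds of 128, so its logit is log(128) rather than infinity, and the conditional variance is strictly positive.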
Chapter 3
Simulation and Application
In this chapter, we apply the frequentist methods discussed in Chapter 2 to
the model discussed in Chapter 1, as well as use Bayesian methods to estimate
the parameters of the adapted beta-binomial model and hierarchical model
specified in Section 2.3. Note that our parameterization of fixed effects for the
frequentist methods is different than that of the models for Bayesian analysis
defined in Section 2.3. The models for Bayesian analysis are parameterized to
ensure that the induced priors on the normoxia and hypoxia probabilities are
the same. The results produced from the Bayesian analysis will be interpreted
in terms of the frequentist parameterization above to facilitate comparison
across methods.
In the remainder of this chapter we report the results of simulation
studies comparing the performance of the above methods for the single gene
models and the full models. We then apply these methods to the observed
data for select individual genes as well as the full data set and compare
the resulting inferences. Special attention will be paid to analysis of data
displaying separation. Throughout the chapter we will highlight two genes
that are representative of the dataset: the mmcm-1 gene and the hif-1 gene.
The observed hatch proportions of each of the genes are reported in Table 2.
Table 2: Observed Data
Gene     Normoxia  Hypoxia  Effect
mmcm-1   0.670     0.5627   -0.1093
hif-1    1.000     0.185    -0.815
3.1 Simulation Study
We report the results of three simulation studies below. The first two entail data from a single gene, with the second study simulating genes with
separated data. For the second study we add one success and one failure to
each observation for the RPL and LA methods before estimation to allow
for inference. The third study simulates 13 genes and estimates full models,
where some genes display separation and some do not. Again, the genes with
separation will have one success and one failure added before the RPL and
LA analyses are performed. Each simulation consists of 1000 independent
repetitions. In each simulation we set σr = 0.5 and σe = 0.3.
Results are reported on the logit scale. We will interpret β0 as the estimated log odds of hatching under normoxia and exp(β1 ) as the multiplicative change in odds associated with hypoxia. Following the default reports
of glmer and GLIMMIX, we will use Wald type intervals for LA and t intervals for RPL. We use the posterior mean as point estimates and equal tailed
credible intervals for our Bayesian analysis.
3.1.1 Single Gene
We first compare each method’s performance in the absence of separation.
We set β0 , the log odds of hatching under normoxia, to 1, which corresponds
to a probability of 0.731. We then vary β1 , the treatment effect, ranging
from 0 to −2, which corresponds to ranging from no effect to decreasing the
odds by 86.5%. Boxplots of the parameter estimates are shown in Figures
1-4, and mean squared errors are given in Table 3. Boxplots of the length of
the confidence/credible intervals for the fixed effects are drawn in Figure 5
and 6, and Table 4 lists the coverage of those intervals.
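One replicate of the first simulation can be generated along the following lines. This Python sketch assumes 150 eggs per observation, a hypothetical value since the egg counts used in the simulations are not stated here; the function name `simulate_gene` and the seed are also illustrative. The last two lines reproduce the conversions quoted above.

```python
import numpy as np

def simulate_gene(beta0, beta1, sigma_r, sigma_e, n_eggs, rng):
    """Draw one data set: 3 days x 2 treatments, ordered day by day."""
    n_days = 3
    hypoxia = np.tile([0.0, 1.0], n_days)                 # 0 = normoxia, 1 = hypoxia
    r = np.repeat(rng.normal(0.0, sigma_r, n_days), 2)    # shared by both treatments within a day
    e = rng.normal(0.0, sigma_e, 2 * n_days)              # one per binomial observation
    eta = beta0 + beta1 * hypoxia + r + e
    pi = 1.0 / (1.0 + np.exp(-eta))
    hatched = rng.binomial(n_eggs, pi)
    return hypoxia, hatched

rng = np.random.default_rng(2013)
n_eggs = np.full(6, 150)       # hypothetical egg counts
for beta1 in (0.0, -0.5, -1.0, -2.0):
    hypoxia, hatched = simulate_gene(1.0, beta1, 0.5, 0.3, n_eggs, rng)
    print(beta1, hatched / n_eggs)

# The two conversions quoted in the text.
prob_normoxia = 1.0 / (1.0 + np.exp(-1.0))    # log odds of 1 expressed as a probability
odds_reduction = 1.0 - np.exp(-2.0)           # beta1 = -2 expressed as a proportional drop in odds
```

Repeating the draw 1000 times for each value of β1 and fitting each method to every replicate would give the raw material for the MSE and coverage summaries.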
Table 3: MSE of Parameter Estimates in Simulation 1
                                 Method
Parameter  True Value of β1  RPL     LA      Bayesian
β0         0                 0.1224  0.1228  0.1194
β0         -0.5              0.1316  0.1321  0.1197
β0         -1                0.1176  0.1179  0.1036
β0         -2                0.1279  0.1284  0.0970
β1         0                 0.0807  0.0811  0.0720
β1         -0.5              0.0839  0.0841  0.0751
β1         -1                0.0806  0.0808  0.0791
β1         -2                0.0792  0.0795  0.0864
σr         0                 0.0980  0.0783  0.0036
σr         -0.5              0.0984  0.0798  0.0041
σr         -1                0.0966  0.0774  0.0043
σr         -2                0.0978  0.0798  0.0042
σe         0                 0.0399  0.0639  0.0072
σe         -0.5              0.0422  0.0645  0.0071
σe         -1                0.0369  0.0591  0.0062
σe         -2                0.0388  0.0620  0.0074
Table 4: Coverage Rates for 95% Intervals in Simulation 1
                                 Method
Parameter  True Value of β1  RPL    LA     Bayesian
β0         0                 0.990  0.807  1.000
β0         -0.5              0.990  0.799  1.000
β0         -1                0.984  0.823  0.998
β0         -2                0.991  0.801  0.999
β1         0                 0.992  0.812  1.000
β1         -0.5              0.987  0.804  1.000
β1         -1                0.992  0.797  1.000
β1         -2                0.994  0.840  1.000
It is apparent from the boxplots in Figures 1-4 that LA and RPL return very similar estimates for the fixed effects, while the Bayesian estimates
of the fixed effects are shrunken towards 0. However, since the Bayesian
estimates are less variable, the mean squared errors are comparable and
often better than those of the frequentist methods. Furthermore, Figures 5
and 6 show that the 95% intervals produced in the Bayesian analyses and by
RPL are much wider than those from LA, although the coverage rates of the
latter are much lower than desired and the coverage rates of the Bayesian
analyses and RPL are higher than desired. On the other hand, if interest is
focused on estimating the variance parameters, the Bayesian estimates are
less variable, and tend to be closer to the true generating values than those
produced by LA and RPL. Note that variance components estimated by
LA tend to be a little higher and more variable than those from RPL, but
confidence intervals for fixed effects also tend to be shorter. Table A.1 in the
appendix lists the probabilities of rejecting the null hypothesis β1 = 0 at the
0.05 level. It is clear that in this scenario LA gives us more power, although
also a low coverage rate and an unacceptably high type I error rate.
The second simulation has a similar structure, but we set the probability
of hatching under normoxia at 1, and again vary only the treatment effects.
The values of σr and σe remain the same. Estimates of the log odds of
hatching under normoxia all exceed 5, which corresponds to a probability
of over 0.99 and renders the summaries given in the first simulation rather
uninformative. Figure 7 plots the estimates of β0 and β1 , but the true
values of β0 and β1 are both undefined since the probability of hatching
under normoxia is set at 1. Instead, we will report the estimated log odds of
hatching under hypoxia as ρ and the coverage of its 95% intervals, but since
we are still interested in testing β1 = 0, we will report the power and lengths
of 95% intervals corresponding to β1. We generate data using ρ = 2.2 and
ρ = 0, which correspond to probabilities of 0.9 and 0.5 of hatching under
hypoxia. Table A.2 gives the power of testing β1 = 0. Estimates are plotted
in Figures 7-11, with interval lengths drawn in Figure 12. Mean squared
errors are recorded in Table 5 and coverage rates of 95% intervals can
be found in Table 6.
Table 5: MSE of Parameter Estimates in Simulation 2
                                Method
Parameter  True Value of ρ  RPL     LA      Bayesian
ρ          2.2              0.0250  0.1220  0.3433
ρ          0                0.0103  0.7095  0.4728
σr         2.2              0.2224  0.1693  0.0005
σr         0                0.2272  0.1528  0.0053
σe         2.2              0.0760  0.1081  0.0274
σe         0                0.0782  0.2508  0.0418
Table 6: Coverage Rates for 95% Intervals in Simulation 2
                                Method
Parameter  True Value of ρ  RPL    LA     Bayesian
ρ          2.2              1.000  0.776  0.986
ρ          0                1.000  0.757  0.960
We observe some parallels to results from the first simulation, but the
similarities between LA and RPL begin to break down. Notably, RPL allows
accurate estimation of ρ, with substantially lower MSE, but with wider confidence intervals than LA. We also see that LA estimates the block variance
more closely than RPL, but its estimates of the subject level variation are
much more variable. LA also estimates ρ poorly, especially when the effect
is larger, and displays a much lower coverage rate than desired. Comparatively, in the presence of separation, the adapted beta-binomial model still
performs well in estimating variance components. Estimates of ρ from the
adapted beta-binomial model are not as accurate as those from RPL, and
while our credible intervals for β1 are still wide, the coverage rates for ρ approach 0.95. Table A.2 in the Appendix lists the powers for testing β1 = 0.
Interestingly, our power for ρ = 2.2 is slightly higher than for ρ = 0, due
to the presence of the outliers plotted in Figure 12. However, we maintain
powers near 1 for all methods and each value of ρ.
3.1.2 Full Model
For our simulation study involving 13 genes, we again set σr = 0.5 and
σe = 0.3. Three genes were set to display separation with varying magnitudes of treatment effect, and the 10 other genes were set to have varying
means and treatment effects. The values used to simulate data are reported
in Table 7, but we will focus only on the first 3 genes, which are representative of our dataset. We again report MSE in Table 8 and coverage
rates in Table 9. We report point estimates of ρ for gene 1 as we did in the
second simulation, but report lengths of 95% intervals for β1 in Figure 18
as well as the probability of rejecting the null hypothesis β1 = 0 in Table A.3.
Table 7: True Values
      Probability          Logit
Gene  Normoxia  Effect     Normoxia  Effect
1     1         −0.8       ∞         −∞
2     0.95      0          2.94      0
3     0.6       −0.1       0.41      −0.41
4     1         −0.01      ∞         −∞
5     1         −0.10      ∞         −∞
6     0.90      −0.05      2.20      −0.46
7     0.85      −0.00      1.73      0
8     0.80      −0.01      1.39      −0.06
9     0.75      −0.00      1.10      0
10    0.70      −0.05      0.85      −0.23
11    0.65      −0.00      0.62      0
12    0.55      −0.00      0.20      0
13    0.50      −0.05      0         −0.20
Table 8: MSE of Parameter Estimates in Simulation 3
                           Method
Parameter   True Value  RPL     LA      Hierarchy
ρ Gene 1    −1.39       0.1235  0.1244  0.1356
β0 Gene 2   2.94        0.1602  0.1644  0.2046
β1 Gene 2   0           0.1606  0.1648  0.1611
β0 Gene 3   0.41        0.1095  0.1098  0.0870
β1 Gene 3   −0.41       0.0763  0.0765  0.0817
σr          0.5         0.0090  0.0183  0.0115
σe          0.3         0.0052  0.0144  0.0068
Table 9: Coverage Rates in Simulation 3
               Method
Parameter   RPL    LA     Hierarchy
β0 Gene 2   0.953  0.900  0.954
β1 Gene 2   0.955  0.917  0.944
β0 Gene 3   0.960  0.878  0.966
β1 Gene 3   0.950  0.849  0.967
Interestingly, the hierarchical model performs very similarly to RPL and
LA in estimating fixed effects, but retains its advantage of closer
estimation of the variance components only relative to LA. We observe LA
producing lower estimates of variance components than RPL and the hierarchical model, as well as shorter confidence intervals for fixed effects and
lower coverage. Looking at the rejection rates for testing β1 = 0 for each
gene in Table A.3 reveals that our Type I error rates have decreased to much
more acceptable levels, although they are still substantially higher than desired for
LA, and we maintain powers of 1 for gene 1. The powers for gene 3 do differ
between methods, but due to the low coverage rate of LA, we would be better
served deferring to RPL or the hierarchical model.
Fitting a model that includes all the genes brings the coverage rates of RPL
and the hierarchical model in line with, or very close to, our specified
0.95 rate. We are also able to build a hierarchical model for analysis with
Bayesian methods that estimates fixed effects with essentially the same
precision as frequentist methods, though with more accurate estimation of
variance components than with LA and without requiring ad hoc adjustment of
the data. While the fitting of fixed effects is not improved in the frequentist
methods, we do see improvements in the estimation of variance components,
especially using RPL.
3.2 Single Gene Results
We now report the estimates from the pertinent methods for the mmcm-1
gene and hif-1 gene. Corresponding 95% intervals are below point estimates.
Table 10: Estimates for mmcm-1
                                Method
Parameter  RPL               LA                 Bayesian
β0         0.692             0.683              0.615
           (−0.002, 1.386)   (0.434, 0.931)     (−0.516, 1.731)
β1         −0.460            −0.468             −0.450
           (−1.217, 0.296)   (−0.739, −0.197)   (−1.603, 0.703)
σr         0.165             0.120              0.414
σe         0.136             0.049              0.330
Table 11: Estimates for hif-1
                                Method
Parameter  RPL                LA                 Bayesian
β0         4.621              4.623              4.318
           (2.093, 7.149)     (3.483, 5.763)     (2.746, 5.982)
β1         −6.082             −6.081             −5.754
           (−8.690, −3.475)   (−7.245, −4.917)   (−7.451, −4.114)
σr         0                  0.000              0.430
σe         0.158              0.057              0.413
These results mirror what we observed in our simulation studies. The
Bayesian method estimates higher variance components than LA and RPL
do, but its fixed effects estimates are not in close agreement with the other
two methods either. The effect of hypoxia is significant at the 0.05 level in all
tests for hif-1, but only the test from LA is significant for mmcm-1. However,
if we were interested in estimating and testing the fixed effects, we would be
better served estimating the parameters of a full model, since simulation 1
showed that in this situation tests using LA have a very high type I error
rate and tests using RPL or Bayesian methods have very low power.
3.3 Full Model Results
We now report results from the full dataset model. To facilitate comparisons,
Table 12 only includes variance components and parameters from mmcm-1
and hif-1.
Table 12: Estimates of the Full Model

                                   Method
Parameter   RPL                LA                  Hierarchical
β0 mmcm     0.692              0.693               0.753
            (0.235, 1.150)     (0.348, 1.039)      (0.030, 1.508)
β1 mmcm     −0.450             −0.448              −0.457
            (−1.088, 0.185)    (−0.889, −0.007)    (−1.342, 0.413)
β0 hif      4.618              4.635               7.991
            (3.408, 5.823)     (3.454, 5.816)      (4.965, 13.110)
β1 hif      −6.087             −6.106              −9.452
            (−7.383, −4.790)   (−7.327, −4.884)    (−14.615, −6.364)
σr          0.046              0.115               0.276
σe          0.358              0.218               0.494
Note that the RPL and LA estimates of the fixed effects are nearly identical to those from the single gene models. Although the confidence intervals
do differ, they are not uniformly wider or shorter. As we expected from
the simulations, the hierarchical model performs comparably to the other
methods. Note that the treatment effect for the mmcm-1 gene is still significant only under the LA test, which simulations showed has a high type I error
rate. We are therefore unable to declare a significant effect of hypoxia for the
mmcm-1 gene. The parameter estimates produced by the hierarchical model
for hif-1 are not in line with the estimates of the other methods, which is
not completely unexpected, since we observed Bayesian methods producing
more extreme estimates of β0 and β1 in simulation 2. However, the 95%
credible interval produced by the hierarchical model for β1 of the hif-1 gene
still clearly indicates that there is a statistically significant effect. Estimates of
variance components from RPL and LA have increased from the single gene
case, but are still fairly distinct from the estimates produced by Bayesian
methods.
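The remark that the intervals are not uniformly wider or shorter can be checked directly from the tables. The following Python sketch compares interval widths for RPL and LA between the single gene fits (Tables 10 and 11) and the full model (Table 12). The variable and function names are ours, chosen only for illustration.

```python
# (lower, upper) 95% intervals copied from Tables 10 and 11 (single gene fits).
single_gene = {
    ("mmcm", "beta0", "RPL"): (-0.002, 1.386),
    ("mmcm", "beta0", "LA"): (0.434, 0.931),
    ("mmcm", "beta1", "RPL"): (-1.217, 0.296),
    ("mmcm", "beta1", "LA"): (-0.739, -0.197),
    ("hif", "beta0", "RPL"): (2.093, 7.149),
    ("hif", "beta0", "LA"): (3.483, 5.763),
    ("hif", "beta1", "RPL"): (-8.690, -3.475),
    ("hif", "beta1", "LA"): (-7.245, -4.917),
}

# Matching intervals copied from Table 12 (full model).
full_model = {
    ("mmcm", "beta0", "RPL"): (0.235, 1.150),
    ("mmcm", "beta0", "LA"): (0.348, 1.039),
    ("mmcm", "beta1", "RPL"): (-1.088, 0.185),
    ("mmcm", "beta1", "LA"): (-0.889, -0.007),
    ("hif", "beta0", "RPL"): (3.408, 5.823),
    ("hif", "beta0", "LA"): (3.454, 5.816),
    ("hif", "beta1", "RPL"): (-7.383, -4.790),
    ("hif", "beta1", "LA"): (-7.327, -4.884),
}


def width(interval):
    """Length of an interval given as (lower, upper)."""
    lower, upper = interval
    return upper - lower


# Positive values mean the full-model interval is wider than the single gene one.
width_change = {
    key: round(width(full_model[key]) - width(single_gene[key]), 3)
    for key in single_gene
}

for (gene, parameter, method), change in width_change.items():
    direction = "wider" if change > 0 else "narrower"
    print(f"{gene} {parameter} {method}: {direction} by {abs(change):.3f}")
```

With these inputs every RPL interval is narrower in the full model and every LA interval is wider, which is one way to read the statement that the intervals are not uniformly wider or shorter.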
Chapter 4
Discussion
Our goal was to evaluate three common methods for fitting GLMMs,
while addressing how to fit models when the data display separation. Furthermore, we wished to determine how the dataset in question should be
modeled, either on a gene-by-gene basis or by building a full model using all
of the data available.
Simulation studies indicate that the RPL and LA methods for fitting
GLMMs perform nearly identically in estimating fixed effects, with the hierarchical model proposed in Chapter 2 performing similarly. However, if there
is interest in estimating variance components, Bayesian methods may provide more accurate estimates in the single gene case, with RPL performing
similarly in the full model.
We fitted models to data with separation both by proposing Bayesian
constructions and utilizing an ad hoc solution of adding one success and one
failure to the data before analysis. While not an ideal solution, this fix leads
to better estimates of fixed effects in the single gene models, where there
are only six total observations. It also does not seem to negatively affect
estimates in the full model, but if one wants to avoid such a workaround, the
hierarchical construction performs similarly without the need to alter the
data, while simultaneously estimating variance components accurately.
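To see why the ad hoc adjustment helps under separation, consider the empirical logit of a group in which every trial is a success: the log-odds is infinite until at least one failure is present. The Python sketch below illustrates this with hypothetical counts that are not taken from the flatworm data. The function names are ours, and the sketch assumes the extra success and failure are added to each group's counts, which is one possible reading of the adjustment described above.

```python
import math


def empirical_logit(successes, trials):
    """Log-odds of the observed proportion; infinite when it is 0 or 1."""
    failures = trials - successes
    if successes == 0:
        return -math.inf
    if failures == 0:
        return math.inf
    return math.log(successes / failures)


def add_one_success_one_failure(successes, trials):
    """Ad hoc adjustment: one extra success and one extra failure."""
    return successes + 1, trials + 2


# Hypothetical (successes, trials) pairs; the first group is all successes.
raw_counts = [(50, 50), (31, 50)]
adjusted_counts = [add_one_success_one_failure(y, n) for y, n in raw_counts]

raw_logits = [empirical_logit(y, n) for y, n in raw_counts]
adjusted_logits = [empirical_logit(y, n) for y, n in adjusted_counts]

print(raw_logits)       # the first entry is infinite
print(adjusted_logits)  # both entries are finite after the adjustment
```

The adjustment pulls every observed proportion slightly toward one half, so it trades a small bias for finite estimates, which is consistent with the description of it as a workaround rather than an ideal solution.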
Finally, for RPL and LA, we see marginal gains at best in point estimates when specifying a full model construction. This is perhaps not very
surprising, as the full model does not facilitate much borrowing of information across genes. On the other hand, the hierarchical model does make
use of this paradigm, and correspondingly we see marked improvements in
estimation over the performance of simple single gene Bayesian models.
Although it was not of interest here, it is worth noting that the RPL
method is more flexible in the model structures one can estimate than the
LA method is.
Bibliography
[gli, 2008] (2008). SAS/STAT 9.2 User’s Guide, chapter The GLIMMIX
Procedure. SAS Institute Inc.
[Bates, 2012] Bates, D. (2012). Linear mixed model implementation in lme4.
[Bates et al., 2011] Bates, D., Maechler, M., and Bolker, B. (2011). lme4:
Linear mixed-effects models using S4 classes. R package version 0.999375-42.
[Bolker et al., 2008] Bolker, B., Brooks, M., Clark, C., Geange, S., Poulsen,
J., Stevens, H., and White, J.-S. (2008). Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology and
Evolution, 23(3):127–135.
[Breslow and Clayton, 1993] Breslow, N. and Clayton, D. (1993). Approximate inference in generalized linear mixed models. Journal of the American
Statistical Association, 88(421):9–25.
[Gelman et al., 2008] Gelman, A., Jakulin, A., Pittau, M., and Su, Y.-S.
(2008). A weakly informative default prior distribution for logistic and
other regression models. The Annals of Applied Statistics, 2(4):1360–1383.
[Heinze and Schemper, 2002] Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in
Medicine, 21:2409–2419.
[Plummer, 2011] Plummer, M. (2011). JAGS Version 3.1.
[R Development Core Team, 2011] R Development Core Team (2011). R: A
Language and Environment for Statistical Computing. R Foundation for
Statistical Computing. ISBN 3-900051-07-0.
[Schelldorfer et al., 2013] Schelldorfer, J., Meier, L., and Bühlmann, P. (2013). GLMMLasso: An algorithm for high-dimensional generalized linear mixed models using ℓ1-penalization. Journal
of Computational and Graphical Statistics.
[Wolfinger and O’Connell, 1993] Wolfinger, R. and O’Connell, M. (1993).
Generalized linear mixed models: a pseudo-likelihood approach. Journal
of Statistical Computation and Simulation, 4:233–243.
Chapter 5
Appendix
Table A.1: Rejection Rate for β1 = 0, Simulation 1

                              Method
True Value of β1   RPL      LA       Bayesian
0                  0.008    0.188    0.000
−0.5               0.133    0.606    0.006
−1                 0.478    0.954    0.126
−2                 0.938    1.000    0.993
Table A.2: Rejection Rate for β1 = 0, Simulation 2

                              Method
True Value of ρ    RPL      LA       Bayesian
2.2                0.981    0.999    1.000
0                  1.000    0.983    0.997
Table A.3: Rejection Rates for β1 = 0, Simulation 3

                                       Method
Gene      True Value of β1   RPL      LA       Hierarchy
Gene 1    −∞                 1.000    1.000    1.000
Gene 2    0                  0.045    0.083    0.056
Gene 3    −0.41              0.286    0.480    0.249