Longitudinal Count Regression
Technical Appendix
This technical appendix covers several issues that are important in fitting GLMMs but
also require more statistical background than is assumed in the main body of the text.
Brief Overview of Optimization in GLMMs
A critical issue for more complex statistical models is estimation. Considering the
models described in equations 1-3 (and GLMMs generally), what method can be used to estimate
the various parameters (i.e., fixed- and random-effects)? Linear regression is often referred to as
ordinary least squares (OLS) regression, a name that describes its estimation strategy.
Least squares estimation is directly parallel to solving simple algebra problems:
There is an equation with one or more unknowns that can be solved directly. However, as soon
as we move beyond OLS regression, many statistical models no longer have a “closed form”
solution to estimation, and iterative fitting algorithms are required. GLMMs can be estimated
via maximum likelihood (ML) estimation or Bayesian Markov chain Monte Carlo (MCMC)
estimation, which are both considered here briefly.
Software implementations of ML for linear mixed models are now fairly quick and
accurate except for special circumstances (e.g., cross-classified models or extremely large
datasets). However, with GLMMs there is an added layer of complexity that comes with the
non-normal outcome. Specifically, to solve the likelihood equation, it is necessary to integrate
over the random-effects, which is far more challenging with a non-identity link function that
connects the left and right-hand side of the model. Currently, various estimation methods are
used for parameters in GLMMs. Here we provide a general description aimed at applied issues,
but further details can be found in Raudenbush and Bryk (2002) and Hedeker and Gibbons
(2006). An excellent, article-length overview of GLMMs that touches on estimation issues and
software is Bolker et al. (2009), though examples are related to ecology. The earliest methods
were called marginal quasilikelihood (MQL) and penalized quasilikelihood (PQL), but both
MQL and PQL have been shown to lead to biased estimates, especially for variance terms (see,
e.g., Rodriguez & Goldman, 2001). Moreover, the approximations used by these methods did
not allow the use of deviance tests to compare models. PQL is still an option in some software
packages (see Table 1 in Bolker et al., 2009). Currently, most software packages for GLMM use
either a Laplace estimation procedure or adaptive Gaussian quadrature (AGQ). Both are more
accurate than either MQL or PQL and yield accurate likelihood statistics that can be used for
model comparison purposes. When available, AGQ is more accurate than Laplace estimation;
however, AGQ can be quite slow, particularly with more than one random-effect (see, e.g.,
Rabe-Hesketh & Skrondal, 2008; Raudenbush & Bryk, 2002). In addition, AGQ requires the
analyst to specify a number of quadrature points: more points yield greater accuracy, but model
fitting time increases accordingly. Thus, there is a clear trade-off between speed of estimation
and accuracy. Particularly for smaller datasets or complex models, estimation methods should be
scrutinized.
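
To make this trade-off concrete, the sketch below compares Laplace and AGQ estimation using the lme4 package in R. It is illustrative only: the data frame `rapi` and its variables are placeholders, and lme4 permits nAGQ > 1 only for models with a single scalar random-effect.

```r
## A minimal sketch comparing Laplace and AGQ estimation (lme4);
## the data frame `rapi` (columns rapi, time, id) is hypothetical.
library(lme4)

## nAGQ = 1 is the Laplace approximation (the default).
fit_laplace <- glmer(rapi ~ time + (1 | id), data = rapi,
                     family = poisson, nAGQ = 1)

## Larger nAGQ values request adaptive Gaussian quadrature with that
## many points: slower fitting, more accurate likelihood.
fit_agq <- glmer(rapi ~ time + (1 | id), data = rapi,
                 family = poisson, nAGQ = 25)

## Appreciable differences in the fixed-effects suggest the estimation
## method deserves scrutiny (e.g., try more quadrature points).
cbind(Laplace = fixef(fit_laplace), AGQ25 = fixef(fit_agq))
```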
An alternative approach to estimation is to use Bayesian Markov chain Monte Carlo
(MCMC) methods, which are growing in popularity as computing power has reduced many of
the historically challenging issues in fitting such models. Bayesian models do have several
fundamental differences from the more common statistical methods (often referred to as
frequentist methods). Bayesian models explicitly incorporate prior information into the model,
which has led to criticisms of Bayesian methods as subjective. It is possible to include specific
information in prior distributions (e.g., the effect of a given parameter should lie in a specific
interval), often called “informative priors.” However, in applied research it is far more common
to use uninformative priors. These imply no practical preference for any specific value of a
parameter over another. Moreover, except in small datasets, uninformative priors contribute a
relatively small amount to the resulting estimates as compared to the data itself (Gelman & Hill,
2007).
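
The limited influence of an uninformative prior can be seen in the conjugate normal case, where the posterior mean is a precision-weighted average of the prior mean and the sample mean. The sketch below uses invented values purely for illustration.

```r
## Posterior mean for a normal mean with known sampling variance:
## a precision-weighted average of prior mean and sample mean.
## All numeric values here are invented for illustration.
post_mean <- function(prior_mean, prior_var, ybar, se2) {
  w <- (1 / prior_var) / (1 / prior_var + 1 / se2)  # weight on the prior
  w * prior_mean + (1 - w) * ybar
}

ybar <- 6.3          # hypothetical sample mean
se2  <- 0.25^2       # hypothetical squared standard error
post_mean(0, 1e6, ybar, se2)   # diffuse prior: essentially the data mean
post_mean(0, 0.5, ybar, se2)   # informative prior pulls the estimate toward 0
```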
A second difference is that Bayesian methods typically use MCMC estimation. MCMC
is a simulation-based estimation procedure, which has been shown to be very accurate under a wide
array of conditions (for an overview of Bayesian methods for GLMM, see Draper, 2008, or for a
general introduction to Bayesian methods for social science, see Lynch, 2007). Although ML
fitting procedures are iterative, there is always a convergence criterion at which point the
algorithm stops. With MCMC estimation there is no convergence criterion, and the data analyst
specifies a number of iterations, at which point the analyst must ascertain whether the
simulations have converged to appropriate, final estimates. Although this aspect of Bayesian
methods has also been somewhat controversial, there is a general consensus on both tools and
guidelines for judging when results have converged (see discussion of convergence issues in
Draper, 2008, and Gelman & Hill, 2007). Finally, MCMC estimation typically leads to a sample
of estimates for each parameter, resulting from the simulation (e.g., 1,000 draws from the
simulation for each parameter might be saved for analysis). Somewhat similar to bootstrapping,
these simulated draws from the posterior distributions of parameters can be summarized by their
mean and confidence intervals. Although this quick overview of Bayesian methods may sound
quite different from what occurs with more common statistical procedures (and ML for
GLMMs), results for simpler models are often highly similar if not identical across Bayesian and
frequentist approaches (Gelman & Hill, 2007). Thus, somewhat similar to AGQ, MCMC
estimation is highly accurate though more time consuming than alternatives; however, one
notable difference from ML estimation is that MCMC estimation has been shown to be
accurate (i.e., point estimates and appropriate coverage of CI) even in small sample sizes (see,
e.g., discussion in chapter 13 of Raudenbush & Bryk, 2002).
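
To illustrate how such simulation draws are obtained and summarized, the sketch below uses the MCMCglmm package, one of several R options. The data frame `rapi`, the model, and the iteration settings are all assumptions for illustration.

```r
## A hedged sketch of MCMC estimation for a Poisson GLMM (MCMCglmm);
## the data frame `rapi` (columns rapi, time, id) is hypothetical.
library(MCMCglmm)

fit_mcmc <- MCMCglmm(rapi ~ time,        # fixed-effects
                     random = ~ id,      # random intercept per subject
                     family = "poisson",
                     data   = rapi,
                     nitt = 13000, burnin = 3000, thin = 10)

## fit_mcmc$Sol holds the saved posterior draws (rows = draws, columns =
## fixed-effect parameters); here (13000 - 3000) / 10 = 1000 draws.
## Summarize each parameter by its posterior mean and a 95% interval.
rbind(mean = colMeans(fit_mcmc$Sol),
      apply(fit_mcmc$Sol, 2, quantile, probs = c(0.025, 0.975)))
```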
Conditional vs. Marginal Fixed-Effects: Random-Effects and Link Functions in GLMM
With a linear mixed model (LMM) the link function is the identity function, which
practically means there is no link function (i.e., like multiplying something by one: nothing
nothing changes). In the main body of the text, we considered the implications of this for
interpreting the fixed-effects, but it also has important implications for the random-effects, which
in turn affects predictions from GLMM and directly relates to the distinction between marginal
coefficients (sometimes called population-average models) vs. conditional coefficients (or unit-specific models). Most of the material in this section of the appendix is taken from Raudenbush
and Bryk (2002; chapter 10), Raudenbush (2008), Breslow and Clayton (1993), and Heagerty
and Zeger (2000).
Let’s return to the full model from our initial Poisson GLMM:
$$\log(E[\mathrm{RAPI}_{ti}]) = b_0 + b_1 \mathrm{Male}_i + b_2 \mathrm{Time}_{ti} + b_3 \mathrm{Male}_i \times \mathrm{Time}_{ti} + u_{0i} + u_{1i} \mathrm{Time}_{ti} \qquad \text{(A1a)}$$
where subscripts are the same as earlier. All error terms are assumed normally distributed with a
mean of zero and unknown variance. If equation A1a represented an LMM, then the random-effects (i.e., subject-specific effects) would have mean zero on the scale of the linear predictor (i.e.,
right-hand side of equation) as well as the scale of the outcome (i.e., left-hand side of equation).
Because of this, they do not contribute anything to the average predictions from the model. For
example, simple slopes for interpreting the interaction of Male and Time do not need to include
the random-effects, because the mean of zero combined with the identity link function means that
our predictions already average over the random-effects distributions.
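
For reference, a model of the form in equation A1a could be fit in R roughly as follows; this is a sketch, and the data frame `rapi` and its variable names are placeholders.

```r
## A sketch of fitting the Poisson GLMM in equation A1a (lme4); the
## data frame `rapi` (columns rapi, male, time, id) is hypothetical.
library(lme4)

fit_a1 <- glmer(rapi ~ male * time + (time | id),  # random intercept and slope
                data = rapi, family = poisson)
fixef(fit_a1)    # b0, b1, b2, b3 on the log (linear predictor) scale
VarCorr(fit_a1)  # variance-covariance of u0 and u1
```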
However, with GLMMs that have a non-identity link function, this relationship changes.
The right hand side of the equation is considered the linear predictor in the language of GLMM,
and it is connected to the outcome via a link function. For our count regression models, this is
the log link, and we can make this explicit via:
$$E[\mathrm{RAPI}_{ti}] = \exp(b_0 + b_1 \mathrm{Male}_i + b_2 \mathrm{Time}_{ti} + b_3 \mathrm{Male}_i \times \mathrm{Time}_{ti} + u_{0i} + u_{1i} \mathrm{Time}_{ti}) \qquad \text{(A1b)}$$
With equation A1b the error terms still have a mean of zero, but only on the linear predictor
scale. The error terms do not have a mean of zero on the original scale of the RAPI because the
random-effects must be exponentiated to return them to that scale. For
example, this can be seen in Figure A1, which plots the subject-specific effects for the intercept
from the model above (i.e., random-effects centered around the fixed-effect intercept), on both
the scale of the outcome (y-axis) and the scale of the linear predictor (x-axis). Note that the
distributions are centered around the fixed-effect intercept. The histogram on top of the graph
shows the (approximately) normally distributed intercept variance on the linear predictor scale,
whereas the marginal histogram to the right shows the same distribution on the scale of the
outcome (i.e., after the values have been raised to the base e). The solid black line shows the
mean of the subject-specific effects on the linear predictor scale, which is simply the fixed-effect
estimate (i.e., mean of the distribution on the top of the graph; 4.59 on the outcome scale). The
dotted line shows the mean of the exponentiated subject-specific effects (i.e., mean of the
distribution on the right of the graph; 6.63 on the outcome scale).
Several points are worth noting here: (a) the skewed distribution on the outcome scale
makes sense with what we know of the data, but (b) exponentiating the random effects means
that there is no longer a mean of zero. In fact, this is the reason why the rate ratio of the fixed-effect intercept (RR = 4.2) notably underestimates the average value of the outcome for women
at time equal to zero (M = 6.3). Breslow and Clayton (1993) and others have noted that
predictions from a Poisson GLMM need to include both fixed and random-effects:
$$E[Y] = \exp(XB + \mathrm{diag}(ZDZ')/2) \qquad \text{(A2)}$$
which uses matrix notation to designate: X as a fixed-effect design matrix, B as a vector of fixed-effect coefficients, Z as a random-effect design matrix (and Z' as its matrix transpose), and D as a
variance-covariance matrix of random-effects. Finally, exp is the exponential function (i.e.,
raising to the base e, the inverse link function for Poisson) and diag specifies the diagonal elements
of the resulting matrix.
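
The diag(ZDZ')/2 term is simply the lognormal mean correction: if u is normal with mean zero and variance τ², then E[exp(b₀ + u)] = exp(b₀ + τ²/2) rather than exp(b₀). The brief simulation below makes this concrete; the values of b₀ and τ are back-solved from the 4.59 and 6.63 reported for Figure A1 and are illustrative only.

```r
## Exponentiated normal random-effects do not average to one, which is
## why the tau^2 / 2 correction appears in Equation A2. The values of
## b0 and tau below are back-solved from Figure A1 for illustration.
set.seed(101)
b0  <- log(4.59)                    # fixed-effect intercept, log scale
tau <- sqrt(2 * log(6.63 / 4.59))   # implied random-intercept SD
u   <- rnorm(1e6, mean = 0, sd = tau)

exp(b0)               # conditional ("typical subject") mean: 4.59
mean(exp(b0 + u))     # marginal mean, averaging over subjects: ~6.63
exp(b0 + tau^2 / 2)   # closed-form lognormal mean: 6.63
```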
The equation above was used to estimate the marginal predictions from the count
submodel of the TLFB example, shown in Figure 7. For that example, the components of the
equation took the following values:
$$
X = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\qquad
B = \begin{bmatrix}
0.98 \\ 0.08 \\ 0.18 \\ 0.46 \\ 0.11 \\ -0.10 \\ 0.19
\end{bmatrix}
\qquad
Z = \begin{bmatrix}
1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1
\end{bmatrix}
\qquad
D = \begin{bmatrix}
0.25 & -0.01 \\ -0.01 & 0.04
\end{bmatrix}
$$
Several of these matrices are straightforward: B is the estimated fixed-effects for the count
submodel on the scale of the link function (i.e., not transformed to RR), and D is the estimated
variance-covariance matrix of random-effects. For the TLFB example, the count submodel
includes a random intercept and random slope for weekend, and thus, D is a two by two matrix.
X is a design matrix for the fixed effects, in which the columns correspond to: a) intercept, b)
weekend indicator, c) gender indicator, d) fraternity / sorority indicator, e) weekend by gender
indicator, f) weekend by fraternity / sorority indicator, and g) gender by fraternity / sorority
indicator. The interaction columns (i.e., final three columns) are simply the result of multiplying
the appropriate main effect columns (e.g., the final column is the result of multiplying columns c
and d).
The rows of matrix X correspond to the eight subgroups crossing weekend, gender, and
fraternity / sorority. For example, the first row has only the intercept included and corresponds
to the estimated mean for women on weekdays who are not in a sorority (i.e., the group taking
zero values on all covariates). The final row represents men on weekends who are in a fraternity
(i.e., the group taking values of one on all covariates). Because the model includes random
effects for the intercept and weekend, Z contains those two columns of X corresponding to these
effects. One final note is that for the over-dispersed Poisson model, the per-observation variance
also must be included.¹ Using these values of the matrices and the formula above, the estimated
marginal predictions from the count submodel of the TLFB example are:
$$
E[Y] = \begin{bmatrix}
3.45 \\ 3.71 \\ 4.20 \\ 5.07 \\ 5.49 \\ 5.34 \\ 8.29 \\ 9.04
\end{bmatrix}
$$
¹ The per-observation random-effect in an over-dispersed Poisson model is like the residual error term in OLS regression (or a linear mixed model). Equation A2 can be extended to include this term via:

$$E[Y] = \exp(XB + (\mathrm{diag}(ZDZ') + \sigma^2)/2)$$

This was the version of the equation used for the TLFB example, where σ² (i.e., the variance of the per-observation random-effect) was equal to 0.26.
Within rounding error, these are the values that are plotted in Figure 7.
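
These predictions can be reproduced directly in R from the matrices above. The sketch below uses the over-dispersed version of Equation A2 from the footnote; small discrepancies from the printed vector reflect the two-decimal rounding of the reported coefficients.

```r
## Reproducing the marginal predictions for the TLFB count submodel via
## Equation A2 (over-dispersed version, sigma^2 = 0.26). Rows cross
## weekend, gender, and fraternity/sorority as described in the text.
wknd <- rep(c(0, 1), times = 4)
male <- rep(c(0, 0, 1, 1), times = 2)
frat <- rep(c(0, 1), each = 4)
X <- cbind(1, wknd, male, frat,
           wknd * male, wknd * frat, male * frat)
B <- c(0.98, 0.08, 0.18, 0.46, 0.11, -0.10, 0.19)
Z <- X[, 1:2]                                   # intercept and weekend
D <- matrix(c(0.25, -0.01, -0.01, 0.04), 2, 2)
sigma2 <- 0.26

EY <- exp(X %*% B + (diag(Z %*% D %*% t(Z)) + sigma2) / 2)
round(EY, 2)   # approximately the vector printed above
```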
Beyond the consideration that predictions from GLMM must incorporate random-effects
terms, this example also underscores that the interpretation of coefficients from GLMMs is
somewhat different from that of their LMM counterparts. In the statistical literature this distinction is
often discussed as differences between population-average vs. unit-specific (e.g., Raudenbush &
Bryk, 2002) or marginal vs. conditional (e.g., Heagerty & Zeger, 2000). As a concrete example,
the model in equation 1 was re-fit using generalized estimating equations (GEE; Liang & Zeger,
1986). GEEs are an alternative class of statistical models that are also appropriate for
longitudinal and clustered data. As opposed to GLMM, GEE models treat the correlations due to
the clustering as a nuisance parameter, and GEEs do not directly model subject-specific effects as
in GLMM. For present purposes, their critical feature is that GEE coefficients have a marginal
or population-average interpretation; that is, they average over the individual
differences. Table A1 displays the coefficients (expressed as rate ratios) from GEE and GLMM fits
to the RAPI data. Between the two models, the coefficients are different to varying degrees.
However, it can be readily shown that the GEE coefficients reflect approximate averages of the
entire sample. For example, estimates of baseline drinking problems for women and men (e^1.79 =
6.0 and e^(1.79 + 0.28) = 7.9, respectively) are very close to the raw means in the sample. Because
the GLMM coefficients are conditional on the random-effects distribution, they do not retain this
average interpretation.
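
As a point of reference, a GEE fit of this kind could be obtained in R roughly as follows. This is a sketch with the geepack package; the original analysis may have used different software, and the data frame `rapi` and the exchangeable working correlation are assumptions.

```r
## A hedged sketch of a population-average (GEE) Poisson fit (geepack);
## the data frame `rapi` (columns rapi, male, time, id, sorted by id)
## and the working correlation structure are assumptions.
library(geepack)

fit_gee <- geeglm(rapi ~ male * time, id = id, data = rapi,
                  family = poisson, corstr = "exchangeable")
exp(coef(fit_gee))   # rate ratios with a marginal interpretation
```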
Practically, what should we make of this? First, it is important to realize the distinction
between conditional and marginal coefficients (and correspondingly between GLMM and GEE).
It can be a bit startling to exponentiate the fixed-effect intercept of a Poisson GLMM and find
that it is not that close to the mean in the raw data. Thus, for starters, it is important to realize
there is a reason for this. Second, after realizing this, it can be tempting to think of marginal
coefficients (and possibly GEE models) as being “correct” and conversely that GLMM are
perhaps “incorrect.” Raudenbush (2008) casts the two models as having two different foci: GEE
(or population-average coefficients) are more appropriate for questions about the sample as a
whole, whereas GLMM (or unit-specific coefficients) are more appropriate for questions focused
on individuals or distributions of individuals. However, applied users may often be tempted
to interpret output from GLMM as marginal coefficients, which is clearly not correct.
There are formulas to convert conditional coefficients from GLMM to marginal coefficients
(e.g., see Heagerty & Zeger, 2000); these conversions are provided automatically in the HLM software. For
random intercept only models, these are straightforward, but for anything more complex,
converting conditional to marginal is not trivial. For predictions from a GLMM (e.g., mean
difference based on dichotomous covariate, or simple slopes), it is possible to use Equation A2 to
effectively average over the random-effects distribution. The accompanying R code shows an
alternative strategy of using a Monte Carlo simulation from the random-effects to yield marginal
predictions.
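
A minimal version of that Monte Carlo strategy is sketched below, using MASS::mvrnorm and the TLFB estimates given earlier; the particular cell (women on weekdays, not in a sorority) is chosen for illustration.

```r
## Monte Carlo marginal prediction: draw the random-effects from their
## estimated distribution, add the per-observation (over-dispersion)
## draw, and average on the outcome scale. Estimates are from the TLFB
## example above.
library(MASS)

D <- matrix(c(0.25, -0.01, -0.01, 0.04), 2, 2)
u <- mvrnorm(1e5, mu = c(0, 0), Sigma = D)   # draws of (u0, u1)
e <- rnorm(1e5, mean = 0, sd = sqrt(0.26))   # per-observation term

xb <- 0.98       # fixed part: women, weekday, not in a sorority
z  <- c(1, 0)    # weekday, so only the random intercept u0 enters

mean(exp(xb + u %*% z + e))   # marginal prediction, approx 3.4-3.5
```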
Given all this, should GEE be preferred (or, said another way, should GLMM be
avoided) if the focus is on drawing conclusions about the sample as a whole? Unfortunately, it is
not quite that straightforward. For simpler models, the decision between GEE and GLMM is a
matter of convenience, and hence, GEE might be preferred for interpretational reasons.
However, GEE have certain limitations: They make a stronger assumption about missing data
and assume that time-varying covariates are not correlated over time. In addition, GLMMs
estimate subject-specific effects, allowing the data analyst to examine the distribution of
individual intercepts and slopes (e.g., what percentage of individuals are improving over time in
a treatment study?), which at times may be substantively interesting. GEE treats the correlations
due to clustering as a nuisance and does not estimate individual effects. Finally, there are broad
disciplinary differences in familiarity with these models. Mixed models have been
much more prevalent in the social sciences, whereas GEE is at this point not as common;
extending linear mixed models to non-normal outcomes thus builds from an established
foundation in the social sciences. In summary, our primary recommendation with respect to
conditional versus marginal models is that GLMM users should familiarize themselves with the
issues so that they might be informed users and correctly interpret their models. The citations
noted throughout this section would be an excellent starting place.
Table A1
RAPI Data Rate Ratios for GEE and GLMM

                 RR (GEE)    RR (GLMM)
Intercept          6.00        4.00
Male               1.32        1.22
Time               0.99        0.96
Male x Time        1.01        1.02
Figure Captions for Appendix
Figure A1. Plot of subject-specific intercepts (i.e., random-effects centered around the fixed-effect intercept) on the scale of the linear predictor (x-axis) and outcome (y-axis). The solid line
is the mean of the distribution on the linear predictor scale, whereas the dotted line is the mean
on the outcome scale (i.e., after exponentiating all the values).
Figure Captions for Extra Figures
eFigure 1. Histograms and quantile-quantile plots of random effects from Poisson generalized
linear mixed model fit to RAPI data, including over-dispersion term.
eFigure 2. Plot of observed frequency counts along with predicted counts from Poisson GLMM
(dotted line).