Likelihood Based Inference for Skew-Normal Independent Linear Mixed Models
Victor Lachos Davila1; Pulak Ghosh2; Reinaldo Arellano-Valle3
INTRODUCTION
Longitudinal data analysis, in which repeated measurements are taken on subjects at various
time points, plays an important role in applied statistics, especially in biomedical research
involving clinical trials. The linear mixed model (LMM; Laird and Ware, 1982) has become the
most frequently used analytic tool for longitudinal data analysis with continuous repeated
measures. A linear mixed model consists of fixed effects and random effects. The random
effects account for the between-subject variation. In a linear mixed model framework it is
routinely assumed that the random effects and the within-subject measurement errors have a
normal distribution. While this assumption makes the model easy to apply in widely used
software such as SAS, the accuracy of this assumption is difficult to check, and the routine
use of normality has recently been questioned by many authors (Butler and Louis, 1992; Verbeke
and Lesaffre, 1997; Zhang and Davidian, 2001; Ghidey et al., 2004; Lin and Lee, 2007).
The normality assumption is too restrictive because it lacks robustness against
departures from the normal distribution, particularly when the data show multimodality and
skewness, and thus may not provide an accurate estimate of between-subject variation.
For example, Zhang and Davidian (2001) showed that the estimated subject-specific
intercepts from the Framingham heart study data were not normally distributed, and thus the use
of a normal distribution in this scenario may obscure important features of between-subject
variation.
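For reference, the normal linear mixed model described above is usually written as follows, in the spirit of Laird and Ware (1982). The symbols ($y_i$, $X_i$, $Z_i$, $\beta$, $b_i$, $\epsilon_i$, $D$, $\Sigma_i$, $m$, $q$, $n_i$) are conventional choices assumed here for illustration; they are not taken from this abstract.

```latex
\begin{align*}
y_i &= X_i \beta + Z_i b_i + \epsilon_i, \qquad i = 1, \dots, m, \\
b_i &\sim N_q(0, D), \qquad \epsilon_i \sim N_{n_i}(0, \Sigma_i), \qquad
b_i \text{ and } \epsilon_i \text{ independent.}
\end{align*}
```

Here $y_i$ is the $n_i$-vector of responses for subject $i$, $X_i$ and $Z_i$ are known design matrices, $\beta$ collects the fixed effects and $b_i$ the subject-specific random effects. The two normality assumptions on the second line are the ones whose routine use is being questioned.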
Despite the above drawbacks, the widespread use of the normal linear mixed model (N-LMM) is,
in part, motivated by mathematical convenience and by the fact that, under general regularity
conditions, estimates of the fixed effects are robust to nonnormality of the random effects
(Verbeke and Lesaffre, 1997). However, a misspecified distribution of the random effects
may bias the estimated standard errors of the parameters and impair efficient estimation
of the fixed effects (Ghidey et al., 2004). Furthermore, inference on individual effects can be
misleading when the random effects distribution deviates from normality. Thus, it is of
practical interest to develop statistical models with considerable flexibility in the
distributional assumptions of the random effects as well as the measurement error.
1 Universidade Estadual de Campinas, hlachos@ime.unicamp.br
2 Georgia State University, pulakghosh@gmail.com
3 Pontificia Universidad Católica de Chile, Reivalle@mat.puc.cl
There has been considerable work in this direction. Verbeke and Lesaffre (1996) introduced a
heterogeneous linear mixed model in which the random effects distribution is relaxed using a
finite mixture of normals. Pinheiro et al. (2001) proposed a multivariate t linear mixed model
and showed that it performs well in the presence of outliers. Zhang and Davidian (2001)
proposed an LMM in which the random effects follow the so-called seminonparametric
(SNP) distribution. Rosa et al. (2003) adopted a Bayesian framework to carry out posterior
analysis in the LMM with the thick-tailed class of normal/independent (NI) distributions (Lange
and Sinsheimer, 1993). Ghidey et al. (2004) developed an LMM with a smooth random effects
density. Ma et al. (2004) considered a generalized flexible skew-elliptical distribution for the
random effects density and proposed somewhat complicated algorithms for maximum
likelihood (ML) estimation and Bayesian inference via Markov chain Monte Carlo (MCMC).
Recently, Arellano-Valle et al. (2005), Lin and Lee (2007) and Lachos et al. (2007a)
proposed a skew-normal linear mixed model (SN-LMM) based on the multivariate
skew-normal (SN) distribution introduced by Azzalini and Dalla-Valle (1996). They also
developed EM-type algorithms for maximum likelihood estimation (MLE). A common
feature of these classes of LMMs is that the normal linear mixed model (N-LMM) is a special
member of each class.
In this paper we propose a parametric robust modeling of the LMM based on skew-normal/independent
distributions (Lachos et al., 2007b). In particular, we assume a skew-normal/independent (SNI)
distribution for the random effects and an NI distribution for the within-subject error. As a
result, the observed responses follow an SNI distribution, which defines what we call a
skew-normal/independent linear mixed model (SNI-LMM). The SNI class of distributions is quite
attractive because it simultaneously accounts for skewness and for the thickness of the tails in
the data. In particular, the SNI distributions provide a group of skewed, thick-tailed
distributions that are useful for robust inference and contain as proper elements the skew-normal
(SN), the skew-t (ST), the skew-slash (SSL), the skew-power exponential (SPE) and the
skew-contaminated normal (SCN) distributions. The marginal density of the observed quantities is
obtained analytically by integrating out the random effects, leading to an observed (marginal)
likelihood function that can be maximized directly by using existing statistical software such as
Ox, R or Matlab. The hierarchical representation of the proposed model makes it possible to
implement an EM-type algorithm, which for special cases and common situations yields closed-form
expressions for the E- and M-steps. Furthermore, we note that the information matrix has a common
part for all elements in the family, which facilitates the direct application of inference in the
SNI-LMM. We further analyze the longitudinal Framingham cholesterol data, whose random effects
distribution has been found to be non-normal and positively skewed by Zhang and Davidian (2001),
Ghidey et al. (2004), and Lin and Lee (2007).
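To make the idea of a skewed, thick-tailed family concrete, the following is a minimal univariate sketch and not the authors' implementation. It assumes the usual stochastic representation of the skew-normal, Z = delta*|T0| + sqrt(1 - delta^2)*T1 with delta = lam/sqrt(1 + lam^2), and the scale-mixture form Y = mu + U^(-1/2) Z, with U ~ Gamma(nu/2, rate nu/2) giving the skew-t. All function and variable names are hypothetical.

```python
import math
import random


def sample_skew_normal(mu, sigma, lam, rng):
    """One draw from SN(mu, sigma^2, lam) via its stochastic representation (assumed form)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    t0 = abs(rng.gauss(0.0, 1.0))  # half-normal component, drives the skewness
    t1 = rng.gauss(0.0, 1.0)       # symmetric normal component
    return mu + sigma * (delta * t0 + math.sqrt(1.0 - delta * delta) * t1)


def sample_skew_t(mu, sigma, lam, nu, rng):
    """One skew-t draw as a scale mixture: mu + U^(-1/2) * SN(0, sigma^2, lam)."""
    # random.gammavariate takes (shape, scale), so rate nu/2 becomes scale 2/nu.
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    z = sample_skew_normal(0.0, sigma, lam, rng)
    return mu + z / math.sqrt(u)


def sample_mean(draws):
    return sum(draws) / len(draws)


def tail_count(draws, cutoff):
    """Number of draws whose absolute value exceeds the cutoff."""
    return sum(abs(x) > cutoff for x in draws)


rng = random.Random(2024)
n_draws = 100_000
sn_draws = [sample_skew_normal(0.0, 1.0, 3.0, rng) for _ in range(n_draws)]
st_draws = [sample_skew_t(0.0, 1.0, 3.0, 5.0, rng) for _ in range(n_draws)]

print("skew-normal sample mean:", sample_mean(sn_draws))
print("draws beyond 4 (skew-normal, skew-t):",
      tail_count(sn_draws, 4.0), tail_count(st_draws, 4.0))
```

Both samples share the same skewness parameter, but the mixing variable U inflates the spread of individual draws, so the skew-t sample places far more mass in the tails. Other members of the class would follow by changing the distribution of U.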
The rest of the article is organized as follows. After a brief introduction of SNI distributions
in Section 2, the SNI-LMM is introduced in Section 3. In Section 4 we discuss the ML
estimation and inference, including the estimation of the random effects and the prediction of
future values. The observed information matrix is derived analytically in Section 5. In Section
6, simulation studies are conducted to examine the performance of the estimation of
subject-specific effects and the prediction of future values. The advantage of the proposed
methodology is illustrated using the Framingham cholesterol data in Section 7 and some
concluding remarks are presented in Section 8.
MATERIALS AND METHODS
- Maximum likelihood estimation
- EM algorithm
- Monte Carlo simulation
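As a rough illustration of how direct likelihood maximization and Monte Carlo simulation fit together, the sketch below simulates univariate skew-normal data and evaluates the log-likelihood over a coarse grid of skewness values. It is a simplification under stated assumptions: there are no random effects, location and scale are held fixed at their true values, the grid search stands in for a numerical optimizer, and the EM algorithm is not shown. The names `skew_normal_logpdf`, `log_likelihood` and `simulate` are hypothetical.

```python
import math
import random


def skew_normal_logpdf(y, mu, sigma, lam):
    """log of f(y) = (2 / sigma) * phi(z) * Phi(lam * z), with z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Standard normal CDF at lam * z, written with erfc for accuracy in the lower tail.
    cdf = 0.5 * math.erfc(-lam * z / math.sqrt(2.0))
    if cdf <= 0.0:  # underflow guard for extreme arguments
        return float("-inf")
    return math.log(2.0) - math.log(sigma) + log_phi + math.log(cdf)


def log_likelihood(data, mu, sigma, lam):
    """Log-likelihood of an independent skew-normal sample."""
    return sum(skew_normal_logpdf(y, mu, sigma, lam) for y in data)


def simulate(n, mu, sigma, lam, rng):
    """Monte Carlo draws from SN(mu, sigma^2, lam) via the stochastic representation."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    return [mu + sigma * (delta * abs(rng.gauss(0.0, 1.0)) + root * rng.gauss(0.0, 1.0))
            for _ in range(n)]


rng = random.Random(7)
data = simulate(2000, 0.0, 1.0, 4.0, rng)  # true skewness parameter is 4

# Coarse grid over the skewness parameter; lam = 0 is the ordinary normal model.
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
profile = {lam: log_likelihood(data, 0.0, 1.0, lam) for lam in grid}
lam_hat = max(profile, key=profile.get)

print("grid maximizer of the log-likelihood:", lam_hat)
print("gain over the normal model:", profile[lam_hat] - profile[0.0])
```

Setting `lam = 0` recovers the normal log-density, which mirrors how the normal model sits inside the skewed family as a special case.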
RESULTS AND CONCLUSIONS
In this paper we deal with the SNI-LMM, which includes the skew-normal LMM (Arellano-Valle et al.,
2005) as a special case. A closed-form expression is obtained for the likelihood function of the
observed data, which can be maximized by using existing statistical software. An EM-type algorithm
is developed by exploiting statistical properties of the SNI class considered. The observed
information matrix is derived analytically, which allows direct implementation of inference for
this class of models. For the Framingham cholesterol data, we show that the skew-t and the
skew-contaminated normal distributions give a better fit.
REFERENCES
Arellano-Valle, R. B., Bolfarine, H. and Lachos, V. H. (2005). Skew-normal linear mixed
models. Journal of Data Science, 3, 415-438.
Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with
emphasis on the multivariate skew-t distribution. Journal of the Royal Statistical Society,
Series B, 65, 367-389.
Azzalini, A. and Dalla-Valle, A. (1996). The multivariate skew-normal distribution.
Biometrika, 83, 715-726.
Branco, M. and Dey, D. (2001). A general class of multivariate skew-elliptical distribution.
Journal of Multivariate Analysis, 79, 93-113.
Butler, S. M. and Louis, T. A. (1992). Random effects models with non-parametric priors.
Statistics in Medicine, 11, 1981-2000.
Lachos, V. H., Bolfarine, H., Arellano-Valle, R. B. and Montenegro, L. C. (2007a).
Likelihood based inference for multivariate skew-normal regression models. Communications
in Statistics: Theory and Methods, 36, 1769-1786.
Lachos, V. H., Labra, F. V. and Ghosh, P. (2007b). Multivariate skew-normal/independent
distributions: properties and inference. Submitted to Scandinavian Journal of Statistics.
Lange, K. and Sinsheimer, J. S. (1993). Normal/independent distributions and their
applications in robust regression. Journal of Computational and Graphical Statistics, 2,
175-198.
Zhang, D. and Davidian, M. (2001). Linear mixed models with flexible distributions of
random effects for longitudinal data. Biometrics, 57, 795-802.