
Random Effect Modelling with Mode Trees

Jochen Einbeck and John Hinde
National University of Ireland, Galway
{jochen.einbeck, john.hinde}@nuigalway.ie
(Work supported by …)

Mikulov, 19th July 2005


Framework: Generalized linear model with random effect

   µ_i ≡ E(y_i | z_i, β) = h(η_i) ≡ h(x_i'β + z_i).

The marginal likelihood can be approximated by a finite mixture (Laird, 1978):

   L(β, g(z)) = ∏_{i=1}^n ∫ f(y_i | z_i, β) g(z_i) dz_i ≈ ∏_{i=1}^n { ∑_{k=1}^K f(y_i | z_k, β) π_k }

with mass points z_k and masses π_k.

Two well-known approaches:
• Z ∼ N(0, σ²) −→ Gaussian quadrature (Hinde, 1982)
• No distributional assumption about Z −→ nonparametric maximum likelihood (NPML; Aitkin, 1996)


A special case: Fitting finite Gaussian mixtures

Given: a set of observations y_1, …, y_n with
• y_i ∼ N(z_i, σ_i²), i = 1, …, n,
• z_i ∼ Z, where Z is left unspecified.

Random effect model without fixed term: µ_i = E(y_i | z_i) = z_i.

Mixture density:

   f(y | (z_k, π_k, σ_k)_{k=1,…,K}) = ∑_{k=1}^K π_k f(y | z_k, σ_k²),

where f(y | z_k, σ_k²) is a normal density with mean z_k and standard deviation σ_k.

Aim: estimate the mass points z_k, the variances σ_k², the masses π_k, and the number of components K.


Example: Galaxy data

Recession velocities (in km/s) of 82 galaxies.

[Figure: histogram of galaxy$velocity, 10000–30000 km/s]


NPML estimation

For fixed K, consider the log-likelihood

   ℓ = ∑_{i=1}^n log{ ∑_{k=1}^K π_k f(y_i | z_k, σ_k²) },

and calculate the score equations

   ∂ℓ/∂z_k = 0,   ∂ℓ/∂σ_k = 0,   ∂/∂π_k { ℓ − λ(∑_k π_k − 1) } = 0,

which turn out to be weighted versions of the single-distribution score equations.

⟹ can be solved by the standard EM algorithm:

Starting points: select starting values z_k⁰, π_k⁰ and σ_k⁰, k = 1, …, K.
E-step: adjust weights given the current parameter estimates.
M-step: update the parameter estimates.


Application on galaxy data

Set e.g. K = 5:

                         MASS1   MASS2   MASS3   MASS4   MASS5
   Coefficients:          9.71   16.13   22.78   19.72   33.04
   Mixture proportions:  0.085   0.024   0.512   0.342   0.037
   Standard deviations:  0.423   0.043   1.721   0.626   0.922

   -2 log L: 380.9

[Figure: EM trajectories of the mass points over 80 EM iterations]


Properties of current NPML implementations (as in GLIM 4)

• The algorithm is simple, converges in every case, and is "impressively stable" (Aitkin, 1996).
• Results depend heavily on the choice of starting points z_k⁰, usually defined as

     z_k⁰ = ȳ + tol · σ̂ · g_k,

  where tol is a scaling parameter, the g_k are Gauss-Hermite mass points, and σ̂² = (1/n) ∑_{i=1}^n (y_i − ȳ)².
  Finding the optimal solution requires a tedious grid search for tol.
• The EM trajectories behave quite erratically in the first cycles, and tend to cross.
• There does not exist any automatic routine to select K.
  Richardson & Green (1997): "one of the things you do not know is the number of things that you do not know".


Exploiting the multimodal structure

Idea: consider the kernel density estimate

   f̂(y, h) = 1/(nh) ∑_{i=1}^n K((y_i − y)/h)

with bandwidth h.

Carreira-Perpiñán & Williams (2003): the number of modes is a lower bound for the number of components of a Gaussian mixture. But: the number of modes depends on the bandwidth h.

[Figure: estimated density f̂ and estimated mixture components π_k f_k for the galaxy data, velocity/1000 from 10 to 35]

   fixed effect model (K = 1) −→ random effect models (1 < K < 82) −→ saturated model (K = 82):
   "zoom into the random effect distribution"

More generally, applied on the 'residuals' h⁻¹(y_i) − x_i'β̂ of a GLM.


The mode tree (Minnotte & Scott, 1993)

[Figure: mode tree for the galaxy data; mode locations (velocity/1000) traced over bandwidths h from 0.1 to 5]


Examples for bandwidth selection

Bandwidth selectors:
• AIC (22 modes)
• Silverman (3 modes)
• BCV (1 mode)

[Figure: mode tree with the three selected bandwidths marked]
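The bandwidth dependence of the mode count is easy to reproduce numerically. The sketch below (illustrative only: the data, function names, and bandwidth ladder are invented here, not taken from the authors' implementation) counts the modes of f̂(·, h) on a grid for several bandwidths, and also computes Silverman's rule-of-thumb bandwidth used in the next slide.

```python
import numpy as np

def kde(y, grid, h):
    """Gaussian kernel density estimate f^(y, h) = 1/(nh) * sum_i K((y_i - y)/h)."""
    u = (grid[:, None] - y[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(y) * h * np.sqrt(2 * np.pi))

def count_modes(f):
    """Number of interior local maxima of the density values on the grid."""
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))

# Synthetic stand-in for the 82 galaxy velocities: three clusters (illustrative only)
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(10, 0.5, 7),
                    rng.normal(21, 2.0, 70),
                    rng.normal(33, 1.0, 5)])

grid = np.linspace(y.min() - 3, y.max() + 3, 1000)
for h in [5.0, 1.0, 0.5, 0.1]:           # bandwidth ladder, as in the mode tree figure
    print("h =", h, "modes:", count_modes(kde(y, grid, h)))

# Silverman's rule-of-thumb bandwidth: h_opt = 0.9 * A * n^(-1/5)
A = min(y.std(ddof=1), (np.quantile(y, 0.75) - np.quantile(y, 0.25)) / 1.34)
h_opt = 0.9 * A * len(y) ** (-1 / 5)
print("Silverman h_opt:", round(h_opt, 2))
```

Shrinking h monotonically in this way traces out exactly the branching structure that the mode tree visualises: modes only split, never merge, as the bandwidth decreases.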
Bandwidth selection in 2 steps

• Calculate Silverman's optimal bandwidth

     h_opt = 0.9 A n^(−1/5),   where A = min{σ̂, IQR/1.34}.

• From that bandwidth, climb down the mode tree until the next critical bandwidth is reached, where (Silverman, 1981)

     h_crit = inf{h : f̂(·, h) has at most k modes}.


Using h_crit, the mode tree gives

• an estimate for the number of modes, and thus for the number of components;
• a very accurate estimate for the location of the mass points, which can be used either directly as mass point estimates ẑ_k, or as starting points z_k⁰ for the EM algorithm.

In the latter case, one has to damp the standard deviation in the initial loops of the EM algorithm via

     d_j = 1 − (1 − tol)^j,   (0 < tol ≤ 1),

as otherwise the EM trajectories might be kicked out of the optimal solution immediately.


Application on galaxy data

[Figure: mode tree, critical bandwidth, and EM trajectories of the mass points]


Climbing down the tree

[Figure: mode trees and EM trajectories for successively smaller critical bandwidths]


Simulation study: 100 samples from the fitted mixture distribution (σ = σ_k = const.)

   Starting points z_k⁰                        Estimation method
   Gauss-Hermite mass points g_k          −→   (1) GQ, (2) NPML (via EM algorithm)
   density modes = z_k⁰                   −→   (3) NPML
   mode trees = z_k⁰                      −→   (4) NPML

[Figure: boxplots of −2 log L and of the number of EM iterations for methods (1)–(4)]


Summary

• The mode tree is a useful instrument to assess visually the number of components, even if the multimodal structure cannot be seen in the data cloud itself.
• Mode trees together with a suitable bandwidth selector give a useful recommendation for the choice of K. However, this is no reliable automatic routine, as small variations in the bandwidth may drastically change the number of detected modes.
• Given K, mode trees give a set of estimated mass points, which are in many cases so accurate that one hardly needs EM at all!
• When using mode trees together with the proposed damping procedure, the sensitivity to tuning parameters (in particular tol) is drastically reduced.


Everything more general …

• Replace the Gaussian by another exponential family distribution
• Set an appropriate link function
• Include explanatory variables
• Random coefficient models
• Variance component models

… is being implemented in an R package {npml} (Einbeck, Darnell, & Hinde), see
www.nuigalway.ie/maths/je/npml.html


References

AITKIN, M. (1996): A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6, 251–262.
CARREIRA-PERPIÑÁN, M. A. and WILLIAMS, C. K. I. (2003): On the number of modes of a Gaussian mixture. Lecture Notes in Computer Science 2695, 625–640.
HINDE, J. (1982): Compound Poisson regression models. Lecture Notes in Statistics 14, 109–121.
LAIRD, N. M. (1978): Nonparametric maximum likelihood estimation of a mixing distribution. JASA 73, 805–811.
MINNOTTE, M. C. and SCOTT, D. W. (1993): The mode tree: A tool for visualization of nonparametric density features. JCGS 2, 51–68.
RICHARDSON, S. and GREEN, P. (1997): On Bayesian analysis of mixtures with an unknown number of components (with discussion). JRSSB 59, 731–792.
SILVERMAN, B. W. (1981): Using kernel density estimates to investigate multimodality. JRSSB 43, 97–99.
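As a closing illustration, the damped EM scheme for the Gaussian-mixture special case can be sketched as below. This is a hedged sketch, not the GLIM 4 or npml implementation: the function and variable names are invented, the data are synthetic, and letting d_j shrink the component standard deviations in the E-step is one plausible reading of the damping described above.

```python
import numpy as np

def norm_pdf(y, mu, s):
    """Normal density with mean mu and standard deviation s."""
    return np.exp(-0.5 * ((y - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def em_mixture(y, z0, tol=0.5, n_iter=200):
    """EM for a K-component normal mixture, with the standard deviations
    damped by d_j = 1 - (1 - tol)^j in the initial cycles.
    z0: starting mass points, e.g. the modes found via the mode tree."""
    y = np.asarray(y, float)
    K = len(z0)
    z = np.array(z0, float)            # mass points z_k
    pi = np.full(K, 1.0 / K)           # masses pi_k
    sig = np.full(K, y.std())          # standard deviations sigma_k
    for j in range(1, n_iter + 1):
        d = 1.0 - (1.0 - tol) ** j     # damping factor d_j -> 1
        # E-step: posterior weights, using the damped standard deviations
        w = pi * norm_pdf(y[:, None], z, d * sig)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted versions of the single-distribution estimates
        pi = w.mean(axis=0)
        z = (w * y[:, None]).sum(axis=0) / w.sum(axis=0)
        sig = np.sqrt((w * (y[:, None] - z) ** 2).sum(axis=0) / w.sum(axis=0))
    m2ll = -2 * np.log((pi * norm_pdf(y[:, None], z, sig)).sum(axis=1)).sum()
    return z, pi, sig, m2ll

# Demo on synthetic two-component data (illustrative only)
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(10, 1, 100)])
z, pi, sig, m2ll = em_mixture(y, z0=[0.5, 9.5])
print(np.sort(z), pi, m2ll)
```

With accurate starting points such as these, the trajectories barely move from their starting values, which matches the observation above that the mode-tree points are often good enough on their own.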