```
Random Effect Modelling with Mode Trees

Jochen Einbeck and John Hinde
National University of Ireland, Galway
{jochen.einbeck, john.hinde}@nuigalway.ie
Work supported by
Mikulov, 19th July 2005

Framework: Generalized linear model with random effect

  $\mu_i \equiv E(y_i \mid z_i, \beta) = h(\eta_i) \equiv h(x_i'\beta + z_i)$.

The marginal likelihood can be approximated by a finite mixture (Laird, 1978)

  $L(\beta, g(z)) = \prod_{i=1}^{n} \int f(y_i \mid z_i, \beta)\, g(z_i)\, dz_i \approx \prod_{i=1}^{n} \left\{ \sum_{k=1}^{K} f(y_i \mid z_k, \beta)\, \pi_k \right\}$

with mass points $z_k$ and masses $\pi_k$.

Two well-known approaches:
• $Z \sim N(0, \sigma^2)$ → Gaussian Quadrature (Hinde, 1982)
• No distributional assumption about $Z$ → Nonparametric maximum likelihood (NPML, Aitkin, 1996).

A special case: Fitting finite Gaussian mixtures

Given: A set of observations $y_1, \ldots, y_n$ with
• $y_i \sim N(z_i, \sigma_i^2)$, $i = 1, \ldots, n$
• $z_i \sim Z$, where $Z$ is left unspecified.

Random effect model without fixed term:

  $\mu_i = E(y_i \mid z_i) = z_i$

Mixture density:

  $f(y \mid (z_k, \pi_k, \sigma_k)_{k=1,\ldots,K}) = \sum_{k=1}^{K} \pi_k f(y \mid z_k, \sigma_k^2)$

where $f(y \mid z_k, \sigma_k^2)$ is a normal density with mean $z_k$ and standard deviation $\sigma_k$.

Example: Galaxy Data

Recession velocities (in km/s) of 82 galaxies.

[Figure: the galaxy velocities; axis galaxy$velocity, ticks from 10000 to 30000]

Aim: Estimate the mass points $z_k$, the variances $\sigma_k^2$, the masses $\pi_k$, and the number of components $K$.

NPML Estimation

For fixed $K$, consider the log-likelihood

  $\ell = \sum_{i=1}^{n} \log \left\{ \sum_{k=1}^{K} \pi_k f(y_i \mid z_k, \sigma_k^2) \right\}$,

and calculate the score equations

  $\frac{\partial \ell}{\partial z_k} = 0, \quad \frac{\partial \ell}{\partial \sigma_k} = 0, \quad \frac{\partial \left\{ \ell - \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right) \right\}}{\partial \pi_k} = 0,$

which turn out to be weighted versions of the single-distribution score equations.

⟹ can be solved by standard EM algorithm:

Starting points  Select starting values $z_k^0$, $\pi_k^0$, and $\sigma_k^0$, $k = 1, \ldots, K$.
E-Step           Adjust weights given current parameter estimates.
M-Step           Update parameter estimates.

Application on galaxy data

Set e.g. K=5:

Coefficients:
  MASS1   MASS2   MASS3   MASS4   MASS5
   9.71   16.13   22.78   19.72   33.04
Mixture proportions:
  0.085   0.024   0.512   0.342   0.037
Standard deviations:
  0.423   0.043   1.721   0.626   0.922
-2 log L: 380.9

EM Trajectories:

[Figure: mass points (10 to 35) against EM iterations (0 to 80)]

Properties of current NPML implementations (as in GLIM 4)

• The algorithm is simple, converges in every case, and is "impressively stable" (Aitkin, 1996).
• Results depend heavily on the choice of starting points $z_k^0$, usually defined as

    $z_k^0 = \bar y + tol \cdot \hat\sigma \cdot g_k$

  where $tol$: scaling parameter, $g_k$: Gauss-Hermite mass points, $\hat\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar y)^2}$.
  Finding the optimal solution requires a tedious grid search for $tol$.
• The EM trajectories behave quite erratically in the first cycles, and tend to cross.
• There does not exist any automatic routine to select $K$. Richardson & Green, 1997:
  "one of the things you do not know is the number of things that you do not know"

Exploiting the multimodal structure

[Figure, two panels over velocity/1000 (10 to 35): "est. mixture components" (vertical axis pi_k*f_k) and "estimated density" (vertical axis f)]

Idea: Consider density estimate

  $\hat f(y, h) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{y_i - y}{h} \right)$.

Carreira-Perpiñan & Williams, 2003: The number of modes is a lower bound for the number of components of a Gaussian mixture. But: Number of modes depends on bandwidth $h$.

The mode tree (Minnotte & Scott, 1993)

[Figure: mode tree over $y_i$ (velocity/1000), 10 to 35, with annotations:]
  ← fixed effect model (K = 1)
  ← random effect models (1 < K < 82)
  saturated model (K = 82)

More generally, applied on the 'residuals' $h^{-1}(y_i) - x_i'\hat\beta$ of a GLM:
"zoom into the random effect distribution"

Examples for bandwidth selection

[Figure: mode tree; bandwidth h (0.1 to 5) against velocity/1000 (10 to 35), with the selected bandwidths marked]

Bandwidth selectors:
  BCV (1 mode)
  Silverman (3 modes)
  AIC (22 modes)

Bandwidth selection in 2 steps

• Calculate Silverman's optimal bandwidth

    $h_{opt} = 0.9\, A\, n^{-1/5}$, where $A = \min\{\hat\sigma, \mathrm{IQR}/1.34\}$

• From that bandwidth, climb down the mode tree until the next critical bandwidth

    $h_{crit} = \inf\{h : \hat f(\cdot, h) \text{ has at most } k \text{ modes}\}$   (Silverman, 1981)

  is reached.

Using $h_{crit}$, the mode tree gives
• an estimate for the number of modes, and thus for the number of components
• a very accurate estimate for the location of the mass points, which can be used either directly as mass point estimates $\hat z_k$, or as starting points $z_k^0$ for the EM algorithm.

In the latter case, one has to damp the standard deviation in the initial loops of the EM algorithm via

  $d_j = 1 - (1 - tol)^j$,   $(0 < tol \le 1)$

as otherwise the EM trajectories might be kicked out of the optimal solution immediately.

Application on galaxy data

Mode tree, critical bandwidth, and EM trajectories:

[Figure: mode tree with the critical bandwidth (axes: h, mass points) and the EM trajectories (axes: EM iterations, mass points)]

Climbing down the tree

[Figure, two panels: each shows a mode tree (axes: h, mass points) together with the EM trajectories (axes: EM iterations, mass points)]

Simulation study: 100 samples from fitted mixture distribution ($\sigma = \sigma_k$ = const).

[Diagram: four estimation methods, (1) GQ, (2) NPML, (3) NPML, (4) Mode Trees, distinguished by the starting points $z_k^0$ (Gauss-Hermite or density modes) and by the mass points $z_k$ (set equal to $z_k^0$, or obtained from $z_k^0$ via the EM algorithm)]

[Figure, two panels for methods 1 to 4: −2logL (450 to 600) and EM iterations (0 to 250)]

Summary

• The mode tree is a useful instrument to assess visually the number of components, even if the multimodal structure cannot be seen in the data cloud itself.
• Mode trees together with a suitable bandwidth selector give a useful recommendation for the choice of $K$. However, this is no reliable automatic routine, as small variations in the bandwidth may drastically change the number of detected modes.
• Given $K$, mode trees give a set of estimated mass points, which are in many cases so accurate that one hardly needs EM at all!
• When using mode trees together with the proposed damping procedure, the sensitivity to tuning parameters (in particular $tol$) is drastically reduced.

Everything more general....

• Replace Gaussian by another exponential family distribution
• Set an appropriate link function
• Include explanatory variables
• Random coefficient models
• Variance component models

..... is being implemented in an R package {npml} (Einbeck, Darnell, & Hinde), see www.nuigalway.ie/maths/je/npml.html

References

AITKIN, M. (1996): A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6, 251–262.
CARREIRA-PERPIÑAN, M. A. and WILLIAMS, C. K. I. (2003): On the number of modes of a Gaussian mixture. Lecture Notes in Computer Science 2695, 625–640.
HINDE, J. (1982): Compound Poisson regression models. Lecture Notes in Statistics 14, 109–121.
LAIRD, N. M. (1978): Nonparametric maximum likelihood estimation of a mixing distribution. JASA 73, 805–811.
MINNOTTE, M. C. and SCOTT, D. W. (1993): The mode tree: A tool for visualization of nonparametric density features. JCGS 2, 51–68.
RICHARDSON, S. and GREEN, P. (1997): On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion). JRSSB 59, 731–792.
SILVERMAN, B. W. (1981): Using kernel density estimates to investigate multimodality. JRSSB 43, 97–99.
```
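
Worked form of the EM updates. The "NPML Estimation" slide states that the score equations are weighted versions of the single-distribution score equations. For Gaussian components this takes the following standard form. The weight symbol $w_{ik}$ is notation introduced here for illustration and does not appear on the slides.

```latex
\begin{align*}
\text{E-step:}\quad
  & w_{ik} = \frac{\pi_k\, f(y_i \mid z_k, \sigma_k^2)}
                  {\sum_{l=1}^{K} \pi_l\, f(y_i \mid z_l, \sigma_l^2)} \\[4pt]
\text{M-step:}\quad
  & \hat z_k = \frac{\sum_{i=1}^{n} w_{ik}\, y_i}{\sum_{i=1}^{n} w_{ik}}, \qquad
    \hat\sigma_k^2 = \frac{\sum_{i=1}^{n} w_{ik}\,(y_i - \hat z_k)^2}{\sum_{i=1}^{n} w_{ik}}, \qquad
    \hat\pi_k = \frac{1}{n} \sum_{i=1}^{n} w_{ik}
\end{align*}
```

Each M-step estimate is the usual single-sample estimate (mean, variance, proportion) with observation $i$ weighted by its posterior probability of belonging to component $k$.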
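
Sketch of the damped EM algorithm. The slides describe an EM algorithm for the Gaussian mixture, with the standard deviation damped in the first loops via $d_j = 1 - (1 - tol)^j$. The following is a minimal sketch in plain Python, not the GLIM 4 or {npml} implementation. The function names, the defaults, the floor `min_sd`, and the choice to apply damping by multiplying each standard deviation by $d_j$ in the E-step of iteration $j$ are all assumptions made for illustration.

```python
import math


def log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def em_gaussian_mixture(y, z0, sd0=1.0, tol=0.5, n_iter=100, min_sd=1e-3):
    """EM for a K-component Gaussian mixture with component-specific sds.

    z0  : starting mass points (for example Gauss-Hermite points or density modes)
    sd0 : common starting value for all standard deviations
    tol : damping parameter, 0 < tol <= 1 (tol = 1 switches damping off)
    Assumption: in iteration j every sd is multiplied by d_j = 1 - (1 - tol)**j
    before the E-step, so the damping fades out as j grows.
    """
    n, K = len(y), len(z0)
    z, pi, sd = list(z0), [1.0 / K] * K, [sd0] * K
    for j in range(1, n_iter + 1):
        d = 1.0 - (1.0 - tol) ** j
        sd_used = [max(s * d, min_sd) for s in sd]
        # E-step: posterior weights w[i][k], computed on the log scale for stability.
        w = []
        for yi in y:
            logs = [math.log(pi[k]) + log_normal_pdf(yi, z[k], sd_used[k]) for k in range(K)]
            top = max(logs)
            dens = [math.exp(v - top) for v in logs]
            total = sum(dens)
            w.append([v / total for v in dens])
        # M-step: weighted proportion, mean, and standard deviation per component.
        for k in range(K):
            nk = sum(w[i][k] for i in range(n))
            if nk < 1e-12:
                continue  # practically empty component: leave its parameters unchanged
            pi[k] = nk / n
            z[k] = sum(w[i][k] * y[i] for i in range(n)) / nk
            var = sum(w[i][k] * (y[i] - z[k]) ** 2 for i in range(n)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
    return z, pi, sd


# Toy data (made up for illustration): two well-separated clusters.
toy = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
z_hat, pi_hat, sd_hat = em_gaussian_mixture(toy, z0=[1.0, 9.0])
```

With `tol = 1` the factor $d_j$ equals 1 in every iteration, which recovers the undamped algorithm. Smaller values shrink the standard deviations more strongly in the first iterations.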
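
Sketch of the two-step bandwidth selection. The slides count the modes of a kernel density estimate, start from Silverman's bandwidth, and climb down the mode tree to the next critical bandwidth. The sketch below illustrates this with a Gaussian kernel in plain Python. The function names, the grid-based mode search, and the bisection are assumptions for illustration and are not taken from the slides. The rule of thumb is coded with the usual exponent $-1/5$, and the bisection relies on Silverman's (1981) result that, for a Gaussian kernel, the number of modes does not increase with $h$.

```python
import math
import statistics


def kde(y, data, h):
    """Gaussian kernel density estimate f_hat(y, h) at a single point y."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((yi - y) / h) ** 2) for yi in data)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def find_modes(data, h, n_grid=512):
    """Locations of the local maxima of f_hat(., h) on an equally spaced grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    f = [kde(g, data, h) for g in grid]
    return [grid[i] for i in range(1, n_grid - 1) if f[i - 1] < f[i] >= f[i + 1]]


def silverman_bandwidth(data):
    """Rule of thumb h_opt = 0.9 * A * n**(-1/5) with A = min(sd, IQR / 1.34)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    a = min(statistics.pstdev(data), (q3 - q1) / 1.34)
    return 0.9 * a * len(data) ** (-0.2)


def critical_bandwidth(data, k, h_lo, h_hi, n_steps=40):
    """Bisection for h_crit = inf{h : f_hat(., h) has at most k modes}.

    Requires more than k modes at h_lo and at most k modes at h_hi.
    """
    for _ in range(n_steps):
        h_mid = 0.5 * (h_lo + h_hi)
        if len(find_modes(data, h_mid)) <= k:
            h_hi = h_mid
        else:
            h_lo = h_mid
    return h_hi


# Toy data (made up for illustration): two well-separated clusters.
toy = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
h_opt = silverman_bandwidth(toy)          # step 1: rule-of-thumb bandwidth
k = len(find_modes(toy, h_opt))           # number of modes at h_opt
h_crit = critical_bandwidth(toy, k, h_lo=0.01 * h_opt, h_hi=h_opt)  # step 2
start_points = find_modes(toy, h_crit)    # candidate mass points / EM starting points
```

The modes found at `h_crit` can then serve either directly as mass point estimates or as starting points for an EM run. The grid resolution `n_grid` limits how precisely closely spaced modes are separated, so it may need to be increased for larger data sets.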