EM algorithm

A Gaussian mixture model

- Consider a random variable Y generated by a two-component normal mixture. That is,

      Y = (1 − ∆)Z1 + ∆Z2,

  where Z1 ∼ N(µ1, σ1²), Z2 ∼ N(µ2, σ2²), Z1 and Z2 are independent, and P(∆ = 1) = π.
- Suppose we observe n independent and identically distributed observations y1, · · · , yn.
- The question of interest is to estimate π, µ1, µ2, σ1² and σ2².
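
As a concrete illustration (not part of the original slides), the following Python sketch simulates data from this two-component mixture. The true parameter values are those quoted in the figure caption on the next slide; the sample size n = 1000 and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# True values taken from the slides' figure caption
pi_true = 0.5
mu1, sigma1 = 0.5, np.sqrt(0.8)   # component 1: N(0.5, 0.8)
mu2, sigma2 = 4.0, np.sqrt(1.2)   # component 2: N(4, 1.2)

n = 1000                                  # sample size (arbitrary choice)
delta = rng.binomial(1, pi_true, size=n)  # latent indicators, P(delta = 1) = pi
y = (1 - delta) * rng.normal(mu1, sigma1, n) + delta * rng.normal(mu2, sigma2, n)
```
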
Data plot

[Figure: "Histogram of Y" (Density vs. Y), with the mixture density overlaid.]
Figure: Histogram of the data and the density plot of the mixture normal.
True parameter values used in this data set: µ1 = 0.5, µ2 = 4, σ1² = 0.8, σ2² = 1.2, π = 0.5.
Maximum likelihood estimators

Let φ(y; µi, σi²) (i = 1, 2) be the density of the normal distribution with
mean µi and variance σi². The density of Y is then

    fY(y) = (1 − π)φ(y; µ1, σ1²) + πφ(y; µ2, σ2²).

Then the log-likelihood for θ = (π, µ1, σ1², µ2, σ2²)ᵀ is

    ℓ(θ) = Σ_{i=1}^n log[(1 − π)φ(yi; µ1, σ1²) + πφ(yi; µ2, σ2²)].
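
This log-likelihood is straightforward to evaluate numerically; a minimal sketch (the parameter ordering and function name are my own) using scipy:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(theta, y):
    """Observed-data log-likelihood for theta = (pi, mu1, s1sq, mu2, s2sq)."""
    pi, mu1, s1sq, mu2, s2sq = theta
    mix = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1sq)) + pi * norm.pdf(y, mu2, np.sqrt(s2sq))
    return np.sum(np.log(mix))
```
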
Maximum likelihood estimators

Direct maximization of the likelihood function ℓ(θ) is difficult, since it is a
non-linear function of θ and the sum of terms sits inside the logarithm.

However, suppose that we could observe the latent variable ∆. Then the joint
density of (Y, ∆) (called the complete data) is

    f(y, ∆) = {φ(y; µ1, σ1²)^(1−∆) φ(y; µ2, σ2²)^∆} {π^∆ (1 − π)^(1−∆)}.
Maximum likelihood estimators

Then the corresponding log-likelihood based on the complete data is

    ℓ(θ) = Σ_{i=1}^n [(1 − ∆i) log{φ(yi; µ1, σ1²)} + ∆i log{φ(yi; µ2, σ2²)}]
         + Σ_{i=1}^n [(1 − ∆i) log(1 − π) + ∆i log(π)].

This likelihood function is very easy to maximize; we even have closed-form
solutions. The problem is that the latent variables ∆i are unobservable.
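
If the ∆i's were available, the complete-data log-likelihood above could be evaluated directly; a sketch under that (hypothetical) assumption, with names of my own choosing:

```python
import numpy as np
from scipy.stats import norm

def complete_log_likelihood(theta, y, delta):
    """Complete-data log-likelihood; delta holds the (assumed observed) indicators."""
    pi, mu1, s1sq, mu2, s2sq = theta
    ll = np.sum((1 - delta) * norm.logpdf(y, mu1, np.sqrt(s1sq))
                + delta * norm.logpdf(y, mu2, np.sqrt(s2sq)))
    ll += np.sum((1 - delta) * np.log(1 - pi) + delta * np.log(pi))
    return ll
```
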
Maximum likelihood estimators

If the ∆i's are known, it is easy to show that the MLEs for µ1, µ2, σ1² and σ2² are

    µ̂1 = Σ_{i=1}^n (1 − ∆i) yi / Σ_{i=1}^n (1 − ∆i),      µ̂2 = Σ_{i=1}^n ∆i yi / Σ_{i=1}^n ∆i,

and

    σ̂1² = Σ_{i=1}^n (1 − ∆i)(yi − µ̂1)² / Σ_{i=1}^n (1 − ∆i),      σ̂2² = Σ_{i=1}^n ∆i (yi − µ̂2)² / Σ_{i=1}^n ∆i.

Also, the MLE for π is

    π̂ = (1/n) Σ_{i=1}^n ∆i.
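
These closed-form expressions translate line by line into code; a sketch, again pretending the indicators ∆i are observed:

```python
import numpy as np

def mle_given_delta(y, delta):
    """Closed-form MLEs when the component indicators delta are known."""
    n0, n1 = np.sum(1 - delta), np.sum(delta)
    mu1 = np.sum((1 - delta) * y) / n0
    mu2 = np.sum(delta * y) / n1
    s1sq = np.sum((1 - delta) * (y - mu1) ** 2) / n0
    s2sq = np.sum(delta * (y - mu2) ** 2) / n1
    pi = n1 / len(y)
    return pi, mu1, s1sq, mu2, s2sq
```
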
E-Step

Since the ∆i's are unknown, we proceed in an iterative fashion, replacing each
∆i by its conditional expected value

    r̂i(k−1) = E(∆i | yi, θ̂(k−1))

in the complete-data log-likelihood ℓ(θ). Here θ̂(k−1) is the parameter estimate
from step k − 1. After replacing the ∆i's by the r̂i's, we denote the new
likelihood function by ℓ*(θ). This is the so-called E-step.
E-Step

By the definition of the mixture model, we can show that

    r̂i(k−1) = E(∆i | yi, θ̂(k−1))
            = π̂(k−1) φ(yi; µ̂2(k−1), σ̂2²(k−1)) /
              [(1 − π̂(k−1)) φ(yi; µ̂1(k−1), σ̂1²(k−1)) + π̂(k−1) φ(yi; µ̂2(k−1), σ̂2²(k−1))].
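
In code, the E-step is a single vectorized computation of these responsibilities (a minimal sketch; the names are my own):

```python
import numpy as np
from scipy.stats import norm

def e_step(theta, y):
    """Responsibilities r_i = E(Delta_i | y_i, theta) under the current estimate."""
    pi, mu1, s1sq, mu2, s2sq = theta
    num = pi * norm.pdf(y, mu2, np.sqrt(s2sq))
    den = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1sq)) + num
    return num / den
```
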
M-Step

In the M-step, we maximize the likelihood function ℓ*(θ) with respect to θ.
Since we can still find closed-form MLEs, the M-step updates the parameters as
follows:

    µ̂1(k) = Σ_{i=1}^n (1 − r̂i(k−1)) yi / Σ_{i=1}^n (1 − r̂i(k−1)),
    µ̂2(k) = Σ_{i=1}^n r̂i(k−1) yi / Σ_{i=1}^n r̂i(k−1),

and

    σ̂1²(k) = Σ_{i=1}^n (1 − r̂i(k−1))(yi − µ̂1(k−1))² / Σ_{i=1}^n (1 − r̂i(k−1)),
    σ̂2²(k) = Σ_{i=1}^n r̂i(k−1)(yi − µ̂2(k−1))² / Σ_{i=1}^n r̂i(k−1).

Also, the MLE for π is π̂(k) = n⁻¹ Σ_{i=1}^n r̂i(k−1).

Iterate the E- and M-steps until the parameter estimates converge.
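
Putting the two steps together gives the full iteration. The sketch below is one way to organize it (starting values, tolerance and iteration cap are arbitrary choices); note that, unlike the display above, the variance updates here plug in the freshly updated means, which is the usual closed-form M-step.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(y, theta0, tol=1e-8, max_iter=500):
    """EM for a two-component normal mixture; theta = (pi, mu1, s1sq, mu2, s2sq)."""
    pi, mu1, s1sq, mu2, s2sq = theta0
    old_ll = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities r_i = E(Delta_i | y_i, current theta)
        num = pi * norm.pdf(y, mu2, np.sqrt(s2sq))
        den = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1sq)) + num
        r = num / den
        # M-step: weighted closed-form updates
        mu1 = np.sum((1 - r) * y) / np.sum(1 - r)
        mu2 = np.sum(r * y) / np.sum(r)
        s1sq = np.sum((1 - r) * (y - mu1) ** 2) / np.sum(1 - r)
        s2sq = np.sum(r * (y - mu2) ** 2) / np.sum(r)
        pi = np.mean(r)
        # Track the observed-data log-likelihood and stop once it stabilizes
        ll = np.sum(np.log((1 - pi) * norm.pdf(y, mu1, np.sqrt(s1sq))
                           + pi * norm.pdf(y, mu2, np.sqrt(s2sq))))
        if abs(ll - old_ll) < tol:
            break
        old_ll = ll
    return (pi, mu1, s1sq, mu2, s2sq), ll
```

A standard property of EM is that the observed-data log-likelihood never decreases across iterations, which is the behaviour shown in the next figure.
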

[Figure: "Likelihood function as a function of iteration" (log-likelihood values vs. Iterations).]
Figure: EM algorithm: log-likelihood as a function of the iteration number.

[Figure: "Estimation of π as a function of iteration" (estimate of π vs. Iterations).]
Figure: EM algorithm: estimation of π as a function of the iteration number.

Estimation of the unknown parameters using the EM algorithm

Applying the EM algorithm, the final parameter estimates are

    µ̂1 = 0.429648,  σ̂1² = 0.6601889,
    µ̂2 = 3.934748,  σ̂2² = 1.737634,
    π̂ = 0.5429003.
EM algorithm for general missing data problems

- Suppose that our observed data is z. The log-likelihood function for the
  observed data is ℓ(θ; z), depending on some unknown parameters θ.
- The latent or missing data is z^m. In mixed models, the latent data is
  typically the random effects.
- The complete data is w = (z, z^m), with log-likelihood ℓ0(θ; w). The
  log-likelihood function for the complete data is based on the complete-data
  density.
- In the Gaussian mixture problem, w = (y, ∆).
EM algorithm

Step 1: Start with some initial guesses for the parameters, say θ̂(0).
Step 2 (E-Step): At the j-th step (j = 1, 2, · · · ), compute

    Q(θ, θ̂(j−1)) = E(ℓ0(θ; w) | z; θ̂(j−1))

as a function of the argument θ.
Step 3 (M-Step): Determine the new estimate θ̂(j) as the maximizer of
Q(θ, θ̂(j−1)) over θ.
Step 4: Iterate Steps 2 and 3 until convergence.
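
These four steps can also be written as a generic skeleton that is independent of the particular model. The interface below is purely illustrative (not from the slides): the caller supplies an E-step that returns the function Q(·, θ̂(j−1)), an M-step that maximizes it, and a convergence check.

```python
from typing import Callable, TypeVar

Theta = TypeVar("Theta")

def em(theta0: Theta,
       e_step: Callable[[Theta], Callable[[Theta], float]],   # returns Q(., theta_prev)
       m_step: Callable[[Callable[[Theta], float]], Theta],   # maximizes Q over theta
       converged: Callable[[Theta, Theta], bool],
       max_iter: int = 1000) -> Theta:
    """Generic EM: alternate E- and M-steps until the estimates converge."""
    theta = theta0                       # Step 1: initial guess
    for _ in range(max_iter):
        Q = e_step(theta)                # Step 2 (E-Step): build Q(theta, theta_prev)
        theta_new = m_step(Q)            # Step 3 (M-Step): maximize Q over theta
        if converged(theta_new, theta):  # Step 4: stop when estimates stabilize
            return theta_new
        theta = theta_new
    return theta
```
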