EM algorithm

A Gaussian mixture model

- Consider a random variable $Y$ generated by a two-component mixture of normal distributions. That is,
  \[ Y = (1 - \Delta) Z_1 + \Delta Z_2, \]
  where $Z_1 \sim N(\mu_1, \sigma_1^2)$, $Z_2 \sim N(\mu_2, \sigma_2^2)$, $Z_1$ and $Z_2$ are independent, and $P(\Delta = 1) = \pi$.
- Suppose we observe an independent and identically distributed sample $y_1, \ldots, y_n$.
- The question of interest is to estimate $\pi$, $\mu_1$, $\mu_2$, $\sigma_1^2$ and $\sigma_2^2$.

Data plot

Figure: Histogram of the data and the density of the mixture normal. True parameter values used in this data set: $\mu_1 = 0.5$, $\mu_2 = 4$, $\sigma_1^2 = 0.8$, $\sigma_2^2 = 1.2$, $\pi = 0.5$.

Maximum likelihood estimators

Let $\phi(y; \mu_i, \sigma_i^2)$ ($i = 1, 2$) be the density of the normal distribution with mean $\mu_i$ and variance $\sigma_i^2$. The density of $Y$ is then
\[ f_Y(y) = (1 - \pi)\,\phi(y; \mu_1, \sigma_1^2) + \pi\,\phi(y; \mu_2, \sigma_2^2). \]
The log-likelihood for $\theta = (\pi, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)^T$ is
\[ \ell(\theta) = \sum_{i=1}^n \log\bigl\{(1 - \pi)\,\phi(y_i; \mu_1, \sigma_1^2) + \pi\,\phi(y_i; \mu_2, \sigma_2^2)\bigr\}. \]

Direct maximization of $\ell(\theta)$ is difficult: it is a non-linear function of $\theta$, and the logarithm contains a sum of terms. However, suppose we could observe the latent variable $\Delta$. Then the joint density of $(Y, \Delta)$ (called the complete data) is
\[ f(y, \Delta) = \bigl\{\phi(y; \mu_1, \sigma_1^2)^{1-\Delta}\,\phi(y; \mu_2, \sigma_2^2)^{\Delta}\bigr\}\bigl\{\pi^{\Delta}(1 - \pi)^{1-\Delta}\bigr\}. \]

The corresponding log-likelihood based on the complete data is
\[ \ell_0(\theta) = \sum_{i=1}^n \bigl[(1 - \Delta_i)\log\{\phi(y_i; \mu_1, \sigma_1^2)\} + \Delta_i\log\{\phi(y_i; \mu_2, \sigma_2^2)\}\bigr] + \sum_{i=1}^n \bigl[(1 - \Delta_i)\log(1 - \pi) + \Delta_i\log\pi\bigr]. \]
This likelihood is very easy to maximize; in fact, it has closed-form solutions. The problem is that the latent variables $\Delta_i$ are unobservable.

If the $\Delta_i$ were known, it is easy to show that the MLEs for $\mu_1$, $\mu_2$, $\sigma_1^2$ and $\sigma_2^2$ are
\[ \hat\mu_1 = \frac{\sum_{i=1}^n (1 - \Delta_i) y_i}{\sum_{i=1}^n (1 - \Delta_i)}, \qquad \hat\mu_2 = \frac{\sum_{i=1}^n \Delta_i y_i}{\sum_{i=1}^n \Delta_i}, \]
\[ \hat\sigma_1^2 = \frac{\sum_{i=1}^n (1 - \Delta_i)(y_i - \hat\mu_1)^2}{\sum_{i=1}^n (1 - \Delta_i)}, \qquad \hat\sigma_2^2 = \frac{\sum_{i=1}^n \Delta_i (y_i - \hat\mu_2)^2}{\sum_{i=1}^n \Delta_i}. \]
Also, the MLE for $\pi$ is
\[ \hat\pi = \frac{1}{n}\sum_{i=1}^n \Delta_i. \]

E-step

Since the $\Delta_i$ are unknown, we proceed in an iterative fashion, replacing each $\Delta_i$ in the complete-data log-likelihood by its conditional expectation
\[ \hat r_i^{(k-1)} = E(\Delta_i \mid y_i, \hat\theta^{(k-1)}), \]
where $\hat\theta^{(k-1)}$ is the parameter estimate from step $k-1$. After replacing the $\Delta_i$ by the $\hat r_i^{(k-1)}$, we denote the resulting objective by $\ell^*(\theta)$. This is the so-called E-step.

By the definition of the mixture model,
\[ \hat r_i^{(k-1)} = E(\Delta_i \mid y_i, \hat\theta^{(k-1)}) = \frac{\hat\pi^{(k-1)}\,\phi\bigl(y_i; \hat\mu_2^{(k-1)}, \hat\sigma_2^{2(k-1)}\bigr)}{(1 - \hat\pi^{(k-1)})\,\phi\bigl(y_i; \hat\mu_1^{(k-1)}, \hat\sigma_1^{2(k-1)}\bigr) + \hat\pi^{(k-1)}\,\phi\bigl(y_i; \hat\mu_2^{(k-1)}, \hat\sigma_2^{2(k-1)}\bigr)}. \]

M-step

In the M-step, we maximize $\ell^*(\theta)$ with respect to $\theta$. Since closed-form MLEs are still available, the M-step updates the parameters as
\[ \hat\mu_1^{(k)} = \frac{\sum_{i=1}^n (1 - \hat r_i^{(k-1)}) y_i}{\sum_{i=1}^n (1 - \hat r_i^{(k-1)})}, \qquad \hat\mu_2^{(k)} = \frac{\sum_{i=1}^n \hat r_i^{(k-1)} y_i}{\sum_{i=1}^n \hat r_i^{(k-1)}}, \]
\[ \hat\sigma_1^{2(k)} = \frac{\sum_{i=1}^n (1 - \hat r_i^{(k-1)})(y_i - \hat\mu_1^{(k)})^2}{\sum_{i=1}^n (1 - \hat r_i^{(k-1)})}, \qquad \hat\sigma_2^{2(k)} = \frac{\sum_{i=1}^n \hat r_i^{(k-1)}(y_i - \hat\mu_2^{(k)})^2}{\sum_{i=1}^n \hat r_i^{(k-1)}}. \]
Also, the update for $\pi$ is
\[ \hat\pi^{(k)} = \frac{1}{n}\sum_{i=1}^n \hat r_i^{(k-1)}. \]
Iterate the E- and M-steps until the parameter estimates converge; a code sketch of these updates follows the figure below.

Figure: EM algorithm: log-likelihood as a function of the iteration number.
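The E- and M-step updates above translate directly into code. Below is a minimal sketch in Python/NumPy; the function name em_two_component, the initialization strategy (two randomly chosen observations as starting means, the sample variance as starting variances) and the convergence tolerance are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from scipy.stats import norm


def em_two_component(y, n_iter=200, tol=1e-8, seed=None):
    """EM for the two-component Gaussian mixture
    f_Y(y) = (1 - pi) * phi(y; mu1, s1) + pi * phi(y; mu2, s2)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size

    # Step 1: crude initial guesses (illustrative choice, not from the slides).
    mu1, mu2 = rng.choice(y, size=2, replace=False)
    s1 = s2 = y.var()
    pi = 0.5

    loglik = []
    for _ in range(n_iter):
        # E-step: responsibilities r_i = E(Delta_i | y_i, current theta).
        d1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
        d2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
        r = d2 / (d1 + d2)

        # M-step: closed-form weighted updates from the slides.
        mu1 = np.sum((1 - r) * y) / np.sum(1 - r)
        mu2 = np.sum(r * y) / np.sum(r)
        s1 = np.sum((1 - r) * (y - mu1) ** 2) / np.sum(1 - r)
        s2 = np.sum(r * (y - mu2) ** 2) / np.sum(r)
        pi = r.mean()

        # Observed-data log-likelihood, used to monitor convergence.
        ll = np.sum(np.log((1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
                           + pi * norm.pdf(y, mu2, np.sqrt(s2))))
        loglik.append(ll)
        if len(loglik) > 1 and abs(loglik[-1] - loglik[-2]) < tol:
            break

    return {"pi": pi, "mu1": mu1, "mu2": mu2,
            "sigma1_sq": s1, "sigma2_sq": s2, "loglik": loglik}
```

Recording the observed-data log-likelihood at each iteration is what produces monotone convergence curves like the one in the figure above.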
Figure: EM algorithm: estimate of $\pi$ as a function of the iteration number.

Estimation of the unknown parameters using the EM algorithm

Applying the EM algorithm, the final parameter estimates are
\[ \hat\mu_1 = 0.429648, \quad \hat\sigma_1^2 = 0.6601889, \quad \hat\mu_2 = 3.934748, \quad \hat\sigma_2^2 = 1.737634, \quad \hat\pi = 0.5429003. \]

EM algorithm for general missing data problems

- Suppose that our observed data is $z$. The log-likelihood for the observed data is $\ell(\theta; z)$, which depends on some unknown parameters $\theta$.
- The latent or missing data is $z^m$. In mixed models, the latent data is typically the random effects.
- The complete data is $w = (z, z^m)$, with log-likelihood $\ell_0(\theta; w)$ based on the complete-data density.
- In the Gaussian mixture problem, $w = (y, \Delta)$.

EM algorithm

Step 1: start with some initial guesses for the parameters, say $\hat\theta^{(0)}$.
Step 2 (E-step): at the $j$-th step ($j = 1, 2, \ldots$), compute
\[ Q(\theta, \hat\theta^{(j-1)}) = E\bigl\{\ell_0(\theta; w) \mid z; \hat\theta^{(j-1)}\bigr\} \]
as a function of the argument $\theta$.
Step 3 (M-step): determine the new estimate $\hat\theta^{(j)}$ as the maximizer of $Q(\theta, \hat\theta^{(j-1)})$ over $\theta$.
Step 4: iterate Steps 2 and 3 until convergence.
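To connect Steps 1-4 back to the mixture example, here is a short usage sketch that simulates data from the mixture using the true parameter values reported earlier ($\mu_1 = 0.5$, $\mu_2 = 4$, $\sigma_1^2 = 0.8$, $\sigma_2^2 = 1.2$, $\pi = 0.5$) and runs the em_two_component function from the earlier sketch. The sample size n = 1000 and the random seed are assumptions (the slides do not report them), so the printed estimates will not reproduce the values above exactly.

```python
import numpy as np

# Simulate from Y = (1 - Delta) Z1 + Delta Z2 with the slides' true values.
# n = 1000 and the seed are assumptions (not reported in the slides).
rng = np.random.default_rng(1)
n = 1000
delta = rng.binomial(1, 0.5, size=n)                  # P(Delta = 1) = pi = 0.5
y = np.where(delta == 0,
             rng.normal(0.5, np.sqrt(0.8), size=n),   # Z1 ~ N(0.5, 0.8)
             rng.normal(4.0, np.sqrt(1.2), size=n))   # Z2 ~ N(4.0, 1.2)

# Steps 1-4: initialize, then alternate E- and M-steps until convergence.
fit = em_two_component(y)   # assumes the sketch above is in scope
print({k: round(float(v), 4) for k, v in fit.items() if k != "loglik"})

# A basic sanity check: the observed-data log-likelihood never decreases.
ll = fit["loglik"]
print("monotone non-decreasing:",
      all(b >= a - 1e-10 for a, b in zip(ll, ll[1:])))
```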