EM algorithm for mixed effects model

A linear mixed effects model

- Consider the following linear mixed effects model
\[
Y_{ij} = \beta_0 + u_i + x_{ij}^T \beta + \varepsilon_{ij},
\]
where $\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma_e^2)$, $u_i \overset{\text{iid}}{\sim} N(0, \sigma_u^2)$, and $\varepsilon_{ij}$ and $u_i$ are independent.
- The unknown parameters are collected in $\delta = (\beta_0, \beta, \sigma_e^2, \sigma_u^2)$.

The joint likelihood

The joint likelihood for $\delta$ is
\[
L(\delta) = \prod_{i=1}^m f(y_i)
= \prod_{i=1}^m \int f(y_i, u_i)\, du_i
= \prod_{i=1}^m \int f(y_i \mid u_i) f(u_i)\, du_i
\]
\[
= \prod_{i=1}^m \int \prod_{j=1}^{n_i} \frac{1}{\sqrt{2\pi}\,\sigma_e}
\exp\Big\{-\frac{(y_{ij} - \beta_0 - u_i - x_{ij}^T \beta)^2}{2\sigma_e^2}\Big\}
\times \frac{1}{\sqrt{2\pi}\,\sigma_u} \exp\Big\{-\frac{u_i^2}{2\sigma_u^2}\Big\}\, du_i.
\]

Maximum likelihood estimators

Integrating out $u_i$, the joint likelihood for $\delta$ can then be written as
\[
L(\delta) = \prod_{i=1}^m c_i \sqrt{2\pi a_i}
\exp\Big\{-\frac{1}{2\sigma_e^2} \sum_{j=1}^{n_i} (y_{ij} - \beta_0 - x_{ij}^T \beta)^2\Big\}
\exp\Big(\frac{a_i b_i^2}{2}\Big),
\]
where
\[
c_i = \Big(\frac{1}{\sqrt{2\pi}\,\sigma_e}\Big)^{n_i} \frac{1}{\sqrt{2\pi}\,\sigma_u}, \qquad
a_i^{-1} = \frac{n_i}{\sigma_e^2} + \frac{1}{\sigma_u^2}, \qquad
b_i = \frac{1}{\sigma_e^2} \sum_{j=1}^{n_i} (y_{ij} - \beta_0 - x_{ij}^T \beta).
\]
Thus, the maximum likelihood estimator of $\delta$ is $\hat\delta = \arg\max_\delta L(\delta)$.

EM algorithm

For the above linear mixed effects model,
- Complete data: $(y_i, u_i)$;
- Observed data: $y_i$.

The log-likelihood for the complete data is
\[
\ell(\delta; y, u) = -\frac{1}{2}\Big(\sum_{i=1}^m n_i\Big) \log(2\pi\sigma_e^2) - \frac{m}{2}\log(\sigma_u^2)
- \frac{1}{2\sigma_e^2} \sum_{i=1}^m \sum_{j=1}^{n_i} (y_{ij} - \beta_0 - u_i - x_{ij}^T \beta)^2
- \frac{1}{2\sigma_u^2} \sum_{i=1}^m u_i^2.
\]

EM algorithm

Step 1: Start with some initial values $\sigma_u^{2(0)}$, $\sigma_e^{2(0)}$, $\beta_0^{(0)}$ and $\beta^{(0)}$.
Step 2: (E-step) Evaluate the expectation of the complete-data log-likelihood given the observed data and the estimate from the last iteration, namely
\[
Q(\delta, \delta^{(k-1)}) = E[\ell(\delta; y, u) \mid y; \delta^{(k-1)}].
\]
Step 3: (M-step) Update $\delta$ by
\[
\delta^{(k)} = \arg\max_\delta Q(\delta, \delta^{(k-1)}).
\]
Step 4: Repeat Steps 2 and 3 until convergence.

M-step when $\sigma_u^2$ and $\sigma_e^2$ are known

If $\sigma_u^2$ and $\sigma_e^2$ are known, let $\delta = (\beta_0, \beta)^T$. The M-step then involves solving the estimating equations
\[
\frac{\partial Q(\delta, \delta^{(k-1)})}{\partial \delta}
= E\Big[\sum_{i=1}^m \sum_{j=1}^{n_i} \binom{1}{x_{ij}} (y_{ij} - \beta_0 - u_i - x_{ij}^T \beta) \,\Big|\, y; \delta^{(k-1)}\Big] = 0.
\]
Therefore, we can update the parameters $(\beta_0, \beta)^T$ by
\[
\binom{\beta_0^{(k)}}{\beta^{(k)}}
= \Big[\sum_{i=1}^m \sum_{j=1}^{n_i} \binom{1}{x_{ij}} (1, x_{ij}^T)\Big]^{-1}
\sum_{i=1}^m \sum_{j=1}^{n_i} \binom{1}{x_{ij}} \{y_{ij} - E(u_i \mid y; \delta^{(k-1)})\}.
\]

M-step

Because the joint distribution of $u_i$ and $y_i$ is normal, specifically
\[
\binom{u_i}{y_i} \sim N\Big(\binom{0}{\beta_0 1 + X_i \beta},
\begin{pmatrix} \sigma_u^2 & \sigma_u^2 1^T \\ \sigma_u^2 1 & \sigma_e^2 I_{n_i} + \sigma_u^2 1 1^T \end{pmatrix}\Big),
\]
where $X_i = (x_{i1}, \cdots, x_{i n_i})^T$ is an $n_i \times p$ matrix, it can be shown that
\[
E(u_i \mid y; \delta^{(k-1)}) = \sigma_u^2 1^T (\sigma_e^2 I_{n_i} + \sigma_u^2 1 1^T)^{-1} (y_i - \beta_0^{(k-1)} 1 - X_i \beta^{(k-1)}).
\]

EM algorithm when $\sigma_u^2$ and $\sigma_e^2$ are known

Step 1: Start with some initial values $\beta_0^{(0)}$ and $\beta^{(0)}$.
Step 2: At the $k$-th iteration, update $\beta_0, \beta$ by
\[
\binom{\beta_0^{(k)}}{\beta^{(k)}}
= \Big[\sum_{i=1}^m \sum_{j=1}^{n_i} \binom{1}{x_{ij}} (1, x_{ij}^T)\Big]^{-1}
\sum_{i=1}^m \sum_{j=1}^{n_i} \binom{1}{x_{ij}} \{y_{ij} - E(u_i \mid y; \delta^{(k-1)})\},
\]
where
\[
E(u_i \mid y; \delta^{(k-1)}) = \sigma_u^2 1^T (\sigma_e^2 I_{n_i} + \sigma_u^2 1 1^T)^{-1} (y_i - \beta_0^{(k-1)} 1 - X_i \beta^{(k-1)}).
\]
Step 3: Repeat Step 2 until convergence.
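To make the iteration concrete, here is a minimal Python sketch of the known-variance EM updates above. The function name, argument layout, and convergence rule are illustrative assumptions, not part of the slides; the E-step uses the simplification $1^T(\sigma_e^2 I_{n_i} + \sigma_u^2 11^T)^{-1} r_i = \sum_j r_{ij}/(\sigma_e^2 + n_i \sigma_u^2)$, which follows from the Sherman–Morrison formula.

```python
import numpy as np

def em_lmm_known_variances(y, X, groups, sigma2_u, sigma2_e,
                           max_iter=100, tol=1e-8):
    """EM for the random-intercept LMM when sigma_u^2 and sigma_e^2 are known.

    y: (N,) responses; X: (N, p) covariates without an intercept column;
    groups: (N,) cluster labels, one per observation.
    """
    N, p = X.shape
    W = np.column_stack([np.ones(N), X])      # rows are (1, x_ij^T)
    WtW_inv = np.linalg.inv(W.T @ W)          # constant across iterations
    beta = np.zeros(p + 1)                    # (beta_0, beta^T)^T
    labels = np.unique(groups)

    for _ in range(max_iter):
        # E-step: E(u_i | y) = sigma_u^2 sum_j r_ij / (sigma_e^2 + n_i sigma_u^2)
        r = y - W @ beta
        u_hat = np.zeros(N)
        for g in labels:
            idx = groups == g
            u_hat[idx] = sigma2_u * r[idx].sum() / (sigma2_e + idx.sum() * sigma2_u)
        # M-step: least squares of the adjusted responses y_ij - E(u_i | y) on (1, x_ij)
        beta_new = WtW_inv @ (W.T @ (y - u_hat))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Because the M-step is an ordinary least-squares fit of the adjusted responses on the same design at every pass, $W^T W$ can be factored once outside the loop.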
Logistic regression model with random intercept

Consider the logistic regression model with a random intercept:
\[
Y_{ij} \mid U_i \overset{\text{independent}}{\sim} \text{Bernoulli}(p_{ij}); \qquad
\log \frac{p_{ij}}{1 - p_{ij}} = \gamma_i + x_{ij}^T \beta,
\]
where $\gamma_i = \beta_0 + U_i$ and $U_i \overset{\text{iid}}{\sim} N(0, \sigma_u^2)$ for $i = 1, \cdots, m$ and $j = 1, \cdots, n_i$.

Likelihood function and MLE

The likelihood function for $\delta = (\beta_0, \beta^T, \sigma_u^2)^T$ is
\[
L(\delta) = \prod_{i=1}^m \int \prod_{j=1}^{n_i}
\frac{\exp\{(\beta_0 + u_i + x_{ij}^T \beta) y_{ij}\}}{1 + \exp(\beta_0 + u_i + x_{ij}^T \beta)}
\times \frac{1}{\sqrt{2\pi}\,\sigma_u} \exp\Big(-\frac{u_i^2}{2\sigma_u^2}\Big)\, du_i.
\]
Therefore, the maximum likelihood estimator for $\delta$ is $\hat\delta = \arg\max L(\delta)$. In many cases the likelihood function $L(\delta)$ does not have a closed form, and numerical methods are required to obtain the MLE.

EM algorithm when $\sigma_u^2$ is known

Step 1: Start with some initial values $\beta_0^{(0)}$ and $\beta^{(0)}$.
Step 2: At the $k$-th iteration, update $\beta_0, \beta$ by solving the estimating equations
\[
\sum_{i=1}^m \sum_{j=1}^{n_i} \binom{1}{x_{ij}} \{y_{ij} - E(p_{ij}(U_i) \mid y; \delta^{(k-1)})\} = 0,
\]
where
\[
p_{ij}(u_i) = \frac{\exp(\beta_0 + u_i + x_{ij}^T \beta)}{1 + \exp(\beta_0 + u_i + x_{ij}^T \beta)}.
\]
Step 3: Repeat Step 2 until convergence.

EM algorithm

By the definition of $p_{ij}(U_i)$, the above algorithm requires evaluating
\[
E(p_{ij}(U_i) \mid y_i; \delta^{(k-1)})
= \int \frac{\exp(\beta_0 + u_i + x_{ij}^T \beta)}{1 + \exp(\beta_0 + u_i + x_{ij}^T \beta)}\, f(u_i \mid y_i; \delta^{(k-1)})\, du_i.
\]
In practice, this integration could be evaluated by Monte Carlo simulation, but it may still be challenging. An alternative approach is to replace $E(p_{ij}(U_i) \mid y_i; \delta^{(k-1)})$ by $p_{ij}(\hat U_i)$, where $\hat U_i$ is the BLUP of $U_i$. This avoids the integration and results in an algorithm similar to the IRWLS method; both options are sketched after the final algorithm below.

EM algorithm

- Let $Z_{ij}$ be the surrogate response and $Z_i = (Z_{i1}, \cdots, Z_{in_i})^T$, where
\[
Z_{ij} = \log\Big(\frac{p_{ij}}{1 - p_{ij}}\Big) + (y_{ij} - p_{ij}) \frac{1}{p_{ij}(1 - p_{ij})}.
\]
- Let $V_i = Q_i + \sigma_u^2 1 1^T$, where $Q_i = \mathrm{Diag}\big(\frac{1}{p_{ij}(1 - p_{ij})}\big)$.

EM algorithm

Step 1: Start with some initial values $\beta_0^{(0)}$, $\beta^{(0)}$ and $\sigma_u^{2(0)}$.
Step 2: At the $k$-th iteration, update $\beta_0, \beta$ by
\[
\binom{\beta_0^{(k)}}{\beta^{(k)}}
= \Big(\sum_{i=1}^m X_i^T V_i^{-1(k-1)} X_i\Big)^{-1} \sum_{i=1}^m X_i^T V_i^{-1(k-1)} Z_i
\]
(with $X_i$ now including a leading column of ones), where $\hat U_i = \sigma_u^{2(k-1)} 1^T V_i^{-1} (Z_i - X_i \beta^{(k-1)})$, and update $\sigma_u^2$ by
\[
\sigma_u^{2(k)} = \frac{1}{m} \sum_{i=1}^m \hat U_i^2
+ \frac{1}{m} \sum_{i=1}^m \Big(\sum_{j=1}^{n_i} p_{ij}(1 - p_{ij}) + \hat\sigma_u^{-2(k-1)}\Big)^{-1},
\]
where the second term is the conditional variance of $U_i$ under the working model for $Z_i$.
Step 3: Repeat Step 2 until convergence.
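The Monte Carlo evaluation of $E(p_{ij}(U_i) \mid y_i; \delta^{(k-1)})$ mentioned above can be sketched as follows. This is one possible scheme, self-normalized importance sampling with the prior $N(0, \sigma_u^2)$ as the proposal; the function name and the number of draws are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def e_step_monte_carlo(yi, Xi, beta, sigma2_u, n_draws=2000):
    """Approximate E(p_ij(U_i) | y_i) for one cluster by drawing U_i from
    its prior N(0, sigma_u^2) and weighting each draw by the Bernoulli
    likelihood f(y_i | u).  Xi includes the intercept column, so Xi @ beta
    gives beta_0 + x_ij^T beta.
    """
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n_draws)
    eta = Xi @ beta                                          # shape (n_i,)
    p = 1.0 / (1.0 + np.exp(-(eta[None, :] + u[:, None])))   # (n_draws, n_i)
    # unnormalized weights: f(y_i | u^(s)) = prod_j p^y (1 - p)^(1 - y)
    w = np.prod(np.where(yi[None, :] == 1, p, 1.0 - p), axis=1)
    w /= w.sum()                                             # self-normalize
    return w @ p                                             # length n_i
```

Drawing from the prior keeps the sketch simple, but the weights can degenerate when $n_i$ is large, which is the practical difficulty the slide alludes to.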
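The surrogate-response algorithm itself can be sketched in the same vein. The loop body follows the updates on the final slide; the starting value $\sigma_u^{2(0)} = 1$, the convergence rule, and all names are illustrative assumptions, and $X$ is taken to include a leading column of ones as in the final update.

```python
import numpy as np

def em_logistic_random_intercept(y, X, groups, max_iter=200, tol=1e-6):
    """Surrogate-response (IRWLS-type) EM sketch for the logistic
    random-intercept model; X includes a leading column of ones, so the
    first coefficient plays the role of beta_0.
    """
    N, p = X.shape
    labels, g = np.unique(groups, return_inverse=True)
    m = len(labels)
    beta = np.zeros(p)
    u_hat = np.zeros(m)
    sigma2_u = 1.0                                 # illustrative sigma_u^{2(0)}

    for _ in range(max_iter):
        eta = X @ beta + u_hat[g]                  # beta_0 + u_i + x_ij^T beta
        pij = 1.0 / (1.0 + np.exp(-eta))
        w = pij * (1.0 - pij)                      # p_ij (1 - p_ij)
        z = eta + (y - pij) / w                    # surrogate response Z_ij

        # beta-update: (sum_i X_i^T V_i^{-1} X_i)^{-1} sum_i X_i^T V_i^{-1} Z_i
        A = np.zeros((p, p))
        b = np.zeros(p)
        for k in range(m):
            idx = g == k
            # V_i = Q_i + sigma_u^2 11^T; adding the scalar fills in sigma_u^2 11^T
            Vi_inv = np.linalg.inv(np.diag(1.0 / w[idx]) + sigma2_u)
            A += X[idx].T @ Vi_inv @ X[idx]
            b += X[idx].T @ Vi_inv @ z[idx]
        beta_new = np.linalg.solve(A, b)

        # BLUP update U_i = sigma_u^2 1^T V_i^{-1} (Z_i - X_i beta), then the
        # variance update: mean of U_i^2 plus the mean conditional variance
        cond_var = 0.0
        for k in range(m):
            idx = g == k
            Vi_inv = np.linalg.inv(np.diag(1.0 / w[idx]) + sigma2_u)
            u_hat[k] = sigma2_u * Vi_inv.sum(axis=0) @ (z[idx] - X[idx] @ beta_new)
            cond_var += 1.0 / (w[idx].sum() + 1.0 / sigma2_u)
        sigma2_u = np.mean(u_hat ** 2) + cond_var / m

        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, u_hat, sigma2_u
```

Each pass recomputes $p_{ij}$, $Z_{ij}$ and $V_i$ at the current estimates, which is what makes the scheme resemble IRWLS rather than a plain EM iteration.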