   

Stat 543 EM Algorithm Example
Suppose that n iid trials each produce one of k possible outcomes. If

p_j = the probability that any trial produces outcome j

and

X_{ij} = I[ trial i produces outcome j ]

and

n_j = \sum_{i=1}^{n} X_{ij} = the number of trials producing outcome j,

a joint pmf for all nk of the variables X_{ij} is

f(x | p) = \left( \prod_{j=1}^{k-1} p_j^{n_j} \right) \left( 1 - \sum_{j=1}^{k-1} p_j \right)^{n_k}
A log-likelihood is

l_X(p) = \sum_{j=1}^{k} n_j \ln p_j

and it is possible to argue that an MLE of p is

\hat{p} = \left( \frac{n_1}{n}, \frac{n_2}{n}, \ldots, \frac{n_k}{n} \right).

For sake of example, suppose that what is observed in n = 10 trials in a problem where k = 4 is as below:

trial 1:  outcome is 1        X_{1,1} = 1
trial 2:  outcome is 3        X_{2,3} = 1
trial 3:  outcome is 2 or 4   X_{3,2} + X_{3,4} = 1
trial 4:  outcome is 2        X_{4,2} = 1
trial 5:  outcome is 3        X_{5,3} = 1
trial 6:  outcome is 2 or 3   X_{6,2} + X_{6,3} = 1
trial 7:  outcome is 1        X_{7,1} = 1
trial 8:  outcome is 1 or 2   X_{8,1} + X_{8,2} = 1
trial 9:  outcome is 2        X_{9,2} = 1
trial 10: outcome is 4        X_{10,4} = 1
The likelihood function based on this information is

L_Y(p) = p_1^2 p_2^2 p_3^2 (1 - p_1 - p_2 - p_3)(1 - p_1 - p_3)(p_2 + p_3)(p_1 + p_2)
       = p_1^2 p_2^2 p_3^2 \, p_4 (p_2 + p_4)(p_2 + p_3)(p_1 + p_2)

and one might potentially simply try to optimize this directly. Another possibility is to consider an EM algorithm.
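Before turning to the EM details, here is a minimal sketch of the direct route in Python, assuming scipy is available (the parameterization by (p_1, p_2, p_3) with p_4 = 1 - p_1 - p_2 - p_3 and the starting point are my own choices, not from the handout):

import numpy as np
from scipy.optimize import minimize

def neg_log_lik(q):
    """Negative log of L_Y(p) above, with p4 eliminated via the constraint."""
    p1, p2, p3 = q
    p4 = 1.0 - p1 - p2 - p3
    if min(p1, p2, p3, p4) <= 0.0:  # penalize points outside the simplex
        return np.inf
    return -(2*np.log(p1) + 2*np.log(p2) + 2*np.log(p3) + np.log(p4)
             + np.log(p2 + p4) + np.log(p2 + p3) + np.log(p1 + p2))

res = minimize(neg_log_lik, x0=[0.25, 0.25, 0.25], method="Nelder-Mead")
p1, p2, p3 = res.x
print("direct MLE:", p1, p2, p3, 1.0 - p1 - p2 - p3)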
If I knew all X_{ij} I would have the log-likelihood

l_X(p) = n_1 \ln p_1 + n_2 \ln p_2 + n_3 \ln p_3 + n_4 \ln(1 - p_1 - p_2 - p_3)

that I know how to optimize. I don't have all the n_j's. But I do know that

n_1 = 2 + X_{8,1}
n_2 = 2 + X_{3,2} + X_{6,2} + X_{8,2}
n_3 = 2 + X_{6,3}
n_4 = 1 + X_{3,4}
and for a particular p^0 = (p_{01}, p_{02}, p_{03}, p_{04}), conditioned on the data in hand (call them Y = y),

E_{p^0}[ X_{8,1} | Y = y ] = \frac{p_{01}}{p_{01} + p_{02}}

E_{p^0}[ X_{3,2} | Y = y ] = \frac{p_{02}}{p_{02} + p_{04}}

E_{p^0}[ X_{6,2} | Y = y ] = \frac{p_{02}}{p_{02} + p_{03}}

E_{p^0}[ X_{8,2} | Y = y ] = \frac{p_{02}}{p_{01} + p_{02}}

E_{p^0}[ X_{6,3} | Y = y ] = \frac{p_{03}}{p_{02} + p_{03}}

E_{p^0}[ X_{3,4} | Y = y ] = \frac{p_{04}}{p_{02} + p_{04}}

(Each of these is just a Bernoulli mean: for instance, given that trial 8 produced outcome 1 or 2, X_{8,1} is conditionally Bernoulli with success probability p_{01}/(p_{01} + p_{02}).)
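These expectations are simple enough to script. A minimal sketch in Python (the function name and the starting value are my own, not from the handout):

def e_step_expectations(p0):
    """Conditional means of the unobserved indicators given Y = y,
    evaluated at p0 = (p01, p02, p03, p04)."""
    p01, p02, p03, p04 = p0
    return {
        "X_81": p01 / (p01 + p02),  # trial 8: outcome 1 vs 2
        "X_32": p02 / (p02 + p04),  # trial 3: outcome 2 vs 4
        "X_62": p02 / (p02 + p03),  # trial 6: outcome 2 vs 3
        "X_82": p02 / (p01 + p02),
        "X_63": p03 / (p02 + p03),
        "X_34": p04 / (p02 + p04),
    }

print(e_step_expectations((0.25, 0.25, 0.25, 0.25)))  # all equal 0.5 here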
So E_{p^0}[ l_X(p) | Y = y ] is easy enough to compute. For

a(p^0) = 2 + \frac{p_{01}}{p_{01} + p_{02}}

b(p^0) = 2 + p_{02} \left( \frac{1}{p_{01} + p_{02}} + \frac{1}{p_{02} + p_{03}} + \frac{1}{p_{02} + p_{04}} \right)

c(p^0) = 2 + \frac{p_{03}}{p_{02} + p_{03}}

d(p^0) = 1 + \frac{p_{04}}{p_{02} + p_{04}}

note that a(p^0) + b(p^0) + c(p^0) + d(p^0) = 10 = n, and then that

E_{p^0}[ l_X(p) | Y = y ] = a(p^0) \ln p_1 + b(p^0) \ln p_2 + c(p^0) \ln p_3 + d(p^0) \ln p_4
This is maximized at

p^*(p^0) = \left( \frac{a(p^0)}{10}, \frac{b(p^0)}{10}, \frac{c(p^0)}{10}, \frac{d(p^0)}{10} \right)
So, with a starting value p^0, define iterates by

p^{l+1} = p^*(p^{l})

and hope to iterate to a fixed point optimizing the log-likelihood.
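A minimal sketch of the full iteration in Python (again not part of the original handout; the names, the starting value, and the convergence tolerance are my own choices):

import numpy as np

def em_step(p):
    """One EM update: the E-step fills in the expected counts
    a, b, c, d, and the M-step returns the maximizing proportions."""
    p1, p2, p3, p4 = p
    a = 2 + p1 / (p1 + p2)                                  # E[n1 | Y = y]
    b = 2 + p2 * (1/(p1 + p2) + 1/(p2 + p3) + 1/(p2 + p4))  # E[n2 | Y = y]
    c = 2 + p3 / (p2 + p3)                                  # E[n3 | Y = y]
    d = 1 + p4 / (p2 + p4)                                  # E[n4 | Y = y]
    return np.array([a, b, c, d]) / 10.0                    # divide by n = 10

p = np.array([0.25, 0.25, 0.25, 0.25])  # starting value p^0
for _ in range(500):
    p_new = em_step(p)
    if np.max(np.abs(p_new - p)) < 1e-12:  # stop at a (numerical) fixed point
        break
    p = p_new
print("EM fixed point:", p)

The fixed point should agree with the direct numerical maximization of L_Y sketched earlier.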