Expectation Maximization
Introduction to EM algorithm
TLT-5906 Advanced Course in Digital Transmission
Jukka Talvitie, M.Sc. (eng)
jukka.talvitie@tut.fi
Department of Communication Engineering
Tampere University of Technology
M.Sc. Jukka Talvitie
5.12.2013
Outline
■ Expectation Maximization (EM) algorithm
– Motivation, background
– Where can the EM algorithm be used?
■ EM principle
– Formal definition
– How does the algorithm really work?
– Coin toss example
– About some practical issues
■ More advanced examples
– Line fitting with the EM algorithm
– Parameter estimation of a multivariate Gaussian mixture
■ Conclusions
Motivation
■ Consider a classical line fitting problem:
– Assume the measurements below of a linear model y = ax + b + n (here the line parameters are a and b, and n is zero-mean noise)
[Figure: the measurements plotted over x = 0…1]
Motivation
■ We use LS (Least Squares) to find the best fit:
■ Is this the best solution?
[Figure: the measurements and the LS line fit]
Motivation
■ LS would be the Best Linear Unbiased Estimator if the noise were uncorrelated with fixed variance
■ Here the noise term is actually correlated, and the true linear model of this realization is shown below as the black line
– Here the LS gives too much weight to a group of samples in the middle
[Figure: the measurements, the LS fit and the correct line]
Motivation
■ Taking the correlation of the noise term into account, we can use the Generalized LS method, and the result improves considerably (a numerical sketch is given below)
■ However, in many cases we do not know the correlation model
– It is hidden in the observations and we cannot access it directly
– Therefore, e.g. here we would need to estimate the covariance and the line parameters simultaneously
■ This sort of problem can quite quickly become very complicated
– How to estimate the covariance without knowing the line parameters, and vice versa?
■ Intuitive (heuristic) solution:
– Iteratively estimate one parameter, then the other, and continue…
– No guarantee of the performance in this case (e.g. compared to the maximum likelihood (ML) solution)
■ The EM algorithm provides the ML solution for this sort of problem
[Figure: the measurements, the LS fit, the correct line and the Generalized LS fit]
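Not from the slides: a minimal numerical sketch of the setting above, with an assumed AR(1) correlation model for the noise and illustrative line parameters. With the noise covariance known, generalized LS weights the samples by the inverse covariance and typically recovers the line better than ordinary LS; in practice the covariance would itself have to be estimated jointly with the line, which is exactly the situation the EM algorithm addresses.

import numpy as np

rng = np.random.default_rng(1)
a_true, b_true = 0.5, 1.7                      # illustrative "correct line" parameters
x = np.linspace(0, 1, 30)
H = np.column_stack([x, np.ones_like(x)])      # regression matrix for y = a*x + b

# AR(1)-correlated noise (an assumption for this sketch), covariance C
rho, sigma = 0.9, 0.05
lags = np.abs(np.subtract.outer(np.arange(30), np.arange(30)))
C = sigma**2 * rho**lags
y = H @ np.array([a_true, b_true]) + rng.multivariate_normal(np.zeros(30), C)

# Ordinary LS: theta = (H^T H)^{-1} H^T y
theta_ls = np.linalg.solve(H.T @ H, H.T @ y)

# Generalized LS with the (here known) noise covariance:
# theta = (H^T C^{-1} H)^{-1} H^T C^{-1} y
Ci = np.linalg.inv(C)
theta_gls = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ y)

print("true:", a_true, b_true)
print("LS  :", np.round(theta_ls, 3))
print("GLS :", np.round(theta_gls, 3))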
Expectation Maximization
Algorithm
■ Presented by Dempster, Laird and Rubin in [1] in 1977
– Basically the same principle had already been proposed earlier by other authors in specific circumstances
■ The EM algorithm is an iterative estimation algorithm that can derive the maximum likelihood (ML) estimates in the presence of missing/hidden data (“incomplete data”)
– e.g. the classical case is the Gaussian mixture, where we have a set of unknown Gaussian distributions (see the example later on)
[Figure: many-to-one mapping [2]. X: underlying space; x: complete data (required for ML); Y: observation space; y: observation. x is observed only by means of y(x); X(y) is the subset of X determined by y.]
Expectation Maximization
Algorithm
■ The basic functioning of the EM algorithm can be divided into two steps (the parameter to be estimated is θ):
– Expectation step (E-step)
• Take the expected value of the complete-data log-likelihood given the observation and the current parameter estimate θ̂_k:

Q(\theta, \hat{\theta}_k) = \mathrm{E}\big[ \log f(x \mid \theta) \mid y, \hat{\theta}_k \big]

– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected complete data is used as if it were measured observations):

\hat{\theta}_{k+1} = \arg\max_{\theta} Q(\theta, \hat{\theta}_k)

■ The likelihood of the parameter increases at every iteration (a sketch of why is given below)
– EM converges towards some local maximum of the likelihood function
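The monotone increase of the likelihood is the standard argument from [1], [2], sketched here for completeness. Write the observed-data log-likelihood in terms of the Q-function:

\log f(y \mid \theta) = Q(\theta, \hat{\theta}_k) - H(\theta, \hat{\theta}_k), \qquad H(\theta, \hat{\theta}_k) = \mathrm{E}\big[ \log f(x \mid y, \theta) \mid y, \hat{\theta}_k \big]

Since H(\theta, \hat{\theta}_k) \le H(\hat{\theta}_k, \hat{\theta}_k) for every θ (Jensen's inequality, i.e. non-negativity of the Kullback-Leibler divergence), choosing θ̂_{k+1} to maximize the Q-function gives

\log f(y \mid \hat{\theta}_{k+1}) - \log f(y \mid \hat{\theta}_k) = \big[ Q(\hat{\theta}_{k+1}, \hat{\theta}_k) - Q(\hat{\theta}_k, \hat{\theta}_k) \big] - \big[ H(\hat{\theta}_{k+1}, \hat{\theta}_k) - H(\hat{\theta}_k, \hat{\theta}_k) \big] \ge 0.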
An example: ML estimation vs. EM
algorithm [3]
■ We wish to estimate the variance of S:
– observation Y = S + N
• S and N are normally distributed with zero means and variances θ and 1, respectively
– Now, Y is also normally distributed (zero mean with variance θ + 1)
■ The ML estimate can be easily derived (a short derivation is given below):

\hat{\theta}_{ML} = \arg\max_{\theta} p(y \mid \theta) = \max\{0, y^2 - 1\}

■ The zero in the above result comes from the fact that we know that the variance is always non-negative
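For completeness (not spelled out on the slide), the ML estimate follows directly from maximizing the Gaussian log-likelihood of Y over θ ≥ 0:

\log p(y \mid \theta) = -\tfrac{1}{2}\log\big(2\pi(\theta + 1)\big) - \frac{y^2}{2(\theta + 1)}

\frac{\partial}{\partial \theta} \log p(y \mid \theta) = -\frac{1}{2(\theta + 1)} + \frac{y^2}{2(\theta + 1)^2} = 0 \;\Rightarrow\; \theta = y^2 - 1

and enforcing the constraint θ ≥ 0 gives \hat{\theta}_{ML} = \max\{0, y^2 - 1\}.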
An example: ML estimation vs. EM
algorithm
■ The same with the EM algorithm
– the complete data now consists of S and N
– the E-step is then:

Q(\theta, \hat{\theta}_k) = \mathrm{E}\big[ \ln p(s, n \mid \theta) \mid y, \hat{\theta}_k \big]

– the logarithmic probability distribution for the complete data is then

\ln p(s, n \mid \theta) = \ln p(n) + \ln p(s \mid \theta) = C - \tfrac{1}{2}\ln\theta - \frac{s^2}{2\theta}

(C contains all the terms independent of θ)
– replacing s^2 by its conditional expectation E[S^2 | Y, θ̂_k] gives

Q(\theta, \hat{\theta}_k) = C - \tfrac{1}{2}\ln\theta - \frac{\mathrm{E}\big[ S^2 \mid Y, \hat{\theta}_k \big]}{2\theta}
An example: ML estimation vs. EM
algorithm
■ M-step:
– maximize the Q-function from the E-step
– We set the derivative to zero and get (using results from math tables: conditional means and variances, the “law of total variance”)

\hat{\theta}_{k+1} = \mathrm{E}\big[ S^2 \mid Y, \hat{\theta}_k \big] = \mathrm{E}^2\big[ S \mid Y, \hat{\theta}_k \big] + \mathrm{var}\big[ S \mid Y, \hat{\theta}_k \big] = \left( \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}\, Y \right)^2 + \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}

■ At the steady state (θ̂_{k+1} = θ̂_k) we get the same value for the estimate as in the ML estimation (max{0, y^2 - 1})
■ What about the convergence? What if we choose the initial value θ̂_0 = 0? (see the sketch below)
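A minimal numerical sketch (not from the slides) of the resulting fixed-point iteration; the single observation y = 2 is an assumed value chosen only for illustration. It also answers the question above: θ̂_0 = 0 is itself a fixed point of the update, so the iteration never moves away from it.

def em_variance(y, theta0, n_iter=50):
    """EM update theta <- (theta/(theta+1)*y)^2 + theta/(theta+1) for Y = S + N."""
    theta = theta0
    for _ in range(n_iter):
        w = theta / (theta + 1.0)        # E[S | Y, theta] = w * y
        theta = (w * y) ** 2 + w         # E[S^2 | Y, theta] = (w*y)^2 + var[S | Y, theta]
    return theta

y = 2.0                                  # assumed single observation
print(em_variance(y, theta0=1.0))        # converges towards y**2 - 1 = 3
print(max(0.0, y**2 - 1))                # ML estimate for comparison
print(em_variance(y, theta0=0.0))        # stays at 0: a poor initial guess can stall the EM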
An example: ML estimation vs. EM
algorithm
■ In the previous example, the ML estimate could be solved in closed form
– In this case there was no need for the EM algorithm, since the ML estimate is given in a straightforward manner (we just showed that the EM algorithm converges to the peak of the likelihood function)
■ Next we consider a coin toss example:
– The target is to figure out the probability of heads for two coins
– The ML estimate can be directly calculated from the results
■ We will raise the bets a little higher and assume that we don't even know which one of the coins is used for each sample set
– i.e. we are estimating the coin probabilities without knowing which one of the coins is being tossed
An example: Coin toss [4]
Maximum likelihood
■ We have two coins: A and B
■ The probabilities for heads are θ_A and θ_B
■ We have 5 measurement sets, each including 10 coin tosses
■ If we know which of the coins is tossed in each set, we can calculate the ML probabilities for θ_A and θ_B
■ If we don't know which of the coins is tossed in each set, the ML estimates cannot be calculated directly
→ EM algorithm
■ The binomial distribution is used to calculate the probabilities:

\binom{n}{k} p^k (1 - p)^{n - k}
ML method (if we know the coins):

Set (10 tosses)   Coin   Result
HTTTHHTHTH        B      5H, 5T
HHHHTHHHHH        A      9H, 1T
HTHHHHHTHH        A      8H, 2T
HTHTTTHHTT        B      4H, 6T
THHHTHHHTH        A      7H, 3T

Totals: coin A 24H, 6T; coin B 9H, 11T

\hat{\theta}_A = 24 / (24 + 6) = 0.80
\hat{\theta}_B = 9 / (9 + 11) = 0.45

Expectation Maximization (coins unknown):

1. Initialization: θ̂_A(0) = 0.6, θ̂_B(0) = 0.5

2. E-step: for each set, compute the probability that it was tossed with coin A or coin B, and split the observed heads/tails counts accordingly.
Example calculation for the first set (5H, 5T) with θ̂_A(0) = 0.6 and θ̂_B(0) = 0.5:
\binom{10}{5} \cdot 0.6^5 \cdot 0.4^5 \approx 0.201
\binom{10}{5} \cdot 0.5^5 \cdot 0.5^5 \approx 0.246
Normalizing these gives 0.201 / (0.201 + 0.246) ≈ 0.45 for coin A and 0.55 for coin B.

Set       P(A)   P(B)   Coin A share    Coin B share
5H, 5T    0.45   0.55   ≈2.2H, 2.2T     ≈2.8H, 2.8T
9H, 1T    0.80   0.20   ≈7.2H, 0.8T     ≈1.8H, 0.2T
8H, 2T    0.73   0.27   ≈5.9H, 1.5T     ≈2.1H, 0.5T
4H, 6T    0.35   0.65   ≈1.4H, 2.1T     ≈2.6H, 3.9T
7H, 3T    0.65   0.35   ≈4.5H, 1.9T     ≈2.5H, 1.1T
Totals                  ≈21.3H, 8.6T    ≈11.7H, 8.4T

3. M-step: update the estimates from the expected counts:
\hat{\theta}_A^{(1)} = 21.3 / (21.3 + 8.6) \approx 0.71
\hat{\theta}_B^{(1)} = 11.7 / (11.7 + 8.4) \approx 0.58

4. After 10 iterations the estimates have converged to θ̂_A(10) ≈ 0.80 and θ̂_B(10) ≈ 0.52 (a code sketch of this iteration is given below).
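A minimal Python sketch (not part of the original slides) of the two-coin EM iteration above; the heads counts and initial guesses are taken from the slide, everything else is illustrative.

from math import comb

heads = [5, 9, 8, 4, 7]              # heads per measurement set (10 tosses each)
n = 10
theta_a, theta_b = 0.6, 0.5          # initial guesses, as on the slide

def binom(h, p):
    """Binomial probability of h heads in n tosses with head probability p."""
    return comb(n, h) * p**h * (1 - p)**(n - h)

for _ in range(10):
    # E-step: responsibility of coin A for each set, then expected head/tail counts
    ha = ta = hb = tb = 0.0
    for h in heads:
        wa, wb = binom(h, theta_a), binom(h, theta_b)
        ra = wa / (wa + wb)
        ha += ra * h;        ta += ra * (n - h)
        hb += (1 - ra) * h;  tb += (1 - ra) * (n - h)
    # M-step: re-estimate the head probabilities from the expected counts
    theta_a, theta_b = ha / (ha + ta), hb / (hb + tb)

print(round(theta_a, 2), round(theta_b, 2))   # ~0.80 and ~0.52, as on the slide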
About some practical issues
■ Although many examples in the literature show excellent results using the EM algorithm, the reality is often less glamorous
– As the number of uncertain parameters in the modeled system increases, even the best available guess (in the ML sense) might not be adequate
– NB! This is not the algorithm's fault. It still provides the best possible solution in the ML sense
■ Depending on the form of the likelihood function (provided in the E-step), the convergence rate of the EM might vary considerably
■ Notice that the algorithm converges towards a local maximum
– To locate the global peak, one must try different initial guesses for the estimated parameters or use other, more advanced methods to find the global peak
– With multiple unknown (hidden/latent) parameters the number of local peaks usually increases
Further examples
■ Line fitting (shown only in the lecture)
■ Parameter estimation of a multivariate Gaussian mixture (a sketch is given after this list)
– See the additional PDF file for
• Problem definition
• Equations
– Definition of the log-likelihood function
– E-step
– M-step
– See the additional Matlab m-file for an illustration of
• The example in numerical form
– Dimensions and value spaces for each parameter
• The iterative nature of the EM algorithm
– Study how the parameters change at each iteration
• How the initial guesses for the estimated parameters affect the final result
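A minimal Python sketch of EM for a two-component bivariate Gaussian mixture (this is not the course's Matlab m-file; the synthetic data, component count and initialization are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
# Synthetic data from two Gaussians (assumed ground truth for this demo)
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), 200),
               rng.multivariate_normal([3, 3], [[1.0, 0.5], [0.5, 1.0]], 300)])
N, D, K = X.shape[0], X.shape[1], 2

# Initial guesses: uniform weights, means picked from the data, shared sample covariance
w = np.full(K, 1.0 / K)
mu = X[rng.choice(N, K, replace=False)]
Sigma = np.array([np.cov(X.T) for _ in range(K)])

def gauss_pdf(X, m, S):
    """Multivariate normal density evaluated at each row of X."""
    d = X - m
    inv = np.linalg.inv(S)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(S))
    return np.exp(-0.5 * np.sum(d @ inv * d, axis=1)) / norm

for _ in range(100):
    # E-step: responsibilities r[i, k] = P(component k | sample i)
    r = np.column_stack([w[k] * gauss_pdf(X, mu[k], Sigma[k]) for k in range(K)])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update weights, means and covariances from the responsibilities
    Nk = r.sum(axis=0)
    w = Nk / N
    mu = (r.T @ X) / Nk[:, None]
    for k in range(K):
        d = X - mu[k]
        Sigma[k] = (r[:, k, None] * d).T @ d / Nk[k]

print(np.round(w, 2))    # mixture weights, close to 0.4 and 0.6 (in some order)
print(np.round(mu, 2))   # component means, close to [0, 0] and [3, 3]

Running the loop from different initial guesses illustrates the point made earlier: the algorithm converges to a local maximum, so the final estimates can depend on the initialization.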
Conclusions
■ EM iteratively finds ML estimates in estimation problems with hidden (incomplete) data
– the likelihood increases at every step of the iteration process
■ The algorithm consists of two iteratively repeated steps:
– Expectation step (E-step)
• Take the expected value of the complete-data log-likelihood given the observation and the current parameter estimate
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected complete data is used as if it were measured observations)
■ The algorithm converges to a local maximum
– The global maximum can be elsewhere
■ See the reference list for literature on use cases of the EM algorithm in communications
– These are references [5]-[16] (not mentioned in the previous slides)
References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B., “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.
2. Moon, T.K., “The Expectation Maximization Algorithm,” IEEE Signal Processing Magazine, vol. 13, pp. 47-60, Nov. 1996.
3. Do, C.B.; Batzoglou, S., What is the Expectation Maximization algorithm? [Online]. Not available anymore. Was originally available at: courses.ece.illinois.edu/ece561/spring08/EM.pdf
4. The Expectation-Maximization Algorithm. [Online]. Not available anymore. Was originally available at: ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf

Some communications-related papers using the EM algorithm (continues on the next slide):

5. Borran, M.J.; Nasiri-Kenari, M., "An efficient detection technique for synchronous CDMA communication systems based on the expectation maximization algorithm," Vehicular Technology, IEEE Transactions on, vol. 49, no. 5, pp. 1663-1668, Sep. 2000.
6. Cozzo, C.; Hughes, B.L., "The expectation-maximization algorithm for space-time communications," Information Theory, 2000. Proceedings. IEEE International Symposium on, p. 338, 2000.
7. Rad, K.R.; Nasiri-Kenari, M., "Iterative detection for V-BLAST MIMO communication systems based on expectation maximisation algorithm," Electronics Letters, vol. 40, no. 11, pp. 684-685, 27 May 2004.
8. Barembruch, S.; Scaglione, A.; Moulines, E., "The expectation and sparse maximization algorithm," Communications and Networks, Journal of, vol. 12, no. 4, pp. 317-329, Aug. 2010.
9. Panayirci, E., "Advanced signal processing techniques for wireless communications," Signal Design and its Applications in Communications (IWSDA), 2011 Fifth International Workshop on, p. 1, 10-14 Oct. 2011.
10. O'Sullivan, J.A., "Message passing expectation-maximization algorithms," Statistical Signal Processing, 2005 IEEE/SP 13th Workshop on, pp. 841-846, 17-20 July 2005.
11. Etzlinger, B.; Haselmayr, W.; Springer, A., "Joint Detection and Estimation on MIMO-ISI Channels Based on Gaussian Message Passing," Systems, Communication and Coding (SCC), Proceedings of 2013 9th International ITG Conference on, pp. 1-6, 21-24 Jan. 2013.
References
12. Groh, I.; Staudinger, E.; Sand, S., "Low Complexity High Resolution Maximum Likelihood Channel Estimation in Spread Spectrum Navigation Systems," Vehicular Technology Conference (VTC Fall), 2011 IEEE, pp. 1-5, 5-8 Sept. 2011.
13. Wei Wang; Jost, T.; Dammann, A., "Estimation and Modelling of NLoS Time-Variant Multipath for Localization Channel Model in Mobile Radios," Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE, pp. 1-6, 6-10 Dec. 2010.
14. Nasir, A.A.; Mehrpouyan, H.; Blostein, S.D.; Durrani, S.; Kennedy, R.A., "Timing and Carrier Synchronization With Channel Estimation in Multi-Relay Cooperative Networks," Signal Processing, IEEE Transactions on, vol. 60, no. 2, pp. 793-811, Feb. 2012.
15. Tsang-Yi Wang; Jyun-Wei Pu; Chih-Peng Li, "Joint Detection and Estimation for Cooperative Communications in Cluster-Based Networks," Communications, 2009. ICC '09. IEEE International Conference on, pp. 1-5, 14-18 June 2009.
16. Xie, Yongzhe; Georghiades, C.N., "Two EM-type channel estimation algorithms for OFDM with transmitter diversity," Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 3, pp. III-2541-III-2544, 13-17 May 2002.