Expectation Maximization
Introduction to EM algorithm
TLT-5906 Advanced Course in Digital Transmission
Jukka Talvitie, M.Sc. (eng)
jukka.talvitie@tut.fi
Department of Communication Engineering
Tampere University of Technology
M.Sc. Jukka Talvitie
5.12.2013
Outline
q Expectation Maximization (EM) algorithm
– Motivation, background
– Where can the EM algorithm be used?
q EM principle
– Formal definition
– How does the algorithm really work?
– Coin toss example
– About some practical issues
q More advanced examples
– Line fitting with EM algorithm
– Parameter estimation of multivariate Gaussian mixture
q Conclusions
Motivation
q Consider classical line fitting problem:
– Assume the measurements below come from a linear model y = ax + b + n (the line parameters are a and b, and n is zero-mean noise)
[Figure: scatter plot of the measurements, x ∈ [0, 1], y ∈ [1.5, 2.2]]
Motivation
q We use LS (Least Squares) to find the best fit:
q Is this the best solution?
[Figure: the measurements together with the LS fit]
Motivation
q LS would be the Best Linear Unbiased Estimator if the noise were uncorrelated with a fixed variance
q Here the noise term is actually correlated, and the true linear model of this realization is shown in the figure as the black line
– Here the LS gives too much weight to a group of samples in the middle
[Figure: the measurements together with the LS fit and the correct line]
Motivation
q Taking the correlation of the noise term into account, we can use the Generalized LS method, and the result improves considerably
q However, in many cases we do not know the correlation model
– It is hidden in the observations and we cannot access it directly
– Therefore, e.g. here we would need to estimate the covariance and the line parameters simultaneously
q This sort of problem can quite quickly become very complicated
– How to estimate the covariance without knowing the line parameters, and vice versa?
q Intuitive (heuristic) solution:
– Iteratively estimate one parameter, then the other, and continue…
– No performance guarantee in this case (e.g. compared to the maximum likelihood (ML) solution)
q The EM algorithm provides the ML solution for this sort of problem
[Figure: the measurements together with the LS fit, the Generalized LS fit, and the correct line]
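The intuitive alternating scheme above can be sketched in a few lines. This is a Python sketch under an assumed AR(1) noise-correlation model; the data, the correlation model, and all names are illustrative rather than taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = a*x + b + n with AR(1)-correlated noise (assumed model)
a_true, b_true, rho = 1.0, 1.6, 0.9
x = np.linspace(0.0, 1.0, 50)
e = rng.normal(scale=0.05, size=x.size)
n = np.zeros_like(e)
n[0] = e[0]
for t in range(1, e.size):
    n[t] = rho * n[t - 1] + e[t]
y = a_true * x + b_true + n

X = np.column_stack([x, np.ones_like(x)])
theta = np.linalg.lstsq(X, y, rcond=None)[0]        # plain LS initialization

# Alternate: residuals -> correlation estimate -> GLS line fit -> repeat
for _ in range(20):
    r = y - X @ theta                                # residuals of current fit
    rho_hat = (r[:-1] @ r[1:]) / (r[:-1] @ r[:-1])   # lag-1 autocorrelation
    i = np.arange(x.size)
    C = rho_hat ** np.abs(i[:, None] - i[None, :])   # AR(1) covariance (up to scale)
    Ci = np.linalg.inv(C)
    theta = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)  # GLS step

a_hat, b_hat = theta
print(a_hat, b_hat)
```

As the slides note, this heuristic carries no performance guarantee; it merely illustrates the alternating idea that EM makes rigorous.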
Expectation Maximization
Algorithm
q Presented by Dempster, Laird and Rubin in [1] in 1977
– Essentially the same principle had already been proposed earlier by other authors for specific settings
q EM algorithm is an iterative estimation algorithm that can derive
the maximum likelihood (ML) estimates in the presence of
missing/hidden data (“incomplete data”)
– e.g. the classical case is the Gaussian mixture, where we have
a set of unknown Gaussian distributions (see example later on)
Many-to-one mapping [2]
X: underlying space
x: complete data (required for ML)
Y: observation space
y: observation
x is observed only by means of y(x).
X(y) is a subset of X determined by y.
Expectation Maximization
Algorithm
q The basic functioning of the EM algorithm can be divided into two
steps (the parameter to be estimated is θ):
– Expectation step (E-step)
• Take the expected value of the complete-data log-likelihood given the observation and the current parameter estimate θ̂ₖ:
Q(θ, θ̂ₖ) = E[ log f(x | θ) | y, θ̂ₖ ]
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected data of the E-step is used as if it were measured observations):
θ̂ₖ₊₁ = arg maxθ Q(θ, θ̂ₖ)
q The likelihood of the parameter is increased at every iteration
– EM converges towards some local maximum of the likelihood
function
An example: ML estimation vs. EM
algorithm [3]
q We wish to estimate the variance of S:
– observation Y=S+N
• S and N are normally distributed with zero means and
variances θ and 1, respectively
– Now, Y is also normally distributed (zero mean with variance θ+1)
q ML estimate can be easily derived:
θ̂_ML = arg maxθ p(y | θ) = max{0, y² − 1}
q The zero in the above result comes from the fact that we know the variance is always non-negative
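Since the closed form is available, it can be sanity-checked against a brute-force search over the likelihood p(y | θ), where Y ~ N(0, θ + 1). A small Python sketch; the observed value y = 2 and the search grid are illustrative:

```python
import math

def loglik(y, theta):
    # log p(y | theta): Y = S + N is zero mean with variance theta + 1
    return -0.5 * (math.log(2 * math.pi * (theta + 1)) + y**2 / (theta + 1))

y = 2.0
grid = [i * 0.001 for i in range(10000)]               # theta in [0, 10)
theta_grid = max(grid, key=lambda t: loglik(y, t))     # brute-force argmax
theta_ml = max(0.0, y**2 - 1)                          # closed-form ML estimate
print(theta_grid, theta_ml)  # both ≈ 3.0
```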
An example: ML estimation vs. EM
algorithm
q The same with the EM algorithm
– the complete data now consists of S and N
– the E-step is then:
Q(θ, θ̂ₖ) = E[ ln p(s, n | θ) | y, θ̂ₖ ]
– the logarithmic probability distribution for the complete data is then
ln p(s, n | θ) = ln p(n) + ln p(s | θ) = C − (1/2) ln θ − S²/(2θ)
(C contains all the terms independent of θ)
– replacing S² with its conditional expectation E[S² | Y, θ̂ₖ] gives
Q(θ, θ̂ₖ) = C − (1/2) ln θ − E[S² | Y, θ̂ₖ]/(2θ)
An example: ML estimation vs. EM
algorithm
q M-step:
– maximize the Q-function from the E-step
– Setting the derivative with respect to θ to zero (using results from math tables: conditional means and variances, the “law of total variance”) gives
θ̂ₖ₊₁ = E[S² | Y, θ̂ₖ] = E²[S | Y, θ̂ₖ] + var[S | Y, θ̂ₖ] = ( θ̂ₖ/(θ̂ₖ + 1) · Y )² + θ̂ₖ/(θ̂ₖ + 1)
q At the steady state (θ̂ₖ₊₁ = θ̂ₖ) we get the same value for the estimate as in ML estimation, max{0, y² − 1}
q What about the convergence? What if we choose the initial value θ̂₀ = 0?
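The M-step recursion is easy to iterate numerically. A small Python sketch (the observation y = 2 and the iteration count are illustrative) also answers the convergence question: θ̂₀ = 0 is a fixed point of the recursion, so the algorithm never leaves it:

```python
def em_variance(y, theta0, iters=100):
    # EM recursion: theta_{k+1} = E[S|Y]^2 + var[S|Y], evaluated at theta_k
    theta = theta0
    for _ in range(iters):
        m = theta / (theta + 1.0) * y    # E[S | Y, theta_k]
        v = theta / (theta + 1.0)        # var[S | Y, theta_k]
        theta = m * m + v
    return theta

y = 2.0
print(em_variance(y, 1.0))   # converges to max(0, y**2 - 1) = 3
print(em_variance(y, 0.0))   # stays at 0: theta_0 = 0 is a fixed point
```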
An example: ML estimation vs. EM
algorithm
q In the previous example, the ML estimate could be solved in a
closed form expression
– In this case there was no need for EM algorithm, since the ML
estimate is given in a straightforward manner (we just showed
that the EM algorithm converges to the peak of the likelihood
function)
q Next we consider a coin toss example:
– The target is to figure out the probability of heads for two coins
– ML estimate can be directly calculated from the results
q We will raise the stakes a little and assume that we don’t even know which one of the coins is used for each sample set
– i.e. we are estimating the coin probabilities without knowing which one of the coins is being tossed
An example: Coin toss [4]
Maximum likelihood
q We have two coins: A and B
q The probabilities for heads are θ_A and θ_B
q We have 5 measurement sets including 10 coin tosses in each set
q If we know which of the coins is tossed in each set, we can calculate the ML probabilities for θ_A and θ_B
q If we don’t know which of the coins is tossed in each set, the ML estimates cannot be calculated directly → EM algorithm
q The binomial distribution is used to calculate the probabilities: P(k heads in n tosses) = C(n, k) · pᵏ · (1 − p)ⁿ⁻ᵏ

Toss data (5 sets, 10 tosses per set):
Set 1: HTTTHHTHTH (5H, 5T)
Set 2: HHHHTHHHHH (9H, 1T)
Set 3: HTHHHHHTHH (8H, 2T)
Set 4: HTHTTTTHHTT (4H, 6T)
Set 5: THHHTHHHTH (7H, 3T)

ML method (if we know the coins): coin A was tossed in sets 2, 3 and 5 (24H, 6T in total), coin B in sets 1 and 4 (9H, 11T in total):
θ̂_A = 24/(24 + 6) = 0.80
θ̂_B = 9/(9 + 11) = 0.45

EM algorithm (if we don’t know the coins):
1. Initialization: θ̂_A(0) = 0.6, θ̂_B(0) = 0.5
2. E-step: compute the probability of each set coming from coin A or coin B. Example calculation for the first set (5H, 5T):
C(10, 5) · 0.6⁵ · 0.4⁵ ≈ 0.201 (coin A)
C(10, 5) · 0.5⁵ · 0.5⁵ ≈ 0.246 (coin B)
Normalizing: 0.201/(0.201 + 0.246) ≈ 0.45 for coin A and 0.55 for coin B
Weighted head/tail counts per set:
Set 1: coin A 0.45 × (5H, 5T) ≈ 2.2H, 2.2T; coin B 0.55 × ≈ 2.8H, 2.8T
Set 2: coin A 0.80 × ≈ 7.2H, 0.8T; coin B 0.20 × ≈ 1.8H, 0.2T
Set 3: coin A 0.73 × ≈ 5.9H, 1.5T; coin B 0.27 × ≈ 2.1H, 0.5T
Set 4: coin A 0.35 × ≈ 1.4H, 2.1T; coin B 0.65 × ≈ 2.6H, 3.9T
Set 5: coin A 0.65 × ≈ 4.5H, 1.9T; coin B 0.35 × ≈ 2.5H, 1.1T
Totals: coin A ≈ 21.3H, 8.6T; coin B ≈ 11.7H, 8.4T
3. M-step: θ̂_A(1) = 21.3/(21.3 + 8.6) ≈ 0.71, θ̂_B(1) = 11.7/(11.7 + 8.4) ≈ 0.58
4. After 10 iterations: θ̂_A(10) ≈ 0.80, θ̂_B(10) ≈ 0.52
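The whole coin-toss procedure fits in a few lines. A Python sketch using the toss counts from the slide (the function name and structure are mine):

```python
from math import comb

# (heads, tails) for the 5 measurement sets on the slide
sets = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]

def em_coins(sets, pa=0.6, pb=0.5, iters=10):
    for _ in range(iters):
        ah = at = bh = bt = 0.0
        for h, t in sets:
            # E-step: binomial likelihood of the set under each coin
            la = comb(h + t, h) * pa**h * (1 - pa)**t
            lb = comb(h + t, h) * pb**h * (1 - pb)**t
            wa = la / (la + lb)              # responsibility of coin A
            ah += wa * h; at += wa * t
            bh += (1 - wa) * h; bt += (1 - wa) * t
        # M-step: re-estimate heads probabilities from the expected counts
        pa, pb = ah / (ah + at), bh / (bh + bt)
    return pa, pb

pa, pb = em_coins(sets)
print(round(pa, 2), round(pb, 2))  # ≈ 0.80, 0.52, matching the slide
```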
About some practical issues
q Although many examples in the literature show excellent results for the EM algorithm, reality is often less glamorous
– As the number of uncertain parameters in the modeled system increases, even the best available guess (in the ML sense) might not be adequate
– NB! This is not the algorithm’s fault; it still provides the best possible solution in the ML sense
q Depending on the form of the likelihood function (provided in the
E-step) the convergence rate of the EM might vary considerably
q Notice that the algorithm converges towards a local maximum
– To locate the global peak, one must try different initial guesses for the estimated parameters or use some other, more advanced methods
– With multiple unknown (hidden/latent) parameters, the number of local peaks usually increases
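A standard workaround for the local-maximum issue is a multi-start loop: run EM from several initial guesses and keep the result with the highest likelihood. A Python sketch reusing the Y = S + N variance example from the earlier slides (the initial values and the log-likelihood helper are illustrative):

```python
import math, random

def em_variance(y, theta0, iters=100):
    # EM recursion for the Y = S + N variance example
    theta = theta0
    for _ in range(iters):
        m = theta / (theta + 1.0) * y
        theta = m * m + theta / (theta + 1.0)
    return theta

def loglik(y, theta):
    # log p(y | theta): Y ~ N(0, theta + 1)
    return -0.5 * (math.log(2 * math.pi * (theta + 1)) + y * y / (theta + 1))

y = 2.0
random.seed(1)
starts = [0.0] + [random.uniform(0.1, 10.0) for _ in range(5)]
# Keep the solution with the highest likelihood among the runs
best = max((em_variance(y, t0) for t0 in starts), key=lambda t: loglik(y, t))
print(best)  # the run started at 0.0 is stuck there; the restarts recover ~3.0
```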
Further examples
q Line Fitting (shown only in the lecture)
q Parameter estimation of multivariate Gaussian mixture
– See additional pdf-file for the
• Problem definition
• Equations
– Definition of the log-likelihood function
– E-step
– M-step
– See additional Matlab m-file for the illustration of
• The example in numerical form
– Dimensions and value spaces for each parameter
• The iterative nature of the EM algorithm
– Study how parameters change at each iteration
• How initial guesses for the estimated parameters affect the final
result
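Since the Matlab m-file is not included here, a minimal Python stand-in for a 1-D two-component Gaussian mixture may help illustrate the iterations; the data, initial guesses, and iteration count are all illustrative, not taken from the m-file:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic 1-D data from two Gaussians (true means -2 and 3)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

w = np.array([0.5, 0.5])    # initial mixture weights
mu = np.array([-1.0, 1.0])  # initial component means
var = np.array([1.0, 1.0])  # initial component variances

for _ in range(100):
    # E-step: responsibilities gamma[i, j] = P(component j | x_i)
    pdf = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = w * pdf
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: weighted ML updates of weights, means, and variances
    Nk = gamma.sum(axis=0)
    w = Nk / x.size
    mu = gamma.T @ x / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(np.sort(mu))  # close to the true means [-2, 3]
```

Changing the initial guesses (e.g. starting both means on the same side of the data) is an easy way to observe the local-maximum behavior discussed earlier.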
Conclusions
q EM iteratively finds the ML estimates in estimation problems with hidden (“incomplete”) data
– likelihood increases at every step of the iteration process
q Algorithm consists of two iteratively taken steps:
– Expectation step (E-step)
• Take the expected value of the complete data given the
observation and the current parameter estimate
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected data of the E-step is used as if it were measured observations)
q The algorithm converges to a local maximum
– The global maximum can be elsewhere
q See the reference list for literature on use cases of the EM algorithm in communications
– These are the references [5]-[16] (not mentioned in the previous slides)
References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B., “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.
2. Moon, T.K., “The Expectation Maximization Algorithm,” IEEE Signal Processing Magazine, vol. 13, pp. 47-60, Nov. 1996.
3. Chuong, B.D.; Serafim, B., “What is the Expectation Maximization Algorithm?” [Online]. No longer available; was originally at: courses.ece.illinois.edu/ece561/spring08/EM.pdf
4. “The Expectation-Maximization Algorithm.” [Online]. No longer available; was originally at: ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf
Some communications-related papers using the EM algorithm (continues on the next slide):
5. Borran, M.J.; Nasiri-Kenari, M., “An efficient detection technique for synchronous CDMA communication systems based on the expectation maximization algorithm,” IEEE Transactions on Vehicular Technology, vol. 49, no. 5, pp. 1663-1668, Sep. 2000.
6. Cozzo, C.; Hughes, B.L., “The expectation-maximization algorithm for space-time communications,” Proc. IEEE International Symposium on Information Theory, p. 338, 2000.
7. Rad, K.R.; Nasiri-Kenari, M., “Iterative detection for V-BLAST MIMO communication systems based on expectation maximisation algorithm,” Electronics Letters, vol. 40, no. 11, pp. 684-685, 27 May 2004.
8. Barembruch, S.; Scaglione, A.; Moulines, E., “The expectation and sparse maximization algorithm,” Journal of Communications and Networks, vol. 12, no. 4, pp. 317-329, Aug. 2010.
9. Panayirci, E., “Advanced signal processing techniques for wireless communications,” Proc. Fifth International Workshop on Signal Design and its Applications in Communications (IWSDA), p. 1, 10-14 Oct. 2011.
10. O’Sullivan, J.A., “Message passing expectation-maximization algorithms,” Proc. IEEE/SP 13th Workshop on Statistical Signal Processing, pp. 841-846, 17-20 July 2005.
11. Etzlinger, B.; Haselmayr, W.; Springer, A., “Joint Detection and Estimation on MIMO-ISI Channels Based on Gaussian Message Passing,” Proc. 9th International ITG Conference on Systems, Communication and Coding (SCC), pp. 1-6, 21-24 Jan. 2013.
References
12. Groh, I.; Staudinger, E.; Sand, S., “Low Complexity High Resolution Maximum Likelihood Channel Estimation in Spread Spectrum Navigation Systems,” Proc. IEEE Vehicular Technology Conference (VTC Fall), pp. 1-5, 5-8 Sept. 2011.
13. Wei Wang; Jost, T.; Dammann, A., “Estimation and Modelling of NLoS Time-Variant Multipath for Localization Channel Model in Mobile Radios,” Proc. IEEE Global Telecommunications Conference (GLOBECOM 2010), pp. 1-6, 6-10 Dec. 2010.
14. Nasir, A.A.; Mehrpouyan, H.; Blostein, S.D.; Durrani, S.; Kennedy, R.A., “Timing and Carrier Synchronization With Channel Estimation in Multi-Relay Cooperative Networks,” IEEE Transactions on Signal Processing, vol. 60, no. 2, pp. 793-811, Feb. 2012.
15. Tsang-Yi Wang; Jyun-Wei Pu; Chih-Peng Li, “Joint Detection and Estimation for Cooperative Communications in Cluster-Based Networks,” Proc. IEEE International Conference on Communications (ICC ’09), pp. 1-5, 14-18 June 2009.
16. Xie, Y.; Georghiades, C.N., “Two EM-type channel estimation algorithms for OFDM with transmitter diversity,” Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, pp. III-2541-III-2544, 13-17 May 2002.