Gaussian Mixture Model (GMM) using the Expectation Maximization (EM) Technique
Book: C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

The Gaussian Distribution

Univariate Gaussian distribution:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean and $\sigma^2$ is the variance.

Multivariate Gaussian distribution:

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

where $\boldsymbol{\mu}$ is the mean vector and $\boldsymbol{\Sigma}$ is the covariance matrix.

We need to estimate these parameters of a distribution. One method is Maximum Likelihood (ML) estimation.

ML Method for Estimating Parameters

Consider the log of the Gaussian distribution:

$$\ln p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{D}{2}\ln(2\pi) - \frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$$

Take the derivative with respect to each parameter and set it to zero:

$$\frac{\partial \ln p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})}{\partial \boldsymbol{\mu}} = 0 \;\Rightarrow\; \boldsymbol{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$$

$$\frac{\partial \ln p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})}{\partial \boldsymbol{\Sigma}} = 0 \;\Rightarrow\; \boldsymbol{\Sigma}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})^{\mathsf T}$$

where N is the number of samples (data points). A code sketch of these estimates is given at the end of these notes.

Gaussian Mixtures

A Gaussian mixture is a linear superposition of Gaussians:

$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

where K is the number of Gaussians and $\pi_k$ is the mixing coefficient, i.e. the weight given to each Gaussian component. Normalization and positivity require

$$0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K}\pi_k = 1.$$

Consider the log-likelihood:

$$\ln p(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) = \sum_{n=1}^{N}\ln p(\mathbf{x}_n) = \sum_{n=1}^{N}\ln\!\left\{\sum_{k=1}^{K}\pi_k\,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\right\}$$

ML does not work here: because the logarithm acts on a sum over components, there is no closed-form solution. The parameters can instead be estimated using the Expectation Maximization (EM) technique.

[Figure: example of a mixture of 3 Gaussians, panels (a) and (b).]

Latent Variable: Posterior Probabilities

We can think of the mixing coefficients as prior probabilities for the components. For a given value of x, we can evaluate the corresponding posterior probabilities, called responsibilities. From Bayes' rule:

$$\gamma_k(\mathbf{x}) = p(k \mid \mathbf{x}) = \frac{p(k)\,p(\mathbf{x} \mid k)}{p(\mathbf{x})} = \frac{\pi_k\,\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K}\pi_j\,\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$

Here $\gamma_k(\mathbf{x})$ plays the role of the latent variable. With $\pi_k = N_k / N$, we interpret $N_k$ as the effective number of points assigned to cluster k.

Expectation Maximization

The EM algorithm is an iterative optimization technique that operates locally: starting from an initial point, it converges to a (locally) optimal point.

Estimation (E) step: for given parameter values, compute the expected values of the latent variables.
Maximization (M) step: update the parameters of the model based on the latent variables, using the ML estimates.

EM Algorithm for GMM

Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the parameters (the means and covariances of the components and the mixing coefficients).

1. Initialize the means $\boldsymbol{\mu}_j$, covariances $\boldsymbol{\Sigma}_j$ and mixing coefficients $\pi_j$, and evaluate the initial value of the log-likelihood.

2. E step. Evaluate the responsibilities using the current parameter values:

$$\gamma_j(\mathbf{x}_n) = \frac{\pi_j\,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}{\sum_{k=1}^{K}\pi_k\,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}$$

3. M step. Re-estimate the parameters using the current responsibilities:

$$N_j = \sum_{n=1}^{N}\gamma_j(\mathbf{x}_n)$$

$$\boldsymbol{\mu}_j^{\text{new}} = \frac{1}{N_j}\sum_{n=1}^{N}\gamma_j(\mathbf{x}_n)\,\mathbf{x}_n$$

$$\boldsymbol{\Sigma}_j^{\text{new}} = \frac{1}{N_j}\sum_{n=1}^{N}\gamma_j(\mathbf{x}_n)\,(\mathbf{x}_n-\boldsymbol{\mu}_j^{\text{new}})(\mathbf{x}_n-\boldsymbol{\mu}_j^{\text{new}})^{\mathsf T}$$

$$\pi_j^{\text{new}} = \frac{N_j}{N}$$

4. Evaluate the log-likelihood

$$\ln p(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) = \sum_{n=1}^{N}\ln\!\left\{\sum_{k=1}^{K}\pi_k\,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\right\}$$

and check for convergence. If the convergence criterion is not satisfied, return to step 2. A code sketch of this loop is given at the end of these notes.

EM Algorithm: Example

[Sequence of figures showing successive EM iterations on the example data.]
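Code sketch: ML estimates of a Gaussian

The ML formulas above are just the sample mean and the (biased, 1/N) sample covariance. A minimal NumPy sketch, assuming the data are stored as the rows of an (N, D) array X; the function name and array shapes are illustrative choices, not from the slides.

```python
import numpy as np

def gaussian_ml_estimates(X):
    """ML estimates of a multivariate Gaussian's mean and covariance.

    X : (N, D) array, one data point per row.
    Returns (mu_ml, sigma_ml) with shapes (D,) and (D, D).
    """
    N = X.shape[0]
    mu_ml = X.mean(axis=0)            # mu_ML = (1/N) sum_n x_n
    diff = X - mu_ml                  # deviations from the mean, shape (N, D)
    sigma_ml = diff.T @ diff / N      # Sigma_ML = (1/N) sum_n (x_n - mu)(x_n - mu)^T
    return mu_ml, sigma_ml
```

Note the 1/N factor: the ML covariance is the biased estimator given on the slide (np.cov divides by N - 1 by default).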
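Code sketch: EM for a Gaussian mixture

The E and M steps translate almost line for line into NumPy. A minimal sketch of the full loop, assuming K components, data X of shape (N, D), and scipy.stats.multivariate_normal for the Gaussian densities; the initialization (random data points as means, identity covariances, uniform mixing coefficients) and the tolerance are illustrative choices, and no numerical safeguards (e.g. against collapsing or singular covariances) are included.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component Gaussian mixture to X (shape (N, D)) with EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # 1. Initialize means, covariances and mixing coefficients.
    mu = X[rng.choice(N, size=K, replace=False)]      # K random data points as means
    sigma = np.array([np.eye(D) for _ in range(K)])   # identity covariances
    pi = np.full(K, 1.0 / K)                          # uniform mixing coefficients

    def weighted_densities():
        # Column k holds pi_k * N(x_n | mu_k, Sigma_k) for every data point n.
        return np.stack(
            [pi[k] * multivariate_normal.pdf(X, mu[k], sigma[k]) for k in range(K)],
            axis=1,
        )

    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E step: gamma[n, k] = pi_k N(x_n | mu_k) / sum_j pi_j N(x_n | mu_j).
        dens = weighted_densities()
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # 3. M step: re-estimate the parameters from the responsibilities.
        Nk = gamma.sum(axis=0)                 # effective number of points per component
        mu = (gamma.T @ X) / Nk[:, None]       # mu_k = (1/N_k) sum_n gamma_nk x_n
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N                            # pi_k = N_k / N

        # 4. Log-likelihood with the updated parameters; stop when it has converged.
        ll = np.log(weighted_densities().sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return pi, mu, sigma
```

Calling em_gmm(X, K=3) on 2-D data gives the kind of fit shown in the mixture-of-3-Gaussians example; the 1/N_k weighting in the M step mirrors the ML formulas, applied to responsibility-weighted data.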