Introduction to Pattern Recognition for Human ICT
Gaussian Mixture Models and EM
2014. 11. 21
Hyunki Hong

Contents
• Supervised vs. unsupervised learning
• Mixture models
• Expectation-Maximization (EM)

Supervised vs. unsupervised learning
• The pattern recognition methods covered in class up to this point have focused on the issue of classification.
– A pattern consisted of a pair of variables {x, ω}, where
1. x was a collection of observations or features (feature vector), and
2. ω was the concept behind the observation (label).
– Such pattern recognition problems are called supervised (training with a teacher), since the system is given BOTH the feature vector and the correct answer.
• Next we investigate a number of methods that operate on unlabeled data.
– Given a collection of feature vectors X = {x(1), x(2), …, x(N)} without class labels ωi, these methods attempt to build a model that captures the structure of the data.
– These methods are called unsupervised (training without a teacher) since they are not provided the correct answer.
• Although unsupervised learning methods may appear to have limited capabilities, there are several reasons that make them extremely useful.
– Labeling large data sets can be a costly procedure (e.g., ASR, airport surveillance radar).
– Class labels may not be known beforehand (e.g., data mining: the computational process of discovering patterns in large data sets with methods from AI, database systems, etc.; its overall goal is to extract information from a data set and transform it into an understandable structure for further use).
– Large data sets can be compressed into a small set of prototypes (kNN).
• The supervised and unsupervised paradigms comprise the vast majority of pattern recognition problems.
– A third approach, known as reinforcement learning, uses a reward signal (real-valued or binary) to tell the learning system how well it is performing.
– In reinforcement learning, the goal of the learning system (or agent) is to learn a mapping from states onto actions (an action policy) that maximizes the total reward.

Approaches to unsupervised learning
• Parametric (mixture models)
– These methods model the underlying class-conditional densities with a mixture of parametric densities, and the objective is to find the model parameters.
– These methods are closely related to parameter estimation.
• Non-parametric (clustering)
– No assumptions are made about the underlying densities; instead we seek a partition of the data into clusters.
• There are two reasons why we cover mixture models at this point.
– The solution to the mixture problem (the EM algorithm) is also used for Hidden Markov Models.
– A particular form of the mixture model problem leads to the most widely used clustering method: the k-means algorithm.

Mixture models
• Consider the now familiar problem of modeling a pdf given a dataset X = {x(1), x(2), …, x(N)}.
– If the form of the underlying pdf were known (e.g., Gaussian), the problem could be solved using Maximum Likelihood.
– If the form of the pdf were unknown, the problem had to be solved with non-parametric density estimation (DE) methods such as Parzen windows.
• We will now consider an alternative DE method: modeling the pdf with a mixture of parametric densities.
– These methods are sometimes known as semi-parametric.
– In particular, we will focus on mixture models of Gaussian densities (surprised?).
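As a small illustration of this semi-parametric idea (this example is not in the original slides, and the component parameters below are arbitrary), the following MATLAB sketch evaluates and plots a one-dimensional mixture of three Gaussians, p(x) = Σj αj N(x; μj, σj²):

% Sketch (not from the original slides): a 1-D Gaussian mixture density
% with arbitrary example parameters chosen only for illustration.
alpha = [0.3 0.5 0.2];          % mixing coefficients (sum to 1)
mu    = [-2  1   4 ];           % component means
sigma = [0.8 1.0 0.6];          % component standard deviations

x = linspace(-6, 8, 500);       % evaluation grid
p = zeros(size(x));
for j = 1:length(alpha)
    % add the weighted univariate Gaussian density of component j
    p = p + alpha(j) * exp(-(x - mu(j)).^2 / (2*sigma(j)^2)) ...
            / (sqrt(2*pi) * sigma(j));
end

plot(x, p); xlabel('x'); ylabel('p(x)');
title('Mixture of three Gaussians (illustrative parameters)');

Even though each component is unimodal, the weighted sum can take a much more complex, multimodal shape, which is the point made in the Korean slides of Section 1-1 below.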
• Mixture models can be posed in terms of the ML criterion.
– Given a dataset X = {x(1), x(2), …, x(N)}, find the parameters of the model that maximize the log-likelihood of the data,

$$\ln p(X \mid \theta) = \sum_{n=1}^{N} \ln \sum_{j=1}^{C} p\big(x^{(n)} \mid \theta_j\big)\, P(\omega_j)$$

where θj = {μj, Σj} and P(ωj) are the parameters and mixing coefficient of the j-th mixture component, respectively.
• We could try to find the maximum of this function by differentiation.
– For Σj = σj²I, the solution becomes the expressions given on pages 21–23 of the lecture notes (not reproduced here).
• Notice that these equations are not a closed-form solution.
– The model parameters μj, Σj, and P(ωj) also appear on the right-hand side (RHS) as a result of Bayes' rule!
– Therefore, these expressions represent a highly non-linear, coupled system of equations.
• However, these expressions suggest that we may be able to use a fixed-point algorithm to find the maxima.
– Begin with some "old" value of the model parameters.
– Evaluate the RHS of the equations to obtain "new" parameter values.
– Let these "new" values become the "old" ones and repeat the process.
• Surprisingly, an algorithm of this simple form can be found which is guaranteed to increase the log-likelihood with every iteration!
– This example represents a particular case of a more general procedure known as the Expectation-Maximization (EM) algorithm.

Expectation-Maximization (EM) algorithm
• EM is a general method for finding the ML estimate of the parameters of a pdf when the data has missing values.
– There are two main applications of the EM algorithm.
1. When the data indeed has incomplete, missing, or corrupted values as a result of a faulty observation process.
2. When assuming the existence of missing or hidden parameters simplifies a likelihood function that would otherwise lead to an analytically intractable optimization problem.
• Assume a dataset containing two types of features.
– A set of features X whose values are known. We call these the incomplete data.
– A set of features Z whose values are unknown. We call these the missing data.
• We now define a joint pdf p(X, Z | θ) called the complete-data likelihood.
– This function is a random variable since the features Z are unknown.
– You can think of p(X, Z | θ) = h_{X,θ}(Z), for some function h_{X,θ}(·), where X and θ are constant and Z is a random variable.
• As suggested by its name, the EM algorithm operates by performing two basic operations over and over:
– an Expectation step, and
– a Maximization step.
• EXPECTATION
– Find the expected value of log p(X, Z | θ) with respect to the unknown data Z, given the data X and the current parameter estimates θ^(i−1),

$$Q\big(\theta \mid \theta^{(i-1)}\big) = E_{Z \mid X,\, \theta^{(i-1)}}\big[\log p(X, Z \mid \theta)\big]$$

where θ are the new parameters that we seek to optimize to increase Q.
– Note that X and θ^(i−1) are constants, θ is the variable that we wish to adjust, and Z is a random variable governed by the distribution p(Z | X, θ^(i−1)).
– Therefore Q(θ | θ^(i−1)) is just a function of θ.
• MAXIMIZATION
– Find the argument θ that maximizes the expected value Q(θ | θ^(i−1)).
• Convergence properties
– It can be shown that
1. each iteration (E+M) is guaranteed to increase the log-likelihood, and
2. the EM algorithm is guaranteed to converge to a local maximum of the log-likelihood.
• The two steps of the EM algorithm are illustrated below.
– During the E step, the unknown features Z are integrated out assuming the current values of the parameters θ^(i−1).
– During the M step, the values of the parameters that maximize the expected value of the log-likelihood are obtained:

$$\theta^{\text{new}} = \arg\max_{\theta} Q\big(\theta \mid \theta^{\text{old}}\big), \qquad Q\big(\theta \mid \theta^{\text{old}}\big) = \sum_{Z} p\big(Z \mid X, \theta^{\text{old}}\big)\, \log p(X, Z \mid \theta)$$

– IN A NUTSHELL: since Z is unknown, the best we can do is maximize the average log-likelihood across all possible values of Z.
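To make the fixed-point iteration concrete, here is a minimal MATLAB sketch (not part of the original slides) of EM for a univariate mixture of two Gaussians with a known, common standard deviation. The data and starting values are arbitrary illustrative choices, and only the means and mixing coefficients are re-estimated; the next section derives these updates formally.

% Minimal EM sketch (illustrative): univariate mixture of C = 2 Gaussians
% with known common sigma; alternate E and M steps as described above.
rng(0);
x     = [randn(1,100) - 2, randn(1,100) + 3];   % synthetic data (N = 200)
sigma = 1;                                      % known common std. deviation
mu    = [0 1];                                  % initial "old" means
alpha = [0.5 0.5];                              % initial mixing coefficients

for iter = 1:50
    % E-step: responsibilities P(omega_j | x(n)) under the "old" parameters
    for j = 1:2
        px(j,:) = alpha(j) * exp(-(x - mu(j)).^2 / (2*sigma^2)) ...
                  / (sqrt(2*pi)*sigma);
    end
    r = px ./ repmat(sum(px,1), 2, 1);          % normalize over the components

    % M-step: re-estimate the parameters from the responsibilities ("new" values)
    for j = 1:2
        mu(j)    = sum(r(j,:) .* x) / sum(r(j,:));
        alpha(j) = mean(r(j,:));
    end

    L(iter) = sum(log(sum(px,1)));              % log-likelihood at the "old" parameters
end

Plotting L after running this sketch should show the convergence property stated above: up to numerical precision, the log-likelihood does not decrease from one iteration to the next.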
EM algorithm and mixture models
• Having formalized the EM algorithm, we are now ready to find the solution to the mixture model problem.
– To keep things simple, we will assume a univariate mixture model where all the components have the same known standard deviation σ.
• Problem formulation
– As usual, we are given a dataset X = {x(1), x(2), …, x(N)}, and we seek to estimate the model parameters θ = {μ1, μ2, …, μC}.
– The following process is assumed to generate each random variable x(n).
1. First, a Gaussian component is selected according to the mixture coefficients P(ωj).
2. Then, x(n) is generated according to the likelihood p(x | μj) of that particular component.
– We will also use hidden variables z(n) = {z1(n), z2(n), …, zC(n)} to indicate which of the C Gaussian components generated data point x(n).
• Solution
– The probability p(x, z | θ) for a specific example is written so that only one of the zj(n) can have a value of 1, and all others are zero (see the reconstruction after this section).
– The log-likelihood of the entire dataset is then the sum of the per-example terms ln p(x(n), z(n) | θ).
– To obtain Q(θ | θ^(i−1)) we must then take the expectation over Z, where we use the fact that E[f(z)] = f(E[z]) for f(z) linear.
– E[zj(n)] is simply the probability that x(n) was generated by the j-th Gaussian component given the current model parameters θ^(i−1).  (1)
– These two expressions define the Q function.
– The second step (Maximization) consists of finding the values {μ1, μ2, …, μC} that maximize the Q function; computing the zeros of the partial derivatives yields the mean updates.  (2)
– Equations (1) and (2) define a fixed-point algorithm that can be used to converge to a (local) maximum of the log-likelihood function.
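The expressions referred to above, including equations (1) and (2), are not reproduced in this text export. For the univariate, known-σ formulation described in the problem statement, the standard forms (a reconstruction consistent with the surrounding text, not a verbatim copy of the slides) are:

$$p\big(x^{(n)}, z^{(n)} \mid \theta\big) = \prod_{j=1}^{C}\Big[P(\omega_j)\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{(x^{(n)}-\mu_j)^2}{2\sigma^2}\Big)\Big]^{z_j^{(n)}}$$

$$E\big[z_j^{(n)}\big] = P\big(\omega_j \mid x^{(n)}, \theta^{(i-1)}\big) = \frac{P(\omega_j)\exp\big(-(x^{(n)}-\mu_j)^2/2\sigma^2\big)}{\sum_{k=1}^{C} P(\omega_k)\exp\big(-(x^{(n)}-\mu_k)^2/2\sigma^2\big)} \qquad (1)$$

$$\mu_j = \frac{\sum_{n=1}^{N} E\big[z_j^{(n)}\big]\, x^{(n)}}{\sum_{n=1}^{N} E\big[z_j^{(n)}\big]} \qquad (2)$$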
Lecture contents (Korean slides)
• Gaussian mixture models: why they are needed, definition
• Learning a GMM → maximum likelihood estimation and its problems
• EM algorithm: hidden variables, the steps of the EM algorithm
• EM algorithm for Gaussian mixture models
• The generalized EM algorithm
• Experiments using MATLAB

1. Gaussian Mixtures
1-1. Why Gaussian mixture models are needed
• Probabilistic model: to capture the characteristics of a data distribution, assume an appropriate probability density function and build a model for it (estimated pdf vs. true pdf).
• A single Gaussian distribution is unimodal: the data are treated as one group centered on the mean.
• In the example above, a weighted sum of three Gaussian distributions can represent a function of much more complex shape.
• Example: a data set that can be represented by a mixture of six Gaussians.

1-2. Definition of a GMM
• The overall probability density function is defined as a mixture of M components (density functions).
– Ci: random variable indicating the i-th component.
– Each component is a simple pdf that serves as the basic building block of the mixture model.
– The parameters defining the i-th component pdf: for a Gaussian pdf, μi and Σi.
– αi: the relative weight of the i-th component within the whole (mixing coefficient).
– The full parameter set of a GMM with M components therefore comprises the means μi, the covariance matrices Σi, and the mixing coefficients αi.

1-3. Learning a GMM
• Learning a GMM means estimating its parameters from data, i.e., a form of parametric density estimation.
– Given X = {x1, x2, …, xN}, the density p(xi) of the i-th data point is expressed with the Gaussian mixture model: M univariate Gaussian densities (the j-th component) combined into a mixture density.
– Learning: find the parameters that maximize the log-likelihood over the entire training set → maximum likelihood estimation.

1-3. Learning a GMM: estimating the means and variances
• Bayes' rule gives

$$p(C_j \mid x_i, \theta) = \frac{p(x_i \mid C_j, \theta)\, p(C_j \mid \theta)}{p(x_i \mid \theta)} = \frac{\alpha_j\, p(x_i \mid \mu_j, \sigma_j^2)}{p(x_i \mid \theta)}$$

which, together with d(e^u)/dx = e^u du/dx, is used in the derivation.
– p(Cj | xi, θ) is the probability that data point xi was generated by the j-th component.
– The update for σj² is derived in the same way as the one for μj.

1-4. Learning a GMM: estimating the mixing coefficients
• The αj are subject to a constraint (Σj αj = 1 must be satisfied), so the maximization problem is solved with a Lagrange multiplier.
– Substituting back into the equation above gives λ = N.
– Bayes' rule, in the same form as above, is used again.

1-5. Problems with maximum likelihood estimation
• The right-hand side contains the very parameters that must be estimated,

$$\theta = \big(\mu_1, \ldots, \mu_M,\; \sigma_1^2, \ldots, \sigma_M^2,\; \alpha_1, \ldots, \alpha_M\big)$$

so the problem is solved with an "iterative algorithm": the EM algorithm.
– Initialize the parameters with arbitrary values at the start.
– Iterate until the parameters that maximize the objective function are obtained.

1-6. GMMs and hidden variables
• Hidden variable: a variable that cannot be observed from outside.
– Example: from a single group containing both men and women, one person at a time is picked and their height is measured.
– Given the measured data X = {x1, x2, …, xN}, estimate the probability density function p(x): the distribution of the female group and the distribution of the male group.
– Parameters to be estimated: μj, σj², αj (j = 1, 2).
• A new variable is introduced to represent the class label of each data point:
– x ∈ female group → C1 = 1, C2 = 0
– x ∈ male group → C1 = 0, C2 = 1
• Hidden variable: a variable whose value cannot be measured externally → a probabilistic model that takes the hidden variable into account.

1-7. EM algorithm for a model with hidden variables
• If we assume that the value zi of the hidden variable is observed along with each data point xi:
1. The data can be split into two groups according to the given z values, and the parameters of each group can be estimated directly.
– Cij: value of the hidden variable Cj for the i-th data point.
– Nj: number of data points belonging to group j.
2. The mixing coefficient αj, which represents the weight of each group, is estimated as the proportion of the whole occupied by the j-th group.
• In reality, however, only xi is observed and zi = [Ci1, Ci2] is unknown.
→ EM algorithm: alternate between estimating the hidden variable z (E-step) and estimating the parameters θ (M-step).
• In short: "a method for estimating the parameters of a probabilistic model with hidden random variables."

1-7. EM algorithm for a model with hidden variables: E-step and M-step
(1) E (Expectation) step: compute the expectation of the hidden variable z and use it as an estimate of z.
– The expectation of the hidden variable is computed using the current parameters: initially the parameters are set to arbitrary values; afterwards the parameters obtained from the M-step are used.
– Since C1 and C2 each take the value 1 or 0, the expectation is obtained as the probability that Cj = 1, using the relation

$$P(C_j \mid x_i, \theta) = \frac{p(x_i \mid C_j, \theta)\, p(C_j \mid \theta)}{p(x_i \mid \theta)} = \frac{\alpha_j\, p(x_i \mid \mu_j, \sigma_j^2)}{p(x_i \mid \theta)}$$

(2) M (Maximization) step: estimate the parameters using the expectation of the hidden variable z.
– The expectations found in the E-step are treated as observed values of the hidden variables, and the parameters that maximize the log-likelihood are found.

1-8. EM algorithm for GMMs
(The update equations of this section are implemented in the MATLAB code of Section 1-10; reference forms are given after this section.)

1-9. The generalized EM algorithm
• EM is not a learning algorithm restricted to Gaussian mixture models.
– It applies to probabilistic models with hidden variables (HMMs, etc.).
– It is an optimization algorithm that estimates the hidden variables and the parameters together.
• For the observed random variable x, the hidden random variable z, and the model parameters θ, the full probability density is expressed as the joint density p(x, z | θ).
→ Estimate the parameters that maximize the log-likelihood ln p(X | θ) of the observed data X = {x1, x2, …, xN}.
1. p(X | θ) can be computed by marginalizing the joint distribution p(x, z | θ) over z.
– Maximizing this incomplete-data log-likelihood (the likelihood defined from the data on x alone) is computationally difficult.
– Marginal distribution: from the joint probability distribution of two discrete random variables (X, Y), the distribution of each variable is obtained; summing the joint over all values that Y can take gives the marginal probability of X.
2. If observed data Z = {z1, z2, …, zN} for the hidden variable z were also given, a likelihood function over both data sets X and Z could be defined.
– Using the independence of the data points, it can be written as the sum of the log-likelihoods of the individual points.
– A likelihood defined with data observed for every random variable is called the complete-data log-likelihood.
• In general the data Z for the hidden variable z is not given.
→ Instead of concrete values of z, its expectation is used: the complete-data log-likelihood is integrated against the probability distribution of z.
– Using the current parameters θ(τ) and the observed data X, the conditional distribution p(z | X, θ(τ)) is obtained, and the Q function is defined as this expectation; it is a function of θ, with θ(τ) held fixed.
• Summary: when the incomplete-data log-likelihood defined from the observed data cannot be maximized directly, introduce hidden variables, define the complete-data log-likelihood, and estimate the parameters using the Q function defined as its expectation.

1-10. Experiments using MATLAB
[Figure: change of the parameters over the learning iterations, shown at (a) τ = 0, (b) τ = 1, (c) τ = 10, (d) τ = 50.]
[Figure: change of the log-likelihood versus the number of iterations (y-axis: log-likelihood, x-axis: number of iterations).]
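For reference before the implementation: the slides' update equations for the general (here two-dimensional, full-covariance) Gaussian mixture are not included in this text export. The MATLAB code below follows the standard EM updates, reconstructed here with r_ij denoting the responsibility of component j for data point x_i (the array r(:,j) in the code):

$$r_{ij} = P\big(C_j \mid x_i, \theta^{(\tau)}\big) = \frac{\alpha_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{M}\alpha_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}$$

$$\mu_j^{\text{new}} = \frac{\sum_{i=1}^{N} r_{ij}\, x_i}{\sum_{i=1}^{N} r_{ij}}, \qquad \Sigma_j^{\text{new}} = \frac{\sum_{i=1}^{N} r_{ij}\,\big(x_i-\mu_j^{\text{new}}\big)\big(x_i-\mu_j^{\text{new}}\big)^{\top}}{\sum_{i=1}^{N} r_{ij}}, \qquad \alpha_j^{\text{new}} = \frac{1}{N}\sum_{i=1}^{N} r_{ij}$$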
EM algorithm for a Gaussian mixture model of two-dimensional data (MATLAB implementation)

% Initialization for the EM algorithm -------------------------------------
load data10_2                     % load the data
X = data';                        % X: training data
N = size(X,2);                    % N: number of data points
M = 6;                            % M: number of Gaussian components
Mu = rand(M,2)*5;                 % initialize the parameters (means)
for i=1:M                         % initialize the parameters (covariances)
    Sigma(i,1:2,1:2) = [1 0; 0 1];
end
alpha = zeros(6,1) + 1/6;         % initialize the parameters (mixing coefficients)
drawgraph(X, Mu, Sigma, 1);       % call the plotting function
Maxtau = 100;                     % set the maximum number of iterations

% E-step of the EM algorithm ----------------------------------------------
for tau=1:Maxtau
    %%% E-step %%%
    for j=1:M
        px(j,:) = gausspdf(X, Mu(j,:), reshape(Sigma(j,:,:),2,2));
    end
    sump = px'*alpha;
    for j=1:M
        r(:,j) = (alpha(j)*px(j,:))'./sump;
    end
    L(tau) = sum(log(sump));      % log-likelihood under the current parameters

    %%% M-step %%% (next slide)
end

% M-step of the EM algorithm ----------------------------------------------
for tau=1:Maxtau
    %%% E-step %%% (previous slide)

    %%% M-step %%%
    for j=1:M
        sumr = sum(r(:,j));
        Rj = repmat(r(:,j),1,2)';
        Mu(j,:) = sum(Rj.*X,2)/sumr;                           % new mean
        rxmu = (X - repmat(Mu(j,:),N,1)').*Rj;                 % compute the new covariance
        Sigma(j,1:2,1:2) = rxmu*(X - repmat(Mu(j,:),N,1)')'/sumr;
        alpha(j) = sumr/N;                                     % new mixing coefficient
    end
    if (mod(tau,10)==1)
        drawgraph(X, Mu, Sigma, ceil(tau/10)+1);               % call the plotting function
    end
end
drawgraph(X, Mu, Sigma, tau);                                  % call the plotting function
figure(tau+1); plot(L);                                        % plot the change of the log-likelihood

Function gausspdf: computes Gaussian probability density values

% Function gausspdf --------------------------------------------------------
function [out]=gausspdf(X, mu, sigma)       % function definition
n = size(X,1);                              % dimension of the input vectors
N = size(X,2);                              % number of data points
Mu = repmat(mu',1,N);                       % replicate the mean for matrix operations
% compute and return the density values
out = (1/((sqrt(2*pi))^n*sqrt(det(sigma)))) ...
      * exp(-diag((X-Mu)'*inv(sigma)*(X-Mu))/2);

Function drawgraph: plots the data and the current parameters

% Function drawgraph -------------------------------------------------------
function drawgraph(X, Mu, Sigma, cnt)       % function definition
M = size(Mu,1);                             % number of components
figure(cnt);
plot(X(1,:), X(2,:), '*');                  % plot the data
hold on
axis([-0.5 5.5 -0.5 3.5]); grid on
plot(Mu(:,1), Mu(:,2), 'r*');               % plot the mean parameters
for j=1:M
    sigma = reshape(Sigma(j,:,:),2,2);
    % draw the ellipse corresponding to the covariance
    t = [-pi:0.1:pi]';
    A = sqrt(2)*[cos(t) sin(t)]*sqrtm(sigma) + repmat(Mu(j,:), size(t,1), 1);
    plot(A(:,1), A(:,2), 'r-', 'linewidth', 2);
end
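Usage note: the script above expects the course data file data10_2.mat, which is not included in this text. If the file is unavailable, a purely hypothetical stand-in can be generated as follows; the cluster centers and noise level are arbitrary and chosen only so that the points fall inside the plotting range used by drawgraph.

% Hypothetical stand-in for data10_2.mat (not part of the original lecture):
% draw 2-D points from a few Gaussian clusters and store them in a variable
% named 'data' (N-by-2), which is the format the initialization code expects.
rng(1);
centers = [1 1; 3 1; 2 2.5; 4 2.5; 1 3; 4.5 0.5];   % arbitrary cluster centers
data = [];
for k = 1:size(centers,1)
    data = [data; repmat(centers(k,:), 50, 1) + 0.25*randn(50,2)];
end
save data10_2 data      % creates data10_2.mat so that 'load data10_2' works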