Lecture Slides for INTRODUCTION TO Machine Learning
Ethem Alpaydın, © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 13: Hidden Markov Models

Introduction
- Modeling dependencies in the input; the examples are no longer iid.
- Sequences:
  - Temporal: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); in handwriting, pen movements.
  - Spatial: in a DNA sequence, base pairs.

Discrete Markov Process
- N states: S_1, S_2, ..., S_N
- State at "time" t: q_t = S_i
- First-order Markov property:
$$P(q_{t+1}=S_j \mid q_t=S_i, q_{t-1}=S_k, \ldots) = P(q_{t+1}=S_j \mid q_t=S_i)$$
- Transition probabilities:
$$a_{ij} \equiv P(q_{t+1}=S_j \mid q_t=S_i), \qquad a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1$$
- Initial probabilities:
$$\pi_i \equiv P(q_1=S_i), \qquad \sum_{i=1}^{N} \pi_i = 1$$

Stochastic Automaton
The probability of an observed state sequence O = Q = {q_1 q_2 ... q_T} is
$$P(O=Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} \, a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$

Example: Balls and Urns
- Three urns, each full of balls of one color: S_1: red, S_2: blue, S_3: green.
$$\Pi = [0.5,\ 0.2,\ 0.3]^T, \qquad A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$
- For the observation sequence O = {S_1, S_1, S_3, S_3} (a code sketch of this computation appears below, after the Elements of an HMM slide):
$$P(O \mid A, \Pi) = P(S_1)\, P(S_1 \mid S_1)\, P(S_3 \mid S_1)\, P(S_3 \mid S_3) = \pi_1 \, a_{11} \, a_{13} \, a_{33} = 0.5 \cdot 0.4 \cdot 0.3 \cdot 0.8 = 0.048$$

Balls and Urns: Learning
Given K example sequences of length T, estimate by counting:
$$\hat{\pi}_i = \frac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}} = \frac{\sum_{k=1}^{K} 1(q_1^k = S_i)}{K}$$
$$\hat{a}_{ij} = \frac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T-1} 1(q_t^k = S_i \text{ and } q_{t+1}^k = S_j)}{\sum_{k=1}^{K} \sum_{t=1}^{T-1} 1(q_t^k = S_i)}$$

Hidden Markov Models
- States are not observable.
- Discrete observations {v_1, v_2, ..., v_M} are recorded; they are a probabilistic function of the state.
- Emission probabilities:
$$b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$$
- Example: in each urn, there are balls of different colors, but with different probabilities.
- For each observation sequence, there are multiple possible state sequences.

HMM Unfolded in Time
[Figure: the HMM unfolded in time, showing the hidden state sequence and the observation emitted at each step.]

Elements of an HMM
- N: number of states
- M: number of observation symbols
- A = [a_ij]: N × N state transition probability matrix
- B = [b_j(m)]: N × M observation probability matrix
- Π = [π_i]: N × 1 initial state probability vector
- λ = (A, B, Π): the parameter set of the HMM
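As a quick check of the balls-and-urns numbers above, here is a minimal Python/NumPy sketch (the function and variable names are my own, not from the slides) that computes P(O = Q | A, Π) for an observable Markov model:

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])          # initial probabilities pi_i (from the example)
A = np.array([[0.4, 0.3, 0.3],          # A[i, j] = P(q_{t+1} = S_{j+1} | q_t = S_{i+1})
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_probability(states, Pi, A):
    """P(O = Q | A, Pi) = pi_{q_1} * a_{q_1 q_2} * ... * a_{q_{T-1} q_T}."""
    p = Pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

O = [0, 0, 2, 2]                        # S1, S1, S3, S3 with 0-based indices
print(sequence_probability(O, Pi, A))   # 0.5 * 0.4 * 0.3 * 0.8 = 0.048
```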
Three Basic Problems of HMMs
1. Evaluation: given λ and O, calculate P(O | λ).
2. State sequence: given λ and O, find Q* such that P(Q* | O, λ) = max_Q P(Q | O, λ).
3. Learning: given X = {O^k}_k, find λ* such that P(X | λ*) = max_λ P(X | λ).
(Rabiner, 1989)

Evaluation
Forward variable:
$$\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$$
Initialization:
$$\alpha_1(i) = \pi_i \, b_i(O_1)$$
Recursion:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$$
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Backward variable:
$$\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$$
Initialization:
$$\beta_T(i) = 1$$
Recursion:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$$

Finding the State Sequence
$$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)}$$
Choose the state that has the highest probability, for each time step: q_t* = argmax_i γ_t(i)?
No! The individually most likely states need not form a likely, or even feasible, sequence; for example, two consecutively chosen states may have a_ij = 0.

Viterbi's Algorithm
$$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} P(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$$
- Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
- Recursion: δ_t(j) = max_i δ_{t-1}(i) a_ij · b_j(O_t), ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
- Termination: p* = max_i δ_T(i), q_T* = argmax_i δ_T(i)
- Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1

Learning
$$\xi_t(i,j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{\sum_k \sum_l \alpha_t(k) \, a_{kl} \, b_l(O_{t+1}) \, \beta_{t+1}(l)}$$
Baum-Welch algorithm (EM), with the indicator variables
$$z_i^t = \begin{cases} 1 & \text{if } q_t = S_i \\ 0 & \text{otherwise} \end{cases} \qquad z_{ij}^t = \begin{cases} 1 & \text{if } q_t = S_i \text{ and } q_{t+1} = S_j \\ 0 & \text{otherwise} \end{cases}$$

Baum-Welch (EM)
E-step:
$$E[z_i^t] = \gamma_t(i), \qquad E[z_{ij}^t] = \xi_t(i,j)$$
M-step:
$$\hat{\pi}_i = \frac{\sum_{k=1}^{K} \gamma_1^k(i)}{K}$$
$$\hat{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^k(i,j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(i)}$$
$$\hat{b}_j(m) = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j) \, 1(O_t^k = v_m)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j)}$$

Continuous Observations
Discrete observations:
$$P(O_t \mid q_t = S_j, \lambda) = \prod_{m=1}^{M} b_j(m)^{r_t^m}, \qquad r_t^m = \begin{cases} 1 & \text{if } O_t = v_m \\ 0 & \text{otherwise} \end{cases}$$
Gaussian mixture (discretize using k-means):
$$P(O_t \mid q_t = S_j, \lambda) = \sum_{l=1}^{L} P(G_{jl}) \, p(O_t \mid q_t = S_j, G_{jl}, \lambda), \qquad p(O_t \mid q_t = S_j, G_{jl}, \lambda) \sim \mathcal{N}(\mu_{jl}, \Sigma_{jl})$$
Continuous:
$$p(O_t \mid q_t = S_j, \lambda) \sim \mathcal{N}(\mu_j, \sigma_j^2)$$
Use EM to learn the parameters, e.g.,
$$\hat{\mu}_j = \frac{\sum_t \gamma_t(j) \, O_t}{\sum_t \gamma_t(j)}$$

HMM with Input
Input-dependent observations:
$$p(O_t \mid q_t = S_j, x^t, \lambda) \sim \mathcal{N}\!\left(g_j(x^t \mid \theta_j), \sigma_j^2\right)$$
Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):
$$P(q_{t+1} = S_j \mid q_t = S_i, x^t)$$
Time-delay input:
$$x^t = f(O_{t-\tau}, \ldots, O_{t-1})$$

Model Selection in HMM
Left-to-right HMMs (shown here for N = 4 states):
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$
In classification, for each class C_i, estimate P(O | λ_i) with a separate HMM and use Bayes' rule:
$$P(\lambda_i \mid O) = \frac{P(O \mid \lambda_i) \, P(\lambda_i)}{\sum_j P(O \mid \lambda_j) \, P(\lambda_j)}$$
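To close, here are sketches of the three basic problems for the hidden version of the balls-and-urns example. First, problem 1 (evaluation) via the forward recursion, in Python/NumPy; Pi and A are the values from the example, while the emission matrix B is hypothetical (each urn now holds balls of all three colors):

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])                 # initial probabilities (from the example)
A = np.array([[0.4, 0.3, 0.3],                 # transition matrix (from the example)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1],                 # hypothetical emissions: B[j, m] = b_j(m)
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def forward(O, Pi, A, B):
    """Forward algorithm; alpha[t] is alpha_{t+1}(.) of the slides (0-based t here)."""
    T, N = len(O), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]                     # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]] # [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    return alpha, alpha[-1].sum()                  # P(O | lambda) = sum_i alpha_T(i)

alpha, pO = forward([0, 0, 2, 2], Pi, A, B)        # colors: red, red, green, green
print(pO)                                          # P(O | lambda)
```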
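For problem 2 (the state sequence), a matching sketch of Viterbi decoding; it reuses the arrays and 0-based indexing from the forward sketch above:

```python
def viterbi(O, Pi, A, B):
    """Viterbi decoding: returns the most likely state path q* and its probability p*."""
    T, N = len(O), len(Pi)
    delta = np.zeros((T, N))                     # delta[t, j]: best path prob. ending in j
    psi = np.zeros((T, N), dtype=int)            # psi[t, j]: best predecessor of j at time t
    delta[0] = Pi * B[:, O[0]]                   # initialization: pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)                   # path backtracking
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()

q_star, p_star = viterbi([0, 0, 2, 2], Pi, A, B)
print(q_star, p_star)                            # most likely urn sequence and p*
```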
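Finally, for problem 3 (learning), one possible Baum-Welch loop built from the same pieces: the forward function above plus the backward recursion. This is an unoptimized sketch assuming discrete observations, with no smoothing, scaling, or convergence check:

```python
def baum_welch(sequences, Pi, A, B, n_iter=20):
    """EM reestimation of (Pi, A, B) from a list of observation sequences (sketch)."""
    N, M = B.shape
    for _ in range(n_iter):
        pi_num = np.zeros(N)
        a_num, a_den = np.zeros((N, N)), np.zeros(N)
        b_num, b_den = np.zeros((N, M)), np.zeros(N)
        for O in sequences:
            T = len(O)
            alpha, pO = forward(O, Pi, A, B)
            beta = np.ones((T, N))               # beta_T(i) = 1
            for t in range(T - 2, -1, -1):       # beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
                beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
            gamma = alpha * beta / pO            # E-step: gamma_t(i)
            pi_num += gamma[0]
            for t in range(T - 1):               # E-step: accumulate xi_t(i, j)
                a_num += alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / pO
            a_den += gamma[:-1].sum(axis=0)
            for t in range(T):
                b_num[:, O[t]] += gamma[t]
            b_den += gamma.sum(axis=0)
        # M-step: reestimate parameters as ratios of expected counts.
        Pi = pi_num / len(sequences)
        A = a_num / a_den[:, None]
        B = b_num / b_den[:, None]
    return Pi, A, B
```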