Lecture Slides for INTRODUCTION TO Machine Learning
Ethem Alpaydın, © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 13: Hidden Markov Models

Introduction
- Modeling dependencies in the input; the examples are no longer iid.
- Sequences:
  - Temporal: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); in handwriting, pen movements.
  - Spatial: in a DNA sequence, base pairs.

Discrete Markov Process
- N states: S_1, S_2, ..., S_N
- State at "time" t: q_t = S_i
- First-order Markov property:
$$P(q_{t+1}=S_j \mid q_t=S_i, q_{t-1}=S_k, \ldots) = P(q_{t+1}=S_j \mid q_t=S_i)$$
- Transition probabilities:
$$a_{ij} \equiv P(q_{t+1}=S_j \mid q_t=S_i), \qquad a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1$$
- Initial probabilities:
$$\pi_i \equiv P(q_1=S_i), \qquad \sum_{i=1}^{N} \pi_i = 1$$

Stochastic Automaton
The probability of an observed state sequence O = Q = {q_1 q_2 ... q_T} is
$$P(O=Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} \, a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$

Example: Balls and Urns
- Three urns, each full of balls of one color: S_1: red, S_2: blue, S_3: green.
$$\Pi = [0.5,\ 0.2,\ 0.3]^T, \qquad A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$
- For the observation sequence O = {S_1, S_1, S_3, S_3} (a code sketch of this computation appears below, after the Elements of an HMM slide):
$$P(O \mid A, \Pi) = P(S_1)\, P(S_1 \mid S_1)\, P(S_3 \mid S_1)\, P(S_3 \mid S_3) = \pi_1 \, a_{11} \, a_{13} \, a_{33} = 0.5 \cdot 0.4 \cdot 0.3 \cdot 0.8 = 0.048$$

Balls and Urns: Learning
Given K example sequences of length T, estimate by counting:
$$\hat{\pi}_i = \frac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}} = \frac{\sum_{k=1}^{K} 1(q_1^k = S_i)}{K}$$
$$\hat{a}_{ij} = \frac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T-1} 1(q_t^k = S_i \text{ and } q_{t+1}^k = S_j)}{\sum_{k=1}^{K} \sum_{t=1}^{T-1} 1(q_t^k = S_i)}$$

Hidden Markov Models
- States are not observable.
- Discrete observations {v_1, v_2, ..., v_M} are recorded; they are a probabilistic function of the state.
- Emission probabilities:
$$b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$$
- Example: in each urn, there are balls of different colors, but with different probabilities.
- For each observation sequence, there are multiple possible state sequences.

HMM Unfolded in Time
[Figure: the HMM unfolded in time, showing the hidden state sequence and the observation emitted at each step.]

Elements of an HMM
- N: number of states
- M: number of observation symbols
- A = [a_ij]: N × N state transition probability matrix
- B = [b_j(m)]: N × M observation probability matrix
- Π = [π_i]: N × 1 initial state probability vector
- λ = (A, B, Π): the parameter set of the HMM
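As a quick check of the balls-and-urns numbers above, here is a minimal Python/NumPy sketch (the function and variable names are my own, not from the slides) that computes P(O = Q | A, Π) for an observable Markov model:

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])          # initial probabilities pi_i (from the example)
A = np.array([[0.4, 0.3, 0.3],          # A[i, j] = P(q_{t+1} = S_{j+1} | q_t = S_{i+1})
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_probability(states, Pi, A):
    """P(O = Q | A, Pi) = pi_{q_1} * a_{q_1 q_2} * ... * a_{q_{T-1} q_T}."""
    p = Pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

O = [0, 0, 2, 2]                        # S1, S1, S3, S3 with 0-based indices
print(sequence_probability(O, Pi, A))   # 0.5 * 0.4 * 0.3 * 0.8 = 0.048
```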
Three Basic Problems of HMMs
1. Evaluation: given λ and O, calculate P(O | λ).
2. State sequence: given λ and O, find Q* such that P(Q* | O, λ) = max_Q P(Q | O, λ).
3. Learning: given X = {O^k}_k, find λ* such that P(X | λ*) = max_λ P(X | λ).
(Rabiner, 1989)

Evaluation
Forward variable:
$$\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$$
Initialization:
$$\alpha_1(i) = \pi_i \, b_i(O_1)$$
Recursion:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$$
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Backward variable:
$$\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$$
Initialization:
$$\beta_T(i) = 1$$
Recursion:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$$

Finding the State Sequence
$$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)}$$
Choose the state that has the highest probability, for each time step: q_t* = argmax_i γ_t(i)?
No! The individually most likely states need not form a likely, or even feasible, sequence; for example, two consecutively chosen states may have a_ij = 0.

Viterbi's Algorithm
$$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} P(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$$
- Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
- Recursion: δ_t(j) = max_i δ_{t-1}(i) a_ij · b_j(O_t), ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
- Termination: p* = max_i δ_T(i), q_T* = argmax_i δ_T(i)
- Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1

Learning
$$\xi_t(i,j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{\sum_k \sum_l \alpha_t(k) \, a_{kl} \, b_l(O_{t+1}) \, \beta_{t+1}(l)}$$
Baum-Welch algorithm (EM), with the indicator variables
$$z_i^t = \begin{cases} 1 & \text{if } q_t = S_i \\ 0 & \text{otherwise} \end{cases} \qquad z_{ij}^t = \begin{cases} 1 & \text{if } q_t = S_i \text{ and } q_{t+1} = S_j \\ 0 & \text{otherwise} \end{cases}$$

Baum-Welch (EM)
E-step:
$$E[z_i^t] = \gamma_t(i), \qquad E[z_{ij}^t] = \xi_t(i,j)$$
M-step:
$$\hat{\pi}_i = \frac{\sum_{k=1}^{K} \gamma_1^k(i)}{K}$$
$$\hat{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^k(i,j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(i)}$$
$$\hat{b}_j(m) = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j) \, 1(O_t^k = v_m)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j)}$$

Continuous Observations
Discrete observations:
$$P(O_t \mid q_t = S_j, \lambda) = \prod_{m=1}^{M} b_j(m)^{r_t^m}, \qquad r_t^m = \begin{cases} 1 & \text{if } O_t = v_m \\ 0 & \text{otherwise} \end{cases}$$
Gaussian mixture (discretize using k-means):
$$P(O_t \mid q_t = S_j, \lambda) = \sum_{l=1}^{L} P(G_{jl}) \, p(O_t \mid q_t = S_j, G_{jl}, \lambda), \qquad p(O_t \mid q_t = S_j, G_{jl}, \lambda) \sim \mathcal{N}(\mu_{jl}, \Sigma_{jl})$$
Continuous:
$$p(O_t \mid q_t = S_j, \lambda) \sim \mathcal{N}(\mu_j, \sigma_j^2)$$
Use EM to learn the parameters, e.g.,
$$\hat{\mu}_j = \frac{\sum_t \gamma_t(j) \, O_t}{\sum_t \gamma_t(j)}$$

HMM with Input
Input-dependent observations:
$$p(O_t \mid q_t = S_j, x^t, \lambda) \sim \mathcal{N}\!\left(g_j(x^t \mid \theta_j), \sigma_j^2\right)$$
Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):
$$P(q_{t+1} = S_j \mid q_t = S_i, x^t)$$
Time-delay input:
$$x^t = f(O_{t-\tau}, \ldots, O_{t-1})$$

Model Selection in HMM
Left-to-right HMMs (shown here for N = 4 states):
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$
In classification, for each class C_i, estimate P(O | λ_i) with a separate HMM and use Bayes' rule:
$$P(\lambda_i \mid O) = \frac{P(O \mid \lambda_i) \, P(\lambda_i)}{\sum_j P(O \mid \lambda_j) \, P(\lambda_j)}$$
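To close, here are sketches of the three basic problems for the hidden version of the balls-and-urns example. First, problem 1 (evaluation) via the forward recursion, in Python/NumPy; Pi and A are the values from the example, while the emission matrix B is hypothetical (each urn now holds balls of all three colors):

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])                 # initial probabilities (from the example)
A = np.array([[0.4, 0.3, 0.3],                 # transition matrix (from the example)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1],                 # hypothetical emissions: B[j, m] = b_j(m)
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def forward(O, Pi, A, B):
    """Forward algorithm; alpha[t] is alpha_{t+1}(.) of the slides (0-based t here)."""
    T, N = len(O), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]                     # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]] # [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    return alpha, alpha[-1].sum()                  # P(O | lambda) = sum_i alpha_T(i)

alpha, pO = forward([0, 0, 2, 2], Pi, A, B)        # colors: red, red, green, green
print(pO)                                          # P(O | lambda)
```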
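For problem 2 (the state sequence), a matching sketch of Viterbi decoding; it reuses the arrays and 0-based indexing from the forward sketch above:

```python
def viterbi(O, Pi, A, B):
    """Viterbi decoding: returns the most likely state path q* and its probability p*."""
    T, N = len(O), len(Pi)
    delta = np.zeros((T, N))                     # delta[t, j]: best path prob. ending in j
    psi = np.zeros((T, N), dtype=int)            # psi[t, j]: best predecessor of j at time t
    delta[0] = Pi * B[:, O[0]]                   # initialization: pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)                   # path backtracking
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()

q_star, p_star = viterbi([0, 0, 2, 2], Pi, A, B)
print(q_star, p_star)                            # most likely urn sequence and p*
```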
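Finally, for problem 3 (learning), one possible Baum-Welch loop built from the same pieces: the forward function above plus the backward recursion. This is an unoptimized sketch assuming discrete observations, with no smoothing, scaling, or convergence check:

```python
def baum_welch(sequences, Pi, A, B, n_iter=20):
    """EM reestimation of (Pi, A, B) from a list of observation sequences (sketch)."""
    N, M = B.shape
    for _ in range(n_iter):
        pi_num = np.zeros(N)
        a_num, a_den = np.zeros((N, N)), np.zeros(N)
        b_num, b_den = np.zeros((N, M)), np.zeros(N)
        for O in sequences:
            T = len(O)
            alpha, pO = forward(O, Pi, A, B)
            beta = np.ones((T, N))               # beta_T(i) = 1
            for t in range(T - 2, -1, -1):       # beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
                beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
            gamma = alpha * beta / pO            # E-step: gamma_t(i)
            pi_num += gamma[0]
            for t in range(T - 1):               # E-step: accumulate xi_t(i, j)
                a_num += alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / pO
            a_den += gamma[:-1].sum(axis=0)
            for t in range(T):
                b_num[:, O[t]] += gamma[t]
            b_den += gamma.sum(axis=0)
        # M-step: reestimate parameters as ratios of expected counts.
        Pi = pi_num / len(sequences)
        A = a_num / a_den[:, None]
        B = b_num / b_den[:, None]
    return Pi, A, B
```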