Alpaydın slides with several modifications and additions by Christoph Eick.

Introduction
Modeling dependencies in the input; observations are no longer iid; e.g., the order of observations in a dataset matters.
Temporal sequences: in speech, phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language); the stock market (stock values over time).
Spatial sequences: base pairs in DNA sequences.
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Discrete Markov Process
N states: S1, S2, ..., SN
First-order Markov: the state at "time" t is qt = Si, and
P(qt+1=Sj | qt=Si, qt-1=Sk, ...) = P(qt+1=Sj | qt=Si)
Transition probabilities: aij ≡ P(qt+1=Sj | qt=Si), with aij ≥ 0 and Σj=1..N aij = 1
Initial probabilities: πi ≡ P(q1=Si), with Σi=1..N πi = 1

Stochastic Automaton / Markov Chain
P(O=Q | A, Π) = P(q1) Πt=2..T P(qt | qt-1) = πq1 aq1q2 ··· aqT-1qT

Example: Balls and Urns
Three urns, each full of balls of one color: S1 = blue, S2 = red, S3 = green
Π = [0.5, 0.2, 0.3]T
A = | 0.4 0.3 0.3 |
    | 0.2 0.6 0.2 |
    | 0.1 0.1 0.8 |
O = {S1, S1, S3, S3}
P(O | A, Π) = P(S1) P(S1|S1) P(S3|S1) P(S3|S3) = π1 a11 a13 a33 = 0.5 · 0.4 · 0.3 · 0.8 = 0.048

Balls and Urns: Learning
Given K example sequences of length T:
π̂i = #{sequences starting with Si} / K
âij = #{transitions from Si to Sj} / #{transitions from Si}
    = Σk Σt=1..T-1 1(qt^(k)=Si and qt+1^(k)=Sj) / Σk Σt=1..T-1 1(qt^(k)=Si)
Remark: extract the probabilities from the observed sequences, e.g.
s1-s2-s1-s3
s2-s1-s1-s2
s2-s3-s2-s1
π1 = 1/3, π2 = 2/3, a11 = 1/4, a12 = 1/2, a13 = 1/4, a21 = 3/4, ...

http://en.wikipedia.org/wiki/Hidden_Markov_model
Hidden Markov Models
States are not observable.
Discrete observations {v1, v2, ..., vM} are recorded; each is a probabilistic function of the state.
Emission probabilities: bj(m) ≡ P(Ot=vm | qt=Sj)
Example: in each urn there are balls of different colors, but with different probabilities.
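The Markov-chain calculations above (the probability of an observed state sequence, and estimating Π and A by counting starts and transitions) can be sketched in Python. This is a minimal sketch assuming NumPy and 0-based state indices; the numbers come from the balls-and-urns example.

```python
import numpy as np

# Markov chain from the balls-and-urns example: S1 = blue, S2 = red, S3 = green
pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_probability(states, A, pi):
    """P(O | A, pi) = pi[q1] * prod over t of a[q_t, q_t+1]."""
    p = pi[states[0]]
    for s, s_next in zip(states, states[1:]):
        p *= A[s, s_next]
    return p

# O = {S1, S1, S3, S3} -> indices 0, 0, 2, 2; pi_1 * a11 * a13 * a33 = 0.048
p = sequence_probability([0, 0, 2, 2], A, pi)

def estimate_parameters(sequences, N):
    """ML estimates of pi and A by counting (assumes every state occurs
    at least once as a transition source, so no row of counts is all zero)."""
    pi_hat = np.zeros(N)
    counts = np.zeros((N, N))
    for seq in sequences:
        pi_hat[seq[0]] += 1          # count starting states
        for s, s_next in zip(seq, seq[1:]):
            counts[s, s_next] += 1   # count transitions
    pi_hat /= len(sequences)
    A_hat = counts / counts.sum(axis=1, keepdims=True)
    return pi_hat, A_hat

# The three training sequences from the learning slide (0-based indices)
seqs = [[0, 1, 0, 2], [1, 0, 0, 1], [1, 2, 1, 0]]
pi_hat, A_hat = estimate_parameters(seqs, 3)
```

Counting by hand confirms the estimates: four transitions leave s1 (one to s1, two to s2, one to s3), giving a11 = 1/4, a12 = 1/2, a13 = 1/4.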
For each observation sequence, there are multiple possible state sequences.
http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter10.html

HMM Unfolded in Time

Now a More Complicated Problem
(Figure: three urns, labeled 1-3, and a sequence of drawn balls.)
1. Markov chains: we observe the urn sequence itself, e.g. 1-1-2-2 (somewhat trivial, as the states are observable!).
2. Hidden Markov models: which urn sequence created the observations? E.g. (1 or 2)-(1 or 2)-(2 or 3)-(2 or 3), and the potential sequences have different probabilities; e.g., drawing a blue ball from urn 1 is more likely than from urn 2!

Another Motivating Example

Elements of an HMM
N: number of states
M: number of observation symbols
A = [aij]: N×N state transition probability matrix
B = [bj(m)]: N×M observation (emission) probability matrix
Π = [πi]: N×1 initial state probability vector
λ = (A, B, Π): parameter set of the HMM

Three Basic Problems of HMMs
1. Evaluation: given λ and a sequence O, calculate P(O | λ).
2. Most likely state sequence: given λ and a sequence O, find the state sequence Q* such that P(Q* | O, λ) = maxQ P(Q | O, λ).
3. Learning: given a set of sequences O = {O1, ..., OK}, find the λ* that is the most likely explanation of the sequences in O.
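The elements listed above can be written down concretely for the urn HMM. This is an illustrative sketch assuming NumPy; the emission matrix B below is an assumed example, not a value from the slides.

```python
import numpy as np

# lambda = (A, B, Pi) for a hypothetical 3-urn HMM: hidden states are urns,
# observed symbols are ball colors (blue, red, green). Values are illustrative.
N, M = 3, 3                       # number of states, number of observation symbols
pi = np.array([0.5, 0.2, 0.3])    # initial state probabilities (N x 1)
A = np.array([[0.4, 0.3, 0.3],    # transition matrix (N x N), rows sum to 1
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1],    # emission matrix (N x M): B[j, m] = b_j(m)
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def sample(A, B, pi, T, rng):
    """Generate hidden states and observations: observations are a
    probabilistic function of the (unobserved) state sequence."""
    q = rng.choice(N, p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(int(q))
        obs.append(int(rng.choice(M, p=B[q])))
        q = rng.choice(N, p=A[q])
    return states, obs
```

Sampling a few sequences from this model makes the hidden/observed distinction tangible: only the color list would be recorded, the urn list stays hidden.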
P(O | λ*) = maxλ Πk P(Ok | λ)
(Rabiner, 1989)

Evaluation
Forward variable: αt(i) ≡ P(O1···Ot, qt=Si | λ), the probability of observing O1-...-Ot and additionally being in state i at step t.
Initialization: α1(i) = πi bi(O1)
Recursion: αt+1(j) = [Σi=1..N αt(i) aij] bj(Ot+1)
Using α, the probability of the observed sequence can be computed as:
P(O | λ) = Σi=1..N αT(i)
Complexity: O(N2·T)

Backward variable: βt(i) ≡ P(Ot+1···OT | qt=Si, λ), the probability of observing Ot+1-...-OT given that we are in state i at step t.
Initialization: βT(i) = 1
Recursion: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)

Finding the Most Likely State Sequence
γt(i) ≡ P(qt=Si | O, λ) = αt(i) βt(i) / Σj=1..N αt(j) βt(j)
γt(i) is the probability of being in state i at step t, given the whole observation sequence O1...Ot Ot+1...OT.
For each time step, choose the state that has the highest probability: qt* = arg maxi γt(i)

Only briefly discussed in 2014!
Viterbi's Algorithm
δt(i) ≡ maxq1q2···qt-1 p(q1q2···qt-1, qt=Si, O1···Ot | λ)
Initialization: δ1(i) = πi bi(O1), ψ1(i) = 0
Recursion: δt(j) = [maxi δt-1(i) aij] bj(Ot), ψt(j) = argmaxi δt-1(i) aij
Termination: p* = maxi δT(i), qT* = argmaxi δT(i)
Path backtracking: qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1
Idea: combines path probability computations with backtracking over competing paths.

Baum-Welch Algorithm
Learning a model from sequences: the observed symbol sequences O = {O1, ..., OK} are the input, the output is a model λ = (A, B, Π); the state sequences remain hidden. An EM-style algorithm is used!
E-step: ξt(i,j) is a hidden (latent) variable, measuring the probability of going from state i at step t to state j at step t+1 while observing Ot+1, given a model λ and an observed sequence Ok.
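Before turning to the learning formulas, the evaluation (forward), backward, and Viterbi recursions above can be sketched in code. This is a minimal NumPy sketch; the model numbers are illustrative assumptions, with B chosen so each urn favors one color.

```python
import numpy as np

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1],      # illustrative emission probabilities
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def forward(obs, A, B, pi):
    """alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda); cost O(N^2 * T)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs, A, B):
    """beta[t, i] = P(O_t+1..O_T | q_t = S_i, lambda); beta[T-1] = 1."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def viterbi(obs, A, B, pi):
    """Most likely state sequence: max-product recursion plus backtracking."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)           # best predecessor for each j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

obs = [0, 0, 2, 2]                  # observed colors: blue, blue, green, green
alpha = forward(obs, A, B, pi)
beta = backward(obs, A, B)
p_obs = alpha[-1].sum()             # P(O | lambda) = sum_i alpha_T(i)
gamma = alpha * beta / p_obs        # gamma[t, i] = P(q_t = S_i | O, lambda)
```

A useful sanity check is that the forward and backward passes agree on P(O | λ), and that each row of γ sums to 1.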
ξt(i,j) ≡ P(qt=Si, qt+1=Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / Σk Σl αt(k) akl bl(Ot+1) βt+1(l)
γt(i) = Σj=1..N ξt(i,j)
γt(i) is a hidden (latent) variable, measuring the probability of being in state i at step t, given a model λ and an observed sequence Ok.

Baum-Welch Algorithm: M-Step
π̂i = Σk=1..K γ1^(k)(i) / K
âij = Σk Σt=1..T-1 ξt^(k)(i,j) / Σk Σt=1..T-1 γt^(k)(i)   (probability of going from i to j / probability of being in i)
b̂j(m) = Σk Σt=1..T γt^(k)(j) 1(Ot^(k)=vm) / Σk Σt=1..T γt^(k)(j)
Remark: k iterates over the observed sequences O1, ..., OK; for each individual sequence Ok, γ^(k) and ξ^(k) are computed in the E-step; then the actual model is computed in the M-step by averaging the estimates of πi, aij, bj(m) (based on γ^(k) and ξ^(k)) over the K observed sequences.

Baum-Welch Algorithm: Summary
Estimate an initial model λ = (A, B, Π)
REPEAT
  E-step: estimate γt(i) and ξt(i,j) based on the model λ = (A, B, Π) and O
  M-step: re-estimate λ = (A, B, Π) based on γt(i) and ξt(i,j)
UNTIL CONVERGENCE
For more discussion see: http://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf
See also: http://www.digplanet.com/wiki/Baum%E2%80%93Welch_algorithm

Generalization of HMM: Continuous Observations
The observations generated at each time step are vectors of k numbers; a k-dimensional multivariate Gaussian is associated with each state j, defining the probability that the k-dimensional vector v is generated when being in state j:
P(Ot | qt=Sj, λ) ~ N(μj, Σj)
λ = (A, {(μj, Σj)}j=1..N, Π); the observed vector sequence is generated by the hidden state sequence.

Generalization: HMM with Inputs
Input-dependent observations: P(Ot | qt=Sj, xt, λ) ~ N(gj(xt | θj), σj2)
Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996): P(qt+1=Sj | qt=Si, xt)
Time-delay input: xt = f(Ot-τ, ..., Ot-1)
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
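The E-step/M-step loop for the discrete case can be sketched as a single Baum-Welch iteration over K sequences. This is a minimal NumPy sketch under assumed model values; in practice one would iterate until convergence and use scaling or log-space arithmetic for long sequences.

```python
import numpy as np

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])
B = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])  # illustrative

def forward_backward(obs, A, B, pi):
    """Compute alpha and beta for one observation sequence."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(sequences, A, B, pi):
    """One EM iteration: accumulate gamma and xi over all K sequences (E-step),
    then re-estimate pi, A, B by the ratio formulas (M-step)."""
    N, M = B.shape
    pi_num = np.zeros(N)
    a_num = np.zeros((N, N)); a_den = np.zeros(N)
    b_num = np.zeros((N, M)); b_den = np.zeros(N)
    for obs in sequences:
        alpha, beta = forward_backward(obs, A, B, pi)
        p_obs = alpha[-1].sum()                    # P(O^k | lambda)
        gamma = alpha * beta / p_obs               # gamma[t, i]
        for t in range(len(obs) - 1):
            # xi_t[i, j] = alpha_t(i) a_ij b_j(O_t+1) beta_t+1(j) / P(O | lambda)
            xi_t = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / p_obs
            a_num += xi_t                          # expected i -> j transitions
            a_den += gamma[t]                      # expected visits to i (t < T)
        pi_num += gamma[0]                         # expected starts in i
        for t, o in enumerate(obs):
            b_num[:, o] += gamma[t]                # expected emissions of symbol o
        b_den += gamma.sum(axis=0)                 # expected visits to j (all t)
    return a_num / a_den[:, None], b_num / b_den[:, None], pi_num / len(sequences)

seqs = [[0, 0, 2, 2], [1, 1, 0, 2]]                # two short example sequences
A_new, B_new, pi_new = baum_welch_step(seqs, A, B, pi)
```

After each iteration the re-estimated rows of A and B, and Π itself, remain valid probability distributions, which is a quick correctness check.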