Hidden Markov Models (HMM): Rabiner's Paper
Markoviana Reading Group
Computer Eng. & Science Dept., Arizona State University
Fatih Gelgi, Feb 2005

Stationary and Non-stationary
- Stationary process: its statistical properties do not vary with time.
- Non-stationary process: its signal properties vary over time.

HMM Example: Casino Coin
Two hidden states, Fair and Unfair, each tossing its own coin, so the model has two probability tables: state transition probabilities and symbol emission probabilities.

State transition probabilities:
           Fair   Unfair
  Fair     0.9    0.1
  Unfair   0.2    0.8

Symbol emission probabilities:
           H      T
  Fair     0.5    0.5
  Unfair   0.7    0.3

Observation symbols: H, T
Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence:       FFFFFFUUUFFFFFFUUUUUUUFFFFFF

Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?

Properties of an HMM
- First-order Markov process: $q_t$ depends only on $q_{t-1}$.
- Time is discrete.

Elements of an HMM
- $N$, the number of states $S_1, S_2, \ldots, S_N$.
- $M$, the number of observation symbols $O_1, O_2, \ldots, O_M$.
- $\lambda$, the probability distributions $(A, B, \pi)$:
  - $A$: the $N \times N$ state transition matrix with entries $a_{ij}$;
  - $B$: the $N \times M$ symbol emission matrix with entries $b_j(k)$;
  - $\pi$: the $N \times 1$ initial state distribution.

HMM Basic Problems
1. Given an observation sequence $O = O_1 O_2 O_3 \ldots O_T$ and $\lambda$, find $P(O \mid \lambda)$: forward algorithm / backward algorithm.
2. Given $O = O_1 O_2 O_3 \ldots O_T$ and $\lambda$, find the most likely state sequence $Q = q_1 q_2 \ldots q_T$: Viterbi algorithm.
3. Given $O = O_1 O_2 O_3 \ldots O_T$, re-estimate $\lambda$ so that $P(O \mid \lambda)$ is higher than it is now: Baum-Welch re-estimation.

Forward Algorithm Illustration
$\alpha_t(i)$ is the probability of observing the partial sequence $O_1 O_2 O_3 \ldots O_t$ and ending in state $S_i$ at time $t$. In the trellis, the states $S_1, \ldots, S_N$ run along the vertical axis and the observations $O_1, \ldots, O_T$ along the horizontal axis; cell $(j, t)$ holds $\alpha_t(j)$. The first column is $\alpha_1(j) = \pi_j b_j(O_1)$, the second is $\alpha_2(j) = \big[\sum_i \alpha_1(i)\, a_{ij}\big] b_j(O_2)$, and so on; the total of the last column gives the solution.

Forward Algorithm
- Definition: $\alpha_t(i) = P(O_1 O_2 \ldots O_t,\ q_t = S_i \mid \lambda)$
- Initialization: $\alpha_1(i) = \pi_i\, b_i(O_1)$, for $1 \le i \le N$
- Induction: $\alpha_{t+1}(j) = \big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\big]\, b_j(O_{t+1})$, for $1 \le t \le T-1$, $1 \le j \le N$
- Problem 1 answer: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
- Complexity: $O(N^2 T)$

Backward Algorithm
- Definition: $\beta_t(i) = P(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda)$, the probability of observing the partial sequence $O_{t+1} O_{t+2} \ldots O_T$ given state $S_i$ at time $t$.
- Initialization: $\beta_T(i) = 1$, for $1 \le i \le N$
- Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$, for $t = T-1, \ldots, 1$ and $1 \le i \le N$
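The two recursions above translate almost line for line into code. The following is a minimal NumPy sketch, not part of the original slides; the variable names `init`, `trans`, and `emit` are illustrative, and the uniform initial distribution for the casino model is an assumption, since the slides do not give $\pi$:

```python
import numpy as np

def forward(init, trans, emit, obs):
    """alpha[t, i] = P(O_1 .. O_t, q_t = S_i | lambda)."""
    T, N = len(obs), len(init)
    alpha = np.zeros((T, N))
    alpha[0] = init * emit[:, obs[0]]                 # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    return alpha

def backward(trans, emit, obs):
    """beta[t, i] = P(O_{t+1} .. O_T | q_t = S_i, lambda)."""
    T, N = len(obs), trans.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    return beta

# Casino coin model from the example slide: states 0 = Fair, 1 = Unfair;
# symbols 0 = H, 1 = T. The uniform initial distribution is an assumption.
init  = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit  = np.array([[0.5, 0.5],
                  [0.7, 0.3]])
obs = [0, 1, 0, 0, 1, 1, 0, 0, 0]                     # H T H H T T H H H

alpha = forward(init, trans, emit, obs)
beta = backward(trans, emit, obs)
p_fwd = alpha[-1].sum()                               # Problem 1 answer: sum of last column
p_bwd = (init * emit[:, obs[0]] * beta[0]).sum()      # same quantity via beta
assert np.isclose(p_fwd, p_bwd)
```

Agreement between the two directions is a quick sanity check for an implementation; both evaluate the same $P(O \mid \lambda)$ in $O(N^2 T)$ time.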
Q2: Optimality Criterion 1
- Maximize the expected number of correct individual states.
- Definition: $\gamma_t(i)$ is the probability of being in state $S_i$ at time $t$ given the observation sequence $O$ and the model $\lambda$:
  $\gamma_t(i) = P(q_t = S_i \mid O, \lambda) = \alpha_t(i)\, \beta_t(i) \,/\, P(O \mid \lambda)$
- Problem 2 answer: choose the individually most likely state at each time, $q_t^* = \arg\max_{1 \le i \le N} \gamma_t(i)$ for $1 \le t \le T$.
- Problem: if some $a_{ij} = 0$, the optimal state sequence may not even be a valid state sequence.

Q2: Optimality Criterion 2
- Find the single best state sequence (path), i.e., maximize $P(Q \mid O, \lambda)$.
- Definition: $\delta_t(i)$ is the highest probability of a state path for the partial observation sequence $O_1 O_2 O_3 \ldots O_t$ ending in state $S_i$:
  $\delta_t(i) = \max_{q_1 \ldots q_{t-1}} P(q_1 q_2 \ldots q_{t-1},\ q_t = S_i,\ O_1 O_2 \ldots O_t \mid \lambda)$

Viterbi Algorithm
- The major difference from the forward algorithm: maximization instead of summation.
- Initialization: $\delta_1(i) = \pi_i\, b_i(O_1)$
- Recursion: $\delta_{t+1}(j) = \big[\max_i \delta_t(i)\, a_{ij}\big]\, b_j(O_{t+1})$, recording the maximizing predecessor $\psi_{t+1}(j) = \arg\max_i \delta_t(i)\, a_{ij}$ for the traceback.
- Termination: $P^* = \max_i \delta_T(i)$; the state sequence is recovered by backtracking through $\psi$.

Viterbi Algorithm Illustration
The trellis is the same as for the forward algorithm, with each sum replaced by a max: the first column is $\delta_1(j) = \pi_j\, b_j(O_1)$, the second is $\delta_2(j) = \big[\max_i \delta_1(i)\, a_{ij}\big]\, b_j(O_2)$, and so on. The max of the last column indicates where the traceback starts.

Relations with DBN
Each recursion is a local computation on the dynamic Bayesian network underlying the HMM:
- Forward function: $\alpha_{t+1}(j)$ combines $\alpha_t(i)$, $a_{ij}$, and $b_j(O_{t+1})$.
- Backward function: $\beta_t(i)$ combines $\beta_{t+1}(j)$, $a_{ij}$, and $b_j(O_{t+1})$, with $\beta_T(i) = 1$.
- Viterbi algorithm: $\delta_{t+1}(j)$ combines $\delta_t(i)$, $a_{ij}$, and $b_j(O_{t+1})$, with max in place of sum.

Some More Definitions
- $\gamma_t(i)$ is the probability of being in state $S_i$ at time $t$:
  $\gamma_t(i) = \alpha_t(i)\, \beta_t(i) \,/\, \sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)$
- $\xi_t(i,j)$ is the probability of being in state $S_i$ at time $t$ and in $S_j$ at time $t+1$:
  $\xi_t(i,j) = \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \,/\, P(O \mid \lambda)$

Baum-Welch Re-estimation
- An Expectation-Maximization (EM) algorithm.
- Expectation: compute $\gamma_t(i)$ and $\xi_t(i,j)$ under the current model $\lambda$.

Baum-Welch Re-estimation (cont'd)
- Maximization:
  - $\bar{\pi}_i = \gamma_1(i)$, the expected frequency of state $S_i$ at time $t = 1$.
  - $\bar{a}_{ij} = \sum_{t=1}^{T-1} \xi_t(i,j) \,/\, \sum_{t=1}^{T-1} \gamma_t(i)$, the expected number of transitions from $S_i$ to $S_j$ over the expected number of transitions out of $S_i$.
  - $\bar{b}_j(k) = \sum_{t:\, O_t = v_k} \gamma_t(j) \,/\, \sum_{t=1}^{T} \gamma_t(j)$, the expected number of times in $S_j$ observing symbol $v_k$ over the expected number of times in $S_j$.

Notes on the Re-estimation
- If the model does not change, it has reached a local maximum.
- Depending on the model, many local maxima can exist.
- The re-estimated probabilities sum to 1.

Implementation Issues
- Scaling
- Multiple observation sequences
- Initial parameter estimation
- Missing data
- Choice of model size and type

Scaling
- The $\alpha_t(i)$ values head exponentially to zero as $t$ grows, so the trellis is rescaled column by column.
- Recursion with scaling: compute each column from the scaled previous column, then normalize it with the coefficient $c_t = 1 \,/\, \sum_{i=1}^{N} \alpha_t(i)$, giving $\hat{\alpha}_t(i) = c_t\, \alpha_t(i)$.

Scaling (cont'd)
- Desired condition: $\sum_{i=1}^{N} \hat{\alpha}_t(i) = 1$ for every $t$.
- The backward variables are scaled step by step with the same coefficients, $\hat{\beta}_t(i) = c_t\, \beta_t(i)$. Note that $\sum_{i=1}^{N} \hat{\beta}_t(i) = 1$ is not true!
- The scale factors cancel in the $\gamma$ and $\xi$ quotients, so the re-estimation formulas are unchanged.
- Since $\big[\prod_{t=1}^{T} c_t\big]\, P(O \mid \lambda) = 1$, the likelihood can be computed without underflow as $\log P(O \mid \lambda) = -\sum_{t=1}^{T} \log c_t$.

Maximum Log-Likelihood
The Viterbi recursion can instead be run entirely in the log domain, which avoids scaling altogether:
- Initialization: $\phi_1(i) = \log \pi_i + \log b_i(O_1)$
- Recursion: $\phi_{t+1}(j) = \max_i \big[\phi_t(i) + \log a_{ij}\big] + \log b_j(O_{t+1})$
- Termination: $\log P^* = \max_i \phi_T(i)$

Multiple Observation Sequences
- Problem with re-estimation: the formulas above assume a single observation sequence, which is rarely enough to estimate all parameters.
- With $K$ independent sequences, accumulate the numerators and denominators over all sequences before dividing, e.g. $\bar{a}_{ij} = \sum_{k=1}^{K} \sum_{t} \xi_t^{(k)}(i,j) \,/\, \sum_{k=1}^{K} \sum_{t} \gamma_t^{(k)}(i)$.

Initial Estimates of Parameters
- For $\pi$ and $A$, random or uniform initialization is sufficient.
- For $B$ (discrete symbol probabilities), a good initial estimate is needed.

Insufficient Training Data
Solutions:
- Increase the size of the training data.
- Reduce the size of the model.
- Interpolate parameters using another model.
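To make the log-domain recursion concrete, here is a sketch of Viterbi decoding carried out entirely with log probabilities. It is illustrative code in the same conventions as the earlier sketch, not code from the slides:

```python
import numpy as np

def viterbi_log(init, trans, emit, obs):
    """Most likely state path, arg max_Q P(Q, O | lambda), in the log domain."""
    T, N = len(obs), len(init)
    log_a = np.log(trans)
    phi = np.zeros((T, N))                  # phi_t(i) = log delta_t(i)
    psi = np.zeros((T, N), dtype=int)       # traceback pointers
    phi[0] = np.log(init) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        # phi_t(j) = max_i [phi_{t-1}(i) + log a_ij] + log b_j(O_t)
        scores = phi[t - 1][:, None] + log_a        # scores[i, j]
        psi[t] = scores.argmax(axis=0)
        phi[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    # Termination: the best final state is where the traceback starts.
    path = [int(phi[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], phi[-1].max()        # state sequence, log P*
```

On the casino sequence from the example slide, the decoded path labels each toss Fair (0) or Unfair (1), answering the motivating question of when the casino cheated.

And here is a sketch of one full Baum-Welch iteration using the scaled forward/backward variables described above; again this is illustrative code under the same assumed conventions, not the slides' own implementation. The scale factors cancel in the $\gamma$ and $\xi$ quotients exactly as noted on the scaling slides:

```python
import numpy as np

def baum_welch_step(init, trans, emit, obs):
    """One EM iteration; returns the re-estimated (pi, A, B)."""
    obs = np.asarray(obs)
    T, N = len(obs), len(init)
    # Scaled forward pass: normalize each trellis column, remembering c_t.
    alpha = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = init * emit[:, obs[0]]
    c[0] = 1.0 / alpha[0].sum()
    alpha[0] *= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        c[t] = 1.0 / alpha[t].sum()
        alpha[t] *= c[t]
    # Scaled backward pass, reusing the same coefficients c_t.
    beta = np.zeros((T, N))
    beta[-1] = c[-1]
    for t in range(T - 2, -1, -1):
        beta[t] = c[t] * (trans @ (emit[:, obs[t + 1]] * beta[t + 1]))
    # E-step: with scaled variables the factors cancel, up to one c_t in gamma.
    gamma = alpha * beta / c[:, None]                     # gamma_t(i)
    xi = (alpha[:-1, :, None] * trans[None, :, :] *
          (emit[:, obs[1:]].T * beta[1:])[:, None, :])    # xi_t(i,j)
    # M-step: the re-estimation formulas from the slides.
    new_init = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.array([gamma[obs == k].sum(axis=0)
                         for k in range(emit.shape[1])]).T
    new_emit /= gamma.sum(axis=0)[:, None]
    # Convergence can be monitored via log P(O | lambda) = -np.log(c).sum().
    return new_init, new_trans, new_emit
```

Iterating this step until $-\sum_t \log c_t$ stops improving reaches the local maximum described in the re-estimation notes.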
References
- L. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition." Proceedings of the IEEE, 1989.
- S. Russell and P. Norvig. "Probabilistic Reasoning over Time." Artificial Intelligence: A Modern Approach, Ch. 15, 2002 (draft).
- V. Borkar, K. Deshmukh, and S. Sarawagi. "Automatic Segmentation of Text into Structured Records." ACM SIGMOD, 2001.
- T. Scheffer, C. Decomain, and S. Wrobel. "Active Hidden Markov Models for Information Extraction." Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
- S. Ray and M. Craven. "Representing Sentence Structure in Hidden Markov Models for Information Extraction." Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.