ch8.2 (HMM-Based recognition).ppt

Hidden Markov Model

[Figure: two states $S_i$ and $S_j$ connected by transition probabilities $a_{ij}$ and $a_{ji}$.]

Observations: $O_1, O_2, O_3, \ldots, O_t$
States in time: $q_1, q_2, q_3, \ldots, q_t$
All states: $s_1, s_2, \ldots, s_N$

Hidden Markov Model (Cont'd)

Discrete Markov Model:
$$P(q_t = s_j \mid q_{t-1} = s_i, q_{t-2} = s_k, \ldots, q_1 = s_z) = P(q_t = s_j \mid q_{t-1} = s_i)$$
This is a first-order (degree 1) Markov model: the next state depends only on the current state.

Hidden Markov Model (Cont'd)

$a_{ij}$: transition probability from $S_i$ to $S_j$, $1 \le i, j \le N$
$$a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i)$$

Discrete Markov Model Example

S1: The weather is rainy
S2: The weather is cloudy
S3: The weather is sunny

$$A = \{a_{ij}\} = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$
(rows: from rainy, cloudy, sunny; columns: to rainy, cloudy, sunny)

Discrete Markov Model Example (Cont'd)

Question 1: Given that the first day is sunny, what is the probability of the sequence Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy?

$q_1 q_2 \ldots q_8 = s_3\, s_3\, s_3\, s_1\, s_1\, s_3\, s_2\, s_2$
$$P = a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{22} = (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.6) = 4.608 \times 10^{-4}$$

Discrete Markov Model Example (Cont'd)

The probability of being in state $i$ at time $t = 1$:
$$\pi_i = P(q_1 = s_i), \quad 1 \le i \le N$$

Question 2: If we are in state $S_i$, what is the probability of staying in $S_i$ for exactly $d$ days?
$$P_i(d) = P(\underbrace{s_i\, s_i \cdots s_i}_{d \text{ days}}\, s_{j \ne i}) = a_{ii}^{\,d-1}(1 - a_{ii})$$

Discrete Density HMM Components

$N$: number of states
$M$: number of distinct output symbols
$A$ (N×N): state transition probability matrix
$B$ (N×M): probability of each output symbol in each state
$\pi$ (1×N): initial state probabilities
$\lambda = (A, B, \pi)$: the set of HMM parameters

Three Basic HMM Problems

Recognition problem: Given an HMM $\lambda$ and a sequence of observations $O$, what is the probability $P(O \mid \lambda)$?
State decoding problem: Given a model and a sequence of observations $O$, what is the most likely state sequence in the model that produced the observations?
Training problem: Given a model and a sequence of observations $O$, how should we adjust the model parameters to maximize $P(O \mid \lambda)$?
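The weather example can be checked numerically. Below is a minimal Python sketch that is not part of the original slides. It makes three assumptions:

- States use 0-based indices: rainy = 0, cloudy = 1, sunny = 2.
- The first day is treated as given, which matches how the Question 1 product omits $\pi$.
- The helper names `sequence_probability`, `duration_pmf` and `expected_duration` are hypothetical.

The sketch also evaluates the duration distribution $P_i(d)$ from Question 2 and its mean $1/(1 - a_{ii})$.

```python
# Weather Markov chain from the "Discrete Markov Model Example" slide.
# Assumed 0-based state indices: 0 = rainy (s1), 1 = cloudy (s2), 2 = sunny (s3).
RAINY, CLOUDY, SUNNY = 0, 1, 2

A = [
    [0.4, 0.3, 0.3],  # from rainy
    [0.2, 0.6, 0.2],  # from cloudy
    [0.1, 0.1, 0.8],  # from sunny
]


def sequence_probability(states, A, pi=None):
    """P(q_1, ..., q_T) for a first-order Markov chain.

    If pi is None, the first state is treated as given (probability 1),
    as in the slide's product a33 a33 a31 a11 a13 a32 a22.
    """
    p = 1.0 if pi is None else pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]  # multiply one transition probability per step
    return p


def duration_pmf(a_ii, d):
    """P_i(d) = a_ii^(d-1) * (1 - a_ii): stay exactly d steps, then leave."""
    return a_ii ** (d - 1) * (1.0 - a_ii)


def expected_duration(a_ii):
    """Mean of the geometric duration distribution, 1 / (1 - a_ii)."""
    return 1.0 / (1.0 - a_ii)


q = [SUNNY, SUNNY, SUNNY, RAINY, RAINY, SUNNY, CLOUDY, CLOUDY]
print(f"{sequence_probability(q, A):.4e}")
print(f"{expected_duration(A[SUNNY][SUNNY]):.2f}")  # mean length of a sunny spell, in days
```

With $a_{33} = 0.8$, the expected length of a sunny spell is $1/0.2 = 5$ days. This shows how a single self-transition probability fixes the whole duration distribution.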
   8 First Problem Solution T T t 1 t 1 P(O | q,  )   P(Ot | qt ,  )   bqt (Ot ) P(q |  )  q1aq1q2 aq2 q3  aqT 1qT We Know That: And P( x, y)  P( x | y) P( y) P( x, y | z )  P( x | y, z ) P( y | z ) 9 First Problem Solution (Cont’d)  P(O, q |  )  P(O | q,  ) P(q |  )  P(O, q |  )   q1bq1 (O1 )a q1q2 bq2 (O2 )  a qT 1qT bqT (OT )  P(O |  )   P(O, q |  )  q  b (O1 )a q1q2 bq2 (O2 )  a qT 1qT bqT (OT ) q1 q1 q1q2 qT Computation Order : T O(2TN ) 10 Forward Backward Approach  t (i)  P(O1 , O2 ,, Ot , qt  i |  ) Computing  t (i) 1) Initialization 1 (i)  ibi (O1 ), 1 i  N 11 Forward Backward Approach (Cont’d) 2) Induction : N  t 1 ( j )  [  t (i )aij ]b j (Ot 1 ) i 1 1  t  T  1, 3) Termination : 1 j  N N P(O |  )    T (i) i 1 Computation Order : 2 O( N T ) 12 Backward Variable  t (i)  P(Ot 1 , Ot 2 ,, OT | qt  i,  ) 1) Initialization T (i)  1, 1 i  N 2)Induction N  t (i )   aij b j (Ot 1 )  t 1 ( j ) j 1 t  T  1, T  2,  ,1 and 1 i  N 13 Second Problem Solution Finding the most likely state sequence P (O, qt  i |  )  t (i )  P(qt  i | O,  )  P (O |  ) P (O, qt  i |  )  t (i )  t (i )  N  N  P(O, qt  i |  )   t (i)  t (i) i 1 i 1 Individually most likely state : q  arg max [ t (i )], * t 1 t  T i 14 Viterbi Algorithm Define :  t (i)  max P[q1 , q 2 ,, qt 1 , qt  i, O1 , O2 ,, Ot |  ] q1 , q2 ,, qt 1 1 i  N P is the most likely state sequence with this conditions : state i , time t and observation o 15 Viterbi Algorithm (Cont’d)  t 1 ( j )  [max  t (i)aij ].b j (Ot 1 ) i 1) Initialization  1 (i )  ibi (O1 ),1  i  N  1 (i)  0  t (i ) Is the most likely state before state i at time t-1 16 Viterbi Algorithm (Cont’d) 2) Recursion  t ( j )  max [ t 1 (i )aij ]b j (Ot ) 1i  N  t ( j )  arg max [ t 1 (i )aij ] 1i  N 2  t  T, 1 j  N 17 Viterbi Algorithm (Cont’d) 3) Termination: p  max [ T (i )] * 1i  N q  arg max [ T (i 
)] * T 1i  N 4)Backtracking: q   t 1 (q ), t  T  1, T  2,,1 * t * t 1 18 Third Problem Solution Parameters Estimation using BaumWelch Or Expectation Maximization (EM) Approach Define:  t (i, j )  P (qt  i, qt 1  j | O,  ) P (O, q t  i, qt 1  j |  )  P (O |  )  t (i )aij b j (Ot 1 )  t 1 ( j )  N N   i 1 j 1 t (i )a ij b j (Ot 1 )  t 1 ( j ) 19 Third Problem Solution (Cont’d) N  t (i)    t (i, j ) j 1 T 1   t (i) t 1 T : Expected value of the number of jumps from state i Expected value of the number of   (i, j ):jumps from state i to state j t 1 t 20 Third Problem Solution (Cont’d) T   ( j) T aij    (i, j ) t 1 T 1 t   t (i) t 1 t b j (k )  t 1 ot Vk T   ( j) t 1 t  i   1 (i) 21 Baum Auxiliary Function Q( |  )   P(O, q |  ' ) log P(O, q |  ) ' q '   if : Q( |  )  Q( |  )  P(O |  )  P(O |  ' ) By this approach we will reach to a local optimum 22 Restrictions Of Reestimation Formulas N  i 1 N a j 1 i ij M 1  1,1  i  N  b (k )  1,1  k 1 j jN 23 Continuous Observation Density We have amounts of a PDF instead of b j (k )  P(Ot  Vk | qt  j ) We have M  b j (Ot )   C jk  (Ot ,  jk ,  jk ),  b j (Ot )dOt  1  k 1 Mixture Coefficients Average Variance 24 Continuous Observation Density Mixture in HMM M1|1 M2|1 M1|2 M2|2 M1|3 M2|3 M3|1 M4|1 M3|2 M4|2 M3|3 M4|3 S2 S3 S1 Dominant Mixture: b j (Ot )  Max C jk (Ot ,  jk ,  jk ) k 25 Continuous Observation Density (Cont’d) Model Parameters:   ( A,  , C ,  , ) N×N 1×N N×M N×M×K N×M×K×K N : Number Of States M : Number Of Mixtures In Each State K : Dimension Of Observation Vector 26 Continuous Observation Density (Cont’d) T C jk   t 1 T M t ( j, k )   t 1 k 1 T  jk   t 1 T t  t 1 t ( j, k ) ( j , k )ot t ( j, k ) 27 Continuous Observation Density (Cont’d) T  jk    ( j , k )  (o t 1 t    )  ( o   ) t t jk jk T   ( j, k ) t 1 t  t ( j, k )  Probability of event j’th state and k’th 
mixture at time t 28 State Duration Modeling aij Si Sj a ji Probability of staying d times in state i : d 1 ii Pi (d )  a (1  aii ) 29 State Duration Modeling (Cont’d) HMM With clear duration Pi (d ) aij ……. ……. Si Pj (d ) a ji Sj 30 State Duration Modeling (Cont’d) HMM consideration with State Duration : – Selecting q1  i using  i ‘s – Selecting d1 using Pq (d ) – Selecting Observation Sequence O1 , O2 ,, Od using bq (O1 , O2 , , Od ) 1 1 in practice we assume the following independence: 1 d1 bq1 (O1 , O2 ,, Od1 )   bq1 (t , Ot ) t 1 – Selecting next state q2  j using transition probabilities aq1q2 . We also have an additional constraint: aq1q1  0 31 Training In HMM Maximum Likelihood (ML) Maximum Mutual Information (MMI) Minimum Discrimination Information (MDI) 32 Training In HMM Maximum Likelihood (ML) P(o | 1 ) P(o | 2 ) P(o | 3 ) . . . P(o | n )  P  Maximum[ P(O | V )] * r Observation Sequence 33 Training In HMM (Cont’d) Maximum Mutual Information (MMI)  P(O , |  ) I  (O , )  log  P (O ) P( )  ,   {v } Mutual Information    I  (O , )  log P(O | v )  v log  P(O | w , w) P( w) w 1  34 Training In HMM (Cont’d) Minimum Discrimination Information (MDI) Observation : O  (O1 , O2 ,, OT ) Auto correlation : R  ( R1 , R2 ,, Rt )  ( R, P )  inf I (Q : P ) Q  (R) q (o ) I (Q : P )   q(o) log do P (o |  ) 35
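The forward, backward and Viterbi procedures from the First and Second Problem Solution slides can be sketched in plain Python. This sketch is illustrative and is not the slides' own code. Its assumptions are:

- The toy parameters `pi`, `A`, `B` and the observation list `O` are made-up values.
- States and symbols use 0-based indices.
- The helper names are hypothetical.
- No scaling is applied, so it suits only short sequences; long ones would underflow.

It also checks the forward result against the direct $O(2T N^T)$ sum over all state sequences.

```python
from itertools import product

# Hypothetical toy discrete HMM (values made up for illustration):
# N = 2 states, M = 3 output symbols coded 0..2, 0-based indices throughout.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]
O = [0, 1, 2, 2]


def forward(pi, A, B, O):
    """alpha[t][i] = P(O_1..O_t, q_t = i | lambda); also returns P(O | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]          # 1) initialization
    for o in O[1:]:                                            # 2) induction
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha, sum(alpha[-1])                               # 3) termination


def backward(A, B, O):
    """beta[t][i] = P(O_{t+1}..O_T | q_t = i, lambda), with beta_T(i) = 1."""
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def joint_probability(pi, A, B, O, q):
    """P(O, q | lambda) = pi_{q1} b_{q1}(O_1) a_{q1 q2} b_{q2}(O_2) ..."""
    p = pi[q[0]] * B[q[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[q[t - 1]][q[t]] * B[q[t]][O[t]]
    return p


def brute_force(pi, A, B, O):
    """Direct sum of P(O, q | lambda) over all N^T state sequences."""
    return sum(joint_probability(pi, A, B, O, q)
               for q in product(range(len(pi)), repeat=len(O)))


def viterbi(pi, A, B, O):
    """Return (p*, best state sequence) via the delta/psi recursion."""
    N, T = len(pi), len(O)
    delta = [[pi[i] * B[i][O[0]] for i in range(N)]]
    psi = [[0] * N]
    for t in range(1, T):
        d_row, psi_row = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi_row.append(best)                                # psi_t(j)
            d_row.append(delta[t - 1][best] * A[best][j] * B[j][O[t]])
        delta.append(d_row)
        psi.append(psi_row)
    last = max(range(N), key=lambda i: delta[-1][i])           # q*_T
    path = [last]
    for t in range(T - 1, 0, -1):          # backtracking: q*_t = psi_{t+1}(q*_{t+1})
        path.append(psi[t][path[-1]])
    return delta[-1][last], path[::-1]


alpha, p_obs = forward(pi, A, B, O)
beta = backward(A, B, O)
# gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda); each row sums to 1
gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(len(pi))]
         for t in range(len(O))]
individually_best = [max(range(len(pi)), key=lambda i: g[i]) for g in gamma]
p_star, q_star = viterbi(pi, A, B, O)
print(p_obs, brute_force(pi, A, B, O))  # same value, very different cost
print(individually_best, q_star, p_star)
```

The brute-force sum enumerates $N^T$ sequences. `forward` needs on the order of $N^2 T$ operations, which is the point of the forward-backward approach. Dividing by `p_obs` in `gamma` is equivalent to dividing by $\sum_i \alpha_t(i)\beta_t(i)$, because that sum equals $P(O \mid \lambda)$ at every $t$.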

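One Baum-Welch re-estimation step for a discrete-density HMM can be sketched as follows. It computes $\xi_t(i, j)$, $\gamma_t(i)$, $\bar{a}_{ij}$, $\bar{b}_j(k)$ and $\bar{\pi}_i$ as defined in the Third Problem Solution slides. This is a minimal sketch under stated assumptions, not the slides' implementation:

- The toy parameters and the observation sequence are made up.
- Indices are 0-based.
- Training uses a single observation sequence.
- There is no scaling, so long sequences would underflow.
- The function names are hypothetical.

$\xi_t$ is undefined at $t = T$, so the sketch computes $\gamma_t(i)$ for all $t$ as $\alpha_t(i)\beta_t(i)/P(O \mid \lambda)$. For $t < T$ this agrees with $\sum_j \xi_t(i, j)$.

```python
# Hypothetical toy parameters (made up for illustration); symbols are 0..M-1.
pi0 = [0.5, 0.5]
A0 = [[0.6, 0.4],
      [0.3, 0.7]]
B0 = [[0.6, 0.3, 0.1],
      [0.1, 0.3, 0.6]]
O = [0, 0, 1, 2, 2, 1, 0, 2]


def forward(pi, A, B, O):
    """alpha[t][i] = P(O_1..O_t, q_t = i | lambda), unscaled."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, O):
    """beta[t][i] = P(O_{t+1}..O_T | q_t = i, lambda), unscaled."""
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def likelihood(pi, A, B, O):
    """P(O | lambda) by the forward algorithm."""
    return sum(forward(pi, A, B, O)[-1])


def baum_welch_step(pi, A, B, O):
    """One re-estimation of (pi, A, B) from a single observation sequence."""
    N, M, T = len(pi), len(B[0]), len(O)
    alpha, beta = forward(pi, A, B, O), backward(A, B, O)
    p_obs = sum(alpha[-1])
    # xi[t][i][j] = P(q_t = i, q_{t+1} = j | O, lambda) for t = 0..T-2
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # gamma[t][i] = P(q_t = i | O, lambda); equals sum_j xi[t][i][j] for t < T-1
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    pi_bar = gamma[0][:]
    A_bar = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    B_bar = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return pi_bar, A_bar, B_bar


pi, a, b = pi0, A0, B0
history = [likelihood(pi, a, b, O)]
for _ in range(10):
    pi, a, b = baum_welch_step(pi, a, b, O)
    history.append(likelihood(pi, a, b, O))
print(history)  # should be non-decreasing, up to rounding
```

The Baum auxiliary function argument says the likelihood sequence in `history` should not decrease from one iteration to the next. The re-estimated $\bar{\pi}$ and each row of $\bar{A}$ and $\bar{B}$ should also satisfy the sum-to-one restrictions, up to floating-point rounding.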