ch8.2 (HMM-Based recognition).ppt

Hidden Markov Model
[Figure: two states S_i and S_j joined by transitions a_ij (from S_i to S_j) and a_ji (from S_j to S_i)]
Observations: $O_1, O_2, \ldots, O_t$
States in time: $q_1, q_2, \ldots, q_t$
All states: $s_1, s_2, \ldots, s_N$
Hidden Markov Model (Cont’d)
Discrete Markov Model
$P(q_t = s_j \mid q_{t-1} = s_i, q_{t-2} = s_k, \ldots, q_1 = s_z) = P(q_t = s_j \mid q_{t-1} = s_i)$
First-order (degree 1) Markov model
Hidden Markov Model (Cont’d)
$a_{ij}$: transition probability from $S_i$ to $S_j$, $1 \le i, j \le N$
$a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i)$
Discrete Markov Model
Example
S1: The weather is rainy
S2: The weather is cloudy
S3: The weather is sunny
Transition matrix $A = \{a_{ij}\}$ (rows: current state, columns: next state):
          rainy  cloudy  sunny
rainy      0.4    0.3     0.3
cloudy     0.2    0.6     0.2
sunny      0.1    0.1     0.8
Hidden Markov Model Example
(Cont’d)
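Question 1: What is the probability of the following weather sequence, given that the first day is sunny?
Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy
$q_1 q_2 \ldots q_8 = s_3\, s_3\, s_3\, s_1\, s_1\, s_3\, s_2\, s_2$
$P = a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{22} = 0.8 \cdot 0.8 \cdot 0.1 \cdot 0.4 \cdot 0.3 \cdot 0.1 \cdot 0.6 = 4.608 \times 10^{-4}$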
Hidden Markov Model Example
(Cont’d)
The probability of being in state $s_i$ at time $t = 1$:
$\pi_i = P(q_1 = s_i), \quad 1 \le i \le N$
Question 2: Given that we are in state $S_i$, what is the probability of staying in it for exactly $d$ days?
$P(\underbrace{s_i s_i \cdots s_i}_{d \text{ days}}\, s_{j \ne i}) = a_{ii}^{d-1} (1 - a_{ii}) = P_i(d)$
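The duration formula is a geometric distribution in $d$. The sketch below evaluates it for the sunny state of the weather example ($a_{33} = 0.8$). The function names are illustrative, and the mean $1 / (1 - a_{ii})$ is the standard mean of a geometric distribution rather than something stated on the slide.

```python
def duration_probability(a_ii, d):
    """P_i(d) = a_ii^(d-1) * (1 - a_ii): exactly d consecutive steps in state i."""
    if d < 1:
        raise ValueError("d must be at least 1")
    return a_ii ** (d - 1) * (1.0 - a_ii)


def expected_duration(a_ii):
    """Mean of the geometric distribution P_i(d), i.e. sum over d of d * P_i(d)."""
    return 1.0 / (1.0 - a_ii)


a_sunny = 0.8  # a_33 from the weather example
for d in range(1, 6):
    print(d, duration_probability(a_sunny, d))
print("expected duration:", expected_duration(a_sunny))
```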
Discrete Density HMM
Components
N: Number of states
M: Number of output symbols
A (N×N): State transition probability matrix
B (N×M): Output (observation) probability of each symbol in each state
$\pi$ (1×N): Initial state probability vector
$\lambda = (A, B, \pi)$: Set of HMM parameters
Three Basic HMM Problems
Recognition Problem:
Given an HMM $\lambda$ and a sequence of observations $O$, what is the probability $P(O \mid \lambda)$?
State Decoding Problem:
Given a model $\lambda$ and a sequence of observations $O$, what is the most likely state sequence in the model that produced the observations?
Training Problem:
Given a model $\lambda$ and a sequence of observations $O$, how should we adjust the model parameters in order to maximize $P(O \mid \lambda)$?
First Problem Solution
$P(O \mid q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t)$
$P(q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$
We know that:
$P(x, y) = P(x \mid y)\, P(y)$
and
$P(x, y \mid z) = P(x \mid y, z)\, P(y \mid z)$
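First Problem Solution (Cont’d)
$P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda)$
$P(O, q \mid \lambda) = \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T)$
$P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T)$
Computation order: $O(2T N^T)$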
Forward Backward Approach
Forward variable: $\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = i \mid \lambda)$
Computing $\alpha_t(i)$:
1) Initialization
$\alpha_1(i) = \pi_i b_i(O_1), \quad 1 \le i \le N$
Forward Backward Approach
(Cont’d)
2) Induction:
$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1}), \quad 1 \le t \le T-1, \quad 1 \le j \le N$
3) Termination:
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
Computation order: $O(N^2 T)$
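The forward recursion can be sketched in plain Python and checked against the direct sum over all $N^T$ state sequences. The toy model (`pi`, `A`, `B`, `obs`) is invented for illustration only. Lists are 0-indexed, so `alpha[t][i]` corresponds to $\alpha_{t+1}(i)$.

```python
from itertools import product


def forward(pi, A, B, obs):
    """Return (alpha, P(O | lambda)) for a discrete HMM."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # 1) initialization
    for o in obs[1:]:                                    # 2) induction
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha, sum(alpha[-1])                         # 3) termination


def brute_force_likelihood(pi, A, B, obs):
    """Direct evaluation: sum P(O, q | lambda) over every state sequence q."""
    N = len(pi)
    total = 0.0
    for q in product(range(N), repeat=len(obs)):
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total


# Hypothetical 2-state, 3-symbol model used only for this example.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

alpha, likelihood = forward(pi, A, B, obs)
print(likelihood, brute_force_likelihood(pi, A, B, obs))
```

For long sequences the $\alpha$ values underflow, so practical implementations usually rescale each step or work with log probabilities.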
Backward Variable
$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = i, \lambda)$
1) Initialization
$\beta_T(i) = 1, \quad 1 \le i \le N$
2) Induction
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N$
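A minimal sketch of the backward recursion follows, on a toy model invented for illustration. The final line uses the standard identity $P(O \mid \lambda) = \sum_i \pi_i b_i(O_1) \beta_1(i)$ as a sanity check. That identity is not stated on the slide, but it follows from the definition of $\beta_1(i)$.

```python
def backward(A, B, obs):
    """Return beta as T rows; beta[t][i] is beta_{t+1}(i) in 1-based notation."""
    N = len(A)
    T = len(obs)
    beta = [[1.0] * N for _ in range(T)]       # 1) initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):             # 2) induction, backwards in time
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta


# Hypothetical 2-state, 3-symbol model used only for this example.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

beta = backward(A, B, obs)
likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(beta[0], likelihood)
```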
Second Problem Solution
Finding the most likely state sequence
$\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{P(O, q_t = i \mid \lambda)}{P(O \mid \lambda)} = \frac{P(O, q_t = i \mid \lambda)}{\sum_{i=1}^{N} P(O, q_t = i \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}$
Individually most likely state:
$q_t^* = \arg\max_{i} [\gamma_t(i)], \quad 1 \le t \le T$
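The state posterior $\gamma_t(i)$ combines the forward and backward variables. The sketch below computes it for a toy model (all parameter values are invented for illustration) and picks the individually most likely state at each time step.

```python
def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def state_posteriors(pi, A, B, obs):
    """gamma[t][i] = alpha_t(i) * beta_t(i) / sum_i alpha_t(i) * beta_t(i)."""
    gamma = []
    for a_row, b_row in zip(forward(pi, A, B, obs), backward(A, B, obs)):
        weights = [a * b for a, b in zip(a_row, b_row)]
        total = sum(weights)  # equals P(O | lambda) at every time step
        gamma.append([w / total for w in weights])
    return gamma


# Hypothetical 2-state, 3-symbol model used only for this example.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

gamma = state_posteriors(pi, A, B, obs)
best_states = [max(range(len(row)), key=row.__getitem__) for row in gamma]
print(best_states)
```

Choosing each state separately maximizes the expected number of correct states, but the resulting sequence is not guaranteed to be a valid path when some transitions have zero probability.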
Viterbi Algorithm
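Define:
$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1, q_2, \ldots, q_{t-1}, q_t = i, O_1, O_2, \ldots, O_t \mid \lambda], \quad 1 \le i \le N$
$\delta_t(i)$ is the probability of the most likely state sequence that ends in state $i$ at time $t$ and accounts for the first $t$ observations.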
Viterbi Algorithm (Cont’d)
$\delta_{t+1}(j) = [\max_{i} \delta_t(i)\, a_{ij}] \cdot b_j(O_{t+1})$
1) Initialization
$\delta_1(i) = \pi_i b_i(O_1), \quad 1 \le i \le N$
$\psi_1(i) = 0$
$\psi_t(i)$ is the most likely predecessor state (the state at time $t-1$) of state $i$ at time $t$.
Viterbi Algorithm (Cont’d)
2) Recursion
$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(O_t)$
$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]$
$2 \le t \le T, \quad 1 \le j \le N$
Viterbi Algorithm (Cont’d)
3) Termination:
$p^* = \max_{1 \le i \le N} [\delta_T(i)]$
$q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]$
4) Backtracking:
$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1$
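The four Viterbi steps map directly onto a short Python function. This is a sketch on a toy model invented for illustration. Indices are 0-based, and ties are resolved in favour of the lowest state index because Python's `max` returns the first maximum.

```python
def viterbi(pi, A, B, obs):
    """Return (most likely state path, its joint probability p*)."""
    N, T = len(pi), len(obs)
    delta = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # 1) initialization
    psi = [[0] * N]
    for t in range(1, T):                                # 2) recursion
        delta_t, psi_t = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi_t.append(best_i)
            delta_t.append(delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]])
        delta.append(delta_t)
        psi.append(psi_t)
    last = max(range(N), key=lambda i: delta[-1][i])     # 3) termination
    p_star = delta[-1][last]
    path = [last]
    for t in range(T - 1, 0, -1):                        # 4) backtracking
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, p_star


# Hypothetical 2-state, 3-symbol model used only for this example.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

path, p_star = viterbi(pi, A, B, obs)
print(path, p_star)
```

Replacing the products with sums of log probabilities gives the same path and avoids underflow on long sequences.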
Third Problem Solution
Parameter estimation using the Baum-Welch or Expectation Maximization (EM) approach
Define:
$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{P(O, q_t = i, q_{t+1} = j \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}$
Third Problem Solution
(Cont’d)
$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$
$\sum_{t=1}^{T-1} \gamma_t(i)$: expected number of transitions out of state $i$
$\sum_{t=1}^{T-1} \xi_t(i, j)$: expected number of transitions from state $i$ to state $j$
Third Problem Solution
(Cont’d)
$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
$b_j(k) = \frac{\sum_{t=1,\; o_t = V_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
$\pi_i = \gamma_1(i)$
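One reestimation pass can be sketched as follows for a single observation sequence. The toy model and observation sequence are invented for illustration, and the code does no scaling, so it is only suitable for short sequences. `baum_welch_step` returns the updated parameters together with the likelihood of the parameters it was given.

```python
def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def baum_welch_step(pi, A, B, obs):
    """One EM update of (pi, A, B); also returns P(O | lambda) before the update."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # xi[t][i][j] for t = 0 .. T-2 and gamma[t][i] for t = 0 .. T-1
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)]
             for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical starting model and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1, 0, 0, 2]

history = []
for _ in range(3):
    pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)
    history.append(likelihood)
print(history)
```

The recorded likelihoods should not decrease from one iteration to the next, which is the EM guarantee.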
Baum Auxiliary Function
$Q(\lambda \mid \lambda') = \sum_{q} P(O, q \mid \lambda') \log P(O, q \mid \lambda)$
If $Q(\lambda \mid \lambda') \ge Q(\lambda' \mid \lambda')$, then $P(O \mid \lambda) \ge P(O \mid \lambda')$.
Iterating this reestimation converges to a local optimum.
Restrictions Of
Reestimation Formulas
$\sum_{i=1}^{N} \pi_i = 1$
$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$
$\sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N$
Continuous Observation
Density
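Instead of the discrete probabilities $b_j(k) = P(O_t = V_k \mid q_t = j)$, we have values of a probability density function (PDF):
$b_j(O_t) = \sum_{k=1}^{M} C_{jk}\, \mathcal{N}(O_t, \mu_{jk}, \Sigma_{jk}), \quad \int_{-\infty}^{\infty} b_j(O_t)\, dO_t = 1$
$C_{jk}$: mixture coefficients, $\mu_{jk}$: mean (average), $\Sigma_{jk}$: variance (covariance)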
Continuous Observation
Density
Mixture in HMM
[Figure: a three-state HMM (S1, S2, S3); each state j has four mixture components M1|j, M2|j, M3|j, M4|j]
Dominant mixture:
$b_j(O_t) \approx \max_{k} C_{jk}\, \mathcal{N}(O_t, \mu_{jk}, \Sigma_{jk})$
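The sketch below evaluates a Gaussian-mixture output density and its dominant-mixture approximation for one state. To stay short it assumes scalar observations (K = 1), so each $\Sigma_{jk}$ reduces to a variance. The parameter values are invented for illustration.

```python
import math


def gaussian_pdf(o, mean, var):
    """Univariate normal density N(o; mean, var)."""
    return math.exp(-0.5 * (o - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(o, weights, means, variances):
    """b_j(o) = sum_k C_jk * N(o; mu_jk, var_jk) for one state j."""
    return sum(c * gaussian_pdf(o, m, v)
               for c, m, v in zip(weights, means, variances))


def dominant_mixture_density(o, weights, means, variances):
    """Approximation that keeps only the largest weighted component."""
    return max(c * gaussian_pdf(o, m, v)
               for c, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture for a single state.
weights = [0.3, 0.7]      # C_j1, C_j2 (must sum to 1)
means = [-1.0, 2.0]       # mu_j1, mu_j2
variances = [0.5, 1.5]    # scalar stand-ins for Sigma_j1, Sigma_j2

for o in (-1.0, 0.5, 2.0):
    print(o, mixture_density(o, weights, means, variances),
          dominant_mixture_density(o, weights, means, variances))
```

The dominant-mixture value never exceeds the full sum, and the two are close wherever one component clearly dominates.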
Continuous Observation
Density (Cont’d)
Model parameters:
$\lambda = (A, \pi, C, \mu, \Sigma)$
A: N×N, $\pi$: 1×N, C: N×M, $\mu$: N×M×K, $\Sigma$: N×M×K×K
N: Number of states
M: Number of mixtures in each state
K: Dimension of the observation vector
Continuous Observation
Density (Cont’d)
$C_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j, k)}$
$\mu_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j, k)}$
Continuous Observation
Density (Cont’d)
$\Sigma_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})^{\top}}{\sum_{t=1}^{T} \gamma_t(j, k)}$
$\gamma_t(j, k)$: probability of being in state $j$ with the $k$-th mixture component at time $t$
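Given the occupation probabilities $\gamma_t(j, k)$ for one state $j$, the mixture updates are weighted averages. This sketch assumes scalar observations, so the covariance update becomes a weighted variance. The function name and the numbers are illustrative, and computing $\gamma_t(j, k)$ itself (from the forward and backward variables) is outside its scope.

```python
def reestimate_mixture(gamma_jk, obs):
    """Update (C_jk, mu_jk, var_jk) for a fixed state j.

    gamma_jk[t][k] holds gamma_t(j, k); obs[t] is the scalar observation o_t.
    """
    M = len(gamma_jk[0])
    occupancy = [sum(row[k] for row in gamma_jk) for k in range(M)]
    total = sum(occupancy)
    weights = [occ / total for occ in occupancy]
    means = [sum(row[k] * o for row, o in zip(gamma_jk, obs)) / occupancy[k]
             for k in range(M)]
    variances = [sum(row[k] * (o - means[k]) ** 2 for row, o in zip(gamma_jk, obs))
                 / occupancy[k] for k in range(M)]
    return weights, means, variances


# Hypothetical occupation probabilities for one state with two mixtures.
gamma_jk = [[0.9, 0.1], [0.8, 0.1], [0.1, 0.6], [0.0, 0.5]]
obs = [1.0, 3.0, 10.0, 14.0]

weights, means, variances = reestimate_mixture(gamma_jk, obs)
print(weights, means, variances)
```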
State Duration Modeling
[Figure: two states S_i and S_j joined by transitions a_ij and a_ji]
Probability of staying exactly $d$ time steps in state $i$:
$P_i(d) = a_{ii}^{d-1} (1 - a_{ii})$
State Duration Modeling
(Cont’d)
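HMM with explicit state duration
[Figure: states S_i and S_j without self-loops, joined by transitions a_ij and a_ji; each state carries its own duration distribution, P_i(d) and P_j(d)]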
State Duration Modeling
(Cont’d)
Generating observations from an HMM with state duration:
– Select the initial state $q_1 = i$ using the $\pi_i$'s
– Select the duration $d_1$ using $P_{q_1}(d)$
– Select the observation sequence $O_1, O_2, \ldots, O_{d_1}$ using $b_{q_1}(O_1, O_2, \ldots, O_{d_1})$; in practice we assume the following independence:
$b_{q_1}(O_1, O_2, \ldots, O_{d_1}) = \prod_{t=1}^{d_1} b_{q_1}(O_t)$
– Select the next state $q_2 = j$ using the transition probabilities $a_{q_1 q_2}$, with the additional constraint $a_{q_1 q_1} = 0$
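The generation procedure above can be sketched as a small sampler. Everything here is illustrative: the function names, the three-state model, and the explicit duration tables, where `durations[i][d - 1]` stands for $P_i(d)$. Observations within a segment are drawn independently, matching the independence assumption.

```python
import random


def sample_index(rng, probs):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(pi, A, B, durations, num_segments, seed=0):
    """Sample states, observations, and (state, duration) segments."""
    rng = random.Random(seed)
    states, observations, segments = [], [], []
    state = sample_index(rng, pi)                      # select q_1 using pi
    for _segment in range(num_segments):
        d = sample_index(rng, durations[state]) + 1    # select d using P_q(d)
        for _step in range(d):                         # d independent outputs
            states.append(state)
            observations.append(sample_index(rng, B[state]))
        segments.append((state, d))
        state = sample_index(rng, A[state])            # next state, a_ii = 0
    return states, observations, segments


# Hypothetical 3-state, 2-symbol model with no self-transitions.
pi = [0.5, 0.3, 0.2]
A = [[0.0, 0.5, 0.5], [0.7, 0.0, 0.3], [0.4, 0.6, 0.0]]
B = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
durations = [[0.2, 0.5, 0.3], [0.6, 0.4], [0.1, 0.1, 0.8]]

states, observations, segments = generate(pi, A, B, durations, num_segments=6, seed=1)
print(segments)
print(observations)
```

Because every $a_{ii}$ is zero, consecutive segments always belong to different states, and the time spent in a state is governed entirely by its duration table.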
Training In HMM
Maximum Likelihood (ML)
Maximum Mutual Information (MMI)
Minimum Discrimination Information
(MDI)
Training In HMM
Maximum Likelihood (ML)
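[Figure: the observation sequence $O$ is scored against each model, giving $P(O \mid \lambda_1), P(O \mid \lambda_2), P(O \mid \lambda_3), \ldots, P(O \mid \lambda_n)$]
$P^* = \max_{v} [P(O \mid \lambda_v)]$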
Training In HMM (Cont’d)
Maximum Mutual Information (MMI)
Mutual information, with $\Lambda = \{\lambda_v\}$:
$I_\Lambda(O^v, \lambda_v) = \log \frac{P(O^v, \lambda_v)}{P(O^v)\, P(\lambda_v)}$
$I_\Lambda(O^v, \lambda_v) = \log P(O^v \mid \lambda_v) - \log \sum_{w=1}^{V} P(O^v \mid \lambda_w, w)\, P(w)$
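As a small numeric illustration, the sketch below takes per-model likelihoods $P(O \mid \lambda_w)$ as given (they would come from the forward algorithm), picks the maximum-likelihood model, and evaluates the MMI quantity as the log likelihood under the correct model minus the log of the prior-weighted sum over all models. The function names and numbers are invented for illustration.

```python
import math


def recognize(likelihoods):
    """Maximum-likelihood decision: index of the model with the largest P(O | lambda_v)."""
    return max(range(len(likelihoods)), key=likelihoods.__getitem__)


def mmi_criterion(likelihoods, priors, v):
    """log P(O | lambda_v) - log sum_w P(O | lambda_w) P(w)."""
    evidence = sum(p_o * p_w for p_o, p_w in zip(likelihoods, priors))
    return math.log(likelihoods[v]) - math.log(evidence)


# Hypothetical likelihoods of one observation sequence under three models.
likelihoods = [0.03, 0.01, 0.002]
priors = [1.0 / 3.0] * 3

best = recognize(likelihoods)
print(best, mmi_criterion(likelihoods, priors, best))
```

Unlike the ML criterion, this quantity grows only when the correct model's likelihood increases relative to the competing models.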
Training In HMM (Cont’d)
Minimum Discrimination Information
(MDI)
Observation: $O = (O_1, O_2, \ldots, O_T)$
Autocorrelation: $R = (R_1, R_2, \ldots, R_t)$
$\nu(R, P_\lambda) = \inf_{Q \in \Omega(R)} I(Q : P_\lambda)$
$I(Q : P_\lambda) = \int q(o) \log \frac{q(o)}{P(o \mid \lambda)}\, do$