Tutorial 6: HMM

Hidden Markov Models
Outline
Markov models
Hidden Markov models
Forward/Backward algorithm
Viterbi algorithm
Baum-Welch estimation algorithm
Markov Models
Observable Markov Models
Observable states: S_1, S_2, …, S_N. Below we will designate them simply as 1, 2, …, N.
The actual state at time t is q_t; a state sequence is q_1, q_2, …, q_t, …, q_T.
First-order Markov assumption:
$P(q_t = j \mid q_{t-1} = i, q_{t-2} = k, \dots) = P(q_t = j \mid q_{t-1} = i)$
Stationarity:
$P(q_t = j \mid q_{t-1} = i) = P(q_{t+l} = j \mid q_{t+l-1} = i)$
Markov Models
State transition matrix A:
$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{iN} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{Nj} & \cdots & a_{NN}
\end{pmatrix}$$
where $a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$.
Constraints on $a_{ij}$:
$a_{ij} \ge 0, \quad \forall i, j$
$\sum_{j=1}^{N} a_{ij} = 1, \quad \forall i$
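These two constraints (non-negative entries, each row summing to one) are easy to verify in code. A minimal Python sketch; the helper name and the 2-state matrix are purely illustrative:

    import numpy as np

    def is_stochastic(A, tol=1e-9):
        """Check a_ij >= 0 for all i, j and sum_j a_ij = 1 for every row i."""
        A = np.asarray(A, dtype=float)
        return bool((A >= 0).all() and np.allclose(A.sum(axis=1), 1.0, atol=tol))

    print(is_stochastic([[0.9, 0.1],
                         [0.5, 0.5]]))   # True: a valid state-transition matrix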
Markov Models: Example
States:
1. Rainy (R)
2. Cloudy (C)
3. Sunny (S)
State transition probability matrix:
$$A = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$
Compute the probability of observing SSRRSCS given that today is S.
Markov Models: Example
Basic conditional probability rule:
$P(A, B) = P(A \mid B)\,P(B)$
The Markov chain rule:
$P(q_1, q_2, \dots, q_T)$
$= P(q_T \mid q_1, q_2, \dots, q_{T-1})\, P(q_1, q_2, \dots, q_{T-1})$
$= P(q_T \mid q_{T-1})\, P(q_1, q_2, \dots, q_{T-1})$
$= P(q_T \mid q_{T-1})\, P(q_{T-1} \mid q_{T-2})\, P(q_1, q_2, \dots, q_{T-2})$
$= P(q_T \mid q_{T-1})\, P(q_{T-1} \mid q_{T-2}) \cdots P(q_2 \mid q_1)\, P(q_1)$
Markov Models: Example
Observation sequence O:
O  ( S , S , S , R, R, S , C , S )
Using the chain rule we get:
P(O | model)
 P( S , S , S , R, R, S , C , S | model)
 P( S ) P( S | S ) P( S | S ) P ( R | S ) P ( R | R ) 
P( S | R) P(C | S ) P( S | C )
  a33a33a31a11a13a32 a23
 (1)(0.8) 2 (0.1)(0.4)(0.3)(0.1)(0.2)  1.536 10 4
Markov Models: Example
What is the probability that the sequence remains in state i
for exactly d time units?
$p_i(d) = P(q_1 = i, q_2 = i, \dots, q_d = i, q_{d+1} \ne i, \dots) = \pi_i\,(a_{ii})^{d-1}(1 - a_{ii})$
This is an exponential Markov chain duration density.
What is the expected value of the duration d in state i?
$\bar{d}_i = \sum_{d=1}^{\infty} d\, p_i(d) = \sum_{d=1}^{\infty} d\,(a_{ii})^{d-1}(1 - a_{ii})$
Markov Models: Example
$\bar{d}_i = (1 - a_{ii}) \sum_{d=1}^{\infty} d\,(a_{ii})^{d-1} = (1 - a_{ii}) \frac{\partial}{\partial a_{ii}} \sum_{d=1}^{\infty} (a_{ii})^{d} = (1 - a_{ii}) \frac{\partial}{\partial a_{ii}} \left( \frac{a_{ii}}{1 - a_{ii}} \right) = \frac{1}{1 - a_{ii}}$
Markov Models: Example
• Avg. number of consecutive sunny days $= \frac{1}{1 - a_{33}} = \frac{1}{1 - 0.8} = 5$
• Avg. number of consecutive cloudy days = 2.5
• Avg. number of consecutive rainy days = 1.67
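As a quick numerical check of the closed form $\bar d_i = 1/(1 - a_{ii})$, the truncated sum below (a minimal Python sketch) approximates the expectation for $a_{ii} = 0.8$:

    a_ii = 0.8   # self-transition probability of the Sunny state
    # Truncate the infinite sum over durations; terms decay geometrically
    d_bar = sum(d * a_ii**(d - 1) * (1 - a_ii) for d in range(1, 1000))
    print(d_bar)            # ~5.0 = 1 / (1 - 0.8)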
Hidden Markov Models
States are not observable
Observations are probabilistic functions of state
State transitions are still probabilistic
Urn and Ball Model
N urns containing colored balls
M distinct colors of balls
Each urn has a (possibly) different distribution of colors
Sequence generation algorithm:
1. Pick an initial urn according to some random process.
2. Randomly pick a ball from the urn and then replace it.
3. Select another urn according to a random selection process associated with the urn.
4. Repeat steps 2 and 3.
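A minimal Python sketch of this generative process; the urn colour distributions and urn-selection probabilities below are made-up illustrative numbers, not values from the tutorial:

    import random

    # 2 urns (hidden states), 3 ball colours (observation symbols) -- toy numbers
    A  = [[0.7, 0.3],            # urn-to-urn selection probabilities
          [0.4, 0.6]]
    B  = [[0.5, 0.3, 0.2],       # colour distribution inside each urn
          [0.1, 0.3, 0.6]]
    pi = [0.6, 0.4]              # initial urn distribution

    def generate(T):
        urn = random.choices(range(2), weights=pi)[0]          # step 1: initial urn
        obs = []
        for _ in range(T):
            ball = random.choices(range(3), weights=B[urn])[0]  # step 2: draw a ball, replace it
            obs.append(ball)
            urn = random.choices(range(2), weights=A[urn])[0]   # step 3: select the next urn
        return obs                                              # step 4: repeat

    print(generate(10))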
The Trellis
[Figure: the trellis. States 1, …, N on the vertical axis; time 1, 2, …, t-1, t, t+1, t+2, …, T-1, T on the horizontal (TIME) axis; the observations o_1, o_2, …, o_T below the corresponding time steps (OBSERVATION axis).]
Elements of Hidden Markov Models
N – the number of hidden states
Q – the set of states, Q = {1, 2, …, N}
M – the number of observation symbols
V – the set of symbols, V = {1, 2, …, M}
A – the state-transition probability matrix: $a_{ij} = P(q_{t+1} = j \mid q_t = i), \quad 1 \le i, j \le N$
B – the observation probability distribution: $b_j(k) = P(o_t = v_k \mid q_t = j), \quad 1 \le j \le N, \; 1 \le k \le M$
π – the initial state distribution: $\pi_i = P(q_1 = i), \quad 1 \le i \le N$
λ – the entire model, $\lambda = (A, B, \pi)$
Three Basic Problems
1. EVALUATION – given an observation sequence O = (o_1, o_2, …, o_T) and a model λ = (A, B, π), efficiently compute P(O | λ).
   The hidden states complicate the evaluation.
   Given two models λ_1 and λ_2, this can be used to choose the better one.
2. DECODING – given an observation sequence O = (o_1, o_2, …, o_T) and a model λ, find the optimal state sequence q = (q_1, q_2, …, q_T).
   An optimality criterion has to be decided (e.g. maximum likelihood).
   This is an "explanation" of the data.
3. LEARNING – given O = (o_1, o_2, …, o_T), estimate the model parameters λ = (A, B, π) that maximize P(O | λ).
Solution to Problem 1
Problem: compute P(o_1, o_2, …, o_T | λ).
Algorithm:
– Let q = (q_1, q_2, …, q_T) be a state sequence.
– Assume the observations are independent given the states:
$P(O \mid q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$
– The probability of a particular state sequence is:
$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$
– Also,
$P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda)$
Solution to Problem 1
– Enumerate all paths and sum the probabilities:
$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$
There are $N^T$ state sequences, each requiring O(T) calculations.
Complexity: $O(T \cdot N^T)$ calculations.
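A brute-force Python sketch of this summation, reusing the A, B, pi arrays defined above and enumerating all $N^T$ paths (feasible only for tiny T; the function name is just illustrative):

    from itertools import product

    def evaluate_brute_force(obs, A, B, pi):
        """Sum P(O, q | lambda) over every possible state sequence q."""
        N = A.shape[0]
        total = 0.0
        for q in product(range(N), repeat=len(obs)):         # N**T sequences
            p = pi[q[0]] * B[q[0], obs[0]]
            for t in range(1, len(obs)):
                p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
            total += p
        return total

    print(evaluate_brute_force([0, 2, 1], A, B, pi))          # toy observation sequence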
Forward Procedure: Intuition
[Figure: at time t+1, state k collects probability from every state i = 1, …, N at time t through the transition probabilities a_{1k}, …, a_{Nk}.]
Forward Algorithm
Define the forward variable $\alpha_t(i)$ as:
$\alpha_t(i) = P(o_1, o_2, \dots, o_t, q_t = i \mid \lambda)$
$\alpha_t(i)$ is the probability of observing the partial sequence $(o_1, o_2, \dots, o_t)$ such that the state at time t is i.
1. Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1)$
2. Induction: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
4. Complexity: $O(N^2 T)$
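A minimal NumPy sketch of the forward pass, reusing the A, B, pi arrays defined above; no scaling of α is applied, so it is only suitable for short sequences:

    import numpy as np

    def forward(obs, A, B, pi):
        """Return alpha, where alpha[t, i] = P(o_1..o_{t+1}, q_{t+1} = i | lambda), 0-based t."""
        T, N = len(obs), A.shape[0]
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]                      # initialization
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # induction
        return alpha

    obs = [0, 2, 1, 1]                                    # toy observation indices
    alpha = forward(obs, A, B, pi)
    print(alpha[-1].sum())                                # termination: P(O | lambda)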
Example
Consider the following coin-tossing experiment:

             State 1   State 2   State 3
    P(H)       0.5       0.75      0.25
    P(T)       0.5       0.25      0.75

– state-transition probabilities equal to 1/3
– initial state probabilities equal to 1/3
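The same model in array form (H encoded as 0, T as 1), a sketch so that the questions on the following slides can be checked numerically with the forward and Viterbi sketches in this tutorial:

    import numpy as np

    A_coin  = np.full((3, 3), 1.0 / 3.0)        # all state-transition probabilities 1/3
    pi_coin = np.full(3, 1.0 / 3.0)             # all initial state probabilities 1/3
    B_coin  = np.array([[0.50, 0.50],           # rows: states 1-3, columns: [P(H), P(T)]
                        [0.75, 0.25],
                        [0.25, 0.75]])

    O = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]          # H,H,H,H,T,H,T,T,T,T  (H = 0, T = 1)
    print(forward(O, A_coin, B_coin, pi_coin)[-1].sum())   # P(O | lambda)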
Example
1. You observe O = (H, H, H, H, T, H, T, T, T, T). What state sequence q is most likely? What is the joint probability, P(O, q | λ), of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state 1?
3. Consider the observation sequence O' = (H, T, T, H, T, H, H, T, T, H). How would your answers to parts 1 and 2 change?
Example
4. If the state transition probabilities were
$$A' = \begin{pmatrix} 0.9 & 0.05 & 0.05 \\ 0.45 & 0.1 & 0.45 \\ \cdots & \cdots & \cdots \end{pmatrix}$$
how would the new model λ' change your answers to parts 1-3?
Backward Algorithm
[Figure: the same trellis of states 1, …, N over time 1, …, T with observations o_1, …, o_T; the backward pass sweeps from time T back toward time t.]
Backward Algorithm
Define the backward variable $\beta_t(i)$ as:
$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid q_t = i, \lambda)$
$\beta_t(i)$ is the probability of observing the partial sequence $(o_{t+1}, o_{t+2}, \dots, o_T)$ given that the state at time t is i.
1. Initialization: $\beta_T(i) = 1$
2. Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \; t = T-1, \dots, 1$
Solution to Problem 2
Choose the most likely path: find the path (q_1, q_2, …, q_T) that maximizes the likelihood $P(q_1, q_2, \dots, q_T \mid O, \lambda)$.
Solution by dynamic programming.
Define
$\delta_t(i) = \max_{q_1, q_2, \dots, q_{t-1}} P(q_1, q_2, \dots, q_t = i, o_1, o_2, \dots, o_t \mid \lambda)$
$\delta_t(i)$ is the probability of the highest-probability path that ends in state i at time t.
By induction we have:
$\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$
Viterbi Algorithm
[Figure: the trellis over time 1, …, T with observations o_1, …, o_T; the best path into state k at time t+1 is chosen by maximizing δ_t(i) a_{ik} over the predecessor states i = 1, …, N.]
Viterbi Algorithm
Initialization:
$\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$
$\psi_1(i) = 0$
Recursion:
$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\; b_j(o_t)$
$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]$
$\quad 2 \le t \le T, \; 1 \le j \le N$
Termination:
$P^{*} = \max_{1 \le i \le N} [\delta_T(i)]$
$q_T^{*} = \arg\max_{1 \le i \le N} [\delta_T(i)]$
Path (state sequence) backtracking:
$q_t^{*} = \psi_{t+1}(q_{t+1}^{*}), \quad t = T-1, T-2, \dots, 1$
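A compact NumPy sketch of the Viterbi recursion with backtracking, using the same A, B, pi conventions as above; a production version would work with log probabilities, omitted here for clarity:

    import numpy as np

    def viterbi(obs, A, B, pi):
        """Return the most likely state path and its joint probability P(O, q* | lambda)."""
        T, N = len(obs), A.shape[0]
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, obs[0]]                          # initialization
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A                # scores[i, j] = delta_{t-1}(i) a_ij
            psi[t] = scores.argmax(axis=0)                    # best predecessor of each state j
            delta[t] = scores.max(axis=0) * B[:, obs[t]]      # recursion
        path = [int(delta[-1].argmax())]                      # termination
        for t in range(T - 1, 0, -1):                         # backtracking
            path.append(int(psi[t][path[-1]]))
        return path[::-1], float(delta[-1].max())

    print(viterbi(obs, A, B, pi))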
Solution to Problem 3
Estimate λ = (A, B, π) to maximize P(O | λ).
There is no analytic solution because of the complexity, so an iterative solution is used.
Baum-Welch algorithm (actually an EM algorithm):
1. Let the initial model be λ_0.
2. Compute the new model λ based on λ_0 and the observation O.
3. If log P(O | λ) - log P(O | λ_0) < DELTA, stop.
4. Else set λ_0 ← λ and go to step 2.
Baum-Welch: Preliminaries
$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$
Baum-Welch: Preliminaries
Define $\xi_t(i, j)$ as the probability of being in state i at time t and in state j at time t+1:
$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$
Define $\gamma_t(i)$ as the probability of being in state i at time t, given the observation sequence:
$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$
Baum-Welch: Preliminaries
$\sum_{t=1}^{T} \gamma_t(i)$ is the expected number of times state i is visited.
$\sum_{t=1}^{T-1} \xi_t(i, j)$ is the expected number of transitions from state i to state j.
Baum-Welch: Update Rules
$\bar{\pi}_i$ = expected frequency in state i at time t = 1: $\bar{\pi}_i = \gamma_1(i)$
$\bar{a}_{ij}$ = (expected number of transitions from state i to state j) / (expected number of transitions from state i):
$\bar{a}_{ij} = \frac{\sum_t \xi_t(i, j)}{\sum_t \gamma_t(i)}$
$\bar{b}_j(k)$ = (expected number of times in state j and observing symbol k) / (expected number of times in state j):
$\bar{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_t \gamma_t(j)}$
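One full re-estimation step, wired together from the xi and gamma arrays computed above. This is a sketch for a single observation sequence; practical implementations add scaling and accumulate statistics over multiple sequences:

    import numpy as np

    M = B.shape[1]                                        # number of observation symbols

    pi_new = gamma[0]                                     # expected frequency in state i at t = 1
    A_new = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]   # expected transitions i->j / out of i

    B_new = np.zeros((len(pi), M))
    for k in range(M):
        mask = np.array(obs[:-1]) == k                    # times t with o_t = v_k (t = 1..T-1 here,
        B_new[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)   # since gamma was built from xi)

    # Iterating: replace (A, B, pi) with (A_new, B_new, pi_new), recompute alpha, beta,
    # xi, gamma, and stop once log P(O | lambda) improves by less than DELTA.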