HMM-EM

HMM - Part 2
 Review of the last lecture
 The EM algorithm
 Continuous density HMM
1
Three Basic Problems for HMMs
 Given an observation sequence O=(o1,o2,…,oT), and
an HMM =(A,B,)
– Problem 1:
How to compute P(O|) efficiently ?
 The forward algorithm
P(up, up, up, up, up|)?
*  arg max P(O | i )
i
– Problem 2:
How to choose an optimal state sequence Q=(q1,q2,……, qT)
which best explains the observations? Q*  arg max P(O, Q |  )
Q
 The Viterbi algorithm
– Problem 3:
How to adjust the model parameters =(A,B,) to maximize
P(O|)?
 The Baum-Welch (forward-backward) algorithm
cf. The segmental K-means algorithm maximizes P(O, Q* |)
2
The Forward Algorithm
The forward variable: α_t(i) = P(o_1 o_2 ... o_t, q_t = i | λ)
– Probability of o_1, o_2, ..., o_t being observed and the state at time t
being i, given model λ
The forward algorithm (a code sketch follows below)
1. Initialization:  α_1(i) = P(o_1, q_1 = i | λ) = π_i b_i(o_1),  1 ≤ i ≤ N
2. Induction:  α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1}),  1 ≤ t ≤ T−1, 1 ≤ j ≤ N
3. Termination:  P(O | λ) = Σ_{i=1}^{N} α_T(i)
3
The Viterbi Algorithm
1. Initialization
   δ_1(i) = π_i b_i(o_1),  1 ≤ i ≤ N
   ψ_1(i) = 0,  1 ≤ i ≤ N
2. Induction
   δ_{t+1}(j) = max_{1≤i≤N} [ δ_t(i) a_ij ] b_j(o_{t+1}),  1 ≤ t ≤ T−1, 1 ≤ j ≤ N
   ψ_{t+1}(j) = arg max_{1≤i≤N} [ δ_t(i) a_ij ],  1 ≤ t ≤ T−1, 1 ≤ j ≤ N
   cf. α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1})
3. Termination
   P(O, Q* | λ) = max_{1≤i≤N} δ_T(i)
   q_T* = arg max_{1≤i≤N} δ_T(i)
   cf. P(O | λ) = Σ_{i=1}^{N} α_T(i)
4. Backtracking
   q_t* = ψ_{t+1}(q_{t+1}*),  t = T−1, T−2, ..., 1
   Q* = (q_1*, q_2*, ..., q_T*) is the best state sequence (a code sketch follows below)
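A minimal NumPy sketch of the Viterbi recursion, under the same illustrative conventions as the forward sketch above (A, B, pi, integer-coded obs):

import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: returns the best state sequence Q* and P(O, Q* | lambda)."""
    N = A.shape[0]
    T = len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    # 1. Initialization
    delta[0] = pi * B[:, obs[0]]
    # 2. Induction: keep the best predecessor of each state
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # scores[i, j] = delta_t(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # 3. Termination
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()
    best = delta[-1, q[-1]]
    # 4. Backtracking: q_t* = psi_{t+1}(q_{t+1}*)
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, best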
4
The Segmental K-means Algorithm
 Assume that we have a training set of observations and an initial
estimate of model parameters
– Step 1 : Segment the training data
The set of training observation sequences is segmented into states,
based on the current model, by the Viterbi Algorithm
– Step 2 : Re-estimate the model parameters
   π̂_i = (Number of times q_1 = i) / (Number of training sequences)
   â_ij = (Number of transitions from state i to state j) / (Number of transitions from state i)
   b̂_j(k) = (Number of "k" in state j) / (Number of times in state j)
   with  Σ_{j=1}^{N} â_ij = 1  and  Σ_{k=1}^{M} b̂_j(k) = 1   (a counting sketch follows below)
– Step 3: Evaluate the model
If the difference between the new and current model scores exceeds a
threshold, go back to Step 1; otherwise, return
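A minimal counting sketch of the Step 2 re-estimation, assuming each training sequence has already been segmented into a state path by the Viterbi algorithm; obs_seqs and state_paths are parallel lists of integer sequences, and all names are illustrative:

import numpy as np

def segmental_kmeans_reestimate(obs_seqs, state_paths, N, M):
    """Re-estimate (pi, A, B) by counting over Viterbi-segmented training data."""
    pi_counts = np.zeros(N)
    trans_counts = np.zeros((N, N))
    emit_counts = np.zeros((N, M))
    for obs, path in zip(obs_seqs, state_paths):
        pi_counts[path[0]] += 1                      # number of times q_1 = i
        for t in range(len(obs) - 1):
            trans_counts[path[t], path[t + 1]] += 1  # transitions from state i to state j
        for t in range(len(obs)):
            emit_counts[path[t], obs[t]] += 1        # symbol k observed in state j
    pi_hat = pi_counts / len(obs_seqs)
    A_hat = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    B_hat = emit_counts / emit_counts.sum(axis=1, keepdims=True)
    return pi_hat, A_hat, B_hat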
5
Segmental K-means vs. Baum-Welch
Segmental K-means:
   π̂_i = (Number of times q_1 = i) / (Number of training sequences)
   â_ij = (Number of transitions from state i to state j) / (Number of transitions from state i)
   b̂_j(k) = (Number of "k" in state j) / (Number of times in state j)
Baum-Welch:
   π̂_i = (Expected number of times q_1 = i) / (Number of training sequences)
   â_ij = (Expected number of transitions from state i to state j) / (Expected number of transitions from state i)
   b̂_j(k) = (Expected number of "k" in state j) / (Expected number of times in state j)
6
The Backward Algorithm
The backward variable: β_t(i) = P(o_{t+1} o_{t+2} ... o_T | q_t = i, λ)
– Probability of o_{t+1}, o_{t+2}, ..., o_T being observed, given the state at
time t being i and model λ
The backward algorithm (a code sketch follows below)
1. Initialization:  β_T(i) = 1,  1 ≤ i ≤ N
2. Induction:  β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),  1 ≤ t ≤ T−1, 1 ≤ i ≤ N
3. Termination:  P(O | λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)
   cf. P(O | λ) = Σ_{i=1}^{N} α_T(i)
7
The Forward-Backward Algorithm
Relation between the forward and backward variables
   α_t(i) = P(o_1 o_2 ... o_t, q_t = i | λ)          α_t(i) = [ Σ_{j=1}^{N} α_{t−1}(j) a_ji ] b_i(o_t)
   β_t(i) = P(o_{t+1} o_{t+2} ... o_T | q_t = i, λ)   β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)
   α_t(i) β_t(i) = P(O, q_t = i | λ)
   P(O | λ) = Σ_{i=1}^{N} α_t(i) β_t(i)   (for any t; checked numerically in the sketch below)
(Huang et al., 2001)
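A quick numerical check of this relation, reusing the forward and backward sketches above on a made-up 2-state, 3-symbol model (all numbers below are illustrative, not from the slides):

import numpy as np

A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.8, 0.2])
obs = [0, 1, 2, 1]

alpha, p_fwd = forward(A, B, pi, obs)
beta, p_bwd = backward(A, B, pi, obs)
# P(O | lambda) = sum_i alpha_t(i) * beta_t(i) holds at every time t
for t in range(len(obs)):
    assert np.isclose((alpha[t] * beta[t]).sum(), p_fwd)
assert np.isclose(p_fwd, p_bwd)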
8
The Baum-Welch Algorithm (1/3)
Define two new variables (computed in the code sketch below):
γ_t(i) = P(q_t = i | O, λ)
– Probability of being in state i at time t, given O and λ
   γ_t(i) = P(O, q_t = i | λ) / P(O | λ) = α_t(i) β_t(i) / P(O | λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)
   γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)
– Probability of being in state i at time t and state j at time t+1, given O and λ
   ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ) / P(O | λ)
             = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{m=1}^{N} Σ_{n=1}^{N} α_t(m) a_mn b_n(o_{t+1}) β_{t+1}(n)
9
The Baum-Welch Algorithm (2/3)
γ_t(i) = P(q_t = i | O, λ)
– Probability of being in state i at time t, given O and λ
   Σ_{l=1}^{L} Σ_{t=1}^{T_l − 1} γ_t^l(i) = expected number of transitions from state i
   Σ_{l=1}^{L} γ_1^l(i) = expected number of times q_1 = i
ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)
– Probability of being in state i at time t and state j at time t+1, given O and λ
   Σ_{l=1}^{L} Σ_{t=1}^{T_l − 1} ξ_t^l(i, j) = expected number of transitions from state i to state j
(L: number of training sequences; T_l: length of the l-th sequence)
10
The Baum-Welch Algorithm (3/3)
Re-estimation formulae for π, A, and B (a code sketch of one pass follows below):
   π̂_i = (Expected number of times q_1 = i) / (Number of training sequences)
        = ( Σ_{l=1}^{L} γ_1^l(i) ) / L
   â_ij = (Expected number of transitions from state i to state j) / (Expected number of transitions from state i)
        = ( Σ_{l=1}^{L} Σ_{t=1}^{T_l − 1} ξ_t^l(i, j) ) / ( Σ_{l=1}^{L} Σ_{t=1}^{T_l − 1} γ_t^l(i) )
   b̂_j(k) = (Expected number of "k" in state j) / (Expected number of times in state j)
          = ( Σ_{l=1}^{L} Σ_{t=1, s.t. o_t = v_k}^{T_l} γ_t^l(j) ) / ( Σ_{l=1}^{L} Σ_{t=1}^{T_l} γ_t^l(j) )
How do we know that P(O | λ̂) ≥ P(O | λ)?
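A minimal sketch of one re-estimation pass over several training sequences, reusing the forward, backward, and posteriors sketches above (names and array layouts are illustrative):

import numpy as np

def baum_welch_reestimate(A, B, pi, obs_seqs):
    """One Baum-Welch iteration; returns (pi_hat, A_hat, B_hat)."""
    N, M = B.shape
    pi_num = np.zeros(N)
    a_num, a_den = np.zeros((N, N)), np.zeros(N)
    b_num, b_den = np.zeros((N, M)), np.zeros(N)
    for obs in obs_seqs:
        alpha, _ = forward(A, B, pi, obs)
        beta, _ = backward(A, B, pi, obs)
        gamma, xi = posteriors(A, B, alpha, beta, obs)
        pi_num += gamma[0]                       # expected number of times q_1 = i
        a_num += xi.sum(axis=0)                  # expected transitions from i to j
        a_den += gamma[:-1].sum(axis=0)          # expected transitions from i
        for t, o in enumerate(obs):
            b_num[:, o] += gamma[t]              # expected number of "k" in state j
        b_den += gamma.sum(axis=0)               # expected number of times in state j
    return pi_num / len(obs_seqs), a_num / a_den[:, None], b_num / b_den[:, None]

Iterating this update while monitoring P(O | λ) from the forward pass exhibits the monotone improvement discussed on the following slides.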
11
Maximum Likelihood Estimation for HMM
   λ_ML = arg max_λ L(λ),   where L(λ) = P(O | λ)
        = arg max_λ l(λ),   where l(λ) = log P(O | λ)
        = arg max_λ log Σ_Q P(O, Q | λ)
However, we cannot find the solution directly.
An alternative way is to find a sequence of models λ^0, λ^1, ..., λ^t, ...
s.t.  l(λ^0) ≤ l(λ^1) ≤ ... ≤ l(λ^t) ≤ ...
12
l(λ) − l(λ^t) = log P(O | λ) − log P(O | λ^t)
             = log Σ_Q P(O, Q | λ) − log P(O | λ^t)
             = log Σ_Q [ P(O, Q | λ^t) / P(O | λ^t) ] [ P(O, Q | λ) / P(O, Q | λ^t) ]
             = log Σ_Q P(Q | O, λ^t) [ P(O, Q | λ) / P(O, Q | λ^t) ]
             = log E_{P(Q|O,λ^t)} [ P(O, Q | λ) / P(O, Q | λ^t) ]
             ≥ E_{P(Q|O,λ^t)} [ log ( P(O, Q | λ) / P(O, Q | λ^t) ) ]        (Jensen's inequality)
             = Σ_Q P(Q | O, λ^t) log [ P(O, Q | λ) / P(O, Q | λ^t) ]
Jensen's inequality: if f is a concave function and X is a r.v., then E[f(X)] ≤ f(E[X]).
Hence choose
   λ^{t+1} = arg max_λ E_{P(Q|O,λ^t)} [ log ( P(O, Q | λ) / P(O, Q | λ^t) ) ]
           = arg max_λ Σ_Q P(Q | O, λ^t) log [ P(O, Q | λ) / P(O, Q | λ^t) ]
           = arg max_λ Σ_Q P(Q | O, λ^t) log P(O, Q | λ)        ← the Q function
This is solvable, and it can be proved that l(λ^{t+1}) ≥ l(λ^t).
13
The EM Algorithm
 EM: Expectation Maximization
– Why EM?
• Simple optimization algorithms for likelihood functions rely on the
intermediate variables, called latent data
For HMM, the state sequence is the latent data
• Direct access to the data necessary to estimate the parameters is
impossible or difficult
For HMM, it is almost impossible to estimate (A, B, π) without
considering the state sequence
– Two Major Steps :
• E step: compute the expectation of the log-likelihood by including the
latent variables as if they were observed
   Σ_Q P(Q | O, λ) log P(O, Q | λ̂)
• M step: compute the maximum-likelihood estimates of the parameters
by maximizing the expected log-likelihood found in the E step
14
Three Steps for EM
Step 1. Draw a lower bound
– Use Jensen's inequality
Step 2. Find the best lower bound → the auxiliary function
– Let the lower bound touch the objective function at
the current guess
Step 3. Maximize the auxiliary function
– Obtain the new guess
– Go to Step 2 until convergence
[Minka 1998]
15
Form an Initial Guess of λ = (A, B, π)
[Figure: the objective function F(λ), whose maximizer is λ* = arg max_λ P(O | λ), with the current guess λ̄ marked. Given the current guess λ̄, the goal is to find a new guess λ^NEW such that F(λ̄) ≤ F(λ^NEW).]
16
Step 1. Draw a Lower Bound
[Figure: the objective function F(λ) together with a lower bound function g(λ) satisfying g(λ) ≤ F(λ).]
17
Step 2. Find the Best Lower Bound
[Figure: the auxiliary function g(λ, λ̄) is the lower bound function that touches the objective function F(λ) at the current guess λ̄.]
18
Step 3. Maximize the Auxiliary Function
[Figure: maximizing the auxiliary function g(λ, λ̄) yields λ^NEW with F(λ^NEW) ≥ F(λ̄).]
19
Update the Model
[Figure: the current guess is updated, λ̄ ← λ^NEW.]
20
Step 2. Find the Best Lower Bound
[Figure: a new auxiliary function g(λ, λ̄) is fitted at the updated guess λ̄.]
21
Step 3. Maximize the Auxiliary Function
[Figure: maximizing the new auxiliary function again gives F(λ^NEW) ≥ F(λ̄); the steps repeat until convergence.]
22
Step 1. Draw a Lower Bound (cont’d)
Objective function:
   log P(O | λ) = log Σ_Q P(O, Q | λ)
                = log Σ_Q p(Q) [ P(O, Q | λ) / p(Q) ]        (p(Q): an arbitrary probability distribution)
                ≥ Σ_Q p(Q) log [ P(O, Q | λ) / p(Q) ]         (apply Jensen's inequality)
Jensen's inequality: if f is a concave function and X is a r.v., then E[f(X)] ≤ f(E[X]).
The last expression is a lower bound function of F(λ).
23
Step 2. Find the Best Lower Bound (cont’d)
– Find the p(Q) that makes the lower bound function touch the objective function at the current guess λ̄
→ We want to maximize Σ_Q p(Q) log [ P(O, Q | λ) / p(Q) ] w.r.t. p(Q) at λ = λ̄
→ The best p(Q):  p*(Q) = arg max_{p(Q)} Σ_Q p(Q) log [ P(O, Q | λ̄) / p(Q) ]
24
Step 2. Find the Best Lower Bound (cont’d)
Since Σ_Q p(Q) = 1, we introduce a Lagrange multiplier η:
   Σ_Q p(Q) log P(O, Q | λ̄) − Σ_Q p(Q) log p(Q) + η ( 1 − Σ_Q p(Q) )
Take the derivative w.r.t. p(Q) and set it to zero:
   log P(O, Q | λ̄) − log p(Q) − 1 − η = 0
   ⇒ log p(Q) = log P(O, Q | λ̄) − 1 − η
   ⇒ p(Q) = P(O, Q | λ̄) / e^{1 + η}
Summing over Q and using Σ_Q p(Q) = 1:
   e^{1 + η} = Σ_Q P(O, Q | λ̄)
   ⇒ p(Q) = P(O, Q | λ̄) / Σ_Q P(O, Q | λ̄) = P(O, Q | λ̄) / P(O | λ̄) = P(Q | O, λ̄)
25
Step 2. Find the Best Lower Bound (cont’d)
Define the auxiliary function
   g(λ, λ̄) = Σ_Q P(Q | O, λ̄) log [ P(O, Q | λ) / P(Q | O, λ̄) ]
We can check that it touches the objective function at λ = λ̄:
   g(λ̄, λ̄) = Σ_Q P(Q | O, λ̄) log [ P(O, Q | λ̄) / P(Q | O, λ̄) ]
            = Σ_Q P(Q | O, λ̄) log [ P(Q | O, λ̄) P(O | λ̄) / P(Q | O, λ̄) ]
            = Σ_Q P(Q | O, λ̄) log P(O | λ̄) = log P(O | λ̄)
Expanding,
   g(λ, λ̄) = Σ_Q P(Q | O, λ̄) log P(O, Q | λ) − Σ_Q P(Q | O, λ̄) log P(Q | O, λ̄)
The first term is the Q function.
26
EM for HMM Training
 Basic idea
– Assume we have λ and the probability that each Q occurred in
the generation of O,
i.e., we have in fact observed a complete data pair (O, Q) with
frequency proportional to the probability P(O, Q | λ)
– We then find a new λ = λ̂ that maximizes the expectation
   Σ_Q P(Q | O, λ) log P(O, Q | λ̂)
– It can be guaranteed that P(O | λ̂) ≥ P(O | λ)
EM can discover parameters of model λ to maximize the
log-likelihood of the incomplete data, log P(O | λ), by
iteratively maximizing the expectation of the log-likelihood
of the complete data, log P(O, Q | λ)
27
Solution to Problem 3 - The EM Algorithm
The auxiliary function
   Q(λ, λ̄) = Σ_Q P(Q | O, λ) log P(O, Q | λ̄)
            = Σ_Q [ P(O, Q | λ) / P(O | λ) ] log P(O, Q | λ̄)
where P(O, Q | λ) and log P(O, Q | λ̄) can be expressed as
   P(O, Q | λ) = π_{q_1} [ Π_{t=1}^{T−1} a_{q_t q_{t+1}} ] [ Π_{t=1}^{T} b_{q_t}(o_t) ]
   log P(O, Q | λ̄) = log π̄_{q_1} + Σ_{t=1}^{T−1} log ā_{q_t q_{t+1}} + Σ_{t=1}^{T} log b̄_{q_t}(o_t)
28
Solution to Problem 3 - The EM Algorithm
(cont’d)
The auxiliary function can be rewritten as
   Q(λ, λ̄) = Σ_{all Q} [ P(O, Q | λ) / P(O | λ) ] [ log π̄_{q_1} + Σ_{t=1}^{T−1} log ā_{q_t q_{t+1}} + Σ_{t=1}^{T} log b̄_{q_t}(o_t) ]
            = Q_π(π̄, λ) + Q_a(ā, λ) + Q_b(b̄, λ)
where (each term has the generic form Σ_j w_j log y_j):
   Q_π(π̄, λ) = Σ_{i=1}^{N} [ P(O, q_1 = i | λ) / P(O | λ) ] log π̄_i        (weights w_i = γ_1(i), variables y_i = π̄_i)
   Q_a(ā, λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{t=1}^{T−1} [ P(O, q_t = i, q_{t+1} = j | λ) / P(O | λ) ] log ā_ij        (weights w_j = ξ_t(i, j), variables y_j = ā_ij)
   Q_b(b̄, λ) = Σ_{j=1}^{N} Σ_{k=1}^{M} Σ_{t: o_t = v_k} [ P(O, q_t = j | λ) / P(O | λ) ] log b̄_j(k)        (weights w_k = γ_t(j), variables y_k = b̄_j(k))
29
Solution to Problem 3 - The EM Algorithm
(cont’d)
The auxiliary function is separated into three
independent terms, each of which respectively corresponds
to π_i, a_ij, and b_j(k)
– Maximization of Q(λ, λ̄) can be done by maximizing the individual terms
separately, subject to the probability constraints
   Σ_{i=1}^{N} π_i = 1,   Σ_{j=1}^{N} a_ij = 1 ∀i,   Σ_{k=1}^{M} b_j(k) = 1 ∀j
– All these terms have the following form
   F(y) = g(y_1, y_2, ..., y_N) = Σ_{j=1}^{N} w_j log y_j,   where Σ_{j=1}^{N} y_j = 1 and y_j ≥ 0
   F(y) has a maximum value when:  y_j = w_j / Σ_{n=1}^{N} w_n
30
Solution to Problem 3 - The EM Algorithm
(cont’d)
Proof: apply a Lagrange multiplier
   Suppose that  F̃ = Σ_{j=1}^{N} w_j log y_j + θ ( Σ_{j=1}^{N} y_j − 1 )
   Letting ∂F̃/∂y_j = 0:
      w_j / y_j + θ = 0   ⇒   y_j = − w_j / θ
   Applying the constraint Σ_{j=1}^{N} y_j = 1:
      θ Σ_{j=1}^{N} y_j = Σ_{j=1}^{N} θ y_j = − Σ_{j=1}^{N} w_j   ⇒   θ = − Σ_{j=1}^{N} w_j
   Therefore  y_j = w_j / Σ_{n=1}^{N} w_n
(Side note on the slide: e = lim_{h→0} (1 + h)^{1/h} ≈ 2.71828, and
   d(ln x)/dx = lim_{h→0} [ln(x + h) − ln x]/h = lim_{h→0} ln(1 + h/x)^{1/h} = lim_{h→0} (1/x) ln(1 + h/x)^{x/h} = (1/x) ln e = 1/x,
which is the derivative used above for the log terms.)
31
Solution to Problem 3 - The EM Algorithm
(cont’d)
   Q_π(π̄, λ) = Σ_{i=1}^{N} [ P(O, q_1 = i | λ) / P(O | λ) ] log π̄_i        (w_i log y_i form)
   π̂_i = y_i = w_i / Σ_{n=1}^{N} w_n
        = [ P(O, q_1 = i | λ) / P(O | λ) ] / Σ_{n=1}^{N} [ P(O, q_1 = n | λ) / P(O | λ) ]
        = P(O, q_1 = i | λ) / P(O | λ)
        = γ_1(i),   where γ_t(i) = P(q_t = i | O, λ)
32
Solution to Problem 3 - The EM Algorithm
(cont’d)
   Q_a(ā, λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{t=1}^{T−1} [ P(O, q_t = i, q_{t+1} = j | λ) / P(O | λ) ] log ā_ij        (w_j log y_j form)
   â_ij = y_j = w_j / Σ_{n=1}^{N} w_n
        = Σ_{t=1}^{T−1} P(O, q_t = i, q_{t+1} = j | λ) / Σ_{t=1}^{T−1} P(O, q_t = i | λ)
        = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)
33
Solution to Problem 3 - The EM Algorithm
(cont’d)
   Q_b(b̄, λ) = Σ_{j=1}^{N} Σ_{k=1}^{M} Σ_{t: o_t = v_k} [ P(O, q_t = j | λ) / P(O | λ) ] log b̄_j(k)        (w_k log y_k form)
   b̂_j(k) = y_k = w_k / Σ_n w_n
          = Σ_{t=1, s.t. o_t = v_k}^{T} P(O, q_t = j | λ) / Σ_{t=1}^{T} P(O, q_t = j | λ)
          = Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
34
Solution to Problem 3 - The EM Algorithm
(cont’d)
The new model parameter set λ̂ = (π̂, Â, B̂) can be expressed as:
   π̂_i = P(O, q_1 = i | λ) / P(O | λ) = γ_1(i),   where γ_t(i) = P(q_t = i | O, λ)
   â_ij = Σ_{t=1}^{T−1} P(O, q_t = i, q_{t+1} = j | λ) / Σ_{t=1}^{T−1} P(O, q_t = i | λ)
        = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i),   where ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)
   b̂_j(k) = Σ_{t=1, s.t. o_t = v_k}^{T} P(O, q_t = j | λ) / Σ_{t=1}^{T} P(O, q_t = j | λ)
          = Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
35
Discrete vs. Continuous Density HMMs
 Two major types of HMMs according to the observations
– Discrete and finite observations:
• The observations that all distinct states generate are finite in number,
i.e., V = {v_1, v_2, v_3, ..., v_M}, v_k ∈ R^L
• In this case, the observation probability distribution in state j,
B = {b_j(k)}, is defined as b_j(k) = P(o_t = v_k | q_t = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N
(o_t: observation at time t, q_t: state at time t)
→ b_j(k) consists of only M probability values
– Continuous and infinite observations:
• The observations that all distinct states generate are infinite and
continuous, i.e., V = {v | v ∈ R^L}
• In this case, the observation probability distribution in state j,
B = {b_j(v)}, is defined as b_j(v) = f(o_t = v | q_t = j), 1 ≤ j ≤ N
(o_t: observation at time t, q_t: state at time t)
→ b_j(v) is a continuous probability density function (pdf) and is
often a mixture of multivariate Gaussian (normal) distributions
36
Gaussian Distribution
A continuous random variable X is said to have a
Gaussian distribution with mean μ and variance σ² (σ > 0) if
X has a continuous pdf of the following form:
   f(X = x | μ, σ²) = [ 1 / (2π σ²)^{1/2} ] exp[ − (x − μ)² / (2σ²) ]
37
Multivariate Gaussian Distribution
If X = (X_1, X_2, X_3, ..., X_d) is a d-dimensional random vector with a
multivariate Gaussian distribution with mean vector μ and
covariance matrix Σ, then the pdf can be expressed as
   f(X = x) = N(x; μ, Σ) = [ 1 / ( (2π)^{d/2} |Σ|^{1/2} ) ] exp[ − (1/2) (x − μ)^T Σ^{−1} (x − μ) ]
   μ = E[x]
   Σ = E[(x − μ)(x − μ)^T] = E[x x^T] − μ μ^T        (|Σ|: determinant of Σ;  σ²_ij = E[(x_i − μ_i)(x_j − μ_j)])
If X_1, X_2, X_3, ..., X_d are independent random variables, the covariance
matrix reduces to a diagonal matrix, i.e., σ²_ij = 0 for i ≠ j, and
   f(X = x | μ, Σ) = Π_{i=1}^{d} [ 1 / (2π σ²_ii)^{1/2} ] exp[ − (x_i − μ_i)² / (2σ²_ii) ]
38
Multivariate Mixture Gaussian Distribution
A d-dimensional random vector X = (X_1, X_2, X_3, ..., X_d) has
a multivariate mixture Gaussian distribution if
   f(x) = Σ_{k=1}^{M} w_k N(x; μ_k, Σ_k),   with Σ_{k=1}^{M} w_k = 1
In a CDHMM, b_j(v) is a continuous probability density
function (pdf) and is often a mixture of multivariate
Gaussian distributions (a code sketch follows below):
   b_j(v) = Σ_{k=1}^{M} c_jk [ 1 / ( (2π)^{d/2} |Σ_jk|^{1/2} ) ] exp[ − (1/2) (v − μ_jk)^T Σ_jk^{−1} (v − μ_jk) ]
   with c_jk ≥ 0 and Σ_{k=1}^{M} c_jk = 1
   (v: observation vector; μ_jk, Σ_jk: mean vector and covariance matrix of the k-th mixture of the j-th state; c_jk: mixture weight)
39
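A minimal SciPy sketch of evaluating the mixture density b_j(v) above for one state j; the weights, means, and covariances are made-up illustrative values:

import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(v, c_j, mu_j, sigma_j):
    """b_j(v) = sum_k c_jk * N(v; mu_jk, Sigma_jk) for a single state j."""
    return sum(c * multivariate_normal.pdf(v, mean=mu, cov=cov)
               for c, mu, cov in zip(c_j, mu_j, sigma_j))

c_j = np.array([0.6, 0.4])                        # mixture weights, sum to 1
mu_j = np.array([[0.0, 0.0], [2.0, 1.0]])         # mean vector of each mixture
sigma_j = [np.eye(2), 0.5 * np.eye(2)]            # covariance matrix of each mixture
print(mixture_density(np.array([1.0, 0.5]), c_j, mu_j, sigma_j))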
Solution to Problem 3 –
The Segmental K-means Algorithm for CDHMM
 Assume that we have a training set of observations and an initial
estimate of model parameters
– Step 1 : Segment the training data
The set of training observation sequences is segmented into states,
based on the current model, by the Viterbi Algorithm
– Step 2 : Re-estimate the model parameters
By partitioning the observation vectors within each state j into M clusters
   ĉ_jm = (number of vectors classified into cluster m of state j) / (number of vectors in state j)
   μ̂_jm = sample mean of the vectors classified into cluster m of state j
   Σ̂_jm = sample covariance matrix of the vectors classified into cluster m of state j
   π̂_i = (Number of times q_1 = i) / (Number of training sequences)
   â_ij = (Number of transitions from state i to state j) / (Number of transitions from state i)
– Step 3: Evaluate the model
If the difference between the new and current model scores exceeds a
threshold, go back to Step 1; otherwise, return
40
Solution to Problem 3 –
The Segmental K-means Algorithm for CDHMM
(cont’d)
 3 states and 4 Gaussian mixtures per state
[Figure: observation vectors O_1, O_2, ..., O_t are aligned to states s1, s2, s3 over time by the Viterbi segmentation; within each state, K-means moves from the global mean to cluster means, splitting that state's vectors into 4 clusters with parameters {μ_11, Σ_11, c_11}, {μ_12, Σ_12, c_12}, {μ_13, Σ_13, c_13}, {μ_14, Σ_14, c_14}.]
41
Solution to Problem 3 –
The Baum-Welch Algorithm for CDHMM
Define a new variable γ_t(j, k)
– Probability of being in state j at time t with the k-th mixture
component accounting for o_t, given O and λ
   γ_t(j, k) = P(q_t = j, m_t = k | O, λ) = P(q_t = j | O, λ) P(m_t = k | q_t = j, O, λ)
             = γ_t(j) P(m_t = k | q_t = j, O, λ)
             = γ_t(j) P(m_t = k, O | q_t = j, λ) / P(O | q_t = j, λ)
             = γ_t(j) P(m_t = k, o_1, ..., o_{t−1}, o_t, o_{t+1}, ..., o_T | q_t = j, λ) / P(o_1, ..., o_{t−1}, o_t, o_{t+1}, ..., o_T | q_t = j, λ)
             = γ_t(j) P(m_t = k, o_t | q_t = j, λ) / P(o_t | q_t = j, λ)        (observation-independence assumption)
             = γ_t(j) P(m_t = k | q_t = j, λ) P(o_t | q_t = j, m_t = k, λ) / P(o_t | q_t = j, λ)
             = [ α_t(j) β_t(j) / Σ_{s=1}^{N} α_t(s) β_t(s) ] [ c_jk N(o_t; μ_jk, Σ_jk) / Σ_{m=1}^{M} c_jm N(o_t; μ_jm, Σ_jm) ]
42
Solution to Problem 3 –
The Baum-Welch Algorithm for CDHMM (cont’d)
Re-estimation formulae for c_jk, μ_jk, and Σ_jk (a code sketch follows below):
   ĉ_jk = (Expected number of times in state j and mixture k) / (Expected number of times in state j)
        = Σ_{t=1}^{T} γ_t(j, k) / Σ_{t=1}^{T} Σ_{m=1}^{M} γ_t(j, m) = Σ_{t=1}^{T} γ_t(j, k) / Σ_{t=1}^{T} γ_t(j)
   μ̂_jk = weighted average (mean) of the observations in state j and mixture k
        = Σ_{t=1}^{T} γ_t(j, k) o_t / Σ_{t=1}^{T} γ_t(j, k)
   Σ̂_jk = weighted covariance of the observations in state j and mixture k
        = Σ_{t=1}^{T} γ_t(j, k) (o_t − μ̂_jk)(o_t − μ̂_jk)^T / Σ_{t=1}^{T} γ_t(j, k)
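A minimal sketch of these re-estimation formulae for a single observation sequence, assuming the state posteriors γ_t(j) are already available (e.g., from the forward-backward sketches earlier); all shapes and names are illustrative:

import numpy as np
from scipy.stats import multivariate_normal

def reestimate_mixtures(obs, gamma, c, mu, sigma):
    """Update (c_jk, mu_jk, Sigma_jk) for every state j and mixture k.

    obs: (T, d) observation vectors; gamma: (T, N) state posteriors gamma_t(j);
    c: (N, M) mixture weights; mu: (N, M, d) means; sigma: (N, M, d, d) covariances.
    """
    T, d = obs.shape
    N, M = c.shape
    # gamma_t(j, k) = gamma_t(j) * c_jk N(o_t; mu_jk, Sigma_jk) / sum_m c_jm N(o_t; mu_jm, Sigma_jm)
    dens = np.zeros((T, N, M))
    for j in range(N):
        for k in range(M):
            dens[:, j, k] = c[j, k] * multivariate_normal.pdf(obs, mean=mu[j, k], cov=sigma[j, k])
    gamma_jk = gamma[:, :, None] * dens / dens.sum(axis=2, keepdims=True)
    occ = gamma_jk.sum(axis=0)                            # expected counts per (j, k)
    c_hat = occ / gamma.sum(axis=0)[:, None]              # new mixture weights
    mu_hat = np.einsum('tjk,td->jkd', gamma_jk, obs) / occ[:, :, None]
    sigma_hat = np.zeros((N, M, d, d))
    for j in range(N):
        for k in range(M):
            diff = obs - mu_hat[j, k]                     # (T, d) deviations from the new mean
            w = gamma_jk[:, j, k][:, None, None]          # weights gamma_t(j, k)
            sigma_hat[j, k] = (w * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / occ[j, k]
    return c_hat, mu_hat, sigma_hat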
43
A Simple Example
The Forward/Backward Procedure
[Figure: a trellis over states S1, S2 and times t = 1, 2, 3 with observations o_1, o_2, o_3, illustrating the products α_t(i) β_t(i) on the nodes and α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) on the arcs.]
   γ_t(i) = P(q_t = i, O | λ) / P(O | λ) = P(q_t = i, O | λ) / Σ_{j=1}^{N} P(q_t = j, O | λ) = α_t(i) β_t(i) / Σ_{j=1}^{N} α_t(j) β_t(j)
   ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ) / P(O | λ)
             = P(q_t = i, q_{t+1} = j, O | λ) / Σ_{i=1}^{N} Σ_{j=1}^{N} P(q_t = i, q_{t+1} = j, O | λ)
             = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)
44
A Simple Example (cont’d)
[Figure: a two-state model (states 1 and 2) with start probabilities π_1, π_2, transitions a_11, a_12, a_21, a_22, and observed symbols v_4, v_7, v_4 at t = 1, 2, 3.]
There are 8 paths in total; write p_i = P(O, q | λ) for the i-th path (enumerated in the sketch below):
   1. q: 1 1 1   p_1 = π_1 b_1(v_4) a_11 b_1(v_7) a_11 b_1(v_4),   log p_1 = log π_1 + log b_1(v_4) + log a_11 + log b_1(v_7) + log a_11 + log b_1(v_4)
   2. q: 1 1 2   p_2 = π_1 b_1(v_4) a_11 b_1(v_7) a_12 b_2(v_4),   log p_2 = log π_1 + log b_1(v_4) + log a_11 + log b_1(v_7) + log a_12 + log b_2(v_4)
   3. q: 1 2 1   p_3 = π_1 b_1(v_4) a_12 b_2(v_7) a_21 b_1(v_4),   log p_3 = log π_1 + log b_1(v_4) + log a_12 + log b_2(v_7) + log a_21 + log b_1(v_4)
   4. q: 1 2 2   p_4 = π_1 b_1(v_4) a_12 b_2(v_7) a_22 b_2(v_4),   log p_4 = log π_1 + log b_1(v_4) + log a_12 + log b_2(v_7) + log a_22 + log b_2(v_4)
   5. q: 2 1 1   p_5 = π_2 b_2(v_4) a_21 b_1(v_7) a_11 b_1(v_4),   log p_5 = log π_2 + log b_2(v_4) + log a_21 + log b_1(v_7) + log a_11 + log b_1(v_4)
   6. q: 2 1 2   p_6 = π_2 b_2(v_4) a_21 b_1(v_7) a_12 b_2(v_4),   log p_6 = log π_2 + log b_2(v_4) + log a_21 + log b_1(v_7) + log a_12 + log b_2(v_4)
   7. q: 2 2 1   p_7 = π_2 b_2(v_4) a_22 b_2(v_7) a_21 b_1(v_4),   log p_7 = log π_2 + log b_2(v_4) + log a_22 + log b_2(v_7) + log a_21 + log b_1(v_4)
   8. q: 2 2 2   p_8 = π_2 b_2(v_4) a_22 b_2(v_7) a_22 b_2(v_4),   log p_8 = log π_2 + log b_2(v_4) + log a_22 + log b_2(v_7) + log a_22 + log b_2(v_4)
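A brute-force sketch of this enumeration with made-up numbers (b[:, 0] and b[:, 1] stand for b_j(v_4) and b_j(v_7)); summing P(O, q | λ) over all 8 paths gives P(O | λ):

import itertools
import numpy as np

pi = np.array([0.7, 0.3])
A = np.array([[0.6, 0.4], [0.5, 0.5]])
b = np.array([[0.2, 0.5], [0.4, 0.1]])
obs = [0, 1, 0]                                   # the symbols v4, v7, v4

total = 0.0
for q in itertools.product(range(2), repeat=len(obs)):   # all 2^3 = 8 state paths
    p = pi[q[0]] * b[q[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[q[t - 1], q[t]] * b[q[t], obs[t]]
    print(q, p)                                   # P(O, q | lambda) for this path
    total += p
print("P(O | lambda) =", total)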
45
A Simple Example (cont’d)
   p_all = p_1 + p_2 + p_3 + p_4 + p_5 + p_6 + p_7 + p_8        (sum over all paths, = P(O | λ))
   Q(λ, λ̄) = Σ_{all Q} [ P(O, Q | λ) / P(O | λ) ] [ log π̄_{q_1} + Σ_{t=1}^{T−1} log ā_{q_t q_{t+1}} + Σ_{t=1}^{T} log b̄_{q_t}(o_t) ]
            = Q_π(π̄, λ) + Q_a(ā, λ) + Q_b(b̄, λ)
   Initial probabilities:  P(O, q_1 = i | λ) / P(O | λ) = γ_1(i),  so
      Q_π(π̄, λ) = [ (p_1 + p_2 + p_3 + p_4) / p_all ] log π̄_1 + [ (p_5 + p_6 + p_7 + p_8) / p_all ] log π̄_2
                = γ_1(1) log π̄_1 + γ_1(2) log π̄_2
   Transitions:  e.g., P(O, q_1 = 1, q_2 = 1 | λ) / P(O | λ) + P(O, q_2 = 1, q_3 = 1 | λ) / P(O | λ) = ξ_1(1, 1) + ξ_2(1, 1),  so
      Q_a(ā, λ) = [ (p_1 + p_2) / p_all + (p_1 + p_5) / p_all ] log ā_11 + [ (p_3 + p_4) / p_all + (p_2 + p_6) / p_all ] log ā_12
                + [ (p_5 + p_6) / p_all + (p_3 + p_7) / p_all ] log ā_21 + [ (p_7 + p_8) / p_all + (p_4 + p_8) / p_all ] log ā_22
                = Σ_t ξ_t(1, 1) log ā_11 + ... = Σ_i Σ_j Σ_t ξ_t(i, j) log ā_ij
   (and similarly for Q_b)
46