Hidden Markov Models (HMM)
Rabiner’s Paper
Markoviana Reading Group
Computer Eng. & Science Dept.
Arizona State University
Stationary and Non-stationary
Stationary process: its statistical properties do not vary with time.
Non-stationary process: the signal properties vary over time.
HMM Example - Casino Coin
[Figure: a two-state HMM for the casino coin.
States: Fair, Unfair.
State transition probabilities: Fair -> Fair 0.9, Fair -> Unfair 0.1, Unfair -> Fair 0.2, Unfair -> Unfair 0.8.
Symbol emission probabilities (one table per state): Fair: P(H) = 0.5, P(T) = 0.5; Unfair: P(H) = 0.7, P(T) = 0.3.
Observation symbols: H, T.]

Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence:       FFFFFFUUUFFFFFFUUUUUUUFFFFFF
Motivation: Given a sequence of H & Ts, can you tell at what times
the casino cheated?
Properties of an HMM
- First-order Markov process
  - q_t depends only on q_{t-1}
- Time is discrete
Elements of an HMM
- N, the number of states
- M, the number of symbols
- States S_1, S_2, ..., S_N
- Observation symbols O_1, O_2, ..., O_M
- λ, the probability distributions: λ = (A, B, π)

[Figure: the three probability tables of λ.
A = {a_ij}: N x N state transition probabilities, rows and columns indexed by S_1 ... S_N.
B = {b_j(k)}: N x M symbol emission probabilities, rows indexed by S_1 ... S_N, columns by O_1 ... O_M.
π = {π_i}: initial state distribution over S_1 ... S_N.]
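As a concrete companion to these definitions, here is a minimal sketch in Python/NumPy of the tuple λ = (A, B, π), filled in with the casino-coin numbers from the earlier slide. The class and its field names are illustrative, not from the slides, and the uniform initial distribution π = (0.5, 0.5) is an assumption (the slide does not give one).

```python
import numpy as np

class HMM:
    """Minimal container for lambda = (A, B, pi)."""
    def __init__(self, A, B, pi):
        self.A = np.asarray(A, dtype=float)    # N x N transition probabilities a_ij
        self.B = np.asarray(B, dtype=float)    # N x M emission probabilities b_j(k)
        self.pi = np.asarray(pi, dtype=float)  # length-N initial state distribution

# Casino-coin model: states 0 = Fair, 1 = Unfair; symbols 0 = H, 1 = T.
casino = HMM(A=[[0.9, 0.1],   # Fair   -> Fair / Unfair
                [0.2, 0.8]],  # Unfair -> Fair / Unfair
             B=[[0.5, 0.5],   # Fair emits H / T
                [0.7, 0.3]],  # Unfair emits H / T
             pi=[0.5, 0.5])   # assumed uniform; not given on the slide
```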
HMM Basic Problems
1. Given an observation sequence O = O_1 O_2 ... O_T and λ, find P(O|λ)
   → Forward Algorithm / Backward Algorithm
2. Given O = O_1 O_2 ... O_T and λ, find the most likely state sequence Q = q_1 q_2 ... q_T
   → Viterbi Algorithm
3. Given O = O_1 O_2 ... O_T and λ, re-estimate λ so that P(O|λ) is higher than it is now
   → Baum-Welch Re-estimation
Forward Algorithm Illustration
α_t(i) is the probability of observing the partial sequence O_1 O_2 ... O_t and being in state S_i at time t.
Forward Algorithm Illustration (cont’d)
α_t(i) is the probability of observing the partial sequence O_1 O_2 ... O_t and being in state S_i at time t.

[Figure: the forward trellis. Rows are states S_1 ... S_N; columns are observations O_1 ... O_T.
Column O_1 holds the initialization α_1(j) = π_j b_j(O_1); column O_2 holds the induction
α_2(j) = (Σ_i α_1(i) a_ij) b_j(O_2), and so on across the columns.
The total of the last column gives the solution P(O|λ).]
Forward Algorithm
α_t(i) is the probability of observing the partial sequence O_1 O_2 ... O_t and being in state S_i at time t.

Definition:
α_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | λ)

Initialization:
α_1(i) = π_i b_i(O_1),   1 ≤ i ≤ N

Induction:
α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(O_{t+1}),   1 ≤ t ≤ T−1,  1 ≤ j ≤ N

Problem 1 Answer:
P(O|λ) = Σ_{i=1}^{N} α_T(i)

Complexity: O(N²T)
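A direct transcription of these three steps into Python/NumPy, assuming the illustrative HMM container defined earlier (a sketch of the textbook recursion, not production code):

```python
import numpy as np

def forward(hmm, O):
    """Unscaled forward pass. O is a sequence of symbol indices.
    Returns alpha (T x N) and P(O|lambda)."""
    T, N = len(O), len(hmm.pi)
    alpha = np.zeros((T, N))
    alpha[0] = hmm.pi * hmm.B[:, O[0]]                    # initialization
    for t in range(T - 1):                                # induction
        alpha[t + 1] = (alpha[t] @ hmm.A) * hmm.B[:, O[t + 1]]
    return alpha, alpha[-1].sum()                         # P(O|lambda) = sum_i alpha_T(i)
```

For example, forward(casino, [0, 1, 0]) evaluates P(HTH | λ) for the casino model. The double loop over states in the induction is folded into the vector-matrix product, which is where the O(N²T) cost lives.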
Backward Algorithm Illustration
β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} ... O_T given that the state at time t is S_i.
Backward Algorithm
β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} ... O_T given that the state at time t is S_i.

Definition:
β_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ)

Initialization:
β_T(i) = 1,   1 ≤ i ≤ N

Induction:
β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j),   t = T−1, ..., 1,  1 ≤ i ≤ N
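The mirror-image sketch for the backward pass, under the same assumptions as the forward code above:

```python
def backward(hmm, O):
    """Unscaled backward pass. Returns beta (T x N)."""
    T, N = len(O), len(hmm.pi)
    beta = np.ones((T, N))                                # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                        # induction, t = T-1 down to 1
        beta[t] = hmm.A @ (hmm.B[:, O[t + 1]] * beta[t + 1])
    return beta
```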
Q2: Optimality Criterion 1
* Maximize the expected number of correct individual states.

γ_t(i) is the probability of being in state S_i at time t given the observation sequence O and the model λ.

Definition:
γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ)

Problem 2 Answer:
q_t* = argmax_{1 ≤ i ≤ N} γ_t(i),   1 ≤ t ≤ T

Problem: if some a_ij = 0, the optimal state sequence may not even be a valid state sequence.
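Criterion 1 amounts to posterior decoding; a sketch combining the forward and backward passes above (illustrative, and subject to the caveat the slide notes about possibly invalid paths):

```python
def posterior_decode(hmm, O):
    """Pick q_t* = argmax_i gamma_t(i) independently for each t."""
    alpha, prob = forward(hmm, O)
    beta = backward(hmm, O)
    gamma = alpha * beta / prob       # gamma_t(i) = alpha_t(i) beta_t(i) / P(O|lambda)
    return gamma.argmax(axis=1)       # may be an invalid path if some a_ij = 0
```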
Q2: Optimality Criterion 2
* Find the single best state sequence (path), i.e., maximize P(Q|O,λ).

δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 ... O_t ending in state S_i.

Definition:
δ_t(i) = max_{q_1, ..., q_{t−1}} P(q_1 q_2 ... q_{t−1}, q_t = S_i, O_1 O_2 ... O_t | λ)
Viterbi Algorithm
The major difference from the forward algorithm: maximization instead of sum.

Initialization:
δ_1(i) = π_i b_i(O_1),   ψ_1(i) = 0

Recursion:
δ_t(j) = max_{1 ≤ i ≤ N} [ δ_{t−1}(i) a_ij ] b_j(O_t),   ψ_t(j) = argmax_{1 ≤ i ≤ N} [ δ_{t−1}(i) a_ij ]

Termination and traceback:
P* = max_i δ_T(i),   q_T* = argmax_i δ_T(i),   q_t* = ψ_{t+1}(q_{t+1}*)
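A sketch of the full recursion with traceback, again assuming the illustrative container above:

```python
def viterbi(hmm, O):
    """Single best state path; maximization replaces the forward sum."""
    T, N = len(O), len(hmm.pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)                 # traceback pointers
    delta[0] = hmm.pi * hmm.B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * hmm.A        # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * hmm.B[:, O[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                        # traceback start
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1][q[t + 1]]
    return q, delta[-1].max()                         # path and P*
```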
Viterbi Algorithm Illustration
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 ... O_t ending in state S_i.

[Figure: the Viterbi trellis. Rows are states S_1 ... S_N; columns are observations O_1 ... O_T.
Column O_1 holds the initialization δ_1(j) = π_j b_j(O_1); column O_2 holds the recursion
δ_2(j) = max_i [ δ_1(i) a_ij ] b_j(O_2), and so on across the columns.
The maximum of the last column indicates the traceback start.]
Relations with DBN
[Figure: each recursion drawn as a fragment of a dynamic Bayesian network.]

Forward function:
α_{t+1}(j) = [ Σ_i α_t(i) a_ij ] b_j(O_{t+1})

Backward function:
β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j),   β_T(i) = 1

Viterbi algorithm:
δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] b_j(O_{t+1})
Some more definitions
γ_t(i) is the probability of being in state S_i at time t:
γ_t(i) = α_t(i) β_t(i) / P(O|λ)

ξ_t(i,j) is the probability of being in state S_i at time t and in state S_j at time t+1:
ξ_t(i,j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)

Note that γ_t(i) = Σ_{j=1}^{N} ξ_t(i,j).
Baum-Welch Re-estimation
Expectation-Maximization (EM) algorithm.

Expectation: compute the expected counts γ_t(i) and ξ_t(i,j) under the current model λ, using the forward and backward variables as defined above.
Baum-Welch Re-estimation (cont’d)
Maximization: re-estimate the parameters from the expected counts:

π̄_i = γ_1(i)   (expected frequency of being in S_i at t = 1)
ā_ij = Σ_{t=1}^{T−1} ξ_t(i,j) / Σ_{t=1}^{T−1} γ_t(i)   (expected transitions S_i → S_j over expected transitions out of S_i)
b̄_j(k) = Σ_{t: O_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)   (expected times in S_j observing v_k over expected times in S_j)
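One full EM iteration as a sketch, combining the forward/backward passes above with these update formulas (unscaled, single sequence; the helper name is mine):

```python
def baum_welch_step(hmm, O):
    """One unscaled Baum-Welch iteration on a single sequence."""
    O = np.asarray(O)
    T, N = len(O), len(hmm.pi)
    alpha, prob = forward(hmm, O)
    beta = backward(hmm, O)
    gamma = alpha * beta / prob                       # E-step: gamma_t(i)
    # E-step: xi_t(i,j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * hmm.A[None, :, :] *
          (hmm.B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
    # M-step: re-estimate pi, A, B from the expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(hmm.B)
    for k in range(hmm.B.shape[1]):                   # sum gamma over t where O_t = v_k
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return HMM(A_new, B_new, pi_new)
```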
Notes on the Re-estimation
- If the model does not change, it has reached a local maximum.
- Depending on the model, many local maxima can exist.
- The re-estimated probabilities will sum to 1.
Implementation issues
- Scaling
- Multiple observation sequences
- Initial parameter estimation
- Missing data
- Choice of model size and type
Scaling
α̂_t(i) calculation: scale each α_t(i) by a coefficient c_t that depends only on t, so that Σ_i α̂_t(i) = 1:
c_t = 1 / Σ_{i=1}^{N} α_t(i)

Recursion to calculate α̂_t(i):
α̂_t(j) = c_t [ Σ_i α̂_{t−1}(i) a_ij ] b_j(O_t)
Scaling (cont’d)
β̂_t(i) calculation: scale the β's with the same coefficients as the α's:
β̂_t(i) = c_t β_t(i)

Desired condition: the same scale factors are used for β as for α, so that they cancel out in the re-estimation formulas.

* Note that Σ_i β̂_t(i) = 1 is not true!
Scaling (cont’d)
From Σ_i α̂_T(i) = ( Π_{t=1}^{T} c_t ) Σ_i α_T(i) = 1:

P(O|λ) = 1 / Π_{t=1}^{T} c_t
log P(O|λ) = − Σ_{t=1}^{T} log c_t
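A sketch of the scaled forward pass that accumulates the log-likelihood this way (same assumed container; compare with the unscaled forward above):

```python
def forward_scaled(hmm, O):
    """Scaled forward pass: each row of alpha_hat sums to 1.
    Returns alpha_hat (T x N) and log P(O|lambda) = -sum_t log c_t."""
    T, N = len(O), len(hmm.pi)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)                                   # scaling coefficients c_t
    a = hmm.pi * hmm.B[:, O[0]]
    c[0] = 1.0 / a.sum()
    alpha_hat[0] = c[0] * a
    for t in range(1, T):
        a = (alpha_hat[t - 1] @ hmm.A) * hmm.B[:, O[t]]
        c[t] = 1.0 / a.sum()
        alpha_hat[t] = c[t] * a
    return alpha_hat, -np.log(c).sum()
```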
Maximum log-likelihood
Viterbi in the log domain avoids underflow (using φ_t(i) = log δ_t(i)):

Initialization:
φ_1(i) = log π_i + log b_i(O_1)

Recursion:
φ_t(j) = max_{1 ≤ i ≤ N} [ φ_{t−1}(i) + log a_ij ] + log b_j(O_t)

Termination:
log P* = max_{1 ≤ i ≤ N} φ_T(i)
Multiple observation sequences
Problem with re-estimation: the formulas above assume a single observation sequence O. With K sequences O^(1), ..., O^(K), the expected counts in the numerators and denominators of the re-estimation formulas are summed over all K sequences.
Initial estimates of parameters
- For π and A, random or uniform initialization is sufficient.
- For B (discrete symbol probabilities), a good initial estimate is needed.
Insufficient training data
Solutions:
- Increase the size of the training data
- Reduce the size of the model
- Interpolate the parameters with those of another model
References
- L. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition." Proceedings of the IEEE, 1989.
- S. Russell, P. Norvig. "Probabilistic Reasoning over Time." Artificial Intelligence: A Modern Approach, Ch. 15, 2002 (draft).
- V. Borkar, K. Deshmukh, S. Sarawagi. "Automatic Segmentation of Text into Structured Records." ACM SIGMOD, 2001.
- T. Scheffer, C. Decomain, S. Wrobel. "Active Hidden Markov Models for Information Extraction." Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
- S. Ray, M. Craven. "Representing Sentence Structure in Hidden Markov Models for Information Extraction." Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.