Hidden Markov Model
Xiaole Shirley Liu
STAT115, STAT215
Outline
• Markov Chain
• Hidden Markov Model
– Observations, hidden states, initial, transition
and emission probabilities
• Three problems
– Pb(observations): forward, backward procedure
– Infer hidden states: forward-backward, Viterbi
– Estimate parameters: Baum-Welch
iid process
• iid: independently and identically distributed
– Events are not correlated to each other
– The current event has no predictive power for future events
– E.g. Pb(girl | boy) = Pb(girl), Pb(coin H | H) = Pb(H), Pb(die 1 | 5) = Pb(1)
Discrete Markov Chain
• Discrete Markov process
– Distinct states: S1, S2, …Sn
– Regularly spaced discrete times: t = 1, 2,…
– Markov chain: future state only depends on
present state, but not the path to get here
$P[q_t = S_j \mid q_{t-1} = S_i, q_{t-2} = S_k, \ldots] = P[q_t = S_j \mid q_{t-1} = S_i]$
– Transition probability $a_{ij} = P[q_t = S_j \mid q_{t-1} = S_i]$, with $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$
Markov Chain Example 1
• States: exam grade 1 – Pass, 2 – Fail
• Discrete times: exam #1, #2, # 3, …
• State transition probability
$A = \{a_{ij}\} = \begin{pmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{pmatrix}$
• Given PPPFFF, pb of passing the next exam:
$P(q_7 = S_1 \mid A, O = \{S_1, S_1, S_1, S_2, S_2, S_2\}) = P(q_7 = S_1 \mid A, q_6 = S_2) = 0.5$
Markov Chain Example 2
• States: 1 – rain; 2 – cloudy; 3 – sunny
• Discrete times: day 1, 2, 3, …
• State transition probability
$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$
• Given state 3 (sunny) at t = 1:
$P(O = \{S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3\} \mid A, q_1 = S_3)$
$= a_{33} \, a_{33} \, a_{31} \, a_{11} \, a_{13} \, a_{32} \, a_{23}$
$= 0.8 \times 0.8 \times 0.1 \times 0.4 \times 0.3 \times 0.1 \times 0.2 = 1.536 \times 10^{-4}$
Markov Chain Example 3
• States: fair coin F, unfair (biased) coin B
• Discrete times: flip 1, 2, 3, …
• Initial probability: F = 0.6, B = 0.4
• Transition probability (state diagram): $a_{FF} = 0.9$, $a_{FB} = 0.1$, $a_{BF} = 0.3$, $a_{BB} = 0.7$
• Prob(FFBBFFFB)
$P(O = \{FFBBFFFB\} \mid A, \pi) = \pi_F \, a_{FF} \, a_{FB} \, a_{BB} \, a_{BF} \, a_{FF} \, a_{FF} \, a_{FB}$
$= 0.6 \times 0.9 \times 0.1 \times 0.7 \times 0.3 \times 0.9 \times 0.9 \times 0.1 = 9.19 \times 10^{-4}$
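The same arithmetic in code: a minimal Python sketch (the array encoding F = 0, B = 1 is ours, not from the slides) that scores a state path under this chain:

```python
import numpy as np

# Two-state chain from this example; our own encoding: F = 0, B = 1
A  = np.array([[0.9, 0.1],    # a_FF, a_FB
               [0.3, 0.7]])   # a_BF, a_BB
pi = np.array([0.6, 0.4])     # initial probability: F = 0.6, B = 0.4

path = [0, 0, 1, 1, 0, 0, 0, 1]   # F F B B F F F B

p = pi[path[0]]                   # pb of the starting state
for prev, cur in zip(path, path[1:]):
    p *= A[prev, cur]             # multiply in each transition
print(p)                          # ~9.19e-04, matching the slide
```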
Hidden Markov Model
• Coin toss example
• Coin transition is a Markov chain
• Probability of H/T depends on the coin used
• The observed H/T sequence is governed by a hidden Markov model (the coin state is hidden)
Hidden Markov Model
• Elements of an HMM (coin toss)
– N, the number of states (F / B)
– M, the number of distinct observations (H / T)
– A = {a_ij}, state transition probability
$A = \begin{pmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{pmatrix}$
– B = {b_j(k)}, emission probability
$b_F(H) = 0.5, \quad b_F(T) = 0.5$
$b_B(H) = 0.8, \quad b_B(T) = 0.2$
– π = {π_i}, initial state distribution
• $\pi_F = 0.4, \quad \pi_B = 0.6$
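Written out as arrays, a minimal sketch of these elements (our own encoding: states F = 0, B = 1; observations H = 0, T = 1); the sketches on later slides reuse these names:

```python
import numpy as np

# Coin-toss HMM elements; encoding: states F = 0, B = 1; obs H = 0, T = 1
A  = np.array([[0.9, 0.1],    # state transition probabilities a_ij
               [0.3, 0.7]])
B  = np.array([[0.5, 0.5],    # emission probabilities b_j(k): rows F/B, cols H/T
               [0.8, 0.2]])
pi = np.array([0.4, 0.6])     # initial state distribution (F, B)

O = [0, 1, 1, 0, 0, 0, 1]     # the running observation example: H T T H H H T
```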
HMM Applications
• Stock market: bull/bear market is a hidden Markov chain; daily stock up/down is observed and depends on the big market trend
• Speech recognition: sentences & words are a hidden Markov chain; the spoken sound is observed (heard) and depends on the words
• Digital signal processing: the source signal (0/1) is a hidden Markov chain; the arriving signal's fluctuation is observed and depends on the source
• Bioinformatics!!
Basic Problems for HMM
1. Given λ, how to compute P(O|λ) for an observation sequence O = O1O2…OT
• Probability of observing HTTHHHT …
• Forward procedure, backward procedure
2. Given an observation sequence O = O1O2…OT and λ, how to choose a state sequence Q = q1q2…qT
• What is the hidden coin behind each flip
• Forward-backward, Viterbi
3. How to estimate λ = (A, B, π) so as to maximize P(O|λ)
• How to estimate the coin parameters λ
• Baum-Welch (expectation maximization)
Problem 1: P(O|λ)
• Suppose we know the state sequence Q
$P(O \mid Q, \lambda) = b_{q_1}(O_1) \, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$
– O = H T T H H H T
– Q = F F B F F B B
$P(O \mid Q, \lambda) = b_F(H) \, b_F(T) \, b_B(T) \, b_F(H) \, b_F(H) \, b_B(H) \, b_B(T)$
$= 0.5 \times 0.5 \times 0.2 \times 0.5 \times 0.5 \times 0.8 \times 0.2$
– Q = B F B F B B B
$P(O \mid Q, \lambda) = b_B(H) \, b_F(T) \, b_B(T) \, b_F(H) \, b_B(H) \, b_B(H) \, b_B(T)$
$= 0.8 \times 0.5 \times 0.2 \times 0.5 \times 0.8 \times 0.8 \times 0.2$
• Each given path Q has a probability for O
Problem 1: P(O|λ)
• What is the pb of a given path Q?
– Q = F F B F F B B
$P(Q \mid \lambda) = \pi_F \, a_{FF} \, a_{FB} \, a_{BF} \, a_{FF} \, a_{FB} \, a_{BB}$
$= 0.4 \times 0.9 \times 0.1 \times 0.3 \times 0.9 \times 0.1 \times 0.7$
– Q = B F B F B B B
$P(Q \mid \lambda) = \pi_B \, a_{BF} \, a_{FB} \, a_{BF} \, a_{FB} \, a_{BB} \, a_{BB}$
$= 0.6 \times 0.3 \times 0.1 \times 0.3 \times 0.1 \times 0.7 \times 0.7$
• Each given path Q has its own probability
Problem 1: P(O|λ)
• Therefore, the total pb of O = HTTHHHT
• Sum over all possible paths Q: each path's pb multiplied by the pb of O given that path
$P(O \mid \lambda) = \sum_{\text{all } Q} P(O, Q \mid \lambda) = \sum_{\text{all } Q} P(Q \mid \lambda) \, P(O \mid Q, \lambda)$
• For a sequence of length T with N hidden states there are $N^T$ paths, an infeasible calculation
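To make the sum concrete, a brute-force sketch that literally enumerates every state path (reusing A, B, pi, O from the earlier array sketch); fine at T = 7, hopeless for realistic lengths:

```python
from itertools import product

def total_prob(O, A, B, pi):
    """P(O | lambda) by summing P(Q | lambda) * P(O | Q, lambda) over all paths."""
    T, N = len(O), len(pi)
    total = 0.0
    for Q in product(range(N), repeat=T):        # all N**T state paths
        p = pi[Q[0]] * B[Q[0], O[0]]             # initial state and first emission
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]] # transition, then emission
        total += p
    return total

print(total_prob(O, A, B, pi))   # same value the forward procedure gives
```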
Solution to Prob1:
Forward Procedure
• Use dynamic programming
• Sum at every time point
• Keep previous subproblem solutions to speed up the current calculation
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\alpha_1(i) = \pi_i \, b_i(O_1)$
$\alpha_1(F) = \pi_F \, b_F(H) = 0.4 \times 0.5 = 0.2$
$\alpha_1(B) = \pi_B \, b_B(H) = 0.6 \times 0.8 = 0.48$
– Pb of seeing H1 from F1 or B1
[Figure: trellis of F/B nodes over flips H T T H …]
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\alpha_1(i) = \pi_i \, b_i(O_1)$
• Induction $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$
– Pb of seeing T2 from F2 or B2
– F2 could come from F1 or B1; each has its pb, add them up
[Figure: trellis with arrows from F1 and B1 summed into F2 and into B2]
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\alpha_1(i) = \pi_i \, b_i(O_1)$
• Induction $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$
$\alpha_2(F) = (\alpha_1(F) \, a_{FF} + \alpha_1(B) \, a_{BF}) \, b_F(T) = (0.2 \times 0.9 + 0.48 \times 0.3) \times 0.5 = 0.162$
$\alpha_2(B) = (\alpha_1(F) \, a_{FB} + \alpha_1(B) \, a_{BB}) \, b_B(T) = (0.2 \times 0.1 + 0.48 \times 0.7) \times 0.2 = 0.0712$
[Figure: trellis with summed arrows into F2 and B2]
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\alpha_1(i) = \pi_i \, b_i(O_1)$
• Induction $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$
$\alpha_3(F) = (0.162 \times 0.9 + 0.0712 \times 0.3) \times 0.5 = 0.08358$
$\alpha_3(B) = (0.162 \times 0.1 + 0.0712 \times 0.7) \times 0.2 = 0.013208$
[Figure: trellis with summed arrows through F and B at each flip]
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\alpha_1(i) = \pi_i \, b_i(O_1)$
• Induction $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1})$
• Termination $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$, e.g. $= \alpha_4(F) + \alpha_4(B)$ for the four flips HTTH shown in the figure
[Figure: trellis with summed arrows through F and B at each flip]
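A sketch of the full forward procedure (same arrays and encoding as before); the printed rows reproduce the α values worked out on these slides:

```python
import numpy as np

def forward(O, A, B, pi):
    """alpha[t, i] = P(O_1 .. O_t, q_t = S_i | lambda), filled by induction."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # induction: sum over sources
    return alpha

alpha = forward(O, A, B, pi)
print(alpha[0])          # [0.2      0.48    ]
print(alpha[1])          # [0.162    0.0712  ]
print(alpha[2])          # [0.08358  0.013208]
print(alpha[-1].sum())   # termination: P(O | lambda)
```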
Solution to Prob1:
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\beta_T(i) = 1$
$\beta_T(F) = \beta_T(B) = 1$
• Pb of the flips after a point, given the coin at that point
[Figure: trellis over flips …H H H T, starting from the last flip]
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\beta_T(i) = 1$
• Induction $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
• Pb of the flips after a point, given the coin at that point
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\beta_T(i) = 1$
• Induction $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
$\beta_{T-1}(F) = a_{FF} \, b_F(T) \times 1 + a_{FB} \, b_B(T) \times 1 = 0.9 \times 0.5 + 0.1 \times 0.2 = 0.47$
$\beta_{T-1}(B) = a_{BF} \, b_F(T) \times 1 + a_{BB} \, b_B(T) \times 1 = 0.3 \times 0.5 + 0.7 \times 0.2 = 0.29$
[Figure: trellis over …H H H T with sums flowing backward from the last flip]
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\beta_T(i) = 1$
• Induction $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
$\beta_{T-2}(F) = a_{FF} \, b_F(H) \, \beta_{T-1}(F) + a_{FB} \, b_B(H) \, \beta_{T-1}(B) = 0.9 \times 0.5 \times 0.47 + 0.1 \times 0.8 \times 0.29 = 0.2347$
$\beta_{T-2}(B) = a_{BF} \, b_F(H) \, \beta_{T-1}(F) + a_{BB} \, b_B(H) \, \beta_{T-1}(B) = 0.3 \times 0.5 \times 0.47 + 0.7 \times 0.8 \times 0.29 = 0.2329$
[Figure: trellis over …H H H T with backward sums into F and B]
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization $\beta_T(i) = 1$
• Induction $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
• Termination $\beta_0(*) = \pi_F \, b_F(H) \, \beta_1(F) + \pi_B \, b_B(H) \, \beta_1(B)$
• Both forward and backward can be used to solve problem 1, and they give identical results
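A matching sketch of the backward procedure; β at T−1 reproduces the 0.47 / 0.29 above, and the termination line gives the same P(O|λ) as the forward procedure:

```python
import numpy as np

def backward(O, A, B):
    """beta[t, i] = P(O_{t+1} .. O_T | q_t = S_i, lambda)."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                           # initialization: beta_T = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1]) # induction
    return beta

beta = backward(O, A, B)
print(beta[-2])                            # [0.47  0.29]
print((pi * B[:, O[0]] * beta[0]).sum())   # termination: same P(O | lambda)
```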
Solution to Problem 2
Forward-Backward Procedure
• First run forward and backward separately
• Keep track of the scores at every point
• Coin toss
– α: joint pb of this coin and all the flips up to now
– β: pb of all the later flips, given this coin
H
T
T
H
H
H
T
α1(F) α2(F) α3(F) α4(F) α5(F) α6(F) α7(F)
α1(B) α2(B) α3(B) α4(B) α5(B) α6(B) α7(B)
β1(F) β2(F) β3(F) β4(F) β5(F) β6(F) β7(F)
β1(B) β2(B) β3(B) β4(B) β5(B) β6(B) β7(B)
Solution to Problem 2
Forward-Backward Procedure
$\gamma_t(i) = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)}$
• Coin toss, e.g.
$\gamma_3(F) = \frac{\alpha_3(F) \, \beta_3(F)}{\alpha_3(F) \, \beta_3(F) + \alpha_3(B) \, \beta_3(B)} = \frac{0.0025 \times 0.0047}{0.0025 \times 0.0047 + 0.0016 \times 0.0038} = 0.659$
• Gives a probabilistic prediction at every time point
• Forward-backward maximizes the expected number of correctly predicted states (coins)
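In code, the posterior is just the elementwise product of the two tables, renormalized per flip; a sketch built on the forward and backward functions above:

```python
def posterior(alpha, beta):
    """gamma[t, i] = P(q_t = S_i | O, lambda): normalize alpha * beta per flip."""
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)

gamma = posterior(forward(O, A, B, pi), backward(O, A, B))
print(gamma.argmax(axis=1))   # 0 = F, 1 = B: most probable coin at each flip
```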
Solution to Problem 2
Viterbi Algorithm
• Report the path that is most likely to give the observations
• Initiation
$\delta_1(i) = \pi_i \, b_i(O_1)$
$\psi_1(i) = 0$
• Recursion
$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t)$
$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}]$
• Termination
$P^* = \max_{1 \le i \le N} [\delta_T(i)]$
$q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]$
• Path (state sequence) backtracking
$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1$
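A sketch of the whole algorithm in the same encoding; note the max/argmax where the forward procedure summed:

```python
import numpy as np

def viterbi(O, A, B, pi):
    """Most probable state path: dynamic programming with max instead of sum."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)          # psi[t, j]: best predecessor of j
    delta[0] = pi * B[:, O[0]]                 # initiation
    for t in range(1, T):
        cand = delta[t - 1][:, None] * A       # cand[i, j] = delta_{t-1}(i) a_ij
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]           # termination: q*_T
    for t in range(T - 1, 0, -1):              # backtracking
        path.append(int(psi[t, path[-1]]))
    return delta[-1].max(), path[::-1]

p_star, path = viterbi(O, A, B, pi)
print("".join("FB"[i] for i in path))          # best coin sequence for HTTHHHT
```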
Viterbi Algorithm
• Observe: HTTHHHT
• Initiation
$\delta_1(i) = \pi_i \, b_i(O_1), \quad \psi_1(i) = 0$
$\delta_1(F) = 0.4 \times 0.5 = 0.2$
$\delta_1(B) = 0.6 \times 0.8 = 0.48$
Viterbi Algorithm
[Figure: trellis of F and B nodes over flips H T T H]
Viterbi Algorithm
• Observe: HTTHHHT
• Initiation $\delta_1(i) = \pi_i \, b_i(O_1), \quad \psi_1(i) = 0$
• Recursion (max instead of +, keep track of the path)
$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t)$
$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}]$
$\delta_2(F) = [\max(\delta_1(F) \, a_{FF}, \delta_1(B) \, a_{BF})] \, b_F(T) = \max(0.18, 0.144) \times 0.5 = 0.09$
$\delta_2(B) = [\max(\delta_1(F) \, a_{FB}, \delta_1(B) \, a_{BB})] \, b_B(T) = \max(0.02, 0.336) \times 0.2 = 0.0672$
$\psi_2(F) = F, \quad \psi_2(B) = B$
Viterbi Algorithm
• Max instead of +, keep track of the path
• Best path (instead of all paths) up to here
[Figure: trellis over H T with the best incoming arrow kept at each node]
Viterbi Algorithm
• Observe: HTTHHHT
• Initiation $\delta_1(i) = \pi_i \, b_i(O_1), \quad \psi_1(i) = 0$
• Recursion (max instead of +, keep track of the path)
$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t)$
$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}]$
$\delta_3(F) = [\max(\delta_2(F) \, a_{FF}, \delta_2(B) \, a_{BF})] \, b_F(T) = \max(0.09 \times 0.9, 0.0672 \times 0.3) \times 0.5 = 0.0405$
$\delta_3(B) = [\max(\delta_2(F) \, a_{FB}, \delta_2(B) \, a_{BB})] \, b_B(T) = \max(0.09 \times 0.1, 0.0672 \times 0.7) \times 0.2 = 0.0094$
$\psi_3(F) = F, \quad \psi_3(B) = B$
Viterbi Algorithm
• Max instead of +, keep track of the path
• Best path (instead of all paths) up to here
[Figure: trellis over H T T H with the best incoming arrow kept at each node]
Viterbi Algorithm
• Terminate: pick the state with the best final δ score, then backtrack to get the path
[Figure: trellis over H T T H with the backtracked path highlighted]
• Here FFFF is the path most likely to give HTTH: $\delta_4(F) = 0.0182 > \delta_4(B) = 0.0053$, and every backpointer on that path is F
Solution to Problem 3
• No way to solve for a global optimum, so find a local maximum
• Baum-Welch algorithm (an expectation-maximization approach)
– Randomly initialize λ = (A, B, π)
– Run Viterbi based on λ and O
– Update λ = (A, B, π) and iterate (see the sketch below):
• π: % of F vs B on the Viterbi path
• A: frequency of F/B transitions on the Viterbi path
• B: frequency of H/T emitted by F/B
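A sketch of this loop, reusing the viterbi function above. This hard-assignment recipe is usually called Viterbi training; full Baum-Welch would replace the path counts with forward-backward expected counts. The pseudocount guard is our own addition to keep probabilities away from exact zero:

```python
import numpy as np

def viterbi_training(O, A, B, pi, n_iter=10, pseudo=1e-3):
    """Iteratively decode the best path, then re-estimate (A, B, pi) by counting."""
    A, B, pi = A.copy(), B.copy(), pi.copy()
    N, M = B.shape
    for _ in range(n_iter):
        _, q = viterbi(O, A, B, pi)           # E-step: best path under current lambda
        trans = np.full((N, N), pseudo)       # pseudocounts avoid zeros
        emit = np.full((N, M), pseudo)
        for t in range(len(O)):
            emit[q[t], O[t]] += 1             # B: H/T counts emitted by each coin
            if t > 0:
                trans[q[t - 1], q[t]] += 1    # A: coin-to-coin transition counts
        A = trans / trans.sum(axis=1, keepdims=True)
        B = emit / emit.sum(axis=1, keepdims=True)
        pi = np.bincount(q, minlength=N) / len(q)   # fraction of F vs B on path
    return A, B, pi
```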
HMM in Bioinformatics
• Gene prediction
• Sequence motif finding
• Protein structure prediction
• ChIP-chip peak calling
• Chromatin domains
• Nucleosome positioning on tiling arrays
• Copy number variation
Summary
• Markov Chain
• Hidden Markov Model
– Observations, hidden states, initial, transition and
emission probabilities
• Three problems
– Pb(observations): forward, backward procedure (give
same results)
– Infer hidden states: forward-backward (pb prediction at
each state), Viterbi (best path)
– Estimate parameters: Baum-Welch