Hidden Markov Model
Xiaole Shirley Liu
STAT115, STAT215

Outline
• Markov Chain
• Hidden Markov Model
– Observations, hidden states, initial, transition and emission probabilities
• Three problems
– P(observations): forward, backward procedure
– Infer hidden states: forward-backward, Viterbi
– Estimate parameters: Baum-Welch

iid Process
• iid: independently and identically distributed
– Events are not correlated to each other
– The current event has no predictive power for future events
– E.g. P(girl | boy) = P(girl), P(coin H | H) = P(H), P(die 1 | 5) = P(1)

Discrete Markov Chain
• Discrete Markov process
– Distinct states: S_1, S_2, ..., S_N
– Regularly spaced discrete times: t = 1, 2, ...
– Markov property: the future state depends only on the present state, not on the path taken to get there:
P[q_t = S_j | q_{t-1} = S_i, q_{t-2} = S_k, ...] = P[q_t = S_j | q_{t-1} = S_i]
– a_ij, the transition probability, with a_ij ≥ 0 and Σ_{j=1}^N a_ij = 1

Markov Chain Example 1
• States: exam grade; 1 = Pass, 2 = Fail
• Discrete times: exam #1, #2, #3, ...
• State transition probability:
A = {a_ij} = [0.8 0.2; 0.5 0.5]
• Given PPPFFF, the probability of passing the next exam:
P(q_7 = S_1 | A, O = {S_1, S_1, S_1, S_2, S_2, S_2}) = P(q_7 = S_1 | A, q_6 = S_2) = 0.5

Markov Chain Example 2
• States: 1 = rain; 2 = cloudy; 3 = sunny
• Discrete times: day 1, 2, 3, ...
• State transition probability:
A = {a_ij} = [0.4 0.3 0.3; 0.2 0.6 0.2; 0.1 0.1 0.8]
• Given sunny (S_3) at t = 1:
P(O = {S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3} | A, q_1 = S_3)
= a_33 · a_33 · a_31 · a_11 · a_13 · a_32 · a_23
= 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2 = 1.536 × 10^-4

Markov Chain Example 3
• States: fair coin F, unfair (biased) coin B
• Discrete times: flip 1, 2, 3, ...
• Initial probability: π_F = 0.6, π_B = 0.4
• Transition probability: a_FF = 0.9, a_FB = 0.1, a_BF = 0.3, a_BB = 0.7
• Probability of FFBBFFFB:
P(O = {FFBBFFFB} | A, π) = π_F · a_FF · a_FB · a_BB · a_BF · a_FF · a_FF · a_FB
= 0.6 · 0.9 · 0.1 · 0.7 · 0.3 · 0.9 · 0.9 · 0.1 = 9.19 × 10^-4
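To make the arithmetic concrete, here is a minimal Python sketch of the Example 3 calculation. The dictionary layout and the function name chain_prob are my own illustrative choices, not from the lecture:

```python
# Markov chain path probability for the fair/biased coin chain (Example 3).
pi = {"F": 0.6, "B": 0.4}                  # initial distribution
A = {"F": {"F": 0.9, "B": 0.1},            # transition probabilities
     "B": {"F": 0.3, "B": 0.7}}

def chain_prob(path, pi, A):
    """P(path) = pi[q_1] * product of A[q_(t-1)][q_t] over t = 2..T."""
    p = pi[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= A[prev][cur]
    return p

print(chain_prob("FFBBFFFB", pi, A))       # ~9.19e-4, matching the slide
```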
Hidden Markov Model
• Coin toss example
• The coin transitions form a Markov chain
• The probability of H/T depends on the coin used
• Only H/T is observed; the coin state is hidden, hence a hidden Markov model

Hidden Markov Model
• Elements of an HMM (coin toss)
– N, the number of states (F/B)
– M, the number of distinct observations (H/T)
– A = {a_ij}, the state transition probabilities: A = [0.9 0.1; 0.3 0.7]
– B = {b_j(k)}, the emission probabilities: b_F(H) = 0.5, b_F(T) = 0.5, b_B(H) = 0.8, b_B(T) = 0.2
– π = {π_i}, the initial state distribution: π_F = 0.4, π_B = 0.6

HMM Applications
• Stock market: the bull/bear market is the hidden Markov chain; daily stock up/down is observed and depends on the big market trend
• Speech recognition: sentences and words are the hidden Markov chain; the spoken sound is observed (heard) and depends on the words
• Digital signal processing: the source signal (0/1) is the hidden Markov chain; the arriving signal fluctuation is observed and depends on the source
• Bioinformatics!!

Basic Problems for HMM
1. Given λ, how to compute P(O|λ) for an observation sequence O = O_1 O_2 ... O_T
• Probability of observing HTTHHHT ...
• Forward procedure, backward procedure
2. Given an observation sequence O = O_1 O_2 ... O_T and λ, how to choose a state sequence Q = q_1 q_2 ... q_T
• What is the hidden coin behind each flip
• Forward-backward, Viterbi
3. How to estimate λ = (A, B, π) so as to maximize P(O|λ)
• How to estimate the coin parameters
• Baum-Welch (expectation maximization)

Problem 1: P(O|λ)
• Suppose we know the state sequence Q:
P(O | Q, λ) = b_q1(O_1) b_q2(O_2) ... b_qT(O_T)
– O = H T T H H H T
– Q = F F B F F B B:
P(O | Q, λ) = b_F(H) b_F(T) b_B(T) b_F(H) b_F(H) b_B(H) b_B(T) = 0.5 · 0.5 · 0.2 · 0.5 · 0.5 · 0.8 · 0.2
– Q = B F B F B B B:
P(O | Q, λ) = b_B(H) b_F(T) b_B(T) b_F(H) b_B(H) b_B(H) b_B(T) = 0.8 · 0.5 · 0.2 · 0.5 · 0.8 · 0.8 · 0.2
• Each given path Q has a probability for O

Problem 1: P(O|λ)
• What is the probability of the path Q itself? P(Q | λ)
– Q = F F B F F B B:
P(Q | λ) = π_F a_FF a_FB a_BF a_FF a_FB a_BB = 0.4 · 0.9 · 0.1 · 0.3 · 0.9 · 0.1 · 0.7
– Q = B F B F B B B:
P(Q | λ) = π_B a_BF a_FB a_BF a_FB a_BB a_BB = 0.6 · 0.3 · 0.1 · 0.3 · 0.1 · 0.7 · 0.7
• Each given path Q has its own probability

Problem 1: P(O|λ)
• Therefore, the total probability of O = HTTHHHT sums over all possible paths Q, each path's probability multiplied by the probability of O given that path:
P(O | λ) = Σ_{all Q} P(O, Q | λ) = Σ_{all Q} P(Q | λ) P(O | Q, λ)
• For a sequence of length T with N hidden states, there are N^T possible paths, an infeasible calculation

Solution to Problem 1: Forward Procedure
• Use dynamic programming
• Sum at every time point
• Keep each subproblem's solution to speed up the current calculation

Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization: α_1(i) = π_i b_i(O_1)
α_1(F) = π_F b_F(H) = 0.4 · 0.5 = 0.2
α_1(B) = π_B b_B(H) = 0.6 · 0.8 = 0.48
– Probability of seeing H_1 from F_1 or B_1
• Induction: α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1})
– Probability of seeing T_2 from F_2 or B_2; F_2 could come from F_1 or B_1, each with its own probability, so add them up:
α_2(F) = (α_1(F) a_FF + α_1(B) a_BF) b_F(T) = (0.2 · 0.9 + 0.48 · 0.3) · 0.5 = 0.162
α_2(B) = (α_1(F) a_FB + α_1(B) a_BB) b_B(T) = (0.2 · 0.1 + 0.48 · 0.7) · 0.2 = 0.0712
α_3(F) = (0.162 · 0.9 + 0.0712 · 0.3) · 0.5 = 0.08358
α_3(B) = (0.162 · 0.1 + 0.0712 · 0.7) · 0.2 = 0.013208
• Termination: P(O | λ) = Σ_{i=1}^N α_T(i) = α_T(F) + α_T(B)

Solution to Problem 1: Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization: β_T(i) = 1, i.e. β_T(F) = β_T(B) = 1
– β_t(i) is the probability of the flips after time t, given the coin at time t
• Induction: β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j)
β_{T-1}(F) = a_FF b_F(T) · 1 + a_FB b_B(T) · 1 = 0.9 · 0.5 + 0.1 · 0.2 = 0.47
β_{T-1}(B) = a_BF b_F(T) · 1 + a_BB b_B(T) · 1 = 0.3 · 0.5 + 0.7 · 0.2 = 0.29
β_{T-2}(F) = a_FF b_F(H) β_{T-1}(F) + a_FB b_B(H) β_{T-1}(B) = 0.9 · 0.5 · 0.47 + 0.1 · 0.8 · 0.29 = 0.2347
β_{T-2}(B) = a_BF b_F(H) β_{T-1}(F) + a_BB b_B(H) β_{T-1}(B) = 0.3 · 0.5 · 0.47 + 0.7 · 0.8 · 0.29 = 0.2329
• Termination: P(O | λ) = Σ_i π_i b_i(O_1) β_1(i)
• Either forward or backward can be used to solve Problem 1, and they give identical results
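Both procedures are short enough to sketch in code. Below is a hedged Python sketch of the forward and backward recursions for this coin HMM; the parameter layout and the function names forward/backward are my own assumptions, and the printed values reproduce the α, β, and P(O|λ) numbers worked out above:

```python
# Coin-toss HMM from the lecture: hidden states F/B, observations H/T.
pi = {"F": 0.4, "B": 0.6}
A = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.3, "B": 0.7}}
B = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}
states = ["F", "B"]

def forward(obs):
    """Forward trellis (0-indexed): alpha[t][i] = P(O_1..O_(t+1), q_(t+1) = i)."""
    alpha = [{i: pi[i] * B[i][obs[0]] for i in states}]        # initialization
    for o in obs[1:]:                                          # induction
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[i][j] for i in states) * B[j][o]
                      for j in states})
    return alpha

def backward(obs):
    """Backward trellis: last entry is beta_T = 1; built right to left."""
    beta = [{i: 1.0 for i in states}]                          # initialization
    for o in reversed(obs[1:]):                                # induction
        nxt = beta[0]
        beta.insert(0, {i: sum(A[i][j] * B[j][o] * nxt[j] for j in states)
                        for i in states})
    return beta

obs = "HTTHHHT"
alpha, beta = forward(obs), backward(obs)
print(alpha[1])                          # alpha_2: {'F': 0.162, 'B': 0.0712}
print(sum(alpha[-1].values()))           # P(O|lambda) by forward termination
print(sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in states))  # same, backward
```

The two printed P(O|λ) values agree, as the slides note, and the per-position α·β products are exactly what the forward-backward solution to Problem 2 combines next.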
Solution to Problem 2: Forward-Backward Procedure
• First run the forward and backward procedures separately
• Keep track of the scores at every time point
• Coin toss
– α_t(i): probability of this coin accounting for all the flips at time t and before
– β_t(i): probability of this coin accounting for all the flips after time t
– For O = HTTHHHT this yields a trellis of α_1 ... α_7 and β_1 ... β_7 for each of F and B
• Combine the two into a posterior at each time point:
γ_t(i) = α_t(i) β_t(i) / Σ_{j=1}^N α_t(j) β_t(j)
• Coin toss, e.g. with illustrative values α_3(F) = 0.0025, β_3(F) = 0.0047, α_3(B) = 0.0016, β_3(B) = 0.0038:
γ_3(F) = α_3(F) β_3(F) / (α_3(F) β_3(F) + α_3(B) β_3(B)) = (0.0025 · 0.0047) / (0.0025 · 0.0047 + 0.0016 · 0.0038) = 0.659
• Gives a probabilistic prediction at every time point
• Forward-backward maximizes the expected number of correctly predicted states (coins)

Solution to Problem 2: Viterbi Algorithm
• Report the single path most likely to give the observations
• Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
• Recursion (max instead of +, keeping track of the best path instead of all paths):
δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(O_t)
ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij]
• Termination:
P* = max_{1≤i≤N} δ_T(i)
q*_T = argmax_{1≤i≤N} δ_T(i)
• Path (state sequence) backtracking:
q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1

Viterbi Algorithm
• Observe: HTTH (the first four flips of HTTHHHT)
• Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
δ_1(F) = 0.4 · 0.5 = 0.2
δ_1(B) = 0.6 · 0.8 = 0.48
• Recursion:
δ_2(F) = [max(δ_1(F) a_FF, δ_1(B) a_BF)] b_F(T) = max(0.18, 0.144) · 0.5 = 0.09; ψ_2(F) = F
δ_2(B) = [max(δ_1(F) a_FB, δ_1(B) a_BB)] b_B(T) = max(0.02, 0.336) · 0.2 = 0.0672; ψ_2(B) = B
δ_3(F) = [max(δ_2(F) a_FF, δ_2(B) a_BF)] b_F(T) = max(0.09 · 0.9, 0.0672 · 0.3) · 0.5 = 0.0405; ψ_3(F) = F
δ_3(B) = [max(δ_2(F) a_FB, δ_2(B) a_BB)] b_B(T) = max(0.09 · 0.1, 0.0672 · 0.7) · 0.2 = 0.0094; ψ_3(B) = B
δ_4(F) = [max(δ_3(F) a_FF, δ_3(B) a_BF)] b_F(H) = max(0.03645, 0.00282) · 0.5 = 0.0182; ψ_4(F) = F
δ_4(B) = [max(δ_3(F) a_FB, δ_3(B) a_BB)] b_B(H) = max(0.00405, 0.00658) · 0.8 = 0.0053; ψ_4(B) = B
• Terminate: pick the state with the best final δ score (here F, with δ_4(F) = 0.0182) and backtrack through ψ to recover the path
• FFFF is the path most likely to give HTTH: although B looks best at flip 1 in isolation (δ_1(B) = 0.48), backtracking from the best final score yields the all-fair path
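A hedged Python sketch of the Viterbi recursion, reusing the parameter layout from the forward/backward sketch (the function and variable names are illustrative). It reproduces the δ values above and the backtracked path:

```python
# Viterbi decoding for the coin HMM; pi, A, B as in the forward sketch.
pi = {"F": 0.4, "B": 0.6}
A = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.3, "B": 0.7}}
B = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}
states = ["F", "B"]

def viterbi(obs):
    """Return (best state path, its probability) by max-product DP."""
    delta = [{i: pi[i] * B[i][obs[0]] for i in states}]   # delta_1
    psi = [{}]                                            # backpointers
    for o in obs[1:]:
        prev, d, p = delta[-1], {}, {}
        for j in states:
            # max instead of sum: the single best path into state j
            best = max(states, key=lambda i: prev[i] * A[i][j])
            p[j] = best
            d[j] = prev[best] * A[best][j] * B[j][o]
        delta.append(d)
        psi.append(p)
    q = max(states, key=lambda i: delta[-1][i])           # termination
    best_p = delta[-1][q]
    path = [q]
    for bp in reversed(psi[1:]):                          # backtracking
        q = bp[q]
        path.append(q)
    return "".join(reversed(path)), best_p

print(viterbi("HTTH"))   # ('FFFF', 0.018225) under these parameters
```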
Solution to Problem 3
• There is no analytical way to find the global optimum, so we settle for a local maximum
• Baum-Welch algorithm (an expectation-maximization algorithm)
– Randomly initialize λ = (A, B, π)
– Run Viterbi based on λ and O
– Update λ = (A, B, π):
• π: % of F vs B on the Viterbi path
• A: frequency of F/B transitions on the Viterbi path
• B: frequency of H/T emitted by F/B
• (Strictly speaking, this path-counting recipe is the Viterbi-training approximation; full Baum-Welch re-estimates from forward-backward expected counts. A code sketch of the training step follows the summary.)

HMM in Bioinformatics
• Gene prediction
• Sequence motif finding
• Protein structure prediction
• ChIP-chip peak calling
• Chromatin domains
• Nucleosome positioning on tiling arrays
• Copy number variation

Summary
• Markov Chain
• Hidden Markov Model
– Observations, hidden states, initial, transition and emission probabilities
• Three problems
– P(observations): forward, backward procedure (give the same result)
– Infer hidden states: forward-backward (probabilistic prediction at each position), Viterbi (best single path)
– Estimate parameters: Baum-Welch
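To close the loop on Problem 3, here is a hedged sketch of one re-estimation step of the Viterbi-training loop described above (not full forward-backward Baum-Welch). The function name reestimate, the pseudocount smoothing, and the example decoded paths are all my own assumptions:

```python
# One Viterbi-training ("hard EM") update for the coin HMM (Problem 3).
# Pseudocounts (pc) are an added assumption to avoid zero probabilities
# on short sequences.
from collections import Counter

states, symbols = ["F", "B"], ["H", "T"]

def reestimate(obs_seqs, paths, pc=0.5):
    """Re-estimate (pi, A, B) from decoded state paths by counting."""
    start, trans, emit = Counter(), Counter(), Counter()
    for obs, path in zip(obs_seqs, paths):
        start[path[0]] += 1                       # initial-state counts
        for a, b in zip(path, path[1:]):
            trans[(a, b)] += 1                    # transition counts
        for q, o in zip(path, obs):
            emit[(q, o)] += 1                     # emission counts
    pi = {i: (start[i] + pc) / (sum(start.values()) + pc * len(states))
          for i in states}
    A = {i: {j: (trans[(i, j)] + pc) /
             (sum(trans[(i, k)] for k in states) + pc * len(states))
             for j in states} for i in states}
    B = {i: {o: (emit[(i, o)] + pc) /
             (sum(emit[(i, s)] for s in symbols) + pc * len(symbols))
             for o in symbols} for i in states}
    return pi, A, B

obs_seqs = ["HTTHHHT", "HHTHHH"]
paths    = ["FFBFFBB", "BBFBBB"]   # illustrative decoded paths
pi, A, B = reestimate(obs_seqs, paths)
print(pi, A["F"], B["B"])
```

In practice one alternates decoding (e.g. with the viterbi sketch above) and re-estimation until the parameters stop changing, which converges to a local maximum as the slides note.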