Hidden Markov Model
Xiaole Shirley Liu
STAT115, STAT215

Outline
• Markov Chain
• Hidden Markov Model
– Observations, hidden states, initial, transition and emission probabilities
• Three problems
– P(observations): forward, backward procedure
– Infer hidden states: forward-backward, Viterbi
– Estimate parameters: Baum-Welch

iid Process
• iid: independently and identically distributed
– Events are not correlated to each other
– The current event has no predictive power for future events
– E.g. P(girl | boy) = P(girl), P(coin H | H) = P(H), P(die 1 | 5) = P(1)

Discrete Markov Chain
• Discrete Markov process
– Distinct states: S_1, S_2, ..., S_N
– Regularly spaced discrete times: t = 1, 2, ...
– Markov property: the future state depends only on the present state, not on the path taken to get there:
P[q_t = S_j | q_{t-1} = S_i, q_{t-2} = S_k, ...] = P[q_t = S_j | q_{t-1} = S_i]
– a_ij, the transition probability, with a_ij ≥ 0 and Σ_{j=1}^N a_ij = 1

Markov Chain Example 1
• States: exam grade; 1 = Pass, 2 = Fail
• Discrete times: exam #1, #2, #3, ...
• State transition probability:
A = {a_ij} = [0.8 0.2; 0.5 0.5]
• Given PPPFFF, the probability of passing the next exam:
P(q_7 = S_1 | A, O = {S_1, S_1, S_1, S_2, S_2, S_2}) = P(q_7 = S_1 | A, q_6 = S_2) = 0.5

Markov Chain Example 2
• States: 1 = rain; 2 = cloudy; 3 = sunny
• Discrete times: day 1, 2, 3, ...
• State transition probability:
A = {a_ij} = [0.4 0.3 0.3; 0.2 0.6 0.2; 0.1 0.1 0.8]
• Given sunny (S_3) at t = 1:
P(O = {S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3} | A, q_1 = S_3)
= a_33 · a_33 · a_31 · a_11 · a_13 · a_32 · a_23
= 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2 = 1.536 × 10^-4

Markov Chain Example 3
• States: fair coin F, unfair (biased) coin B
• Discrete times: flip 1, 2, 3, ...
• Initial probability: π_F = 0.6, π_B = 0.4
• Transition probability: a_FF = 0.9, a_FB = 0.1, a_BF = 0.3, a_BB = 0.7
• Probability of FFBBFFFB:
P(O = {FFBBFFFB} | A, π) = π_F · a_FF · a_FB · a_BB · a_BF · a_FF · a_FF · a_FB
= 0.6 · 0.9 · 0.1 · 0.7 · 0.3 · 0.9 · 0.9 · 0.1 = 9.19 × 10^-4
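To make the arithmetic concrete, here is a minimal Python sketch of the Example 3 calculation. The dictionary layout and the function name chain_prob are my own illustrative choices, not from the lecture:

```python
# Markov chain path probability for the fair/biased coin chain (Example 3).
pi = {"F": 0.6, "B": 0.4}                  # initial distribution
A = {"F": {"F": 0.9, "B": 0.1},            # transition probabilities
     "B": {"F": 0.3, "B": 0.7}}

def chain_prob(path, pi, A):
    """P(path) = pi[q_1] * product of A[q_(t-1)][q_t] over t = 2..T."""
    p = pi[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= A[prev][cur]
    return p

print(chain_prob("FFBBFFFB", pi, A))       # ~9.19e-4, matching the slide
```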
Hidden Markov Model
• Coin toss example
• The coin transitions form a Markov chain
• The probability of H/T depends on the coin used
• Only H/T is observed; the coin state is hidden, hence a hidden Markov model

Hidden Markov Model
• Elements of an HMM (coin toss)
– N, the number of states (F/B)
– M, the number of distinct observations (H/T)
– A = {a_ij}, the state transition probabilities: A = [0.9 0.1; 0.3 0.7]
– B = {b_j(k)}, the emission probabilities: b_F(H) = 0.5, b_F(T) = 0.5, b_B(H) = 0.8, b_B(T) = 0.2
– π = {π_i}, the initial state distribution: π_F = 0.4, π_B = 0.6

HMM Applications
• Stock market: the bull/bear market is the hidden Markov chain; daily stock up/down is observed and depends on the big market trend
• Speech recognition: sentences and words are the hidden Markov chain; the spoken sound is observed (heard) and depends on the words
• Digital signal processing: the source signal (0/1) is the hidden Markov chain; the arriving signal fluctuation is observed and depends on the source
• Bioinformatics!!

Basic Problems for HMM
1. Given λ, how to compute P(O|λ) for an observation sequence O = O_1 O_2 ... O_T
• Probability of observing HTTHHHT ...
• Forward procedure, backward procedure
2. Given an observation sequence O = O_1 O_2 ... O_T and λ, how to choose a state sequence Q = q_1 q_2 ... q_T
• What is the hidden coin behind each flip
• Forward-backward, Viterbi
3. How to estimate λ = (A, B, π) so as to maximize P(O|λ)
• How to estimate the coin parameters
• Baum-Welch (expectation maximization)

Problem 1: P(O|λ)
• Suppose we know the state sequence Q:
P(O | Q, λ) = b_q1(O_1) b_q2(O_2) ... b_qT(O_T)
– O = H T T H H H T
– Q = F F B F F B B:
P(O | Q, λ) = b_F(H) b_F(T) b_B(T) b_F(H) b_F(H) b_B(H) b_B(T) = 0.5 · 0.5 · 0.2 · 0.5 · 0.5 · 0.8 · 0.2
– Q = B F B F B B B:
P(O | Q, λ) = b_B(H) b_F(T) b_B(T) b_F(H) b_B(H) b_B(H) b_B(T) = 0.8 · 0.5 · 0.2 · 0.5 · 0.8 · 0.8 · 0.2
• Each given path Q has a probability for O

Problem 1: P(O|λ)
• What is the probability of the path Q itself? P(Q | λ)
– Q = F F B F F B B:
P(Q | λ) = π_F a_FF a_FB a_BF a_FF a_FB a_BB = 0.4 · 0.9 · 0.1 · 0.3 · 0.9 · 0.1 · 0.7
– Q = B F B F B B B:
P(Q | λ) = π_B a_BF a_FB a_BF a_FB a_BB a_BB = 0.6 · 0.3 · 0.1 · 0.3 · 0.1 · 0.7 · 0.7
• Each given path Q has its own probability

Problem 1: P(O|λ)
• Therefore, the total probability of O = HTTHHHT sums over all possible paths Q, each path's probability multiplied by the probability of O given that path:
P(O | λ) = Σ_{all Q} P(O, Q | λ) = Σ_{all Q} P(Q | λ) P(O | Q, λ)
• For a sequence of length T with N hidden states, there are N^T possible paths, an infeasible calculation

Solution to Problem 1: Forward Procedure
• Use dynamic programming
• Sum at every time point
• Keep each subproblem's solution to speed up the current calculation

Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization: α_1(i) = π_i b_i(O_1)
α_1(F) = π_F b_F(H) = 0.4 · 0.5 = 0.2
α_1(B) = π_B b_B(H) = 0.6 · 0.8 = 0.48
– Probability of seeing H_1 from F_1 or B_1
• Induction: α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1})
– Probability of seeing T_2 from F_2 or B_2; F_2 could come from F_1 or B_1, each with its own probability, so add them up:
α_2(F) = (α_1(F) a_FF + α_1(B) a_BF) b_F(T) = (0.2 · 0.9 + 0.48 · 0.3) · 0.5 = 0.162
α_2(B) = (α_1(F) a_FB + α_1(B) a_BB) b_B(T) = (0.2 · 0.1 + 0.48 · 0.7) · 0.2 = 0.0712
α_3(F) = (0.162 · 0.9 + 0.0712 · 0.3) · 0.5 = 0.08358
α_3(B) = (0.162 · 0.1 + 0.0712 · 0.7) · 0.2 = 0.013208
• Termination: P(O | λ) = Σ_{i=1}^N α_T(i) = α_T(F) + α_T(B)

Solution to Problem 1: Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization: β_T(i) = 1, i.e. β_T(F) = β_T(B) = 1
– β_t(i) is the probability of the flips after time t, given the coin at time t
• Induction: β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j)
β_{T-1}(F) = a_FF b_F(T) · 1 + a_FB b_B(T) · 1 = 0.9 · 0.5 + 0.1 · 0.2 = 0.47
β_{T-1}(B) = a_BF b_F(T) · 1 + a_BB b_B(T) · 1 = 0.3 · 0.5 + 0.7 · 0.2 = 0.29
β_{T-2}(F) = a_FF b_F(H) β_{T-1}(F) + a_FB b_B(H) β_{T-1}(B) = 0.9 · 0.5 · 0.47 + 0.1 · 0.8 · 0.29 = 0.2347
β_{T-2}(B) = a_BF b_F(H) β_{T-1}(F) + a_BB b_B(H) β_{T-1}(B) = 0.3 · 0.5 · 0.47 + 0.7 · 0.8 · 0.29 = 0.2329
• Termination: P(O | λ) = Σ_i π_i b_i(O_1) β_1(i)
• Either forward or backward can be used to solve Problem 1, and they give identical results
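Both procedures are short enough to sketch in code. Below is a hedged Python sketch of the forward and backward recursions for this coin HMM; the parameter layout and the function names forward/backward are my own assumptions, and the printed values reproduce the α, β, and P(O|λ) numbers worked out above:

```python
# Coin-toss HMM from the lecture: hidden states F/B, observations H/T.
pi = {"F": 0.4, "B": 0.6}
A = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.3, "B": 0.7}}
B = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}
states = ["F", "B"]

def forward(obs):
    """Forward trellis (0-indexed): alpha[t][i] = P(O_1..O_(t+1), q_(t+1) = i)."""
    alpha = [{i: pi[i] * B[i][obs[0]] for i in states}]        # initialization
    for o in obs[1:]:                                          # induction
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[i][j] for i in states) * B[j][o]
                      for j in states})
    return alpha

def backward(obs):
    """Backward trellis: last entry is beta_T = 1; built right to left."""
    beta = [{i: 1.0 for i in states}]                          # initialization
    for o in reversed(obs[1:]):                                # induction
        nxt = beta[0]
        beta.insert(0, {i: sum(A[i][j] * B[j][o] * nxt[j] for j in states)
                        for i in states})
    return beta

obs = "HTTHHHT"
alpha, beta = forward(obs), backward(obs)
print(alpha[1])                          # alpha_2: {'F': 0.162, 'B': 0.0712}
print(sum(alpha[-1].values()))           # P(O|lambda) by forward termination
print(sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in states))  # same, backward
```

The two printed P(O|λ) values agree, as the slides note, and the per-position α·β products are exactly what the forward-backward solution to Problem 2 combines next.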
Solution to Problem 2: Forward-Backward Procedure
• First run the forward and backward procedures separately
• Keep track of the scores at every time point
• Coin toss
– α_t(i): probability of this coin accounting for all the flips at time t and before
– β_t(i): probability of this coin accounting for all the flips after time t
– For O = HTTHHHT this yields a trellis of α_1 ... α_7 and β_1 ... β_7 for each of F and B
• Combine the two into a posterior at each time point:
γ_t(i) = α_t(i) β_t(i) / Σ_{j=1}^N α_t(j) β_t(j)
• Coin toss, e.g. with illustrative values α_3(F) = 0.0025, β_3(F) = 0.0047, α_3(B) = 0.0016, β_3(B) = 0.0038:
γ_3(F) = α_3(F) β_3(F) / (α_3(F) β_3(F) + α_3(B) β_3(B)) = (0.0025 · 0.0047) / (0.0025 · 0.0047 + 0.0016 · 0.0038) = 0.659
• Gives a probabilistic prediction at every time point
• Forward-backward maximizes the expected number of correctly predicted states (coins)

Solution to Problem 2: Viterbi Algorithm
• Report the single path most likely to give the observations
• Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
• Recursion (max instead of +, keeping track of the best path instead of all paths):
δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(O_t)
ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij]
• Termination:
P* = max_{1≤i≤N} δ_T(i)
q*_T = argmax_{1≤i≤N} δ_T(i)
• Path (state sequence) backtracking:
q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1

Viterbi Algorithm
• Observe: HTTH (the first four flips of HTTHHHT)
• Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
δ_1(F) = 0.4 · 0.5 = 0.2
δ_1(B) = 0.6 · 0.8 = 0.48
• Recursion:
δ_2(F) = [max(δ_1(F) a_FF, δ_1(B) a_BF)] b_F(T) = max(0.18, 0.144) · 0.5 = 0.09; ψ_2(F) = F
δ_2(B) = [max(δ_1(F) a_FB, δ_1(B) a_BB)] b_B(T) = max(0.02, 0.336) · 0.2 = 0.0672; ψ_2(B) = B
δ_3(F) = [max(δ_2(F) a_FF, δ_2(B) a_BF)] b_F(T) = max(0.09 · 0.9, 0.0672 · 0.3) · 0.5 = 0.0405; ψ_3(F) = F
δ_3(B) = [max(δ_2(F) a_FB, δ_2(B) a_BB)] b_B(T) = max(0.09 · 0.1, 0.0672 · 0.7) · 0.2 = 0.0094; ψ_3(B) = B
δ_4(F) = [max(δ_3(F) a_FF, δ_3(B) a_BF)] b_F(H) = max(0.03645, 0.00282) · 0.5 = 0.0182; ψ_4(F) = F
δ_4(B) = [max(δ_3(F) a_FB, δ_3(B) a_BB)] b_B(H) = max(0.00405, 0.00658) · 0.8 = 0.0053; ψ_4(B) = B
• Terminate: pick the state with the best final δ score (here F, with δ_4(F) = 0.0182) and backtrack through ψ to recover the path
• FFFF is the path most likely to give HTTH: although B looks best at flip 1 in isolation (δ_1(B) = 0.48), backtracking from the best final score yields the all-fair path
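A hedged Python sketch of the Viterbi recursion, reusing the parameter layout from the forward/backward sketch (the function and variable names are illustrative). It reproduces the δ values above and the backtracked path:

```python
# Viterbi decoding for the coin HMM; pi, A, B as in the forward sketch.
pi = {"F": 0.4, "B": 0.6}
A = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.3, "B": 0.7}}
B = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}
states = ["F", "B"]

def viterbi(obs):
    """Return (best state path, its probability) by max-product DP."""
    delta = [{i: pi[i] * B[i][obs[0]] for i in states}]   # delta_1
    psi = [{}]                                            # backpointers
    for o in obs[1:]:
        prev, d, p = delta[-1], {}, {}
        for j in states:
            # max instead of sum: the single best path into state j
            best = max(states, key=lambda i: prev[i] * A[i][j])
            p[j] = best
            d[j] = prev[best] * A[best][j] * B[j][o]
        delta.append(d)
        psi.append(p)
    q = max(states, key=lambda i: delta[-1][i])           # termination
    best_p = delta[-1][q]
    path = [q]
    for bp in reversed(psi[1:]):                          # backtracking
        q = bp[q]
        path.append(q)
    return "".join(reversed(path)), best_p

print(viterbi("HTTH"))   # ('FFFF', 0.018225) under these parameters
```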
Solution to Problem 3
• There is no analytical way to find the global optimum, so we settle for a local maximum
• Baum-Welch algorithm (an expectation-maximization algorithm)
– Randomly initialize λ = (A, B, π)
– Run Viterbi based on λ and O
– Update λ = (A, B, π):
• π: % of F vs B on the Viterbi path
• A: frequency of F/B transitions on the Viterbi path
• B: frequency of H/T emitted by F/B
• (Strictly speaking, this path-counting recipe is the Viterbi-training approximation; full Baum-Welch re-estimates from forward-backward expected counts. A code sketch of the training step follows the summary.)

HMM in Bioinformatics
• Gene prediction
• Sequence motif finding
• Protein structure prediction
• ChIP-chip peak calling
• Chromatin domains
• Nucleosome positioning on tiling arrays
• Copy number variation

Summary
• Markov Chain
• Hidden Markov Model
– Observations, hidden states, initial, transition and emission probabilities
• Three problems
– P(observations): forward, backward procedure (give the same result)
– Infer hidden states: forward-backward (probabilistic prediction at each position), Viterbi (best single path)
– Estimate parameters: Baum-Welch
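To close the loop on Problem 3, here is a hedged sketch of one re-estimation step of the Viterbi-training loop described above (not full forward-backward Baum-Welch). The function name reestimate, the pseudocount smoothing, and the example decoded paths are all my own assumptions:

```python
# One Viterbi-training ("hard EM") update for the coin HMM (Problem 3).
# Pseudocounts (pc) are an added assumption to avoid zero probabilities
# on short sequences.
from collections import Counter

states, symbols = ["F", "B"], ["H", "T"]

def reestimate(obs_seqs, paths, pc=0.5):
    """Re-estimate (pi, A, B) from decoded state paths by counting."""
    start, trans, emit = Counter(), Counter(), Counter()
    for obs, path in zip(obs_seqs, paths):
        start[path[0]] += 1                       # initial-state counts
        for a, b in zip(path, path[1:]):
            trans[(a, b)] += 1                    # transition counts
        for q, o in zip(path, obs):
            emit[(q, o)] += 1                     # emission counts
    pi = {i: (start[i] + pc) / (sum(start.values()) + pc * len(states))
          for i in states}
    A = {i: {j: (trans[(i, j)] + pc) /
             (sum(trans[(i, k)] for k in states) + pc * len(states))
             for j in states} for i in states}
    B = {i: {o: (emit[(i, o)] + pc) /
             (sum(emit[(i, s)] for s in symbols) + pc * len(symbols))
             for o in symbols} for i in states}
    return pi, A, B

obs_seqs = ["HTTHHHT", "HHTHHH"]
paths    = ["FFBFFBB", "BBFBBB"]   # illustrative decoded paths
pi, A, B = reestimate(obs_seqs, paths)
print(pi, A["F"], B["B"])
```

In practice one alternates decoding (e.g. with the viterbi sketch above) and re-estimation until the parameters stop changing, which converges to a local maximum as the slides note.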