Hidden Markov Models

Pairwise Alignments

Finite state automata with multiple states are a convenient description of complex dynamic programming algorithms for pairwise alignment. Converting the FSA into an HMM gives a basis for a probabilistic modelling of the gapped alignment process.

Advantages:
1) the resulting probabilistic model can be used to explore the reliability of the alignment and to explore alternative alignments
2) weighting all alternative alignments probabilistically yields scores of similarity that are independent of any specific alignment

Hidden Markov Models

[Figure: state diagram of the pair HMM. The begin state B connects to the match state M (emission probabilities $p_{x_i y_j}$) and the insert states X (emissions $q_{x_i}$) and Y (emissions $q_{y_j}$); transitions $\delta$ from M into X or Y, $\varepsilon$ for staying in an insert state, $\tau$ into the end state E, and $1-2\delta-\tau$ and $1-\varepsilon-\tau$ back into M.]

Hidden Markov Models

Pair Hidden Markov Models generate an aligned pair of sequences. Start in the begin state B and cycle over the following two steps:
1) pick the next state according to the transition probability distribution leaving the current state
2) pick a symbol pair to be added to the alignment according to the emission probability distribution in the new state
Stop when a transition into the end state E is made.

Hidden Markov Models

State M has emission probability distribution $p_{ab}$ for emitting an aligned pair a:b. States X and Y have distributions $q_{x_i}$ and $q_{y_j}$ for emitting symbol $x_i$ from sequence x, or $y_j$ from sequence y, against a gap. The transition probability from M into an insert state X or Y is denoted $\delta$, the probability of staying in an insert state $\varepsilon$, and the probability of a transition into the end state $\tau$.

All algorithms discussed so far carry across to pair HMMs. The total probability of generating a particular alignment is just the product of the probabilities of each individual step.

Hidden Markov Models

Viterbi algorithm for pair HMMs

Initialisation:
$$v^M(0,0) = 1; \quad v^\bullet(i,0) = v^\bullet(0,j) = 0 \text{ for all } i, j$$

Recurrence: $i = 1, \ldots, n$, $j = 1, \ldots, m$:
$$v^M(i,j) = p_{x_i y_j} \max \begin{cases} (1-2\delta-\tau)\, v^M(i-1,j-1) \\ (1-\varepsilon-\tau)\, v^X(i-1,j-1) \\ (1-\varepsilon-\tau)\, v^Y(i-1,j-1) \end{cases}$$
$$v^X(i,j) = q_{x_i} \max \begin{cases} \delta\, v^M(i-1,j) \\ \varepsilon\, v^X(i-1,j) \end{cases}$$
$$v^Y(i,j) = q_{y_j} \max \begin{cases} \delta\, v^M(i,j-1) \\ \varepsilon\, v^Y(i,j-1) \end{cases}$$

Termination:
$$v^E = \tau \max\left( v^M(n,m),\; v^X(n,m),\; v^Y(n,m) \right)$$

Hidden Markov Models

Probabilistic model for a random alignment

[Figure: state diagram of the random model R. The begin state B leads to insert state X (emission $q_{x_i}$, self-transition $1-\eta$), then via a silent state to insert state Y (emission $q_{y_j}$, self-transition $1-\eta$), and on to the end state E; each state is left with probability $\eta$.]

The main states X and Y emit the two sequences independently. The silent state does not emit any symbol but gathers input from the X and begin states. The probability of a pair of sequences according to the random model is
$$P(x, y \mid R) = \eta (1-\eta)^n \prod_{i=1}^{n} q_{x_i} \cdot \eta (1-\eta)^m \prod_{j=1}^{m} q_{y_j} = \eta^2 (1-\eta)^{n+m} \prod_{i=1}^{n} q_{x_i} \prod_{j=1}^{m} q_{y_j}$$

Hidden Markov Models

Allocate the terms in this expression to those that make up the probability of the Viterbi alignment, so that the log-odds ratio is the sum of the individual log-odds terms. Allocate one factor of $(1-\eta)$ and the corresponding $q_a$ factor to each residue that is emitted in a Viterbi step. The match transitions will therefore be allocated $(1-\eta)^2 q_a q_b$, where a and b are the residues matched, and the insert states will be allocated $(1-\eta) q_a$, where a is the residue inserted. As the Viterbi path must account for all residues, exactly $(n+m)$ terms will be used.

Hidden Markov Models

We can now compute in terms of an additive model with log-odds emission scores and log-odds transition scores.
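Before turning to the log-odds formulation, the recurrence above can be made concrete in code. The following is a minimal Python sketch of the probability-space Viterbi algorithm for pair HMMs; the two-letter alphabet and the values chosen for $\delta$, $\varepsilon$, $\tau$, $p_{ab}$ and $q_a$ are toy assumptions for illustration only.

```python
import numpy as np

# Toy parameters, not taken from the slides: a two-letter alphabet with
# uniform background q_a and a match-favouring pair emission table p_ab.
DELTA, EPSILON, TAU = 0.2, 0.1, 0.1       # delta, epsilon, tau
Q = {"A": 0.5, "B": 0.5}                  # q_a: single-residue emissions
P = {("A", "A"): 0.4, ("B", "B"): 0.4,
     ("A", "B"): 0.1, ("B", "A"): 0.1}    # p_ab: aligned-pair emissions

def viterbi_pair_hmm(x, y):
    """Probability v^E of the most probable alignment of x and y,
    following the pair-HMM Viterbi recurrence on the slide above."""
    n, m = len(x), len(y)
    vM = np.zeros((n + 1, m + 1))
    vX = np.zeros((n + 1, m + 1))
    vY = np.zeros((n + 1, m + 1))
    vM[0, 0] = 1.0                        # v^M(0,0) = 1; all boundaries 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            vM[i, j] = P[a, b] * max(     # match a:b
                (1 - 2 * DELTA - TAU) * vM[i - 1, j - 1],
                (1 - EPSILON - TAU) * vX[i - 1, j - 1],
                (1 - EPSILON - TAU) * vY[i - 1, j - 1])
            vX[i, j] = Q[a] * max(        # emit x_i against a gap
                DELTA * vM[i - 1, j], EPSILON * vX[i - 1, j])
            vY[i, j] = Q[b] * max(        # emit y_j against a gap
                DELTA * vM[i, j - 1], EPSILON * vY[i, j - 1])
    return TAU * max(vM[n, m], vX[n, m], vY[n, m])   # transition into E

print(viterbi_pair_hmm("AAB", "ABB"))
```

For realistic sequence lengths the products of probabilities underflow floating-point arithmetic, which is one reason the additive log-odds version that follows is preferred in practice.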
Hidden Markov Models

In practice this is the most convenient way to implement pair HMMs. Merging the emission scores with the transition scores produces scores that correspond to the standard terms used in sequence alignment by dynamic programming:
$$s(a,b) = \log \frac{p_{ab}}{q_a q_b} + \log \frac{1-2\delta-\tau}{(1-\eta)^2}$$
$$d = -\log \frac{\delta\,(1-\varepsilon-\tau)}{(1-\eta)(1-2\delta-\tau)}$$
$$e = -\log \frac{\varepsilon}{1-\eta}$$
Now the log-odds version of the Viterbi alignment algorithm can be given in a form that looks like standard pairwise dynamic programming.

Hidden Markov Models

Optimal log-odds alignment

Initialisation:
$$V^M(0,0) = -2 \log \eta; \quad V^X(0,0) = V^Y(0,0) = -\infty; \quad V^\bullet(i,-1) = V^\bullet(-1,j) = -\infty \text{ for all } i, j$$

Recursion: $i = 0, \ldots, n$, $j = 0, \ldots, m$, except $(0,0)$:
$$V^M(i,j) = s(x_i, y_j) + \max \begin{cases} V^M(i-1,j-1) \\ V^X(i-1,j-1) \\ V^Y(i-1,j-1) \end{cases}$$
$$V^X(i,j) = \max \begin{cases} V^M(i-1,j) - d \\ V^X(i-1,j) - e \end{cases}$$
$$V^Y(i,j) = \max \begin{cases} V^M(i,j-1) - d \\ V^Y(i,j-1) - e \end{cases}$$

Termination:
$$V = \max\left( V^M(n,m),\; V^X(n,m) + c,\; V^Y(n,m) + c \right)$$

Hidden Markov Models

The constant c in the termination has the value
$$c = \log(1-2\delta-\tau) - \log(1-\varepsilon-\tau)$$

The procedure shows how, for any pair HMM, we can derive an equivalent finite state automaton for obtaining the most probable alignment.
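The same sketch carries over to the log-odds formulation: the scores $s(a,b)$, $d$, $e$ and the constant $c$ are derived from the HMM parameters exactly as above, and the recursion becomes standard additive dynamic programming. The parameter values below, including $\eta$, are again illustrative assumptions.

```python
import math

# Toy parameters for illustration only, matching the earlier sketch,
# plus eta for the random model R.
DELTA, EPSILON, TAU, ETA = 0.2, 0.1, 0.1, 0.1
Q = {"A": 0.5, "B": 0.5}
P = {("A", "A"): 0.4, ("B", "B"): 0.4,
     ("A", "B"): 0.1, ("B", "A"): 0.1}

def s(a, b):
    """Log-odds substitution score s(a,b) from the slide."""
    return (math.log(P[a, b] / (Q[a] * Q[b]))
            + math.log((1 - 2 * DELTA - TAU) / (1 - ETA) ** 2))

d = -math.log(DELTA * (1 - EPSILON - TAU)
              / ((1 - ETA) * (1 - 2 * DELTA - TAU)))        # gap-open penalty
e = -math.log(EPSILON / (1 - ETA))                          # gap-extend penalty
c = math.log(1 - 2 * DELTA - TAU) - math.log(1 - EPSILON - TAU)  # end correction

def logodds_viterbi(x, y):
    """Optimal log-odds alignment score V, following the recursion above."""
    n, m = len(x), len(y)
    NEG = float("-inf")
    VM = [[NEG] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = -2 * math.log(ETA)         # V^M(0,0) = -2 log eta
    for i in range(n + 1):
        for j in range(m + 1):            # i = 0..n, j = 0..m except (0,0)
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:           # match x_i with y_j
                VM[i][j] = s(x[i - 1], y[j - 1]) + max(
                    VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1])
            if i > 0:                     # gap in y
                VX[i][j] = max(VM[i - 1][j] - d, VX[i - 1][j] - e)
            if j > 0:                     # gap in x
                VY[i][j] = max(VM[i][j - 1] - d, VY[i][j - 1] - e)
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)

print(logodds_viterbi("AAB", "ABB"))
```

Because the scores are sums rather than products, this version avoids numerical underflow and makes the correspondence with standard gapped alignment (substitution score $s$, gap-open $d$, gap-extend $e$) explicit.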