Pairwise alignment using HMMs
Wouter van Gool and Thomas Jellema

Contents
• Most probable path
• Probability of an alignment
• Sub-optimal alignments
• Pause
• Posterior probability that x_i is aligned to y_j
• Pair HMMs versus FSAs for searching
• Conclusion and summary
• Questions

4.1 Most probable path

Model that emits a single sequence (figure)

Begin and end state (figure)

Model that emits a pairwise alignment (figure)

Example of an aligned pair of sequences
Seq1:   A C T _ C
Seq2:   T _ G G C
States: M X M Y M

Finding the most probable path
- The path we choose is the path with the highest probability of being the correct alignment.
- Each state we add to the alignment must be the state with the highest probability of being correct.
- We calculate the probability of the next state being an M, X or Y and choose the most probable one.
- If ending the alignment is more probable than the next state being an M, X or Y, we end the alignment.

The probability of emitting an M is the highest of:
1. previous state X, new state M
2. previous state Y, new state M
3. previous state M, new state M

Probability of going to the M state (figure)

Viterbi algorithm for pair HMMs (recurrences; a code sketch appears at the end of this section)

Finding the most probable path using FSAs
- The most probable path is also the optimal FSA alignment.

Finding the most probable path using FSAs (figure)

Recurrence relations (figure)

The log-odds scoring function
We wish to know whether an alignment scores above or below a random alignment.
The log-odds ratio is s(a,b) = log(p_ab / (q_a q_b)).
log(p_ab / (q_a q_b)) > 0 iff the probability that a and b are related under our model is larger than the probability that they were picked at random.
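The Viterbi recurrences referred to above can be written out directly for the three-state (M, X, Y) pair HMM. Below is a minimal sketch in Python; the function name, the parameter values delta=0.2, eps=0.1, tau=0.1 and the simple match/mismatch emission probabilities are illustrative assumptions, not values taken from the slides.

import math

NEG_INF = float("-inf")

def pair_hmm_viterbi(x, y, delta=0.2, eps=0.1, tau=0.1,
                     p_match=0.75, p_mismatch=0.25/3, q=0.25):
    """Viterbi decoding for the 3-state pair HMM (M, X, Y), in log space.

    Returns the log probability of the best path and its M/X/Y state string.
    All numeric parameters are illustrative placeholders.
    """
    def log(p):
        return math.log(p) if p > 0 else NEG_INF

    def e_match(a, b):                       # match-state emission p_ab
        return p_match if a == b else p_mismatch

    n, m = len(x), len(y)
    # v[s][i][j]: best log prob of a path emitting x[:i], y[:j] and ending in state s
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    ptr = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    v["M"][0][0] = 0.0                       # the start state behaves like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:              # M emits x[i-1] aligned to y[j-1]
                cands = [(v["M"][i-1][j-1] + log(1 - 2*delta - tau), "M"),
                         (v["X"][i-1][j-1] + log(1 - eps - tau), "X"),
                         (v["Y"][i-1][j-1] + log(1 - eps - tau), "Y")]
                best, ptr["M"][i][j] = max(cands)
                v["M"][i][j] = best + log(e_match(x[i-1], y[j-1]))
            if i > 0:                        # X emits x[i-1] against a gap
                cands = [(v["M"][i-1][j] + log(delta), "M"),
                         (v["X"][i-1][j] + log(eps), "X")]
                best, ptr["X"][i][j] = max(cands)
                v["X"][i][j] = best + log(q)
            if j > 0:                        # Y emits y[j-1] against a gap
                cands = [(v["M"][i][j-1] + log(delta), "M"),
                         (v["Y"][i][j-1] + log(eps), "Y")]
                best, ptr["Y"][i][j] = max(cands)
                v["Y"][i][j] = best + log(q)

    # termination: every state moves to End with probability tau
    end_state = max("MXY", key=lambda s: v[s][n][m])
    best_logp = v[end_state][n][m] + log(tau)

    # trace back the M/X/Y state sequence of the best alignment
    path, s, i, j = [], end_state, n, m
    while i > 0 or j > 0:
        path.append(s)
        prev = ptr[s][i][j]
        if s == "M":
            i, j = i - 1, j - 1
        elif s == "X":
            i -= 1
        else:
            j -= 1
        s = prev
    return best_logp, "".join(reversed(path))

# Toy pair from the example slide:
print(pair_hmm_viterbi("ACTC", "TGGC"))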
Random model (figure)

The pair HMM ("Model") and the random model ("Random") (figure)
- "Model": states M, X and Y; M opens a gap with probability δ (to X or Y), a gap extends with probability ε, the gap states return to M with probability 1-ε-τ, M continues with probability 1-2δ-τ, and every state moves to End with probability τ.
- "Random": sequence x is emitted entirely from state X and sequence y from state Y; each state loops with probability 1-η and moves on (X to Y, Y to End) with probability η.

Transitions (figure)

Transitions (figure)

Optimal log-odds alignment (figure)

A pair HMM for local alignment (figure)

4.2 Probability of an alignment

Probability that a given pair of sequences are related (figure)

Summing the probabilities (figure)

4.3 Sub-optimal alignments

Finding suboptimal alignments
How do we sample alignments?

Finding distinct suboptimal alignments (figure)

Posterior probability that x_i is aligned to y_j
- Local accuracy of an alignment?
- A reliability measure for each part of an alignment
- The HMM as a local alignment measure
- Idea: P(all alignments through (x_i, y_j)) / P(all alignments of x and y)

Notation: x_i ◊ y_j means x_i is aligned to y_j
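Section 4.2 sums the probabilities of all alignments instead of keeping only the best one, and that total is also the denominator in the posterior idea above. The following is a minimal sketch of the forward algorithm for the same toy pair HMM, reusing the illustrative parameters of the Viterbi sketch (again, not values from the slides).

def pair_hmm_forward(x, y, delta=0.2, eps=0.1, tau=0.1,
                     p_match=0.75, p_mismatch=0.25/3, q=0.25):
    """P(x, y) summed over all alignments of the toy pair HMM.

    Plain probabilities are fine for short sequences; use log space or
    scaling for realistic lengths.  Parameters are illustrative.
    """
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0                                    # start behaves like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:                       # M emits the pair (x_i, y_j)
                e = p_match if x[i-1] == y[j-1] else p_mismatch
                fM[i][j] = e * ((1 - 2*delta - tau) * fM[i-1][j-1]
                                + (1 - eps - tau) * (fX[i-1][j-1] + fY[i-1][j-1]))
            if i > 0:                                 # X emits x_i against a gap
                fX[i][j] = q * (delta * fM[i-1][j] + eps * fX[i-1][j])
            if j > 0:                                 # Y emits y_j against a gap
                fY[i][j] = q * (delta * fM[i][j-1] + eps * fY[i][j-1])

    return tau * (fM[n][m] + fX[n][m] + fY[n][m])     # all states end with prob. tau

# Relative weight of any single alignment, e.g. the Viterbi path from the sketch above:
# exp(viterbi_log_prob) / pair_hmm_forward(x, y)
print(pair_hmm_forward("ACTC", "TGGC"))

A backward pass with the same structure then gives the posterior P(x_i ◊ y_j | x, y): multiply the forward and backward match values at cell (i, j) and divide by this total probability.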
Maximising the posterior probabilities (Miyazawa)
- It seems attractive to find an alignment by maximising P(x_i ◊ y_j).
- This may lead to inconsistencies: e.g. pairs (i1, j1) and (i2, j2) with i2 > i1 but j2 < j1.
- Remedy: restrict to pairs (i, j) for which P(x_i ◊ y_j) > 0.5.

The expected accuracy of an alignment
Expected overlap between an alignment π and paths sampled from the posterior distribution:
A(π) = Σ_{(i,j) in π} P(x_i ◊ y_j)
Dynamic programming:
A(i, j) = max { A(i-1, j-1) + P(x_i ◊ y_j),  A(i-1, j),  A(i, j-1) }

Pair HMMs versus FSAs for searching
- Scoring by the data likelihood P(D | M) is not the same as scoring by P(M | D), which is what we really want when searching.
- An HMM can be given maximum data likelihood by fitting its parameters (the transition and emission probabilities) to the data.
- Bayesian model comparison with a random model R.

Problems:
1. Most algorithms do not compute the full probability P(x, y | M) but only the best match (Viterbi path).
2. FSA parameters may not be readily translated into probabilities.

Example: a model whose parameters match the data need not be the best model (figure)
- Model S loops on itself with probability α and emits each residue a with background probability q_a, so P_S(abac) = α^4 q_a q_b q_a q_c.
- Model B is entered with probability 1-α and emits the fixed string a, b, a, c with probability 1 at each step, so P_B(abac) = 1-α.
- The example illustrates model comparison using the best match rather than the total probability.

Problem: no fixed scaling procedure can turn the scores of this model into the log probabilities of an HMM.

Bayesian model comparison: both HMMs have the same log-odds ratio as the previous FSA (figure).

Converting an FSA into a probabilistic model
- Probabilistic models may underperform standard alignment methods if Viterbi is used for database searching.
- But if the forward algorithm is used, they do better than standard methods.

Why try to use HMMs?
Many complicated alignment algorithms can be described as simple finite state machines. HMMs have many advantages:
- Parameters can be trained to fit the data: no need for PAM/BLOSUM matrices.
- HMMs can keep track of all alignments, not just the best one.

New things we can do with pair HMMs
- Compute the probability over all alignments.
- Compute the relative probability of the Viterbi alignment (or any other alignment).
- Sample alignments in proportion to their probability.
- Find distinct sub-optimal alignments.
- Compute the reliability of each part of the best alignment.
- Compute the maximally reliable alignment (a sketch of this follows below).
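The expected-accuracy recurrence shown earlier in this section can be run directly once posterior match probabilities are available (for example from the forward and backward passes over the pair HMM). A minimal sketch follows; the function name and the tiny posterior matrix in the example call are made up purely for illustration.

def maximally_reliable_alignment(post):
    """Expected-accuracy DP from the slides:
        A(i, j) = max(A(i-1, j-1) + P(x_i ◊ y_j), A(i-1, j), A(i, j-1))
    `post` is an n x m matrix of posterior match probabilities P(x_i ◊ y_j).
    Returns the best expected accuracy and the chosen aligned pairs (1-based).
    """
    n = len(post)
    m = len(post[0]) if post else 0
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(A[i-1][j-1] + post[i-1][j-1],   # align x_i to y_j
                          A[i-1][j],                      # leave x_i unaligned
                          A[i][j-1])                      # leave y_j unaligned
    # trace back the aligned pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if A[i][j] == A[i-1][j-1] + post[i-1][j-1]:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif A[i][j] == A[i-1][j]:
            i -= 1
        else:
            j -= 1
    return A[n][m], list(reversed(pairs))

# Hypothetical 2x3 posterior matrix, just to show the call:
print(maximally_reliable_alignment([[0.90, 0.05, 0.01],
                                    [0.02, 0.70, 0.20]]))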
Conclusion
Pair HMMs work better for sequence alignment and database searching than penalty-score-based alignment algorithms. Unfortunately both approaches are O(mn) and hence too slow for large database searches!

Questions