Lecture 7

Question
Thomas Jellema & Wouter Van Gool
Answer
Pairwise alignment using HMMs
Wouter van Gool and Thomas Jellema
Contents
• Most probable path (Thomas)
• Probability of an alignment (Thomas)
• Sub-optimal alignments (Thomas)
• Pause
• Posterior probability that x_i is aligned to y_j (Wouter)
• Pair HMMs versus FSAs for searching (Wouter)
• Conclusion and summary (Wouter)
• Questions
4.1 Most probable path
Model that emits a single sequence
Begin and end state
Model that emits a pairwise alignment
Example of a sequence
Seq1: A C T _ C
Seq2: T _ G G C
All : M X M Y M
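The mapping from alignment columns to states in the example above can be sketched in a few lines of Python. The function name `state_path` and the use of `_` as the gap symbol are assumptions made for this illustration: a column with two residues is an M, a residue in Seq1 over a gap is an X, and a gap over a residue in Seq2 is a Y.

```python
def state_path(row_x: str, row_y: str, gap: str = "_") -> str:
    """Translate two aligned rows into a pair-HMM state string (M, X, Y)."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    states = []
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of two gaps")
        if b == gap:
            states.append("X")   # residue in Seq1 only
        elif a == gap:
            states.append("Y")   # residue in Seq2 only
        else:
            states.append("M")   # residues in both sequences
    return "".join(states)


print(state_path("ACT_C", "T_GGC"))  # MXMYM
```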
Begin and end state
Finding the most probable path
- The path you choose is the path that has the highest probability of being the correct alignment.
- The state we choose to be part of the alignment has to be the state with the highest probability of being correct.
- We calculate the probability of the state being an M, X or Y and choose the one with the highest probability.
- If the probability of ending the alignment is higher than that of the next state being an M, X or Y, then we end the alignment.
The probability of entering the M state is the highest of the probabilities of:
1. previous state X, new state M
2. previous state Y, new state M
3. previous state M, new state M
Probability of going to the M state
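The formula on this slide did not survive extraction. A hedged reconstruction in the standard pair-HMM notation is given here. It assumes v^S(i,j) denotes the probability of the best path that ends in state S after emitting x_1..x_i and y_1..y_j, that p_ab is the match emission probability, and that the transition probabilities are 1−2δ−τ (M to M) and 1−ε−τ (X or Y to M):

```latex
v^{M}(i,j) = p_{x_i y_j} \, \max
\begin{cases}
(1 - 2\delta - \tau)\, v^{M}(i-1,\, j-1) \\
(1 - \varepsilon - \tau)\, v^{X}(i-1,\, j-1) \\
(1 - \varepsilon - \tau)\, v^{Y}(i-1,\, j-1)
\end{cases}
```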
Viterbi algorithm for pair HMMs
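The algorithm itself was shown as an image. The following is a minimal sketch of a Viterbi recursion for a three-state pair HMM, written in log space. It assumes the begin state behaves like M, and the parameter values (`p`, `q`, 0.1, 0.3, 0.1) are made up for illustration only.

```python
import math

NEG_INF = float("-inf")


def viterbi_pair_hmm(x, y, p, q, delta, eps, tau):
    """Return (log-probability, state path) of the most probable alignment."""
    n, m = len(x), len(y)
    states = ["M", "X", "Y"]
    # v[s][i][j]: best log-probability of emitting x[:i], y[:j] and ending in s
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in states}
    ptr = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in states}
    v["M"][0][0] = 0.0  # log 1: the begin state is treated like M
    to_m = {"M": math.log(1 - 2 * delta - tau),
            "X": math.log(1 - eps - tau),
            "Y": math.log(1 - eps - tau)}
    log_d, log_e = math.log(delta), math.log(eps)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match: consume x[i-1] and y[j-1]
                cand = {s: v[s][i - 1][j - 1] + to_m[s] for s in states}
                best = max(cand, key=cand.get)
                v["M"][i][j] = math.log(p[(x[i - 1], y[j - 1])]) + cand[best]
                ptr["M"][i][j] = best
            if i > 0:  # X: consume x[i-1] only
                cand = {"M": v["M"][i - 1][j] + log_d,
                        "X": v["X"][i - 1][j] + log_e}
                best = max(cand, key=cand.get)
                v["X"][i][j] = math.log(q[x[i - 1]]) + cand[best]
                ptr["X"][i][j] = best
            if j > 0:  # Y: consume y[j-1] only
                cand = {"M": v["M"][i][j - 1] + log_d,
                        "Y": v["Y"][i][j - 1] + log_e}
                best = max(cand, key=cand.get)
                v["Y"][i][j] = math.log(q[y[j - 1]]) + cand[best]
                ptr["Y"][i][j] = best

    # Termination: best final state, then the transition to END.
    state = max(states, key=lambda s: v[s][n][m])
    score = v[state][n][m] + math.log(tau)

    # Traceback from (n, m) to (0, 0).
    i, j, path = n, m, []
    while i > 0 or j > 0:
        path.append(state)
        prev = ptr[state][i][j]
        if state == "M":
            i, j = i - 1, j - 1
        elif state == "X":
            i -= 1
        else:
            j -= 1
        state = prev
    return score, "".join(reversed(path))


# Hypothetical parameters, chosen only so that the probabilities sum to 1.
ALPHABET = "ACGT"
p = {(a, b): (0.19 if a == b else 0.02) for a in ALPHABET for b in ALPHABET}
q = {a: 0.25 for a in ALPHABET}

score, path = viterbi_pair_hmm("ACGT", "ACGT", p, q, 0.1, 0.3, 0.1)
print(path)  # MMMM
```

With these assumed parameters, two identical sequences are aligned with matches only, since every gapped path needs at least one X and one Y and each gap opening costs a factor δ.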
Finding the most probable path using FSAs
- The most probable path is also the optimal FSA alignment
Recurrence relations
The log-odds scoring function
- We wish to know whether the alignment score is above or below the score of a random alignment.
- The log-odds ratio is s(a,b) = log(p_ab / (q_a q_b)).
- log(p_ab / (q_a q_b)) > 0 iff the probability that a and b are related under our model is larger than the probability that they were picked at random.
Random model
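[Figure: state diagrams of the two models being compared. "Model": the pair HMM with states M, X, Y and END and transition labels 1−2δ−τ, δ, ε, 1−ε−τ and τ. "Random": states X, Y and END with transition labels η and 1−η.]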
Transitions
Optimal log-odds alignment
A pair HMM for local alignment
4.2 Probability of an alignment
Probability that a given pair of sequences is related.
Summing the probabilities
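Summing over all alignments replaces the maximisation of the Viterbi recursion by a sum, which gives the forward algorithm. The sketch below assumes the usual three-state pair HMM with gap-open probability δ, gap-extend probability ε and end probability τ. It works in plain probability space, which is only safe for short sequences, and all parameter values are invented for illustration.

```python
import math


def forward_pair_hmm(x, y, p, q, delta, eps, tau):
    """Return P(x, y): the probability summed over all alignments."""
    n, m = len(x), len(y)
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    f["M"][0][0] = 1.0  # the begin state is treated like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f["M"][i][j] = p[(x[i - 1], y[j - 1])] * (
                    (1 - 2 * delta - tau) * f["M"][i - 1][j - 1]
                    + (1 - eps - tau) * (f["X"][i - 1][j - 1]
                                         + f["Y"][i - 1][j - 1]))
            if i > 0:
                f["X"][i][j] = q[x[i - 1]] * (
                    delta * f["M"][i - 1][j] + eps * f["X"][i - 1][j])
            if j > 0:
                f["Y"][i][j] = q[y[j - 1]] * (
                    delta * f["M"][i][j - 1] + eps * f["Y"][i][j - 1])
    # Termination: any state may move to END with probability tau.
    return tau * (f["M"][n][m] + f["X"][n][m] + f["Y"][n][m])


# Hypothetical parameters, chosen only so that the probabilities sum to 1.
ALPHABET = "ACGT"
p = {(a, b): (0.19 if a == b else 0.02) for a in ALPHABET for b in ALPHABET}
q = {a: 0.25 for a in ALPHABET}

print(forward_pair_hmm("ACGT", "AGT", p, q, 0.1, 0.3, 0.1))
```

For real sequence lengths one would typically work with scaled values or in log space to avoid underflow.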
4.3 Suboptimal alignment
Finding suboptimal alignments
How to make sample alignments?
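One common way to produce sample alignments is a probabilistic traceback through the forward table: at every step the previous state is drawn in proportion to its contribution to the current forward value, instead of always taking the maximum. The sketch below assumes the three-state pair HMM with parameters δ, ε and τ. The function names and all numeric values are illustrative assumptions.

```python
import random


def forward_tables(x, y, p, q, delta, eps, tau):
    """Forward values f[s][i][j] for the states M, X and Y."""
    n, m = len(x), len(y)
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    f["M"][0][0] = 1.0  # the begin state is treated like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f["M"][i][j] = p[(x[i - 1], y[j - 1])] * (
                    (1 - 2 * delta - tau) * f["M"][i - 1][j - 1]
                    + (1 - eps - tau) * (f["X"][i - 1][j - 1]
                                         + f["Y"][i - 1][j - 1]))
            if i > 0:
                f["X"][i][j] = q[x[i - 1]] * (
                    delta * f["M"][i - 1][j] + eps * f["X"][i - 1][j])
            if j > 0:
                f["Y"][i][j] = q[y[j - 1]] * (
                    delta * f["M"][i][j - 1] + eps * f["Y"][i][j - 1])
    return f


def sample_alignment(x, y, p, q, delta, eps, tau, rng):
    """Draw one state path with probability proportional to its likelihood."""
    f = forward_tables(x, y, p, q, delta, eps, tau)
    states = ["M", "X", "Y"]
    to_m = {"M": 1 - 2 * delta - tau, "X": 1 - eps - tau, "Y": 1 - eps - tau}
    i, j = len(x), len(y)
    # The END transition tau is the same for every state, so it cancels.
    state = rng.choices(states, weights=[f[s][i][j] for s in states])[0]
    path = []
    while i > 0 or j > 0:
        path.append(state)
        if state == "M":
            i, j = i - 1, j - 1
            weights = [to_m[s] * f[s][i][j] for s in states]
        elif state == "X":
            i -= 1
            weights = [delta * f["M"][i][j], eps * f["X"][i][j], 0.0]
        else:
            j -= 1
            weights = [delta * f["M"][i][j], 0.0, eps * f["Y"][i][j]]
        state = rng.choices(states, weights=weights)[0]
    return "".join(reversed(path))


# Hypothetical parameters, chosen only so that the probabilities sum to 1.
ALPHABET = "ACGT"
p = {(a, b): (0.19 if a == b else 0.02) for a in ALPHABET for b in ALPHABET}
q = {a: 0.25 for a in ALPHABET}

rng = random.Random(0)
for _ in range(3):
    print(sample_alignment("ACGT", "AGT", p, q, 0.1, 0.3, 0.1, rng))
```

Repeated calls give different state paths, with high-probability alignments appearing most often.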
Finding distinct suboptimal alignments
Posterior probability that x_i is aligned to y_j
Local accuracy of an alignment?
- Reliability measure for each part of an alignment
- HMM as a local alignment measure
- Idea: P(all alignments through (x_i, y_j)) / P(all alignments of (x, y))
Notation: x_i ◊ y_j means x_i is aligned to y_j
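The formulas on the following slides were images and are not recoverable from this text. As a hedged reminder of the standard result, the posterior can be written with forward and backward values. The symbols f^M (forward value of the match state) and b^M (backward value of the match state) are assumptions of this sketch, not notation preserved from the slides:

```latex
P(x_i \diamond y_j \mid x, y)
  = \frac{P(x, y, x_i \diamond y_j)}{P(x, y)}
  = \frac{f^{M}(i,j)\, b^{M}(i,j)}{P(x, y)}
```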
Probability alignment
- Miyazawa: it seems attractive to find the alignment by maximising P(x_i ◊ y_j)
- May lead to inconsistencies: e.g. pairs (i1,j1) and (i2,j2) with i2 > i1 and j2 < j1
- Restricting to pairs (i,j) for which P(x_i ◊ y_j) > 0.5 avoids this
The expected accuracy of an alignment
Expected overlap between π and paths sampled from the
posterior distribution
$A(\pi) = \sum_{(i,j) \in \pi} P(x_i \diamond y_j)$
Dynamic programming:
$A(i,j) = \max \{\, A(i-1,j-1) + P(x_i \diamond y_j),\; A(i-1,j),\; A(i,j-1) \,\}$
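The dynamic programme for the expected accuracy, A(i,j) = max{A(i−1,j−1) + P(x_i ◊ y_j), A(i−1,j), A(i,j−1)}, is a standard alignment recursion with the posterior probabilities as match scores and no gap cost. A minimal sketch follows. The posterior matrix `post` is made up for illustration, and the function name is an assumption.

```python
def max_expected_accuracy(post):
    """post[i-1][j-1] holds P(x_i aligned to y_j).

    Returns the best accuracy A(n, m) and the aligned (i, j) pairs (1-based).
    """
    n = len(post)
    m = len(post[0]) if n else 0
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(A[i - 1][j - 1] + post[i - 1][j - 1],
                          A[i - 1][j],
                          A[i][j - 1])
    # Traceback: recover which pairs contributed to A(n, m).
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if A[i][j] == A[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif A[i][j] == A[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return A[n][m], pairs[::-1]


# Hypothetical posterior probabilities for two sequences of length 2.
post = [[0.9, 0.05],
        [0.1, 0.8]]
accuracy, pairs = max_expected_accuracy(post)
print(pairs)  # [(1, 1), (2, 2)]
```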
Pair HMMs versus FSAs for searching
P(D | M) > P(M | D)
- HMM: the data likelihood is maximal when the model has the same parameters (i.e. transition and emission probabilities) as the data
- Bayesian model comparison with the random model R
Problems:
1. Most algorithms do not compute the full probability P(x,y | M) but only the best match or Viterbi path
2. FSA parameters may not be readily translated into probabilities
Example: a model whose parameters match
the data need not be the best model
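[Figure: two competing sub-models for the sequence abac: state S (emission probabilities q_a, transition probability α) and chain B (entered with probability 1−α, then emitting a, b, a, c with probability 1 each).]
$P_S(abac) = \alpha^4 q_a q_b q_a q_c$
$P_B(abac) = 1 - \alpha$
Model comparison using the best match rather than the total probability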
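The two path probabilities of this example can be evaluated numerically to see how a best-match comparison differs from one based on the total probability. The values of α and q below are purely hypothetical, and the helper names are assumptions of this sketch.

```python
import math


def prob_s(seq, alpha, q):
    """P_S(seq): every symbol costs one transition alpha and one emission q."""
    prob = 1.0
    for sym in seq:
        prob *= alpha * q[sym]
    return prob


def prob_b(seq, alpha, target="abac"):
    """P_B(seq): branch B is entered with 1 - alpha and emits only `target`."""
    return (1 - alpha) if seq == target else 0.0


# Hypothetical parameter values.
alpha = 0.9
q = {"a": 0.5, "b": 0.25, "c": 0.25}

ps = prob_s("abac", alpha, q)
pb = prob_b("abac", alpha)
best_match = max(ps, pb)   # what a Viterbi-style search reports
total = ps + pb            # what the full probability would be
print(f"P_S={ps:.6f}  P_B={pb:.6f}  best={best_match:.6f}  total={total:.6f}")
```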
Problem: no fixed scaling procedure can make
the scores of this model into the log
probabilities of an HMM
Bayesian model comparison: both HMMs have the same log-odds ratio as the previous FSA
Conversion of an FSA into a probabilistic model
- Probabilistic models may underperform standard alignment methods if Viterbi is used for database searching.
- But if the forward algorithm is used, they would be better than standard methods.
Why try to use HMMs?
Many complicated alignment algorithms can be described as simple finite state machines.
HMMs have many advantages:
- Parameters can be trained to fit the data: no need for PAM/BLOSUM matrices
- HMMs can keep track of all alignments, not just the best one
New things we can do with pair HMMs
- Compute the probability over all alignments.
- Compute the relative probability of the Viterbi alignment (or any other alignment).
- Sample over all alignments in proportion to their probability.
- Find distinct sub-optimal alignments.
- Compute the reliability of each part of the best alignment.
- Compute the maximally reliable alignment.
Conclusion
- Pair HMMs work better for sequence alignment and database search than penalty-score-based alignment algorithms.
- Unfortunately both approaches are O(mn) and hence too slow for large database searches!