Hidden Markov Models

Pairwise Alignments
Hidden Markov Models



Finite state automata with multiple states provide a convenient
description of complex dynamic programming algorithms for
pairwise alignment.
Converting the FSA into an HMM gives a basis for probabilistic
modelling of the gapped alignment process.
Advantages:
1) the resulting probabilistic model can be used to explore the
reliability of an alignment and to examine alternative alignments
2) weighting all alternative alignments probabilistically yields
similarity scores that are independent of any specific alignment
Hidden Markov Models
[Figure: state diagram of the pair HMM. The Begin state B leads into a match state M with emission probabilities $p_{x_i y_j}$ and two insert states X and Y with emission probabilities $q_{x_i}$ and $q_{y_j}$. M moves to X or Y with probability δ and stays in M with probability 1-2δ-τ; X and Y loop with probability ε and return to M with probability 1-ε-τ; each of M, X and Y moves to the End state E with probability τ.]
Hidden Markov Models



Pair Hidden Markov Models generate an aligned pair of
sequences
Start in the Begin state B and cycle over the following two
steps:
1) pick the next state according to the probability distribution over
transitions leaving the current state
2) pick a symbol pair to be added to the alignment according to
the emission probability distribution of the new state
Stop when a transition into the End state E is made
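A minimal Python sketch of this generative cycle, assuming the Begin state shares M's outgoing transitions and using arbitrary toy values for δ, ε, τ and uniform emissions (none of these numbers come from the slides):

import random

DELTA, EPSILON, TAU = 0.2, 0.3, 0.1   # toy parameters, assumptions only
ALPHABET = "ACGT"                     # toy alphabet with uniform emissions

def sample_pair_alignment():
    """Generate an aligned pair of sequences from the pair HMM."""
    x, y, columns = [], [], []
    state = "M"   # stands in for B, assumed to transition like M
    while True:
        r = random.random()
        # Step 1: pick the next state from the current state's distribution.
        if state == "M":
            if r < DELTA:
                state = "X"
            elif r < 2 * DELTA:
                state = "Y"
            elif r < 2 * DELTA + TAU:
                break                 # transition into the End state E
            # else stay in M, probability 1 - 2*delta - tau
        else:                         # insert state X or Y
            if r < EPSILON:
                pass                  # stay in the insert state
            elif r < 1 - TAU:
                state = "M"           # back to M, probability 1 - eps - tau
            else:
                break                 # into E, probability tau
        # Step 2: emit a symbol pair according to the new state.
        if state == "M":
            a, b = random.choice(ALPHABET), random.choice(ALPHABET)
            x.append(a); y.append(b); columns.append((a, b))
        elif state == "X":
            a = random.choice(ALPHABET)
            x.append(a); columns.append((a, "-"))
        else:
            b = random.choice(ALPHABET)
            y.append(b); columns.append(("-", b))
    return "".join(x), "".join(y), columns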
Hidden Markov Models






State M has emission probability distribution $p_{ab}$ for emitting an
aligned pair $a{:}b$.
States X and Y have distributions $q_{x_i}$ and $q_{y_j}$ for emitting
symbol $x_i$ from sequence $x$, or symbol $y_j$ from sequence $y$,
against a gap.
The transition probability from M to an insert state X or Y is
denoted δ, and the probability of staying in an insert state ε.
The probability of a transition into the End state is denoted τ.
All algorithms discussed so far carry across to pair HMMs.
The total probability of generating a particular alignment is simply
the product of the probabilities of each individual step.
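For example, assuming the Begin state uses the same outgoing transitions as M, the path $B \to M \to M \to X \to E$, which matches $x_1{:}y_1$ and $x_2{:}y_2$ and then inserts $x_3$ against a gap, has probability
$$P = (1-2\delta-\tau)\,p_{x_1 y_1} \cdot (1-2\delta-\tau)\,p_{x_2 y_2} \cdot \delta\,q_{x_3} \cdot \tau$$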
Hidden Markov Models
Viterbi Algorithm for pair HMMs


Initialisation: $v^M(0,0) = 1$; all other $v^{\bullet}(i,0)$ and $v^{\bullet}(0,j)$ are $0$.
Recurrence: for $i = 1,\dots,n$, $j = 1,\dots,m$:
$$v^M(i,j) = p_{x_i y_j} \max \begin{cases} (1-2\delta-\tau)\, v^M(i-1,j-1) \\ (1-\varepsilon-\tau)\, v^X(i-1,j-1) \\ (1-\varepsilon-\tau)\, v^Y(i-1,j-1) \end{cases}$$
$$v^X(i,j) = q_{x_i} \max \begin{cases} \delta\, v^M(i-1,j) \\ \varepsilon\, v^X(i-1,j) \end{cases}$$
$$v^Y(i,j) = q_{y_j} \max \begin{cases} \delta\, v^M(i,j-1) \\ \varepsilon\, v^Y(i,j-1) \end{cases}$$
Termination: $v^E = \tau \max\bigl(v^M(n,m),\, v^X(n,m),\, v^Y(n,m)\bigr)$
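A direct Python sketch of this recurrence in probability space (no traceback; the emission tables p and q and the parameter names are assumptions for illustration, and a real implementation would work in log space to avoid underflow):

def pair_hmm_viterbi(x, y, p, q, delta, epsilon, tau):
    """Fill v^M, v^X, v^Y exactly as in the recurrence above."""
    n, m = len(x), len(y)
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0                    # initialisation: v^M(0,0) = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:       # match step consumes x_i and y_j
                vM[i][j] = p[x[i-1]][y[j-1]] * max(
                    (1 - 2*delta - tau) * vM[i-1][j-1],
                    (1 - epsilon - tau) * vX[i-1][j-1],
                    (1 - epsilon - tau) * vY[i-1][j-1])
            if i > 0:                 # X emits x_i against a gap
                vX[i][j] = q[x[i-1]] * max(delta * vM[i-1][j],
                                           epsilon * vX[i-1][j])
            if j > 0:                 # Y emits y_j against a gap
                vY[i][j] = q[y[j-1]] * max(delta * vM[i][j-1],
                                           epsilon * vY[i][j-1])
    return tau * max(vM[n][m], vX[n][m], vY[n][m])   # termination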
Hidden Markov Models
Probabilistic model for a random alignment
[Figure: state diagram of the random model R. State X (emissions $q_{x_i}$) loops with probability 1-η and is left with probability η; a silent state gathers input from the X and Begin states and passes on to state Y (emissions $q_{y_j}$), which loops with probability 1-η and moves with probability η to the End state E.]
Hidden Markov Model



The main states X and Y emit the two sequences independently
The silent state does not emit any symbol but gathers input from the X
and Begin states
The probability of a pair of sequences according to the random model is
P  x, y | R    1   
n
n
 q  1    q
m
xi
i 1
  1   
m
nm
j 1
n
m
q q
i 1
xi
j 1
xj
xj
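In log space this factorisation becomes a simple sum; a small Python check (the names q and eta are assumed inputs):

import math

def log_prob_random(x, y, q, eta):
    """log P(x,y|R) = 2 log(eta) + (n+m) log(1-eta) + emission terms."""
    return (2 * math.log(eta)
            + (len(x) + len(y)) * math.log(1 - eta)
            + sum(math.log(q[a]) for a in x)
            + sum(math.log(q[b]) for b in y))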
Hidden Markov Model





We allocate the terms of this expression to the steps that make up
the probability of the Viterbi alignment, so that the log-odds ratio
becomes a sum of individual log-odds terms.
Allocate one factor of (1-η) and the corresponding $q_a$ factor to
each residue that is emitted in a Viterbi step.
A match step is therefore allocated $(1-\eta)^2 q_a q_b$, where $a$
and $b$ are the residues matched.
An insert step is allocated $(1-\eta)\,q_a$, where $a$ is the residue
inserted.
Since the Viterbi path must account for all residues, exactly (n+m)
such terms will be used.
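Dividing each Viterbi step by the random-model terms allocated to it then gives per-step odds ratios; for a match step following a match, for example,
$$\frac{(1-2\delta-\tau)\,p_{ab}}{(1-\eta)^2\,q_a q_b},$$
whose logarithm is exactly the substitution score $s(a,b)$ introduced below.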
Hidden Markov Model




We can now compute with an additive model of log-odds emission
scores and log-odds transition scores.
In practice this is the most convenient way to implement pair HMMs.
Merging the emission scores with the transitions produces scores
that correspond to the standard terms used in sequence alignment
by dynamic programming.
The log-odds version of the Viterbi alignment algorithm can then be
given in a form that looks like standard pairwise dynamic
programming.
Hidden Markov Models
1  2   

pab
s  a, b   log
 log
2
qa qb
1   
 1     
d   log
1   1  2   
e   log

1 
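A small helper that derives these scores from the HMM parameters (function and argument names are illustrative only):

import math

def log_odds_scores(p, q, delta, epsilon, tau, eta):
    """Turn pair-HMM parameters into alignment scores s(a,b), d, e."""
    s = {(a, b): math.log(p[a][b] / (q[a] * q[b]))
                 + math.log((1 - 2*delta - tau) / (1 - eta)**2)
         for a in q for b in q}
    d = -math.log(delta * (1 - epsilon - tau)
                  / ((1 - eta) * (1 - 2*delta - tau)))   # gap-open penalty
    e = -math.log(epsilon / (1 - eta))                   # gap-extend penalty
    return s, d, e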
Hidden Markov Model
Optimal log-odds alignment

Initialisation: $V^M(0,0) = -2\log\eta$, $V^X(0,0) = V^Y(0,0) = -\infty$; all $V^{\bullet}(i,-1)$ and $V^{\bullet}(-1,j)$ are $-\infty$.
Recursion: for $i = 0,\dots,n$, $j = 0,\dots,m$, except $(0,0)$:
$$V^M(i,j) = s(x_i,y_j) + \max \begin{cases} V^M(i-1,j-1) \\ V^X(i-1,j-1) \\ V^Y(i-1,j-1) \end{cases}$$
$$V^X(i,j) = \max \begin{cases} V^M(i-1,j) - d \\ V^X(i-1,j) - e \end{cases}$$
$$V^Y(i,j) = \max \begin{cases} V^M(i,j-1) - d \\ V^Y(i,j-1) - e \end{cases}$$
Termination: $V = \max\bigl(V^M(n,m),\, V^X(n,m) + c,\, V^Y(n,m) + c\bigr)$
Hidden Markov Model

The constant c in the termination has the value
c  log 1  2    log 1     

The procedure shows how, for any pair HMM, we can derive an
equivalent finite state automaton for obtaining the most
probable alignment.
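Putting the pieces together, a sketch of this log-odds recursion as standard affine-gap dynamic programming (the score table s is keyed by residue pairs as produced above; all names are assumptions):

import math

def log_odds_align(x, y, s, d, e, delta, epsilon, tau, eta):
    """Log-odds Viterbi over the three matrices V^M, V^X, V^Y."""
    n, m = len(x), len(y)
    NEG = float("-inf")
    VM = [[NEG] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = -2 * math.log(eta)     # initialisation
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:       # match step
                VM[i][j] = s[x[i-1], y[j-1]] + max(
                    VM[i-1][j-1], VX[i-1][j-1], VY[i-1][j-1])
            if i > 0:                 # gap in y: open from M, extend in X
                VX[i][j] = max(VM[i-1][j] - d, VX[i-1][j] - e)
            if j > 0:                 # gap in x: open from M, extend in Y
                VY[i][j] = max(VM[i][j-1] - d, VY[i][j-1] - e)
    c = math.log(1 - 2*delta - tau) - math.log(1 - epsilon - tau)
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)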