Hidden Markov Models

Markov Chains
Hidden Markov Models

Review

A Markov chain can solve the CpG island finding problem
using a positive model and a negative model.

But what about the length and boundaries of an island? Solution: use a combined model.
Hidden Markov Models






The essential difference between a Markov chain and a hidden Markov model is that in a hidden Markov model there is no longer a one-to-one correspondence between the states and the symbols (hence "hidden").
It is no longer possible to tell which state the model was in when $x_i$ was generated just by looking at $x_i$.
In the previous example, there is no way to tell by looking at a single symbol C in isolation whether it was emitted by state C+ or C-.
Many states can emit the same letter, and one state can emit many letters.
We now have to distinguish the sequence of states from the sequence of symbols.
Hidden Markov Models

States: a path of states $\pi = \pi_1, \pi_2, \ldots, \pi_n$
Observable symbols: A, C, G, T; the observed sequence $X = x_1, x_2, \ldots, x_n$
Transition probabilities: $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$
Emission probabilities: $e_k(b) = P(x_i = b \mid \pi_i = k)$
The HMM decouples the states from the observable symbols.
Hidden Markov Models


We can think of an HMM as a generative model that generates, or emits, sequences.
First, state $\pi_1$ is selected (either randomly or according to some prior probabilities); then symbol $x_1$ is emitted at state $\pi_1$ with probability $e_{\pi_1}(x_1)$. The model then transits to state $\pi_2$ with probability $a_{\pi_1 \pi_2}$, and so on. A small sampling sketch is given below.
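This sampling process can be written as a few lines of Python (illustrative only, not from the slides; the function name sample_hmm and the dictionary-based parameter format are assumptions):

    import random

    def sample_hmm(length, init, trans, emit):
        """Sample (path, symbols) from an HMM.

        init:  dict state -> prior probability of starting in that state
        trans: dict state -> dict state -> transition probability
        emit:  dict state -> dict symbol -> emission probability
        """
        def draw(dist):
            # Draw one key from a {outcome: probability} dictionary.
            r, acc = random.random(), 0.0
            for outcome, p in dist.items():
                acc += p
                if r < acc:
                    return outcome
            return outcome  # guard against floating-point rounding

        path, symbols = [], []
        state = draw(init)                      # select the first state pi_1
        for _ in range(length):
            path.append(state)
            symbols.append(draw(emit[state]))   # emit x_i with probability e_state(x_i)
            state = draw(trans[state])          # transit to the next state
        return path, symbols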
Hidden Markov Models





X: G C A T A G C G G C T A G C T G A A T A G G A ...
P: G+ C+ A+ T+ A+ G+ C+ G+ G+ C+ T+ A+ G+ C+ T- G- A- A- T- A- G- G- A- ...
Now it is the path of hidden states that we want to find out.
Many paths can be used to generate X; we want to find the most likely one.
There are several ways to do this:
1. Brute force
2. Dynamic programming
We will talk about them later.
The occasionally dishonest casino


A casino uses a fair die most of the time, but occasionally switches to a loaded one.
  Fair die:   Prob(1) = Prob(2) = ... = Prob(6) = 1/6
  Loaded die: Prob(1) = Prob(2) = ... = Prob(5) = 1/10, Prob(6) = 1/2
  These are the emission probabilities at the two states, loaded and fair.
Transition probabilities:
  Prob(Fair → Loaded) = 0.01
  Prob(Loaded → Fair) = 0.2
  Transitions between states obey a Markov process.
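For use in the later examples, here is one way to write these parameters down in Python (illustrative; the dictionary names init, trans, emit and the uniform start probabilities, which match the a_0F = a_0L = 0.5 used in the worked 6,2,6 example further below, are assumptions):

    # Casino HMM parameters (F = fair die, L = loaded die).
    init = {'F': 0.5, 'L': 0.5}    # assumed start probabilities

    trans = {
        'F': {'F': 0.99, 'L': 0.01},   # Prob(Fair -> Loaded) = 0.01
        'L': {'F': 0.20, 'L': 0.80},   # Prob(Loaded -> Fair) = 0.2
    }

    emit = {
        'F': {str(i): 1/6 for i in range(1, 7)},                 # fair die
        'L': {**{str(i): 0.1 for i in range(1, 6)}, '6': 0.5},   # loaded die
    }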
A HMM for the occasionally dishonest casino
The occasionally dishonest casino


The casino won't tell you when they use the fair or the loaded die.
Known:
  The structure of the model
  The transition probabilities
Hidden: what the casino did
  FFFFFLLLLLLLFFFF...
Observable: the series of die tosses
  3415256664666153...
What we must infer:
  When was the fair die used?
  When was the loaded die used?
  The answer is a sequence of states: FFFFFFFLLLLLLFFF...
Making the inference

The model assigns a probability to each explanation of the observation:
P(326 | FFL)
  = P(3|F) · P(F→F) · P(2|F) · P(F→L) · P(6|L)
  = 1/6 · 0.99 · 1/6 · 0.01 · 1/2
  ≈ 1.4 × 10^-4
Notation

x is the sequence of symbols emitted by the model
  $x_i$ is the symbol emitted at time i
A path, $\pi$, is a sequence of states
  The i-th state in $\pi$ is $\pi_i$
$a_{kr}$ is the probability of making a transition from state k to state r:
  $a_{kr} = \Pr(\pi_i = r \mid \pi_{i-1} = k)$
$e_k(b)$ is the probability that symbol b is emitted when in state k:
  $e_k(b) = \Pr(x_i = b \mid \pi_i = k)$
A path of a sequence
[Trellis diagram: a begin state 0 and states 1, 2, ..., K at each position, emitting x_1, x_2, x_3, ..., x_L]

$$\Pr(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$$

(with $\pi_{L+1} = 0$, the end state, by convention)
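Read literally, this product can be computed with a short Python sketch (illustrative; the function name joint_prob is an assumption, and the end transition $a_{\pi_L 0}$ is treated as 1):

    def joint_prob(x, path, init, trans, emit):
        """Pr(x, path) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_i+1}.

        x:    sequence of observed symbols
        path: sequence of states (same length as x)
        The end transition a_{pi_L, 0} is omitted here (treated as 1).
        """
        p = init[path[0]]                       # a_{0, pi_1}
        for i, (symbol, state) in enumerate(zip(x, path)):
            p *= emit[state][symbol]            # e_{pi_i}(x_i)
            if i + 1 < len(path):
                p *= trans[state][path[i + 1]]  # a_{pi_i, pi_{i+1}}
        return p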
The occasionally dishonest casino
$x = x_1, x_2, x_3 = 6, 2, 6$

$\pi^{(1)} = FFF$:
$\Pr(x, \pi^{(1)}) = a_{0F}\, e_F(6)\, a_{FF}\, e_F(2)\, a_{FF}\, e_F(6) = 0.5 \cdot \tfrac{1}{6} \cdot 0.99 \cdot \tfrac{1}{6} \cdot 0.99 \cdot \tfrac{1}{6} \approx 0.00227$

$\pi^{(2)} = LLL$:
$\Pr(x, \pi^{(2)}) = a_{0L}\, e_L(6)\, a_{LL}\, e_L(2)\, a_{LL}\, e_L(6) = 0.5 \cdot 0.5 \cdot 0.8 \cdot 0.1 \cdot 0.8 \cdot 0.5 = 0.008$

$\pi^{(3)} = LFL$:
$\Pr(x, \pi^{(3)}) = a_{0L}\, e_L(6)\, a_{LF}\, e_F(2)\, a_{FL}\, e_L(6) = 0.5 \cdot 0.5 \cdot 0.2 \cdot \tfrac{1}{6} \cdot 0.01 \cdot 0.5 \approx 0.0000417$
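Using the casino dictionaries and the joint_prob sketch introduced above (illustrative, not part of the slides), these three values can be checked directly:

    x = "626"
    for path in ("FFF", "LLL", "LFL"):
        print(path, joint_prob(x, path, init, trans, emit))
    # Expected (up to floating-point rounding):
    # FFF ≈ 0.00227
    # LLL = 0.008
    # LFL ≈ 4.17e-05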
The most probable path
The most likely path $\pi^*$ satisfies
$$\pi^* = \arg\max_{\pi} \Pr(x, \pi)$$
To find $\pi^*$, consider all possible ways the last symbol of x could have been emitted.
Let
$$v_k(i) = \text{probability of the path } \pi_1, \ldots, \pi_i \text{ most likely to emit } x_1, \ldots, x_i \text{ such that } \pi_i = k$$
Then
$$v_k(i) = e_k(x_i) \max_{r} \left[ v_r(i-1)\, a_{rk} \right]$$
The Viterbi Algorithm


The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed symbols.
Assumptions:
1. Both the observed symbols and the hidden states form sequences.
2. These two sequences are aligned: each observed symbol corresponds to exactly one hidden state.
3. Computing the most likely sequence of hidden states (path) up to a certain point t depends only on the observed symbol at point t and the most likely sequence of hidden states (path) up to point t - 1.
These assumptions are all satisfied in a first-order hidden Markov model.
The Viterbi Algorithm

Initialization (i = 0):
$$v_0(0) = 1, \quad v_k(0) = 0 \text{ for } k \neq 0$$
Recursion (i = 1, ..., L): for each state k
$$v_k(i) = e_k(x_i) \max_{r} \left[ v_r(i-1)\, a_{rk} \right]$$
$$\mathrm{ptr}_i(l) = \arg\max_{k} \left[ v_k(i-1)\, a_{kl} \right]$$
Termination:
$$P(x, \pi^*) = \max_{k} \left[ v_k(L)\, a_{k0} \right]$$
$$\pi^*_L = \arg\max_{k} \left[ v_k(L)\, a_{k0} \right]$$
To find $\pi^*$, use trace-back (i = L, ..., 1), as in dynamic programming. A Python sketch follows.
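A minimal Python sketch of this recursion and trace-back, assuming the dictionary-based parameters used earlier and treating the end transition a_{k0} as 1 (no explicit end state):

    def viterbi(x, init, trans, emit):
        """Return (best_path, best_prob) for observed sequence x.

        The end transition a_{k0} is treated as 1 (no explicit end state).
        """
        states = list(init)
        # v[i][k] = prob. of the most likely path emitting x[:i+1] and ending in state k
        v = [{k: init[k] * emit[k][x[0]] for k in states}]
        ptr = [{}]                                   # back-pointers
        for i in range(1, len(x)):
            v.append({})
            ptr.append({})
            for k in states:
                best_r = max(states, key=lambda r: v[i - 1][r] * trans[r][k])
                v[i][k] = emit[k][x[i]] * v[i - 1][best_r] * trans[best_r][k]
                ptr[i][k] = best_r
        # Termination and trace-back
        last = max(states, key=lambda k: v[-1][k])
        best_prob = v[-1][last]
        path = [last]
        for i in range(len(x) - 1, 0, -1):
            path.append(ptr[i][path[-1]])
        return list(reversed(path)), best_prob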
Viterbi: Example
    v_k(i)         x:  (start)        6                    2                                       6
    B (begin)           1             0                    0                                       0
    F                   0      (1/2)(1/6) = 1/12    (1/6)·max{(1/12)·0.99, (1/4)·0.2}      (1/6)·max{0.01375·0.99, 0.02·0.2}
                                                     = 0.01375                               = 0.00226875
    L                   0      (1/2)(1/2) = 1/4     (1/10)·max{(1/12)·0.01, (1/4)·0.8}     (1/2)·max{0.01375·0.01, 0.02·0.8}
                                                     = 0.02                                  = 0.008

(The bottom-right entry 0.008 agrees with Pr(x, LLL) computed earlier.)

$$v_k(i) = e_k(x_i) \max_{r} \left[ v_r(i-1)\, a_{rk} \right]$$
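Running the viterbi sketch on this example with the casino dictionaries from earlier (illustrative) reproduces the table and selects the all-loaded path:

    path, prob = viterbi("626", init, trans, emit)
    print(path, prob)
    # ['L', 'L', 'L']  ≈ 0.008, matching v_L(3) in the table above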
Viterbi gets it right more often than not
Hidden Markov Models
Total probability
Many different paths can result in observation x.
The probability that our model will emit x (the total probability) is
$$\Pr(x) = \sum_{\pi} \Pr(x, \pi)$$
If the HMM models a family of objects, we want the total probability to peak at members of the family. (Training)
Total probability
Pr(x) can be computed in the same way as the probability of the most likely path.
Let
$$f_k(i) = \Pr(x_1, \ldots, x_i,\ \pi_i = k)$$
Then
$$f_k(i) = e_k(x_i) \sum_{r} f_r(i-1)\, a_{rk}$$
and
$$\Pr(x) = \sum_{k} f_k(L)\, a_{k0}$$
The Forward Algorithm


Initialization (i = 0):
$$f_0(0) = 1, \quad f_k(0) = 0 \text{ for } k \neq 0$$
Recursion (i = 1, ..., L): for each state k
$$f_k(i) = e_k(x_i) \sum_{r} f_r(i-1)\, a_{rk}$$
Termination:
$$\Pr(x) = \sum_{k} f_k(L)\, a_{k0}$$
A Python sketch of this forward recursion is given below.
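A minimal Python sketch of the forward recursion under the same assumptions as the Viterbi sketch (dictionary parameters, end transitions treated as 1):

    def forward(x, init, trans, emit):
        """Total probability Pr(x) = sum over all paths of Pr(x, path).

        End transitions a_{k0} are treated as 1 (no explicit end state).
        """
        states = list(init)
        f = {k: init[k] * emit[k][x[0]] for k in states}     # f_k(1)
        for symbol in x[1:]:
            f = {k: emit[k][symbol] * sum(f[r] * trans[r][k] for r in states)
                 for k in states}
        return sum(f.values())                               # sum_k f_k(L)

    # e.g. forward("626", init, trans, emit) sums Pr(x, path) over all 8 paths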
Hidden Markov Models




Decoding
Viterbi (maximum likelihood): determine which explanation is most likely
  Find the path most likely to have produced the observed sequence.
Forward (total probability): determine the probability that the observed sequence was produced by the HMM
  Consider all paths that could have produced the observed sequence.
Forward and Backward: the probability that $x_i$ came from state k given the observed sequence, i.e. $P(\pi_i = k \mid x)$
The Backward Algorithm
Pr(x) can also be computed with quantities that run backwards along the sequence.
Let
$$b_k(i) = \Pr(x_{i+1}, \ldots, x_L \mid \pi_i = k)$$
Then, for i = L-1, ..., 1,
$$b_k(i) = \sum_{l} a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$$
and
$$\Pr(x) = \sum_{l} a_{0l}\, e_l(x_1)\, b_l(1)$$
The Backward Algorithm


Initialization (i = L):
$$b_k(L) = a_{k0} \text{ for all } k$$
Recursion (i = L-1, ..., 1): for each state k
$$b_k(i) = \sum_{l} a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$$
Termination:
$$\Pr(x) = \sum_{l} a_{0l}\, e_l(x_1)\, b_l(1)$$
A Python sketch of the backward recursion is given below.
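A minimal Python sketch of the backward recursion under the same assumptions (dictionary parameters, b_k(L) set to 1 in place of a_{k0}):

    def backward(x, init, trans, emit):
        """Backward values b_k(i) = Pr(x_{i+1..L} | pi_i = k) for i = 1..L.

        b_k(L) is set to 1, i.e. a_{k0} is treated as 1 (no explicit end state).
        Returns (b, total) where b[i-1][k] = b_k(i) and total = Pr(x).
        """
        states = list(init)
        L = len(x)
        b = [dict() for _ in range(L)]
        b[L - 1] = {k: 1.0 for k in states}                  # initialization
        for i in range(L - 2, -1, -1):                       # i = L-1 .. 1 (0-based: L-2 .. 0)
            b[i] = {k: sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                           for l in states)
                    for k in states}
        # Termination: Pr(x) = sum_l a_{0l} e_l(x_1) b_l(1)
        total = sum(init[l] * emit[l][x[0]] * b[0][l] for l in states)
        return b, total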
Posterior state probabilities




The probability that $x_i$ came from state k given the observed sequence, i.e. $P(\pi_i = k \mid x)$:
$P(x, \pi_i = k) = P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_L \mid x_1 \ldots x_i, \pi_i = k)$
$= P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_L \mid \pi_i = k)$
$= f_k(i)\, b_k(i)$
Therefore
$$P(\pi_i = k \mid x) = \frac{f_k(i)\, b_k(i)}{P(x)}$$
Posterior decoding: assign $x_i$ the state k that maximizes $P(\pi_i = k \mid x) = f_k(i) b_k(i) / P(x)$, as sketched below.
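Combining forward and backward values gives posterior decoding; the sketch below (illustrative) recomputes the forward values per position and reuses the backward sketch above:

    def posterior_decode(x, init, trans, emit):
        """Assign each x_i the state k maximizing P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
        states = list(init)
        L = len(x)
        # Forward values f_k(i), kept per position (unlike the forward sketch above).
        f = [{k: init[k] * emit[k][x[0]] for k in states}]
        for i in range(1, L):
            f.append({k: emit[k][x[i]] * sum(f[i - 1][r] * trans[r][k] for r in states)
                      for k in states})
        b, total = backward(x, init, trans, emit)     # backward sketch from above
        # Posterior P(pi_i = k | x) = f_k(i) b_k(i) / P(x); pick the argmax at each position.
        return [max(states, key=lambda k: f[i][k] * b[i][k] / total) for i in range(L)]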
Estimating the probabilities
Estimating the probabilities ("training")

Baum-Welch algorithm
  Start with an initial guess at the transition probabilities
  Refine the guess to improve the total probability of the training data in each step
  May get stuck at a local optimum
  Special case of the expectation-maximization (EM) algorithm
Viterbi training (see the sketch after this list)
  Derive probable paths for the training data using the Viterbi algorithm
  Re-estimate transition probabilities based on the Viterbi paths
  Iterate until the paths stop changing
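A minimal sketch of the Viterbi-training loop described above (illustrative; it assumes the viterbi sketch from earlier, re-estimates both transition and emission probabilities from the current Viterbi paths, and uses a small pseudocount so that no probability becomes zero):

    from collections import defaultdict

    def viterbi_training(sequences, init, trans, emit, max_iter=20, pseudo=1e-3):
        """Iteratively re-estimate trans/emit from Viterbi paths until the paths stop changing."""
        states, symbols = list(init), list(next(iter(emit.values())))
        old_paths = None
        for _ in range(max_iter):
            paths = [viterbi(x, init, trans, emit)[0] for x in sequences]
            if paths == old_paths:              # paths stopped changing
                break
            old_paths = paths
            t_count = {k: defaultdict(lambda: pseudo) for k in states}
            e_count = {k: defaultdict(lambda: pseudo) for k in states}
            for x, path in zip(sequences, paths):
                for i, (sym, k) in enumerate(zip(x, path)):
                    e_count[k][sym] += 1
                    if i + 1 < len(path):
                        t_count[k][path[i + 1]] += 1
            # Normalize counts into probabilities
            trans = {k: {l: t_count[k][l] / sum(t_count[k][l2] for l2 in states)
                         for l in states} for k in states}
            emit = {k: {s: e_count[k][s] / sum(e_count[k][s2] for s2 in symbols)
                        for s in symbols} for k in states}
        return trans, emit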