Viterbi training
Initialize emission and transition probabilities to random values.
while (true)
    Do Viterbi decoding using the current parameters.
    Save the current parameters as the previous parameters.
    Re-estimate emission and transition parameters from the state path
    decoded by Viterbi (add pseudocounts, see next page).
    if the sum of absolute differences between current and previous
    parameters is tiny (e.g., < 0.00001), break;
print the current parameters and P(sequence, Viterbi path)
Repeat the above procedure several times (with a different
random seed each time), and compare P(sequence, Viterbi path).
Report the learned parameters that give the largest P.
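
The procedure above can be sketched in Python as follows. This is only an illustrative sketch, not the course's reference implementation: it assumes two states `a` and `b`, the alphabet `ACGT`, a uniform start distribution that is never re-estimated, a pseudocount of 1, and an iteration cap `max_iter` as a safety net. All function names are hypothetical, and log P(sequence, Viterbi path) is returned rather than printed so that runs with different seeds can be compared.

```python
import math
import random

STATES = "ab"      # assumed two-state model
SYMBOLS = "ACGT"   # assumed DNA alphabet

def random_params(rng):
    """Random, row-normalized transition (t) and emission (e) tables."""
    def row(keys):
        w = [rng.random() + 0.1 for _ in keys]   # +0.1 keeps every entry > 0
        return {k: v / sum(w) for k, v in zip(keys, w)}
    return {x: row(STATES) for x in STATES}, {x: row(SYMBOLS) for x in STATES}

def viterbi(seq, t, e):
    """Return (best state path, log P(seq, path)); start distribution is uniform."""
    v = {s: math.log(1 / len(STATES)) + math.log(e[s][seq[0]]) for s in STATES}
    back = []
    for sym in seq[1:]:
        ptr, nv = {}, {}
        for y in STATES:
            best = max(STATES, key=lambda x: v[x] + math.log(t[x][y]))
            ptr[y] = best
            nv[y] = v[best] + math.log(t[best][y]) + math.log(e[y][sym])
        back.append(ptr)
        v = nv
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):                   # follow back-pointers
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[last]

def reestimate(seq, path, c=1.0):
    """Count transitions and emissions along the path, add pseudocount c, normalize."""
    n = {x: {y: c for y in STATES} for x in STATES}
    m = {x: {s: c for s in SYMBOLS} for x in STATES}
    for x, y in zip(path, path[1:]):
        n[x][y] += 1
    for x, s in zip(path, seq):
        m[x][s] += 1
    norm = lambda r: {k: v / sum(r.values()) for k, v in r.items()}
    return {x: norm(n[x]) for x in STATES}, {x: norm(m[x]) for x in STATES}

def param_diff(p, q):
    """Sum of absolute differences over all transition and emission entries."""
    (t1, e1), (t2, e2) = p, q
    return (sum(abs(t1[x][y] - t2[x][y]) for x in STATES for y in STATES)
            + sum(abs(e1[x][s] - e2[x][s]) for x in STATES for s in SYMBOLS))

def viterbi_training(seq, seed, tol=1e-5, max_iter=1000):
    rng = random.Random(seed)
    t, e = random_params(rng)
    for _ in range(max_iter):
        path, _ = viterbi(seq, t, e)             # decode with current parameters
        prev = (t, e)                            # save current as previous
        t, e = reestimate(seq, path)             # re-estimate from decoded path
        if param_diff((t, e), prev) < tol:
            break
    _, logp = viterbi(seq, t, e)
    return t, e, logp

def best_of_restarts(seq, seeds=range(5)):
    """Run training once per seed and keep the run with the largest log P."""
    return max((viterbi_training(seq, s) for s in seeds), key=lambda r: r[2])
```

Calling `best_of_restarts("ACGTACGTGGGGCCGCGCGC")` returns the transition table, the emission table, and the log joint probability of the best run. Working with log probabilities avoids underflow on long sequences, and comparing log P is equivalent to comparing P.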
Re-estimate parameters with pseudocounts
• Count the number of transitions from state x to state y, n_xy, where
x, y ∈ {a, b}
• t_xy = (n_xy + c) / sum_y' (n_xy' + c), where c is the pseudocount
– e.g. t_ab = (n_ab + 1) / (n_ab + n_aa + 2) with c = 1
• Count the number of times each symbol is emitted in each state,
N_aX and N_bX, where X = A, C, G, T
• e_aX = (N_aX + 1) / (sum_X N_aX + 4)
• e_bX = (N_bX + 1) / (sum_X N_bX + 4)
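
As a small worked check of these formulas, the sketch below counts transitions and emissions along one decoded path and applies the pseudocount. The function name, the toy sequence, and the toy path are made up for illustration; the states `a`, `b`, the symbols `ACGT`, and the pseudocount c = 1 follow the formulas above.

```python
from collections import Counter

def reestimate_with_pseudocounts(seq, path, c=1):
    states, symbols = "ab", "ACGT"
    n = Counter(zip(path, path[1:]))   # n[(x, y)]: number of transitions x -> y
    N = Counter(zip(path, seq))        # N[(x, X)]: times state x emitted symbol X
    # Normalize over the destination state y for transitions,
    # and over the symbol X for emissions.
    t = {(x, y): (n[(x, y)] + c) / sum(n[(x, z)] + c for z in states)
         for x in states for y in states}
    e = {(x, X): (N[(x, X)] + c) / sum(N[(x, Z)] + c for Z in symbols)
         for x in states for X in symbols}
    return t, e

# Toy example: path "aabbba" has n_aa = 1, n_ab = 1, n_bb = 2, n_ba = 1.
t, e = reestimate_with_pseudocounts("ACGTGG", "aabbba")
print(t[("a", "b")])   # (1 + 1) / (1 + 1 + 2) = 0.5
print(e[("b", "G")])   # (2 + 1) / (3 + 4) = 3/7
```

Because every count receives the pseudocount, no probability is ever exactly zero, so a transition or symbol that is unseen in one decoded path can still be used in the next iteration.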
Backward-Forward algorithm:
Compute sum of probabilities in log space
• Two probabilities x and y, x < y
• lx = log(x), ly = log(y),
(lx < ly)
• z = x + y = y (1 + x/y)
lz = log(z) = log(x + y)
= log(y) + log(1 + x/y)
= ly + log(1 + exp(log(x) - log(y)))
= ly + log(1 + exp(lx - ly))
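
The identity above might be coded as in the sketch below. The name `log_add` is hypothetical; using `math.log1p` in place of `log(1 + ...)` and treating log(0) as `-inf` are choices made for this example rather than something the slides specify.

```python
import math

def log_add(lx, ly):
    """Return log(x + y) given lx = log(x) and ly = log(y)."""
    if lx > ly:                 # make ly the larger value so exp(lx - ly) <= 1
        lx, ly = ly, lx
    if lx == -math.inf:         # x = 0 contributes nothing to the sum
        return ly
    return ly + math.log1p(math.exp(lx - ly))

# log(0.2 + 0.3) recovered from log(0.2) and log(0.3)
print(math.exp(log_add(math.log(0.2), math.log(0.3))))   # approximately 0.5
```

Since the exponent `lx - ly` is never positive, `exp` cannot overflow, and inputs such as `log_add(-1000, -1000)` still work even though `exp(-1000)` on its own underflows to 0.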
Also see page 4 in this doc:
and page 77 of the handouts.