Parameter estimation for HMMs, Baum-Welch algorithm, model topology, numerical stability
Chapter 3.3–3.7
Elze de Groot

Overview
• Parameter estimation for HMMs
  – Baum-Welch algorithm
• HMM model structure
• More complex Markov chains
• Numerical stability of HMM algorithms

Specifying an HMM model
• The most difficult problem in using HMMs is specifying the model
  – Design of the structure
  – Assignment of the parameter values

Parameter estimation for HMMs
• Estimate the transition and emission probabilities a_kl and e_k(b)
• Two ways of learning:
  – Estimation when the state sequence is known
  – Estimation when the paths are unknown
• Assume that we have a set of example sequences (training sequences x^1, ..., x^n)

Parameter estimation for HMMs
• Assume that x^1, ..., x^n are independent, so the joint probability is
  P(x^1, ..., x^n | θ) = ∏_{j=1}^{n} P(x^j | θ)
• In log space this product becomes a sum, since log(ab) = log a + log b:
  log P(x^1, ..., x^n | θ) = Σ_{j=1}^{n} log P(x^j | θ)

Estimation when the state sequence is known
• Easier than estimation when the paths are unknown
• Maximum likelihood estimators (a counting sketch follows this section):
  a_kl = A_kl / Σ_{l'} A_{kl'}
  e_k(b) = E_k(b) / Σ_{b'} E_k(b')
• A_kl = number of k → l transitions in the training data, plus a pseudocount r_kl
• E_k(b) = number of emissions of b from state k in the training data, plus a pseudocount r_k(b)

Estimation when the paths are unknown
• More complex than when the paths are known
• Maximum likelihood estimators can no longer be used directly
• Instead, an iterative algorithm is used: Baum-Welch

The Baum-Welch algorithm
• We do not know the real values of A_kl and E_k(b), so:
  1. Estimate A_kl and E_k(b)
  2. Update a_kl and e_k(b)
  3. Repeat with the new model parameters a_kl and e_k(b)

The Baum-Welch algorithm
• The expected counts combine the forward value f_k(i) with the backward value b_l(i+1):
  A_kl = Σ_j (1 / P(x^j)) Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1)
  E_k(b) = Σ_j (1 / P(x^j)) Σ_{i : x_i^j = b} f_k^j(i) b_k^j(i)

The Baum-Welch algorithm
• Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators above to compute the new a_kl and e_k(b)
• These values are used to estimate A_kl and E_k(b) in the next iteration
• Continue iterating until the change is very small or a maximum number of iterations is exceeded (a sketch of one iteration follows this section)

Example
• The model estimated from 300 rolls and from 30,000 rolls

Drawbacks
• Maximum likelihood estimators
  – vulnerable to overfitting if there is not enough data
  – estimates are undefined for transitions or emissions never used in the training set (hence the pseudocounts)
• Baum-Welch
  – can converge to one of many local maxima instead of the global maximum, depending on the starting values of the parameters
  – this problem gets worse for large HMMs

Viterbi training
• The most probable path is derived using the Viterbi algorithm
• Iterate until none of the paths change
• Finds the value of θ that maximises the contribution of the most probable paths to the likelihood, rather than the likelihood itself
• Usually performs less well than Baum-Welch
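As a concrete companion to the "estimation when the state sequence is known" slide, here is a minimal Python sketch of the counting estimators. The function name, the dict-of-dicts representation, and the uniform pseudocounts are illustrative choices, not part of the slides.

```python
def estimate_parameters(sequences, paths, states, alphabet, r_trans=1.0, r_emit=1.0):
    """ML estimation of a_kl and e_k(b) when the state sequences are known.

    A_kl and E_k(b) are transition/emission counts plus pseudocounts
    (r_trans, r_emit), as on the slide; the estimators then normalise
    each row of counts into probabilities.
    """
    A = {k: {l: r_trans for l in states} for k in states}
    E = {k: {b: r_emit for b in alphabet} for k in states}

    for x, pi in zip(sequences, paths):
        for i, b in enumerate(x):
            E[pi[i]][b] += 1              # emission of b from state pi[i]
            if i + 1 < len(x):
                A[pi[i]][pi[i + 1]] += 1  # transition pi[i] -> pi[i+1]

    # a_kl = A_kl / sum_l' A_kl',  e_k(b) = E_k(b) / sum_b' E_k(b')
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```

For a dice-rolling model like the one in the example slide, `states` could be `{'F', 'L'}` and `alphabet` the symbols `'123456'`; these names are illustrative.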
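And for the "paths unknown" case, a sketch of one Baum-Welch iteration under stated assumptions: NumPy arrays, integer-coded sequences, an explicit initial distribution `pi0`, and no pseudocounts (the slides recommend adding them in practice).

```python
import numpy as np

def baum_welch_step(xs, a, e, pi0):
    """One Baum-Welch iteration for a discrete HMM.

    a[k, l]: transition probs; e[k, b]: emission probs;
    pi0[k]: initial distribution; xs: list of integer-coded sequences.
    Returns updated (a, e) computed from the expected counts A_kl, E_k(b).
    """
    K, B = e.shape
    A = np.zeros((K, K))              # expected transition counts A_kl
    E = np.zeros((K, B))              # expected emission counts E_k(b)

    for x in xs:
        L = len(x)
        f = np.zeros((L, K))          # forward values f_k(i)
        b = np.zeros((L, K))          # backward values b_k(i)
        f[0] = pi0 * e[:, x[0]]
        for i in range(1, L):
            f[i] = e[:, x[i]] * (f[i - 1] @ a)
        b[L - 1] = 1.0
        for i in range(L - 2, -1, -1):
            b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
        px = f[L - 1].sum()           # P(x | current model)

        # A_kl += (1/P(x)) sum_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1)
        for i in range(L - 1):
            A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
        # E_k(b) += (1/P(x)) sum_{i: x_i = b} f_k(i) b_k(i)
        for i in range(L):
            E[:, x[i]] += f[i] * b[i] / px

    # ML re-estimation from the expected counts
    return A / A.sum(axis=1, keepdims=True), E / E.sum(axis=1, keepdims=True)
```

Calling this repeatedly until the change in the parameters (or in the log likelihood) falls below a threshold gives the iteration described on the slides; a production version would work in scaled or log space, as covered in the numerical-stability section.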
Modelling of labelled sequences
• Only the transitions consistent with the labels (the '--' and '++' ones) are calculated
• Better than using plain maximum likelihood estimators when many different classes are present

Specifying an HMM model
• The most difficult problem in using HMMs is specifying the model
  – Design of the structure
  – Assignment of the parameter values

Design of the structure
• Design: how to connect the states by transitions
• A good HMM is based on knowledge about the problem under investigation
• Local maxima are the biggest disadvantage of fully connected models
• Baum-Welch still works after deleting a transition from the model: simply set its transition probability to zero

Example 1
• A state with self-transition probability p (and exit probability 1-p) models a geometric length distribution:
  P(l residues) = (1-p) p^{l-1}

Example 2
• Model a distribution of lengths between 2 and 10

Example 3
• An array of n states, each with self-transition probability p, models a negative binomial distribution:
  P(l) = C(l-1, n-1) p^{l-n} (1-p)^n
• p = 0.99
• n ≤ 5

Silent states
• States that do not emit symbols, such as the begin state B
• Silent states can also be used in other places in an HMM

Example: silent states
• [figure]

Silent states
• Advantage:
  – fewer transition probabilities need to be estimated
• Drawback:
  – limits the possibilities of defining a model

Silent states
• Changes in the forward algorithm (a sketch follows this section):
  – for 'real' (emitting) states the recursion stays the same
  – for each silent state l, first set f_l(i+1) = Σ_k f_k(i+1) a_kl, summing over the emitting states k
  – then, starting from the lowest-numbered silent state l, add Σ_k f_k(i+1) a_kl to f_l(i+1) for all silent states k < l
• This assumes the silent states are numbered so that transitions between them run from lower to higher numbers

More complex Markov chains
• So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol
• More complex:
  – high order Markov chains
  – inhomogeneous Markov chains

High order Markov chains
• An nth order Markov process:
  P(x_i | x_{i-1}, x_{i-2}, ..., x_1) = P(x_i | x_{i-1}, x_{i-2}, ..., x_{i-n})
• The probability of a symbol in a sequence depends on the previous n symbols
• An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because P(AB | B) = P(A | B): the probability of the next tuple given the current one reduces to the probability of the new symbol given the current tuple

Example
• A second order Markov chain with two different symbols {A, B} can be translated into a first order Markov chain over the 2-tuples {AA, AB, BA, BB} (see the sketch after this section)
• Sometimes the framework of the high order model is more convenient
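To make the silent-state modification of the forward algorithm concrete, here is a small Python sketch of one forward step. The function name and the list-based representation are assumptions; states are indexed 0..K-1, and, as on the slide, transitions between silent states are assumed to run from lower to higher indices only.

```python
def forward_step_with_silent(f_prev, x_next, a, e, emitting, silent):
    """One forward step for an HMM with silent states.

    f_prev[k] = f_k(i); x_next is the index of symbol x_{i+1};
    a[k][l] and e[l][b] are transition/emission probabilities;
    emitting/silent are lists of state indices.
    Returns the column f_k(i+1) for all states k.
    """
    K = len(a)
    f = [0.0] * K
    # 'Real' (emitting) states: the usual recursion over the previous column
    for l in emitting:
        f[l] = e[l][x_next] * sum(f_prev[k] * a[k][l] for k in range(K))
    # Silent states, pass 1: contributions arriving from emitting states
    for l in silent:
        f[l] = sum(f[k] * a[k][l] for k in emitting)
    # Pass 2: from the lowest-numbered silent state upwards, add
    # contributions from lower-numbered silent states k < l
    for l in sorted(silent):
        f[l] += sum(f[k] * a[k][l] for k in sorted(silent) if k < l)
    return f
```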
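The tuple construction from the high-order-chain slides can also be written out directly. Below is a hedged sketch: `a2` is a hypothetical table of second-order probabilities, and the function builds the equivalent first-order transition table over 2-tuples.

```python
from itertools import product

def to_first_order(a2, alphabet):
    """Translate a second-order chain into a first-order chain over 2-tuples.

    a2[(u, v)][w] = P(x_i = w | x_{i-2} = u, x_{i-1} = v).
    The tuple chain moves from (u, v) to (v, w) with that same
    probability; every other tuple transition gets probability 0.
    This is the P(AB | B) = P(A | B) identity from the slide.
    """
    tuples = list(product(alphabet, repeat=2))
    a1 = {s: {t: 0.0 for t in tuples} for s in tuples}
    for (u, v) in tuples:
        for w in alphabet:
            a1[(u, v)][(v, w)] = a2[(u, v)][w]
    return a1
```

With `alphabet = "AB"` this produces exactly the four states {AA, AB, BA, BB} of the slide's example.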
Finding prokaryotic genes
• Gene candidates in DNA: a sequence of nucleotide triplets, i.e. a start codon, a number of non-stop codons, and a stop codon: an open reading frame (ORF)
• An ORF can be either a gene or a non-coding ORF (NORF)

Finding prokaryotic genes
• Experiment:
  – DNA from the bacterium E. coli
  – the dataset contains 1100 genes (900 used for training, 200 for testing)
• Two models:
  – a normal model with first order Markov chains over nucleotides
  – also first order Markov chains, but with codons instead of nucleotides as symbols

Finding prokaryotic genes
• Outcomes: [figure]

Inhomogeneous Markov chains
• Use the position information within the codon: three models a^1, a^2, a^3 for codon positions 1, 2 and 3
• For a sequence x_1 x_2 x_3 x_4 x_5 x_6 ... starting at codon position 1:
  P(x) = P(x_1) a^2_{x_1 x_2} a^3_{x_2 x_3} a^1_{x_3 x_4} a^2_{x_4 x_5} a^3_{x_5 x_6} ...
• Example: CATGCA, codon positions 123123
  – homogeneous: P(C) a_{CA} a_{AT} a_{TG} a_{GC} a_{CA}
  – inhomogeneous: P(C) a^2_{CA} a^3_{AT} a^1_{TG} a^2_{GC} a^3_{CA}

Numerical stability of HMM algorithms
• Multiplying many probabilities can cause numerical problems:
  – underflow errors
  – wrong numbers are calculated
• Solutions:
  – log transformation
  – scaling of probabilities

The log transformation
• Compute log probabilities
  – log 10^{-100000} = -100000, so the underflow problem is essentially solved
• A sum operation is often faster than a product operation
• In the Viterbi algorithm (a sketch appears after the summary):
  V_l(i+1) = log e_l(x_{i+1}) + max_k (V_k(i) + log a_kl)

Scaling of probabilities
• Scale the f and b variables
• Forward variable (a sketch appears after the summary):
  – for each i a scaling variable s_i is defined
  – new f variables are defined: f̃_l(i) = f_l(i) / ∏_{j=1}^{i} s_j
  – new forward recursion: f̃_l(i+1) = (1 / s_{i+1}) e_l(x_{i+1}) Σ_k f̃_k(i) a_kl

Scaling of probabilities
• Backward variable
  – the scaling has to use the same numbers s_i as the forward variable
  – new backward recursion: b̃_k(i) = (1 / s_i) Σ_l a_kl b̃_l(i+1) e_l(x_{i+1})
• This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)

Summary
• Hidden Markov models
• Parameter estimation
  – state sequence known
  – state sequence unknown
• Model structure
  – silent states
• More complex Markov chains
• Numerical stability
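Returning to the log transformation: a minimal Python sketch of one log-space Viterbi step, assuming the transition and emission tables have already been converted with `safe_log` (both function names are illustrative).

```python
import math

def safe_log(p):
    """log p, with log 0 = -infinity so impossible transitions stay impossible."""
    return math.log(p) if p > 0.0 else float("-inf")

def viterbi_log_step(V_prev, x_next, log_a, log_e):
    """One log-space Viterbi step from the slide:
    V_l(i+1) = log e_l(x_{i+1}) + max_k (V_k(i) + log a_kl).
    Products of probabilities become sums of logs, so no underflow.
    """
    K = len(log_a)
    return [log_e[l][x_next] + max(V_prev[k] + log_a[k][l] for k in range(K))
            for l in range(K)]
```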
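And a sketch of the scaled forward algorithm. The slides define the scaling variables s_i but not how to choose them; a common choice, assumed here, is to pick s_i so that each scaled column sums to one, after which log P(x) = Σ_i log s_i. NumPy, integer-coded sequences, and the initial distribution `pi0` are also assumptions.

```python
import numpy as np

def forward_scaled(x, a, e, pi0):
    """Scaled forward algorithm: f~_l(i) = f_l(i) / prod_{j<=i} s_j.

    Each s_i is set to the column sum so that the scaled variables
    sum to one; log P(x) is recovered as sum_i log s_i.
    """
    L, K = len(x), len(pi0)
    f = np.zeros((L, K))
    s = np.zeros(L)                   # scaling variables s_i
    f[0] = pi0 * e[:, x[0]]
    s[0] = f[0].sum()
    f[0] /= s[0]
    for i in range(1, L):
        # f~_l(i+1) = (1 / s_{i+1}) e_l(x_{i+1}) sum_k f~_k(i) a_kl
        f[i] = e[:, x[i]] * (f[i - 1] @ a)
        s[i] = f[i].sum()
        f[i] /= s[i]
    return f, s, np.log(s).sum()      # scaled f, scales, log P(x)
```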