Markov Chains and Hidden Markov Models
Marjolijn Elsinga & Elze de Groot

Andrei A. Markov
- Born: 14 June 1856 in Ryazan, Russia
- Died: 20 July 1922 in Petrograd, Russia
- Graduate of Saint Petersburg University (1878)
- Work: number theory and analysis, continued fractions, limits of integrals, approximation theory and the convergence of series

Today's topics
- Markov chains
- Hidden Markov models
  - Viterbi Algorithm
  - Forward Algorithm
  - Backward Algorithm
  - Posterior Probabilities

Markov Chains (1)
- Emitting states
[figure: a Markov chain of emitting states]

Markov Chains (2)
- Transition probabilities: ast = P(xi = t | xi-1 = s)
- Probability of the sequence: P(x) = P(x1) · ∏i=2..L axi-1xi

Key property of Markov Chains
- The probability of a symbol xi depends only on the value of the preceding symbol xi-1:
  P(xi | x1, ..., xi-1) = P(xi | xi-1) = axi-1xi

Begin and End states
- Silent states: they emit no symbol
- Modelled with transitions a0k from the begin state and ak0 into the end state

Example: CpG Islands
- CpG = Cytosine – phosphodiester bond – Guanine
- 100–1000 bases long
- Cytosine is modified by methylation
- Methylation is suppressed in short stretches of the genome (start regions of genes)
- Methylated cytosine has a high chance of mutating into a thymine (T)

Two questions
- How would we decide whether a short stretch of genomic sequence comes from a CpG island or not?
- How would we find, given a long piece of sequence, the CpG islands in it, if there are any?

Discrimination
- 48 putative CpG islands are extracted
- Derive two models:
  - regions labelled as CpG island ('+' model)
  - regions from the remainder ('−' model)
- Transition probabilities are set to the maximum likelihood estimates:
  a+st = c+st / Σt' c+st'
  where c+st is the number of times letter t follows letter s in the '+' regions (and analogously for the '−' model)

Maximum Likelihood Estimators
- Each row of the estimated transition tables sums to 1
- The tables are asymmetric
[tables of '+' and '−' transition probabilities omitted]

Log-odds ratio
- Score a sequence x by the log-odds ratio of the two models:
  S(x) = log( P(x | model +) / P(x | model −) ) = Σi=1..L log( a+xi-1xi / a−xi-1xi )

Discrimination shown
[figure: distribution of log-odds scores for CpG-island and non-island sequences]

Simulation: '+' model
[figure omitted]

Simulation: '−' model
[figure omitted]

Today's topics (recap): Markov chains are done; next, hidden Markov models and the Viterbi, Forward and Backward Algorithms and posterior probabilities.

Hidden Markov Models (HMM) (1)
- No one-to-one correspondence between states and symbols
- It is no longer possible to say which state the model is in when it emits xi
- Transition probability from state k to l: akl = P(πi = l | πi-1 = k), where πi is the ith state in the path (state sequence)

Hidden Markov Models (HMM) (2)
- Begin state: a0k
- End state: ak0
- In the CpG islands example: eight states A+, C+, G+, T+, A−, C−, G−, T−, which emit the four symbols A, C, G, T

Hidden Markov Models (HMM) (3)
- We need a new set of parameters because we have decoupled symbols from states
- Probability that symbol b is seen when in state k: ek(b) = P(xi = b | πi = k)

Example: dishonest casino (1)
- Fair die and loaded die
- Loaded die: probability 0.5 of a 6 and probability 0.1 for each of 1–5
- Switch from fair to loaded: probability 0.05
- Switch back: probability 0.1

Dishonest casino (2)
- Emission probabilities: eF(b) = 1/6 for b = 1..6; eL(6) = 0.5 and eL(b) = 0.1 for b = 1..5
- An HMM is a model that generates or emits sequences

Dishonest casino (3)
- Hidden: you don't know whether the die is fair or loaded
- Joint probability of observed sequence x and state sequence π:
  P(x, π) = a0π1 · ∏i=1..L eπi(xi) · aπiπi+1 (taking πL+1 = 0, the end state)
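As an illustration of this formula, here is a minimal Python sketch that evaluates P(x, π) for the casino model. It is a sketch rather than the deck's own code: the state names F/L and the uniform start distribution are assumptions (the slides give no a0k for this example), and the end-state transition is omitted.

```python
# Joint probability P(x, pi) for the dishonest-casino HMM.

TRANS = {                                         # a_kl: transition probabilities
    "F": {"F": 0.95, "L": 0.05},                  # fair -> loaded: 0.05
    "L": {"F": 0.10, "L": 0.90},                  # loaded -> fair: 0.1
}
EMIT = {                                          # e_k(b): emission probabilities
    "F": {r: 1 / 6 for r in "123456"},            # fair die
    "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}, # loaded die
}
START = {"F": 0.5, "L": 0.5}                      # assumed uniform a_0k

def joint_probability(x, path):
    """P(x, pi) = a_0,pi1 * prod_i e_pi_i(x_i) * a_pi_i,pi_i+1 (no end state)."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
    return p

# One candidate explanation of six rolls: the loaded die for the middle four.
print(joint_probability("266661", "FLLLLF"))
```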
Three algorithms
- What is the most probable path for generating a given sequence? → Viterbi Algorithm
- How likely is a given sequence? → Forward Algorithm
- How can we learn the HMM parameters given a set of sequences? → Forward-Backward (Baum-Welch) Algorithm

Viterbi Algorithm (1)
- CGCG can be generated in different ways, and with different probabilities
- Choose the path with the highest probability
- The most probable path can be found recursively

Viterbi Algorithm (2)
- vk(i) = probability of the most probable path ending in state k after the first i observations

Viterbi Algorithm (3)
- Initialisation: v0(0) = 1, vk(0) = 0 for k > 0
- Recursion: vl(i) = el(xi) · maxk( vk(i-1) · akl ), keeping a pointer to the maximising k
- Termination: P(x, π*) = maxk( vk(L) · ak0 ); the most probable path π* is recovered by tracing the pointers back

Viterbi Algorithm: most probable path for CGCG
[figure: Viterbi values for the sequence CGCG under the CpG-island model]

Viterbi Algorithm: result for the casino example
[figure: rolls with the true and the Viterbi-predicted fair/loaded stretches]
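The recursion above translates almost line by line into code. The sketch below decodes casino rolls; the parameters and the uniform start distribution are the same assumptions as in the previous sketch, and it works in log space, anticipating the underflow remark later in the deck.

```python
import math

# Log-space Viterbi decoding for the casino HMM:
# v_l(i) = e_l(x_i) * max_k(v_k(i-1) * a_kl), computed as logs to avoid underflow.

STATES = ("F", "L")
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}
START = {"F": 0.5, "L": 0.5}                      # assumed uniform a_0k

def viterbi(x):
    # initialisation with the first symbol
    v = [{k: math.log(START[k]) + math.log(EMIT[k][x[0]]) for k in STATES}]
    ptr = []                                      # back-pointers per position
    for sym in x[1:]:
        col, back = {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: v[-1][k] + math.log(TRANS[k][l]))
            back[l] = best
            col[l] = (v[-1][best] + math.log(TRANS[best][l])
                      + math.log(EMIT[l][sym]))
        v.append(col)
        ptr.append(back)
    # termination and traceback (no end state in this sketch)
    state = max(STATES, key=lambda k: v[-1][k])
    path = [state]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return "".join(reversed(path))

# Stretches decoded as 'L' are where the model suspects the loaded die.
print(viterbi("315116246446644245311321631164156615"))
```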
Three algorithms (recap): next, how likely is a given sequence? The Forward Algorithm.

Forward Algorithm (1)
- We now want the probability of the sequence over all possible paths: P(x) = Σπ P(x, π)
- The number of possible paths increases exponentially with the length of the sequence
- The forward algorithm enables us to compute P(x) efficiently

Forward Algorithm (2)
- Replace the maximisation steps of the Viterbi algorithm with sums
- fk(i) = P(x1...xi, πi = k): the probability of the observed sequence up to and including xi, requiring πi = k

Forward Algorithm (3)
- Initialisation: f0(0) = 1, fk(0) = 0 for k > 0
- Recursion: fl(i) = el(xi) · Σk fk(i-1) · akl
- Termination: P(x) = Σk fk(L) · ak0

Three algorithms (recap): finally, the Backward Algorithm and posterior probabilities.

Backward Algorithm (1)
- bk(i) = P(xi+1...xL | πi = k): the probability of the observed sequence from xi+1 to the end, requiring πi = k

Disadvantage of these algorithms
- Multiplying many probabilities gives very small numbers, which can lead to underflow errors on the computer
- This can be solved by running the algorithms in log space, calculating log(vl(i)) instead of vl(i)

Backward Algorithm (2)
- Initialisation: bk(L) = ak0 for all k
- Recursion: bk(i) = Σl akl · el(xi+1) · bl(i+1)
- Termination: P(x) = Σl a0l · el(x1) · bl(1)

Posterior State Probability (1)
- The probability that observation xi came from state k, given the observed sequence
- Posterior probability of state k at time i when the emitted sequence is known: P(πi = k | x)

Posterior State Probability (2)
- First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k:
  P(x, πi = k) = fk(i) · bk(i)

Posterior State Probability (3)
- The posterior probabilities are then:
  P(πi = k | x) = fk(i) · bk(i) / P(x)
  where P(x) is the result of the forward (or backward) calculation

Posterior Probabilities (4)
[figure: posterior probabilities for the casino example]

Two questions (revisited)
- How would we decide whether a short stretch of genomic sequence comes from a CpG island or not?
- How would we find, given a long piece of sequence, the CpG islands in it, if there are any?

Prediction of CpG islands (1)
- First way: Viterbi Algorithm
  - Find the most probable path through the model
  - Where this path goes through the '+' states, a CpG island is predicted

Prediction of CpG islands (2)
- Second way: Posterior Decoding
  - Define g(k) = 1 for k ∈ {A+, C+, G+, T+} and g(k) = 0 for k ∈ {A−, C−, G−, T−}
  - G(i|x) = Σk P(πi = k | x) · g(k) is then the posterior probability, according to the model, that base i is in a CpG island

Summary (1)
- A Markov chain is a collection of states in which each state depends only on the state before it
- A hidden Markov model is a model in which the state sequence is 'hidden'

Summary (2)
- Most probable path: Viterbi algorithm
- How likely is a given sequence: forward algorithm
- Posterior state probability: forward and backward algorithms combined (used to find the most probable state for an observation)
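To close, a sketch that ties the forward, backward and posterior computations together for the casino model. As before, the state names and the uniform start distribution are assumptions, and the end state is dropped, so bk(L) = 1 and P(x) = Σk fk(L); plain probabilities are fine for a toy sequence, while real data would need the log-space or scaling trick noted above. The same machinery gives G(i|x) for the CpG model by summing the posteriors over the '+' states.

```python
# Forward-backward posterior decoding for the casino HMM:
# P(pi_i = k | x) = f_k(i) * b_k(i) / P(x).

STATES = ("F", "L")
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {"F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}
START = {"F": 0.5, "L": 0.5}                      # assumed uniform a_0k

def forward(x):
    # f_k(i) = P(x_1..x_i, pi_i = k)
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        f.append({l: EMIT[l][sym] * sum(f[-1][k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f

def backward(x):
    # b_k(i) = P(x_i+1..x_L | pi_i = k); b_k(L) = 1 without an end state
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][sym] * b[0][l]
                            for l in STATES) for k in STATES})
    return b

def posteriors(x):
    f, b = forward(x), backward(x)
    px = sum(f[-1][k] for k in STATES)            # P(x), forward termination
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(len(x))]

for i, post in enumerate(posteriors("266662")):
    print(i + 1, round(post["L"], 3))             # posterior P(loaded) per roll
```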