Some of these slides have been borrowed from Dr. Paul Lewis and Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

A matter of confidence

[figure: two trees with tip states 0, 1, 1. On the tree with short branches we are more confident the ancestral state was 1; on the tree with long branches we are less confident.]

Short branches give more weight to the inheritance explanation; long branches give more weight to the convergence explanation.

Copyright © 2007 by Paul O. Lewis

Maximum likelihood ancestral state estimation

[figure: a tree with tip states 0, 0, 1, 1 (data from one character: 0 = trait absent, 1 = trait present). The two tips in state 0 sit on branches of length 0.2, the two tips in state 1 on branches of length 0.1, and an internal branch of length 0.1 connects the basal node to the upper node. Branch length is the expected number of changes per character. The ancestral state at the basal node is of interest; the state at the upper node is not of direct interest.]

"Mk" models

"Cavender-Farris" model: 2 possible states (k = 2)
  expected no. of changes = μt
  Pii(t) = 0.5 + 0.5 e^(-2μt)
  Pij(t) = 0.5 - 0.5 e^(-2μt)

Jukes-Cantor (1969) model: 4 possible states (k = 4)
  expected no. of changes = 3μt
  Pii(t) = 0.25 + 0.75 e^(-4μt)
  Pij(t) = 0.25 - 0.25 e^(-4μt)

Haldane, J. B. S. 1919. The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics 8:299-309.
Cavender, J. A. 1978. Taxonomy with confidence. Mathematical Biosciences 40:271-280.
Farris, J. S. 1973. A probability model for inferring evolutionary trees. Systematic Zoology 22:250-256.
Felsenstein, J. 1981. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biological Journal of the Linnean Society 16:183-196.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pages 21-132 in Mammalian Protein Metabolism (H. N. Munro, ed.). Academic Press, New York.

For a branch of length 0.1 (μt = 0.1):
  P00(0.1) = 0.5 + 0.5 e^(-2(0.1)) = 0.90936
  P01(0.1) = 0.5 - 0.5 e^(-2(0.1)) = 0.09064
  P10(0.1) = 0.09064
  P11(0.1) = 0.90936

For a branch of length 0.2 (μt = 0.2):
  P00(0.2) = 0.5 + 0.5 e^(-2(0.2)) = 0.83516
  P01(0.2) = 0.5 - 0.5 e^(-2(0.2)) = 0.16484
  P10(0.2) = 0.16484
  P11(0.2) = 0.83516

Calculating conditional likelihoods at the upper node

The upper node has two descendant tips, both in state 1, each on a branch of length 0.1:
  L0 = P01(0.1) P01(0.1) = (0.09064)^2 = 0.00822
  L1 = P11(0.1) P11(0.1) = (0.90936)^2 = 0.82694

L0 and L1 are known as conditional likelihoods. L0, for example, is the likelihood of the subtree given that the node has state 0:
  L0 = Pr(data above node | node state = 0)

Calculating the likelihood at the basal node conditional on the basal node having state 0

The basal node has two tip descendants in state 0 (branches of length 0.2) and the upper node as a third descendant (internal branch of length 0.1, with conditional likelihoods 0.00822 and 0.82694 from the previous slide):
  L0 = P00(0.2) P00(0.2) [P00(0.1)(0.00822) + P01(0.1)(0.82694)]
     = (0.83516)^2 [(0.90936)(0.00822) + (0.09064)(0.82694)]
     = 0.05749

Calculating the likelihood of the tree using conditional likelihoods at the basal node

Here D refers to the observed data at the tips, and 0 or 1 to the state at the basal node (L1 = 0.02045 is obtained in the same way as L0, conditioning on state 1):
  L = Pr(0) Pr(D|0) + Pr(1) Pr(D|1)
    = (0.5) L0 + (0.5) L1
    = (0.5)(0.05749) + (0.5)(0.02045)
    = (0.5)(0.07794)
    = 0.03897
  lnL = -3.24496

Choosing the maximum likelihood ancestral state is simply a matter of seeing which conditional likelihood is the largest. In this case, L0 = 0.05749 is the larger of the two, so, if forced to decide, you would go with 0 as the ancestral state.
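These pruning calculations can be sketched in Python (not part of the original slides; a minimal check of the numbers above, using the Cavender-Farris transition probabilities and the example's tree shape and branch lengths):

```python
import math

def p(i, j, mu_t):
    """Cavender-Farris (Mk, k = 2) transition probability from state i to
    state j along a branch with expected number of changes mu_t."""
    same = 0.5 + 0.5 * math.exp(-2.0 * mu_t)
    return same if i == j else 1.0 - same

# Conditional likelihoods at the upper node: two tips in state 1, branches of 0.1
L_up = [p(s, 1, 0.1) * p(s, 1, 0.1) for s in (0, 1)]

# Conditional likelihoods at the basal node: two tips in state 0 (branches of 0.2)
# plus the upper node's subtree on the internal branch of 0.1
L_base = [p(s, 0, 0.2) ** 2 * (p(s, 0, 0.1) * L_up[0] + p(s, 1, 0.1) * L_up[1])
          for s in (0, 1)]

# Tree likelihood with a flat (0.5, 0.5) prior on the basal state
L = 0.5 * L_base[0] + 0.5 * L_base[1]
print(L_base, L, math.log(L))  # approx. 0.05749 and 0.02045; L = 0.03897, lnL = -3.24496
```

The tiny differences from the slide values come only from rounding the transition probabilities to five decimal places on the slides.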
Is this a ML or a Bayesian approach?

  L0 = 0.05749 (73.8%)    L1 = 0.02045 (26.2%)

  0.738 = (0.5)(0.05749) / [(0.5)(0.05749) + (0.5)(0.02045)]
        = (prior × likelihood) / (marginal probability of the data)
        = posterior probability of state 0

Converting likelihoods to percentages produces Bayesian posterior probabilities (and implicitly assumes a flat prior over ancestral states).

Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide and amino-acid sequences. Genetics 141:1641-1650.

Empirical Bayes vs. Full Bayes

The likelihood used in the previous slide is the maximum likelihood, i.e. the likelihood at one particular combination of branch length values (and values of other model parameters). A fully Bayesian approach would approximate the posterior probability associated with state 0 and state 1 at the node of interest using an MCMC analysis that allowed all the model parameters to vary, including the branch lengths. Because some parameters are fixed at their maximum likelihood estimates, this approach is often called an empirical Bayes method.

Marginal vs. Joint Estimation

• Marginal estimation involves summing the likelihood over all states at every node except the one of interest (most common).
• Joint estimation involves finding the combination of states at all nodes that together account for the highest likelihood (seldom used).

Joint Estimation

Find the likelihood of one particular combination of ancestral states: basal node in state 0 and upper node in state 0 (call it L00):
  L00 = 0.5 P00(0.2) P00(0.2) P00(0.1) P01(0.1) P01(0.1)
      = (0.5)(0.83516)^2 (0.90936)(0.09064)^2
      = 0.00261

Now we calculate the likelihood of a different combination of ancestral states: basal node 0, upper node 1 (call this one L01):
  L01 = 0.5 P00(0.2) P00(0.2) P01(0.1) P11(0.1) P11(0.1)
      = (0.5)(0.83516)^2 (0.09064)(0.90936)^2
      = 0.02614

Summary of the four possibilities (X = basal node, Y = upper node):
  L00 = 0.00261
  L01 = 0.02614  (best)
  L10 = 0.00001
  L11 = 0.01022

Returning to marginal estimation...

What about a pie diagram for the upper node? Previously, we computed these conditional likelihoods for that node:
  L0 = 0.00822
  L1 = 0.82694

Can we just make a pie diagram from these two values? No: the conditional likelihoods we computed only included the data above this node (i.e. only half of the observed states).

Estimation of where changes take place

1. What is the probability that a change 0 → 1 took place on a particular branch of the tree?
2. What is the probability that the character changed state exactly x times on the tree?

Should we answer these questions with a joint or a marginal ancestral character state reconstruction? Neither, exactly. We should figure out the probability of explaining the character state distribution with exactly the set of changes we are interested in, marginalizing over all states that are not crucial to the question:

1. What is the probability that a change 0 → 1 took place on a particular branch of the tree?
   - Fix the ancestral node to 0 and the descendant to 1, calculate the likelihood, and divide that by the total likelihood.
2. What is the probability that the character changed state exactly x times on the tree?
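The four joint reconstructions can be enumerated with a short sketch (not from the slides; `p` is the two-state Mk transition probability defined earlier, and the tree is the running four-tip example):

```python
import math

def p(i, j, mu_t):
    # Cavender-Farris two-state transition probability
    same = 0.5 + 0.5 * math.exp(-2.0 * mu_t)
    return same if i == j else 1.0 - same

# L[(x, y)] = likelihood that the basal node is in state x and the upper node
# is in state y, including the flat 0.5 prior on the basal state
L = {(x, y): 0.5
             * p(x, 0, 0.2) ** 2       # two tips in state 0 below the basal node
             * p(x, y, 0.1)            # internal branch
             * p(y, 1, 0.1) ** 2       # two tips in state 1 above the upper node
     for x in (0, 1) for y in (0, 1)}

print(max(L, key=L.get))  # (0, 1): the jointly most likely reconstruction
```

Note that summing the joint likelihoods over the upper node's state recovers the marginal quantities used earlier: L00 + L01 = (0.5)(0.05749).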
   - Sum the likelihoods over all reconstructions that have exactly x changes. This is not easy to do! Recall that Pr(0 → 1 | ν, θ) from our models accounts for multiple hits. A solution is to sample histories according to their posterior probabilities and summarize these samples.

Ancestral states are one thing...

But wouldn't it be nice to estimate not only what happened at the nodes, but also what happened along the edges of the tree?

Stochastic mapping simulates an entire character history consistent with the observed character data. In this case, green means perennial and blue means annual in the flowering plant genus Polygonella.

Three Mappings for Lifespan

[figure: three stochastically mapped histories of lifespan (blue = annual, green = perennial) on the same Polygonella tree.] Each simulation produces a different history (or mapping) consistent with the observed states.

Kinds of questions you can answer using stochastic mapping

• How long, on average, does a character spend in state 0? In state 1?
• Is the total time in which two characters are both in state 1 more than would be expected if they were evolving independently?
• What is the number of switches from state 0 to state 1 for a character (on average)?

Step 1: Down-pass to obtain conditional likelihoods. This part reflects the calculations needed to estimate ancestral states.

Step 2: Up-pass to sample states at the interior nodes. This part resembles a standard simulation, with the exception that we must conform to the observed states at the tips. At each node, the posterior probability is proportional to the likelihood times the prior probability.

Choosing a state at an interior node

Once the posterior probabilities have been calculated for all three states, a virtual "roulette wheel" or "pie" can be constructed and used to choose a state at random. In this case, the posterior distribution over states 0, 1, and 2 was {0.1, 0.2, 0.7}.

Sampling the state at node 6: the state at the root node has already been chosen. (Note: only the calculation for x6 = 0 is presented, but you must do a similar calculation for all three states.)

Sampling the state at node 5. (Note: only the calculation for x5 = 0 is presented.)

"Mapping" edges

• Mapping an edge means choosing an evolutionary history for a character along a particular edge that is consistent with the observed states at the tips, the chosen states at the interior nodes, and the model of character change being assumed.
• This is done by choosing random sojourn times and "causing" a change in state at the end of each of these sojourns.
• What is a sojourn time? It is the time between substitution events, or the time one has to wait until the next change of character state occurs.

[figure: a single branch mapped with three changes, 0 → 1, 1 → 0, and 0 → 2. The 1st sojourn is spent in state 0, the 2nd in state 1, the 3rd in state 0, and the 4th in state 2; the last sojourn time takes us beyond the end of the branch, so 2 is the final state.]

Sojourn times are related to substitution rates: high substitution rates mean short sojourn times, on average. A dwell time is the total amount of time spent in one particular state. For example, the dwell time associated with state 0 is the sum of the 1st and 3rd sojourn times.

Substitution rates and sojourn times

Symmetric model for a character with 3 states (0, 1, and 2); rate matrix:

         0     1     2
  0    -2β     β     β
  1      β   -2β     β
  2      β     β   -2β

The rate at which state 0 changes to something else (either state 1 or state 2) is 2β. Note that 2βt equals the branch length. For these symmetric models, the branch length is (k-1)βt, where k is the number of states.
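A minimal simulation check of the sojourn-time claim (not from the slides; β = 0.5 and the seed are arbitrary assumed values, and `random.expovariate` draws the exponential waiting times):

```python
import random

beta = 0.5             # assumed rate parameter for the symmetric 3-state model
leave_rate = 2 * beta  # total rate of leaving any state: (k - 1) * beta with k = 3

rng = random.Random(42)
sojourns = [rng.expovariate(leave_rate) for _ in range(200_000)]
mean_sojourn = sum(sojourns) / len(sojourns)
print(mean_sojourn)    # close to 1 / (2 * beta): higher rates, shorter sojourns
```

The average simulated sojourn is close to 1 / (2β), illustrating that high substitution rates mean short sojourn times.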
Sojourn times are exponentially distributed

If λ is the total rate of leaving state 0, then you can simulate a sojourn by:
• drawing a time according to the cumulative distribution function of the exponential distribution, 1 − e^(−λt);
• picking the "destination" state by normalizing the rates away from state 0 so that they sum to one.

This is picking j and t by:
  Pr(0 → j, t) = Pr(mutation at t) Pr(j | mutation)

But we want quantities like:
  Pr(0 → j, t | 0 → 2, ν1)

Conditioning on the data

We simulate from Pr(0 → j, t | 0 → 2, ν1) by rejection sampling:
1. Simulate a series of draws of the form Pr(0 → j, t) until the sum of the t's exceeds ν1.
2. If we ended up in the correct state, accept this series of sojourns as the mapping.
3. If not, go back to step 1.

[figures: "mapping" edge 1 — a failed mapping (the simulated history ends in the wrong state and is rejected) and a completed mapping.]

"Stochastic character mapping" or "mutational mapping"

• allows you to sample from the posterior distribution of histories for characters
• conditional on an assumed model
• conditional on the observed data
• implemented in SIMMAP (v1 by J. Bollback; v2 by J. Bollback and J. Yu) and Mesquite (module by P. Midford)

You can calculate summary statistics on the set of simulated histories to answer almost any type of question that depends on knowing ancestral character states or the number of changes:
• What proportion of the evolutionary history was "spent" in state 0?
• What is the posterior probability that there were between 2 and 4 changes of state?
• Does the state in one character appear to affect the probability of change at another? Map both characters and do a contingency analysis.

See Minin and Suchard (2008a,b) and O'Brien et al. (2009) for rejection-free approaches to generating mappings.

Often we are more interested in the "forces" driving evolutionary patterns than in the outcome.
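The rejection-sampling recipe can be sketched as follows (a toy implementation, not from the slides, under the symmetric 3-state model; β, the seed, and the endpoint states are assumed values chosen for illustration):

```python
import random

K = 3        # number of states in the symmetric model
beta = 1.0   # assumed rate parameter; total rate of leaving a state is (K - 1) * beta

def map_edge(start, end, nu, rng, max_tries=100_000):
    """Sample a history of sojourns along a branch of duration nu, conditional
    on starting in `start` and ending in `end`, by rejection sampling."""
    leave_rate = (K - 1) * beta
    for _ in range(max_tries):
        t, state, history = 0.0, start, []
        while True:
            t += rng.expovariate(leave_rate)  # draw the next sojourn time
            if t >= nu:
                break        # last sojourn runs past the end of the branch
            # All rates away from `state` are equal here, so the normalized
            # "destination" distribution is uniform over the other states.
            state = rng.choice([s for s in range(K) if s != state])
            history.append((t, state))
        if state == end:
            return history   # step 2: correct final state, accept the mapping
        # step 3: wrong final state, reject and start over
    raise RuntimeError("no mapping accepted")

# Map an edge of length 1.0 that starts in state 0 and ends in state 2
mapping = map_edge(0, 2, 1.0, random.Random(7))
print(mapping)  # list of (time of change, new state) pairs, ending in state 2
```

Each accepted mapping is one sampled history; repeating the call gives the set of histories that the summary statistics above are computed over.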
Perhaps we should focus on fitting a model of character evolution rather than on the ancestral character state estimates.

Rather than:
1. estimate a tree,
2. estimate character states,
3. figure out what model of character evolution best fits those ancestral character state reconstructions,

we could:
1. estimate a tree,
2. figure out what model of character evolution best fits the tip data.

Or even: jointly estimate the tree and character evolution by MCMC (marginalizing over trees when answering questions about character evolution).

Correlation of states in a discrete-state model

[figure: a tree with the states of two binary characters plotted at the tips, and a 2×2 contingency table counting branches according to whether character 1 changed (Y/N) and whether character 2 changed (Y/N).] (Week 8: Paired sites tests, gene frequencies, continuous characters)

Example from pp. 429-430 of the MacClade 4 manual. Q: Does character 2 tend to change from yellow → blue only when character 1 is in the blue state?

Concentrated Changes Test

More precisely: how often would three yellow → blue changes occur in the blue areas of the cladogram if these 3 changes were thrown down at random on branches of the tree? Answer: 12.0671% of the time. Thus, we cannot reject the null hypothesis that the observed coincident changes in the two characters were simply the result of chance.

Model-based approach to detecting correlated evolution

View the problem as model selection, or use a likelihood ratio test if you prefer the hypothesis testing framework. Consider two binary characters, giving four combined states: 0 = (0,0), 1 = (0,1), 2 = (1,0), 3 = (1,1). Under the simplest model, there is only one rate. Why are there 0's in the rate matrices? Because the instantaneous rate of both characters changing at the same moment is zero.
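To make those zeros concrete, here is a sketch (not from the slides) that builds the one-rate generator over the four paired states; q = 0.3 is an arbitrary assumed value:

```python
# Combined states for two binary characters, in the order used on the slides
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
q = 0.3   # the single rate of the simplest model (assumed value)

Q = [[0.0] * 4 for _ in range(4)]
for i, (a1, b1) in enumerate(states):
    for j, (a2, b2) in enumerate(states):
        if i != j and (a1 != a2) + (b1 != b2) == 1:
            Q[i][j] = q          # exactly one character changes: rate q
    Q[i][i] = -sum(Q[i])         # diagonal makes each row sum to zero

# Transitions that would change both characters at once get rate 0,
# e.g. (0,0) -> (1,1) and (0,1) -> (1,0)
print(Q[0])  # [-0.6, 0.3, 0.3, 0.0]
```

The richer models below keep the same zero pattern and simply free up more of the nonzero entries.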
Under the simplest (one-rate) model:

         (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)    -      q      q      0
  (0,1)    q      -      0      q
  (1,0)    q      0      -      q
  (1,1)    0      q      q      -

Under a model assuming character independence (q0∗ and q1∗ are the 0 → 1 and 1 → 0 rates for character 1; q∗0 and q∗1 are the corresponding rates for character 2; each character's rate does not depend on the other character's state):

         (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)    -     q∗0    q0∗     0
  (0,1)   q∗1     -      0     q0∗
  (1,0)   q1∗     0      -     q∗0
  (1,1)    0     q1∗    q∗1     -

Under the most general model, every permitted transition gets its own rate qij:

         (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)    -     q01    q02     0
  (0,1)   q10     -      0     q13
  (1,0)   q20     0      -     q23
  (1,1)    0     q31    q32     -

[figure: the four combined states arranged in a square, with arrows labeled f01 and f10 for changes in the first character and s01 and s10 for changes in the second character.]

We can calculate maximum likelihood scores under any of these models and do model selection. Or we could put priors on the parameter values and use Bayesian model selection (e.g. reversible jump methods in BayesTraits).

References

Minin, V. N. and Suchard, M. A. (2008a). Counting labeled transitions in continuous-time Markov models of evolution. Journal of Mathematical Biology, 56:391-412.
Minin, V. N. and Suchard, M. A. (2008b). Fast, accurate and simulation-free stochastic mapping. Philosophical Transactions of the Royal Society B, 363:3985-3995.
O'Brien, J. D., Minin, V. N., and Suchard, M. A. (2009). Learning to count: robust estimates for labeled distances between molecular sequences. Molecular Biology and Evolution, 26(4):801-814.