Some of these slides have been borrowed from Dr. Paul Lewis and Dr. Joe Felsenstein. Thanks!
Paul has many great tools for teaching phylogenetics at his web site:
http://hydrodictyon.eeb.uconn.edu/people/plewis
A matter of confidence
[Figure: two trees, each with tip states 0 and 1 and an ancestral state of 1]

Short branches give more weight to the inheritance explanation: we are more
confident the ancestral state was 1.

Long branches give more weight to the convergence explanation: we are less
confident the ancestral state was 1.
Copyright © 2007 by Paul O. Lewis
Maximum likelihood ancestral state estimation
[Figure: four-taxon tree; tip states 0, 0, 1, 1; tip branches of length 0.2,
0.2, 0.1, 0.1 and an internal branch of length 0.1]

Data from one character: 0 = trait absent, 1 = trait present. Branch lengths
are expected numbers of changes per character. The node marked ? is the
ancestral state of interest; the state at the other internal node is not of
direct interest.
"Mk" models
"Cavender-Farris" model:
2 possible states (k = 2)
expected no. changes = μ t
Pii(t) = 0.5 + 0.5 e-2μ t
Pij(t) = 0.5 - 0.5 e-2μ t
Haldane, J. B. S. 1919. The combination of linkage
values and the calculation of distances between the loci
of linked factors. Journal of Genetics 8:299-309.
Cavender, J. A. 1978. Taxonomy with confidence.
Mathematical Biosciences 40:271-280.
Farris, J. S. 1973. A probability model for inferring
evolutionary trees. Systematic Zoology 22:250-256.
Felsenstein, J. 1981. A likelihood approach to character
weighting and what it tells us about parsimony and
compatibility. Biological Journal of the Linnean Society
16:183-196.
Jukes-Cantor (1969) model:
Jukes, T. H., and C. R. Cantor. 1969. Evolution of
protein molecules. Pages 21-132 in Mammalian
Protein Metabolism (H. N. Munro, ed.) Academic
Press, New York.
Copyright © 2007 by Paul O. Lewis
4 possible states (k = 4)
expected no. changes = 3 μ t
Pii(t) = 0.25 + 0.75 e-4μ t
Pij(t) = 0.25 - 0.25 e-4μ t
For a branch of length 0.1 (μt = 0.1):
  P00(0.1) = 0.5 + 0.5 e^(-2(0.1)) = 0.90936
  P01(0.1) = 0.5 - 0.5 e^(-2(0.1)) = 0.09064
  P10(0.1) = 0.09064,  P11(0.1) = 0.90936

For a branch of length 0.2 (μt = 0.2):
  P00(0.2) = 0.5 + 0.5 e^(-2(0.2)) = 0.83516
  P01(0.2) = 0.5 - 0.5 e^(-2(0.2)) = 0.16484
  P10(0.2) = 0.16484,  P11(0.2) = 0.83516
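As a sanity check on the numbers above, here is a minimal sketch (standard library only; the function names are mine) of the Cavender-Farris and Jukes-Cantor transition probabilities:

```python
import math

def cf_p(mu_t, same):
    """Cavender-Farris (k = 2): P(same) = 0.5 + 0.5 e^(-2 mu t)."""
    e = math.exp(-2.0 * mu_t)
    return 0.5 + 0.5 * e if same else 0.5 - 0.5 * e

def jc_p(mu_t, same):
    """Jukes-Cantor (k = 4): P(same) = 0.25 + 0.75 e^(-4 mu t)."""
    e = math.exp(-4.0 * mu_t)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

print(cf_p(0.1, True), cf_p(0.1, False))   # ~0.90936 and ~0.09064
print(cf_p(0.2, True), cf_p(0.2, False))   # ~0.83516 and ~0.16484
```

Note that each row of probabilities sums to one: Pii(t) + (k-1) Pij(t) = 1 for both models.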
Calculating conditional likelihoods at the upper node

[Figure: the node marked ? is the upper internal node, whose two descendant
tips both have state 1 on branches of length 0.1]

L0 = P01(.1) P01(.1) = (0.09064)^2 = 0.00822
L1 = P11(.1) P11(.1) = (0.90936)^2 = 0.82694

L0 and L1 are known as conditional likelihoods. L0, for example, is the
likelihood of the subtree given that the node has state 0:
L0 = Pr(data above node | node state = 0)
Calculating the likelihood at the basal node, conditional on the basal node having state 0

[Figure: basal node with two tips in state 0 (branches 0.2, 0.2) and the 0.1
branch leading to the upper node, where L0 = 0.00822 and L1 = 0.82694]

L0 = P00(.2) P00(.2) P00(.1) (0.00822)
   + P00(.2) P00(.2) P01(.1) (0.82694)
   = (0.83516)^2 (0.90936)(0.00822) + (0.83516)^2 (0.09064)(0.82694)
   = 0.05749
Calculating the likelihood of the tree using the conditional likelihoods at the basal node

[Figure: at the basal node, L0 = 0.05749 and L1 = 0.02045]

L = Pr(0) Pr(D|0) + Pr(1) Pr(D|1)
  = (0.5) L0 + (0.5) L1
  = (0.5)(0.05749) + (0.5)(0.02045)
  = (0.5)(0.07794) = 0.03897
lnL = -3.24496

Here D refers to the observed data at the tips, and 0 or 1 to the state at
the basal node.
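The whole calculation above fits in a few lines. This is a sketch of the pruning computation hard-coded to this four-taxon tree (the variable names are mine):

```python
import math

def p(t, same):
    # Cavender-Farris transition probability for branch length t (= mu*t)
    e = math.exp(-2.0 * t)
    return 0.5 + 0.5 * e if same else 0.5 - 0.5 * e

# Upper node: two descendant tips in state 1, branches of length 0.1
up = [p(0.1, s == 1) ** 2 for s in (0, 1)]            # [L0, L1]

# Basal node: two tips in state 0 (branches 0.2) plus the 0.1 internal branch
basal = [p(0.2, s == 0) ** 2 *
         sum(p(0.1, s == j) * up[j] for j in (0, 1))
         for s in (0, 1)]

L = 0.5 * basal[0] + 0.5 * basal[1]   # flat prior over root states
print(basal)                # ~[0.05749, 0.02045]
print(L, math.log(L))       # ~0.03897, lnL ~ -3.245
print(0.5 * basal[0] / L)   # posterior of state 0, ~0.738
```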
Choosing the maximum likelihood ancestral state is simply a matter of seeing
which conditional likelihood is the largest.

In this case, L0 = 0.05749 (73.8%) is the larger of the two
(L1 = 0.02045, 26.2%), so, if forced to decide, you would go with 0 as the
ancestral state.
Is this a ML or Bayesian approach?

L0 = 0.05749 (73.8%)
L1 = 0.02045 (26.2%)

The 73.8% is the posterior probability of state 0, with prior × likelihood in
the numerator and the marginal probability of the data in the denominator:

0.738 = (0.5)(0.05749) / [(0.5)(0.05749) + (0.5)(0.02045)]

Converting likelihoods to percentages produces Bayesian posterior
probabilities (and implicitly assumes a flat prior over ancestral states).

Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide and amino-acid sequences. Genetics 141:1641-1650.
Empirical Bayes vs. Full Bayes
The likelihood used in the previous slide is the maximum
likelihood, i.e. the likelihood at one particular combination
of branch length values (and values of other model parameters).
A fully Bayesian approach would approximate the posterior
probability associated with state 0 and state 1 at the node of
interest using an MCMC analysis that allowed all the model
parameters to vary, including the branch lengths.
Because some parameters are fixed at their maximum
likelihood estimates, this approach is often called an
empirical Bayesian method.
Marginal vs. Joint Estimation
• marginal estimation involves summing
likelihood over all states at all nodes except
one of interest (most common)
• joint estimation involves finding the
combination of states at all nodes that
together account for the highest likelihood
(seldom used)
Joint Estimation

[Figure: tip states 0, 0, 1, 1; the basal node is fixed to state 0 and the
upper node to state 0]

Find the likelihood of this particular combination of ancestral states
(call it L00):

L00 = 0.5 P00(.2) P00(.2) P00(.1) P01(.1) P01(.1)
    = (0.5)(0.83516)^2 (0.90936)(0.09064)^2
    = 0.00261
Joint Estimation

[Figure: same tree, but now the basal node is fixed to state 0 and the upper
node to state 1]

Now we are calculating the likelihood of a different combination of ancestral
states (call this one L01):

L01 = 0.5 P00(.2) P00(.2) P01(.1) P11(.1) P11(.1)
    = (0.5)(0.83516)^2 (0.09064)(0.90936)^2
    = 0.02614
Joint Estimation

[Figure: basal node state X, upper node state Y]

Summary of the four possibilities:
L00 = 0.00261
L01 = 0.02614  ← best
L10 = 0.00001
L11 = 0.01022
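The four joint values above can be generated by looping over the state combinations; a minimal sketch (names are mine):

```python
import math

def p(t, same):
    # Cavender-Farris transition probability for branch length t (= mu*t)
    e = math.exp(-2.0 * t)
    return 0.5 + 0.5 * e if same else 0.5 - 0.5 * e

joint = {}
for x in (0, 1):        # state at the basal node
    for y in (0, 1):    # state at the upper node
        joint[x, y] = (0.5                     # flat prior on the root state
                       * p(0.2, x == 0) ** 2   # two tips in state 0
                       * p(0.1, x == y)        # internal branch
                       * p(0.1, y == 1) ** 2)  # two tips in state 1

best = max(joint, key=joint.get)
print(best)   # (0, 1): basal state 0, upper state 1
for k, v in sorted(joint.items()):
    print(k, round(v, 5))
```

The four joint likelihoods sum to the total tree likelihood (0.03897), since they partition the same marginalization.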
Returning to marginal estimation...

[Figure: the upper node is now the node of interest, with conditional
likelihoods L0 = 0.00822 and L1 = 0.82694]

What about a pie diagram for this node? Previously, we computed these
conditional likelihoods for this node:
L0 = 0.00822,  L1 = 0.82694
Can we just make a pie diagram from these two values?
Returning to marginal estimation...

[Figure: same tree, highlighting the upper node]

The conditional likelihoods we computed only included data above this node
(i.e. only half the observed states).
Estimation of where changes take place

1. What is the probability that a change 0 → 1 took place on a particular
   branch in the tree?
2. What is the probability that the character changed state exactly x times
   on the tree?

Should we answer these questions with a joint or marginal ancestral character
state reconstruction? Neither. We should figure out the probability of
explaining the character state distribution with exactly the set of changes
we are interested in.
Marginalize over all states that are not crucial to the question:

1. What is the probability that a change 0 → 1 took place on a particular
   branch in the tree? Fix the ancestral node to 0 and the descendant to 1,
   calculate the likelihood, and divide that by the total likelihood.

2. What is the probability that the character changed state exactly x times
   on the tree? Sum the likelihoods over all reconstructions that have
   exactly x changes. This is not easy to do! Recall that Pr(0 → 1 | ν, θ)
   from our models accounts for multiple hits. A solution is to sample
   histories according to their posterior probabilities and summarize these
   samples.
Ancestral states are one thing...

But wouldn't it be nice to estimate not only what happened at the nodes, but
also what happened along the edges of the tree?

Stochastic mapping simulates an entire character history consistent with the
observed character data. In this case, green means perennial and blue means
annual in the flowering plant genus Polygonella.

Three Mappings for Lifespan

[Figure: three stochastically mapped trees for lifespan; blue = annual,
green = perennial]

Each simulation produces a different history (or mapping) consistent with the
observed states.
Kinds of questions you can
answer using stochastic mapping
• How long, on average, does a character
spend in state 0? In state 1?
• Is the total time in which two characters are
both in state 1 more than would be expected
if they were evolving independently?
• What is the number of switches from state 0
to state 1 for a character (on average)?
Step 1: Down-pass to obtain conditional likelihoods

This part reflects the calculations needed to estimate ancestral states.

Step 2: Up-pass to sample states at the interior nodes

This part resembles a standard simulation, with the exception that we must
conform to the observed states at the tips. At each node the sampling uses

posterior probability ∝ likelihood × prior probability
Choosing a state at an interior node

Once the posterior probabilities have been calculated for all three states, a
virtual "roulette wheel" or "pie" can be constructed and used to choose a
state at random. In this case, the posterior distribution over states 0, 1,
and 2 was {0.1, 0.2, 0.7}.
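Spinning the "roulette wheel" is ordinary categorical sampling; a sketch with the standard library (the function name is mine):

```python
import random

def spin_wheel(posteriors, rng):
    """Pick a state index with probability equal to its posterior."""
    u = rng.random()
    cumulative = 0.0
    for state, pr in enumerate(posteriors):
        cumulative += pr
        if u < cumulative:
            return state
    return len(posteriors) - 1   # guard against floating-point round-off

rng = random.Random(1)
draws = [spin_wheel([0.1, 0.2, 0.7], rng) for _ in range(20000)]
for s in (0, 1, 2):
    print(s, draws.count(s) / len(draws))   # frequencies close to 0.1, 0.2, 0.7
```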
Sampling the state at node 6 (the state at the root node has already been
chosen). Note: only the calculation for x6 = 0 is presented, but you must do
a similar calculation for all three states.

Sampling the state at node 5. Note: only the calculation for x5 = 0 is
presented.
"Mapping" edges
• Mapping an edge means choosing an
evolutionary history for a character along a
particular edge that is consistent with the
observed states at the tips, the chosen states at
the interior nodes, and the model of character
change being assumed
• This is done by choosing random sojourn times
and "causing" a change in state at the end of each
of these sojourns
• What is a sojourn time? It is the time between
substitution events, or the time one has to wait
until the next change of character state occurs
[Figure: a single branch starting in state 0; the change 0→1 ends the 1st
sojourn (waiting) time, 1→0 ends the 2nd, and 0→2 ends the 3rd; the 4th
(last) sojourn time takes us beyond the end of the branch, so 2 is the final
state]

Sojourn times are related to substitution rates. High substitution rates mean
short sojourn times, on average. A dwell time is the total amount of time
spent in one particular state. For example, the dwell time associated with
state 0 is the sum of the 1st and 3rd sojourn times.
Substitution rates and sojourn times

Symmetric model for a character with 3 states (0, 1, and 2). Rate matrix,
with rows giving the current state and columns the destination state:

        0      1      2
  0   -2β      β      β
  1     β    -2β      β
  2     β      β    -2β

The rate at which state 0 changes to something else (either state 1 or state
2) is 2β. Note that 2βt equals the branch length. For these symmetric models,
the branch length is (k-1)βt, where k is the number of states.
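A sketch of the symmetric rate matrix for arbitrary k (the function name is mine), checking that rows sum to zero and that the total rate of leaving any state is (k-1)β:

```python
def mk_q(k, beta):
    """Symmetric Mk rate matrix: every off-diagonal entry is beta."""
    return [[beta if i != j else -(k - 1) * beta for j in range(k)]
            for i in range(k)]

Q = mk_q(3, 0.5)
assert all(abs(sum(row)) < 1e-12 for row in Q)   # valid rate matrix
print(-Q[0][0])        # total rate of leaving state 0: (k-1)*beta = 1.0
print(-Q[0][0] * 2.0)  # branch length for t = 2: (k-1)*beta*t = 2.0
```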
Sojourn times are exponentially distributed

If λ is the total rate of leaving state 0, then you can simulate a sojourn by:
• drawing a time according to the cumulative distribution function of the
  exponential distribution (1 − e^(−λt)), and
• picking the "destination" state by normalizing the rates away from state 0
  such that they sum to one.

This is picking j and t by:
Pr(0 → j, t) = Pr(mut. at t) Pr(j | mut.)

But we want quantities like:
Pr(0 → j, t | 0 → 2, ν1)
Conditioning on the data

We simulate according to Pr(0 → j, t | 0 → 2, ν1) by rejection sampling:
1. Simulate a series of draws of the form Pr(0 → j, t) until the sum of the
   sojourn times exceeds ν1.
2. If we ended up in the correct state, accept this series of sojourns as the
   mapping.
3. If not, go back to step 1.
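The three steps above can be sketched as follows (a hypothetical helper, not Simmap's actual code), using the symmetric 3-state rate matrix from earlier:

```python
import random

def sample_mapping(start, end, nu, Q, rng, max_tries=100000):
    """Rejection-sample sojourns along a branch of length nu so that the
    history starts in `start` and ends in `end`."""
    k = len(Q)
    for _ in range(max_tries):
        state, t, history = start, 0.0, []
        while True:
            rate_away = -Q[state][state]
            t += rng.expovariate(rate_away)   # exponential sojourn time
            if t >= nu:                       # sojourn runs past the branch end
                break
            # choose the destination in proportion to the rates away from state
            u = rng.random() * rate_away
            for j in range(k):
                if j != state:
                    u -= Q[state][j]
                    if u <= 0.0:
                        break
            history.append((t, state, j))
            state = j
        if state == end:                      # step 2: accept matching mappings
            return history
    raise RuntimeError("no mapping accepted")  # step 3 exhausted

rng = random.Random(42)
Q = [[-2.0, 1.0, 1.0], [1.0, -2.0, 1.0], [1.0, 1.0, -2.0]]  # beta = 1
changes = sample_mapping(0, 2, 0.5, Q, rng)
print(changes)   # list of (time, from_state, to_state) events ending in state 2
```

Since the start and end states differ here, every accepted history contains at least one change, and all change times fall inside the branch.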
"Mapping" edge 1
© 2007 by Paul O. Lewis
20
Failed mapping
© 2007 by Paul O. Lewis
21
Completed mapping
© 2007 by Paul O. Lewis
22
“Stochastic character mapping” or “Mutational
Mapping”
• allows you to sample from the posterior distribution of
histories for characters
• conditional on an assumed model
• conditional on the observed data
• implemented in Simmap (v1 by J. Bollback; v2 by J. Bollback
and J. Yu) and Mesquite (module by P. Midford)
“Stochastic character mapping”
You can calculate summary statistics on the set of simulated
histories to answer almost any type of question that depends on
knowing ancestral character states or the number of changes:
• What proportion of the evolutionary history was ‘spent’ in
state 0?
• What is the posterior probability that there were between 2
and 4 changes of state?
• Does the state in one character appear to affect the
probability of change at another? – Map both characters
and do a contingency analysis.
See Minin and Suchard (2008a,b) and O'Brien et al. (2009) for
rejection-free approaches to generating mappings.
Often we are more interested in the "forces" driving evolutionary patterns
than in the outcome. Perhaps we should focus on fitting a model of character
evolution rather than on the ancestral character state estimates themselves.

Rather than:
1. estimate a tree,
2. estimate character states,
3. figure out what model of character evolution best fits those ancestral
   character state reconstructions,

we could:
1. estimate a tree,
2. figure out what model of character evolution best fits the tip data.

Or even:
1. jointly estimate character evolution and trees in MCMC (marginalizing over
   trees when answering questions about character evolution).
Correlation of states in a discrete-state model

[Figure: two binary characters mapped on a tree, with 2×2 tables tallying
species states and branch changes for character #1 versus character #2]

Week 8: Paired sites tests, gene frequencies, continuous characters – p.35/44
Example from pp. 429-430 of the MacClade 4 manual.

[Figure: tree with character 1 states painted on the branches (yellow/blue)
and character 2 changes marked]

Q: Does character 2 tend to change from yellow → blue only when character 1
is in the blue state?
Concentrated Changes Test

More precisely: how often would three yellow → blue changes occur in the blue
areas of the cladogram on the left if these 3 changes were thrown down at
random on the branches of the tree?

Answer: 12.0671% of the time. Thus, we cannot reject the null hypothesis that
the observed coincident changes in the two characters were simply the result
of chance.
Model-based approach to detecting correlated evolution

View the problem as model selection, or use a LRT if you prefer the
hypothesis-testing framework.

Consider two binary characters, giving four combined states: 0 = (0,0),
1 = (0,1), 2 = (1,0), 3 = (1,1). Under the simplest model, there is only one
rate:

          (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)     -      q      q      0
  (0,1)     q      -      0      q
  (1,0)     q      0      -      q
  (1,1)     0      q      q      -

Why are there 0's? Because in a continuous-time model the probability that
both characters change in the same instant is zero.
Under a model assuming character independence, each character's rates do not
depend on the other character's state. Writing f01, f10 for the rates of the
first character and s01, s10 for the rates of the second:

          (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)     -     s01    f01     0
  (0,1)    s10     -      0     f01
  (1,0)    f10     0      -     s01
  (1,1)     0     f10    s10     -

[Figure: four-state diagram with (0,0) ↔ (0,1) and (1,0) ↔ (1,1) connected by
the second character's rates s01, s10, and (0,0) ↔ (1,0) and (0,1) ↔ (1,1)
connected by the first character's rates f01, f10]

Under the most general model, all eight single-change rates are free to
differ:

          (0,0)  (0,1)  (1,0)  (1,1)
  (0,0)     -     q01    q02     0
  (0,1)    q10     -      0     q13
  (1,0)    q20     0      -     q23
  (1,1)     0     q31    q32     -
We can calculate maximum likelihood scores under any of these models and do
model selection. Or we could put priors on the parameter values and use
Bayesian model selection (e.g. reversible-jump methods in BayesTraits).
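As an illustration of how the models nest, here is a sketch (the rate values and helper names are mine) that builds the one-rate and independence matrices and checks the forbidden double-change cells:

```python
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (character 1, character 2)

def build_q(rate_fn):
    """4x4 rate matrix; simultaneous double changes get rate 0."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            ndiff = sum(x != y for x, y in zip(a, b))
            if i != j:
                q[i][j] = rate_fn(a, b) if ndiff == 1 else 0.0
        q[i][i] = -sum(q[i])
    return q

# Simplest model: one shared rate q = 1
q_one = build_q(lambda a, b: 1.0)

# Independence: first character uses f01/f10, second uses s01/s10,
# regardless of the other character's state
f = {(0, 1): 0.3, (1, 0): 0.6}
s = {(0, 1): 0.2, (1, 0): 0.4}
q_indep = build_q(lambda a, b: f[a[0], b[0]] if a[0] != b[0] else s[a[1], b[1]])

# The most general model would simply give each single-change cell its own rate.
print(q_one[0][3], q_indep[0][3])    # both 0.0: no simultaneous double changes
print(q_indep[0][1], q_indep[2][3])  # both s01 = 0.2 under independence
```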
References

Minin, V. N., and M. A. Suchard. 2008a. Counting labeled transitions in continuous-time Markov models of evolution. Journal of Mathematical Biology 56:391-412.
Minin, V. N., and M. A. Suchard. 2008b. Fast, accurate and simulation-free stochastic mapping. Philosophical Transactions of the Royal Society B 363:3985-3995.
O'Brien, J. D., V. N. Minin, and M. A. Suchard. 2009. Learning to count: robust estimates for labeled distances between molecular sequences. Molecular Biology and Evolution 26(4):801-814.