Molecular Clocks

Prediction of time from molecular divergence
Outline
• What is the molecular clock hypothesis?
• How do you detect deviations from the molecular clock hypothesis?
• Assuming a perfect molecular clock, what are the potential pitfalls in using it for dating?
• Dating with “relaxed” clocks
• Cautionary notes
Molecular Clock
• Molecular divergence is ROUGHLY correlated with time since divergence
Evidence for Rate Constancy in Hemoglobin
[Figure: from Zuckerkandl and Pauling (1965)]
• Given
– a phylogenetic tree
– branch lengths
– a time estimate for one (or more) node(s)
[Figure: tree with tips C, D, R, M and H; one node dated at 110 MYA]
• Can we date other nodes in the tree?
• Yes... if the rate of molecular change is constant across all branches
The Molecular Clock Hypothesis
• The amount of genetic difference between sequences is a function of time since separation
• The rate of molecular change is constant (enough) to predict times of divergence (within the bounds of particular genes and taxa)
Rate Constancy?
[Figure: from Page & Holmes p240]
Rate Heterogeneity
• The rate of molecular evolution can differ among
– nucleotide positions
– genes
– genomic regions
– genomes within species (nuclear vs organelle)
– species
– time periods (within a lineage)
• If not accounted for, rate heterogeneity biases time estimates
Rate Heterogeneity among lineages
Cause | Reason
Repair mechanisms | e.g. RNA viruses have error-prone polymerases
Metabolic rate | More free radicals
Generation time | Copies DNA more frequently
Population size | Affects the fixation rate of mutations
Local Clocks?
• Closely related species often share similar properties, so they are likely to have similar rates
• For example
– murid rodents evolve on average 2-6 times faster than apes and humans (Graur & Li p150)
– mouse and rat rates are nearly equal (Graur & Li p146)
Rate Changes within a Lineage
Cause | Reason
Population size changes | Genetic drift is more likely to fix nearly neutral (slightly deleterious) alleles in small populations
Strength of selection changes over time | 1. new role/environment; 2. gene duplication; 3. change in another gene
Identifying rate heterogeneity
Tests of the molecular clock:
– Likelihood ratio test
• identifies deviation from the clock, but not which sequences deviate
– Relative rates tests
• compare the rates of two sister lineages using an outgroup
– Tajima test
• the number of sites at which the outgroup shares its character state with only one of the two ingroups should be equal for both ingroups
– Branch length test
• measures how far each root-to-leaf distance deviates from the average root-to-leaf distance
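The Tajima test's counting rule can be sketched in a few lines. This is a minimal illustration, assuming three already-aligned sequences with no gaps or ambiguity codes; the function name `tajima_1d` and the toy sequences are hypothetical. It uses the usual statistic (m1 − m2)² / (m1 + m2), compared with a χ² distribution with 1 degree of freedom.

```python
import math


def tajima_1d(seq1, seq2, outgroup):
    """Hypothetical helper: relative rate test on three aligned sequences.

    m1 counts sites where the outgroup shares its state with seq2 only
    (so the change is unique to the seq1 lineage); m2 is the reverse.
    Under a molecular clock, m1 and m2 should be roughly equal.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if b == o != a:      # outgroup matches seq2 only: change on lineage 1
            m1 += 1
        elif a == o != b:    # outgroup matches seq1 only: change on lineage 2
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    stat = (m1 - m2) ** 2 / (m1 + m2)           # chi-square, 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return m1, m2, stat, p_value


# Hypothetical toy alignment (not real data).
m1, m2, stat, p = tajima_1d("CCCCCAAAAA", "AAAAAAAAAG", "AAAAAAAAAA")
print(m1, m2, round(stat, 3), round(p, 3))
```

Sites where all three states differ, or where the two ingroups agree, carry no information about which lineage changed and are skipped.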
Likelihood Ratio Test
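• Estimate a phylogeny with and without the molecular clock constraint
– e.g. under the clock, all root-to-tip distances must be equal
• Twice the difference in log-likelihood, 2(lnL without clock − lnL with clock), is distributed approximately as χ² with n − 2 degrees of freedom (n = number of taxa in the tree)
– asymptotically
– when the models are nested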
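The test statistic and its p-value can be sketched with the standard library only. This assumes the two log-likelihoods have already been obtained from the same data and tree, with and without the clock; the function names `chi2_sf` and `clock_lrt` and the example numbers are hypothetical. The χ² tail probability is computed exactly for integer degrees of freedom via the regularized incomplete gamma recurrence.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with a positive integer df.

    Uses the regularized upper incomplete gamma Q(df/2, x/2), built up from
    Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = exp(-h) with the recurrence
    Q(s + 1, h) = Q(s, h) + h**s * exp(-h) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-h)
    else:
        s, q = 0.5, math.erfc(math.sqrt(h))
    while s < df / 2.0:
        q += math.exp(s * math.log(h) - h - math.lgamma(s + 1.0))
        s += 1.0
    return q


def clock_lrt(lnl_clock, lnl_no_clock, n_taxa):
    """Hypothetical helper: likelihood ratio test of a molecular clock."""
    stat = 2.0 * (lnl_no_clock - lnl_clock)  # twice the log-likelihood difference
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods, e.g. from two runs of a phylogenetics program.
stat, df, p = clock_lrt(lnl_clock=-1000.0, lnl_no_clock=-990.0, n_taxa=5)
print(stat, df, p)
```

A small p-value rejects the clock for the tree as a whole, but, as noted above, it does not say which lineages are responsible.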
Relative Rates Tests
Sarich & Wilson 1973, Wu and Li 1985
• Tests whether the distances from two taxa to an outgroup are equal (or whether two clades have the same average rate relative to an outgroup)
– need to compute the expected variance
– many triples to consider, and they are not independent (although modifications such as Li & Bousquet 1992 correct for this)
• Lacks power, especially with
– short sequences
– low rates of change
• Bromham et al. (2000): given the length and number of variable sites in sequences typically used for dating,
– relative rates tests are unlikely to detect moderate variation between lineages (1.5-4x)
– such undetected variation is likely to cause substantial error in date estimates
[Figure: tree diagrams relating Taxon 1, Taxon 2 and Taxon 3 (outgroup) through internal node 0]
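$H_0: K_{01} = K_{02}$, or equivalently $K_{01} - K_{02} = 0$
[Figure: internal node 0 joined by branch K01 to Taxon 1, K02 to Taxon 2 and K03 to Taxon 3 (outgroup)]

$$
\begin{aligned}
K_{13} &= K_{01} + K_{03} && (1)\\
K_{23} &= K_{02} + K_{03} && (2)\\
K_{12} &= K_{01} + K_{02} && (3)\\
K_{01} &= (K_{13} + K_{12} - K_{23})/2 && (4)\\
K_{02} &= (K_{12} + K_{23} - K_{13})/2 && (5)\\
K_{03} &= (K_{13} + K_{23} - K_{12})/2 && (6)\\
K_{01} - K_{02} &= K_{13} - K_{23}
\end{aligned}
$$

$$
z = \frac{K_{13} - K_{23}}{\sqrt{\operatorname{Var}(K_{13} - K_{23})}}
$$

Compare z to the standard normal distribution.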
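Equations (4)-(6) and the z statistic translate directly into code. This is a minimal sketch: the function names and distances are hypothetical, and Var(K13 − K23) is assumed to be supplied by the caller (for example from an analytical formula or a bootstrap), because estimating it depends on the substitution model and is not shown here.

```python
import math


def branch_lengths(k12, k13, k23):
    """Solve equations (4)-(6): branch lengths from internal node 0."""
    k01 = (k13 + k12 - k23) / 2.0
    k02 = (k12 + k23 - k13) / 2.0
    k03 = (k13 + k23 - k12) / 2.0
    return k01, k02, k03


def relative_rate_z(k13, k23, var_diff):
    """z = (K13 - K23) / sqrt(Var(K13 - K23)), with a two-sided normal p-value.

    var_diff must be supplied by the caller; it is not estimated here.
    """
    z = (k13 - k23) / math.sqrt(var_diff)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical distances: taxa 1 and 2 versus outgroup taxon 3.
k01, k02, k03 = branch_lengths(k12=0.3, k13=0.5, k23=0.4)
z, p = relative_rate_z(k13=0.5, k23=0.4, var_diff=0.0025)
print(k01, k02, k03, z, p)
```

Note that K01 − K02 equals K13 − K23, so the test never needs the outgroup branch K03 itself.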
Bayesian Relative Rates test (Wilcox et al. 2004)
• MrBayes is used in conjunction with Cadence; rate variation is estimated from the posterior distribution
• For every sampled tree, Cadence summarizes the distance between specific taxa and their most recent common ancestor (MRCA)
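One way to summarize such posterior samples is sketched below. It assumes, hypothetically, that the MRCA-to-taxon distances for the two taxa have already been extracted as paired lists (one pair per sampled tree); it does not reproduce Cadence's input or output format, and we do not claim it matches Wilcox et al.'s exact procedure. A 95% credible interval for the difference that excludes zero suggests the two lineages evolved at different rates.

```python
import statistics


def bayesian_relative_rate(dist1, dist2):
    """Hypothetical helper comparing two lineages from paired posterior samples.

    dist1[i] and dist2[i] are the MRCA-to-taxon distances for taxon 1 and
    taxon 2 measured on the same sampled tree i.
    Returns a 95% credible interval for the difference and the posterior
    probability that lineage 1 is longer.
    """
    if len(dist1) != len(dist2) or len(dist1) < 2:
        raise ValueError("need two equally long lists of posterior samples")
    diffs = [a - b for a, b in zip(dist1, dist2)]
    cuts = statistics.quantiles(diffs, n=40, method="inclusive")
    interval = (cuts[0], cuts[-1])  # 2.5% and 97.5% points
    prob_longer = sum(d > 0 for d in diffs) / len(diffs)
    return interval, prob_longer


# Hypothetical posterior samples (not output from MrBayes or Cadence).
lineage1 = [0.30 + 0.001 * i for i in range(100)]
lineage2 = [0.20 + 0.001 * i for i in range(100)]
(lo, hi), prob = bayesian_relative_rate(lineage1, lineage2)
print(lo, hi, prob)
```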
Measuring Evolutionary time with a
molecular clock
1. Estimate the genetic distance
d = number of amino acid replacements
2. Use paleontological data to determine the date of the common ancestor
T = time since divergence
3. Estimate the calibration rate (number of genetic changes expected per lineage per unit time)
r = d / (2T)
4. Calculate the time of divergence for novel sequences
Tij = dij / (2r)
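Steps 3 and 4 amount to two divisions. The sketch below uses hypothetical numbers and function names; the factor 2 appears because, after two lineages split, both accumulate changes independently, so the distance between them grows at twice the per-lineage rate.

```python
def calibrate_rate(d, t):
    """Step 3: substitutions per lineage per unit time, r = d / (2T)."""
    return d / (2.0 * t)


def divergence_time(d_ij, r):
    """Step 4: time of divergence for a new pair, T_ij = d_ij / (2r)."""
    return d_ij / (2.0 * r)


# Hypothetical numbers: 20 replacements between two taxa whose common
# ancestor is dated (e.g. from fossils) at 10 million years ago.
r = calibrate_rate(d=20, t=10.0)
t_new = divergence_time(d_ij=30, r=r)
print(r, t_new)  # 1.0 replacement per lineage per MY; 15.0 MY
```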
Perfect Molecular Clock
• Change is a linear function of time (substitutions follow a Poisson process), so variation is due only to stochastic error
• Rates are constant (across positions and lineages)
• The tree is correct
• Molecular distances are estimated perfectly
• Calibration dates have no error
• The regression (time vs substitutions) has no error
Poisson Variance
(Assuming A Perfect Molecular Clock)
If one substitution occurs every MY (million years)
• Poisson variance:
– 95% of lineages that are 15 MY old have 8-22 substitutions
– 8 substitutions could also come from a lineage only 5 MY old
Molecular Systematics p532
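Both claims can be checked with the Poisson distribution directly. This sketch assumes, as the slide does, exactly one expected substitution per MY per lineage; `poisson_cdf` is a hypothetical helper written with the standard library only.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summing the pmf term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        total += term
    return total


# Assumes one expected substitution per MY, as on the slide.
p_15my = poisson_cdf(22, 15.0) - poisson_cdf(7, 15.0)  # P(8 <= X <= 22), 15 MY
p_5my = 1.0 - poisson_cdf(7, 5.0)                      # P(X >= 8), only 5 MY
print(round(p_15my, 3), round(p_5my, 3))
```

Under these assumptions the first probability is close to 0.95 and the second is roughly 0.13, so a count of 8 substitutions is compatible with ages that differ by a factor of three.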
Estimating Substitution Rate
• Calculate a separate rate for each data set (species/genes) using a known date of divergence (from fossils, biogeography)
• One calibration point
– Rate = d / (2T)
• More than one calibration point
– use regression
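With several calibration points, one simple choice is a least-squares fit of d = r · (2T) through the origin. The sketch below is a hypothetical helper with made-up calibration pairs; with a single pair it reduces exactly to Rate = d / (2T).

```python
def rate_from_calibrations(calibrations):
    """Least-squares rate r for d = r * (2T), fitted through the origin.

    calibrations: list of (d, T) pairs, one per calibrated divergence.
    With a single pair this reduces to r = d / (2T).
    """
    sxy = sum((2.0 * t) * d for d, t in calibrations)
    sxx = sum((2.0 * t) ** 2 for d, t in calibrations)
    return sxy / sxx


# Hypothetical calibration points: (distance, age in MY).
print(rate_from_calibrations([(20, 10.0)]))                          # one point
print(rate_from_calibrations([(20, 10.0), (41, 20.0), (59, 30.0)]))  # three points
```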
Calibration Complexities
• Fossils cannot be dated perfectly
• Fossils are usually not direct ancestors
– they branched off the tree before (or after?) the splitting event
• It is impossible to pinpoint the age of the last common ancestor of a group of living species
Linear Regression
• Fix intercept at (0,0)
• Fit line between
divergence estimates and
calibration times
• Calculate regression and
prediction confidence limits
• A = regression line
• B1-B2 = 95% CI of
regression line
• C1-C2 = 95% CI for
predicted time values
Molecular Systematics p536
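The difference between the two bands in the figure can be sketched with a regression through the origin. This assumes calibration time is regressed on genetic distance and, unrealistically, that distances are known without error; the function names, data and the caller-supplied t quantile are hypothetical. The prediction half-width includes the extra "1 +" term for the scatter of a single new observation, which is why it is always wider than the interval for the line itself.

```python
import math


def fit_through_origin(dists, times):
    """Regress calibration time on genetic distance with intercept fixed at 0."""
    sxx = sum(x * x for x in dists)
    slope = sum(x * y for x, y in zip(dists, times)) / sxx
    sse = sum((y - slope * x) ** 2 for x, y in zip(dists, times))
    s2 = sse / (len(dists) - 1)  # one fitted parameter, so n - 1 residual df
    return slope, s2, sxx


def predict_time(d0, slope, s2, sxx, t_crit):
    """Point estimate plus half-widths of the two intervals at distance d0.

    ci_half: interval for the regression line itself (B1-B2 in the figure).
    pi_half: interval for a single predicted time (C1-C2 in the figure).
    t_crit is the Student t quantile for n - 1 df, supplied by the caller.
    """
    point = slope * d0
    ci_half = t_crit * math.sqrt(s2 * d0 * d0 / sxx)
    pi_half = t_crit * math.sqrt(s2 * (1.0 + d0 * d0 / sxx))
    return point, ci_half, pi_half


# Hypothetical calibrations: distances and fossil ages (MY).
slope, s2, sxx = fit_through_origin([10, 20, 30, 40], [5.5, 9.5, 15.5, 19.5])
# 3.182 is the two-sided 95% t quantile for 3 df (n = 4 calibrations).
point, ci_half, pi_half = predict_time(25, slope, s2, sxx, t_crit=3.182)
print(point, ci_half, pi_half)
```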
Molecular Dating
Sources of Error (assuming constant rates)
• Both X and Y values are only estimates
– the substitution model could be incorrect
– the tree could be incorrect
– errors in orthology assignment
– Poisson variance is large
• Pairwise divergences are correlated (Molecular Systematics p534)
– this inflates the correlation between divergence and time
• Calibrations are sometimes correlated
– if derived calibration points are used
• Error in inferring the slope
• The confidence interval for predictions is much larger than the confidence interval for the slope
Working Around Rate Heterogeneity
1. Identify lineages that deviate and remove them
2. Quantify degree of rate variation to put limits on
possible divergence dates
– requires several calibration dates, not always
available
– gives very conservative estimates of molecular
dates
3. Explicitly model rate variation (relaxed clocks)
Relaxing the Molecular Clock
Rutschmann 2006 (review)
• Likelihood analysis
– Assign each branch its own rate parameter
• leads to an explosion of parameters; not realistic
– The user can partition branches based on domain knowledge
– Rates of partitions are independent
• Nonparametric rate smoothing and penalized likelihood smooth rates along the tree (program r8s)
• Bayesian approach
– stochastic model of evolutionary change
– prior distribution of rates:
• Autocorrelated rates: BEAST and Multidivtime
• Uncorrelated rates: BEAST (can also incorporate uncertainty in topology)
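The two kinds of rate prior can be contrasted by drawing branch rates on a small tree. This is a simplified sketch, not the model implemented in BEAST or Multidivtime: the tree, parameter values and function names are hypothetical; the autocorrelated version is a plain log-scale random walk from parent to child, whereas published autocorrelated models (such as the one used by Multidivtime) typically also scale the variance with branch duration.

```python
import math
import random


def autocorrelated_rates(parents, root_rate, sigma, rng):
    """Each branch rate is lognormally scattered around its parent's rate.

    parents[i] is the parent of node i (parents[0] is unused; node 0 is the
    root) and every parent index must be smaller than its child's index.
    """
    rates = [root_rate]
    for i in range(1, len(parents)):
        rates.append(rates[parents[i]] * math.exp(rng.gauss(0.0, sigma)))
    return rates


def uncorrelated_rates(n_nodes, mean_rate, sigma, rng):
    """Each branch rate is an independent lognormal draw with the given mean."""
    mu = math.log(mean_rate) - sigma ** 2 / 2.0  # makes E[rate] == mean_rate
    return [rng.lognormvariate(mu, sigma) for _ in range(n_nodes)]


# Hypothetical 7-node binary tree: root 0, children 1-2, grandchildren 3-6.
parents = [-1, 0, 0, 1, 1, 2, 2]
rng = random.Random(1)
auto = autocorrelated_rates(parents, root_rate=1.0, sigma=0.2, rng=rng)
indep = uncorrelated_rates(len(parents), mean_rate=1.0, sigma=0.2, rng=rng)
print([round(r, 2) for r in auto])
print([round(r, 2) for r in indep])
```

Under the autocorrelated prior, sister branches tend to have similar rates because they inherit from the same parent; under the uncorrelated prior, neighbouring branches are no more alike than distant ones.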
Multiple Gene Loci
• “Trying to estimate time of divergence
from one protein is like trying to estimate
the average height of humans by
measuring one human”
--Molecular Systematics p539
• Ideally:
– use multiple genes
– use multiple calibration points
Even so, be very cautious about divergence time inferences
• Point estimates are absurd
• Reported sampling errors are often based only on the differences between estimates in the same study
• Even estimates with confidence intervals are unlikely to capture all sources of variance
General References
Reviews/Critiques
1. Bromham and Penny. The modern molecular clock, Nature
Genetics, 2003.
2. Graur and Martin. Reading the entrails of chickens...the illusion of
precision. Trends in Genetics, 2004.
3. Rutschmann. Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times. Diversity and Distributions, 2006.
Textbooks:
1. Hillis, Moritz, and Mable (eds.). Molecular Systematics, 2nd edition.
2. Felsenstein. Inferring Phylogenies.
3. Page and Holmes. Molecular Evolution: A Phylogenetic Approach.
4. The Phylogenetic Handbook, Chapter 11.
Rate Heterogeneity References
Dealing with Rate Heterogeneity
1. Yang and Yoder. Comparison of likelihood and Bayesian methods for estimating divergence times. Syst. Biol., 2003.
2. Kishino, Thorne, and Bruno. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol., 2001.
3. Huelsenbeck, Larget, and Swofford. A compound Poisson process for relaxing the molecular clock. Genetics, 2000.
Testing for Rate Heterogeneity
1. Takezaki, Rzhetsky, and Nei. Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol., 1995.
2. Bromham, Penny, Rambaut, and Hendy. The power of relative rates test depends on the data. J. Mol. Evol., 2000.
3. Wilcox, Garcia de Leon, Hendrickson, and Hillis. Convergence among cave catfishes: long-branch attraction and a Bayesian relative rates test. Mol. Phylogenet. Evol., 2004, 31:1101-1113.