Human Molecular Evolution
Lecture 2
Molecular phylogenies and
molecular clocks
You can download a copy of these slides
from www.stats.ox.ac.uk/~harding
Concepts and topics to be covered in
this lecture
• Classification and Phylogeny
• Controversies in the phylogenetic systematics
of primates
• Constructing phylogenetic trees
• Molecular clocks
• Neutral theory basis for a constant evolutionary
rate given by the mutation rate.
From classification to phylogeny
What is the difference between classification and
phylogeny?
Classification: grouping of taxa based on similarity.
Phylogeny: grouping of taxa based on descent from a common
ancestor (monophyletic origin) and resolution of a tree
Both classification and phylogeny use trees to show
relationships.
But classification (numerical taxonomy) can use other display
methods instead of trees.
In phylogeny, trees provide information not only about relationships,
but also about common ancestors and time-scales.
Let’s try to define some terms…
Classification
• In the 1960s and 70s, numerical taxonomy offered methods for
identifying natural groups among taxa, ie classification, by
statistical analysis of variation, eg morphology, and serum and
electrophoretic proteins.
• Dendrograms or phenograms (from numerical taxonomy) classify
populations or taxa into related groups. These groupings may be
due to relatively recent evolutionary divergence (phylogeny), to
gene flow, or to natural selection (convergence).
Matrix of distances between taxa
Pattern of relationships in
an unrooted tree
A phylogenetic problem: finding the root
With 5 taxa in one unrooted tree, 7 rooted trees are
possible
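The count of 7 follows because a root can be placed on any branch of the unrooted tree, and an unrooted binary tree with n taxa has 2n - 3 branches. A minimal sketch of the arithmetic follows; the function names are illustrative, and the double-factorial formulas for the total number of tree shapes are standard results not stated on the slide.

```python
def branches_in_unrooted_tree(n_taxa):
    """Branches in an unrooted binary tree; each one is a possible root position."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return 2 * n_taxa - 3


def double_factorial(k):
    """k * (k - 2) * (k - 4) * ... down to 1 or 2 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_topologies(n_taxa):
    """Number of distinct unrooted binary trees: (2n - 5)!!"""
    return double_factorial(2 * n_taxa - 5)


def rooted_topologies(n_taxa):
    """Number of distinct rooted binary trees: (2n - 3)!!"""
    return double_factorial(2 * n_taxa - 3)


n = 5
print(branches_in_unrooted_tree(n), "root positions on one unrooted tree")
print(unrooted_topologies(n), "unrooted trees,", rooted_topologies(n), "rooted trees")
```

For 5 taxa, every one of the unrooted trees can be rooted in 7 places, which is why the number of rooted trees is 7 times the number of unrooted trees.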
Dendrograms and Cladograms
• Dendrograms or phenograms (from numerical taxonomy) show how
taxa group together. Hierarchical classification may indicate
phylogeny.
• In evolutionary biology, numerical taxonomy has been largely
replaced by phylogenetic systematics.
• One important method used in phylogenetic systematics is called
cladistics.
Reconstructing the history of character
evolution (cladistics)
• With molecular data, the character states for cladistics
are given by DNA nucleotides and the steps are due to
mutations.
• Cladistic methods are based on the rule of maximum
parsimony. The best tree is the one that requires the
fewest steps for a character to evolve
from its ancestral state at the root to all its descendant
forms.
• The philosophical aim of cladistics is to logically
deduce the ‘true’ phylogeny. But what if the data give
you conflicting information about which construction is
the best tree? Cladists designed rules for choosing
which information to use and which to ignore. Other
phylogenetic systematists went on to develop new
statistical methods.
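Counting steps on a candidate tree can be sketched with Fitch's small-parsimony algorithm, one common way of scoring a tree under maximum parsimony. The slides do not name a specific algorithm, so this choice is an assumption, and the four short sequences below are made up for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one character on a rooted binary tree.

    tree   -- a tip name (str) or a 2-tuple of subtrees
    states -- dict mapping tip name to its character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No state in common: one change is needed on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, sequences):
    """Total number of steps over all aligned sites."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in sequences.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical aligned sequences (H human, C chimp, G gorilla, O orangutan outgroup).
sequences = {"H": "ATGC", "C": "ATAC", "G": "GCAC", "O": "GCGC"}

candidate_trees = {
    "HC": ((("H", "C"), "G"), "O"),
    "CG": ((("C", "G"), "H"), "O"),
    "HG": ((("H", "G"), "C"), "O"),
}
scores = {name: tree_length(tree, sequences) for name, tree in candidate_trees.items()}
print(scores, "most parsimonious:", min(scores, key=scores.get))
```

In this toy data set one site conflicts with the other two, so no tree explains every site with a single change; parsimony simply prefers the tree with the smallest total.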
Parsimony and likelihood
• Parsimony was originally proposed by Edwards and
Cavalli-Sforza as an approximation to maximum
likelihood (a statistical evaluation of all the possibilities).
• Faced with serious computational problems in computing
likelihoods, they suggested that under some
circumstances the shortest tree may also be the
maximum likelihood tree.
• But with the development of fast modern computers,
parsimony and cladistics have been integrated into
methods of maximum likelihood. These methods are
used to evaluate which of many likely trees (given the
data) seems to be the most probable.
Features of trees used for
phylogenies
• A tree consists of nodes
connected by branches, also
called edges in an unrooted tree.
• Taxa are placed at terminal nodes
of trees.
• Internal nodes in phylogenetic
trees represent hypothetical
ancestors. (No special meaning
in classifications.)
• The root of the tree is an internal
node that represents the common
ancestor of all taxa in the tree.
[Figure: tree with tips C: Chimp, H: Human, G: Gorilla, S: Sivapithecus (extinct ape), O: Orangutan, G: Gibbon]
Phylogenetic trees from DNA variation
[Figure: DNA sequences 1) to 6), with variant nucleotides (A, C, G, T) marked along the alignment]
Using trees to show phylogeny
• All of life is related by common ancestry. Recovering these
ancestral relationships is a major goal of evolutionary biology.
• A tree is a mathematical structure that has many uses: trees can be
descriptive and used to show clusters of similar taxa (classification).
• Trees are also used as evolutionary models for phylogenies.
• To use a tree to show phylogeny we want to:
– Define a monophyletic group ie a group of taxa with a single common
ancestor.
– Reconstruct the history of character evolution along each lineage from
the root. This is the aim of cladistic methods. Characters may be
morphologies or DNA markers.
– Find the root (hypothetical common ancestor)
– Make the branch lengths proportional to time. Information from fossils
and molecular clocks.
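Whether a group is monophyletic can be checked mechanically once a rooted tree is written down: the group must be exactly the set of tips below some node. A minimal sketch, assuming a nested-tuple tree and a simplified hominoid topology chosen for illustration:

```python
def tips(tree):
    """All tip names below a node; a tip is a str, an internal node is a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clades(tree):
    """The set of tips below every node of the tree (including single tips)."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {tips(tree)}
    for child in tree:
        found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """True if the group contains an ancestor's descendants and nothing else."""
    return frozenset(group) in clades(tree)


# Illustrative rooted tree: ((((human, chimp), gorilla), orangutan), gibbon)
hominoids = (((("human", "chimp"), "gorilla"), "orangutan"), "gibbon")

print(is_monophyletic(hominoids, ["human", "chimp"]))
print(is_monophyletic(hominoids, ["chimp", "gorilla", "orangutan"]))
```

The second group leaves out humans, one of the descendants of its common ancestor, so it fails the check; that is the paraphyletic case.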
[Figure: trees with tips H, C, G and O, marking two monophyletic groups and one group that is not monophyletic]
These monophyletic groups are clades
and include all the descendants of their
single common ancestral species.
Here lineages are
missing: this is a
paraphyletic group.
Monkeys: a paraphyletic group (Why?)
[Figure: monkeys of the New World, ie Americas (spider monkey, white-faced monkey) and Old World monkeys (Africa & Eurasia) (patas monkey, colobus, rhesus)]
The phylogeny for anthropoids
Old World monkeys
(OWM) share a
common ancestor with
Great Apes (including
humans) as a
monophyletic clade
called: Catarrhines.
Platyrrhine New World
monkeys are a sister
group.
[Figure: tree with tips H, C, C, G, O (Great Apes) within the Catarrhines, then Platyrrhines, then Tarsier]
Johnson et al. (2001) Nature 413: 514-519.
These anthropoids are a
sister group to tarsiers.
Controversy in
phylogenetics of Old
World monkeys
• Morphologists classified
mangabeys (Cercocebus) as
belonging together in a natural
taxon, separate from a different
group including mandrills,
baboons and geladas.
• But molecular systematists group
terrestrial mangabeys
(Cercocebus) with mandrills and
arboreal mangabeys
(Lophocebus) with baboons and
geladas.
[Figure: terrestrial mangabey, mandrill, arboreal mangabey, baboon and red cap mangabey, with a polyphyletic group marked]
Polyphyletic origins?
• For most of the 20th century anthropoids (humans, apes and
monkeys) were grouped together but considered to have
polyphyletic origins, ie to have evolved several times from different
ancestors.
• It was assumed that anthropoids were a grade that was attained
independently by the platyrrhines in the New World (Americas)
and the catarrhines in the Old World (Africa+Europe) each
evolving from different prosimian ancestors.
• What is a prosimian? See below. Now defined as a paraphyletic
group that includes tarsiers and strepsirrhines
[Figure: strepsirrhines (ring tailed lemur, Madagascar; pygmy loris) and a tarsier (endemic to Asia)]
Controversy in Primate
Phylogenetics: the Tarsius problem
Tarsiers are a problem: they
share some features with
Anthropoids, others with
lorises (Strepsirhini).
Some molecular phylogenies
do group Tarsius +
Lemuriformes as
monophyletic Suborder
Prosimii
But, monophyletic Haplorhini
vs Strepsirhini (shown
here) has been preferred
since 1980.
Note: Not much
morphological evolution
apparent on the branch
leading to tarsiers.
Estimating branch lengths: fossils and
molecules
• Molecular clock proposed by Zuckerkandl &
Pauling (1962): rate of evolution in terms of
substitutions (fixations) is constant per year†
per site.
• The amount of evolution measured by fixations
is related to time, which we can show in the
tree by branch length.
• The time scale is provided by a ‘known’
divergence given by dated fossils.
• In the 1960s and 70s, concordances with
current interpretations of the fossil record were
used to validate molecular phylogenies. Now,
phylogenies from molecular (DNA) data provide
the ‘gold standard’.
• What is the evidence that fixations (i.e.
substitutions throughout the species of one
amino-acid or one nucleotide for another) ‘tick’
at a constant rate?
[Figure: tree of C: Common Chimp, B: Bonobo, H: Human, G: Gorilla, O: Orangutan]
A Molecular Clock
• Original observation: the proportion of amino-acids per
protein (eg haemoglobin) that differ between a pair of species (D)
is related to the ‘known’ divergence time estimated from
the fossil record.
• $\hat{K}$ is the estimated number of amino-acid substitutions per site: $\hat{K} = -\ln(1 - D)$
• Linear relationship implies a constant evolutionary rate.
[Figure: $\hat{K}$ (0 to 0.25) plotted against time (0 to 100 millions of years). Ref: p. 331 of Hartl & Clark, 3rd ed.]
Estimating divergence times
• Need some pairs of extant species for which the
fossil record identifies a common ancestor and the
time it lived.
• Generate molecular divergence data for proteins or
genes from extant species. For proteins, divergence
is measured at level of amino-acids; for genes,
divergence can be measured at the level of DNA
nucleotides.
• Estimate an evolutionary rate from molecular data
assuming a molecular clock (constant rate).
• Apply a molecular clock estimate from molecular
divergence to date the common ancestors of species
for which no fossils are available.
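The steps above can be sketched as a short calculation. All numbers below are hypothetical, and the sketch assumes that the divergence K between two species accumulates along both lineages since their common ancestor, so that K = 2 x rate x time; the slides do not state this factor of 2 explicitly.

```python
import math


def poisson_corrected_distance(d):
    """Estimated substitutions per site from the proportion of differing sites D."""
    if not 0.0 <= d < 1.0:
        raise ValueError("D must be in [0, 1)")
    return -math.log(1.0 - d)


def rate_per_site_per_year(k, divergence_time_years):
    """Calibrate the clock from a pair with a fossil-dated common ancestor."""
    return k / (2.0 * divergence_time_years)


def divergence_time_years(k, rate):
    """Date a common ancestor from molecular divergence, assuming a constant rate."""
    return k / (2.0 * rate)


# Hypothetical calibration pair: 10% of sites differ, fossils date the split to 30 Myr.
k_calibration = poisson_corrected_distance(0.10)
rate = rate_per_site_per_year(k_calibration, 30e6)

# Hypothetical pair with no fossil record: 2% of sites differ.
k_unknown = poisson_corrected_distance(0.02)
estimate = divergence_time_years(k_unknown, rate)
print(f"rate = {rate:.3e} per site per year, estimated split = {estimate / 1e6:.1f} Myr")
```

The estimate is only as good as the calibration date and the assumption that the rate is the same in both pairs of lineages.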
Is molecular evolution really clock-like?
• If DNA evolves in a clock-like manner, the number of
mutational steps along lineages, eg from humans or
chimps or gorillas, back to their common ancestor, should
be the same, roughly, starting from any terminal node.
[Figure: tree with mutational steps numbered 1 to 8 on its branches; sequences shown: GTCCATCAA, human GACCATTT, chimp GTGCTTCT, gorilla CTCGAACA]
This might be true if most substitutions are neutral.
Positive selection ie adaptation would clearly speed up
the rate of substitution and make the clock tick faster.
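The clock-like expectation can be checked by counting steps from the ancestor to each terminal node. The sketch below uses the sequences printed on the slide. One assumption is made loudly: the ancestral string appears in the extracted text as "GTCCATCAA" (9 bases), and it is trimmed here to 8 bases so that it aligns with the three tip sequences.

```python
# ASSUMPTION: ancestral sequence trimmed from the 9 extracted characters to 8.
ANCESTOR = "GTCCATCA"
TIPS = {
    "human": "GACCATTT",
    "chimp": "GTGCTTCT",
    "gorilla": "CTCGAACA",
}


def count_steps(ancestor, descendant):
    """Number of sites at which a descendant differs from the ancestor."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != d for a, d in zip(ancestor, descendant))


steps = {name: count_steps(ANCESTOR, seq) for name, seq in TIPS.items()}
for name, n in steps.items():
    print(f"{name}: {n} steps from the common ancestor")
```

Equal counts on every lineage are what a perfect clock predicts; with real data the counts differ by chance, which is why a statistical test is needed.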
A theoretical basis for a molecular clock
from neutral theory
• The evolutionary rate of divergence is a product of the
mutation rate and the probability that a mutation is fixed
in a population, ie that complete substitution of one
variant for another has occurred.
• Note that the evolutionary rate is estimated from fixed
differences between species.
• Kimura (1980) showed that if the probability of fixation is
given by a process of genetic drift:
– the rate of fixation estimates the mutation rate.
– the rate of fixation of neutral variants is independent of
population size fluctuation, and so may be constant.
Trajectories for neutral alleles
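Trajectories of this kind can be simulated with a simple random-sampling (Wright-Fisher) model of drift. The sketch below is an illustration under that assumed model, with a small made-up population of 2N = 20 gene copies; it follows one new neutral mutant at a time until it is lost or fixed.

```python
import random


def simulate_new_mutant(n_copies, rng):
    """Follow one new neutral mutant until loss or fixation.

    Returns (fixed, trajectory), where trajectory is the mutant count per generation.
    """
    count = 1
    trajectory = [count]
    while 0 < count < n_copies:
        p = count / n_copies
        # Each of the 2N copies in the next generation is drawn at random.
        count = sum(rng.random() < p for _ in range(n_copies))
        trajectory.append(count)
    return count == n_copies, trajectory


def estimate_fixation_probability(n_copies, replicates, seed=1):
    """Fraction of new neutral mutants that drift to fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_new_mutant(n_copies, rng)[0] for _ in range(replicates))
    return fixed / replicates


n_copies = 20  # 2N for a hypothetical population of N = 10 diploids
print("simulated:", estimate_fixation_probability(n_copies, 2000))
print("expected 1/(2N):", 1 / n_copies)
```

Most trajectories end in loss within a few generations; the rare ones that fix do so with probability close to the starting frequency, 1/(2N).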
Testing the clock assumption
• First calculate the likelihood (product of a set of
independent probabilities) of the data given a phylogenetic
model that assumes a molecular clock.
• Then add a parameter to the model so branch length can
vary and look for a set of branch lengths that maximize the
likelihood of the data (maximum likelihood method).
• Compare the two models using a log-likelihood ratio test.
Rejecting the molecular clock hypothesis suggests that
there is rate heterogeneity among lineages. This may imply
differential selective pressures on the gene evolving in
different lineages.
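The comparison can be sketched as follows. The log-likelihood values are hypothetical placeholders for the output of a phylogenetics program, and the degrees of freedom are taken as n - 2 for n taxa (the difference between 2n - 3 free branch lengths and n - 1 node heights under a clock), which is the usual convention rather than something stated on the slide.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Closed form for odd df, built from the normal tail (erfc) plus a finite sum.
    chi = math.sqrt(x)
    total, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total


def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Log-likelihood ratio test of the clock model against free branch lengths."""
    statistic = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical maximised log-likelihoods for a 5-taxon data set.
statistic, df, p_value = clock_lrt(lnl_clock=-1234.5, lnl_free=-1230.1, n_taxa=5)
print(f"2*delta lnL = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
```

A small p-value rejects the clock and points to rate heterogeneity among lineages.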
A surprisingly recent common
ancestor
• Before Sarich and Wilson (1967), Ramapithecus, a
fossil ape dated to ~15 million years ago, was thought
to be ancestral to humans, implying an even
earlier split between the human lineage and ‘apes’.
• Then Sarich and Wilson (1967) applied a molecular
clock to hominoid evolution. Science 158:1200-1203.
[Figure: Sivapithecus (facial morphology of an orangutan); annotation: ONLY 5 Myr!]
Another controversy: resolving the Human-African apes trichotomy
[Figure: unrooted tree of G (gorilla), C (chimp) and H (human) with candidate root positions 1?, 2? and 3?]
On which branch do we locate
the root if the branch lengths
are nearly the same?
Morphology suggested a closer
relationship between chimps and
gorillas, ie root at position 3.
Molecular evidence has resolved the
phylogeny by placing a more recent
common ancestor for chimps and
humans, in a sister clade to gorillas, ie
root is at position 1.
Why have so many controversies
arisen in resolving phylogeny?
• Taxa can look similar but one taxon may be more
closely related to another that looks different
because:
– a lot more morphological evolution may occur on one
lineage from the common ancestor than on the other, eg
anthropoids versus Tarsius
– there is selection in diverged lineages for the same
functional adaptations, which makes them look similar
(convergence). Characters that evolve the same state
multiple times independently are called homoplasies. An
example may be knuckle-walking in chimps and gorillas.
• Over the last 25 years, phylogenetic analysis has
increasingly relied on molecular (protein and DNA)
analyses rather than on morphology, but the
problems of estimating branch length and detecting
convergent selection don’t disappear.
Phylogenetic relationships within the
Superfamily Hominoidea
15-20 Myr
• Hylobatidae (Gibbons)
• Hominidae
– Pongidae
• Pongo pygmaeus (orangutan)
– Homininae
• Gorilla gorilla
• Pan troglodytes (“Common chimp”)
• Pan paniscus (“Pygmy chimp”)
• Homo sapiens (modern humans)
5-8 Myr
Phylogenetic systematics
• Phylogenetic methods can be applied to
morphological characters from bones and fossils, but
these methods have become most powerful in
application to molecular (protein and DNA) data.
• Phylogenetic systematics once meant cladistics, but it
now includes a broad range of statistical methods.
• First propose a phylogenetic model (ie a particular
tree) that reconstructs the evolutionary history for the
data and then statistically evaluate the data assuming
the model (eg using maximum likelihood methods).
• Is the evolutionary rate clock-like? If not, try variable
branch lengths (rate heterogeneity).
Phylogenetic trees are also
constructed from sequence variation
within species
• mtDNA haplotypes
• Y chromosome
haplotypes
• X chromosomes
(sampled from
males)
Interpreting these
phylogenies seems
simple enough but is
problematic.
X chromosome
phylogeny
Horizontal gene transfer: a problem
for phylogenetic models
• Mediated between genomes of different species
of bacteria
– Between other species, consider transposable
elements, introduced by viruses or retroviruses.
• Mediated within species by sex (and
transposable elements)
– Autosomal genomes are mosaic compositions
inherited from many ancestors
– Lack of recombination between mtDNA genomes,
and within the Non-Recombining part of the Y
chromosome (NRY), allows reconstruction of
molecular phylogenies, otherwise called gene
genealogies
Why don’t molecular phylogenies from
mtDNA or NRY polymorphism tell the
full story of human (or chimp) evolution?
• No sex in phylogenetic
models;
just reproduction or
extinction
• Sex is a key feature of
population genetic models;
though usually reduced to
the more dull concepts of
random union of gametes
in a gene pool, or gene
flow between subdivided
populations
Molecular phylogenies: progress
summary
• Molecular phylogenies: models for evolutionary
history (descent with modification);
– basic assumption: new-formed species or sub-species
diverge by accumulating genetic differences, can be
modelled graphically as a tree
• Problems for these models are caused by
– Convergence due to similar adaptations
– Heterogeneity in rates of evolution (upsets
molecular clock assumption)
– Horizontal gene transfer between lineages
• Solutions to these problems
– Choice of data
– Choice of specific phylogenetic reconstruction algorithm, eg
there are options for taking rate heterogeneity into account.
Additional reading
• Page, RDM and Holmes EC (1998) Molecular Evolution, A
phylogenetic approach. Blackwell Science Ltd, Oxford
• Fleagle, JG (2000) The Century of the Past: One Hundred Years in
the Study of Primate Evolution. Evolutionary Anthropology 9:87-100
• Heesy, CP (2001) Rethinking Anthropoid Origins. Evolutionary
Anthropology 10:119-121
• Hartwig, WC & Rosenberger, AL (2001) Primates (Lemurs, Apes and
Monkeys). Encyclopedia of Life Sciences http://www.els.net
• Martin, AP (2001) Molecular clocks. Encyclopedia of Life Sciences
• Kumar, S. & Filipski AJ (2001) Molecular Phylogeny Reconstruction.
Encyclopedia of Life Sciences
Theoretical Principles of Neutral Theory:
Molecular Evolution
1. The rate of fixation of neutral mutations is
determined by m, the neutral mutation rate,
and is independent of population size. (For
larger N, the fixation probability is lower, but
the number of mutations per generation for
the population, 2Nm on average, is greater.)
2. The average time interval between
consecutive substitutions (fixed mutations)
is 1/m generations† (molecular clock).
3. The average time interval for fixation of a
new allele is $4N_e$ generations.
Reference: Hartl and Clark, 3rd edition, p. 316-319
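Points 1 and 2 follow from a one-line calculation, written out here as a worked derivation. The symbol k for the substitution rate is introduced only for this sketch; N and m are as defined on the slide, and 1/(2N) is the fixation probability of a new neutral mutant in a diploid population.

```latex
\begin{align*}
\text{new neutral mutations per generation} &= 2Nm \\
\text{fixation probability of each new mutant} &= \frac{1}{2N} \\
k &= 2Nm \times \frac{1}{2N} = m \\
\text{mean interval between substitutions} &= \frac{1}{k} = \frac{1}{m}\ \text{generations}
\end{align*}
```

The population size N cancels, which is why the neutral substitution rate depends only on the mutation rate.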
Theoretical Principles of Neutral Theory
4. The magnitude of genetic drift is greater in
smaller populations. Assuming a random
sampling model for drift, the probability of an
allele attaining fixation (100%) is given by its
frequency (1/2N for a new mutant allele).
5. Levels of allelic diversity are set by the steady
state balance of new mutations (gain) against
drift (loss). Assuming genetic drift with
infinite-alleles mutation, an estimate for the
equilibrium heterozygosity is given by
$4N_e m/(1 + 4N_e m)$.
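The equilibrium formula can be evaluated directly. This is a minimal sketch with hypothetical parameter values; the function name and the intermediate variable theta (for 4 x Ne x m) are introduced here for convenience and do not appear on the slide.

```python
def equilibrium_heterozygosity(ne, m):
    """Expected heterozygosity at mutation-drift balance under infinite-alleles mutation."""
    theta = 4.0 * ne * m
    return theta / (1.0 + theta)


# Hypothetical values: the same mutation rate in populations of different effective size.
m = 2.5e-5
for ne in (100, 1_000, 10_000, 100_000):
    print(f"Ne = {ne:>7}: H = {equilibrium_heterozygosity(ne, m):.3f}")
```

Larger effective populations lose variation to drift more slowly, so they hold more allelic diversity at the same mutation rate.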