Human Molecular Evolution
Lecture 2: Molecular phylogenies and molecular clocks

You can download a copy of these slides from www.stats.ox.ac.uk/~harding

Concepts and topics to be covered in this lecture
• Classification and phylogeny
• Controversies in the phylogenetic systematics of primates
• Constructing phylogenetic trees
• Molecular clocks
• Neutral theory basis for a constant evolutionary rate given by the mutation rate

From classification to phylogeny
What is the difference between classification and phylogeny?
• Classification: grouping of taxa based on similarity.
• Phylogeny: grouping of taxa based on descent from a common ancestor (monophyletic origin) and resolution of a tree.
Both classification and phylogeny use trees to show relationships, but classification (numerical taxonomy) can use other display methods instead of trees. In phylogeny, trees provide information not only about relationships, but also about common ancestors and time-scales. Let’s try to define some terms…

Classification
• In the 1960s and 70s, numerical taxonomy offered methods for identifying natural groups among taxa, ie classification, by statistical analysis of variation, eg in morphology and in serum and electrophoretic proteins.
• Dendrograms or phenograms (from numerical taxonomy) classify populations or taxa into related groups. These groupings may be due to relatively recent evolutionary divergence (phylogeny), to gene flow, or to natural selection (convergence).

[Figure: matrix of distances between taxa; pattern of relationships in an unrooted tree]

A phylogenetic problem: finding the root
With 5 taxa in one unrooted tree, 7 rooted trees are possible: the root can be placed on any one of the tree's 7 branches.

Dendrograms and Cladograms
• Dendrograms or phenograms (from numerical taxonomy) show how taxa group together. Hierarchical classification may indicate phylogeny.
• In evolutionary biology, numerical taxonomy has been largely replaced by phylogenetic systematics.
• One important method used in phylogenetic systematics is called cladistics.
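The rooting count quoted above (5 taxa, 7 rooted trees) can be checked with a little arithmetic. The sketch below assumes fully resolved (bifurcating) trees; the function names are illustrative and not from the lecture. An unrooted bifurcating tree with n taxa has 2n - 3 branches, and each branch is one possible place for the root.

```python
def branches_in_unrooted_tree(n_taxa):
    """Number of branches in a fully resolved unrooted tree with n_taxa tips."""
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    return 2 * n_taxa - 3


def rootings_of_one_unrooted_tree(n_taxa):
    """Each branch can carry the root, so rootings equal branches."""
    return branches_in_unrooted_tree(n_taxa)


def unrooted_topologies(n_taxa):
    """Number of distinct unrooted bifurcating trees: (2n - 5)!!"""
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def rooted_topologies(n_taxa):
    """Every unrooted tree can be rooted on any of its branches."""
    return unrooted_topologies(n_taxa) * rootings_of_one_unrooted_tree(n_taxa)


if __name__ == "__main__":
    print(rootings_of_one_unrooted_tree(5))  # rootings of a single unrooted tree
    print(unrooted_topologies(5))            # distinct unrooted trees for 5 taxa
    print(rooted_topologies(5))              # distinct rooted trees for 5 taxa
```

The number of candidate trees grows very quickly with the number of taxa, which is one reason why tree search is computationally hard.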
Reconstructing the history of character evolution (cladistics)
• With molecular data, the character states for cladistics are given by DNA nucleotides and the steps are due to mutations.
• Cladistic methods are based on the rule of maximum parsimony. The best tree is the one that requires the smallest number of steps for a character to evolve from its ancestral state at the root to all its descendant forms.
• The philosophical aim of cladistics is to logically deduce the ‘true’ phylogeny. But what if the data give you conflicting information about which construction is the best tree? Cladists designed rules for choosing which information to use and which to ignore. Other phylogenetic systematists went on to develop new statistical methods.

Parsimony and likelihood
• Parsimony was originally proposed by Edwards and Cavalli-Sforza as an approximation to maximum likelihood (a statistical evaluation of all the possibilities).
• Faced with serious computational problems in computing likelihoods, they suggested that under some circumstances the shortest tree may also be the maximum likelihood tree.
• But with the development of fast modern computers, parsimony and cladistics have been integrated into methods of maximum likelihood. These methods are used to evaluate which of many likely trees (given the data) seems to be the most probable.

Features of trees used for phylogenies
• A tree consists of nodes connected by branches, also called edges in an unrooted tree.
• Taxa are placed at terminal nodes of trees.
• Internal nodes in phylogenetic trees represent hypothetical ancestors. (No special meaning in classifications.)
• The root of the tree is an internal node that represents the common ancestor of all taxa in the tree.
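To make the parsimony rule described above concrete, here is a minimal sketch that counts the smallest number of steps needed on a given rooted tree, using Fitch's algorithm. Fitch's algorithm is a standard way to do this count, but it is swapped in here for illustration and is not named in the lecture. The four short sequences and the two candidate trees are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, steps) for one site.

    tree   : a taxon name (str) for a terminal node,
             or a pair (left, right) for an internal node
    states : dict mapping taxon name -> nucleotide at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_site(tree[0], states)
    right_set, right_steps = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two descendants can agree: no extra step needed here.
        return shared, left_steps + right_steps
    # No shared state: one mutational step is required on this node.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_length(tree, alignment):
    """Total number of steps over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical data, for illustration only.
alignment = {
    "human": "AACG",
    "chimp": "AACT",
    "gorilla": "AGCT",
    "orangutan": "TGCT",
}
tree_human_chimp = ((("human", "chimp"), "gorilla"), "orangutan")
tree_chimp_gorilla = ((("chimp", "gorilla"), "human"), "orangutan")

if __name__ == "__main__":
    for name, tree in [("human+chimp", tree_human_chimp),
                       ("chimp+gorilla", tree_chimp_gorilla)]:
        print(name, parsimony_length(tree, alignment))
```

With these made-up sequences the human+chimp tree needs fewer steps than the chimp+gorilla tree, so parsimony would prefer it. A full analysis would repeat the count over every candidate tree.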
[Figure: tree of C (chimp), H (human), S (Sivapithecus, an extinct ape), G (gorilla), O (orangutan) and G (gibbon)]

Phylogenetic trees from DNA variation
[Figure: two alignments of six sequences, numbered 1) to 6), showing the variable nucleotide sites]

Using trees to show phylogeny
• All of life is related by common ancestry. Recovering these ancestral relationships is a major goal of evolutionary biology.
• A tree is a mathematical structure that has many uses: trees can be descriptive and used to show clusters of similar taxa (classification).
• Trees are also used as evolutionary models for phylogenies.
• To use a tree to show phylogeny we want to:
– Define a monophyletic group, ie a group of taxa with a single common ancestor.
– Reconstruct the history of character evolution along each lineage from the root. This is the aim of cladistic methods. Characters may be morphologies or DNA markers.
– Find the root (hypothetical common ancestor).
– Make the branch lengths proportional to time, using information from fossils and molecular clocks.

[Figure: three groupings of H, C, G and O on trees, labelled "Monophyletic group", "Monophyletic group" and "Not a monophyletic group"]
These monophyletic groups are clades and include all the descendants of their single common ancestral species.
Here lineages are missing: this is a paraphyletic group.

Monkeys: a paraphyletic group (Why?)
• Monkeys of the New World, ie the Americas: spider monkey, white-faced monkey
• Old World monkeys (Africa & Eurasia): patas monkey, colobus, rhesus

The phylogeny for anthropoids
Old World monkeys (OWM) share a common ancestor with Great Apes (including humans) as a monophyletic clade called Catarrhines. Platyrrhine New World monkeys are a sister group.
[Figure: tree with H, C, C, G and O (Great Apes) within Catarrhines, then Platyrrhines and Tarsier]
Johnson et al.
(2001) Nature 413: 514-519.
These anthropoids are a sister group to tarsiers.

Controversy in phylogenetics of Old World monkeys
• Morphologists classified mangabeys (Cercocebus) as belonging together in a natural taxon, separate from a different group including mandrills, baboons and geladas.
• But molecular systematists group terrestrial mangabeys (Cercocebus) with mandrills, and arboreal mangabeys (Lophocebus) with baboons and geladas.
[Figure: terrestrial mangabey, mandrill, arboreal mangabey, baboon and red-capped mangabey; the mangabeys are labelled as a polyphyletic group]

Polyphyletic origins?
• For most of the 20th century anthropoids (humans, apes and monkeys) were grouped together but considered to have polyphyletic origins, ie to have evolved several times from different ancestors.
• It was assumed that anthropoids were a grade that was attained independently by the platyrrhines in the New World (Americas) and the catarrhines in the Old World (Africa+Europe), each evolving from different prosimian ancestors.
• What is a prosimian? See below. Now defined as a paraphyletic group that includes tarsiers and strepsirrhines.
[Figure: strepsirrhines (ring-tailed lemur, Madagascar; pygmy loris) and tarsier (endemic to Asia)]

Controversy in Primate Phylogenetics: the Tarsius problem
Tarsiers are a problem: they share some features with Anthropoids, others with lorises (Strepsirhini).
Some molecular phylogenies do group Tarsius + Lemuriformes as a monophyletic Suborder Prosimii. But monophyletic Haplorhini vs Strepsirhini (shown here) has been preferred since 1980.
Note: not much morphological evolution is apparent on the branch leading to tarsiers.

Estimating branch lengths: fossils and molecules
• Molecular clock proposed by Zuckerkandl & Pauling (1962): the rate of evolution in terms of substitutions (fixations) is constant per year per site.
• The amount of evolution measured by fixations is related to time, which we can show in the tree by branch length.
• The time scale is provided by a ‘known’ divergence given by dated fossils.
• In the 1960s and 70s, concordances with current interpretations of the fossil record were used to validate molecular phylogenies. Now, phylogenies from molecular (DNA) data provide the ‘gold standard’.
• What is the evidence that fixations (ie substitutions throughout the species of one amino-acid or one nucleotide for another) ‘tick’ at a constant rate?
[Figure: tree of C (common chimp), B (bonobo), H (human), G (gorilla) and O (orangutan)]

A Molecular Clock
• Original observation: the proportion (D) of amino-acids per protein (eg haemoglobin) that differ between species pairs is related to ‘known’ divergence times estimated from the fossil record.
• $\hat{K} = -\ln(1 - D)$ is the estimated number of amino-acid substitutions per site.
• A linear relationship between K and time implies a constant evolutionary rate.
[Figure: K (0 to 0.25) plotted against time (0 to 100 millions of years). Ref: p. 331 of Hartl & Clark, 3rd ed.]

Estimating divergence times
• Need some pairs of extant species for which the fossil record identifies a common ancestor and the time it lived.
• Generate molecular divergence data for proteins or genes from extant species. For proteins, divergence is measured at the level of amino-acids; for genes, divergence can be measured at the level of DNA nucleotides.
• Estimate an evolutionary rate from molecular data assuming a molecular clock (constant rate).
• Apply a molecular clock estimate from molecular divergence to date the common ancestors of species for which no fossils are available.

Is molecular evolution really clock-like?
• If DNA evolves in a clock-like manner, the number of mutational steps along lineages, eg from humans or chimps or gorillas back to their common ancestor, should be roughly the same starting from any terminal node.
[Figure: tree relating human, chimp and gorilla sequences, with numbered mutational steps on the branches]
This might be true if most substitutions are neutral. Positive selection, ie adaptation, would clearly speed up the rate of substitution and make the clock tick faster.
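The dating procedure outlined above (calibrate a rate on a pair with a fossil date, then apply it to a pair without one) can be sketched as follows. This is a minimal illustration under stated assumptions: the divergence values and the 40 Myr calibration date are hypothetical, the function names are made up for this example, and the rate is taken as the slope of K against time for a species pair (the rate per lineage would be half of it).

```python
import math


def corrected_divergence(d):
    """K = -ln(1 - D): estimated substitutions per site from the
    observed proportion D of sites that differ between two species."""
    if not 0 <= d < 1:
        raise ValueError("D must lie in [0, 1)")
    return -math.log(1.0 - d)


def calibrate_rate(d_calibration, time_myr):
    """Slope of K against time (per site per Myr, for a species pair),
    from one pair whose divergence time is 'known' from fossils."""
    return corrected_divergence(d_calibration) / time_myr


def estimate_divergence_time(d, rate):
    """Date a pair with no fossil record, assuming a constant rate."""
    return corrected_divergence(d) / rate


if __name__ == "__main__":
    # Hypothetical calibration: D = 0.10 for a pair that split 40 Myr ago.
    rate = calibrate_rate(0.10, 40.0)
    # Hypothetical undated pair with D = 0.02.
    print(round(estimate_divergence_time(0.02, rate), 2), "Myr")
```

The estimate is only as good as the calibration date and the assumption that the rate is constant across the lineages being compared.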
A theoretical basis for a molecular clock from neutral theory
• The evolutionary rate of divergence is a product of the mutation rate and the probability that a mutation is fixed in a population, ie that complete substitution of one variant for another has occurred.
• Note that the evolutionary rate is estimated from fixed differences between species.
• Kimura (1980) showed that if the probability of fixation is given by a process of genetic drift:
– the rate of fixation estimates the mutation rate.
– the rate of fixation of neutral variants is independent of population size fluctuation, and so may be constant.
[Figure: trajectories for neutral alleles]

Testing the clock assumption
• First calculate the likelihood (product of a set of independent probabilities) of the data given a phylogenetic model that assumes a molecular clock.
• Then add parameters to the model so that branch lengths can vary, and look for the set of branch lengths that maximizes the likelihood of the data (maximum likelihood method).
• Compare the two models using a log-likelihood ratio test. Rejecting the molecular clock hypothesis suggests that there is rate heterogeneity among lineages. This may imply differential selective pressures on the gene evolving in different lineages.

A surprisingly recent common ancestor
• Before Sarich and Wilson (1967), Ramapithecus, a fossil ape dated to ~15 million years, was thought to be ancestral to humans, so the split between the human lineage and ‘apes’ was thought to be even earlier.
• Then Sarich and Wilson (1967) applied a molecular clock to hominoid evolution. Science 158:1200-1203.
[Figure: Sivapithecus; facial morphology of an orangutan]
ONLY 5 Myr!

Another controversy: resolving the Human-African apes trichotomy
Morphology suggested a closer relationship between chimps and gorillas, ie root at position 3.
[Figure: unrooted tree of G, C and H with candidate root positions 1, 2 and 3]
On which branch do we locate the root if the branch lengths are nearly the same?
Molecular evidence has resolved the phylogeny by placing a more recent common ancestor for chimps and humans, in a sister clade to gorillas, ie root is at position 1.

Why have so many controversies arisen in resolving phylogeny?
• Taxa can look similar but one taxon may be more closely related to another that looks different because:
– a lot more morphological evolution may occur on one lineage from the common ancestor than on the other, eg anthropoids versus Tarsius
– there is selection in diverged lineages for the same functional adaptations, which makes them look similar (convergence). Characters that evolve the same state multiple times independently are called homoplasies. An example may be knuckle-walking in chimps and gorillas.
• Over the last 25 years, phylogenetic analysis has increasingly relied on molecular (protein and DNA) analyses rather than on morphology, but the problems of estimating branch length and detecting convergent selection don’t disappear.

Phylogenetic relationships within the Superfamily Hominoidea
• Hylobatidae (gibbons)
• Hominidae
– Ponginae
   • Pongo pygmaeus (orangutan)
– Homininae
   • Gorilla gorilla
   • Pan troglodytes (“Common chimp”)
   • Pan paniscus (“Pygmy chimp”)
   • Homo sapiens (modern humans)
[Figure labels: 15-20 Myr; 5-8 Myr]

Phylogenetic systematics
• Phylogenetic methods can be applied to morphological characters from bones and fossils, but these methods have become most powerful in application to molecular (protein and DNA) data.
• Phylogenetic systematics once meant cladistics, but it now includes a broad range of statistical methods.
• First propose a phylogenetic model (ie a particular tree) that reconstructs the evolutionary history for the data, and then statistically evaluate the data assuming the model (eg using maximum likelihood methods).
• Is the evolutionary rate clock-like? If not, try variable branch lengths (rate heterogeneity).
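The clock question in the last bullet is commonly put as a log-likelihood ratio test between a clock-constrained tree and the same tree with free branch lengths. The sketch below is illustrative only: the two log-likelihood values are hypothetical, the function names are made up, and it assumes the usual convention that the test statistic is compared to a chi-square distribution with n - 2 degrees of freedom for n taxa. The chi-square tail probability is computed directly so that only the standard library is needed.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add half-integer terms.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def clock_likelihood_ratio_test(lnl_clock, lnl_free, n_taxa):
    """Return (statistic, degrees of freedom, p-value)."""
    if n_taxa < 3:
        raise ValueError("the test needs at least 3 taxa")
    statistic = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2  # assumed difference in number of free parameters
    return statistic, df, chi2_upper_tail(statistic, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a 5-taxon data set.
    stat, df, p = clock_likelihood_ratio_test(-1234.5, -1230.0, 5)
    print(stat, df, round(p, 4))
```

A small p-value would lead to rejecting the clock, in which case a model with rate heterogeneity among lineages is the natural next step.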
Phylogenetic trees are also constructed from sequence variation within species
• mtDNA haplotypes
• Y chromosome haplotypes
• X chromosomes (sampled from males)
Interpreting these phylogenies seems simple enough but is problematic.
[Figure: X chromosome phylogeny]

Horizontal gene transfer: a problem for phylogenetic models
• Mediated between genomes of different species of bacteria
– Between other species, consider transposable elements, introduced by viruses or retroviruses.
• Mediated within species by sex (and transposable elements)
– Autosomal genomes are mosaic compositions inherited from many ancestors
– Lack of recombination between mtDNA genomes, and in the Non-Recombining part of the Y chromosome (NRY), allows reconstruction of molecular phylogenies, otherwise called gene genealogies

Why don’t molecular phylogenies from mtDNA or NRY polymorphism tell the full story of human (or chimp) evolution?
• No sex in phylogenetic models; just reproduction or extinction
• Sex is a key feature of population genetic models, though usually reduced to the more dull concepts of random union of gametes in a gene pool, or gene flow between subdivided populations

Molecular phylogenies: progress summary
• Molecular phylogenies: models for evolutionary history (descent with modification)
– basic assumption: newly formed species or sub-species diverge by accumulating genetic differences, which can be modelled graphically as a tree
• Problems for these models are caused by
– Convergence due to similar adaptations
– Heterogeneity in rates of evolution (upsets molecular clock assumption)
– Horizontal gene transfer between lineages
• Solutions to these problems
– Choice of data
– Choice of specific phylogenetic reconstruction algorithm, eg there are options for taking rate heterogeneity into account.

Additional reading
• Page, RDM and Holmes EC (1998) Molecular Evolution, A phylogenetic approach.
Blackwell Science Ltd, Oxford
• Fleagle, JG (2000) The Century of the Past: One Hundred Years in the Study of Primate Evolution. Evolutionary Anthropology 9:87-100
• Heesy, CP (2001) Rethinking Anthropoid Origins. Evolutionary Anthropology 10:119-121
• Hartwig, WC & Rosenberger, AL (2001) Primates (Lemurs, Apes and Monkeys). Encyclopedia of Life Sciences http://www.els.net
• Martin, AP (2001) Molecular clocks. Encyclopedia of Life Sciences
• Kumar, S & Filipski, AJ (2001) Molecular Phylogeny Reconstruction. Encyclopedia of Life Sciences

Theoretical Principles of Neutral Theory: Molecular Evolution
1. The rate of fixation of neutral mutations is determined by $m$, the neutral mutation rate, and is independent of population size. (For larger $N$, the fixation probability is lower, but the number of mutations per generation for the population, $2Nm$ on average, is greater.)
2. The average time interval between consecutive substitutions (fixed mutations) is $1/m$ generations (molecular clock).
3. The average time interval for fixation of a new allele is $4N_e$ generations.
Reference: Hartl and Clark, 3rd edition, p. 316-319

Theoretical Principles of Neutral Theory
4. The magnitude of genetic drift is greater in smaller populations. Assuming a random sampling model for drift, the probability of an allele attaining fixation (100%) is given by its frequency ($1/(2N)$ for a new mutant allele).
5. Levels of allelic diversity are set by the steady state balance of new mutations (gain) against drift (loss). Assuming genetic drift with infinite-alleles mutation, an estimate for the equilibrium heterozygosity is given by $4N_e m/(1 + 4N_e m)$.
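The arithmetic behind principles 1, 4 and 5 can be checked with a few lines of code. This is a small sketch with made-up population sizes and mutation rates, and the function names are illustrative only. It multiplies the number of new neutral mutations per generation (2Nm) by the fixation probability of each (1/2N), and evaluates the equilibrium heterozygosity formula.

```python
def neutral_substitution_rate(n, m):
    """Rate of fixation of neutral mutations per generation.

    n : diploid population size N
    m : neutral mutation rate per generation
    """
    new_mutations_per_generation = 2 * n * m
    fixation_probability = 1 / (2 * n)  # frequency of a new mutant allele
    return new_mutations_per_generation * fixation_probability


def equilibrium_heterozygosity(ne, m):
    """4*Ne*m / (1 + 4*Ne*m) under drift with infinite-alleles mutation."""
    theta = 4 * ne * m
    return theta / (1 + theta)


if __name__ == "__main__":
    # Hypothetical values: the substitution rate stays at m whatever N is.
    for n in (100, 10_000, 1_000_000):
        print(n, neutral_substitution_rate(n, 1e-8))
    # Heterozygosity, by contrast, does depend on population size.
    for ne in (1_000, 10_000, 100_000):
        print(ne, round(equilibrium_heterozygosity(ne, 2.5e-5), 3))
```

The population size cancels in the substitution rate, which is why the neutral clock is expected to tick at the mutation rate, whereas heterozygosity rises towards 1 as 4Nem grows.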