Phylogenetic Interpretation
Dr Laura Emery
Laura.Emery@ebi.ac.uk
www.ebi.ac.uk

Objectives
After this tutorial you should be able to:
• Discuss the impact of a range of biological phenomena upon phylogenetic inference
• Appreciate some challenges and limitations of the phylogenetic approach
• Interpret published phylogenies (and your own)

Phylogenetic interpretation is essential throughout data analysis
[Flowchart: steps and decision points recovered from the figure]
• Formulate hypotheses
• Data assessment: known biology; additional data (e.g. geography)
• Decide upon and implement method
• Phylogenetic result(s)
• Can you validate this? (Yes / No)
• Answered your question? (Yes / No)
• Investigate unexpected and unresolved aspects further; consider including more data
• Final phylogeny and analysis

Phylogenetic interpretation skill set
1. Tree-thinking skills
   • Revise: relatedness, trait evolution, confidence, homology
2. Knowledge of phylogenetic methods and their limitations
3. Knowledge of biological processes affecting sequence evolution
   • gene duplication, recombination, horizontal gene transfer, population genetic processes, and many more!
4. Knowledge of the data you wish to interpret

Recap of tree-thinking skills
1. Relatedness
2. Trait evolution
3. Confidence
4. Homology

1. Relatedness
• Taxa that share a more recent common ancestor are more closely related
[Figure: family tree marking the most recent common ancestor shared with a first cousin and the most recent common ancestor shared with a second cousin]

2. Trait evolution
• It can be useful to map traits onto phylogenies as a first step in inferring their evolutionary histories
• Interpreting trait evolution in its phylogenetic context is rarely straightforward!
• Assumptions must be made regarding the loss and gain of traits
• It is often useful to construct alternative scenarios
• Then we have to decide upon the most plausible scenario (character state methods, e.g. MP and ML, can be applied)

Example: The Evolution of Mitochondria
[Figure: phylogeny rooted at the origin of eukaryotes with inferred trait changes marked; G = gain, L = loss; Ginger et al. 2010]
• Scenario one: Mitochondria evolved from mitosomes
• Scenario two: Mitochondria arose at the origin of eukaryotes

3. Tree Confidence
Question: Does this tree support the grouping of pelecaniforms and ciconiiforms as a monophyletic group?

4. Homology
Homology is similarity due to shared ancestry
Example: limbs and wings
• Limbs are homologous: they share a common ancestor
• Wings are not homologous: they are analogous, as they have evolved their similarity independently

Homology Question: Trap-jaws in ants
Based on this phylogeny (Moreau et al. 2006), which scenario do you think is more likely?
• trap-jaws are homologous
• trap-jaws are analogous and have evolved independently four times
[Figure: ant phylogeny with gains (G) and losses (L) of trap-jaws marked for each scenario]
• Scenario one: Trap-jaws are homologous (one gain, with losses marked on the tree)
• Scenario two: Trap-jaws are analogous (four independent gains); this is the more parsimonious scenario

Processes that affect sequence evolution
1. Gene/genome duplication and divergence
2. Recombination
3. Horizontal gene transfer
4. Coevolution
5. Migration
6. Rate and time of divergence
7. Other

1. Gene duplication
Gene duplication and subsequent divergence can result in novel gene functions (it can also result in pseudogenes)
• Genes that are homologous due to gene duplication are paralogous
• Genes that are homologous due to speciation are orthologous

Gene duplication question
This is a tree of a gene family that has undergone one gene duplication event in its evolutionary past. Where on the tree did this occur? Is the event well-supported?
[Figure: gene family tree; Cells Tissues & Organs 2007]

2. Recombination
• Single or small numbers of events:
   • Within genes
   • Between genes
• Where there is extensive recombination, a phylogenetic approach is inappropriate (the history is not tree-like)

Recombination example: Dengue-2 virus
[Figure: data from E. Holmes, figure from A. Rambaut]

Recombination Question
Can you spot the recombinant strain?
[Figure: Mauro et al 2003]

3. Horizontal Gene Transfer (HGT/LGT)
Horizontal gene transfer violates the assumption that sequences have evolved in a tree-like manner
• Where sparse, it can be detected by comparing the gene tree with the species phylogeny
• Where extensive, a phylogenetic approach is inappropriate
[Figure: Gogarten & Townsend 2005]

Phylogenetics is not appropriate for highly recombinant taxa
• Phylogenetics assumes that patterns of relatedness among taxa follow a tree-like structure
• Recombination and horizontal gene transfer produce networks
• Avoid phylogenetics for:
   • Intraspecific data from sexual species (recombination at each meiosis)
   • Asexual species with extensive HGT (e.g. some Bacteria)

Horizontal gene transfer question
Can you spot the horizontally transferred gene?

4. Coevolution
Where parasites or symbionts co-evolve with their hosts, the host and parasite topologies are expected to be very similar.
[Figure: Weiss 2009, from Reed et al 2007]

Coevolution Question
Do these phylogenies provide evidence that the lice are inherited vertically?
[Figure: Hafner & Nadler 1988]

5. Migration
Patterns of migration influence phylogenetic topology, especially in structured populations

Phylogeography example: Chimpanzees
• P. t. troglodytes and P. t. schweinfurthii (subspecies of Pan troglodytes) are more dissimilar than you would expect given their proximity
• Chimpanzees can't cross rivers!
[Figure: Gao et al 1999]

Migration Question
What can you infer about patterns of migration of the Taiwanese stag beetle based upon this phylogeny?
[Figure legend: Black = Taiwan]

6. Rate and time of divergence
• Phylogenies can be used to date divergence times when some temporal information is known
   • e.g. carbon dating from fossil evidence
   • e.g. dates of sample isolation
• Genetic change (substitutions/site) = Evolutionary rate (substitutions/site/year) x Divergence time (years)
• If all lineages evolve at the same rate (i.e. there is a molecular clock) then branch lengths should reflect divergence times
[Figure: example tree with taxa A, B, C, D and E]

Is there a molecular clock?
• Zuckerkandl and Pauling (1962)
• The number of substitutions in haemoglobin is roughly proportional to time, based upon fossil dating

Dating divergence with a molecular clock
• d = genetic distance (branch length)
• We know the time T since a and c diverged
• We want to find out the time X since a and b diverged
1. Use T to estimate the evolutionary rate r
   r = d(a-c) / (2T)
   (the distance between a and c accumulates along both lineages, hence the factor of 2)
2. Use r to estimate time X
   X = d(a-b) / (2r)

Dating Drosophila Divergence around Hawaii
• The volcanic activity around Hawaii has produced a chain of islands; the oldest is furthest away from the mainland
• Several species, including Drosophila, have diverged with island formation
[Figure: Andrew Rambaut, from Fleischer, McIntosh & Tarr 1998]

Dating Drosophila Divergence in Hawaii
• Island formation dates reflecting species' divergence were plotted against genetic distance (branch length)
• Genetic distance scaled linearly with divergence date, indicating the presence of a molecular clock
• NB: Not all species exhibit a molecular clock!
[Plot: genetic distance against time; gradient = evolutionary rate; Fleischer, McIntosh & Tarr 1998]

7. Other biological processes can complicate molecular analyses
• Population genetic processes
• Epidemiological processes
• Gene conversion
• Codon bias
• Hypermutable sites
• Concerted evolution
• Reassortment
• Many more…

Summary: Phylogenetic interpretation skill set
1. Tree-thinking skills
   • Revise: relatedness, trait evolution, confidence, homology
2. Knowledge of phylogenetic methods and their limitations
3. Knowledge of biological processes affecting sequence evolution
   • gene duplication, recombination, horizontal gene transfer, population genetic processes, and many more!
4. Knowledge of the data you wish to interpret

Further Reading
• Molecular Evolution: A Phylogenetic Approach (1998), Roderic D M Page & Edward C Holmes, Blackwell Science, Oxford.
• The Phylogenetic Handbook (2003), Marco Salemi and Anne-Mieke Vandamme (Eds), Cambridge University Press, Cambridge.
• Inferring Phylogenies (2003), Joseph Felsenstein, Sinauer.
• Molecular Evolution (1997), Wen-Hsiung Li, Sinauer.

Train online
• Free online courses
• Learn in your own time, at your own pace
• Created for life-science researchers
• No previous knowledge of bioinformatics needed
www.ebi.ac.uk/training/online

Acknowledgements
People
• Andrew Rambaut (University of Edinburgh)
• Paul Sharp (University of Edinburgh)
• Nick Goldman (EMBL-EBI)
• Benjamin Redelings (Duke University)
• Brian Moore (University of California, Davis)
• Olivier Gascuel (University of Montpellier)
• Aiden Budd (EMBL-EBI)
• …and the EBI training team
Funding
• EMBL member states and…

Thank you!
www.ebi.ac.uk

Now it's your turn…
• Open your tutorial manual and begin Tree-thinking quiz 2 (appendix 2)
• The manual is available to download from: http://www.ebi.ac.uk/training/course/scuola-di-bioinformatica2013
• When you are finished you can mark your own.
• Remember to ask for help at any stage!
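
Worked sketch: dating divergence with a molecular clock
The two steps from the "Dating divergence with a molecular clock" slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original tutorial: the function names and the example numbers are hypothetical, and it assumes a strict molecular clock (the same rate on every lineage).

```python
def estimate_rate(d_ac, t_calibration):
    """Step 1: r = d(a-c) / (2T).

    d_ac          genetic distance between a and c (substitutions/site)
    t_calibration known time T since a and c diverged (years)
    """
    if t_calibration <= 0:
        raise ValueError("calibration time T must be positive")
    # The distance accumulates along both lineages, hence the factor of 2.
    return d_ac / (2.0 * t_calibration)


def estimate_divergence_time(d_ab, rate):
    """Step 2: X = d(a-b) / (2r), the time since a and b diverged (years)."""
    if rate <= 0:
        raise ValueError("evolutionary rate r must be positive")
    return d_ab / (2.0 * rate)


def date_divergence(d_ac, t_calibration, d_ab):
    """Chain both steps: calibrate the rate with T, then date the a-b split."""
    rate = estimate_rate(d_ac, t_calibration)
    return estimate_divergence_time(d_ab, rate)


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    d_ac = 0.10   # substitutions/site between a and c
    T = 5e6       # years since a and c diverged (e.g. from a fossil date)
    d_ab = 0.04   # substitutions/site between a and b

    r = estimate_rate(d_ac, T)
    X = date_divergence(d_ac, T, d_ab)
    print(f"rate r = {r:.3g} substitutions/site/year")
    print(f"time X = {X:.3g} years")
```

With these made-up inputs the estimate for X comes out at roughly two million years. The result is only as good as the clock assumption: if lineages evolve at different rates, a single r will give misleading dates.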