Introduction to Phylogenetic Trees

Alignment Recap
• What is the fundamental principle underpinning multiple sequence alignment?
• Homology - we want all columns in our alignment to consist of characters which share common ancestry
• Alignments are made up from conserved and variable blocks
• Alignment programs aim to maximise conserved blocks
• CLUSTAL - very widely used, uses Progressive Sequence Alignment
• MUSCLE - more sophisticated program, uses multiple processes to avoid the problems of ‘once a gap, always a gap’
• Gap penalties - gaps are rare in nature, so we want to penalize them in our alignments. Use different gap penalties for datasets of closely and distantly related sequences

Introduction to Phylogenetic Trees
Aims: To learn how to read and interpret phylogenetic trees and get a general overview of phylogenetic analysis
Objectives: at the end of this lecture you should:
  be able to understand what a phylogenetic tree is
  be able to differentiate between monophyletic, paraphyletic & polyphyletic groups
  be able to differentiate between orthologues & paralogues
  be able to write out phylogenetic trees in the Newick format
  understand the standard terminology used in phylogenetics

Terminology
[Figure: example tree with labels a to i]
phylogenetic tree = evolutionary tree = phylogeny = dendrogram
- a graphic display of predicted evolutionary relationships
- may be created for genes, proteins or species
- consists of “branches” and “nodes”
branches, also called “edges”
- internal (nodes to nodes) or terminal (nodes to terminals)
terminals = “leaves” = operational taxonomic units, “OTUs”
- OTUs = organisms (taxa), tree = “species tree”
- OTUs = genes or proteins, tree = “gene tree”

Terminology
node = point at which two branches diverge
- corresponds to a hypothetical last common ancestor
- branching represents a divergence event
- gene tree: divergence = gene duplication event
- species tree: divergence = speciation event
root = origin of the tree, or sub-tree

Terminology
Polytomy
• Unresolved phylogenetic relationships
• Three or more branches leading from one node
• Polytomies may result from a lack of phylogenetic data (soft polytomy)
• Soft polytomies may be resolved by increasing the phylogenetic signal, i.e. using more data
• Polytomies may arise if multiple speciation events take place instantaneously (hard polytomy) - see the D. simulans complex
• Hard polytomies cannot be resolved by increasing the volume of data
[Figure: Drosophila simulans complex]

Rooting Phylogenetic Trees:
[Figure: trees labelled 1 and 2]
root = oldest point in the tree
  with a molecular clock -> root would be in the middle (i.e. common ancestor equidistant from everything)
  without a clock (i.e. in the real world) need an external point of reference = outgroup = anything not in your ingroup (= group of interest)
  for gene trees can use a distant relative/gene family
  for species trees use the sister group = closest relative to the ingroup
root - doesn’t change distances, but shows chronological order of events

Phylograms & Cladograms
[Figure: the same tree drawn as a phylogram (scale bar: 0.2) and as a cladogram]
Phylogram
• Branch lengths informative
• Scale bar (number of substitutions per site)
Cladogram
• Branch lengths non-informative
• Easier to read
• No scale bar

Tree Conversion
[Figure: two drawings of a tree shown as equivalent (=); tip labels include Coffee, Chocolate, Truffle, Caviar, Oyster, Lobster, Nori]

Trees Are About Groupings
monophyletic (pure) group (clade)
paraphyletic group (convenience)
polyphyletic group (similarities = parallel/convergent evolution)
nodes define “clades”
clade = monophyletic group = node plus all descendants
- share unique common ancestor (relative to the rest of the tree) and common history

Paraphyly & Polyphyly
• The two groupings are often difficult to differentiate
• The smaller number of parsimony steps indicates which is the more likely type of group
• Chytridiomycota paraphyly: 6 steps
• Chytridiomycota polyphyly: 4 steps
• Zygomycota paraphyly: 3 steps
• Zygomycota polyphyly: 5 steps
[Tree figure, tips: Dikarya, Zygomycota 1, Zygomycota 2, Zygomycota 3, Chytridiomycota 1, Zygomycota 4, Zygomycota 5, Chytridiomycota 2, Chytridiomycota 3, Chytridiomycota 4, Microsporidia. Adapted from James et al. (2006) Nature 443:818-822]

Questions
• What type of group are the mammals?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Tree figure, tips: Mammal 1, Mammal 2, Mammal 3, Reptiles, Amphibians, Fish]

Questions
• What type of group are the reptiles?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Tree figure, tips: Mammals, Reptile 1, Reptile 2, Birds, Reptile 3, Reptile 4, Amphibians]
Answer slide: 3 steps vs. 4 steps

Questions
• What type of group are the slugs?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Tree figure, tips: Snail 1, Slug 1, Snail 2, Snail 3, Snail 4, Slug 2, Snail 6, Slug 3, Snail 7, Snail 8]
Answer slide: 3 steps vs. 5 steps

Questions
• What type of group are the Monosiga?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Tree figure, tips: Salpingoeca 1, Salpingoeca 2, Salpingoeca 3, Salpingoeca 4, Codosiga, Choanoeca, Monosiga 1, Monosiga 2, Desmarella, Salpingoeca 5, Salpingoeca 6, Salpingoeca 7, Salpingoeca 8, Salpingoeca 9. Adapted from Nitsche et al. (2011) JEM 58:452-462]
Answer slide: 2 steps vs. 7 steps

Questions
• What type of group are the Salpingoeca?
• a) monophyletic, b) paraphyletic, c) polyphyletic
[Same tree figure, adapted from Nitsche et al. (2011) JEM 58:452-462]
Answer slide: Unknown: 4 steps vs. 4 steps

Homologues can be Orthologues or Paralogues
[Figure: panels A and B]
- X and X’ are paralogues, i.e. XX’ is a multigene family.
- All the X genes are orthologues of each other.
- All the X’ genes are orthologues of each other.
• for orthologues, gene trees = species trees
• It is essential not to mix up orthologues and paralogues for species trees

Xenologues (Xenology)
Results from Lateral Gene Transfer (LGT)
[Figure: trees with tips Salmonella, E. coli, Mycoplasma, Listeria, Strep, Bacillus]
* very common in bacteria, e.g. pathogenicity islands, antibiotic resistance genes
important in bacterial evolution -> new metabolic pathways, etc. (e.g. E. coli K12 vs E. coli O157, ~1.5 Mb difference in genome size)

Two Main Categories of Phylogenetic Methods
Distance methods
  tree based on a single metric: % difference (distance) between sequences
  also referred to as “clustering” or “algorithmic” methods
  take data (matrix of % D), plug into equation -> tree, one solution only
  fast, easy, reasonably accurate, good enough for many things
  methods: (UPGMA), neighbour-joining
Discrete data (tree searching) methods
  each column in alignment = discrete data point, i.e. a hypothesis for each column of the alignment
  look for the tree that best fits this collection of hypotheses
  for >8 OTUs = a lot of possible trees
  much more detail, precision..., much slower
  methods: parsimony, maximum likelihood, Bayesian inference

Newick Format: Written Tree Description
  common interchange format, read by most tree drawing programs
  also called “New Hampshire format”
  describes a tree using set notation
  all parentheses must be balanced
  all taxa and groups are separated by commas
  no spaces
  ended by a semi-colon
((((A,B),C),(D,E)),F);
[Figure: the corresponding tree with tips A, B, C, D, E, F]
Very useful for producing representative phylogenies from different studies

Questions
(((Chimp,Bonobo),Human),Gorilla);

Questions
(((Human,Chimp),(Lion,Tiger)),(Kangaroo,Wallaby));

Questions
(((((Drosophila,Mosquito),Ladybird),(Dragonfly,Damselfly)),Crab),(Spider,Scorpion));

Questions
((mauritiana,sechellia,simulans),melanogaster);

Questions
((((Stramenopiles,Alveolata,Rhizaria),Archaeplastida),(Opisthokonta,Amoebozoa)),Excavata);

Summary
• A phylogenetic tree provides a graphical representation of evolutionary relationships between genes, proteins or species
• The root represents the oldest part of the tree from which all descendant sequences are derived
• An outgroup, of closely related sequences, is required in order to accurately place the root
• A monophyletic group, or clade, is a pure group which contains the ancestral node and all descendant sequences
• Orthologues are homologous genes derived by speciation events
• Paralogues are homologous genes derived by gene duplication events within a single genome
• Molecular phylogenies are created using either distance methods or discrete data methods
• The Newick format allows phylogenies to be written out in a single line of text
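
Worked example: reading a Newick string
The Newick rules from the lecture (balanced parentheses, commas between taxa and groups, a closing semi-colon) and the definition of a clade (a node plus all its descendants) can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the function names (parse_newick, tips, clades, is_monophyletic, to_newick) are made up for this example and do not come from any library, branch lengths and internal node labels are not handled, and the tree is assumed to be rooted exactly as written.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of tip names."""
    s = "".join(text.split())            # the format has no spaces, so drop any
    if not s.endswith(";"):
        raise ValueError("a Newick string must end with a semi-colon")
    s = s[:-1]
    pos = 0

    def node():
        nonlocal pos
        if pos < len(s) and s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1                  # consume "," between taxa/groups
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos                       # otherwise read a tip (taxon) name
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("missing taxon name")
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError("unbalanced parentheses or trailing text")
    return tree


def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Return the tip set of every node (tips count as trivial clades)."""
    found = {tips(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """True if `group` is exactly one node plus all of its descendants."""
    return frozenset(group) in clades(tree)


def to_newick(node):
    """Write the nested tuples back out as a single-line Newick string."""
    def fmt(n):
        return n if isinstance(n, str) else "(" + ",".join(fmt(c) for c in n) + ")"
    return fmt(node) + ";"


if __name__ == "__main__":
    tree = parse_newick("(((Human,Chimp),(Lion,Tiger)),(Kangaroo,Wallaby));")
    print(is_monophyletic(tree, {"Lion", "Tiger"}))   # True
    print(is_monophyletic(tree, {"Human", "Lion"}))   # False
    print(to_newick(tree))
```

A node with three or more children, as in ((mauritiana,sechellia,simulans),melanogaster); is simply parsed as a tuple of length three, which is how a polytomy appears in this representation. The check only separates clades from non-clades; telling paraphyly from polyphyly needs the parsimony step counting described in the lecture.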