Terminology of Phylogenetic Trees
Dan Graur

• Evolutionary relationships are usually illustrated by means of a phylogenetic tree (dendrogram).
• The “tree metaphor” cannot always be used.

Ernst Heinrich Haeckel 1834-1919

Jean-Baptiste [Pierre Antoine de Monet, Chevalier de] Lamarck. 1809

Charles Darwin July 1837 July 2007

Charles Darwin November 1859

The terminology of phylogenetics is discombobulated.

Graduate Student Assignments
Instead of a stream of emails that will yield unsatisfactory results, kindly set up appointments and let’s talk.

In mathematics, a graph is an abstract representation of a set of objects called nodes (or vertices), some of which are connected to one another by links called branches (or edges). A path in a graph is a sequence of branches that connect any two nodes.

Graphs = Trees + Non-Tree Graphs (or Networks)
In a tree (b), any two nodes are connected by a single path. In a network (a), there may be multiple pathways connecting two nodes.

The evolutionary relationships among a group of organisms are illustrated by means of phylogenetic trees (or dendrograms).

[Figure labels: internal branch; external (or peripheral) branch]

The branching pattern of a tree is called its topology.
Three different styles of trees, one topology.

One topology

Terminal node = Operational taxonomic unit (OTU)
Internal node = Hypothetical taxonomic unit (HTU)
Peripheral (or terminal) branch = relationship between OTU and HTU
Internal branch = relationship between two HTUs

Bifurcating and multifurcating trees
A node is bifurcating (or binary or dichotomous) if it has only two immediate descendant lineages, but multifurcating (or polytomous) if it has three or more immediate descendant lineages. In a strictly bifurcating tree, each internal node is incident to exactly three branches, two derived and one ancestral.
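The node and branch vocabulary above can be made concrete with a small sketch. The edge-list layout and the names `branches`, `degree`, `otus`, `htus`, `internal`, and `peripheral` are illustrative assumptions, not part of the original material. The example is a hypothetical unrooted, strictly bifurcating tree on five OTUs, in which every internal node (HTU) should be incident to exactly three branches.

```python
from collections import defaultdict

# Hypothetical unrooted tree on five OTUs (a-e) with three HTUs (x, y, z).
# Each pair is one branch (edge) joining two nodes.
branches = [
    ("a", "x"), ("b", "x"),   # peripheral branches
    ("x", "y"),               # internal branch
    ("c", "y"),               # peripheral branch
    ("y", "z"),               # internal branch
    ("d", "z"), ("e", "z"),   # peripheral branches
]

# Count how many branches are incident to each node.
degree = defaultdict(int)
for u, v in branches:
    degree[u] += 1
    degree[v] += 1

otus = sorted(n for n, d in degree.items() if d == 1)   # terminal nodes
htus = sorted(n for n, d in degree.items() if d > 1)    # internal nodes

# A branch is internal when both of its ends are HTUs.
internal = [(u, v) for u, v in branches if u in htus and v in htus]
peripheral = [b for b in branches if b not in internal]

# Strictly bifurcating: every internal node touches exactly three branches.
is_bifurcating = all(degree[n] == 3 for n in htus)

print("OTUs:", otus)
print("HTUs:", htus)
print("internal branches:", len(internal), "peripheral branches:", len(peripheral))
print("strictly bifurcating:", is_bifurcating)
```

For n OTUs, an unrooted strictly bifurcating tree has n - 2 internal nodes and 2n - 3 branches; with n = 5 the sketch reports 3 HTUs and 7 branches.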
A bifurcation is always interpreted as a speciation event.
Two possible interpretations for a multifurcation (polytomy) in a tree:
1. The polytomy represents the true sequence of events (hard polytomy), whereby an ancestral taxon gave rise to three or more descendant taxa simultaneously.
2. The polytomy represents a lack of resolution. The exact order of two or more bifurcations cannot be determined unambiguously with the available data (soft polytomy).

Rooted and unrooted trees
In a rooted tree there exists a particular node, called the root, from which a unique path leads to any other node. The direction of each path corresponds to evolutionary time, and the root is the common ancestor of all the taxonomic units under study. In an unrooted tree with four external nodes, the internal branch is referred to as the central branch.

How many unrooted topologies are here?
[Figure: four unrooted trees, numbered 1 to 4, each with external nodes a, b, c, d, and e]

• In an unrooted phylogenetic tree you cannot immediately assess evolutionary relationships.
• In a rooted phylogenetic tree, evolutionary relationships are evident.

[Figure: tree of Phoronida (horseshoe worms), Brachiopoda (lampshells), Arthropoda (arthropods), Vertebrata (vertebrates)]
Which of the following taxa are evolutionarily the closest to Rick Perry? (a) Phoronida, (b) Brachiopoda, (c) Arthropoda, (d) all three taxa are equidistant from Perry, or (e) two taxa are closer to Perry than the third taxon.

Cladograms & Phylograms (collectively Dendrograms)
Cladograms show branching order; branch lengths are meaningless.
Phylograms show branch order and branch lengths.
[Figure: a cladogram and a phylogram of Bacterium 1 to 3 and Eukaryote 1 to 4]

Unscaled phylogram; scaled phylogram
The branch length is the number of changes (e.g., nucleotide substitutions) that have occurred along a branch. The total number of changes in a particular tree is called the tree length.
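As a minimal sketch of these two definitions, the tree length is the sum of all branch lengths. The child-to-parent dictionary, the taxon names, and the substitution counts below are made-up assumptions chosen only for illustration; the helper names `tree_length` and `root_to_tip` are likewise hypothetical.

```python
# Hypothetical rooted phylogram stored as child -> (parent, branch length),
# where the length is the number of substitutions along that branch.
parent = {
    "h1": ("root", 2),
    "h2": ("root", 4),
    "Bacterium 1": ("h1", 5),
    "Bacterium 2": ("h1", 3),
    "Eukaryote 1": ("h2", 1),
    "Eukaryote 2": ("h2", 6),
}


def tree_length(tree):
    """Total number of changes on the tree: the sum of all branch lengths."""
    return sum(length for _, length in tree.values())


def root_to_tip(tree, node):
    """Number of changes accumulated on the path from the root to a node."""
    total = 0
    while node in tree:          # the root has no entry, so the walk stops there
        node, length = tree[node]
        total += length
    return total


print("tree length:", tree_length(parent))
print("root to Bacterium 1:", root_to_tip(parent, "Bacterium 1"))
print("root to Eukaryote 2:", root_to_tip(parent, "Eukaryote 2"))
```

A cladogram of the same taxa would keep only the parent links; the numbers are what turn it into a phylogram.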
Tree balance
Tree balance is a measure of the degree of symmetry of a rooted phylogenetic tree. It serves as an indication of the pattern of speciation events in the group of taxa under study.
[Figure: balanced tree; unbalanced or pectinate (comb-like) tree]

In an unbalanced tree, only one descendant of a node continues to speciate after a splitting event. In a balanced tree, all descendants of a node participate equally in cladogenesis.

Tree balance is an important indicator of the ease of phylogenetic reconstruction. Because, by definition, unbalanced trees contain long branches, they are more difficult to reconstruct phylogenetically than balanced trees. In fact, balanced and unbalanced trees are sometimes referred to as “good” and “bad” trees, respectively (Sackin 1972).

How to describe a phylogenetic tree in computerese?

The Newick format
In computer programs, trees are represented in a linear form by a string of nested parentheses, enclosing taxon names (and possibly also branch lengths and bootstrap values), and separated by commas. This type of representation is called the Newick format. The originator of this format in mathematics was Arthur Cayley (1821–1895).

The Newick format for phylogenetic trees was adopted on June 26, 1986 at an informal meeting at Newick's Lobster House in Dover, New Hampshire. The Newick format currently serves as the de facto standard for representing phylogenetic trees and is employed by almost all phylogenetic software tools. Unfortunately, it has never been described in a formal publication; the first time it is mentioned in a publication is in 1992.

In the Newick format, the pattern of the parentheses indicates the topology of the tree by having each pair of parentheses enclose all members of a monophyletic group.
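To illustrate how nested parentheses encode a topology, here is a minimal sketch of a reader for a restricted subset of Newick: taxon names only, with no branch lengths, bootstrap values, quoting, or comments. The function names `parse_newick`, `leaves`, and `clades` are hypothetical, and the example strings are made up. Real tools handle far more of the format than this.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested tuples of taxon names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end in a semicolon")
    pos = 0

    def subtree():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                          # consume "("
            children = [subtree()]
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(subtree())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                          # consume ")"
            return tuple(children)
        start = pos                           # otherwise read a taxon name
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = subtree()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def leaves(tree):
    """All taxon names below a node, in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def clades(tree):
    """Leaf set enclosed by each pair of parentheses, outermost first."""
    if isinstance(tree, str):
        return []
    found = [sorted(leaves(tree))]
    for child in tree:
        found.extend(clades(child))
    return found


example = "((a,b),(c,d));"
print(parse_newick(example))
print(clades(parse_newick(example)))
print(parse_newick("(a,b,c);"))               # a multifurcation
```

Each list returned by `clades` corresponds to one pair of parentheses in the input string, which is the sense in which the parentheses mark monophyletic groups.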
A phylogenetic tree in the Newick format always ends in a semicolon (;).

One can use the Newick format to write down rooted trees, unrooted trees, multifurcations, branch lengths, and bootstrap values.

3 OTUs: 1 unrooted tree = 3 rooted trees
4 OTUs: 3 unrooted trees = 15 rooted trees

The number of possible bifurcating rooted trees ($N_R$) for $n \geq 2$ OTUs:
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
The number of possible bifurcating unrooted trees ($N_U$) for $n \geq 3$ OTUs:
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

Number of OTUs    Number of possible rooted trees
2                 1
3                 3
4                 15
5                 105
6                 945
7                 10,395
8                 135,135
9                 2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375

Evolution is an historical process. Only one historical narrative is true. From 8,200,794,532,637,891,559,375 possibilities, 1 possibility is true and 8,200,794,532,637,891,559,374 are false. Truth is one, falsehoods are many.

How do we know which of the 8,200,794,532,637,891,559,375 trees is true?

We don’t; we infer by using decision criteria.

True and inferred trees
The sequence of speciation events that has led to the formation of a group of OTUs is historically unique. A tree representing the true evolutionary history is called the true tree. A tree that is obtained by using a certain set of data and a certain method of tree reconstruction is called an inferred tree. An inferred tree may or may not be the true tree.

Cladogenesis = the splitting of an evolutionary lineage into two genetically independent lineages.
[Figure labels: ancestor; descendant 1; descendant 2]

Anagenesis = changes occurring along an evolutionary lineage.
[Figure label: descendant]

In molecular phylogenetics, we assume that species are only created by cladogenesis.

Species Trees & Gene Trees
At every locus, if we trace back the history of any two alleles from any two populations, we will eventually reach a common ancestral allele from which both contemporary alleles have been derived.
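Returning briefly to the counts of possible trees given earlier: the two formulas can be evaluated directly with exact integer arithmetic. This is a minimal sketch; the function names `rooted_trees` and `unrooted_trees` are hypothetical.

```python
from math import factorial


def rooted_trees(n):
    """Number of bifurcating rooted trees for n >= 2 OTUs."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of bifurcating unrooted trees for n >= 3 OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 6, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 3 OTUs this gives 1 unrooted and 3 rooted trees, for 4 OTUs it gives 3 and 15, and for 6 OTUs it gives 945 rooted trees. The two counts are linked because a root can be placed on any of the 2n - 3 branches of an unrooted tree, so the number of rooted trees is (2n - 3) times the number of unrooted trees.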
The routes of inheritance represent the passage of genes from parents to offspring, and the branching pattern depicts a gene tree.

Different genes, however, may have different evolutionary histories, i.e., different routes of inheritance, different gene trees.

The routes of inheritance are mostly confined by reproductive barriers; that is, gene flow occurs only within the species. A species is therefore like a bundle of genetic connections, in which many entangled parent-offspring lines form the ties that bundle individuals together into a species lineage.

A gene tree may differ from a species tree
S = Divergence time for species 1 and 2
G1 = Inferred divergence time by using alleles a and f
Alleles d and b are closer to each other than alleles d and f.

Incomplete lineage sorting due to polymorphism at speciation time

Gene trees and species trees
[Figure: gene tree and species tree; labels a, A, b, B, c, D]
It is often assumed that gene trees always equal species trees. This may not be true.

Taxon (singular); Taxa (plural)
A taxon is a species or a group of species that has been given a name, e.g., Homo sapiens (modern humans) or Lepidoptera (butterflies and moths). There are codes of biological nomenclature which seek to ensure that every taxon has a single and stable name, and that every name is used for only one taxon.

Clades*
• Strictly: A clade is a group of all the taxa that have been derived from a common ancestor plus the common ancestor itself.
• In molecular phylogenetics: A clade is a group of taxa under study that share a common ancestor, which is not shared by any other species outside the group.
*also: monophyletic groups, natural clades

Paraphyletic Taxa
• A taxon whose common ancestor is shared by any other taxon is called a paraphyletic taxon or an invalid taxon. Reptiles are paraphyletic.
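The working definition of a clade can be checked mechanically on a rooted tree: a group is monophyletic when it is exactly the set of OTUs descending from some node. The sketch below assumes a rooted tree written as nested tuples; the four-taxon tree and the function names `leaf_set`, `clade_sets`, and `is_monophyletic` are hypothetical and serve only as an illustration.

```python
def leaf_set(tree):
    """Set of OTUs found below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clade_sets(tree):
    """Leaf sets of every node, i.e. every clade of the rooted tree."""
    sets = {leaf_set(tree)}
    if not isinstance(tree, str):
        for child in tree:
            sets |= clade_sets(child)
    return sets


def is_monophyletic(tree, group):
    """True if the group is exactly the set of OTUs below some node."""
    return frozenset(group) in clade_sets(tree)


# Hypothetical rooted tree: (mammal, (lizard, (crocodile, bird)))
tree = ("mammal", ("lizard", ("crocodile", "bird")))

print(is_monophyletic(tree, ["crocodile", "bird"]))            # a clade
print(is_monophyletic(tree, ["lizard", "crocodile"]))          # leaves out bird
print(is_monophyletic(tree, ["lizard", "crocodile", "bird"]))  # a clade
```

In this toy tree, a "reptile" group made of lizard and crocodile fails the test because the common ancestor of the two is also an ancestor of bird.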
• A named taxon that lacks phylogenetic validity, but is nonetheless used, is called a convenience taxon.
“a convenience fish”

Sister Taxa
• If a clade is composed of two taxa, these are referred to as sister taxa. Birds and crocodiles are sister taxa.

Which of the following groups are not monophyletic?
[Figure: tree with tips E. coli, rat, mouse, baboon, chimp, human]
a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse

Two or more sequences are said to be homologous if they are related by descent. Homology is often ascertained on the basis of sequence similarity. Thus, if two or more sequences exhibit high degrees of similarity, it is likely (but not always the case) that they are homologous. Sequence similarity may also arise without common ancestry: by chance, or due to convergence driven by similar selective pressures. Such sequences, which are similar but not homologous, are said to be analogous.

Homology is a qualitative statement. Similarity is a quantitative and, hence, quantifiable statement (e.g., percent similarity, percent identity). Similarity is a fact. Homology is a hypothesis. Of course, as with any other scientific hypothesis, homology between two sequences may be tested and every so often rejected.

Types of homology
• Orthology: Similarity due to speciation.
• Paralogy: Similarity due to gene duplication.
• Ohnology: A special case of paralogy in which similarity is due to genome duplication.
• Xenology: Similarity due to horizontal gene transfer.
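Because similarity is a quantity, it can be computed directly. The sketch below calculates percent identity for two sequences that are assumed to be already aligned. Skipping columns that contain a gap is only one of several conventions in use, and both the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(seq1, seq2, gap="-"):
    """Percent of identical positions between two aligned sequences.

    Assumption: columns in which either sequence has a gap are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap:
            continue                     # gap column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGT", "ACGTTCGA"))   # 6 of 8 positions match
print(percent_identity("AC-TAC", "ACGTTC"))       # 4 of 5 ungapped positions match
```

The resulting number is a measurement; whether the two sequences are homologous remains a hypothesis that the number can support but not establish.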
Orthologs and Paralogs
Duplication yields 2 copies (paralogs) on the same genome.
[Figure labels: ancestral gene; genes a, b, c and A, B, C; orthologous and paralogous pairs]

Orthologs and Paralogs
Only b, C, and A are sampled.
A mixture of orthologs and paralogs is sampled.
[Figure labels: a, b*, c, C, B, A; sampled genes b, C, A]

A character provides information about an individual OTU. A distance represents a quantitative statement concerning the dissimilarity between two OTUs.

A character is a well-defined feature that in a taxonomic unit can assume one out of two or more mutually exclusive character states.
Mutually exclusive: If David is tall, David cannot be short.

[Figure: classification of characters; labels: Character, Continuous, Discrete, Binary, Multistate, Ordered, Unordered, Polar, Unpolar]

A character is unordered if a change from one character state to any other character state can occur in one step.

A character is ordered if there exists a symmetrical path of change from one character state to another.

A character is polar if there exists an asymmetrical (irreversible) path of change from one character state to another.

The number of steps between two character states is specified by a step matrix.

Assumptions about character evolution
Methods of phylogenetic reconstruction require that we make explicit assumptions about:
(1) the number of discrete steps required for one character state to change into another.
(2) the probability with which such a change may occur.

Temporal Polarity of Character States
Character states may be ranked by relative antiquity into:
(1) primitive or ancestral (plesiomorphy)
(2) derived or novel (apomorphy)

Taxonomic Distribution of Character States
A primitive state that is shared by several taxa is a symplesiomorphy. A derived state that is shared by several taxa is a synapomorphy.
symbiosis sympathy synapse synteny
A derived character state unique to a particular taxon is an autapomorphy.
A character state that is shared by several taxa due to convergence, parallelism and reversals, rather than due to common descent, is a homoplasy.

[Figure: tree with character states A, B, C, and D; labels: homoplasy, apomorphy, synapomorphy, (autapomorphy), symplesiomorphy, plesiomorphy]

What is swimming in shark and carp?
[Tree tips: shark, carp, guppy, chicken, rat, bat]
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplasic

What are scales in guppy and carp?
[Tree tips: shark, carp, guppy, chicken, rat, bat]
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplasic

What are feathers in chicken?
[Tree tips: shark, carp, guppy, chicken, rat, bat]
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplasic

What are wings in chicken and bat?
[Tree tips: shark, carp, guppy, chicken, rat, bat]
a. symplesiomorphic
b. synapomorphic
c. autapomorphic
d. homoplasic

Distance Data

Most molecular data yield character states that are subsequently converted into distances.

Ultrametricity = Strict Molecular Clock
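One common way to test whether a distance matrix is ultrametric is the three-point condition: for every triple of OTUs, the two largest of the three pairwise distances must be equal. The sketch below applies that check; the function names `make_matrix` and `is_ultrametric`, the tolerance `tol`, and the distances are all assumptions made up for illustration.

```python
from itertools import combinations


def make_matrix(pairs):
    """Build a symmetric distance lookup from {(a, b): distance} entries."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def is_ultrametric(dist, tol=1e-9):
    """Three-point condition: in every triple the two largest distances tie."""
    for a, b, c in combinations(list(dist), 3):
        _, middle, largest = sorted([dist[a][b], dist[a][c], dist[b][c]])
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical distances consistent with a strict molecular clock.
clocklike = make_matrix({
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
})

# Same taxa, but the lineage leading to B has accumulated extra change.
not_clocklike = make_matrix({
    ("A", "B"): 3, ("A", "C"): 6, ("B", "C"): 7,
    ("A", "D"): 10, ("B", "D"): 11, ("C", "D"): 10,
})

print(is_ultrametric(clocklike))
print(is_ultrametric(not_clocklike))
```

Under a strict molecular clock every OTU is equally distant from the root, so distances measured between OTUs satisfy this condition; real data rarely satisfy it exactly, which is why the sketch allows a tolerance.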