#29 - Phylogenetics 10/31/07
BCB 444/544 F07 ISU Terribilini
BCB 444/544 Fall 07 Dobbs

BCB 444/544
Lecture 29
Phylogenetics
#29_Oct31

Required Reading (before lecture)
Mon Oct 29 - Lecture 28: Promoter & Regulatory Element Prediction
• Chp 9 - pp 113 - 126
Wed Oct 31 - Lecture 29: Phylogenetics Basics
• Chp 10 - pp 127 - 141
Thurs Nov 1 - Lab 9: Gene & Regulatory Element Prediction
Fri Nov 2 - Lecture 30: Phylogenetic Tree Construction Methods & Programs
• Chp 11 - pp 142 - 169

Assignments & Announcements
Mon Oct 29 - HW#5
HW#5 = Hands-on exercises with phylogenetics and tree-building software
Due: Mon Nov 5

BCB 544 "Team" Projects
Last week of classes will be devoted to Projects
• Written reports due:
  • Mon Dec 3 (no class that day) (not Fri Nov 1 as previously posted)
• Oral presentations (20-30') will be:
  • Wed-Fri Dec 5,6,7
  • 1 or 2 teams will present during each class period
• See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √ PART 1 - ASAP
     PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, email to Drena & Michael
after response/approval, then:
Part 2 - More detailed outline of project
  • Read a few papers and summarize status of problem
  • Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Nov 1 Thurs - BBMB Seminar 4:10 in 1414 MBB
  • Todd Yeates, UCLA
  • TBA - something cool about structure and evolution?
• Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Bob Jernigan, BBMB, ISU
  • Control of Protein Motions by Structure

Chp 10 - Phylogenetics
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 10 Phylogenetics Basics
• Evolution and Phylogenetics
• Terminology
• Gene Phylogeny vs. Species Phylogeny
• Forms of Tree Representation
• Why Finding a True Tree is Difficult
• Procedure of Building a Phylogenetic Tree

Evolution and Phylogenetics
• Evolution – the development of biological form from other preexisting forms
• Evolution proceeds by natural selection

Natural Selection
• Species can produce more offspring than the environment can support. This leads to competition for resources.
• Genetic variations exist in a population that give some individuals an advantage and others a disadvantage, leading to differential reproductive success.

Phylogenetics
• Phylogenetics is the study of the evolutionary history of living organisms
• Uses tree-like diagrams to represent the pedigrees of the organisms
• Similarities and differences seen in a multiple sequence alignment are easier to make sense of in a phylogenetic tree

Data Used in Phylogenetics
• Fossil records - morphology and timeline of divergence
  • Limitations - not available for all species in all areas, morphology is determined by multiple genetic factors, and fossils for microorganisms are especially rare
• Molecular data - DNA and protein sequences, "molecular fossils"
  • Advantages - lots of data, easy to obtain
  • Limitations - can be difficult to get sequences from extinct species
• Physical, behavioral, and developmental characteristics can also be used in phylogenetics

Molecular Phylogenetics
• Molecular phylogenetics is the study of evolutionary relationships of genes and other biological macromolecules by analyzing their sequences
• Sequence similarity can be used to infer evolutionary relationships

Assumptions in Molecular Phylogenetics
• Sequences used are homologous, i.e. share a common ancestor
• Phylogenetic divergence is bifurcating, i.e. a parent branch splits into two daughter branches
• Each position in a sequence evolved independently
• Molecular Clock – sequences evolve at constant rates (only used in some methods)

Terminology
[Figure: example tree labeling the taxa (terminal nodes), an internal node, a branch, and the root]
• Clade = group of taxa descended from a common ancestor
• Lineage = branch path depicting an ancestor-descendant relationship
• Paraphyletic group = group of taxa that includes a common ancestor and some, but not all, of its descendants

Tree Topology
• Tree topology is the branching pattern in a tree
• Dichotomy = bifurcation (a node splits into two branches)
• Polytomy = multifurcation (a node splits into more than two branches)
[Figure: example trees showing a dichotomy and a polytomy]

Rooted vs. Unrooted Trees
[Figure: a rooted tree and an unrooted tree for taxa A, B, C, D]
• Unrooted trees have no root node – they do not assume knowledge of a common ancestor, just relationships
• Can convert between unrooted and rooted, but first need to determine where the root is
• Two ways to define the root:
  • Use an outgroup
  • Midpoint rooting – the midpoint of the two most divergent groups is assigned to be the root

Outgroups
• An outgroup is a sequence that is related to the sequences being studied, but more distantly than they are to each other
• Must be distinct from the ingroup, but not too distant
• If the outgroup is too distantly related, it can lead to errors in tree construction
• The trick is to find the most closely related sequence that is still outside the ingroup

Gene Phylogeny vs. Species Phylogeny
• When using molecular data, we are technically building a phylogeny for just that sequence, not for the species from which the sequences came
• Species evolution is the result of mutations in the entire genome
• Your gene may have evolved differently than other genes in the genome
• To obtain a species phylogeny, we need to use a variety of gene families to construct the tree

Forms of Tree Representation
Phylogram: branch lengths represent the amount of evolutionary divergence
• Newick format – text format for use by computer programs
  • Example: (((B,C),A),(D,E))
  • Can also have branch lengths
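To make the Newick bullet concrete, here is a minimal sketch of how a program might read a string such as (((B,C),A),(D,E)) into a nested structure. The function names `parse_newick` and `leaves` are hypothetical, chosen for illustration; the sketch keeps only the topology, skips any branch lengths or internal labels, and assumes unquoted taxon names.

```python
def parse_newick(text):
    """Parse a Newick string into nested tuples of taxon names (topology only)."""
    s = text.strip().rstrip(";")  # the trailing ';' is optional here
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def read_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:":
            pos += 1
        return s[start:pos].strip()

    def skip_branch_length():
        # A branch length, if present, looks like ':0.12'; it is discarded.
        nonlocal pos
        if peek() == ":":
            pos += 1
            read_label()

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            read_label()  # optional internal-node label, ignored
            skip_branch_length()
            return tuple(children)
        name = read_label()
        skip_branch_length()
        return name

    tree = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree):
    """Return the taxon names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("(((B,C),A),(D,E))")
print(tree)
print(leaves(tree))
```

Each pair of parentheses becomes one internal node, so the nesting of the tuples mirrors the branching pattern of the tree.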
Cladogram: branch lengths are meaningless and only topology matters (in a phylogram, by contrast, branch lengths represent evolutionary divergence)

Consensus Trees
• When multiple trees are equally optimal, build a consensus tree by collapsing the disagreements into a single node

Why Finding a True Tree is Difficult
• The number of possible trees grows exponentially (in fact faster than exponentially) with the number of species or sequences, n
• Number of rooted trees: $N_r = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
• Number of unrooted trees: $N_u = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
• To find the best tree, you must explore all possibilities (or must you?)

Tree Building Procedure
• Choose molecular markers
• Perform MSA
• Choose a model of evolution
• Determine tree building method
• Assess tree reliability

Choice of Molecular Markers
• Very closely related organisms - use nucleic acid sequences, which will show more differences
• For individuals within a species - use the faster-mutating noncoding regions of mtDNA
• More distantly related species - use slowly evolving nucleic acid sequences like ribosomal RNA, or protein sequences
• Very distantly related species - use highly conserved protein sequences

Advantages of Protein Sequences
• More highly conserved - mutations in DNA may not change the amino acid sequence
• The third position in a codon especially can vary, which violates our assumption of independent evolution of all positions in a sequence
• DNA sequences can be biased by codon usage differences between species, which causes variations in sequence that are not attributable to evolution
• In alignments, DNA sequences that are not related can show a lot of similarity because the alphabet has only 4 letters; proteins do not have this problem (at least not as much)
• Introducing gaps in alignments of DNA sequences can cause frameshift errors, making the alignment biologically meaningless

Advantages of DNA Sequences
• Better for closely related species
• Show synonymous and non-synonymous mutations, which allows analysis of positive and negative selection events
• Lots of nonsynonymous mutations may mean positive selection for new functions of a protein with a different amino acid sequence
• Lots of synonymous mutations may mean negative selection - a changed amino acid sequence is detrimental

Multiple Sequence Alignment
• Most critical step in tree building - cannot build a correct tree without a correct alignment
• Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
• Most alignments need manual editing
  • Make sure important functional residues align
  • Align secondary structure elements
  • Use full alignment or just parts

Automatic Editing of Alignments
• Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences
• Gblocks – detect and eliminate poorly aligned positions and divergent regions
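As a rough illustration of what "eliminating poorly aligned positions" can mean, the sketch below drops alignment columns that are mostly gaps. This is a deliberately simplified stand-in and not the Gblocks algorithm, which applies additional conservation and block-length criteria. The function name `trim_gappy_columns` and the `max_gap_fraction` threshold are assumptions made for this example.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences, with '-' for gaps.
    Simplified illustration only; real tools use richer criteria.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for i in range(length):
        gaps = sum(row[i] == "-" for row in alignment)
        if gaps / len(alignment) <= max_gap_fraction:
            keep.append(i)  # column is well enough populated to keep

    return ["".join(row[i] for i in keep) for row in alignment]


aligned = ["ATG-CA", "A-G-CA", "ATGTCA"]
for row in trim_gappy_columns(aligned):
    print(row)
```

In the toy alignment, only the fourth column has gaps in more than half of the sequences, so it is the only column removed.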
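The two tree-count formulas from "Why Finding a True Tree is Difficult" are easy to evaluate directly, which shows how quickly exhaustive search becomes impractical. The sketch below assumes strictly bifurcating trees with n labeled taxa; the function names are hypothetical.

```python
from math import factorial


def num_rooted_trees(n):
    """N_r = (2n-3)! / (2^(n-2) * (n-2)!) for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted_trees(n):
    """N_u = (2n-5)! / (2^(n-3) * (n-3)!) for n >= 3 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

With only 10 taxa there are already 34,459,425 rooted and 2,027,025 unrooted topologies, which is why most tree-building methods rely on heuristics rather than enumerating every tree.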
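The distinction between synonymous and nonsynonymous changes, listed under "Advantages of DNA Sequences", can be illustrated with a crude codon-by-codon comparison of two aligned coding sequences. This sketch assumes the standard genetic code, in-frame sequences of equal length, and simply counts changed codons; it is not a dN/dS estimator, which would also normalize by the number of synonymous and nonsynonymous sites. The name `classify_codon_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a, seq_b):
    """Count codons that differ synonymously vs. nonsynonymously.

    Returns (synonymous, nonsynonymous). Codons containing gaps or
    ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not classified
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid, different codon
        else:
            nonsynonymous += 1   # amino acid changed
    return synonymous, nonsynonymous


# GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(classify_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

A protein alignment of the same two sequences would show only the lysine-to-arginine change, which is why the DNA level carries the extra information needed to reason about selection.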