Molecular Phylogenies, Genomics and the Microbial Species Concept
Peg Riley
University of Massachusetts Amherst
www.ai.mit.edu/.../ ce/microbial-engineering.html

Biological Diversity
From a morphological perspective
Where does your organism belong?

Biological Diversity
From a molecular perspective (16S rRNA)
Now where does your organism belong?

Biological Diversity
Molecular phylogenies fundamentally changed our views of biological diversity

Molecular Phylogenies Reveal
We live on a PLANET of MICROBES
• Microbes comprise by far the greatest amount of biological diversity
• Morphology works well for inferring evolutionary relationships among non-microbial eukaryotes, but molecules open our eyes to a wealth of formerly hidden biological (microbial) diversity

Molecular Phylogenies Also Reveal
An unexpected and relatively high level of gene flow
[Figure: horizontal transfer and recombination between Species A and Species B]

Great Moments in Evolution: Photosynthesis Evolves
Vertical versus Horizontal Transfer
• Anaerobic photosynthesis
• Oxygen-based photosynthesis
• Early divergence of the cyanobacteria results in the first biological structure: Stromatolites Rock!

Transfer can happen, BUT is there frequent gene transfer between domains?
TPI
‘You Are What You Eat’!
Frequent gene transfer proposed from Bacteria to the Eukaryotes that eat them… (Doolittle, 1998)

Gene transfer (or horizontal transfer): made possible by frequent, relatively kinky, bacterial sex
• Conjugation
• Transduction
• Transformation
“Sex with dead things is better than no sex at all.”

So mechanisms for horizontal transfer exist, BUT…
…are such events common enough to limit divergence between lineages?
[Figure: exchange between lineages A and B]

Let’s focus on specific lineages
Does horizontal transfer result in a cloud of diversity, due to frequent exchange among distinct lineages?
[Figure: lineages A and B]

Gene Transfer Versus Retention
• Mechanisms for gene transfer exist
  – Transfer happens all the time to all genes
• Successful gene transfer is relatively rare
  – Just because a transfer event occurs does not mean the gene will survive in its new genome
  – Success depends upon the donor, the recipient, the environment, perhaps a phage or plasmid, etc.

Most horizontal transfer events are lost due to drift
Probability of fixation = 1/N
Initial frequency = 1/N
[Figure: allele frequency (0.0 to 1.0) versus time]
If your population size is 10^10, then the probability of fixation is 1/10^10, or 0.0000000001

Successful Horizontal Transfer in Bacteria
• Transfer occurs for all genes; a transferred gene is just more likely to be retained when selection is strong
• That is why genes observed to have transferred are often involved in local adaptation
  – Antibiotic resistance, heavy metal tolerance, virulence determinants
www.cbs.dtu.dk/.../ roanoke/genetics98 0316.htm

Successful Horizontal Transfer in Bacteria
• Between close relatives?
  – Frequently occurs due to shared plasmids, phages, recognition signals, appropriate gene regulation systems, etc.
• Between distant relatives?
  – Less clear how often such transfer is successful
    • Antibiotic resistance genes, although?
    • Photosynthetic systems, endosymbiosis…
    • Genes involved in cytosolic metabolism?

Does the Universal Tree of Life Really Look Like This?

Is There Stability Or Flux In Evolutionary Lineages?
[Figure: successful gene transfer rate, from low (stability) to high (flux)]
Is successful transfer frequent enough to obliterate evolutionary lineages?

Genome Comparisons Suggest Flux At First Blush
Linear diagram comparing the six complete E. coli and S. flexneri genomes using a software tool called Mauve (Glasner and Perna, 2004)
K12 and O157:H7 are 98.5% identical, BUT punctuated by hundreds of islands of unique sequence

Bacterial Phenotype Space
[Figure: discrete phenotypes versus continuous phenotypes for C. freundii, E. coli and S. marcescens]

Enteric Phenotype Space
[Figure: Hafnia alvei, Citrobacter freundii, Salmonella typhi, Klebsiella oxytoca, K. pneumoniae and Escherichia coli in phenotype space; axis label: Phenotype 1]

Bacillus Phenotype Space
[Figure: B. pumilus, B. amyloliquefaciens, B. licheniformis and B. subtilis in phenotype space. From Shute et al., 1985]

Mapping Phenotype To Genotype
[Figure: phenotypic characters and genotypic character distribution for E. coli, S. enterica and C. freundii]

Bacterial Taxonomy Gold Standard
Polyphasic Approach
– Requires a phenotypic component
  • Restricts taxonomy to the < 1% we can culture
  • 1930s Bergey’s Manual of Determinative Bacteriology: exclusive, diagnostic traits required
– Requires a genetic component
  • 16S rRNA sequence to place taxa
  • Measure of overall DNA similarity

Phylo-Phenetic Approach To Bacterial Taxonomy
• Collect an adequate sample of strains and use them all
• Determine the closest relative with 16S rRNA
• Characterize the phenotype
  – The more exhaustive, the better
  – Do not spare time or effort
• Follow nomenclature rules
  – Avoid using words that are hard to pronounce if you do not wish to annoy your colleagues
(Rossello-Mora & Amann, 2001)

Phylo-Phenetic Species Concept
• A monophyletic and genomically coherent cluster of individual organisms that show a high degree of overall similarity with respect to many independent characteristics, and is diagnosable by a discriminative phenotypic property.
  – Genomic similarity: >~70% DNA-DNA similarity
  – Phenotype description should be exhaustive
  – Monophyletic: 16S rRNA sequence analysis
  – It is “Theory-lite”
(Rossello-Mora and Amann, 2001)

Two Facts We May Be Able To Agree Upon
1. Bacteria cluster in phenotype space
2. Bacteria successfully transfer some fraction of their genomes via horizontal transfer
• What fraction of the genome underlies the phenotype clustering?
  – Is there a core set of genes that defines a bacterial lineage?
    • Genes that rarely transfer
    • Genes required for survival of the lineage

The Hummer Analogy
[Figure: basic (core) Hummer versus niche-adapted Hummers]

Core Genome Proposal
• Core genes comprise the species’ “shared, core genome”
  – Rarely transfer and thus diverge between close relatives
  – Might include essential housekeeping genes
  – Present in >95% of isolates
[Figure: gene similarity (identical to different) versus time (ancient to recent) for Species A and Species B descending from an ancestral species. Lan and Reeves, 2001]

Core Genome Proposal
• Auxiliary genes are the set of genes that serve to adapt isolates to local niches
  – Auxiliary genes frequently transfer and therefore do not diverge between close relatives
  – Includes resistance, tolerance, pathogenicity genes, etc.
[Figure: gene similarity (identical to very similar) versus time (ancient to recent) for Species A and Species B descending from an ancestral species. Lan and Reeves, 2001]

Evolving a Barrier to Recombination
• “Core” genes diverge as lineages evolve
  – Nucleotide diversity for core genes is lower within than between taxa
  – Suggests a genetic mechanism that can maintain lineage stability
  – Divergence limits recombination
[Figure: gene similarity (identical to different) versus time (ancient to recent) for Species A and Species B descending from an ancestral species]

Assessing The Existence Of A Core Genome
• Need a group of taxa that are closely related enough to avoid multiple substitution issues and alignment issues
• Need multiple isolates per species and multiple species
• Need to examine isolates that coexist in time and space such that recombination could occur

Gordon Australian Enteric Collection

Strain  Collection #  Species               Source organism
CF1     M250          Citrobacter freundii  Isoodon macrourus
CF2     M289          Citrobacter freundii  Perameles nasuta
CF3     M141          Citrobacter freundii  Antechinus flavipes
CF4     M140          Citrobacter freundii  Antechinus flavipes
CF5     M255          Citrobacter freundii  Isoodon macrourus
EB1     M338          Enterobacter cloacae  Mus musculus
EB2     M50           Enterobacter cloacae  Mus musculus
EB3     M99           Enterobacter cloacae  Mus musculus
EB4     M90           Enterobacter cloacae  Mus musculus
EB5     M322          Enterobacter cloacae  Mus musculus
EC1     TA157         Escherichia coli      Trichosurus vulpecula
EC2     TA234         Escherichia coli      Mus musculus
EC3     TA479         Escherichia coli      Mus musculus
EC4     TA57          Escherichia coli      Macropus giganteus
EC5     TA79          Escherichia coli      Bettongia penicillata
EC6     TA184         Escherichia coli      Trichosurus caninus
HA1     M163          Hafnia alvei          Phascogale tapoatafa
HA2     M690          Hafnia alvei          Homo sapiens
HA3     M230          Hafnia alvei          Antechinus bellus
HA4     M261          Hafnia alvei          Dasyurus hallucatus
HA5     M259          Hafnia alvei          Dasyurus hallucatus
KO1     M151          Klebsiella oxytoca    Dasycercus cristicauda
KO2     M328          Klebsiella oxytoca    Trichosurus vulpecula
KO3     M192          Klebsiella oxytoca    Zyzomys argurus
KO4     M499          Klebsiella oxytoca    Vespadelus vulturnus
KO5     M712          Klebsiella oxytoca    Chalinolobus gouldii

State (24 values, in source order): NT NSW SA SA NT VIC VIC VIC VIC VIC NT ACT WA NSW WA WA NT VIC NT NT TAS NT NSW NSW
(Gordon et al., 2001)

Assessing The Existence Of A Core Genome
1. Choose potential “core” genes:
  • Essential for the survival of the cell
  • Not closely linked: avoid co-transduction
  • Not physiologically linked: avoid co-evolution
2. What is “core” for one species may not be “core” for another

Target Core Genes

Gene   Product                                   Map position  Gene length (bp)  Sequence length (bp)  PIs
gapA   Glyceraldehyde-3-phosphate dehydrogenase  40.11         996               832                   194
groEL  GroEL protein                             94.17         1647              1146                  245
gyrA   DNA gyrase subunit A                      50.33         2628              660                   226
ompA   Outer membrane protein A                  21.95         1041              526                   219
pgi    Glucose-6-phosphate isomerase             91.21         1650              670                   210
16S    16S rRNA                                  several       1541              291                   30

Gene Tree Inference
• Phylogenetic trees inferred with maximum likelihood methods (PAUP* 4.0b8)
• MODELTEST used to generate optimum parameters for the heuristic algorithm used for building ML trees in PAUP*
• Statistical support for branching patterns of gene trees assessed in two ways
  – Bootstrapping ML trees, 500 replicates
  – MrBayes: 50,000 trees, majority rule consensus

Core Gene Trees
[Figure: gapA and groEL gene trees with support values at the nodes; tips are CF, EB, EC, ECMG, HA, KO, KP, SM and SP isolates]

Enteric Core Gene Trees Summary
Multiple isolates from each taxon always cluster together
Suggests something maintains the stability of those taxa
[Figure: gapA gene tree with the E. coli and H. alvei clusters labeled]

Enteric Core Gene Trees Summary
• Within a species
  – Isolates cluster together in the composite tree
• Between species
  – The branching patterns follow those suggested from phenotypic data
• Practical take-home message
  – Relatively few housekeeping genes provide a composite view of enteric phylogenetic relationships
    • Don’t need an entire genome
    • Serves as a proxy for phenotype

Evolving a Barrier to Core Gene Recombination Between Taxa
• Core genes have diverged significantly between these taxa
  – The levels of nucleotide diversity for core genes within these taxa are much lower than the levels of divergence between taxa
• This pattern of divergence suggests a genetic mechanism that can maintain lineage stability
  – Core genes diverge as lineages evolve
  – Divergence prohibits homologous recombination

Genomic Comparisons
[Figure: E. coli and Salmonella; labels: species diverge, recombine]
Although horizontal transfer of genetic information CAN bring lineages (species) together, in the enterics it has had little to no effect

Core Genome Hypothesis
• Provides a theoretical underpinning to the Phylo-Phenetic approach to bacterial classification
• So far, supports taxonomic distinctions based upon phenotype data
  – Does not require phenotype or culturing(!)
  – But may reveal genes that help in culturing efforts
• Provides a simple molecular assay of bacterial species relationships

Core Versus Auxiliary Genes
• Core genes should accumulate substitutions between species based upon how long the species have been diverging
• Auxiliary genes are passed back and forth and should be more similar, on average, than core genes

Antibiotic Resistance “Core” Gene?
• bla OXY
  – Chromosomally encoded
  – Found only in isolates of K. oxytoca
  – Found in all K. oxytoca isolates tested in the AEC
  – Nucleotide diversity is higher than that found in housekeeping genes (0.200 vs. 0.002*)
  – Behaving like a core gene
*Nucleotide diversity at synonymous sites

Antibiotic Resistance “Auxiliary” Gene?
• bla TEM
  – Plasmid encoded
  – Found in 31 of 73 AEC isolates tested
  – Found in at least one isolate of each taxon examined
  – Only 2 alleles, which differ at 2 nucleotide sites
  – Nucleotide diversity is much lower than that of housekeeping genes (0.000 vs. 0.055*)
  – Behaving like an auxiliary gene
*Nucleotide diversity at synonymous sites

[Table: phenotypic test results by taxon]
Traits (in source order): Methyl Red, Indole Production, Citrate (Simmons), Lysine Decarboxylase, Urea Hydrolysis, Esculin Hydrolysis, DNase, Polar Flagella, H2S Production, Voges-Proskauer
Symbols: - = 0-10% positive; [-] = 11-25% positive; d = 26-75% positive; [+] = 76-89% positive; + = 90-100% positive
Scores (in source order):
CF  - + - + - d - - + [+]
EC  - + + - + - d - + -
EB  + - + - d d - + -
KO  + [-] + + + + + - - -
KP  + [-] - + + + + - - -
(unlabeled)  d - - [+] + + - - - -
SP  [+] + -
HA  [+] d - + - - +

What Is A Core Gene For One Taxon May Be An Auxiliary Gene For Another

Biological Species Concept
Groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups (Mayr, 1942)

BSC Applied to Bacteria

Microbial Biological Species Concept
Groups of strains that exchange, or could exchange, core genome information but that are restricted from exchange with other such groups
• Allows for exchange of auxiliary genes
• Predicts that core genes will show higher levels of recombination within a species than between species
• Predicts that core genes will diverge more rapidly than auxiliary genes between species

Conclusions
1. Bacteria cluster in phenotype space
2. There is corresponding genotypic clustering of “core” genes
  – At least in one sample of enteric bacteria
  – This is not the case in “auxiliary” genes
3. These patterns argue for a biological species concept for bacteria and the existence of coevolved genomes that survive through evolutionary time
  – Requires population as well as genomic divergence data
4. The question is not “does lateral transfer occur?” but rather “does its occurrence obliterate coevolved genomes?”

The Future of the Microbial Species Concept
A Rocky, Rocky Road ahead! Why?
1. Requires population genetic thinking: gene frequencies, not presence/absence
2. A species to one person may be a clinical isolate to another (E. coli vs Shigella)
3. Species are not static entities
4. Newly created Comparative Genome Analysis Consortium (DOE based)

Acknowledgements
The Work
  Riley Lab: Carla Goldstone, John Wertz, Cynthia Hunt, Caroline Obert, Lisa Nigro, Ben Kirkup, Emily Curd, Osnat Gillor, Milind Chavan, Mike Vain, Michelle Lizzote
  Collaborators: David Gordon, ANU; Rob Dorit, Smith; Carl Bergstrom, UW; Ben Kerr, UW; Rich Lenski, MSU
The Funding
  NIH, NSF, Rockefeller Foundation, Culpepper Foundation, Yale University, UMass Amherst

The Microbial Planet
16S rRNA
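
Worked example: drift and the fate of a transferred gene
The drift slide states that a newly transferred gene starts at frequency 1/N and fixes with probability 1/N. The sketch below is one way to check that figure numerically. It is not part of the original talk. It assumes a haploid population of constant size N and a strictly neutral gene, and it uses a Moran-style birth-death simulation as a stand-in for whatever model the slide's figure was drawn from. The function names neutral_fixation_probability and simulate_fixation_fraction are hypothetical.

```python
import random


def neutral_fixation_probability(population_size: int) -> float:
    """Chance that one new, selectively neutral gene copy eventually fixes.

    Assumes a haploid population of constant size N in which the transferred
    gene starts as a single copy (initial frequency 1/N).
    """
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    return 1.0 / population_size


def simulate_fixation_fraction(population_size: int, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by simulating drift (Moran-style model).

    Each step one random individual reproduces and one random individual dies,
    so the copy number of the transferred gene moves by -1, 0, or +1.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    fixed = 0
    for _ in range(trials):
        copies = 1  # a single transfer event in one cell
        while 0 < copies < population_size:
            frequency = copies / population_size
            parent_has_gene = rng.random() < frequency
            dead_has_gene = rng.random() < frequency
            copies += int(parent_has_gene) - int(dead_has_gene)
        if copies == population_size:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    # Population size from the slide: 10^10 cells.
    print(neutral_fixation_probability(10**10))  # 1e-10
    # A population small enough to simulate; the estimate should sit near 1/20.
    print(simulate_fixation_fraction(population_size=20, trials=4000))
```

The simulation is only practical for small N. It is there to show that the 1/N rule falls out of drift alone, which is why the slides argue that transferred genes are mostly retained when selection is strong.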