Gene Trees, Populations and the Microbial Species Concept

Molecular Phylogenies,
Genomics and the Microbial
Species Concept
www.ai.mit.edu/.../ ce/microbial-engineering.html
Peg Riley
University of Massachusetts Amherst
Biological Diversity
From a morphological perspective
Where does your organism belong?
Biological Diversity
From a molecular perspective
16S rRNA
Now where does your organism belong?
Biological Diversity
Molecular phylogenies fundamentally
changed our views of biological diversity
Molecular
Phylogenies Reveal
We live on a PLANET of MICROBES
Microbes comprise by far the
greatest amount of biological diversity
Morphology works well for inferring evolutionary
relationships among non-microbial eukaryotes,
but molecules open our eyes to a wealth of
formerly hidden biological (microbial) diversity
Molecular Phylogenies Also
Reveal
Species A
Species B
Horizontal
transfer
Recombination
an unexpected and relatively
high level of gene flow
Great Moments in Evolution:
Photosynthesis Evolves
Vertical versus Horizontal Transfer
Anaerobic
Photosynthesis
Oxygen - Based
Photosynthesis
Cyano early divergence results in
first biological structure
- Stromatolites Rock!
Transfer can happen, BUT is
there frequent gene transfer
between domains?
TPI
‘You Are What You Eat’!
Frequent gene transfer proposed
from Bacteria to the Eukaryotes
that eat them….
Doolittle, 1998
Gene transfer:
made possible
by frequent,
relatively kinky,
bacterial sex
Conjugation
(or horizontal transfer)
Transduction
Transformation
“Sex with dead things is
better than no sex at all.”
A
B
A
So mechanisms for
horizontal transfer
exist
BUT
A
B
B
…are such
events common
enough to limit
divergence
between
lineages?
Let’s focus on specific lineages
A
B
Does h.t. result
in a cloud of
diversity, due to
frequent
exchange
among distinct
lineages?
Gene Transfer Versus Retention
• Mechanisms for gene transfer exist
– Transfer happens all the time to all genes
• Successful gene transfer is relatively rare
– Just because a transfer event occurs does not
mean it will survive in its new genome
– Success depends upon the donor, the recipient,
the environment, perhaps a phage, or plasmid,
etc.
Most horizontal transfer
events are lost due to drift
Probability of fixation = 1/N
1.0
Initial
frequency = 1/N
0.0
Time
If your population size is 10^10, then the probability
of fixation is 1/10^10, or 0.0000000001
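The 1/N result can be checked with a small simulation. This is a minimal sketch, assuming a haploid Wright-Fisher population of constant size and a selectively neutral transferred gene; the function names, the toy population size and the seed are illustrative choices, not part of the original slides.

```python
import random


def neutral_fixation_probability(population_size):
    """Probability that a single new neutral copy eventually fixes: 1/N."""
    return 1.0 / population_size


def simulate_neutral_fixation(population_size, trials, seed=0):
    """Estimate the fixation probability of one neutral copy by simulation.

    Assumes a haploid Wright-Fisher model: each generation, every one of
    the N offspring picks a parent at random, so the new copy number is a
    binomial draw with success probability (current count / N).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # the transferred gene starts as a single copy
        while 0 < count < population_size:
            freq = count / population_size
            count = sum(rng.random() < freq for _ in range(population_size))
        if count == population_size:
            fixed += 1
    return fixed / trials
```

With a toy population such as N = 20, the simulated estimate should land close to 1/20; realistic bacterial population sizes are far too large to simulate this way, which is why the closed-form 1/N is used.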
Successful Horizontal
Transfer in Bacteria
• Transfer occurs for all genes; it is just
more likely to be retained
when selection is strong
www.cbs.dtu.dk/.../
roanoke/genetics98
0316.htm
• That is why genes observed to have transferred
are often involved in local adaptation
– antibiotic resistance, heavy metal tolerance,
virulence determinants
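To see how strongly selection changes the odds of retention, one can compare the neutral 1/N value with a standard diffusion approximation for a beneficial gene. This is a sketch under stated assumptions: it uses Kimura's approximation for a haploid population, P = (1 - e^(-2s)) / (1 - e^(-2Ns)), which does not appear in the slides, and the selection coefficient `s` is a hypothetical parameter.

```python
import math


def fixation_probability(population_size, s):
    """Diffusion approximation for one new copy in a haploid population.

    P = (1 - exp(-2s)) / (1 - exp(-2Ns)) for a selective advantage s >= 0.
    When s == 0 this reduces to the neutral value 1/N.
    """
    if s < 0:
        raise ValueError("this sketch only covers neutral or beneficial genes")
    if s == 0:
        return 1.0 / population_size
    # expm1(x) = exp(x) - 1 stays accurate for small x; both terms are
    # negative here, so the ratio is positive.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * population_size * s)
```

For a large population the result is close to 2s, so even a modest advantage (for example under antibiotic exposure) raises the chance of retention by many orders of magnitude over 1/N.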
Successful Horizontal Transfer
in Bacteria
• Between close relatives?
– Frequently occurs due to shared plasmids,
phages, recognition signals, appropriate gene
regulation systems, etc.
• Between distant relatives?
– Less clear how often such transfer is successful
• Antibiotic resistance genes, although?
• Photosynthetic systems, endosymbiosis…
• Genes involved in cytosolic metabolism ?
Does the Universal Tree of Life
Really Look Like This?
Is There Stability Or Flux In
Evolutionary Lineages?
Low rate
High rate
Stability
Flux
Successful gene transfer rate
Is successful transfer frequent
enough to obliterate evolutionary
lineages?
Genome Comparisons
Suggest Flux At First Blush
Linear diagram comparing the six complete E. coli and S. flexneri genomes using a software
tool called Mauve (Glasner and Perna, 2004)
K-12 and O157:H7 are 98.5% identical,
BUT - punctuated by hundreds of islands of unique
sequence
Bacterial Phenotype Space
Discrete
phenotypes
C. freundii
E. coli S. marcescens
Continuous
phenotypes
C. freundii E. coli S. marcescens
Enteric Phenotype Space
Hafnia alvei
Citrobacter freundii
Salmonella typhi
Klebsiella oxytoca
K. pneumoniae
Escherichia coli
Phenotype 1
Bacillus Phenotype Space
B. pumilus
B. amyloliquefaciens
B. licheniformis
B. subtilis
From Shute et al., 1985
Mapping Phenotype To Genotype
E. coli
Phenotypic Characters
S. enterica
C. freundii
Phenotypic Characters
Genotypic Character
Distribution
Bacterial Taxonomy
Gold Standard
Polyphasic Approach
– Requires a phenotypic component
• Restricts taxonomy to the < 1% we can culture
• 1930’s Bergey’s Manual of Determinative
Bacteriology
– exclusive, diagnostic traits required
– Requires a genetic component
• 16S rRNA sequence to place taxa
• Measure of overall DNA similarity
Phylo-Phenetic Approach To
Bacterial Taxonomy
• Collect adequate sample of strains & use them all
• Determine closest relative with 16S rRNA
• Characterize the phenotype
– The more exhaustive, the better
– Do not spare time or effort
• Follow nomenclature rules
– Avoid using words that are hard to pronounce if you do not wish to
annoy your colleagues
(Rossello-Mora & Amann, 2001)
Phylo-Phenetic Species Concept
•A monophyletic and genomically coherent
cluster of individual organisms that show a high
degree of overall similarity with respect to many
independent characteristics, and is diagnosable
by a discriminative phenotypic property.
—Genomic similarity- >~70% DNA-DNA similarity
—Phenotype description should be exhaustive
—Monophyletic- 16S rRNA sequence analysis
—It is “Theory-lite”
Rossello-Mora and Amann, 2001
Two Facts We May Be Able
To Agree Upon
1. Bacteria cluster in phenotype space
2. Bacteria successfully transfer some fraction of
their genomes via horizontal transfer
• What fraction of the genome underlies the
phenotype clustering?
– Is there a core set of genes that defines a bacterial
lineage?
• Genes that rarely transfer
• Genes required for survival of the lineage
The Hummer Analogy
Basic (core) Hummer
Niche adapted Hummers
Core Genome Proposal
• Core genes comprise the species “shared,
core genome”
– Rarely transfer and thus diverge between close relatives
– Might include essential housekeeping genes
– Present in frequencies of >95% of isolates
[Figure: gene similarity (identical to different) against time (ancient to recent); Species A and Species B diverge from the Ancestral Species. Lan and Reeves, 2001]
Core Genome Proposal
• Auxiliary genes are that set of genes that
serve to adapt isolates to local niches
– Auxiliary genes frequently transfer and therefore do not
diverge between close relatives
– Includes resistance, tolerance, pathogenicity genes, etc.
[Figure: gene similarity (identical to very similar) against time (ancient to recent) for Species A and Species B descending from the Ancestral Species. Lan and Reeves, 2001]
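The frequency criterion in the core genome proposal (present in more than 95% of isolates) can be sketched as a simple classifier. This is an illustration only: the function name and input layout are hypothetical, frequency is just one of the proposed criteria (core genes are also expected to transfer rarely), and the threshold would normally be applied within a single species.

```python
def classify_genes(presence, core_threshold=0.95):
    """Split genes into core and auxiliary lists by isolate frequency.

    presence maps a gene name to a list of booleans, one per isolate,
    where True means the gene was detected in that isolate.
    """
    core, auxiliary = [], []
    for gene, calls in presence.items():
        frequency = sum(calls) / len(calls)
        if frequency > core_threshold:
            core.append(gene)
        else:
            auxiliary.append(gene)
    return core, auxiliary


# Hypothetical example: one gene detected in every isolate, one detected
# in only 31 of 73 isolates.
example = {
    "housekeeping_gene": [True] * 73,
    "plasmid_gene": [True] * 31 + [False] * 42,
}
core_genes, auxiliary_genes = classify_genes(example)
```

Working from frequencies rather than a single presence/absence call is the design choice here: it keeps the classification a population-level statement.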
Evolving a Barrier to
Recombination
• “Core” genes diverge as lineages evolve
– Nucleotide diversity for core genes is lower within than
between taxa
– Suggests a genetic mechanism that can maintain
lineage stability
– Divergence limits recombination
[Figure: gene similarity (identical to different) against time (ancient to recent); Species A and Species B diverge from the Ancestral Species]
Assessing The Existence Of A
Core Genome
• Need a group of taxa that are closely related
enough to avoid multiple substitution issues and
alignment issues
• Need multiple isolates per species and multiple
species
• Need to examine isolates that coexist in time and
space such that recombination could occur
Gordon Australian Enteric Collection
Strain designation  Collection #  Species               Source organism
CF1                 M250          Citrobacter freundii  Isoodon macrourus
CF2                 M289          Citrobacter freundii  Perameles nasuta
CF3                 M141          Citrobacter freundii  Antechinus flavipes
CF4                 M140          Citrobacter freundii  Antechinus flavipes
CF5                 M255          Citrobacter freundii  Isoodon macrourus
EB1                 M338          Enterobacter cloacae  Mus musculus
EB2                 M50           Enterobacter cloacae  Mus musculus
EB3                 M99           Enterobacter cloacae  Mus musculus
EB4                 M90           Enterobacter cloacae  Mus musculus
EB5                 M322          Enterobacter cloacae  Mus musculus
EC1                 TA157         Escherichia coli      Trichosurus vulpecula
EC2                 TA234         Escherichia coli      Mus musculus
EC3                 TA479         Escherichia coli      Mus musculus
EC4                 TA57          Escherichia coli      Macropus giganteus
EC5                 TA79          Escherichia coli      Bettongia penicillata
EC6                 TA184         Escherichia coli      Trichosurus caninus
HA1                 M163          Hafnia alvei          Phascogale tapoatafa
HA2                 M690          Hafnia alvei          Homo sapiens
HA3                 M230          Hafnia alvei          Antechinus bellus
HA4                 M261          Hafnia alvei          Dasyurus hallucatus
HA5                 M259          Hafnia alvei          Dasyurus hallucatus
KO1                 M151          Klebsiella oxytoca    Dasycercus cristicauda
KO2                 M328          Klebsiella oxytoca    Trichosurus vulpecula
KO3                 M192          Klebsiella oxytoca    Zyzomys argurus
KO4                 M499          Klebsiella oxytoca    Vespadelus vulturnus
KO5                 M712          Klebsiella oxytoca    Chalinolobus gouldii
State
NT
NSW
SA
SA
NT
VIC
VIC
VIC
VIC
VIC
NT
ACT
WA
NSW
WA
WA
NT
VIC
NT
NT
TAS
NT
NSW
NSW
Gordon et al., 2001
Assessing The Existence Of A
Core Genome
1. Choose potential “core” genes:
• Essential for the survival of the cell
• Not closely linked - avoid co-transduction
• Not physiologically linked - avoid co-evolution
2. What is “core” for one species may not
be “core” for another
Target Core Genes
Gene   Product                                   Map position  Gene length (bp)  Sequence length (bp)  PIs
gapA   Glyceraldehyde-3-phosphate dehydrogenase  40.11         996               832                   194
groEL  GroEL protein                             94.17         1647              1146                  245
gyrA   DNA gyrase subunit A                      50.33         2628              660                   226
ompA   Outer membrane protein A                  21.95         1041              526                   219
pgi    Glucose-6-phosphate isomerase             91.21         1650              670                   210
16S    16S rRNA                                  several       1541              291                   30
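The slide does not define "PIs"; assuming it stands for parsimony-informative sites, a count like the ones listed for each gene can be sketched as follows. A site is treated as informative when at least two different nucleotides each occur in at least two sequences. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def count_parsimony_informative_sites(alignment):
    """Count alignment columns with >= 2 states that each occur >= 2 times.

    alignment is a list of equal-length, aligned nucleotide strings.
    Gaps and ambiguity codes are ignored when counting states.
    """
    informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative


# Toy alignment: columns 2 and 4 are informative, columns 1 and 3 are not.
toy_alignment = ["ACGT", "ACGA", "ATGA", "ATCT"]
toy_count = count_parsimony_informative_sites(toy_alignment)
```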
Gene Tree Inference
• Phylogenetic trees inferred with maximum
likelihood methods (PAUP* 4.0b8)
• MODELTEST used to generate optimum
parameters for heuristic algorithm used for
building ML trees in PAUP
• Statistical support for branching patterns of gene
trees assessed in two ways
– Bootstrapping ML trees, 500 replicates
– MrBayes - 50,000 trees, majority rule consensus
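The resampling step behind bootstrap support can be sketched in a few lines. This is not the PAUP* implementation; it only illustrates the general idea, under the assumption that the alignment is a list of equal-length strings: columns are drawn with replacement to build each pseudo-replicate, and a tree would then be inferred from every replicate. Function names and the seed are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_replicates(alignment, replicates=500, seed=0):
    """Yield pseudo-replicate alignments, one per bootstrap replicate."""
    rng = random.Random(seed)
    for _ in range(replicates):
        yield bootstrap_alignment(alignment, rng)
```

The support value on a branch is then the percentage of replicate trees in which that grouping of isolates reappears.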
Core Gene Trees
[Figure: gene trees for gapA and groEL. Tips are isolates (CF, EB, EC, ECMG, HA, KO, KP, SM, SP); numbers on branches are support values.]
Enteric Core Gene Trees
Summary
Multiple isolates
from each taxon
always cluster
together
Suggests something
maintains the stability
of those taxa
[Figure: gapA gene tree with the E. coli and H. alvei clusters labeled]
Enteric Core Gene Trees
Summary
• Within a species
– Isolates cluster together in the composite tree
• Between species
– The branching patterns follow those suggested from
phenotypic data
• Practical take home message
– Relatively few housekeeping genes provide a
composite view of enteric phylogenetic relationships
• Don’t need an entire genome
• Serves as a proxy for phenotype
Evolving a Barrier to Core Gene
Recombination Between Taxa
• Core genes have diverged significantly
between these taxa
– The levels of nucleotide diversity for core genes within
these taxa are much lower than the levels of divergence
between taxa
• This pattern of divergence suggests a
genetic mechanism that can maintain
lineage stability
– Core genes diverge as lineages evolve
– Divergence prohibits homologous recombination
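The comparison of diversity within taxa against divergence between taxa can be sketched with mean pairwise differences. This is a simplified illustration: it uses the raw proportion of differing sites across all positions, whereas the diversity values reported later in these slides are for synonymous sites only. The function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def diversity_within(seqs):
    """Mean pairwise p-distance among sequences from one taxon."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def divergence_between(seqs_a, seqs_b):
    """Mean pairwise p-distance between sequences from two taxa."""
    pairs = list(product(seqs_a, seqs_b))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy core-gene fragments from two hypothetical taxa.
taxon_a = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
taxon_b = ["ACCTAGGTTC", "ACCTAGGTTC", "ACCTAGGATC"]
within_a = diversity_within(taxon_a)
between_ab = divergence_between(taxon_a, taxon_b)
```

A core gene is expected to show `within_a` well below `between_ab`; an auxiliary gene that moves between taxa would show the two values much closer together.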
Genomic Comparisons
E. coli
Salmonella species
diverge
recombine
Although horizontal transfer of genetic
information CAN bring lineages (species)
together, in the enterics it has had
little to no effect
Core Genome Hypothesis
• Provides a theoretical underpinning to the Phylo-Phenetic approach to bacterial classification
• So far, supports taxonomic distinctions based
upon phenotype data
– Does not require phenotype or culturing(!)
– But may reveal genes that help in culturing efforts
• Provides a simple molecular assay of bacterial
species relationships
Core Versus Auxiliary Genes
• Core genes should accumulate
substitutions between species based
upon how long the species have
been diverging
• Auxiliary genes are passed back and
forth and should be more similar, on
average, than core genes
Antibiotic Resistance
“Core” Gene?
• bla OXY
— Chromosomally encoded
— Found only in isolates of K. oxytoca
— Found in all K. oxytoca isolates tested in the AEC
— Nucleotide diversity is higher than that found in
housekeeping genes (0.200 vs. 0.002*)
— Behaving like a core gene
*Nucleotide diversity at synonymous sites
Antibiotic Resistance
“Auxiliary” Gene?
• bla TEM
— Plasmid encoded
— Found in 31 of 73 AEC isolates tested
— Found in at least one of each taxon examined
— Only 2 alleles which differ at 2 nucleotide sites.
— Nucleotide diversity is much lower than that of
housekeeping genes (0.000 vs. 0.055*)
— Behaving like an auxiliary gene
*Nucleotide diversity at synonymous sites
Methyl Red
Indole Production
Citrate (Simmons)
Lysine Decarboxylase
Urea Hydrolysis
Esculin Hydrolysis
DNase
Polar Flagella
CF
-
+
-
+
-
d
-
- + [+]
EC
-
+ +
-
+
-
d
- +
-
EB
+
-
+
-
d
d
- +
-
KO
+ [-] +
+ + +
+
-
-
-
KP
+ [-] -
+ + +
+
-
-
-
d -
- [+] + +
-
-
-
-
Symbols
- 0-10% positive
[-] 11-25% positive
d 26-75% positive
[+] 76-89% positive
+ 90-100% positive
-
SP
[+] +
-
HA
[+] d -
+
-
- +
H2S Production
Voges-Proskauer
What Is A Core Gene For One
Taxon May Be An Auxiliary Gene For Another
Biological Species Concept
Groups of actually or potentially
interbreeding natural populations,
which are reproductively
isolated from other
such groups
(Mayr, 1942)
BSC Applied to Bacteria
Microbial Biological Species Concept
Groups of strains that exchange, or could exchange, core
genome information but that are restricted from
exchange with other such groups
• Allows for exchange of auxiliary genes
• Predicts that core genes will show higher levels of
recombination within a species than between species
• Predicts that core genes will diverge more rapidly than
auxiliary genes between species
Conclusions
1. Bacteria cluster in phenotype space
2. There is corresponding genotypic clustering of “core”
genes
— At least in one sample of enteric bacteria
— This is not the case in “auxiliary” genes
3. These patterns argue for a biological species concept
for bacteria and the existence of coevolved genomes
that survive through evolutionary time
— Requires population as well as genomic divergence data
4. The question is not “does lateral transfer occur?” but
rather “does its occurrence obliterate coevolved
genomes?”
The Future of the Microbial
Species Concept
A Rocky, Rocky, Road ahead!
Why?
1. Requires population genetic thinking
- Gene frequencies, not presence/absence
2. A species to one person may be a clinical
isolate to another (E. coli vs Shigella)
3. Species are not static entities
4. Newly created Comparative Genome Analysis
Consortium - DOE based
Acknowledgements
The Work
Riley Lab
Carla Goldstone
John Wertz
Cynthia Hunt
Caroline Obert
Lisa Nigro
Ben Kirkup
Emily Curd
Osnat Gillor
Milind Chavan
Mike Vain
Michelle Lizzote
The Funding
Collaborators
David Gordon, ANU
Rob Dorit, Smith
Carl Bergstrom, UW
Ben Kerr, UW
Rich Lenski, MSU
NIH
NSF
Rockefeller Foundation
Culpepper Foundation
Yale University
UMass Amherst
The Microbial Planet
16S rRNA