Towards in vivo editing of the human microbiome
by
Stephanie J. Yaung
S.B. Biological Engineering
Massachusetts Institute of Technology, 2010
S.B. Management Science
Massachusetts Institute of Technology, 2010
Submitted to the Harvard-MIT Division of Health Sciences and Technology
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy in Medical Engineering and Medical Physics
at the
Massachusetts Institute of Technology
June 2015
© 2015 Massachusetts Institute of Technology. All rights reserved.
Signature of Author: _____________________________________________________________
Harvard-MIT Division of Health Sciences and Technology
May 8, 2015
Certified by: ___________________________________________________________________
George M. Church, PhD
Professor of Genetics
Harvard Medical School
Thesis Supervisor
Accepted by: ___________________________________________________________________
Emery N. Brown, MD, PhD
Professor of Computational Neuroscience and Health Sciences and Technology
Director, Harvard-MIT Program in Health Sciences and Technology
Towards in vivo editing of the human microbiome
by
Stephanie J. Yaung
Submitted to the Harvard-MIT Division of Health Sciences and Technology
on May 8, 2015
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Medical Engineering and Medical Physics
Abstract
The human microbiota consists of 100 trillion microbial cells that naturally inhabit the
body and harbors a rich reservoir of genetic elements collectively called the microbiome. Efforts
based on metagenomic sequencing of microbiomes associated with healthy and diseased
individuals have revealed vast effects of microbiota on human health. However, compared to the
expanding amount of sequence data, little is known about the function of these microbes and
their genes. Furthermore, current clinical approaches to modify the microbiota face several
challenges, including colonization resistance in competitive environments such as the gut, and
imprecise ecological perturbations using antibiotics and fecal transplants.
The fundamental objective of this research is to develop safe methods to genetically edit
the microbiome in vivo to promote human health. The abilities to introduce commensally fit
strains and to control specificity of microbial modulations are critical steps towards ecological
engineering of healthy microbiota. This thesis describes strategies to investigate, propagate, and
ultimately engineer desired functions in microbiota. In particular, we developed a temporal
functional metagenomics method to identify genes that improved microbial fitness in the
mammalian gut in vivo. We also built foundational tools for delivering genetic elements and
immunizing endogenous microbiota against acquiring antibiotic resistance and toxins. In
addition to leveraging bacterial conjugation and the prokaryotic defense system CRISPR-Cas9,
we employed bacteriophages for depleting native strains to empty the niche for an engineered
version. Our work enables applications in engineering probiotic strains with augmented fitness
and anti-pathogenesis properties, tempering host autoimmunity, and combating hospital-acquired
infections and enteric diseases.
Thesis Supervisor: George M. Church, PhD
Title: Professor of Genetics, Harvard Medical School
Dedication
To my parents, Fangling Chang and Alan Tsu-I Yaung
To my grandparents, Chin-Wan Chang and Chin-Pin Chiu
To my uncle Fred Fang-Jen Chang, aunt Connie Tze-Mei Chen,
and cousins Angela Chang and Bora Chang
for their endless love, encouragement, and support.
Acknowledgements
The first thing I learned in graduate school was that science is done by people. Science
may be the pursuit of knowledge and objective truth, but the process of research and invention is
really a human endeavor. Therefore, I would like to thank the people who made this work
possible, especially those who gave me honest counsel, valuable guidance, and good company
during the ups and downs.
First, I am indebted to my research advisor George Church for his inspiration, kindness,
and insightful advice throughout my time in graduate school. I am grateful for the opportunity to
be a part of the uniquely inventive environment that he has fostered by bringing together diverse
people and resources. I would like to thank Eric Alm and Matt Waldor for serving on my thesis
committee and providing generous support and constructive feedback.
My graduate work would not exist without remarkable collaborators, including Harris
Wang, Kevin Esvelt, Georg Gerber, Lynn Bry, and Matt Waldor. I sincerely thank them for the
exceptional training and stimulating discussions. I am incredibly honored to have had the
opportunity to work closely with Harris and Kevin, who are amazing scientists and thoughtful
mentors. When I first joined the Church lab, Harris let me tag along on his new microbiome
adventures. Kevin also saw potential and recruited me to be a coconspirator in some of his
immense undertakings. They truly helped me launch and advance my graduate research career. I
am also deeply grateful to Georg and Matt, who made time to meet with me and served as
unofficial advisors in several aspects of my thesis research. I was also fortunate to work with
skilled colleagues in these collaborations, including Jonathan Braff, Rose Deng, and Ning Li,
who contributed significantly to portions of this work. I would also like to thank Pooja Jethani, an
MIT UROP student, and Takahiro Yokoi, a visiting graduate student, for their hard work and
contributions. In addition to crucial human capital, I must acknowledge the financial backing for
my work, sponsored by the DOE, NIH, DARPA, and NSF through grants awarded to George,
Harris, Georg, and Kevin, as well as the Wyss Institute. I am also thankful for support from the
NSF Graduate Research Fellowship Program and the MIT Neurometrix Presidential Fellowship.
At a place like Harvard or MIT, having brainpower and willpower can become so
commonplace that perhaps what stands out more is sincere compassion. In addition to my
wonderful scientific collaborators, many others have contributed positively to my time in
graduate school and have shown me great kindness. To past and present Church lab members, I
will treasure the banter and profound and candid conversations we shared. In particular, I would
like to thank Alex Chavez and Jon Scheiman for being like big brothers to me and keeping me
grounded; Noah Davidsohn and Noah Taylor for being easygoing neighbors and a calm force
when times were hectic; Dan Goodman and Vatsan Raman for always being supportive and
caring; and Susan Byrne, Su Vora, Andie Smidler, Michael Napolitano, Alex Garruss, Nikolai
Eroshenko, Prashant Mali, Jay Lee, Sri Kosuri, Eric Kelsic, Di Zhang, Mike Chou, John Aach,
Sara Vassallo, Mike Mee, Henry Lee, Marc Lajoie, James DiCarlo, Xavier Rios, Alex Ng, Javier
Fernández Juárez, Reza Kalhor, Marc Güell, Mike Sismour, Justin Feng, Anik Debnath, George
Chao, Ben Stranges, Eswar Iyer, Raj Chari, Fred Vigneault, Sven Dietz, Bobby Dhadwar, Yu
Wang, Noah Donoghue, Adam Marblestone, Evan Daugharthy, Uri Laserson, Adrian Briggs,
Julie Norville, Barry Wanner, Dima Ter-Ovanesyan, Matthieu Landon, Jun Teramoto, Wei
Leong Chew, Jamie Rogers, Nathan Johns, Chris Guzman, Joe Negri, Mirko Palla, Gleb
Kuznetsov, Mingjie Dai, Margo Monroe, Joyce Yang, Madeline Ball, Arthur Sun, Jun Li, Luhan
Yang, Po-yi Huang, Alex Hernandez-Siegel, Seth Shipman, Venky Soundararajan, Ido Bachelet,
Chao Li, Rigel Chan, Tara Gianoulis, Josh Mosberg, Dan Mandell, Danny Levner, Charles
Fracchia, Roger Conturie, Joe Davis, Yveta Masar, Meghan Radden, Laura Glass, Stan Fields,
Frank Poelwijk, and several others for lending a hand, offering input, and adding colorful
memories to my Church lab experience.
Being a part of the Harvard-MIT HST program, HMS Genetics, and the Wyss Institute, I
would like to express my gratitude to those who kept everything running as smoothly as possible.
At HST, I thank my academic advisor Richard Cohen for making sure I was on track throughout
my graduate program, and Julie Greenberg, Laurie Ward, Traci Anderson, and Joe Stein for all
their work behind the scenes. I would like to acknowledge Vonda Shannon, Ella Sexton, Scott
Blackwell, Heidi Turcotte, and Terri Broderick at HMS Genetics, Kelly Seary at the HIM animal
facility, and several individuals at the Wyss, including Susan Kelly, Jeanne Nisbet, Martin
Montoya-Zavala, Joel Rivera-Cardona, Angel Velarde, Ngawang Sherpa, Amanda Graveline,
Andyna Vernet, Matt Balestrieri, Rich Terry, Brian Turczyk, Marcelle Tuttle, and Ben Pruitt for
their assistance. I also thank the Harvard MSI for providing a warm community for students in
the microbial sciences.
Furthermore, I am grateful to all the faculty, teaching staff, and fellow classmates in my
HST courses, from Pathology to ICM at Mount Auburn Hospital, for the memorable learning
experience and incomparable exposure to clinical medicine. I am especially thankful to fellow
students in and outside of HST who were part of our weekly lunches, particularly George Xu,
Luvena Ong, Hanlin Tang, James Kath, Thomas Graham, and Luis Barrera, with guest
appearances by Sandeep Koshy, Vikram Juneja, and Helen Hou; I came to think of it as a
support group that helped preserve our well-being. I owe much gratitude to Luvena for helping
me survive graduate school; I am glad we joined neighboring labs at the Wyss and could keep
each other afloat during the high tides, though I suspect I benefited much more, at least in
substance from her culinary explorations. I wish to extend special thanks to George Xu, with
whom I have shared a delightful and unique partnership in building Quantamerix. I also thank
George and Jesse Engreitz for suggesting that I check out the Church lab during our first weeks
in HST given my broad interest in technology development. To some more senior HST students,
Pavitra Krishnaswamy, Dan Macaya, Alice Chen, Tim O’Shea, James Dahlman, Meghan Shan,
Kay Furman, Ronn Friedlander, Nate Reticker-Flynn, Alex German, and Ryan Cooper, I would
like to give thanks for reassuring me that things would turn out alright. At the Wyss, many other
members of the Yin, Shih, Silver, Collins, Ingber, and Joshi labs improved my graduate work
and experience, including Cameron Myhrvold, Marika Ziesack, Jaeseung Hahn, Bhavik
Nathwani, Thomas Schlichthärle, Maartje Bastings, Andries van der Meer, Aishwarya Sukumar,
Joanna Robaszewski, Ralf Jungmann, Buz Barstow, Nadia Cohen, Sauveur Jeanty, and Peter
Nguyen. I am also grateful to members of the MIT Sidney-Pacific community for their
camaraderie, including Tarun Jain, Sumit Dutta, Annie Chen, Naichun Chen, and Amy Bilton.
Many thanks to friends who kept me sane and connected to the real world at various
points during my graduate studies: Yodit Tewelde, Erika Sandford, Orrin Barnhart, Rebecca
Rich, Rebecca Gould, Mindy Eng, Michelle Princi, Heymian Wong, Nina Guo, Ruel Jerry,
Jackie Holmes, Julie Paul, Alice Chi, Margaret Ding, Jackie Goldstein, Ana Chen, Hattie Chung,
Lily Keung, Geena Márquez, Omar Abudayyeh, Eric Timmons, Jason Trigg, Seoung Yeon Kim,
Ashley Chang, Julie Wu, Ankur Mandhania, Rakesh Popli, Sherry Wu, and many others.
I would also like to express my heartfelt gratitude to those who have shaped my path,
specifically advisors and mentors from my undergraduate years at MIT, including Linda Griffith,
Ram Sasisekharan, Mary Camerlengo, Steve Wasserman, Agi Stachowiak, Chuck Eesley, Venky
Soundararajan, Anne Hunter, David Schauer, Tina Amarnani, and Wyman Li, and at my
internships outside of MIT, including Camilla Forsberg, Conan Li, Odilo Mueller, Michael
McNulty, David Deng, Dorothy Yang, Mark Van Cleve, Hirdesh Uppal, Kyle Kolaja, George
Zhou, Tseng-En Hu, Zhihao Lin, Yatin Gokarn, Elaine Tseng, Peter Matthews, and Frank
Reynolds. They not only gave me advice and assistance at critical times, but also offered me
opportunities to grow in many areas, such as leadership, communication, teamwork, and
scientific maturity. To my teachers before MIT, I owe special thanks to Steve Bowen, Joyce
Yamamoto, Bernadette Troyan, Diane Shires, and Bruce Compton for believing in me and
encouraging me to be industrious, inquisitive, and innovative.
Most of all, I am grateful to my family, to whom this thesis is dedicated. Their nurturing,
unconditional support has been indispensable to my growth. They have taught me courage,
resilience, integrity, and self-reflection; demonstrated giving without expectation; and shown me
what it means to be conscientious, empathetic, and humble – I hope I have not disappointed.
Moreover, I could not have asked for better parents; Mom and Dad have been my most ardent
supporters, never wavering in their faith in my endurance and abilities to chase my dreams.
Finally, I apologize for the laundry list nature and incompleteness of these
acknowledgements – countless pages would be required to properly thank everyone and provide
specific vignettes. Instead, I will end by saying, to everyone who has contributed, thank you for
making me a better person and a better scientist.
Contents
Abstract
Dedication
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Scope
1.2 Recent progress in engineering human-associated microbiomes
1.2.1 Abstract
1.2.2 Introduction
1.2.3 Microbiota, host, and disease
1.2.4 Enabling tools for engineering the microbiota
1.2.5 Perspectives
1.2.6 Acknowledgements
Chapter 2 Improving microbial fitness in the mammalian gut using in vivo temporal functional metagenomics sequencing
2.1 Abstract
2.2 Introduction
2.3 Materials and Methods
2.3.1 Bacterial strains and growth conditions
2.3.2 Library generation
2.3.3 Plasmid retention
2.3.4 In vitro selection
2.3.5 In vivo selection
2.3.6 Colony PCR and Sanger sequencing
2.3.7 DNA extraction and PCR amplification of inserts for Illumina sequencing
2.3.8 High-throughput sequencing and analysis of in vitro library selection data
2.3.9 High-throughput sequencing and processing of in vivo selection data
2.3.10 Statistical analyses of in vivo selection data
2.3.11 Whole genome sequencing of isolated clones from in vivo selection
2.3.12 Growth assays
2.4 Results
2.4.1 Library construction and characterization
2.4.2 In vitro stability and selection by media condition
2.4.3 In vivo library selection in germ-free mice
2.4.4 Characterization of in vivo library population dynamics
2.4.5 Genes showing transient selection during early gut colonization
2.4.6 Genes showing long-term selection during gut colonization
2.4.7 In vivo genomic stability of E. coli recipient strain
2.5 Discussion
2.6 Data Availability
2.7 Acknowledgements
Chapter 3 Delivering and maintaining genetic elements
3.1 Background
3.1.1 Limitations of current microbiota manipulations
3.1.2 Horizontal gene transfer
3.2 Engineering horizontal gene transfer networks
3.2.1 Introduction
3.2.2 Materials and Methods
3.2.3 Results
3.2.4 Discussion
3.2.5 Acknowledgements
3.3 Immunizing strains against acquisition of antibiotic resistance and toxins
3.3.1 Introduction
3.3.2 Materials and Methods
3.3.3 Results
3.3.4 Discussion
3.3.5 Acknowledgements
Chapter 4 Replacing gut microbial strains with precision using phages and CRISPR
4.1 Background
4.2 Phage-assisted niche depletion in the murine gut
4.2.1 Introduction
4.2.2 Materials and methods
4.2.3 Results
4.2.4 Discussion
4.2.5 Acknowledgements
4.2.6 Supplementary figures
4.3 CRISPR/Cas9-mediated phage resistance is not impeded by T4 DNA modifications
4.3.1 Abstract
4.3.2 Introduction
4.3.3 Materials and methods
4.3.4 Results
4.3.5 Discussion
4.3.6 Acknowledgements
4.4 Complete genome sequences of 11 T4-like bacteriophages
4.4.1 Abstract
4.4.2 Genome announcement
4.4.3 Acknowledgements
4.5 Generating effective CRISPR spacers against bacteriophages
4.5.1 Introduction
4.5.2 Materials and methods
4.5.3 Results
4.5.4 Discussion
4.5.5 Acknowledgements
Chapter 5 Conclusions and outlook on microbiome engineering
Bibliography
List of Figures
Figure 1-1 Framework for engineering human-associated microbiota.
Figure 1-2 Composition of the human gut microbiome during development with respect to microbial diversity and population stability.
Figure 1-3 Changes in the composition of human microbiota during disease states compared to healthy states.
Figure 1-4 Approaches to human microbiome engineering.
Figure 1-5 Genetic tractability of abundant or relevant human-associated microbial genera.
Figure 2-1 Experimental design.
Figure 2-2 Double digestion and PCR protocol for sequencing.
Figure 2-3 Technical reproducibility of library sequencing protocol.
Figure 2-4 Input library characterization.
Figure 2-5 Insert distribution over time in in vitro selection.
Figure 2-6 Insert distribution over time in in vivo selection.
Figure 2-7 In vivo selection experiments.
Figure 2-8 Distribution of mapped bases to each Bt gene by mouse.
Figure 2-9 COG functional categories of bases mapped to the entire Bt genome averaged across the five mice.
Figure 2-10 BT_1759 glycoside hydrolase selection kinetics.
Figure 2-11 BT_1759 glycoside hydrolase read mapping profile.
Figure 2-12 BT_1759 glycoside hydrolase functional characterization in sucrose media.
Figure 2-13 BT_0370 galactokinase and BT_371 glucose/galactose transporter.
Figure 2-14 Growth characterization of clones with genomic SNVs.
Figure 3-1 Maps of plasmids used in this study.
Figure 3-2 Triplicate design to minimize effects of evaporation in edge wells.
Figure 3-3 Conjugation mating experimental workflow.
Figure 3-4 Example validation for qPCR primer pair.
Figure 3-5 First set of growth curves by bacterial strain.
Figure 3-6 First set of growth curves by media condition.
Figure 3-7 Second set of growth curves by bacterial strain.
Figure 3-8 Second set of growth curves by media condition.
Figure 3-9 Third set of growth data.
Figure 3-10 Antibiotic resistance profiles of representative microbiota species.
Figure 3-11 Design of Cas9 cassette with genome-copying feature.
Figure 3-12 Example CRISPR spacer validation assay in V. cholerae.
Figure 3-13 Escapees recombine at repeat regions to excise the spacer.
Figure 3-14 Alternative CRISPR repeats with base substitutions.
Figure 3-15 Alternative CRISPR repeats with truncations.
Figure 3-16 Alternative CRISPR repeats with length 18 nt and one to two mismatches.
Figure 3-17 Stable incorporation of engineered Cas9 mobile elements in E. coli.
Figure 4-1 Strain rotation scheme using phage and corresponding susceptible and Cas9-mediated resistant host strains.
Figure 4-2 Mouse experiment 1 design to test effect of phage and/or sugar.
Figure 4-3 Mouse experiment 2 design to test effect of repeated phage dosing.
Figure 4-4 Biomass of YFP- and YFP+ cells in mouse experiment 1.
Figure 4-5 Fraction of replaced cells in mouse experiment 1.
Figure 4-6 Biomass of YFP- and YFP+ cells in mouse experiment 2.
Figure 4-7 Fraction of replaced cells in mouse experiment 2.
Figure 4-8 Raw data points from mouse experiment 1.
Figure 4-9 Raw data points from mouse experiment 2.
Figure 4-10 Individual mouse data from experiment 2.
Figure 4-11 Raw data points for mice #24 and #27.
Figure 4-12 Native E. coli spacers target phage with modified DNA.
Figure 4-13 Cas9 cuts methylated cytosines and adenosines in E. coli.
Figure 4-14 Cas9 reduces E. coli susceptibility to phages T7 and RB49.
Figure 4-15 Cas9 reduces E. coli susceptibility to phages T4 and T4 gt.
Figure 4-16 Restriction digest of phages.
Figure 4-17 Efficiency of plating of T4 gt on wild-type E. coli K-12.
Figure 4-18 Host range of T4-like phages.
Figure 4-19 Spacer Y confers protection in phages T4, RB49, and RB69.
Figure 4-20 Library construction and sequencing design.
Figure 4-21 Mock library composition of T4 spacers.
Figure 4-22 Mock library selection enriched for effective spacer.
Figure 4-23 Fold change of spacers after phage selection.
Figure 4-24 Host strain differences across selection experiments.
Figure 4-25 Initial validation of top spacers using phage-embedded agar.
Figure 4-26 Semi-quantitative results of initial validation screen of top anti-phage spacers.
Figure 4-27 Quantitative validation of screened spacers using plaque assays.
Figure 4-28 Nucleotide frequencies at each position in the spacer sequence across libraries.
Figure 4-29 Enriched regions on the phage T6 genome.
Figure 4-30 Enriched regions on the phage RB15 genome.
Figure 4-31 Enriched regions on the phage RB33 genome.
Figure 4-32 Enriched regions on the phage RB69 genome.
Figure 5-1 Engineering microbiomes from diseased to healthy states.
List of Tables
Table 2-1 Primers used in the study.
Table 2-2 Summary of sequencing metrics for in vitro experiments.
Table 2-3 Summary of sequencing metrics for in vivo experiments.
Table 2-4 Summary of metrics for whole genome sequencing of E. coli strains.
Table 2-5 Bt genes significantly enriched or depleted at Day 6 or 7 in vitro.
Table 2-6 Statistical testing of in vivo selection of Bt genes.
Table 2-7 Genetic variants in mouse-isolated clones identified by whole genome sequencing.
Table 3-1 Composition of first set of growth media.
Table 3-2 Composition of second set of growth media.
Table 3-3 Composition of third set of growth media.
Table 3-4 Composition of the “3:2pas” medium.
Table 3-5 List of species-specific primers.
Table 3-6 List of species-specific qPCR primers.
Table 3-7 Conjugation frequencies of pFD340 into Bacteroides.
Table 3-8 Secondary transfers from Bacteroides into E. coli.
Table 3-9 Conjugation frequencies of pBC003.
Table 3-10 Conjugation frequencies from literature.
Table 3-11 Validated spacers for E. coli and V. cholerae applications.
Table 3-12 Nested chi site regions in E. coli MG1655.
Table 3-13 Nested chi site regions in E. coli Nissle 1917.
Table 4-1 Primers to identify E. coli Nissle 1917.
Table 4-2 Phage escapee analysis.
Table 4-3 Genome features of the sequenced strains.
Table 4-4 Pairwise similarity of phages T6, RB15, RB33, and RB69.
Table 4-5 Primers for amplifying sub-pools of oligonucleotides based on barcodes.
Table 4-6 Primers for amplifying libraries for high-throughput sequencing.
Table 4-7 Custom sequencing primers.
Table 4-8 Features of top spacers used for validation assays.
Table 4-9 Sequence analysis of cross-reactive spacers.
Table 4-10 Comparison of quantified spacer activity with library selection data.
Chapter 1
Introduction
1.1
Scope
Microorganisms occupy a fascinating space in our world, not only in the environment,
but also in our own bodies. With the increasingly large body of evidence that the microbes living
in or on us, or microbiota, play a role in human health, one begins to wonder what these
microbes do, how they may vary over time in a single person and across different individuals,
and if they can be systematically and safely tuned to improve clinical outcomes. To better
understand and potentially engineer microbes that have become associated with us, we have
developed novel approaches to probe the function as well as precisely modulate the genetic
content of these microbial residents. This dissertation aims to address two main questions: what
do we edit, and how do we edit?
To answer the first, we must conduct functional gene discovery. Since the mammalian
gut is home to the most densely populated microbial community characterized to date, we were
interested in how microbes survive and persist in this highly competitive environment, where
there are limited resources (e.g., nutrients) and intense pressures (e.g., host immune system). We
developed a method called temporal functional metagenomics sequencing (TFUMseq), described
in Chapter 2. As a starting point, we placed raw DNA of interest, from a well-defined “donor”
genome, in another species that can express the heterologous pieces of DNA. We then studied
how the new DNA fragments can improve the performance of the “recipient” strain, specifically
in the context of the mammalian gut. By introducing the recipient bacterial strain expressing the
donor DNA fragments into mice, we identified genes contributing to improved fitness in vivo.
This type of functional metagenomics approach opens the doors to more complex investigations
into how different sources of donor DNA material interact with different mammalian
environments and selections through various recipient strains; it is an avenue to gain insight into
the in vivo dynamics of host-microbiota interaction.
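As a rough illustration of the kind of comparison such time-course data allow, the sketch below computes a per-gene log2 fold change in relative read abundance between an input library and a later time point. It is a minimal sketch and not the analysis pipeline of Chapter 2: the function names, the pseudocount, and the enrichment threshold are all assumptions chosen for illustration.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-gene log2 fold change in relative abundance between two samples.

    `before` and `after` map gene IDs to mapped read counts (hypothetical
    inputs). Counts are normalized to each sample's total so that
    differences in sequencing depth cancel out. The pseudocount is an
    illustrative choice that avoids division by zero for unseen genes.
    """
    genes = set(before) | set(after)
    total_before = sum(before.get(g, 0) + pseudocount for g in genes)
    total_after = sum(after.get(g, 0) + pseudocount for g in genes)
    result = {}
    for g in genes:
        frac_before = (before.get(g, 0) + pseudocount) / total_before
        frac_after = (after.get(g, 0) + pseudocount) / total_after
        result[g] = math.log2(frac_after / frac_before)
    return result


def enriched_genes(before, after, min_log2fc=1.0):
    """Genes whose relative abundance rose by at least `min_log2fc` (assumed cutoff)."""
    lfc = log2_fold_changes(before, after)
    return sorted(g for g, v in lfc.items() if v >= min_log2fc)
```

A real analysis would also need replicate animals and a statistical test before calling a gene selected; this sketch only shows the normalization and ratio step.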
To answer the second question of how to edit the microbiota, we must consider methods
of gene delivery, and even cell-level perturbations to modulate microbial members and their
DNA. We built several molecular tools that contribute to efforts in editing the microbiome, the
genes that make up the microbiota. In one approach, as described in Chapter 3, bacterial strains
serve as the delivery vehicle for introducing genetic material into native microbiota that could
express a therapeutic protein or immunize the microbiota against acquisition of pathogenic
elements from the environment. Given the inadequate amount of available data on how well
DNA can transfer across different microbial species, we laid the groundwork for studying
complex, defined microbial communities and rates of gene transfer. We present methods for
molecular-based identification and growth-based selection of various microbiota species. Then,
we harnessed the bacterial adaptive immune system, CRISPR-Cas9, to explore opportunities in
lowering the likelihood of microbiota acquiring toxins or antibiotic resistance genes. We
validated several antibiotic resistance and toxin sequences to target, tested designs for building
large stable arrays of these sequences, and demonstrated the feasibility of an immunization
payload that can copy itself into the bacterial chromosome. These findings provide a critical
foundation for stably delivering engineered elements into the endogenous microbiota in order to
promote health by expressing therapeutic proteins or preventing pathogenesis.
In Chapter 4, we present another approach leveraging viruses that infect bacteria, or
bacteriophages. These are essentially highly specific antibiotics that we apply to selectively
vacate a niche in the native microbial community to allow for the colonization of an engineered
bacterial strain. We piloted a mouse experiment where we tested key assumptions about phage
selection pressures on targeted bacteria in vivo and phage resistance mediated by CRISPR-Cas9.
To overcome challenges in identifying effective CRISPR spacers against phage, we investigated
phage-encoded DNA modifications and genome sequences. We found that bulky DNA
modifications do not impede Cas9 activity in the context of lytic phage infection. We also
sequenced nearly a dozen T4-like phages in order to construct a large library of candidate
CRISPR spacers; we designed a selection method using phages to enrich for effective anti-phage
spacers. Our work enables novel targeted microbiome therapies by integrating the molecular
precision of CRISPR-Cas9 with the strain-specificity of bacteriophages.
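To make the idea of a candidate spacer library concrete, the sketch below enumerates every 20-nt protospacer that is immediately followed by an NGG motif on either strand of a sequence. It assumes the commonly used Streptococcus pyogenes Cas9, which recognizes an NGG PAM; the text above does not specify the Cas9 variant or the design rules actually used, so the function names, the spacer length, and the PAM are illustrative assumptions only.

```python
# Complement table for unambiguous DNA bases only (assumption: no N or IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spacers(genome, spacer_len=20):
    """List (strand, start, spacer) for each protospacer followed by an NGG PAM.

    `start` is a 0-based offset on the scanned strand; for the "-" strand it
    is measured along the reverse-complemented sequence, not the input.
    """
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        # Stop early enough that a full spacer plus a 3-nt PAM still fits.
        for i in range(len(seq) - spacer_len - 2):
            pam = seq[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":  # any base, then GG
                hits.append((strand, i, seq[i : i + spacer_len]))
    return hits
```

Enumeration of this kind only yields candidates. Which spacers actually protect against a phage is an empirical question, which is why a selection step is needed.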
The remaining portions of this introductory chapter are a review of microbiome-related
research and engineering efforts, and have been adapted from:
Stephanie J. Yaung, George M. Church, Harris H. Wang. Recent Progress in Engineering
Human-associated Microbiomes. Methods in Molecular Biology 1151:3-25 (2014). Ref. (1)
1.2
Recent progress in engineering human-associated microbiomes
1.2.1 Abstract
Recent progress in molecular biology and genetics opens up the possibility of engineering
a variety of biological systems, from single-cellular to multi-cellular organisms. The consortia of
microbes that reside on the human body, the human-associated microbiota, are particularly
interesting as targets for forward engineering and manipulation due to their relevance in health
and disease. New technologies in analysis and perturbation of the human microbiota will lead to
better diagnostic and therapeutic strategies against diseases of microbial origin or pathogenesis.
Here, we discuss recent advances that are bringing us closer to realizing the true potential of an
engineered human-associated microbial community.
1.2.2 Introduction
Of the 100 trillion cells in the human body, 90% are microbes that naturally inhabit
various body sites, including the gastrointestinal tract, nasal and oral cavities, urogenital area,
and skin (2). An individual’s colon is home to 10¹¹-10¹² microbial cells/mL, the greatest density
compared to any microbial habitat characterized to date (3). Many studies, such as the Human
Microbiome Project and MetaHIT, have probed the vast effects of microbiota on human health
and disease (2, 4–6). In addition to metagenomic sequencing (7), traditional methods of studying
cells in isolation are important for elucidating molecular bases of microbial activity. However,
cells do not exist in single-species cultures in nature. In fact, some species are only culturable in
the presence of other microorganisms (8). This interdependence for survival amongst microbial
species in a community attests to the importance of intercellular interactions, both microbe-microbe
and host-microbe. Despite the fact that the human microbiota is composed of many
individual microbes, these individuals work in concert to perform tasks that rival in complexity
those of more sophisticated multicellular systems. Thus, the human-associated microbiome
presents a ripe opportunity for forward engineering to potentially improve human health (Figure
1-1). Here, we review recent advances in this area and outline potential avenues for future
endeavors.
Figure 1-1 Framework for engineering human-associated microbiota.
Engineering human-associated microbiota requires detailed understanding of processes that
govern the natural propagation and retention of microbes in the host as well as
environmental and adaptive pressures that drive the evolution of cells and communities.
1.2.3 Microbiota, host, and disease
Contrary to traditional views, microbes are social organisms that engage with the
environment and other organisms in specific ways. Microbes participate in intercellular
communication through contact-dependent signaling (9), quorum-sensing (10), metabolic
cooperation or competition (6), spatiotemporal organization (11), and horizontal gene transfer
(HGT) (12). Human-associated microbes produce byproducts that serve as substrates utilized by
other resident bacteria (13–15). For instance, accumulated hydrogen gas from bacterial sugar
fermentation is removed by acetogenic, methanogenic, and sulfate-reducing gut bacteria (16). In
contrast to cross-feeding relationships, microbes under stress can release bacteriocins to suppress
the growth of competitors (17–19). If microbes are members of a biofilm community, they
benefit from physical protection from the environment, access to nutrients trapped and
distributed through channels in the biofilm, development of syntrophic relationships with other
members, and the ability to share and acquire genetic traits (20, 21). Microbial populations also
genetically diversify to insure against possible unstable environmental conditions (22, 23).
Moreover, multispecies communities harbor a dynamic gene pool consisting of mobile genetic
elements, such as transposons, plasmids, and bacteriophages, which serve as a source of HGT to
share beneficial functions with neighbors to preserve community stability (24–27). Densely
populated communities such as the human gut are active sites for gene transfer and reservoirs for
antibiotic resistance genes (12, 28–30).
Beyond microbe-microbe interactions, the microbiota co-evolves with the host as it
develops, driving microbial adaptation (31–34). Core functions of microbiota benefit the host,
such as extraction of otherwise inaccessible nutrients, immune system development, and
protection against pathogen colonization (3, 35–38). Gut microbes are critical in intestinal
angiogenesis, epithelial cell maturation, and immunological homeostasis (38–41). For example,
the commensal Bacteroides fragilis produces polysaccharide A, which converts host CD4+ T
cells into Foxp3+ Treg cells, producing IL-10 and inducing mucosal tolerance (42). Host diet,
inflammatory responses, and aging also affect microbial community composition and function
(43–46) (Figure 1-2). Indeed, aberrations in host genetics, immunology, and diet can lead to
microbiota-associated human diseases. Diet-induced obesity in mice from a high-fat diet is
characterized by enhanced energy harvest and an increased Firmicutes to Bacteroidetes ratio (47,
48). Furthermore, disruptions in the homeostasis between gut microbial antigens and host
immunity can invoke allergy and autoimmunity, as in type 1 diabetes and multiple sclerosis (49–
51). It is thought that inflammatory bowel disease (IBD) results from inappropriate immune
responses to intestinal bacteria; genes identified in genome-wide association studies highlight the
role of a host imbalance between pro-inflammatory and regulatory states (49, 52).
Figure 1-2 Composition of the human gut microbiome during development
with respect to microbial diversity and population stability.
Data compiled from recent studies from the literature: a, Hong 2010 (53); b,
Saulnier 2011 (54); c, Claesson 2011 (55); d, Yatsunenko 2012 (56); e, Spor
2011 (57).
While the host selects for microbial communities that harvest nutrients and prime the
immune system, irregular microbiota composition may cause disease (Figure 1-3), including IBD
(58–60), lactose intolerance (61, 62), obesity (63, 64), type 1 diabetes (65), arthritis (66),
myocardial infarction severity (67), and opportunistic infections by pathogens such as
Clostridium difficile and HIV (68–71). Microbial gut metabolism links host diet not only to body
composition and obesity (72), but also chronic inflammatory states, such as IBD, type 2 diabetes,
and cardiovascular disease (73–75). Intestinal microbes are also important in off-target drug
metabolism, rendering digoxin, acetaminophen, and Irinotecan less effective or even toxic (76–
78). In the case of Irinotecan, a chemotherapeutic used mainly for colon cancer, the drug is
metabolized by β-glucuronidases of commensal gut bacteria into a toxic form that damages the
intestinal lining and causes severe diarrhea. In the oral cavity, ecological shifts in dental plaque
microbiota lead to caries (cavities), gingivitis, and periodontitis (79). Dental caries arise from
acidic environments generated by acidogenic (acid-forming) and aciduric (acid-tolerant) bacteria,
which metabolize sugar from the host diet. Translocation of oral bacteria into other tissues
results in infections, and cytokines from inflamed gums released into the bloodstream stimulate
systemic inflammation. Oral bacteria have been implicated in respiratory (80, 81) and
cardiovascular diseases (82–84), though mechanisms remain unclear.
1.2.4 Enabling tools for engineering the microbiota
The human-associated microbial community presents a vast reservoir of non-mammalian
genetic information that encodes a variety of functions essential to the mammalian host (85).
Second-generation sequencing technologies have enabled us for the first time to systematically
probe the genetic composition of these trillions of microbes that reside on the human body (2).
The ongoing efforts of the Human Microbiome Project and MetaHIT to catalog dominant
microbial strains from different body sites have generated useful reference genomes for many of
the representative species (86). Metagenomic shotgun sequencing approaches applied to whole
microbial communities, such as those found in the gut, have yielded near-complete gene catalogs
that describe abundance and diversity of genes that contribute to maintenance and metabolism of
the microbiota (7).
In order to determine functional relationships between human-associated microbes and
their concerted effect in the mammalian host, we rely on functional perturbation of the microbial
community. These investigative avenues include genome-scale perturbation assays, specified
community reconstitutions, and directed engineering through synthetic biology (Figure 1-4).
Each approach provides us with a unique angle to attack the otherwise daunting challenge of
de-convolving a highly intertwined set of microbial interactions in a very heterogeneous
environment and a difficult-to-manipulate human host. Advances in both in vitro and in vivo host
models have thus also facilitated research endeavors in this area, which we discuss in the
following sections.
Figure 1-3 Changes in the composition of human microbiota during
disease states compared to healthy states.
Data compiled from recent studies from the literature: a, De Filippo 2010
(87); b, Peterson 2008 (88); c, Larsen 2010 (89); d, Yang 2012 (90); e, Kong
2012 (91); f, Keijser 2008 (92); g, Gao 2007 (93).
Figure 1-4 Approaches to human microbiome engineering.
General approaches to engineer the human microbiome through
design, quantitative modeling, genome-scale perturbation and
analysis in in vitro and in vivo models, with the ultimate goal of
producing demand-meeting applications to improve sensing,
prevention, and treatment of diseases.
1.2.4.1 Challenges of building new genetic systems
Approaches to study the function of human-associated microbes by genetic manipulation
rely on several fundamental capabilities, whose absence is often the largest practical barrier to
manipulating microbes genetically. First, individual microbes need to be isolated and cultured in
the laboratory. Because microbes have a myriad of physiologies and require different nutritional
supplements for growth, different media compositions and growth conditions need to be
laboriously tested by trial-and-error to isolate and culture each microbe. These microbial
culturing techniques date back to the times of Louis Pasteur and are still the dominant approach
today. More recent microbial cultivation techniques use microfluidics and droplet technologies to
enable the discovery of synergistic interactions between natural microbes that allow otherwise
“unculturable” organisms to be grown in laboratory conditions (8, 94, 95).
Upon successful microbial cultivation, the next limiting step of microbial genetic
manipulation is the transformation of foreign DNA into cells. The passage of foreign DNA (e.g.
plasmids, recombinant fragments) into the cell requires overcoming the physical barriers
presented by the cell wall or membrane. This task is accomplished in nature through processes
such as transduction by phage, conjugation and mating, or natural competency and DNA-uptake
(96, 97). Numerous laboratory techniques have been developed for microbial transformation
including electroporation (98), biolistics (99), sonication (100), and chemical or heat disruption
(101). Electroporation, the most common of the laboratory transformation techniques, relies on
a high-voltage electrical pulse applied to the bacterial sample, which is thought to transiently induce pores in
the cell membrane that allow extracellular DNA to diffuse into the cell. Various protocols
for electroporation of human-associated microbes have been described and are good starting
points for developing genetic systems in these microbes (102, 103).
Upon transformation of DNA into the cell, the DNA needs to either stably propagate
intracellularly or integrate into the microbial host genome through recombination or other
integration strategies. Inside the cell, stable propagation of episomal DNA such as plasmids
requires DNA replication machinery that is compatible with the foreign DNA (96). Additionally,
cells often use methylation and other DNA modification and restriction systems to distinguish foreign
from host DNA, a primitive defensive mechanism that fights against viruses or other
invading genetic elements. Nonetheless, these promiscuous genetic elements can often be used as
a way to integrate foreign DNA into the chromosome and are often used for large-scale
functional genomics (104).
Taking all these parameters into consideration, we summarize the genetic
tractability of human-associated microbes with respect to culturability, availability of full
genome sequences, DNA delivery methods, and expression and manipulation systems (Figure 1-5).
Expansion of these basic genetic tools is crucial to future functional studies of the human microbiota.
Figure 1-5 Genetic tractability of abundant or relevant human-associated microbial genera.
Genetic tractability is evaluated here by the availability of means to introduce genetic material (e.g.
transformation, conjugation, or transduction), vectors, expression systems, completed genomic
sequences, and culturing methods. Circles of increasing sizes indicate more genetic tractability.
Protocols and demonstrated methods for genetic manipulation are listed as follows: a. Clostridium:
Phillips-Jones 1995, Jennert 2000, Young 1999, Bouillaut 2011 (105–108); b. Ruminococcus:
Cocconcelli 1992 (109); c. Lactobacillus: van Pijkeren 2012, Ljungh 2009, Damelin 2010, Sorvig
2005, Thompson 1996, Lizier 2010 (110–115); d. Enterococcus: Shepard 1995 (116); e. Lactococcus:
Holo 1995, van Pijkeren 2012 (110, 117); f. Streptococcus: McLaughlin 1995, Biswas 2008 (118, 119);
g. Staphylococcus: Lee 1995 (120); h. Listeria: Alexander 1990 (121); i. Treponema: Kuramitsu
2005 (122); j. Borrelia: Hyde 2011, Rosa 1999 (123, 124); k. Bifidobacterium: Mayo 2010 (125); l.
Actinomyces: Yeung 1994 (126); m. Mycobacterium: Parish 2009, Sassetti 2001 (127, 128); n.
Propionibacterium: Luijk 2002 (129); o. Chlamydia: Binet 2009 (130); p. Porphyromonas:
Belanger 2007 (131); q. Prevotella: Flint 2000, Salyers 1992 (132, 133); r. Bacteroides: Salyers 1999,
Smith 1995, Bacic 2008 (134–136); s. Fusobacterium: Haake 2006 (137); t. Helicobacter: Taylor
1992, Segal 1995 (138, 139); u. Campylobacter: Taylor 1992 (139); v. Rickettsia: Rachek 2000 (140);
w. Brucella: McQuiston 1995 (141); x. Bordetella: Scarlato 1996 (142); y. Neisseria: O'Dwyer 2005,
Bogdon 2002, Genco 1984 (143–145); z. Pseudomonas: Dennis 1995 (146).
1.2.4.2 Genome-scale perturbations
Genome-scale perturbations are a class of genetic approaches that disrupt or perturb the
expression of functional genes that contribute to relevant phenotypes by individual microbes. To
dissect the function of different genes in the cell, we have relied heavily on the use of
transposons, which are selfish genetic elements that can splice into and out of different locations
of chromosomal DNA thereby disrupting the coding sequence (147). This classical approach,
known as transposon mutagenesis, has allowed us to isolate many genetic mutants whose
disrupted genes give rise to interesting phenotypes that reflect the importance of those genes to
the cell's physiology. Next-generation DNA sequencing has now enabled multiplexed genotyping of
pools of transposon mutants by using molecular barcodes that then can be applied to measure the
effect of genome-scale perturbations in different environmental conditions. For example,
techniques such as Insertion Sequencing (INSeq) (148) utilize the inverted repeat recognition of
the Himar transposase, which is also a restriction site for the type II restriction enzyme MmeI, to
generate paired 16-17 bp flanking genomic sequences around the transposon that can be
sequenced in pools. Thus, the defined insertion location of every transposon in the library can be
determined. By sequencing this pooled mutant library pre- and post-treatment with any number
of environmental perturbations, one can probe the effects of different gene disruptions on the
physiology of the cell in a multiplexed fashion. Similar techniques using other transposon
systems such as Tn-seq (149), high-throughput insertion tracking by deep sequencing (HITS)
(150), and transposon-directed insertion-site sequencing (TraDIS) (151) have also been
developed.
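As a minimal illustration of the pre- versus post-treatment comparison described above, the following Python sketch computes a log2 fold change in relative abundance for each insertion site from pooled read counts. The function name, the site labels, and the pseudocount are assumptions made for illustration; they are not taken from the INSeq, Tn-seq, HITS, or TraDIS pipelines, which apply their own normalization and statistics.

```python
import math


def insertion_log2_fold_changes(pre_counts, post_counts, pseudocount=1.0):
    """Return log2(post frequency / pre frequency) for each insertion site.

    pre_counts and post_counts map a site label (hypothetical format,
    e.g. "geneA:120") to the number of reads observed at that site.
    A pseudocount (an assumption of this sketch) avoids division by zero
    for mutants that drop out of the pool after treatment.
    """
    sites = list(pre_counts)
    # Normalize within each pool so that sequencing depth cancels out.
    pre_total = sum(pre_counts[s] + pseudocount for s in sites)
    post_total = sum(post_counts.get(s, 0) + pseudocount for s in sites)

    changes = {}
    for s in sites:
        pre_freq = (pre_counts[s] + pseudocount) / pre_total
        post_freq = (post_counts.get(s, 0) + pseudocount) / post_total
        changes[s] = math.log2(post_freq / pre_freq)
    return changes
```

In such a scheme, strongly negative values would flag insertions that are depleted under the tested condition (candidate fitness genes), while values near zero would indicate disruptions with little effect.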
In addition to transposon-based systems, shotgun expression libraries have been useful in
discovering functional DNA elements in genomic or metagenomic DNA. Shotgun expression
libraries rely on physical shearing or restriction digestion of a donor DNA source into smaller
DNA fragments that are then cloned into a gene expression vector and transformed into a host
strain for functional analysis. A library of metagenomic DNA samples can for example be
extracted from an environment and cloned into plasmids that are then expressed in E. coli.
Selection and sequencing of the E. coli population for heterologous DNA that enables new
function leads to the discovery of new gene elements that perform a particular function. This
approach can easily identify functions such as antibiotic resistance (152), but has yielded less
success with other functions.
Towards forward engineering of human-associated microbes, new genome engineering
tools such as trackable multiplex recombineering (TRMR) (153, 154) and multiplex automated
genome engineering (MAGE) enable efficient, site-specific modification of the genome (155–
158). TRMR combines double-stranded homologous recombination (159) and molecular
barcodes synthesized from DNA microarrays to generate populations of mutants that are
trackable by microarray or sequencing. MAGE relies on introduction of pools of single-stranded
oligonucleotides that target defined locations of the genome to introduce regulatory mutations
(156) or coding modifications (160). These and other recombineering technologies are now
being developed for a variety of other organisms including gram-negative bacteria (161), lactic
acid bacteria (110), Pseudomonas syringae (162), and Mycobacterium tuberculosis (163), and are
likely to be very useful for engineering human-associated microbes.
1.2.4.3 Reconstituted communities
The community of microbes that make up the human microbiome can be considered a
“pseudo-organ” of its own. These microbes interact with one another and the mammalian host in
potentially highly complex ways that may be difficult to decipher even with tractable genetic
systems (164). A direct approach to study these interactions is to build reconstituted communities
of microbes derived from monoculture isolates in defined combinations. This de novo
reconstitution approach to build synthetic communities has significant advantages over attempts
to deconvolute natural communities. A reconstituted synthetic consortium presents a tractable
level of complexity in terms of number of interacting microbial species that we can track by
sequencing and predict with quantitative models. In one such study, researchers inoculated 10
representative strains of the human microbiota into germ-free mice (165). The mice were then
fed with defined diets of macronutrients consisting of proteins, fats, polysaccharides, and sugars.
By tracking the abundance of the 10-member microbial consortium using high-throughput
sequencing, the researchers could predict over 60% of the variation in species abundance as a
result of diet perturbations. This avenue of investigation presents a viable approach to study the
human microbiome and ways to analyze synthetically engineered microbiota.
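One simple way to relate diet composition to species abundance, sketched here under the assumption of a linear relationship, is an ordinary least-squares fit followed by a coefficient of determination (R²), which quantifies the fraction of variation explained. The code below is illustrative only: the function names and any data passed to them are hypothetical, and it is not claimed to reproduce the model used in the cited study (165).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_diet_model(diets, abundances):
    """Least-squares fit of abundance = b0 + sum_k b_k * diet_k.

    diets: one tuple of diet component levels per sample.
    abundances: abundance of a single species in the same samples.
    Returns [b0, b1, ...] by solving the normal equations.
    """
    rows = [[1.0] + list(d) for d in diets]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, abundances)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def r_squared(diets, abundances, coef):
    """Fraction of variance in abundance explained by the fitted model."""
    preds = [coef[0] + sum(c * x for c, x in zip(coef[1:], d)) for d in diets]
    mean = sum(abundances) / len(abundances)
    ss_res = sum((y - p) ** 2 for y, p in zip(abundances, preds))
    ss_tot = sum((y - mean) ** 2 for y in abundances)
    return 1.0 - ss_res / ss_tot
```

If the diet components are expressed as fractions that sum to a constant, one component would need to be dropped (or the intercept omitted) so that the normal equations remain solvable.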
Engineered microbes have been utilized to reconstitute synthetic communities to
investigate the role of metabolic exchange. One such important metabolic exchange is that of
amino acids, as they are the essential constituent of proteins. Various syntrophic cross-feeding
communities have been described using auxotrophic E. coli and yeast strains that require
different amino acid supplementation for growth (166–168). In these syntrophic systems,
metabolites that are exchanged across different biosynthetic pathways promote more syntrophic
growth than those that are exchanged along the same pathway, an effect that is also related to the cost of
biosynthesis of the amino acid metabolites. Amino acid exchange is likely a large player in
driving metabolism of microbial communities as a substantial fraction of all microbes are
missing biosynthetic pathways for various metabolites and thus require growth on the richer and more complex
substrates that are found in the gut (169).
1.2.4.4 Microbial engineering through synthetic biology
New approaches are now utilizing synthetic biology to engineer human-associated
microbiota to improve health and metabolism as well as monitor and fight diseases. These efforts
focus on developing genetic circuits that actuate in an engineered host cell such as E. coli that
can sense and respond to changes to its environment and in the presence of particular pathogens.
For example, to detect the human opportunistic pathogen Pseudomonas aeruginosa, which often
causes chronic infections in cystic fibrosis patients and colonizes the gastrointestinal tract, E. coli was
engineered to detect the small diffusible molecule that is excreted by P. aeruginosa through the
quorum sensing pathway (170). An engineered synthetic circuit was placed in non-pathogenic E.
coli, which when placed in the presence of high-density P. aeruginosa, triggered a self-lysis
program that released a narrow-spectrum bacteriocin that specifically killed the P. aeruginosa
strain. Similar strategies have also been demonstrated to detect and respond to Vibrio cholerae
infection using engineered E. coli that sense autoinducer-1 (AI1) molecules from the V. cholerae
quorum sensing pathway (171). These strategies appear to yield improved survival rates against
microbial pathogenesis in murine models. Quorum sensing systems, which normally help
microbes detect local cell density, have been further enhanced to improve robustness and
performance to enable coupled short-range and long-range feedback circuits that enable
microbial communication across large distances in an engineered community.
Other microbes have been successfully engineered to perform specific functions on
human-associated surfaces such as the mucosal layer of the gut epithelium. Numerous diseases
that occur along the intestinal tract are targets of such engineered approaches. For example, the
probiotic strain Lactococcus lactis has been engineered to secrete recombinant human
interleukin-10 in the gastrointestinal tract to reduce colitis (172, 173). Other future applications
of engineered probiotics include enhancing catabolism of nutrients (e.g. lactose and gluten),
modulation of the immune system, and removal of pathogens by selective toxin release (170).
1.2.4.5 In vitro host models
To probe and engineer the human-associated microbial community, various in vitro
models have been developed, ranging from traditional batch culturing in chemostats to
microfluidic systems that incorporate host cells. Single-vessel chemostats inoculated with fecal
samples from healthy individuals have helped identify horizontal gene transfer (174) and
selective bacterial colonization on different carbohydrate substrates (175, 176). A multichamber
continuous culture system mimicking spatial, nutritional, and pH properties of different GI tract
regions can be used to investigate stabilization dynamics (177–179). Similarly, the constant-depth
film fermenter resembles oral biofilm (180) and has enabled studies on biofilm formation,
antibiotic resistance, and horizontal gene transfer in a multispecies oral community (181, 182).
To incorporate mammalian cells in studying host-microbial interactions, organ-on-a-chip
microfluidic devices have been recently used. In one version of such a system, a gut-on-a-chip
device, the microfluidic channel is coated with extracellular matrix and lined by human intestinal
epithelial (Caco-2) cells. This system mimics intestinal flow and peristaltic motion, recapitulates
columnar epithelium polarization and intestinal villi formation, and supports the growth of
commensal Lactobacillus rhamnosus GG (183). These microdevices offer an opportunity to
investigate host-microbiota interactions in a well-controlled manner and in physiologically
relevant conditions.
Inoculating with native microbiota samples provides a method to overcome the uncultivability of many microbes as well as to study collective activity and discover novel
functions without a priori knowledge of community composition. However, starting with a
predefined microbial community allows a controlled setting better suited for testing engineered
systems. In one study analyzing the dynamics of a community representing the four main gut
phyla in a chemostat, the authors propose that intrinsic microbial interactions, rather than host
selective pressure, play a role in the observed colonization pattern, which was similar to what has
been documented in the human gut. Similar models have been developed for oral microbiota
studies. The use of predefined oral microbial inocula has helped elucidate metabolic cooperation
in batch culture (13) and community development in saliva-conditioned flow cells (184).
1.2.4.6 In vivo host models
In order to move into in vivo animal models that more closely represent the physiology of
the human host environment, researchers have extensively utilized murine models including
germ-free, gnotobiotic, and conventionally-raised mice. Gnotobiotic animals are born in aseptic
conditions and reared in a sterile environment where they are exposed only to known microbial
species; technically, germ-free mice are a type of gnotobiotic mice that have not been exposed to
any microbes. Similar to in vitro systems, mice can be inoculated with either a natural microbiota
sample or a predefined microbial community. Fecal samples, as well as oral swab and saliva
samples, can then be collected from gnotobiotic mice for biochemical analysis and species
quantification of gut and oral cavity microbiota. In vivo models have been used to study the
transmission of antibiotic resistance in the mouse gut (185, 186) and colonization resistance in
the oral cavity. Furthermore, the choice of the inoculum donor offers opportunities to compare
different host selection pressures and microbial community responses. Microbiota can be
transplanted not only from conventionally-raised to germ-free animals of the same species, but
also inter-species, as in human microbiota into mouse, called humanized gnotobiotic mice (187).
In one study, transplants from zebrafish gut microbiota into germ-free mice and mouse gut
microbiota into germ-free zebrafish revealed that the resulting community conformed to the
native host composition, demonstrating host selection (188).
Altering host diet, environment, or genetic background can also enable studies of host-microbiota
interactions. One method to gain insight into the role of microbial communities in
disease is to utilize mice with recapitulated pathologies. For example, IL-10-/-, ob-/-, apoE-/-, and
TLR2-/- or TLR5-/- mice are models for colitis, obesity, hypercholesterolemia, and metabolic
syndrome, respectively (47, 188–191). To generate antigen- or pathogen-specific phenotypes,
mice can be infected with Salmonella typhimurium to study colitis, or Citrobacter rodentium as
a model for attaching and effacing pathogens, such as enterohemorrhagic E. coli (192, 193).
Furthermore, murine models with chemically induced inflammation can be tools to study chronic
mucosal inflammation; dextran sodium sulfate (DSS) can induce ulcerative colitis and
trinitrobenzene sulfonic acid (TNBS) can stimulate Crohn’s disease (194). To investigate oral
microbiota, there are periodontal disease (195) and oral infection models (196, 197); gnotobiotic
rodents can also be fed a high-sucrose cariogenic diet to promote plaque formation.
Germ-free mice inoculated with defined microbes are informative models for analyzing
microbial colonization and metabolic adaptation (198). For example, resident bacteria and
probiotic strains adapt their substrate utilization: in the presence of Bifidobacterium longum,
Bifidobacterium animalis, or Lactobacillus casei, Bacteroides thetaiotaomicron diversified its
carbohydrate utilization by shifting metabolism from mucosal glycans to dietary plant
polysaccharides (199). Furthermore, the effect of different diets on microbial community
composition can be studied, as in gnotobiotic mice inoculated with ten sequenced gut bacterial
species and fed with various levels of casein, cornstarch, sucrose, and corn oil to represent
protein, polysaccharide, sugar, and fat content in the diet, respectively (165).
1.2.4.7 Computational frameworks for human-microbiomics
Over the past several decades, a large number of theoretical and quantitative models have
been developed to describe the cell and its behavior. Constraint-based models are used to describe
metabolism of individual cells using stoichiometric representation of metabolic reactions and
optimization constraints (200). Approaches such as Flux Balance Analysis (FBA) enable the
analysis of metabolism under steady state assumptions by linear optimization solution methods.
These methods are now being scaled to ecosystems of cells. Recent developments using multilevel
objective optimization (201) and dynamic systems (202) enable the modeling of synthetic
ecosystems of three or more members. Using metagenomic data of the gut microbiome,
Greenblum et al. generated a community-level metabolic reconstruction network of the
microbiota and discovered topological variations that are associated with obesity and
inflammatory bowel disease, giving rise to low diversity and differences in community
composition (203). For models that account for systems dynamics, population abundance and
metabolite concentrations can be solved independently through different FBA models that are
iterated at each time step. This approach, called dynamic multi-species metabolic modeling
(DMMM), can capture scenarios of resource competition, leading to the identification of limiting
metabolites (204). Other complementary methods include elementary mode analysis (EMA) (205),
which enables quantitative analysis of microbial ecosystems in a multicellular fashion.
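The time-stepped iteration described above can be sketched as follows. In this simplified Python example, each species' per-step FBA solve is replaced by a closed-form stand-in (Michaelis-Menten-limited uptake of a single shared substrate multiplied by a biomass yield), so it illustrates only the time-stepping and resource-competition logic, not a genome-scale metabolic model. All names, parameters, and units are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    biomass: float        # gDW/L (illustrative units)
    vmax: float           # maximum substrate uptake, mmol/gDW/h
    km: float             # half-saturation constant, mM
    biomass_yield: float  # gDW produced per mmol substrate


def max_growth(species, substrate):
    """Stand-in for a per-species FBA solve.

    Returns (growth_rate, uptake_rate). A real implementation would solve
    a linear program over the species' stoichiometric model, with this
    uptake rate as the upper bound on the exchange flux.
    """
    if substrate <= 0:
        return 0.0, 0.0
    uptake = species.vmax * substrate / (species.km + substrate)
    return species.biomass_yield * uptake, uptake


def simulate(species_list, substrate, dt=0.1, steps=200):
    """Iterate per-species solves and update the shared substrate pool."""
    history = []
    for _ in range(steps):
        # 1) Solve each species independently against the current medium.
        solutions = [max_growth(s, substrate) for s in species_list]
        # 2) Total demand this step; scale down if it exceeds the supply.
        demand = sum(u * s.biomass * dt
                     for s, (_, u) in zip(species_list, solutions))
        scale = min(1.0, substrate / demand) if demand > 0 else 0.0
        # 3) Explicit Euler update of biomass and substrate.
        for s, (mu, _) in zip(species_list, solutions):
            s.biomass += mu * scale * s.biomass * dt
        substrate = max(substrate - demand * scale, 0.0)
        history.append((substrate, [s.biomass for s in species_list]))
    return history
```

With two species that differ only in their maximum uptake rate, the faster consumer would be expected to capture most of the shared substrate, which is the kind of resource-competition scenario the dynamic framework is meant to expose.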
1.2.5 Perspectives
Reframing the microbiota community as a core set of genes, not a core set of species,
opens a new front to the microbiome engineering design space. In a metagenomic study of 154
individuals, no single gut bacterial phylotype was detected at an abundant frequency amongst all
the samples, a finding that is consistent with the idea that the core human gut microbiome may
not be best defined by prominent species but by abundantly shared genes and functions (206).
We propose that manipulation at the gene, genome, and ultimately metagenome level offers the
ability for precise multicellular engineering of desirable traits in human-associated microbiota.
Besides controlled perturbations of the microbiome to advance our understanding of host-microbiota
interactions, metagenome-scale tools enable novel developments in diagnostics and
therapeutics.
From biosensors on the skin to reporters in the gut, there are several opportunities in
monitoring the health and disease status of the human host, such as sensing nutritional
deficiencies, immune imbalances, environmental toxins, or invading pathogens. Prophylactic and
therapeutic avenues for human-microbiome engineering include modifying community
composition, tuning metabolic activity, mediating microbe-microbe relationships, and
modulating host-microbe interactions. Two current microbiota-associated treatments have shown
clinical efficacy: 1) fecal transplants for recurrent Clostridium difficile infection (207) and 2)
probiotics for pouchitis, which is inflammation of the ileal pouch that is created after surgical
removal of the colon in ulcerative colitis patients (208–210). The main challenge for fecal transplants is transmission
of undesirable agents from donor feces to the recipient gut; for probiotics, it is native
colonization resistance, which impairs infiltration and growth of the new species
(211, 212). Nevertheless, these successful approaches demonstrate the potential benefits of
leveraging natural microorganisms and entire microbial communities.
In fact, coupling organismal and functional gene level approaches would be a powerful
way to engineer the native microbiota. Microbiome engineering enables multiscale systems
design for the synthesis of nutrients and vitamins, enhanced digestion of gluten and lactose,
decreased acidity of the oral cavity, targeted elimination of multi-drug resistant pathogens, and
microbial modulation of the host immune system. As vehicles for drug delivery, commensal
bacteria designed to secrete heterologous proteins have been explored for treating cancer (213–215),
diabetes (216), HIV (217), and IBD (172). For example, IL-10 has immunomodulatory effects in
IBD, but requires localized delivery at the intestinal lining to avoid the toxic side effects and low
efficacy of systemic IL-10 injection. Ingestion of modified Lactococcus lactis that secretes
recombinant IL-10 is safe and effective in animal models, and has been promising in human
clinical trials for IBD (173, 218).
Finally, besides addressing clinical safety and efficacy criteria for FDA regulatory
approval (219), overall safety precautions are critical considerations to minimize unintentional
risks in releasing genetically modified material into the natural environment. Rational design,
such as creating auxotrophic microbes (173), for robust stability, non-pathogenicity, and
containment of recombinant genetic systems will be essential in microbiome engineering.
1.2.6 Acknowledgements
H.H.W. acknowledges the generous support from the National Institutes of Health
Director’s Early Independence Award (grant 1DP5OD009172-01). S.J.Y. acknowledges support
from the National Science Foundation Graduate Research Fellowship and the MIT Neurometrix
Presidential Graduate Fellowship. G.M.C. acknowledges support from the Department of Energy
Genomes to Life Center (Grant DE-FG02-02ER63445).
Chapter 2
Improving microbial fitness in the mammalian gut using in vivo temporal functional metagenomics sequencing
This chapter has been adapted from:
Stephanie J. Yaung, Luxue Deng, Ning Li, Jonathan L. Braff, George M. Church, Lynn Bry,
Harris H. Wang, Georg K. Gerber. Improving microbial fitness in the mammalian gut using in
vivo temporal functional metagenomics. Molecular Systems Biology 11(3):788 (2015). Ref. (220)
2.1 Abstract
Elucidating functions of commensal microbial genes in the mammalian gut is challenging
because many commensals are recalcitrant to laboratory cultivation and genetic manipulation.
We present TFUMseq (Temporal FUnctional Metagenomics sequencing), a platform to
functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in
vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression
libraries that are tracked over time in vivo by deep sequencing and computational methods. To
demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal
Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into
germ-free mice. Population dynamics of library clones revealed Bt genes conferring significant
fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt
galactokinase central to early colonization, and subsequent dominance by a Bt glycoside
hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E.
coli genome driving increased galactose utilization. Our findings highlight the utility of
functional metagenomics for engineering commensal bacteria with improved properties,
including expanded colonization capabilities in vivo.
2.2 Introduction
The mammalian gastrointestinal (GI) tract is a hostile environment for poorly adapted
microbes. Nonetheless, diverse groups of microbes have evolved to prosper in the GI tract, in the
setting of intense interspecies competition, physical and chemical stressors, and the host immune
system (3, 6). These microorganisms also support the normal homeostatic functions of the host
by helping to extract nutrients, stimulate the immune system, and provide protection against
colonization by pathogens (3, 35, 36, 38, 40). Next-generation sequencing has enabled
systematic studies of the mammalian microbiota, and great strides have been made in
characterizing the structure of bacterial communities and their genetic potential in vivo. For
instance, the Human Microbiome Project (HMP) (2, 4, 221) and MetaHIT (7) have generated
maps of bacterial species abundances throughout the human body, reference genomes, and
catalogs of more than 100 million microbial genes assembled from shotgun sequencing of in vivo
communities. Although these studies have generated vast amounts of descriptive data, the
functions of most bacterial genes in these collections remain poorly characterized or wholly
unknown.
Traditional methods to characterize the functions of microbial genes require the isolation,
cultivation, and introduction of foreign DNA into a recipient organism. However, an estimated
60-80% of mammalian-associated microbiota species remain uncultivated (222). Even after
successful culture and introduction of genetic material into a microorganism, the DNA must
integrate into the microbial genome or be maintained episomally. This requires known
compatible replication and restriction-modification systems, which may not be feasible for many
microbes. If these barriers can be overcome, standard low-throughput methods for functional
characterization of genes may be employed, or newer approaches such as transposon
mutagenesis could be coupled with next-generation sequencing. In this latter approach, random
locations on the genome are disrupted with a transposon containing a selectable marker; the
resulting library is subjected to selection conditions and deep sequenced to determine enriched
and depleted mutants (149). A limitation of this approach is that essential genes or those that are
important to cell fitness are difficult to assay, since inactivation of these genes by transposon
mutagenesis would be lethal to the organism under study. An additional constraint is that
transposon mutagenesis may disrupt the expression of bystander genes that are near the relevant
locus, thus causing confounding phenotypic effects.
Here, we employ an alternative approach, by building large-scale shotgun expression
libraries that can confer a gain of function in the recipient bacterial strain. Our approach uses
physical shearing or restriction digestion of donor DNA to generate fragments that are cloned
into an expression vector and transformed into the recipient bacterial strain, for high-throughput
functional screening to identify genes that confer a fitness advantage in a particular context. This
approach has the advantage that the donor organism need not be readily culturable or genetically
manipulable in the laboratory; moreover, it allows investigation of essential genes or those
conferring a fitness advantage synergistic with the recipient organism. Functional metagenomics
using environmental samples was first established for communities derived from lignocellulosic
feedstocks (223), seawater (224), and soil (225). The use of shotgun libraries for functional
metagenomics of mammalian-associated microbiota has been demonstrated ex vivo, such as by
growing the library in media with different substrates to characterize carbohydrate active
enzymes (226), prebiotic metabolism (227), glucuronidase activity (228), salt tolerance (229),
and antibiotic resistance genes (152), or by using filtered lysates of the library to screen for
signal modulation in mammalian cell cultures (230). This metagenomic shotgun library approach
has yet to be carried out on a large scale in vivo.
To demonstrate our TFUMseq (Temporal FUnctional Metagenomics sequencing)
approach, we used high-coverage genetic fragments from the genome of the fully sequenced
human gut commensal Bacteroides thetaiotaomicron (Bt) (231) and cloned the fragments into a
plasmid library in an Escherichia coli K-12 strain. We chose Bt because it is a common
commensal strain in the human gut that persistently colonizes and possesses a broad and
well-characterized repertoire of catabolic activities, such as sensing polysaccharides and redirecting
metabolism to forage on host versus dietary glycans (232–234). We subjected the TFUMseq
library to in vitro and in vivo selective pressures, collected output samples at different time points
for high-throughput sequencing, and used computational methods to reconstruct the population
dynamics of clones harboring donor genes (Figure 2-1). Our work is an advance over previous
studies in two major aspects. First, to our knowledge, our study is the first to employ shotgun
expression libraries for functional metagenomics in vivo. Important features of the mammalian
gut are difficult to recapitulate in vitro, such as the host immune response. Thus, in vivo
experiments are essential for investigating the function of commensal microbiota genes in the
host. Second, our study leverages high-throughput sequencing and computational methods to
generate detailed dynamics of the entire population subject to selection over time. This kinetic
information is crucial for understanding succession events during the inherently dynamic and
complex process of host colonization.
Figure 2-1 Experimental design.
(Left panel) Map of the library backbone vector. The vector was linearized and ligated to sheared
fragments of donor genome to generate the heterologous insert library. (Right panels) Passaging of
the E. coli library in two liquid media conditions (top) and inoculation of the library or a control
luciferase plasmid into germ-free (GF) mice (bottom). Small boxes across the time line denote
sample collection points. Arrows indicate deep-sequenced samples.
2.3 Materials and Methods
2.3.1 Bacterial strains and growth conditions
Bacteroides thetaiotaomicron VPI-5482 (ATCC # 29148) was grown anaerobically in a
rich medium based on supplemented Brain Heart Infusion. The genomic library was maintained
in an Escherichia coli K-12 strain, NEB Turbo (New England Biolabs, Ipswich, MA). E. coli
strains were grown in Luria broth (LB) and supplemented with carbenicillin (final concentration
100 μg/mL) as needed. For anaerobic growth, an anaerobic jar (GasPak System, Becton
Dickinson, Franklin Lakes, NJ) was used. Mouse chow (MC) filtrate was prepared by adding
150 mL deionized water to 8 g of crushed mouse chow (Mouse Breeding Diet 5021, LabDiet, St.
Louis, MO). The mixture was heated at 95 °C for 30 minutes with mixing, passed through a 0.22
μm filter, and autoclaved. The sterility of the MC filtrate was confirmed by incubating at 37 °C in
aerobic and anaerobic conditions and observing no growth after several days.
2.3.2 Library generation
Bacteroides thetaiotaomicron genomic DNA was isolated (DNeasy Blood & Tissue Kit,
Qiagen, Venlo, Netherlands), fragmented by sonication to 3-5 kb (Covaris E210, Covaris,
Woburn, MA), and size-selected and extracted by gel electrophoresis (Pippin Prep, Sage
Sciences). The fragments were end-repaired (End-It DNA End-Repair Kit, Epicenter, Madison,
WI) and cloned into a PCR-amplified GMV1c backbone vector via blunt-end ligation. The
reaction was transformed into NEB Turbo electrocompetent E. coli cells (New England Biolabs).
The library size was quantified by counting colonies formed on selective media (LB
carbenicillin) after plating a fraction of the transformed cells. To assess the size of inserts
successfully cloned into the library, we picked colonies for PCR amplification using primers
ver2_f/r (Table 2-1) that flanked the insert site. We further confirmed the presence of inserts by
submitting amplified inserts for Sanger sequencing (Genewiz, South Plainfield, NJ) and aligning
sequences with the donor B. thetaiotaomicron genome.
2.3.3 Plasmid retention
Individual stool pellets from Days 0.75, 1.5, 1.75, 2.5, 4, 10, 14, 21, 25, and 28 were
homogenized in 10% PBS and plated on LB agar with or without carbenicillin (carb). To obtain
accurate counts, colony platings were performed in triplicate and repeated at 100X dilutions if
the plates were overgrown. Plasmid retention was calculated as the number of colonies grown on
LB-carb plates divided by the number of colonies grown on LB only plates.
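The retention ratio described above can be sketched in Python. This is a minimal illustration rather than the code used in the study: the function name and the colony counts are hypothetical, and averaging the triplicate counts before taking the ratio is an assumption, since the text does not state how replicates were combined.

```python
from statistics import mean

def plasmid_retention(carb_counts, lb_counts):
    """Fraction of cells retaining the plasmid.

    carb_counts: colony counts from replicate LB-carbenicillin plates
    lb_counts:   colony counts from replicate LB-only plates
    Both sets must come from the same dilution of the same sample.
    """
    if not carb_counts or not lb_counts:
        raise ValueError("need at least one count per plate type")
    lb_mean = mean(lb_counts)
    if lb_mean == 0:
        raise ValueError("no colonies on LB-only plates; retention undefined")
    # Assumption: replicates are averaged before the ratio is taken.
    return mean(carb_counts) / lb_mean

# Hypothetical triplicate counts for one stool pellet
retention = plasmid_retention([84, 90, 87], [100, 105, 95])
print(f"plasmid retention: {retention:.2f}")
```

Counts from plates at different dilutions would need to be scaled to a common dilution before being passed to this function.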
2.3.4 In vitro selection
After inoculating the library in LB or MC broth, the cultures were passaged by diluting at
20X into fresh media. LB cultures were grown in aerobic conditions with shaking and passaged
every day for two weeks. MC cultures were grown in anaerobic conditions without shaking and
passaged every two days for two weeks, since the cultures took more time to reach saturation
compared to the LB condition.
2.3.5 In vivo selection
All of the mice used in this study were handled in accordance with protocols approved by
the Harvard Medical Area Standing Committee on Animals (HMA IACUC). Male C57BL/6
mice, 6–8 weeks of age, were used. The mice were bred in the Center for Clinical and
Translational Metagenomics facility and maintained in germ-free conditions prior to the
experiments. Germ-free mice were orally gavaged with ~2 × 10^8 CFU of bacteria in a volume of
200 μL on Day 0. Mice inoculated with the library were separately housed. Fecal pellets were
collected at 0.5, 0.75, 1.5, 1.75, 2.5, 3, 4, 7, 10, 14, 21, 25, and 28 days post-inoculation and
stored at -80 °C in 10% PBS buffer.
2.3.6 Colony PCR and Sanger sequencing
Individual colonies were isolated from stool samples streaked onto LB agar with
carbenicillin (100 μg/mL). Colonies were grown overnight at 37 °C in a 96-well plate with 200
μL of LB+carbenicillin. 0.8 μL of the culture was added to a total PCR reaction volume of 20 μL.
The PCR mix (KAPA HiFi HotStart ReadyMix PCR Kit, Kapa Biosystems, Wilmington, MA)
contained primers ver2_f/r (Table 2-1) that flanked the insert site. PCR amplicons were
submitted for sequencing (Genewiz) and the insert sequence was mapped back to the B.
thetaiotaomicron genome using BLASTn. Primers for genotyping the galK locus on the E. coli
genome are listed in Table 2-1. The presence or absence of IS2 in galK was confirmed using
primers galK16_chk_f/r that flanked the expected insertion site in galK.
Name            Sequence (5’ -> 3’)
A_L             AGGACGCACTGACCGAATT
A_R             TTTATTTGATGCCTCTAGCACGC
ver2_f          TTTACTTTGCAGGGCTTCCC
ver2_r          ACTGAGCCTTTCGTTTTATTTGATG
galK16_chk_f    CCTGCCACTCACACCATTCAG
galK16_chk_r    TGGGCGCATCGAGGGA
GMV_amp_f       AACAAGCTTGATATCGAATTCCTGC
GMV_amp_r       GACGGTACCTTTCTCCTCTTTAATGA
Table 2-1 Primers used in the study.
2.3.7 DNA extraction and PCR amplification of inserts for Illumina
sequencing
DNA was extracted from collected samples in the in vitro experiment using the DNeasy
Blood & Tissue Kit (Qiagen). Inserts were PCR amplified using primers ver2_f/r (Table 2-1) in
KAPA HiFi HotStart Mix (Kapa Biosystems) and purified with Agencourt AMPure XP beads
(Beckman Coulter, Indianapolis, IN) at a beads:sample volumetric ratio of 0.5:1. The amplicons
were prepared for sequencing using the Nextera kit (Illumina, San Diego, CA). For all fecal
samples from the in vivo experiment, the QIAamp DNA Stool Mini Kit (Qiagen) was used.
Isolated DNA was digested with PspXI and AvrII enzymes (New England Biolabs) prior to
purification with QIAquick PCR Purification Kit (Qiagen) and subsequent PCR amplification
with primers A_L and A_R (Table 2-1) in KAPA HiFi HotStart Mix. The PCR reaction was
purified with AMPure beads at a beads:sample volumetric ratio of 0.5:1.
Initially, in our sequencing of the in vitro samples, we observed a high fraction (30-45%)
of reads mapping to the backbone vector and fewer reads (20%) mapping to the B.
thetaiotaomicron genome. Given the large (>3 kb) insert sizes of these libraries, traditional
amplification methods evidently over-amplify the smaller vector backbone (2 kb), thereby
overwhelming vectors containing actual genomic inserts. We therefore optimized the sample
preparation protocol by incorporating a double digestion strategy prior to PCR amplification of
the inserts (Figure 2-2). The two restriction sequences selected were the least common (of all
available sites on the plasmid) in the B. thetaiotaomicron genome. With this new protocol, in our
subsequent in vivo sequencing, we observed <4% of reads mapping to the backbone vector and
>90% of reads mapping to the B. thetaiotaomicron genome.
Figure 2-2 Double digestion and PCR protocol for sequencing.
(A) We used restriction sites PspXI and AvrII that flanked the insert site on the backbone vector
prior to PCR-amplification of the insert with primers A_L and A_R. These two enzymes had a
minimal number of restriction sites (29 for PspXI and 62 for AvrII) in the B. thetaiotaomicron
genome. The gel shows the result of library PCR with or without double digestion. Double digestion
appears to eliminate the dominating band corresponding to the backbone vector. (B) PCR
amplicons were prepared for sequencing by the Nextera kit and size selection. After trimming off
any backbone sequence, which would be present on end fragments, we mapped the reads back to
the B. thetaiotaomicron genome.
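Choosing enzymes that rarely cut the donor genome amounts to counting recognition sites per candidate enzyme. The sketch below illustrates one way to do this in Python. It is not the authors' procedure: the function names are hypothetical, the toy sequence stands in for the Bt genome, and the recognition sequences (VCTCGAGB for PspXI, CCTAGG for AvrII) are taken from common enzyme catalogs rather than from the text, so they should be checked against the supplier's documentation.

```python
import re

# IUPAC nucleotide codes needed for the recognition sequences below
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "V": "[ACG]", "B": "[CGT]", "N": "[ACGT]",
}

# Assumed recognition sequences (not stated in the text)
SITES = {"PspXI": "VCTCGAGB", "AvrII": "CCTAGG"}

def count_sites(sequence, motif):
    """Count occurrences of an IUPAC motif on one strand, overlaps included.

    Both motifs above are palindromic, so scanning a single strand finds
    every site on the double-stranded molecule.
    """
    pattern = "".join(IUPAC[base] for base in motif)
    # A lookahead matches at every start position without consuming text.
    return len(re.findall(f"(?=({pattern}))", sequence.upper()))

def rank_enzymes(sequence, sites=SITES):
    """Return (enzyme, site count) pairs sorted from rarest to most frequent."""
    counts = {name: count_sites(sequence, motif) for name, motif in sites.items()}
    return sorted(counts.items(), key=lambda item: item[1])

# Toy sequence standing in for the donor genome
toy = "AAACCTAGGTTTACTCGAGCGGCCTAGGA"
print(rank_enzymes(toy))
```

Applied to the full chromosome and plasmid sequences, the same count could be repeated for every enzyme with a site on the vector to pick the rarest cutters.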
2.3.8 High-throughput sequencing and analysis of in vitro library selection
data
Samples were sequenced on the MiSeq (Illumina) instrument at the Molecular Biology
Core Facilities of the Dana-Farber Cancer Institute. Metrics for this sequencing run are provided
in Table 2-2. Due to the PCR amplification protocol prior to optimization (see previous section),
we observed large amounts of E. coli plasmid DNA in our sequencing reads. To maximize the
reads aligned to the B. thetaiotaomicron genome, we aggressively trimmed low quality bases and
removed sequences mapping to the E. coli genome or with length shorter than 20 nt. The
reference genome of B. thetaiotaomicron (NC_004663 and NC_004703) was downloaded from
the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/). Due to the aggressive
preprocessing of reads described, the length of trimmed sequences was shorter than 50 nt.
Therefore, Bowtie (235) was applied instead of Bowtie2 for higher sensitivity. Default
parameters were used for building a Bowtie index with the B. thetaiotaomicron chromosome and
plasmid sequences. Paired-end reads were aligned to the reference genome with parameter -X
300 using Bowtie. SAM files from the Bowtie alignment were converted to indexed and sorted
BAM files using SAMtools (237). Cuffdiff (236) was applied to test for differential
representation of genes (i.e., the library grown in rich medium at time 0 versus the library grown
in rich medium at Day 7, and the library grown in MC medium at time 0 versus the library grown
in MC medium at Day 6).
Media condition  Timepoint (day)  Paired raw reads  Paired trimmed reads
input library    0                5675769           5671724
LB aerobic       7                8226863           8222571
MC anaerobic     6                6672316           6668006
Table 2-2 Summary of sequencing metrics for in vitro experiments.
Paired-end reads of 250 nt length were generated on the MiSeq instrument.
2.3.9 High-throughput sequencing and processing of in vivo selection data
B. thetaiotaomicron genomic DNA inserts were amplified from isolated E. coli plasmids
using our improved PCR protocol (Figure 2-2). After Nextera sequencing library preparation,
paired-end reads of 101 nt length were generated on the HiSeq 2500 (Illumina) instrument at the
Baylor College of Medicine Alkek Center for Metagenomics and Microbiome Research. Metrics
for this sequencing run are provided in Table 2-3. All reads passed quality control (base quality
>30) using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). To eliminate
plasmid DNA sequences in reads, the reads were trimmed using custom Perl scripts that removed
all flanking regions matching 15 bp of the plasmid DNA on the 5’ and 3’ ends of the insert fragment.
Reads shorter than 20 bp after trimming were discarded, and the remaining forward and reverse
reads were re-matched as pairs.
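The trimming step can be illustrated with a short Python sketch. The original scripts were written in Perl and are not reproduced in the text, so everything here is an assumption made for illustration: the function names, the two 15 nt flank sequences (placeholders, not the actual GMV1c sequence), exact rather than approximate matching, and the rule that a pair is kept only when both mates survive trimming.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def _strip(read, upstream, downstream):
    """Drop everything through `upstream` and everything from `downstream` on."""
    i = read.find(upstream)
    if i != -1:
        read = read[i + len(upstream):]
    j = read.find(downstream)
    if j != -1:
        read = read[:j]
    return read

def trim_vector_flanks(read, left_flank, right_flank, min_len=20):
    """Return the insert-derived part of a read, or None if shorter than min_len."""
    # Reads in the same orientation as the vector sequence
    read = _strip(read, left_flank, right_flank)
    # Reads from the opposite strand carry the reverse-complemented flanks
    read = _strip(read, revcomp(right_flank), revcomp(left_flank))
    return read if len(read) >= min_len else None

def trim_pair(fwd, rev, left_flank, right_flank, min_len=20):
    """Trim both mates; keep the pair only if both survive (an assumption)."""
    mates = [trim_vector_flanks(r, left_flank, right_flank, min_len)
             for r in (fwd, rev)]
    return tuple(mates) if all(m is not None for m in mates) else None

# Hypothetical 15 nt vector flanks and a toy insert
LEFT_FLANK = "GGATCCAAGTCTGAC"
RIGHT_FLANK = "CTAGTCCATGGAATT"
insert = "GATTACA" * 4

print(trim_vector_flanks("CC" + LEFT_FLANK + insert, LEFT_FLANK, RIGHT_FLANK))
```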
Sequencing reads were mapped onto the reference genome of B. thetaiotaomicron using
Bowtie2 (235). Default parameters were used for building the Bowtie2 index using the B.
thetaiotaomicron chromosome and plasmid sequences, and for aligning reads to the reference
sequence. SAM files generated from Bowtie2 alignment were converted to indexed and sorted
BAM files using SAMtools (237). In SAMtools, ‘mpileup’ with parameter ‘-B’ was used to
obtain the depth of coverage of the reference genome. Across all samples, the mean number of
bases mapped to the B. thetaiotaomicron genome was 1.17 × 10^9, with a minimum of 4.31 × 10^8
and a maximum of 2.49 × 10^9 bases per sample.
Mouse          Timepoint (day)  Paired raw reads  Paired trimmed reads
1              1.5              6863895           6859853
1              1.75             7759215           7751437
1              2.5              7365279           7362905
1              3                5818449           5818061
1              4                5531304           5530269
1              7                5662825           5662593
1              10               6698408           6697569
1              14               4723483           4723238
1              21               6687984           6687697
1              28               5839481           5839205
2              0.5              7228693           7225763
2              1.5              8918192           8916633
2              1.75             5888569           5887816
2              2.5              8785555           8784833
2              3                4342612           4342118
2              4                6837676           6835603
2              7                3967877           3967384
2              10               8601605           8601131
2              14               5849784           5849571
2              21               4556363           4556235
2              28               7655382           7655090
3              0.5              5010698           5005943
3              1.5              6155822           6155291
3              1.75             12620630          12619054
3              2.5              4684094           4683664
3              3                6101414           6100076
3              4                5942854           5942213
3              7                4172374           4171934
3              10               4801895           4801689
3              14               2164395           4812691
3              21               4812865           7188598
3              28               7188952           2164338
4              0.5              4533216           4526163
4              1.5              4334677           4331837
4              1.75             7468959           7466817
4              2.5              4985966           4984355
4              3                5103796           5102782
4              4                9074708           9073032
4              7                6749347           6748350
4              10               5542079           5541171
4              14               4411316           4410997
4              21               7574141           7573540
4              28               6262455           6262057
5              0.5              7060331           7051476
5              1.5              5588814           5583481
5              1.75             4776273           4771776
5              2.5              6594126           6590706
5              3                5617337           5616357
5              4                6090548           6089696
5              7                8025711           8024678
5              10               4246859           4246609
5              14               5401171           5400825
5              21               4771095           4770757
5              28               3923280           3922897
input library  replicate 1      7305414           7303389
input library  replicate 2      3852825           3849575
Table 2-3 Summary of sequencing metrics for in vivo experiments.
Of the 56 samples sequenced, two were the input library, 44 were from four mice
with 11 time-point stool collections, and 10 were from one mouse with a 10 time-point
collection. Paired-end reads of 101 nt length were generated on the HiSeq
instrument.
2.3.10 Statistical analyses of in vivo selection data
Analyses were performed using custom functions written in Matlab (MathWorks, Natick,
MA). The effective positional diversity (EPD), a genome-wide measure of the diversity of
library representation, was calculated using the formula:
$$\mathrm{EPD}(t) = \exp\left(-\sum_{i=1}^{P} r_{ti} \ln r_{ti}\right)$$
Here, $r_{ti}$ represents the fraction of reads at time t mapping to nucleotide i in a reference
sequence totaling P nucleotides (e.g., the Bt genome).
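A small Python sketch may help make the EPD concrete. It assumes, following the description in the Results, that the EPD is the exponentiated Shannon entropy of the per-nucleotide read fractions; the function name and toy inputs are hypothetical, and the study itself used custom Matlab functions that are not shown in the text.

```python
import math

def effective_positional_diversity(read_counts):
    """Exponentiated Shannon entropy of per-nucleotide read fractions.

    read_counts: reads (or coverage depth) at each of the P reference
    positions for one time point. Positions with zero reads contribute
    nothing, following the convention 0 * ln(0) = 0.
    """
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("no mapped reads")
    entropy = 0.0
    for count in read_counts:
        if count > 0:
            r = count / total
            entropy -= r * math.log(r)
    return math.exp(entropy)

# Perfectly even coverage of 1000 positions gives an EPD of about 1000;
# all reads piled on one position gives an EPD of 1.
print(effective_positional_diversity([5] * 1000))
print(effective_positional_diversity([0] * 999 + [42]))
```

Under this definition the EPD can be read as the number of evenly covered positions that would produce the same entropy, which is why it falls as selection concentrates reads on a few loci.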
The time-averaged relative abundance (TA-RA), a gene-level measure of library
selection, was calculated using the formula:
$$\mathrm{TA\text{-}RA}_g(t_1, t_2) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} f_g(t)\,dt$$
Here, $t_1$ and $t_2$ denote the bounds of the time-interval of interest, and $f_g$ represents a
continuous-time function for gene g. The function $f_g$ was estimated as follows. We fit a cubic
smoothing spline, using the Matlab function csaps, applied to the log fold change in Fragments
Per Kilobase per Million mapped reads (FPKM) for gene g at each time-point t (i.e., the FPKM
value at time-point t divided by the FPKM value for the gene in the starting library). FPKM
values were generated using Cufflinks (236) with parameter --max-bundle-frags 40000000. The
smoothing spline was used to account for non-uniform temporal sampling and noise in the data.
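The time-averaging idea can be sketched in Python under simplifying assumptions. The sketch treats the TA-RA as the mean of a continuous curve through the log fold change values over the interval from t1 to t2. In place of the cubic smoothing spline fitted with csaps, it uses piecewise-linear interpolation integrated with the trapezoid rule, which is a deliberate simplification. The function names, the natural-log base, and the toy values are hypothetical.

```python
import math

def log_fold_change(fpkm_t, fpkm_input):
    """Log fold change of a gene's FPKM relative to the input library.

    Natural log is an assumption; the text does not state the base.
    """
    return math.log(fpkm_t / fpkm_input)

def _interp(times, values, t):
    """Piecewise-linear interpolation of (times, values) at time t."""
    for ta, tb, va, vb in zip(times, times[1:], values, values[1:]):
        if ta <= t <= tb:
            return va + (vb - va) * (t - ta) / (tb - ta)
    raise ValueError("t lies outside the sampled time range")

def time_averaged(times, values, t1, t2):
    """Mean of the interpolated curve over [t1, t2] (trapezoid rule).

    times must be strictly increasing and must span [t1, t2].
    """
    if not (times[0] <= t1 < t2 <= times[-1]):
        raise ValueError("need times[0] <= t1 < t2 <= times[-1]")
    # Integration knots: the interval bounds plus every sample inside them
    knots = [t1] + [t for t in times if t1 < t < t2] + [t2]
    heights = [_interp(times, values, t) for t in knots]
    area = sum((tb - ta) * (ha + hb) / 2
               for ta, tb, ha, hb in zip(knots, knots[1:], heights, heights[1:]))
    return area / (t2 - t1)

# Toy example with non-uniform sampling days
days = [0.5, 1.5, 4, 7, 28]
lfc = [0.0, 1.0, 2.0, 2.0, 3.0]
print(time_averaged(days, lfc, 1.5, 7))
```

Dividing by the interval length keeps early, densely sampled days from dominating the average, which is the practical reason for integrating a fitted curve rather than averaging the raw time points.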
The time-averaged normalized effective coverage (TA-NEC), a gene-level measure of
coverage, was calculated using the formula:
$$\mathrm{TA\text{-}NEC}_g(t_1, t_2) = \frac{1}{l_g\,(t_2 - t_1)} \int_{t_1}^{t_2} h_g(t)\,dt$$
Here, $l_g$ denotes the length of gene g, and $h_g$ represents a continuous-time function for
gene g. The function $h_g$ was estimated as follows. We fit a cubic smoothing spline, using the
Matlab function csaps, applied to the effective coverage, EC(g,t) for the gene at each time-point:
Here, $s_g$ denotes the start of the gene.
To detect genes with significantly higher than expected selection, we performed a
one-sided t-test on Box-Cox transformed TA-RA and TA-NEC values, and corrected for multiple
hypothesis testing using the Matlab function mafdr. To estimate the relevant null hypotheses for
the t-tests, while taking into account possible biases due to differential representation of genes in
the input library, we used a robust regression algorithm (Matlab function robustfit) in which the
input library value served as the independent variable, and the TA-RA or TA-NEC value served
as the dependent variable.
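The multiple-testing correction can be illustrated with a short Python sketch. The text does not say which procedure mafdr was configured to use, so the Benjamini-Hochberg step-up adjustment shown here is a stand-in chosen for illustration, not a reproduction of the analysis. The function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical one-sided p-values for four genes
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```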
2.3.11 Whole genome sequencing of isolated clones from in vivo selection
Whole genome sequencing of E. coli recipient isolates from seven Day 7 clones, six Day
28 clones, and two Day 28 luciferase control clones was performed on the MiSeq (Illumina)
instrument after Nextera (Illumina) sequencing library preparation at the Molecular Biology
Core Facilities of the Dana-Farber Cancer Institute. Metrics for this sequencing run are provided
in Table 2-4.
The raw data was processed with Millstone (http://churchlab.github.io/millstone), which
combines BWA alignment, GATK for BAM realignment and cleanup, and SnpEff for variant
effect prediction. Reads were aligned to E. coli K-12 DH10B as well as MG1655 to identify any
variants not in common with the starting library strain NEB Turbo. The average genome
coverage of each sequenced strain ranged from 20 to 140X. Alignments were also performed
against the F plasmid (which is present in the starting recipient strain) and a library plasmid with
the expected insert (as confirmed by Sanger-sequencing of individual clones).
Sample                                Paired raw reads
NEB Turbo control                     357506
Day 7 Mouse 1 clone 3                 376194
Day 7 Mouse 1 clone 1                 785257
Day 7 Mouse 2 clone 5                 1291948
Day 7 Mouse 3 clone 1                 1257935
Day 7 Mouse 4 clone 4                 1133157
Day 7 Mouse 5 clone 2                 1049238
Day 7 Mouse 5 clone 4                 170610
Day 28 Mouse 1 clone 1                1112924
Day 28 Mouse 2 clone 1                574731
Day 28 Mouse 3 clone 1                543707
Day 28 Mouse 4 clone 1                618713
Day 28 Mouse 5 clone 1                169684
Day 28 Mouse 5 clone 4                687173
Day 28 Mouse 7 clone 1 lux control    1194434
Day 28 Mouse 10 clone 2 lux control   931238
Table 2-4 Summary of metrics for whole genome sequencing of E. coli strains.
Paired-end reads of 300 nt length were generated on the MiSeq instrument.
2.3.12 Growth assays
Cells were pre-conditioned by growth in minimal media (M9) supplemented with 0.2%
glucose. Then, 1 μL of the culture was inoculated into a final volume of 200 μL of M9
supplemented with 0.2%, unless otherwise noted, of a sole carbon source, such as glucose,
lactose, galactose, or sucrose. When needed, MacConkey base agar with a final concentration of
1% lactose or galactose was also used to characterize lactose or galactose utilization.
2.4 Results
2.4.1 Library construction and characterization
A 2.2 kb E. coli expression vector, GMV1c, was constructed to include the strong
constitutive promoter pL and a ribosomal binding site upstream of the cloning site for input
DNA fragments (Figure 2-1). We cloned in 2-5 kb fragments of donor genomic DNA from Bt,
and generated a library of ~100,000 members, corresponding to >50X coverage of the donor
genome. We sequenced the library on the Illumina HiSeq instrument to confirm sufficient
coverage of the Bt genome (Figure 2-3 and Figure 2-4A). The distribution of member insert sizes
in the input library was verified to be centered around 2-3 kb (Figure 2-4B), a size range
allowing for the full-length representation of almost all Bt genes.
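The relationship between library size and genome coverage can be checked with a short calculation, sketched here in Python. The inputs are illustrative round numbers consistent with the ranges in the text (about 100,000 clones, a 3 kb mean insert, a donor genome of about 6 Mb), not measured values, and the function names are hypothetical. The second function uses the Clarke-Carbon estimate, a standard formula that the text itself does not invoke.

```python
def fold_coverage(n_clones, mean_insert_bp, genome_bp):
    """Expected fold coverage of the donor genome by the clone library."""
    return n_clones * mean_insert_bp / genome_bp

def prob_locus_present(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon estimate that a given locus is cloned at least once."""
    fraction = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones

# Illustrative round numbers, not measured values
coverage = fold_coverage(100_000, 3_000, 6_000_000)
present = prob_locus_present(100_000, 3_000, 6_000_000)
print(coverage)  # 50.0
print(present)
```

Both estimates assume every clone carries an insert, so clones with an empty vector would lower the effective values.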
Figure 2-3 Technical reproducibility of library sequencing protocol.
The input library was prepared in duplicate for deep sequencing
using our double digestion and PCR strategy. The coverage for each
gene in replicate 1 is plotted against that of replicate 2.
Figure 2-4 Input library characterization.
(A) Even coverage of the Bacteroides thetaiotaomicron genome. The blue and purple lines represent
per-base coverage values for the chromosome and native B. thetaiotaomicron p5482 plasmid,
respectively. The histogram (top right) shows the distribution of genes by their coverage
(normalized to gene length). (B) Insert size distribution of library. (C) Plasmid retention calculated
by comparing number of colonies on LB vs. LB+carbenicillin plates from in vitro passaging
experiments in aerobic LB or anaerobic mouse chow (MC) filtrate.
2.4.2 In vitro stability and selection by media condition
To determine vector stability in vitro, we performed serial batch passaging of cells
carrying GMV1c every one to two days over two weeks in two media conditions: aerobic Luria
broth (LB) and anaerobic mouse-chow filtrate (MC). We expected the MC medium and
anaerobic conditions to better reflect aspects of the nutritional content and oxygenation status in
the mouse gut than the rich LB medium in aerobic conditions. In both conditions, the vector was
maintained in over 80% of library members without antibiotic selection throughout two weeks of
in vitro passaging (~70 generations) (Figure 2-4C), suggesting general stability of the medium
copy vector (~40 copies per cell). Clones harboring the empty vector (i.e., plasmid with no Bt
insert) were the most fit library members: in both LB and MC conditions, these clones initially
constituted 70% of the library and increased to 90% by the end of two weeks (Figure 2-5), albeit
at a slower rate in anaerobic MC (Figure 2-6).
Figure 2-5 Insert distribution over time in in vitro selection.
Distribution of inserts in the initial library and at various time points in the
(A) in vitro and (B) in vivo experiments. Multiple colonies were picked from
each mouse and the total insert sizes were tabulated for each time point.
Figure 2-6 Insert distribution over time in in vivo selection.
Distribution of inserts in the initial library and at various time
points in the (A) in vitro and (B) in vivo experiments. Multiple
colonies were picked from each mouse and the total insert sizes
were tabulated for each time point.
To identify Bt genes with differential in vitro selection in LB and MC conditions relative
to the input library, we isolated DNA from Day 0 and Day 6 or 7 cultures, amplified the inserts
by PCR for deep sequencing on the Illumina MiSeq platform and used computational methods to
determine donor genes that were differentially enriched or depleted. In each condition, we found
a number of significantly enriched Bt genes (Table 2-5). At Day 7 in aerobic LB, enriched genes
included metabolic enzymes, such as chitobiase (BT_0865), which degrades chitin, and stress
response proteins, such as glycine betaine/L-proline transport system permease (BT_1750),
which is involved in the import of osmoprotectants glycine betaine or proline that mitigate
effects of high osmolarity (238). At Day 6 in anaerobic MC, a different set of genes was
significantly enriched, particularly the locus consisting of endo-1,4-beta-xylanase (BT_0369),
galactokinase (BT_0370), glucose/galactose transporter (BT_0371), and aldose 1-epimerase
(BT_0372). These results highlight that our functional metagenomics approach is able to enrich
for likely bioactive donor genes that improve fitness of the recipient cells in in vitro passaging
conditions. Enolase (BT_4572), the only common hit among annotated genes in both media
conditions, was found to be depleted relative to the input library. This enzyme catalyzes the
penultimate step of glycolysis, and its overexpression may be toxic in E. coli (239).
Gene     Gene product                                         log2(fold change)  q value

Enrichment at Day 6 in anaerobic MC passaging
BT_0370  galactokinase                                         3.51              8.30E-05
BT_0372  aldose 1-epimerase                                    3.21              8.30E-05
BT_0371  glucose/galactose transporter                         3.59              1.53E-04
BT_0478  hypothetical protein                                  3.59              3.70E-04
BT_0369  endo-1,4-beta-xylanase D                              2.51              2.63E-03

Enrichment at Day 7 in aerobic LB passaging
BT_1750  glycine betaine/L-proline transport system permease   9.18              0.00E+00
BT_2055  biopolymer transport protein                          3.64              3.03E-05
BT_4358  hypothetical protein                                  2.84              6.31E-03
BT_1922  N-acetylmuramoyl-L-alanine amidase                    2.62              7.24E-03
BT_0659  hypothetical protein                                  2.59              7.24E-03
BT_4333  hypothetical protein                                  2.57              7.84E-03
BT_2054  hypothetical protein                                  2.90              8.97E-03
BT_0757  beta-galactosidase                                    3.00              1.44E-02
BT_0660  hypothetical protein                                  2.36              1.48E-02
BT_2732  hypothetical protein                                  2.48              1.48E-02
BT_2843  integrase                                             3.34              1.51E-02
BT_3927  hypothetical protein                                  2.43              1.56E-02
BT_3612  FKBP-type peptidylprolyl isomerase                    2.23              2.36E-02
BT_3821  5,10-methylenetetrahydrofolate reductase              2.31              2.36E-02
BT_0865  chitobiase                                            2.39              2.36E-02
BT_0973  hypothetical protein                                  2.21              2.36E-02
BT_0676  N-acetylglucosamine-6-phosphate deacetylase           2.17              2.59E-02
BT_2408  LuxR family transcriptional regulator                 2.22              2.59E-02
BT_1038  hypothetical protein                                  2.24              2.59E-02
BT_3985  hypothetical protein                                  2.16              2.59E-02
BT_2917  hypothetical protein                                  2.16              2.80E-02
BT_0972  oxidoreductase                                        2.13              3.17E-02
BT_1923  O-acetylhomoserine (thiol)-lyase                      2.43              3.35E-02
BT_1006  nitroreductase                                        2.10              3.64E-02
BT_1004  hypothetical protein                                  2.11              3.94E-02
BT_4544  transposase                                           1.99              4.64E-02
BT_2379  hypothetical protein                                  5.81              4.64E-02
BT_0974  hypothetical protein                                  2.08              4.64E-02
BT_0011  hypothetical protein                                  2.00              4.64E-02
BT_0510  heme biosynthesis protein                             2.22              4.64E-02

Depletion at Day 6 in anaerobic MC passaging
BT_1771  cell surface protein                                 -3.25              3.70E-04
BT_4572  phosphopyruvate hydratase (enolase)                  -3.48              1.32E-03
BT_2959  hypothetical protein                                 -2.97              5.18E-03
BT_3089  hypothetical protein                                 -2.73              7.92E-03
BT_3528  hypothetical protein                                 -2.95              1.69E-02
BT_2051  hypothetical protein                                 -4.21              1.69E-02
BT_3577  hypothetical protein                                 -2.16              2.01E-02

Depletion at Day 7 in aerobic LB passaging
BT_4572  phosphopyruvate hydratase (enolase)                  -3.17              2.78E-03
BT_1538  hemagglutinin                                        -3.46              7.24E-03
BT_3395  acetylglutamate kinase                               -2.42              1.48E-02
BT_1817  RNA polymerase ECF-type sigma factor                 -2.19              2.36E-02
BT_1818  hypothetical protein                                 -2.54              2.36E-02
BT_2961  hypothetical protein                                 -2.32              2.59E-02
BT_4571  RNA polymerase ECF-type sigma factor                 -3.91              3.31E-02
BT_2959  hypothetical protein                                 -2.18              3.45E-02
Table 2-5 Bt genes significantly enriched or depleted at Day 6 or 7 in vitro.
Statistically significant genes (q < 0.05) enriched (blue) or depleted (red) at Day 6 or 7 relative
to Day 0 in the in vitro passaging experiment are listed for the anaerobic mouse chow (MC) and
aerobic Luria broth (LB) conditions.
2.4.3 In vivo library selection in germ-free mice
To investigate in vivo gene selection in our library, we inoculated two cohorts of
C57BL/6 male 6-8 week old germ-free mice (n=5 per group) and maintained the mice for 28
days under gnotobiotic conditions. One cohort was colonized with our library; the other cohort
with a control GMV1c vector carrying the 5.9 kb luciferase operon (luxCDABE from
Photorhabdus luminescens, Winson et al, 1998). Fecal pellets were collected on days 0.5, 0.75,
1.5, 1.75, 2.5, 3, 4, 7, 10, 14, 21, 25, and 28 after inoculation.
To determine in vivo vector stability, we plated fecal pellets on LB, on which E. coli
either with or without vectors would grow, and on LB+carbenicillin, selective for E. coli
harboring our vectors. Strains carrying the luciferase vector dropped by ~100,000-fold by Day
28 compared to the earliest plated time-point (18 hours), presumably due to negative selective
pressures from the energy consumption of the vector-borne luciferase in E. coli (Figure 2-7A). In
contrast, our library was well-maintained in vivo throughout the 28 days of the experiment,
suggesting that maintaining the Bt insert library carries at most a minimal fitness cost. Furthermore, unlike in
the in vitro experiment, where clones containing the empty vector were enriched over time, these
clones were virtually absent by the end of the in vivo experiments (Figure 2-6), suggesting
positive selection had taken place.
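The plating comparison described above reduces to a ratio of colony counts. The following is a minimal Python sketch, assuming colony-forming units (CFU) have already been corrected for dilution. The function names and the example counts are hypothetical and are not data from this study.

```python
def plasmid_retention(cfu_selective, cfu_total):
    """Fraction of E. coli that still carry the vector.

    cfu_selective: CFU on LB+carbenicillin (vector-bearing cells only).
    cfu_total: CFU on plain LB (cells with or without the vector).
    """
    if cfu_total <= 0:
        raise ValueError("cfu_total must be positive")
    return cfu_selective / cfu_total


def fold_drop(retention_start, retention_end):
    """Fold decrease in retention between two time points."""
    if retention_end <= 0:
        raise ValueError("retention_end must be positive")
    return retention_start / retention_end


# Hypothetical counts, chosen only to illustrate a 100,000-fold drop.
early = plasmid_retention(cfu_selective=9.0e8, cfu_total=1.0e9)
late = plasmid_retention(cfu_selective=9.0e3, cfu_total=1.0e9)
drop = fold_drop(early, late)
```

Expressing retention as a ratio rather than a raw count keeps the measure independent of the total bacterial load in each pellet.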
Figure 2-7 In vivo selection experiments.
(A) Plasmid retention calculated by comparing number of colonies on LB vs. LB+carbenicillin
plates from mouse fecal samples. n = 5 mice; error bars = standard deviation.
(B) Effective positional coverage across the entire Bt genome for each mouse begins with essentially
even coverage of the ~6 Mb Bt genome but drops rapidly over the experimental time course,
indicative of selection at specific loci.
2.4.4 Characterization of in vivo library population dynamics
To characterize the entire in vivo selected library over time, we extracted DNA from
collected stool samples, PCR amplified the donor inserts, prepared sequencing libraries of the
amplicons, sequenced libraries on the Illumina HiSeq 2500 instrument, and used computational
techniques to detect selected genes in the donor genome that were uniformly covered over time
by more than the expected background number of sequencing reads. Each sample resulted in ~7
million 101 nt paired-end reads (Table 2-3) that were mapped back to the donor genome (Figure
2-2). We also employed Sanger-sequencing of vectors from clones directly isolated from stool
samples to confirm deep-sequencing results and obtain insights into the structure of full-length
inserts.
To obtain a genome-wide view of library selection over time and across the different
mice, we calculated an information theoretic measure, termed effective positional diversity,
similar to that commonly used to quantify population diversity in macroscopic and microscopic
ecology studies (241, 242) (Figure 2-7B). This measure, equal to the exponentiated Shannon
entropy over all positions in the Bt genome, reflects how many positions in the donor genome
are evenly represented in the population. Effective positional diversity values of the initial library
were ~6 Mb, indicating essentially even coverage of the entire Bt genome. From Day 1.75 to
Day 7 and continuing until the end of the experiment at Day 28, there was a rapid decline in
effective positional diversity, which signifies expansion in the population of clones harboring
inserts at a limited number of Bt genomic loci.
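The measure described above, the exponentiated Shannon entropy over genome positions, can be illustrated with a short Python sketch. The function name and the toy coverage vectors are hypothetical. The sketch assumes per-position read depth is already available as a list of non-negative numbers and is not the implementation used in this work.

```python
import math


def effective_positional_diversity(coverage):
    """Exponentiated Shannon entropy of a per-position coverage vector.

    Returns the number of positions that would give the same entropy
    if coverage were perfectly even across them.
    """
    total = sum(coverage)
    if total <= 0:
        raise ValueError("coverage must contain at least one mapped base")
    entropy = 0.0
    for depth in coverage:
        if depth > 0:
            p = depth / total          # share of coverage at this position
            entropy -= p * math.log(p)
    return math.exp(entropy)


# Toy 1,000-position "genome" (hypothetical values).
even = [10] * 1000                     # every position equally covered
selected = [1000] * 10 + [0] * 990     # coverage collapsed onto 10 positions

diversity_even = effective_positional_diversity(even)
diversity_selected = effective_positional_diversity(selected)
```

With even coverage the measure equals the number of positions, which is why the initial library gives a value close to the ~6 Mb genome size. When coverage collapses onto a few loci the value falls toward the number of positions still covered.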
To explore the kinetics of gene selection in vivo, we plotted the percentage of sequencing
reads mapped to genes in the Bt genome over time, and examined genes constituting >0.2% of
total reads. As noted, prior to inoculation, the read coverage was even over the entire Bt genome
and corresponded to <0.2% per gene. A visualization of Bt gene selection for each mouse is
shown in Figure 2-8. By 36 hours post-inoculation, five genes, alpha-L-arabinofuranosidase,
endo-1,4-beta-xylanase, galactokinase, glucose/galactose transporter, and aldose 1-epimerase
(BT_0368 to BT_0372), comprised over half of the reads mapped. At Day 2.5, glucose/galactose
transporter (BT_1758) and glycoside hydrolase (BT_1759) became noticeable and continued to
increase until they saturated all reads at Day 14. Then, fructokinase (BT_1757) emerged and
stabilized at around 6% of the reads throughout the remaining two weeks of the experiment.
These observations are generally consistent across all five mice, though the selection kinetics
varied slightly (Figure 2-8). For example, the transition from galactokinase and
glucose/galactose transporter (BT_0370 and BT_0371) to glycoside hydrolase (BT_1759)
occurred four days earlier in Mouse 5 than in Mouse 2, and the emergence of fructokinase
(BT_1757) was detectable only in Mice 2, 4, and 5.
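The per-gene read shares described above, together with the 0.2% cutoff, can be sketched as follows. This is an illustrative Python example rather than the pipeline used in this work. The function name and the example counts are hypothetical.

```python
def gene_read_shares(mapped_bases, threshold_pct=0.2):
    """Split genes into those at or above a percentage cutoff and the rest.

    mapped_bases: dict mapping gene ID to the number of bases mapped to it.
    Returns (major, other_pct): a dict of gene -> percentage for genes
    at or above threshold_pct, and the pooled percentage of all other genes.
    """
    total = sum(mapped_bases.values())
    if total <= 0:
        raise ValueError("no mapped bases")
    shares = {gene: 100.0 * n / total for gene, n in mapped_bases.items()}
    major = {g: s for g, s in shares.items() if s >= threshold_pct}
    other_pct = sum(s for s in shares.values() if s < threshold_pct)
    return major, other_pct


# Hypothetical counts for a single mouse and time point.
counts = {"BT_1759": 900, "BT_1757": 99, "BT_0001": 1}
major, other_pct = gene_read_shares(counts)
```

Pooling the sub-threshold genes into one value mirrors the grouped bar used for low-abundance genes in Figure 2-8.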
Figure 2-8 Distribution of mapped bases to each Bt gene by mouse.
For each mouse and time point, ~10^9 sequenced bases were mapped to the B. thetaiotaomicron
genome. Of those mapped bases, the percentage mapping to each gene is shown. Genes with < 0.2%
are grouped together (dark gray bars). Specific genes >= 0.2% that were present in clones in one
mouse but not in the others are indicated in smaller font and colored differently.
In terms of functional groups rather than individual genes, of the 51.4% of Bt genes with
COG annotations, those related to carbohydrate transport and metabolism comprised 10% of the
input library. Averaged across the five mice, these carbohydrate transport and metabolism genes
increased to 25% of reads on Day 0.5, 72% on Day 1.5, and essentially 100% by Day 7 (Figure
2-9), suggesting the importance of carbohydrate transport and metabolism in in vivo fitness.
Figure 2-9 COG functional categories of bases mapped to the entire Bt genome averaged across the
five mice.
To rigorously determine the Bt genes that are differentially represented in the population
over time and to localize putatively selected regions to specific genes, we applied information
theoretic and statistical techniques for longitudinal analysis (243). In our analyses, transient
dominance of clones in vivo is of particular interest as different genes may confer fitness
advantages at distinct stages of host colonization. Further, our experiments capture competition
among ~100,000 strains harboring distinct genetic fragments, rather than traditional binary
competition experiments. Thus, we are interested in not only clones harboring Bt fragments that
show an increase over time in relative abundance, but also those clones that show a significantly
slower rate of depletion than other clones. To methodically detect these effects, for every Bt gene,
we computed two measures: (1) time-averaged relative abundance (TA-RA), and (2) time-averaged normalized effective coverage (TA-NEC). The TA-RA value is conceptually similar to
a time-integrated pharmacological dose value (244); in our analysis, it represents the average
“dose” of a particular donor gene, relative to all other donor genes present in vivo over a period
of time. The TA-NEC value quantifies the fraction of the gene that is effectively covered by
reads over a period of time. These measures are important to evaluate in tandem, since bystander
genetic loci may be differentially abundant in clones (i.e., high TA-RA values) simply because
they are contiguous with genes under selection; however, these loci are likely to be detectable as
spurious (i.e., low TA-NEC values) because they will often include only fragments of genes.
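To make the two measures concrete, the following Python sketch computes a time average by trapezoidal integration and a normalized effective coverage for a single gene. The exact formulas and the statistical testing used in this work are not reproduced here. The sketch assumes that (a) the time average is the area under the curve divided by elapsed time, by analogy with a time-integrated dose, and (b) normalized effective coverage is the exponentiated Shannon entropy of within-gene coverage divided by gene length. All function names and example values are hypothetical.

```python
import math


def time_average(times, values):
    """Trapezoidal time average of values sampled at increasing times."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        area += 0.5 * (values[i] + values[i - 1]) * dt
    return area / (times[-1] - times[0])


def normalized_effective_coverage(gene_coverage):
    """Assumed definition: fraction of a gene that is evenly covered."""
    total = sum(gene_coverage)
    if total == 0:
        return 0.0
    entropy = -sum((d / total) * math.log(d / total)
                   for d in gene_coverage if d > 0)
    return math.exp(entropy) / len(gene_coverage)


# Hypothetical relative abundances of one gene over early colonization.
days = [0.5, 1.5, 2.5, 4.0]
relative_abundance = [0.1, 0.3, 0.5, 0.5]
ta_ra = time_average(days, relative_abundance)

# A gene covered over only half its length, as for a bystander fragment.
half_covered = normalized_effective_coverage([5] * 50 + [0] * 50)
```

Under these assumptions a bystander locus can have a high time-averaged abundance while its coverage fraction stays low, which is the pattern the two measures are intended to separate when evaluated together.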
2.4.5 Genes showing transient selection during early gut colonization
We found 13 Bt genes during the early stage of gut colonization (up to Day 4) with
significantly larger than expected TA-RA and TA-NEC values (q-values < 0.05; Table 2-6).
These genes include those coding for enzymes involved in synthesis of extracellular capsular
polysaccharides and lipopolysaccharides (LPS), specifically D-glycero-alpha-D-manno-heptose-1,7-bisphosphate 7-phosphatase (gmhB) (BT_0477) and dTDP-4-dehydrorhamnose reductase
(rfbD; rmlD) (BT_1730). There are two biosynthesis pathways of nucleotide-activated glycero-manno-heptose that result in either L-β-D-heptose or D-α-D-heptose, which serve as precursors
or subunits in LPS, S-layer glycoproteins, and capsular polysaccharides (245). The E. coli GmhB
is critical for complete synthesis of the LPS core (246). The selection for Bt gmhB could allow E.
coli to expand its extracellular glycoprotein display, since E. coli GmhB is highly selective for β-anomers while Bt GmhB prefers α-anomers during hydrolysis of D-glycero-D-manno-heptose
1β,7-bisphosphate (247). BT_1730 (rfbD; rmlD) functions in dTDP-rhamnose biosynthesis,
which contributes to production of O-antigen, a repetitive glycan polymer in LPS, and potentially other
cell-membrane components. Deletion of rmlD in Vibrio cholerae results in a severe defect in
colonization of an infant mouse model (248), and uropathogenic E. coli lacking functional RmlD
lose serum resistance (249). Thus, expressing Bt rmlD could allow the recipient E. coli to alter
its antigenicity or resistance to host factors that would impede its initial colonization of the
mammalian gut.
Gene     Annotation                                                              TA-RA q-value        TA-NEC q-value
BT_0297  outer membrane lipoprotein SilC                                         3.06E-02             4.08E-04
BT_0370  galactokinase                                                           1.14E-03 (3.25E-03)  1.14E-03 (3.14E-03)
BT_0371  glucose/galactose transporter                                           5.94E-06 (6.95E-09)  3.50E-02 (4.21E-05)
BT_0477  D-glycero-alpha-D-manno-heptose-1,7-bisphosphate 7-phosphatase (gmhB)   1.67E-02             1.32E-02
BT_0478  hypothetical protein                                                    1.77E-03             2.47E-02
BT_1510  hypothetical protein                                                    3.86E-02             1.10E-03
BT_1511  outer membrane protein OmpA                                             4.38E-02             7.33E-04
BT_1730  dTDP-4-dehydrorhamnose reductase (rfbD; rmlD)                           3.86E-02             3.45E-04
BT_1731  hypothetical protein                                                    4.38E-02             8.80E-03
BT_1757  fructokinase                                                            1.58E-02             2.50E-03
BT_1759  glycoside hydrolase                                                     1.19E-02 (2.48E-07)  2.58E-04 (1.21E-09)
BT_1771  cell surface protein                                                    4.38E-02             4.00E-03
BT_4265  GMP synthase (guaA)                                                     3.86E-02             3.33E-02
Table 2-6 Statistical testing of in vivo selection of Bt genes.
Genes demonstrating significant in vivo selection profiles were determined via statistical
testing of time-averaged relative abundance (TA-RA) and time-averaged normalized
effective coverage (TA-NEC) values up to either Day 4 or Day 28 of host colonization.
Genes showing significant selection up to Day 4 are in white. Genes showing significant
selection up to both Day 4 and Day 28 are highlighted in orange. q-values for Day 28 are
listed in parentheses.
Several other genes with membrane-associated functions also showed increased selection
at Day 4, including outer membrane lipoprotein SilC (BT_0297), cell surface protein (BT_1771),
and outer membrane protein OmpA (BT_1511). These genes could confer increased capabilities
for E. coli to attach to the mucosal surface of the mammalian GI tract, or increased adaptations to
the gut chemical environment. For instance, Bacteroides fragilis lacking OmpA are more
sensitive to SDS, high salt, and oxygen exposure (250). In Bacteroides vulgatus, OmpA
additionally plays a role in intestinal adherence (251), and in Klebsiella pneumoniae, activates
macrophages (252).
Since nucleotide pools are tightly controlled in E. coli (253), the selection for Bt GMP
synthase guaA (BT_4265) may substantially affect intracellular guanine concentration,
translation regulation, and cell signaling. Inhibiting GMP synthase induces stationary phase
genes in Bacillus subtilis (254), and nucleotide concentrations drop when E. coli transition from
growth to stationary phase (255). These observations suggest that a copy of heterologous guaA
could enable escape of native tight regulation of the guanine pool to prolong the cell’s
exponential growth phase. Moreover, extra GMP synthase may further protect E. coli from
incorporating mutagenic deaminated nucleobases that would interfere with RNA function and
gene expression (256).
2.4.6 Genes showing long-term selection during gut colonization
We found three Bt genes over the entire period of colonization (up to Day 28) with
significantly larger than expected TA-RA and TA-NEC values (q-values < 0.05; Table 2-6); these
genes also showed significant selection during early colonization (up to Day 4). All three genes
are involved in sugar metabolism and transport, suggesting they may act to unlock more nutrient
resources for E. coli in the gut. We performed in vitro experiments, described below, to further
characterize the functions of these strongly selected loci, centered around a Bt glycoside
hydrolase (BT_1759) and galactokinase (BT_0370).
2.4.6.1 Glycoside hydrolase (BT_1759)
From Day 1.5 to Day 3 in the high-throughput sequencing data, we observed sharply
positive selection of glycoside hydrolase (BT_1759), which stabilized and continued to be
strongly selected for from Day 4 to Day 28 across all mice (Figure 2-10). We confirmed these
results with Sanger sequencing, which additionally allowed us to identify exact junctions and
directionality of isolated inserts. In clones from Days 7, 14, and 28, we observed the primary
selected insert to be 2.5 kb in length, beginning four nucleotides after the annotated glycoside
hydrolase (BT_1759) start codon, and ending about one-third of the way into the downstream
gene (glucose/galactose transporter). Notably, we also detected other inserts containing different
5’ truncated versions of the glycoside hydrolase in the late time points, both in our high-throughput and Sanger sequencing data (Figure 2-11).
Figure 2-10 BT_1759 glycoside hydrolase selection kinetics.
Graphs show fragments per kilobase per million mapped fragments (FPKM) fold change and normalized effective
coverage of genes BT_1757, BT_1758, and BT_1759. m1-5 = Mouse 1-5.
Figure 2-11 BT_1759 glycoside hydrolase read mapping profile.
The chart shows deep-sequencing reads mapped to each base in the region, with Sanger sequencing of
isolated clones below (insert lengths are drawn to scale with the gene map). Read values are the mean
across five mice and were normalized to 1 billion mapped bases per run to compare across time
points. Sanger sequencing was performed on ten clones per mouse at Day 7 and eight clones per
mouse at Day 28. nt = nucleotides.
Sonnenburg et al. previously demonstrated that periplasmic BT_1759 in Bt hydrolyzes
smaller fructooligosaccharides and sucrose (257). To functionally characterize BT_1759 and
surrounding genes when heterologously expressed in E. coli, we cloned the CDS of each into the
backbone vector and transformed it into the starting E. coli strain. None of the full-length genes
conferred growth in M9 minimal media with sucrose as the sole carbon source (Figure 2-12).
However, clones isolated from mice on Day 28 were able to metabolize sucrose. Furthermore,
retransformation of the DNA vectors from these clones into the starting E. coli strain also
conferred growth on sucrose, indicating that the phenotype was plasmid-borne. Interestingly,
sucrose utilization was enabled when we reconstituted the 4 nt truncation found in many of the
Day 7 and Day 28 Sanger-sequenced clones into the starting E. coli strain. These results suggest
that the truncation allows for appropriate processing of the signal sequence to express and
localize the Bt enzyme in the periplasmic space of E. coli, where sucrose is capable of entering
by diffusion.
Figure 2-12 BT_1759 glycoside hydrolase functional characterization in sucrose media.
Three sets of strains were studied in minimal media with sucrose as the sole carbon
source: 1) starting E. coli strains transformed with the CDS of each gene cloned into the
backbone vector, 2) E. coli clones directly isolated from stool samples, and 3) starting E.
coli strains re-transformed with individual plasmids isolated from stool samples. All
clones isolated at Day 28 carried the BT_1759 locus. Lines represent the mean.
2.4.6.2 Galactokinase (BT_0370), glucose/galactose transporter (BT_0371), and native
galactokinase reversion
In contrast to the selection profile of glycoside hydrolase (BT_1759), the galactokinase
(BT_0370) and glucose/galactose transporter (BT_0371) exhibited an earlier increase in
abundance that peaked at Day 2.5 and gradually declined over the remainder of the experiment
(Figure 2-13A). We observed a similar trend in Day 7 clones by Sanger sequencing, with no
clones containing BT_0370 or BT_0371 present at Day 28 (Figure 2-13B).
We confirmed that individually cloned BT_0370 and BT_0371 genes confer galactose
utilization in the starting E. coli strain when grown using M9 minimal media supplemented with
0.5% galactose as the sole carbon source (Figure 2-13C). To our surprise, E. coli isolated from
mouse stool at later time points were able to grow on galactose even though they carried
plasmids with glycoside hydrolase (BT_1759), and not the Bt galactose utilization genes
(BT_0370 and BT_0371). However, strains retransformed with BT_1759 were unable to grow
on galactose, suggesting that the stool-isolated strains gained the capability to use galactose
through mutations independent of the expression plasmid, namely in the recipient E. coli genome.
After confirmation that our starting E. coli strain was galK- due to the presence of an insertion
sequence (IS2), we hypothesized that stool isolates reverted to galK+ via loss of IS2. In stool-isolated clones from Days 7, 14, and 28, we found that the galK reversion occurred after Day 7
and was found in >75% of clones in four of five mice at Day 14 (Figure 2-13D). Interestingly, E.
coli harboring the insert library exhibited accelerated galK reversion in the mouse gut; in the
luciferase control mice, there was an overall reversion rate of only 50% by Day 28, as opposed to
100% in the mice that had been inoculated with the Bt library. The genomic galK reversion by
~Day 14 suggests that there is early selection for Bt galactokinase (BT_0370), but this foreign
gene is subsequently lost as the recipient E. coli regain native galactokinase activity, which
seems to have a fitness advantage over the heterologously expressed Bt galactokinase gene.
Figure 2-13 BT_0370 galactokinase and BT_0371 glucose/galactose transporter.
(A) Selection kinetics by fragments per kilobase per million mapped fragments (FPKM) fold change and normalized
effective coverage of genes BT_0368, BT_0369, BT_0370, BT_0371, and BT_0372.
(B) Mapped reads to each base in the region with deep sequencing and Sanger sequencing of
isolated clones (below). Read values are the mean across five mice and were normalized to 1 billion
mapped bases per run to compare across time points. Isolation of individual clones allowed for
insert size profiling at Day 7. Screened isolates from Day 28 did not reveal any galactokinase inserts.
(C) Functional characterization in minimal media with galactose as the sole carbon source. Three
sets of strains were studied: 1) starting E. coli strains transformed with the CDS of each gene
cloned into the backbone vector, 2) E. coli clones directly isolated from stool samples, and 3)
starting E. coli strains re-transformed with individual plasmids isolated from stool samples. All
clones isolated at Day 28 carried the BT_1759 locus. Lines represent the mean.
(D) Genotyping of the background E. coli genome at the galK locus. 20 clones isolated from mice
inoculated with the library at each of the indicated time-points were screened, while 30 clones
isolated from the lux control mice at Day 28 were screened.
2.4.7 In vivo genomic stability of E. coli recipient strain
Given the observed genomic galK reversion, we investigated whether other changes
occurred in the E. coli genome over the course of our in vivo experiments. Neither the genomic stability of
bacterial cells in the gastrointestinal tract nor microbial mutation rates in vivo have been characterized in
great detail. We performed whole genome
sequencing of 13 E. coli isolates from stool of mice inoculated with the library and two E. coli
isolates from stool of mice inoculated with the control luciferase construct. Of the isolates from
mice that were inoculated with the library, seven were from Day 7 samples containing either
BT_1759 or BT_0370 inserts and six were from Day 28 samples containing BT_1759 inserts. In
addition to searching for variants in the E. coli genome, we also looked for variants on the library
plasmid with the known insert locus, and the F plasmid, which was present in the starting E. coli
strain. Overall, we found single nucleotide variants (SNVs) in only three of the 15 isolates (Table
2-7).
One of the isolates from a luciferase control mouse harbored three mutations. One SNV
was in the coding sequence of adenylate cyclase cyaA, while the other two SNVs were in
intergenic regions, between tRNAs lysW and valZ, and between traJ and traY on the F plasmid.
The functional effects of these SNVs, if any, are unclear. The operon structure of the tRNA
region may be lysT-valT-lysW-valZ-lysYZQ (258), or valZ-lysY could be a separate operon as
predicted in EcoCyc (259), in which case the SNV could affect transcription of the downstream
tRNAs. As for the traY promoter variant, the -35 hexamer has been documented to be TTTACC
(260). The SNV T>C changes it to CTTACC, which could weaken the promoter to decrease
expression of TraY, a DNA-binding protein involved in initiation of DNA transfer during
conjugation.
One E. coli isolate from a library-inoculated mouse had a genomic change that conferred
increased growth on galactose. Isolate 1 from Mouse 1 on Day 28 had a mutation in the lactose /
melibiose:H+ symporter, lacY (F27S), which is a missense mutation in the first transmembrane
region (261). We did not observe phenotypic differences on MacConkey-lactose plates, since the
E. coli recipient strain has a deletion in lacZ, and thus all of our isolates were Lac-. However, the
lacY (F27S) mutant reached a higher density in M9 galactose compared to other Day 28 isolates,
which also carried the same plasmid-borne Bt glycoside hydrolase (Figure 2-14A). This clone
also grew to a greater density than E. coli recipient strains in which we had cloned the Bt
galactokinase operon (BT_0370-BT_0372) (Figure 2-14B). The lacY transporter can transport
galactose in addition to lactose, and lacY mutants have been shown previously to confer faster
growth of E. coli MG1655 on galactose (262).
Sample                                  Insert locus & size (kb)  Genomic galK+/-  Variant position on E. coli genome  Variant impact & coverage
NEB Turbo control                       -                         galK-
Day 7 Mouse 1 clone 3                   BT_0370 (4.0)             galK-            SNV 2976657 G>T                     galR (R20L) [34/34 reads]
Day 7 Mouse 1 clone 1                   BT_1759 (2.5)             galK-
Day 7 Mouse 2 clone 5                   BT_0370 (4.3)             galK-
Day 7 Mouse 3 clone 1                   BT_1759 (3.1)             galK-
Day 7 Mouse 4 clone 4                   BT_0370 (4.1)             galK-
Day 7 Mouse 5 clone 2                   BT_1759 (2.5)             galK+
Day 7 Mouse 5 clone 4                   BT_1759 (3.1)             galK-
Day 28 Mouse 1 clone 1                  BT_1759 (2.5)             galK+            SNV 363100 A>G                      lacY (F27S) [173/173 reads]
Day 28 Mouse 2 clone 1                  BT_1759 (2.5)             galK+
Day 28 Mouse 3 clone 1                  BT_1759 (2.5)             galK+
Day 28 Mouse 4 clone 1                  BT_1759 (2.5)             galK+
Day 28 Mouse 5 clone 1                  BT_1759 (2.5)             galK+
Day 28 Mouse 5 clone 4                  BT_1759 (4.2)             galK+
Day 28 Mouse 7 clone 1 (lux control)    -                         galK+
Day 28 Mouse 10 clone 2 (lux control)   -                         galK-            1. SNV 3991675 G>A                  1. cyaA (G175S) [122/122 reads]
                                                                                   2. SNV 780994 G>A                   2. intergenic, between lysW and valZ [108/108 reads]
                                                                                   3. F plasmid SNV 67772 T>C          3. traY promoter (-35) [229/229 reads]
Table 2-7 Genetic variants in mouse-isolated clones identified by whole genome sequencing.
Remarkably, we found an interaction between the E. coli genome and a heterologously
expressed Bt gene. Isolate 3 from Mouse 1 on Day 7 had an SNV in the galactose repressor, galR
(R20L), in its DNA binding domain (263). E. coli GalR binds operator sequences upstream of
the galETK operon (264), and the amino acid substitution of leucine for arginine could be
disruptive to binding. Using MacConkey-galactose plates, we found that the galR (R20L) isolate
was Gal+, whereas a similar Day 7 clone, which also had a genomic galK- genotype and a Bt
galactokinase (BT_0370) insert but no galR SNV, exhibited a Gal- phenotype. Since the
BT_0370 inserts in the Day 7 clones were not identical (Figure 2-14B), we re-transformed the
plasmids into the starting E. coli strain to confirm the phenotype and rule out effects from an
underlying chromosomal galR mutation. In M9 galactose medium, the galR (R20L) mutant grew
to a higher cell density than a wild-type galR strain with the same Bt galactokinase plasmid
(Figure 2-14C). These findings indicate that the E. coli genome had co-evolved with the in vivo
selection of plasmids carrying Bt genes for galactose utilization.
We found no mutations in the library plasmids or Bt genes, and, aside from loss of the
IS2 element in the galK gene, all other IS elements were intact on the E. coli genome. In
aggregate, these small numbers of variants (~10^-8 mutations per bp per day) in the E. coli
recipient strain suggest that outside of genetic loci with selective pressures exerted upon them,
the organism remained genetically stable in the mammalian gut over the course of our
experiment.
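The order of magnitude quoted above can be checked with a back-of-the-envelope calculation. The Python sketch below is one way to arrive at such a figure and is not necessarily the calculation used here. It assumes a genome size of roughly 4.6 Mb for the recipient strain (an assumption, not a value stated in this work), counts the five SNVs listed in Table 2-7 (including the one on the F plasmid), excludes the IS2 loss at galK, and treats each isolate's time in the gut as its exposure.

```python
# Assumed genome size in bp for the recipient strain (not stated in the text).
GENOME_SIZE_BP = 4.6e6

# Five SNVs from Table 2-7: galR, lacY, cyaA, lysW/valZ intergenic, traY promoter.
SNV_COUNT = 5

# Sequenced isolates by sampling day: 7 from Day 7, and 8 from Day 28
# (6 library isolates plus 2 luciferase control isolates).
ISOLATES_BY_DAY = {7: 7, 28: 8}


def mutations_per_bp_per_day(snv_count, genome_size_bp, isolates_by_day):
    """SNVs divided by total exposure, measured in bp x days across isolates."""
    bp_days = sum(genome_size_bp * day * n for day, n in isolates_by_day.items())
    if bp_days <= 0:
        raise ValueError("total exposure must be positive")
    return snv_count / bp_days


rate = mutations_per_bp_per_day(SNV_COUNT, GENOME_SIZE_BP, ISOLATES_BY_DAY)
```

Under these assumptions the estimate falls between 10^-9 and 10^-8 mutations per bp per day, in line with the approximate value given above.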
Figure 2-14 Growth characterization of clones with genomic SNVs.
Growth curves over 42 hours at 37 °C in M9 with 0.2% galactose and carbenicillin of (A)
mouse-isolated clones from Day 28 and (B) BT_0369, BT_0370, BT_0371, BT_0372, and
BT_0370-BT_0372 cloned into the starting recipient E. coli strain. The mean of four
replicates is plotted in filled circles; error bars represent the standard deviation. (C)
Endpoint optical density after 96 hours of growth. Two mouse-isolated strains with the
BT_0370 insert were compared to isogenic strains transformed with those plasmids (4.0
or 4.3 kb insert). The strain with the galR SNV is shown in red. Lines represent the mean.
2.5 Discussion
We have demonstrated the use of TFUMseq for high-throughput in vivo screening of
genetic fragments from an entire donor genome from a commensal microbe to increase the
fitness of a phylogenetically distant bacterial species in the mammalian gut. To our knowledge,
this is the first demonstration of temporal functional metagenomics using shotgun libraries
applied to the in vivo mammalian gut environment. Our findings attest to the value of a time-series approach, as the shifts in population dynamics of clones harboring different gene
fragments would not have been discovered if we had only obtained endpoint data. Further, we
introduced computational methods using information theoretic measures and statistical
longitudinal analysis techniques that allowed us to identify and localize significant selection of
donor genes over time.
In this demonstration of the TFUMseq approach using an E. coli plasmid library of Bt
genes, we uncovered sequential selection of clones with different carbohydrate utilization genes
– first for galactose and then for sucrose metabolism. Galactose plays a substantial role in
selection in our experiment, as all three of the observed E. coli genomic mutations (in galK, lacY,
and galR) affected galactose utilization, and we observed selection for Bt galactokinase
(BT_0370) and glucose/galactose transporter (BT_0371) in vivo. Galactose is a component of the
hemi-cellulose that makes up part of the 15.2% neutral detergent fiber in mouse chow, although
galactose composition was not explicitly provided by the manufacturer. Galactose is also a
component of mammalian mucin in the GI tract (265). However, our observation that in vitro
selection occurs for the BT_0370 and BT_0371 galactose utilization locus in MC medium
indicates that the mouse chow diet itself is providing sufficient galactose to exert selective
pressure at least in part. During our in vivo colonization experiments, once E. coli restored native
galactokinase (galK) activity in its genome through loss of IS2, Bt genes that catabolized a
second carbon source, sucrose, became dominant. Sucrose is a dominant simple carbohydrate in
mouse chow, present at 0.71% (w/w) in comparison to 0.22% for glucose and fructose. Per
Freter’s nutrient-niche hypothesis, which described substrate-level competition and substrate-limited population levels (266), our results suggest that galactose is preferred over sucrose, and
that a clone capable of utilizing both carbon sources will outcompete clones capable of only
using one of the sources. Nutrient-based niches have been documented in the mammalian GI
tract, including the varying sugar preferences among commensal and pathogenic E. coli strains
(267), and polysaccharide utilization loci (PULs) in Bacteroides species that promote long-term
colonization (268). In fact, the enterohemorrhagic E. coli strain EDL933 can use sucrose, while
commensal E. coli strains K-12 MG1655, HS, and Nissle 1917 cannot (267). Incorporating
sucrose utilization, such as through the truncated Bt glycoside hydrolase (BT_1759) identified in
this study, could enhance retention of probiotic E. coli strains. Pre-colonization with sucrose-utilizing probiotic strains to occupy the sucrose niche could also be an effective strategy to resist
pathogen colonization.
Bt has been investigated previously using transposon mutagenesis systems coupled to
mouse gut colonization experiments (104), facilitating comparison of our results to the prior
study. Goodman et al. found no difference in abundances of galactokinase (BT_0370) mutants in
vitro but BT_0370 mutants were underrepresented in vivo. In contrast, in our study, the Bt
galactokinase was selected for not only in vivo, but also in vitro. Furthermore, Goodman et al.
found dTDP-4-dehydrorhamnose reductase (BT_1730) and GMP synthase (BT_4265) mutants
were underrepresented both in vitro and in vivo. However, in our study, BT_1730 and BT_4265
seemed to confer fitness only in vivo. The in vitro discrepancies may be a result of slightly
different culturing and media conditions. The in vivo results are in agreement for BT_0370,
BT_1730, and BT_4265, though the other genes we identified in our experiments were not
significantly altered in representation in the transposon mutagenesis experiments, highlighting
the different capabilities of the two approaches.
Overall, we expect TFUMseq to be a powerful tool for engineering commensal microbes
with new or enhanced capabilities, as it provides a general approach to functionally identifying
genes from metagenomic DNA that enhance microbial fitness in vivo. Going forward, there are
two primary considerations for designing future TFUMseq experiments: the choice of the
bacterial strain to receive the donor plasmid library, and the mammalian host environment. In
this study, we used a cloning strain of E. coli as the recipient bacteria, which enabled the
generation of a robust, high-quality library. This strain has inactivated restriction systems, thus
preventing underrepresentation of DNA inserts in the library that may contain otherwise
recognized methylated sites from the donor source. Further, the lack of prior host-adaptation of
this laboratory strain in vivo, in comparison to a wild-type adapted commensal strain, allows for
stronger selection signals from clones harboring functional donor genes. As we saw, the recipient
strain also plays a role in the co-evolution of the insert library and the bacterial genome. We
observed a genomic change, specifically the galK reversion, driving the shift in library selection
from Bt galactokinase (BT_0370) to Bt glycoside hydrolase (BT_1759). Furthermore, we found
single nucleotide variations in E. coli galR and lacY loci that boosted galactose utilization in
clones harboring functional Bt genes. Given that co-evolution drives genomic changes in the
recipient strain, using a well-characterized recipient strain facilitates mechanistic interpretation
of these changes.
The state of the mammalian host is also a critical variable in our approach. In this work,
germ-free mice were mono-associated with the library. We expect that the results of in vivo
selection may differ when mice are pre-colonized with a microbiota due to changes in nutrient
availability and other ecological interactions, including competition or syntrophy. For instance,
co-colonization experiments demonstrated that probiotic strains and commensal bacteria have
adaptive substrate utilization. Bt shifts its metabolism from mucosal glycans to dietary plant
polysaccharides when in the presence of Bifidobacterium animalis, Bifidobacterium longum, or
Lactobacillus casei (199). Bacteroides species are also known to engage in public-goods based
syntrophy by releasing outer membrane vesicles (OMVs) that contain surface glycoside
hydrolases or polysaccharide lyases (269). These enzymes catabolize large polysaccharides into
smaller units, which can then be utilized by other species in the community. Given the
complexities of multispecies bacterial communities, TFUMseq’s ability to track large numbers of
clones over time will be important for detecting relevant genes that confer a fitness advantage
within dynamically changing communities.
Our results suggest several future studies using TFUMseq. Replication of our
experiments in additional cohorts of mice would be valuable. In this study, mice were separately
caged in the same gnotobiotic isolator, and we employed meticulous techniques to avoid
cross-contamination. We did not observe evidence of isolates being exchanged between mice, and in
fact, saw unique selection patterns for each mouse (Figure 2-8) and were able to isolate different
clones carrying non-identical fragments from different mice. Nonetheless, future experiments in
which our study was repeated in a different gnotobiotic isolator would be useful to characterize
the variability of the entire process. Further, it would be of interest to understand the influence of
host genetics and nutrition on the selection of genes in our library, which could be investigated
by repeating our study using different strains of mice or placing mice on different diets such as
high-fat/high-sugar chow. Also, potential investigations could use total metagenomic DNA from
stool samples, rather than DNA from cultured organisms. Another area of interest would be
probing community composition and dynamics of selection in different regions of the gut. These
studies would provide insights into biogeographical niches coupled with temporal data provided
by our method. TFUMseq could also be used to build a better probiotic strain. One could
incorporate a metagenomic plasmid library into a probiotic strain and introduce the strain into a
complex host-bacterial community to isolate genes that increase the strain’s fitness in vivo. We
have already identified sucrose utilization as an important and feasible trait to incorporate into an
enhanced probiotic strain. Ultimately, TFUMseq-based studies could enable the rational design
of probiotic or commensal strains for various clinical applications, such as resisting pathogen
colonization, compensating for a high-fat/high-sucrose Western diet, or tempering host
autoimmunity.
2.6 Data Availability
All sequencing data generated in this study are publicly available at NCBI SRA under
accession number SRP051326. Detailed protocols and calculated effective gene coverage and
FPKM values for each gene and mouse from the in vivo experiment are available online at
http://msb.embopress.org/content/11/3/788 as Supplementary Materials and Datasets.
2.7 Acknowledgements
This work was supported by grants from the Harvard Digestive Diseases Center (Pilot
and Feasibility Grant to GKG, under grant P30DK034854), the National Institutes of Health
Director’s Early Independence Award (grant 1DP5OD009172-01 to HHW), the US Department
of Energy (grant DE-FG02-02ER63445 to GMC), and the Wyss Institute for Biologically
Inspired Engineering. SJY also acknowledges support from the National Science Foundation
Graduate Research Fellowship and the MIT Neurometrix Presidential Graduate Fellowship.
GKG also acknowledges support from the Brigham and Women’s Department of Pathology.
Chapter 3
Delivering and maintaining genetic elements
3.1 Background
This chapter explains our work in two areas: 1) introducing genetic elements into
complex microbial systems, and 2) limiting the transfer of elements that pose a threat by
immunizing native microbial strains. Our efforts to transfer genetic elements, to target specific
species or particular genes, and to stably maintain the elements depend on bacterial conjugation
and the prokaryotic adaptive immune system, CRISPR-Cas9. Since conjugation is thought to be
most relevant in highly dense communities such as the human gut, we were interested in
harnessing conjugative plasmids as a better alternative to current microbiota manipulations,
including administering probiotics, antibiotics, and fecal transplants. (Limitations of these
approaches are described in more detail in Section 3.1.1, while a brief primer on horizontal gene
transfer is presented in Section 3.1.2.) To enable more precise and longer-lasting treatments, we
investigated the potential of delivering self-transmissible vectors that carried Cas9 cassettes,
which could not only prevent cell uptake of pathogenesis genes, but also copy themselves into
specific strains to more stably immunize the cell.
In Section 3.2, we propose to mobilize a vector from a donor strain into the native
microbiota to circumvent the problem of colonization resistance. These vectors can be utilized to
transfer immunomodulators or fitness genes identified in Chapter 2. In order to measure
conjugation rates, we developed universal media to co-culture different representative microbiota
species, profiled antibiotic resistance, and generated molecular identification and quantification
methods for each species. These efforts form the basis for characterizing transfer efficiencies of
different conjugative plasmids across various bacterial species.
In Section 3.3, we introduce a Cas9 payload that inhibits the acquisition of antibiotic
resistance and toxin genes. This approach immunizes cells against entry of these mobile genetic
elements and actively eliminates any that are already present in the cell. While building large
arrays of CRISPR spacers that would target multiple sequences, we encountered issues with
synthesizing and maintaining these highly repetitive constructs (due to the same CRISPR repeat
sequence that must be interspersed between the spacers). Therefore we built alternate CRISPR
structures that would lower the instability from recombination at the repeat sequences. Then, we
demonstrated the feasibility of a “genome-copying” version of the Cas9 payload that can
integrate itself into a pre-defined location on the bacterial genome. Combining efficient
conjugative plasmids for a particular donor-recipient pair of strains and the genome-copying
Cas9 cassette provides enhanced long-term stability of the engineered payload in the recipient
population.
3.1.1 Limitations of current microbiota manipulations
In spite of the popularity of probiotics (i.e., live microbes that may promote human
health), few studies have proven their clinical benefit, and the examples are limited to specific clinical
conditions such as prevention of pouchitis and atopic dermatitis (270). Unless native microbiota
are cleared, probiotic strains do not colonize the gut and provide only transient effects for the
length of their passage through the GI tract. Similarly, probiotic strains engineered as vehicles
for immuno-modulators (e.g., IL-10) and various mucosal vaccines are designed to survive long
enough in the GI tract to deliver their payload (172, 271–273). Colonization resistance (CR)
describes the ability of a stably established microbiota to prevent introduced species from
colonizing the same niche. While the mechanisms of CR are incompletely understood, it is thought that the
native microbiota deplete nutrients, directly inhibit pathogens, and stimulate host defenses (274).
Antibiotics are commonly prescribed for bacterial infections, and may be effective
against Clostridium difficile infection and IBD (275). However, widespread antibiotic use has led
to rapid dissemination of antibiotic resistance genes through MGEs (276, 277). Antibiotics
nonspecifically kill native microbiota while selecting for antibiotic-resistant strains that can now
expand into cleared niches. This is a major concern in hospitals, where healthcare providers and
common facilities can be transmission vectors, and patients may already be immune-compromised.
Compounded with the diminishing arsenal of novel antibiotic compounds, multidrug-resistant
strains of opportunistic pathogens are an increasing threat in both hospital- and
community-acquired infections.
Documented as early as the 4th century by a Chinese doctor, fecal microbiota
transplantation (FMT) has received renewed interest in the past few years as a cure for refractory C.
difficile infection (278). Current FMT practice involves a colonoscopy procedure in which a
homogenized stool filtrate prepared from a healthy donor is infused into the patient’s colon (279)
to restore microbial diversity and stability, though more work is needed to characterize long-term
colonization and address concerns of pathogen transmission.
3.1.2 Horizontal gene transfer
Lateral or horizontal gene transfer (HGT) is the acquisition of genetic material from
another organism by a route other than parent-to-offspring inheritance. Bacterial DNA transfer can
occur via conjugation, transduction, or transformation. Conjugation occurs via direct cell-to-cell
contact, transduction via bacteriophage, and transformation via naked DNA uptake from the environment by induced
or naturally competent cells. Multispecies communities harbor a dynamic gene pool consisting of
mobile genetic elements (MGEs), such as transposons, plasmids, and bacteriophages, which
serve as a source of HGT to share beneficial functions with neighbors to preserve community
stability (24, 25). Dense communities are active sites for gene transfer and reservoirs for
antibiotic resistance genes (12, 28–30, 280). In particular, bacterial conjugation can occur
frequently in the densely populated mammalian gut (281, 282).
3.2 Engineering horizontal gene transfer networks
3.2.1 Introduction
Of the 10^11 cells in the human body, 90% are microbes that naturally inhabit the
gastrointestinal tract, oral cavity, skin, and other mucosal surfaces. This commensal microbial
community, called the microbiota, has intricate effects on human health, such as the development
and function of the host’s metabolism and immune system. The human gut is home to the most
densely populated microbial community characterized to date and is an active site of horizontal
gene transfer. We propose to control DNA transfer using bacterial conjugation, the transfer of
genetic material between bacteria in close contact. This work contributes to the broader goal of
engineering the human microbiome for prophylactic and therapeutic applications. Mobile genetic
elements, such as broadly conjugative plasmids, would be useful vehicles for delivering and
propagating heterologous genes in the microbiota in a controlled manner.
To allow for simultaneous growth of different species in conjugal matings, we developed
universal growth media that were optimized for pH and supplementation of amino acids, sugars,
fatty acids, vitamins, and minerals. For each species, we characterized the antibiotic resistance
profile, which is used for selection. To confirm the identity of each strain, we designed
species-specific primers to perform colony PCR on individual isolates after a conjugation and confirm
the expected band size of the amplicon by gel electrophoresis. Furthermore, we developed
quantitative real-time PCR primers to quantify each species – this enables measuring transfer
rates in mixed cultures.
The conjugation system we used is based on RK2 (283), a member of the IncPα plasmid
family, whose only required cis-acting element for DNA transfer is the transfer origin (oriT). We
successfully transferred plasmids carrying RK2-oriT from E. coli into Gram-negative species
Bacteroides fragilis, B. thetaiotaomicron, B. vulgatus, and B. uniformis, and Gram-positive
species Enterococcus faecalis, Lactobacillus reuteri, and Streptococcus mutans. We also
demonstrated the ability of B. fragilis and B. vulgatus to transfer the plasmid back into E. coli.
3.2.2 Materials and Methods
3.2.2.1 Strains and constructs
Our studies included the following strains: Escherichia coli K-12 MG1655,
Bifidobacterium adolescentis ATCC 15703, Bacteroides fragilis ATCC 25285, Bacteroides
thetaiotaomicron ATCC 29148, Bacteroides uniformis ATCC 8492, Bacteroides vulgatus ATCC
8482, Enterococcus faecalis ATCC 29200, Lactobacillus reuteri ATCC 23272, Lactobacillus
rhamnosus GG ATCC 53103, Lactobacillus paracasei ATCC 25302, Salmonella enterica,
Streptococcus mutans ATCC 700610, and Streptococcus sanguinis ATCC BAA-1455. These
strains will be abbreviated MG, Bado, Bfra, Bthe, Buni, Bvul, Efae, Lreu, Lrha, Lpar, Sent, Smut, and
Ssan, respectively, hereafter. MG strains with the conjugative plasmid RK2 are referred to as
MGRK2.
For conjugations, we used E. coli strain S17-1λpir (a gift from Andy Goodman). We
focused on conjugative plasmid pFD340 (284) (a gift from C. Jeffrey Smith) and a newly
constructed plasmid pBC003 (Figure 3-1).
pFD340 is an E. coli-Bacteroides shuttle vector constructed by merging RK2-oriT, E.
coli origin of replication pBR322, and a selectable marker (bla) that functions in E. coli with a
Bacteroides cryptic plasmid pBI143 (284) and a selectable marker that functions in Bacteroides,
namely ermFS from the Bacteroides transposon Tn4351 (285).
pBC003 combines the RK2-oriT, Bacteroides origin of replication pBI143, E. coli origin
of replication pBR322, bla marker, and ermFS marker from pFD340 with the origin of
replication from plasmid pAMbeta1 (a gift from Todd Klaenhammer) and ermBP marker from
pTRKH3-ldhGFP (Addgene plasmid 27167, (112)). pAMbeta1 is a broad host range plasmid
in Gram-positives. While ermFS is known to confer erythromycin resistance in Bacteroides, we
included ermBP for erythromycin selection in Gram-positives.
Figure 3-1 Maps of plasmids used in this study.
3.2.2.2 Microbiological selection methods
We tested three sets of media formulations. Their compositions are listed in detail in
Table 3-1, Table 3-2, and Table 3-3. To minimize the effect of evaporation in wells on the edges
of our 96-well plates, we systematically altered the order of strains and media conditions across
three plates, as depicted in Figure 3-2. The final 1X concentrations of antibiotics or other
selective compounds we used in our study were: carbenicillin 50 μg/mL, chloramphenicol 20
μg/mL, cefoxitin 20 μg/mL, 2-deoxy-d-galactose 0.05%, erythromycin 25 μg/mL, gentamicin
200 μg/mL, kanamycin 50 μg/mL, rifampicin 20 μg/mL, sodium dodecyl sulfate 0.01%,
spectinomycin 100 μg/mL, tetracycline 10 μg/mL, and trimethoprim 10 μg/mL.
Table 3-1 Composition of first set of growth media.
Table 3-2 Composition of second set of growth media.
Table 3-3 Composition of third set of growth media.
Figure 3-2 Triplicate design to minimize effects of evaporation in edge wells.
3.2.2.3 Molecular identification
We wrote custom Perl scripts to interface with locally installed BLAST (286) to design
potentially species-specific primers that could be validated experimentally. Our algorithm
entailed the following steps:
1. Download nucleic acid and amino acid sequences of all genes in each strain.
2. Perform “all-against-all” BLASTp. This carries out BLASTp between all pairs of sequences
in the set of strains.
3. Create table of genes not listed in the hit table (since hits correspond to highly similar
sequences) – this minimizes inter-species and intra-species non-specificity.
4. Select longest genes for each strain to proceed with primer design.
5. Optimize primers for melting temperature (Tm), amplicon length, and species-specificity.
a. Amplicon length and Tm are critical for PCR and qPCR considerations. For screening
by PCR, one may consider designing different sized bands to easily identify different
strains by running PCR products out on a gel. For qPCR, all amplicons were designed
to be ~100 bp in length.
b. To check species-specificity, run BLASTn of the candidate primers against the
database of the strains of interest, as well as the entire NCBI database.
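Steps 3 and 4 of the procedure above can be illustrated with a minimal sketch. It is written in Python for illustration and is not the Perl scripts used in this work. The input structures (a dictionary of gene lengths and a list of query/subject pairs parsed from the all-against-all hit table), the function names, and the `per_strain` cutoff are all hypothetical.

```python
from collections import defaultdict


def genes_without_hits(gene_lengths, hit_pairs):
    """Return the genes that have no BLASTp hit to any other gene.

    gene_lengths: dict mapping gene_id -> (strain, gene_length)
    hit_pairs:    iterable of (query_id, subject_id) rows from the hit table
    """
    hit_genes = set()
    for query, subject in hit_pairs:
        if query != subject:  # a gene always hits itself; ignore that row
            hit_genes.add(query)
            hit_genes.add(subject)
    return {g: v for g, v in gene_lengths.items() if g not in hit_genes}


def longest_unique_genes(gene_lengths, hit_pairs, per_strain=3):
    """For each strain, list the longest genes that are absent from the hit table."""
    unique = genes_without_hits(gene_lengths, hit_pairs)
    by_strain = defaultdict(list)
    for gene, (strain, length) in unique.items():
        by_strain[strain].append((length, gene))
    return {
        strain: [gene for _, gene in sorted(genes, reverse=True)[:per_strain]]
        for strain, genes in by_strain.items()
    }


# Toy example with made-up gene identifiers and lengths.
toy_genes = {
    "Bthe_1": ("Bthe", 1200),
    "Bthe_2": ("Bthe", 900),
    "Bthe_3": ("Bthe", 300),
    "Bvul_1": ("Bvul", 1500),
    "Bvul_2": ("Bvul", 700),
}
toy_hits = [("Bthe_1", "Bthe_1"), ("Bthe_2", "Bvul_2"), ("Bvul_2", "Bthe_2")]
candidates = longest_unique_genes(toy_genes, toy_hits, per_strain=1)
```

The candidate genes returned here would still need primer optimization and the BLASTn specificity check described in step 5, followed by experimental validation.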
3.2.2.4 Conjugation experiments
Prior to conjugation, donor and recipient strains were grown to saturation. E. coli strains
were grown aerobically in LB at 37 °C; all other strains were grown
anaerobically (GasPak 100 System, Becton Dickinson, Franklin Lakes, NJ) in rich medium, such
as supplemented Brain Heart Infusion or 3:2pas (Table 3-4). In a typical conjugation experiment
(Figure 3-3), 1 mL of each saturated culture per conjugal mating was washed by centrifugation at
5000 rpm for 2.5 min and resuspension of the pellet in rich medium. This wash was repeated once
to remove antibiotics. Then equal volumes of resuspended donor and recipient strains were
combined and spun down again. The concentrated mixture was plated as three 25 μL puddles
on rich medium agar without antibiotics. The puddles were allowed to air-dry for 10 min on the
bench. Plates were then incubated agar-down at 37 °C for 5 hours aerobically and protected from
light. The puddles were collected with 1 mL of media, using a sterile cell scraper to gently
detach cells from the agar surface. The collected mixture was plated at various dilutions on
selection agar plates that would quantify all donors, all recipients, and all transconjugants.
To calculate a transfer frequency, the number of transconjugants was divided by the
number of recipients. To further confirm the stability of the transferred plasmid in
transconjugants, isolated colonies were re-streaked onto fresh plates. The presence of the plasmid
and the identity of the strain were confirmed by PCR using species- or plasmid-specific primers.
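The transfer frequency calculation can be written out as a minimal sketch. It assumes colony counts from dilution plating are first converted to colony forming units (CFU) per mL of the collected mating mixture. The function names, the example counts, and the convention that a dilution factor of 10 means a tenfold dilution are illustrative assumptions, not values from this study.

```python
import math


def cfu_per_ml(colonies, dilution_factor, volume_plated_ml):
    """CFU per mL of the undiluted suspension, from one countable plate."""
    return colonies * dilution_factor / volume_plated_ml


def log10_transfer_frequency(transconjugants_per_ml, recipients_per_ml):
    """log10 of transconjugants per recipient; None if either count is zero."""
    if transconjugants_per_ml <= 0 or recipients_per_ml <= 0:
        return None  # below the detection limit, reported as "not detected"
    return math.log10(transconjugants_per_ml / recipients_per_ml)


# Hypothetical plate counts: 40 transconjugant colonies from 0.1 mL of a tenfold
# dilution, and 200 recipient colonies from 0.1 mL of a 10^6-fold dilution.
transconjugants = cfu_per_ml(40, 10, 0.1)
recipients = cfu_per_ml(200, 1e6, 0.1)
frequency = log10_transfer_frequency(transconjugants, recipients)
```

Reporting the result on a log10 scale matches the way frequencies are tabulated in the Results.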
Figure 3-3 Conjugation mating experimental workflow.
3.2.3 Results
3.2.3.1 Differential growth media and antibiotics
We selected representative microbiota species (7) to use in our growth media studies,
which explored various components, such as supplements and pH. From growth curves grouped
by strain (Figure 3-5) and by media (Figure 3-6), we found that in the first set of media
conditions, FF, HHB, FFB, and FFBC provided sufficient nutrients for all 11 microbial species.
These results were consistent with the second set of media conditions, in which we explored the
effect of pH (Figure 3-7 by strain and Figure 3-8 by media). FF media at pH 6.5, 7.0, and 7.5, in
addition to defined medium EZm, allowed growth of all species. However, the more complex
medium, HB, which was supplemented with vitamins, mineral mix, and fatty acids, proved
detrimental to growth of several species. Therefore, we investigated the simpler HHB medium
from the first set of experiments and changed the 1:1 ratio of BHI to MRS mix to
3:2. Comparing variations of this medium and defined medium AZ (Figure 3-9), we determined
that the optimal universal growth medium would be “3:2pas” (Table 3-4). Using 3:2pas, we
profiled the antibiotic resistance of each strain (Figure 3-10).
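As an illustration of how a minimum inhibitory concentration (MIC) could be read from dose response data of the kind summarized in Figure 3-10, the sketch below takes endpoint optical densities at the five dosage levels (multiples of the 1X concentration) and returns the lowest dose at which growth stays inhibited. The 10% growth cutoff, the function name, and the example readings are assumptions made for this sketch; they are not the criteria used in this study.

```python
def minimum_inhibitory_dose(od_by_dose, no_drug_od, threshold=0.1):
    """Lowest dose at which this and every higher dose inhibits growth.

    od_by_dose: dict mapping dose (in multiples of 1X) -> blank-subtracted OD
    no_drug_od: blank-subtracted OD of the same strain grown without drug
    threshold:  fraction of no-drug growth counted as "inhibited" (assumed)
    """
    cutoff = threshold * no_drug_od
    mic = None
    # Walk down from the highest dose and stop at the first dose that permits growth.
    for dose in sorted(od_by_dose, reverse=True):
        if od_by_dose[dose] <= cutoff:
            mic = dose
        else:
            break
    return mic  # None means growth was not inhibited even at the highest dose


# Hypothetical endpoint readings at 0.1, 0.5, 1, 5, and 10 X.
readings = {0.1: 0.80, 0.5: 0.70, 1: 0.05, 5: 0.02, 10: 0.01}
mic_in_x = minimum_inhibitory_dose(readings, no_drug_od=0.90)
```

Multiplying the returned dose by the 1X concentration of the drug gives the MIC in μg/mL.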
3.2.3.2 Species-specific PCR and qPCR primers
We developed molecular identification and quantification methods for representative
microbiota species. Validated PCR primers for each are listed in Table 3-5. For real-time qPCR,
standard curves were validated for ten species (representative standard shown for B. vulgatus in
Figure 3-4). A threshold cycle number was directly converted to an equivalent OD value, which
was used to calculate the number of cells. Cell numbers were calibrated from serial dilutions and
plating on solid agar to count colony forming units. We confirmed linearity of the standard
curves and specificity by analyzing the melting curve. Final qPCR primers are listed in Table 3-6.
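The conversion from threshold cycle to cell number can be sketched as follows, assuming the usual linear relationship between threshold cycle (Ct) and log10 of template amount in a dilution series. The least-squares fit, the function names, and the calibration constant `cells_per_od_unit` (which would come from plating serial dilutions and counting colony forming units) are illustrative assumptions rather than the exact analysis used here.

```python
def fit_standard_curve(log10_od, ct_values):
    """Least-squares fit of Ct = slope * log10(OD) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_od) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_od)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_od, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 corresponds to doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def ct_to_equivalent_od(ct, slope, intercept):
    """Invert the standard curve to get the OD equivalent of a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def od_to_cells(od, cells_per_od_unit):
    """Convert an equivalent OD to cells using a CFU-based calibration constant."""
    return od * cells_per_od_unit


# Hypothetical tenfold dilution series of a culture at OD 1.
slope, intercept = fit_standard_curve([0, -1, -2, -3], [15.0, 18.32, 21.64, 24.96])
equivalent_od = ct_to_equivalent_od(18.32, slope, intercept)
```

A slope near -3.32 corresponds to an efficiency near 100%, which is one common check on the linearity of a standard curve.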
Figure 3-4 Example validation for qPCR primer pair.
Figure 3-5 First set of growth curves by bacterial strain.
n = 3; error bars: min and max; blank values for each media condition have been subtracted.
Figure 3-6 First set of growth curves by media condition.
n = 3; error bars: min and max; blank values for each media condition have been subtracted.
Figure 3-7 Second set of growth curves by bacterial strain.
n = 3; error bars: min and max; blank values for each media condition have been subtracted.
Figure 3-8 Second set of growth curves by media condition.
n = 3; error bars: min and max; blank values for each media condition have been subtracted.
Figure 3-9 Third set of growth data.
Anaerobic growth after 48 hours. AZ = rich defined media from Teknova based on MOPS buffer
and amino acids. 3:2 = rich undefined media in a 3:2 mix of Brain Heart Infusion to de Man,
Rogosa & Sharpe. a = Vitamin K1 & Hemin. p = undefined protein mix including peptone. S =
sugars maltose, fructose, & cellobiose. V = vitamin & mineral mix from ATCC. n = 3
Table 3-4 Composition of the “3:2pas” medium.
Figure 3-10 Antibiotic resistance profiles of representative microbiota species.
Example dose response curves are shown in the top panel. The table shows minimum inhibitory
concentrations (MIC) for 12 drugs derived from dose response curves using five dosage levels (0.1,
0.5, 1, 5, 10 X) and three timepoints (0, 2, 5 days).
Antibiotic minimal inhibitory concentrations. carb: carbenicillin 50 μg/mL, chlor: chloramphenicol
20 μg/mL, cfx: cefoxitin 20 μg/mL, 2dog: 2-deoxy-d-galactose 0.05%, erm: erythromycin 25 μg/mL,
gent: gentamicin 200 μg/mL, kan: kanamycin 50 μg/mL, rif: rifampicin 20 μg/mL, sds: sodium
dodecyl sulfate 0.01%, spec: spectinomycin 100 μg/mL, tet: tetracycline 10 μg/mL, tri:
trimethoprim 10 μg/mL
Primer          Sequence                            Amplicon size
Bado_purB_f     GTGTCTCACGACTTCCCAACC               315 bp
Bado_purB_r     GGAAACCACACTTTGCAGCC
Bfra_pATP_f     ATGAATTCAACTTTTGACATACGCAG          400 bp
Bfra_pATP_r     CTGAAACCCCATATAGTTGCATGG
Bthe_silC_f     ATGACCTTTATATCTAATATACAATCGGTAGC    500 bp
Bthe_silC_r     AGAAAGATAGCCAGGCCAATAATG
Buni_hypo_f     ATGATAAGCAAACCTCACGGTCT             599 bp
Buni_hypo_r     ATCGCTCCCTCCTTATTGATGG
Bvul_hyp1_f     ATGGACATTAGTTCTATATTATGGGGCT        700 bp
Bvul_hyp1_r     TCTTAAGTTTCGTATTGGTTCTAACCTC
Ecoli_glcB_f    ATGAGTCAAACCATAACCCAGAGC            153 bp
Ecoli_glcB_r    ACGATTTTCTGGTGCCAGATCAT
Efae_ASP_f      ATGAAAAAAATGTTTAGTTTTGAGTTTTGGC     1200 bp
Efae_ASP_r      TAACAATTCAATATTTCCAAACGAATGCAC
Lpar_ABCt_f     ATGAAGTTAGATTTGGAACTACGCC           400 bp
Lpar_ABCt_r     TAAAGGTATGACCTTGCGGATGA
Lreu_rihA_f     ATGTTAGATATCCTGGATTACACGAAACA       100 bp
Lreu_rihA_r     CCCTAAATTCAGTCGGTTTTTCAAGAT
Lrha_PrtP_f     ATGCAAACAAAAAGGAAAGGGCTAT           700 bp
Lrha_PrtP_r     CCTTTTTAGTATCACTAAGCCGCATG
RK2_trbL_f      ATGAAAATCCAGACTAGAGCTGCC            194 bp
RK2_trbL_r      CTAATCACGGTCAAGGTCCAGAA
Sent_tviC_f     ATGAATTTAATGAAATCGTCAGGGATGTTT      1000 bp
Sent_tviC_r     TGAAGATTACGGACCGAAGTTGG
Smut_purE_f     ATGAAAAACAGACTGCTATTTTTAGAAGGT      250 bp
Smut_purE_r     CTTCCTCATGTGTCGGCAAAATAG
Ssan_dexS_f     ATGAAAAAACAAGTTTCTTACAAGCAGC        850 bp
Ssan_dexS_r     CGATATTGTTTAATGGCAAGGCTTG
Table 3-5 List of species-specific primers.
Primer          Sequence
Bado_98_f       CTTGGTACTTACCTCAACTGGAA
Bado_98_r       TTGGAGAAGAAGTCGGGAATG
Bfra_101_f      TATAAAAGCACGGAGATAGTGAAGA
Bfra_101_r      ACGAGATACTTCAGTTCGGC
Bthe_100_f      GACCTTTATATCTAATATACAATCGGTAGC
Bthe_100_r      GATAGTTACAGCGAGTACCGTG
Buni_100_f2     ATGTTTTTAATGTTTATGAGCGCTTG
Buni_100_r2     ACATACCATCTTCTATTGAAACGC
Bvul_99_f       TGGACATTAGTTCTATATTATGGGGC
Bvul_99_r       ACGTTGTTTTATCCTTCGTTGAA
Ecoli_102_f3    TTGATATCGGTATTGCCAGTTAAAC
Ecoli_102_r3    CATATAGGTGTCGTAAGCATGAAC
Efae_99_f       ATGAAAAAAATGTTTAGTTTTGAGTTTTGG
Efae_99_r       AATACTAATCATTAAACCCGCTGC
Lpar_100_f      ATGAAGTTAGATTTGGAACTACGC
Lpar_100_r      GTTGATTAAAATCCGCTAAGATCGTA
Lreu_100_f      ATGTTAGATATCCTGGATTACACGAA
Lreu_100_r      CCCTAAATTCAGTCGGTTTTTCAA
Lrha_100_f      GGTCTAATTACAAGTATAAAGGGGAAG
Lrha_100_r      TTTTTAGTATCACTAAGCCGCATG
Smut_100_f      AAAAACAGACTGCTATTTTTAGAAGGT
Smut_100_r      CAGGAGACAGGACATCAACTTT
Table 3-6 List of species-specific qPCR primers.
3.2.3.3 Rates of conjugation
First, we tested transfer rates of pFD340, which is an E. coli-Bacteroides shuttle vector
constructed by merging RK2-oriT, an E. coli origin of replication, and a selectable marker that
functions in E. coli with a Bacteroides cryptic plasmid pBI143 (284) and a selectable marker that
functions in Bacteroides, namely ermFS from the Bacteroides transposon Tn4351 (285). For
these conjugations, we used a 1:1 donor:recipient ratio and incubated the matings for 18 h. To
select for Bacteroides, we used Brucella agar with kanamycin and vancomycin (BKV); to select
for Bacteroides transconjugants, we added erythromycin. The transfer frequencies are around 10^-4
in B. fragilis, B. thetaiotaomicron, and B. uniformis, though the frequency is a few orders of
magnitude lower in B. vulgatus (Table 3-7).
Donor                       Recipient                       log10(transfer frequency)
E. coli S17-1λpir pFD340    Bacteroides fragilis            -4.7
                            Bacteroides thetaiotaomicron    -3.7
                            Bacteroides uniformis           -4.4
                            Bacteroides vulgatus            -7.2
Table 3-7 Conjugation frequencies of pFD340 into Bacteroides.
In secondary transfers, we grew verified and purified B. fragilis and B. vulgatus
transconjugants from the previous conjugation to serve as new donors of pFD340. Using LB agar
with carbenicillin to select for E. coli transconjugants, we measured transfer frequencies on the
order of 10^-7 into the original E. coli donor strain and a similar frequency from B. vulgatus into
E. coli MG1655
(Table 3-8).
Donor                          Recipient            log10(transfer frequency)
Bacteroides fragilis pFD340    E. coli S17-1λpir    -6.2
                               E. coli MG1655       -9.5
Bacteroides vulgatus pFD340    E. coli S17-1λpir    -7.7
                               E. coli MG1655       -8.4
Table 3-8 Secondary transfers from Bacteroides into E. coli.
For plasmid pBC003, we measured transfer rates into E. faecalis, L. reuteri, and S.
mutans around 10^-8, and could not detect conjugation into B. fragilis and B. thetaiotaomicron
(Table 3-9). For E. faecalis, our selection medium was 3:2pas supplemented with gentamicin and
erythromycin; for L. reuteri and S. mutans, we used MRS agar with erythromycin.
Donor                       Recipient                       log10(transfer frequency)
E. coli S17-1λpir pBC003    Bacteroides fragilis            not detected
                            Bacteroides thetaiotaomicron    not detected
                            Enterococcus faecalis           -8.4
                            Lactobacillus reuteri           -7.7
                            Streptococcus mutans            -8.0
Table 3-9 Conjugation frequencies of pBC003.
3.2.4 Discussion
To investigate and differentiate species in a complex microbial system by microbiology
and molecular biology methods, we successfully developed growth media, antibiotic selections,
and species-specific primers. With regard to conjugation rates, our findings are in line with prior
frequencies in the literature (Table 3-10), though direct comparisons are difficult given that
conjugation depends on numerous factors, such as the donor and recipient strains, media
conditions, plasmids, and conjugation parameters. In particular, conjugation can vary by the
growth phase of donors and recipients, their relative ratios, conjugation mixture density, and
mating time.
Shuttle vectors are often constructed by adding an origin of replication and a selection
marker that function in E. coli to isolated cryptic plasmids from a species of interest – an origin
of transfer is also included if conjugation will be used to introduce the plasmid. We expanded
upon this strategy and combined multiple replication and mobilization machinery, though this
may have rendered the pBC003 plasmid unstable in certain species; we observed lower rates of
transfer with pBC003 than with the simpler pFD340 from E. coli to Bacteroides species. In
future studies, it may be useful to isolate other cryptic plasmids in order to expand the set of
possible origins of replication to test, or even explore ways to engineer conjugative transposons,
such as Tn916 (86).
Plasmid features
pBR322-oriR +
RK2-oriT + pCP1
pRRI2 + pUC-oriR
(+ RK2-oriT?)
R6K-oriR + RK2-oriT
+ transposase
pBR322 + RK2-oriT
pAMbeta1-oriR
pAMbeta1
pBR322 + RK2-oriT
+ pAMbeta1-oriR-tra
RK2-oriT + ColE1
+ pB44-rep
Donor
Recipient
log10(Freq.)
Ref.
E. coli
Bacteroides fragilis
-6
(287)
Bacteroides uniformis
Bacteroides vulgatus
Bacteroides
thetaiotaomicron
E. coli
Enterococcus faecalis
-4.3
-4.6
(288)
-5.4
(104)
-1.7
-6.3
(289)
Enterococcus faecalis
-2
(290)
E. coli
-8.3
(290)
Bifidobacterium breve
Bifidobacterium bifidum
-6
E. coli
E. coli
E. coli
Enterococcus
faecalis
Enterococcus
faecalis
E. coli
-6
(291)
Table 3-10 Conjugation frequencies from literature.
Plasmid features indicate the relevant origins and proteins for replication and conjugation on the
vector used in the specific donor and recipient mating associated with the listed log10(transfer
efficiency).
3.2.5 Acknowledgements
This work was supported by grants from the National Institutes of Health Director’s
Early Independence Award (grant 1DP5OD009172-01 to HHW), the US Department of Energy
(grant DE-FG02-02ER63445 to GMC), and the Wyss Institute for Biologically Inspired
Engineering. SJY also acknowledges support from the National Science Foundation Graduate
Research Fellowship and the MIT Neurometrix Presidential Graduate Fellowship.
We thank Tara Gianoulis for bioinformatics assistance, Pooja Jethani for contributions to
qPCR primer design and validation, Mary Delaney and Andrea DuBois for guidance on
anaerobic culturing and techniques, Marc Lajoie for defined media, and Andy Goodman for
conjugation advice.
3.3 Immunizing strains against acquisition of antibiotic resistance and toxins
3.3.1 Introduction
Many prokaryotes use Clustered Regularly Interspaced Short Palindromic Repeats
(CRISPR) and CRISPR-associated (Cas) genes to limit horizontal gene transfer (HGT) from the
environment. In these CRISPR-Cas systems, an RNA-guided protein complex recognizes a
target sequence (or “protospacer”) on an invading plasmid or phage genome. The host-encoded
sequence for transcribing the RNA guide is called a spacer; spacer acquisition allows microbes to
be immune against subsequent viral infection or plasmid transfer (292). In fact, strains without
CRISPR-Cas more readily acquire plasmids and pathogenicity islands or become infected with
bacteriophage (293). This is concerning given that bacteriophage can carry genes encoding
virulence factors, such as diphtheria toxin on phage β in Corynebacterium diphtheriae (294, 295),
enterotoxin A on phage PS42-D in Staphylococcus aureus (296), Shiga toxin (stx) on lambdoid
phages in pathogenic Escherichia coli (297), and cholera toxin (ctx) on phage CTXΦ in Vibrio
cholerae (298).
In this section, we harness CRISPR-Cas9 and HGT to prevent the acquisition of
undesirable antibiotic resistance or pathogenesis genes, as well as demonstrate a method to more
stably introduce our engineered elements that expands upon the conjugative plasmids used in the
previous section. First, using the Streptococcus pyogenes CRISPR-Cas9 system (299), we
validated spacers that prevent the transfer of several antibiotic resistance and toxin genes.
Critically, we targeted multiple sequences within a target gene in order to diminish the likelihood
of escape that could arise from mutations. We demonstrated applications of Cas9-mediated
immunization in E. coli (against Shiga toxin and numerous clinically relevant beta-lactamases)
and V. cholerae (against cholera toxin, to enhance live attenuated cholera vaccines).
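Choosing multiple target sequences within a gene can be illustrated with a minimal sketch that enumerates candidate protospacers next to an NGG protospacer-adjacent motif (PAM), the motif recognized by S. pyogenes Cas9, on both strands of a gene. The function names and the 20-nt default length are assumptions for illustration; this is not the design procedure used in this work, and any candidate would still need to be checked for specificity and validated experimentally.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq, spacer_len=20):
    """List (strand, start, protospacer, pam) for every site followed by NGG.

    The start index is counted on the strand being read, so "-" strand
    positions refer to the reverse complement of the input sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Stop early enough that a full protospacer plus a 3-nt PAM fits.
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":
                sites.append((strand, i, s[i : i + spacer_len], pam))
    return sites


# Toy sequence: 20 A's followed by a TGG PAM yields one "+" strand site.
toy_sites = find_protospacers("A" * 20 + "TGG")
```

Selecting several sites spread across the gene from such a list is one way to reduce the chance that a single point mutation in the target allows escape.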
Second, over the course of constructing and testing multiple CRISPR spacers, we
encountered difficulties with synthesis and stability. The natural structure of CRISPR is an array
of spacer sequences flanked by the same repeat sequence. Not only were we limited in our ability to
have these highly repetitive constructs built by commercial gene synthesis services, but we also
observed that the repeats allowed for recombination and subsequent escape of targeted species
through the loss of CRISPR spacers. We tested a variety of mutations in the repeat sequence that
would not compromise Cas9 function; these alternative CRISPR repeats could then be
interspersed in an array to minimize recombination risk and improve constructability.
Third, we were interested in leveraging cell-to-cell transfer via conjugation as described
in the previous section to propagate the Cas9 cassette to native microbiota. Instead of simply
using Cas9 to eliminate pathogens as suggested by others (300), we present a novel method to
permanently immunize endogenous microbes using a “genome-copying” version of the cassette
(Figure 3-11). This design contains spacers against a recombination hotspot on the genome,
defined by nearby crossover hotspot instigator, or Chi (χ), sites (301), and has homology arms to
serve as a donor repair template during homologous recombination. Once transferred into a
recipient cell, the cassette copies itself into the bacterial chromosome. The continued distribution
of the mobile element in the population will expand the proportion of immunized cells and limit
the spread of Cas9-targeted sequences such as antibiotic resistance genes. Here, we demonstrated
successful genome-copying in E. coli.
Figure 3-11 Design of Cas9 cassette with genome-copying feature.
A Cas9 cassette contains spacers targeting a region “A” on the genome and is flanked
by homology arms for the genomic target site. The cassette can be carried on a
conjugative plasmid or a prophage to propagate and replace genetic elements in a
microbial community.
3.3.2 Materials and Methods
3.3.2.1 Strains and plasmids
Chemically competent E. coli (NEB Turbo, New England Biolabs, Ipswich, MA) were
used for routine cloning. Wild-type E. coli K-12 MG1655 and E. coli B were also used for
CRISPR assays. For studies with V. cholerae, we used the El Tor strain Bah-2 (E7946 ΔattRS1,
(302)). For conjugations, we used E. coli MFDpir (a gift from Jean-Marc Ghigo), which is a
diaminopimelic acid (DAP) auxotroph and free of the Mu prophage (85). Phage T4 and T7
stocks were propagated in E. coli K-12. pCTX-Km, the replicative form of CTX-KmΦ (ΔctxAB),
was prepared from V. cholerae O395 (298).
E. coli were grown at 37 °C in LB broth and supplemented with antibiotics as needed at
final concentrations of 100 μg/mL spectinomycin, 30 μg/mL chloramphenicol, 300 μg/mL
erythromycin, 50 μg/mL kanamycin, 100 μg/mL carbenicillin, and 10 μg/mL tetracycline. V.
cholerae were grown at 37 °C in LB broth and supplemented with antibiotics as needed at final
concentrations of 50 μg/mL kanamycin, 100 μg/mL carbenicillin, and 10 μg/mL tetracycline.
E. coli cells expressing SpCas9 were constructed by transforming in DS-SPcas (Addgene
plasmid 48645, (303)), which encodes SpCas9 and its cognate tracrRNA on a backbone with a
cloDF13 origin of replication and aadA gene. We assembled compatible protospacer plasmids
encoding target or control sequences with their PAMs on a plasmid with a pBR322 origin of
replication and a bla gene or a plasmid with a colE1 origin and kan marker. In some assays, the
spacer was expressed on DS-SPcas such that there was no separate spacer plasmid. In all other
experiments, we maintained the designed spacer on a separate plasmid (based on PM-SP!TB,
Addgene plasmid 48650, (303)) that expressed one spacer followed by the SpCas9 repeat on a
backbone with a p15a origin of replication and cat gene.
V. cholerae cells expressing SpCas9 were constructed by encoding SpCas9, the tracrRNA,
and spacers on a backbone with a pBR322 origin of replication, bla gene, and RK2-oriT
sequence for conjugation from E. coli MFDpir into the strain. Targeted protospacers were placed
on plasmids with an SC101 origin of replication, tetracycline resistance, and RK2-oriT for
conjugation from E. coli MFDpir into the strain.
3.3.2.2 Spacer validation
Spacers were validated by transformation, conjugation, or phage infection. In general,
“protected” cells were first prepared by introducing the plasmid(s) that encoded Cas9, tracrRNA,
and the spacer of interest. Then, we transformed equimolar amounts of either the targeted
protospacer plasmid or a control untargeted protospacer plasmid into the protected cells. After
selecting for the co-existence of all plasmids (Cas9, spacer, and protospacer), we quantified Cas9
activity as the number of transformants in the targeted protospacer plasmid condition relative to
the non-targeted plasmid.
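As a rough illustration of how the transformant ratio described above can be expressed in orders of magnitude, the following Python sketch computes a log-scale protection value from colony counts. The function name, the example counts, and the use of a pseudocount (so that a plate with zero colonies gives a lower bound rather than an undefined value) are assumptions made for illustration and are not taken from the original analysis.

```python
import math


def protection_orders(targeted_count: float, control_count: float,
                      pseudocount: float = 1.0) -> float:
    """Orders of magnitude by which targeted transformants are depleted.

    targeted_count: colonies obtained with the targeted protospacer plasmid.
    control_count:  colonies obtained with the non-targeted control plasmid.
    pseudocount:    assumed floor for the targeted count, so that zero
                    colonies reports a lower bound on protection.
    """
    if control_count <= 0:
        raise ValueError("control_count must be positive")
    ratio = max(targeted_count, pseudocount) / control_count
    return -math.log10(ratio)


# Hypothetical counts: 3 targeted colonies versus 300,000 control colonies.
print(round(protection_orders(3, 3e5), 2))
```

With equimolar DNA input, a value near 5 from this calculation would correspond to roughly five orders of magnitude of protection.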
In conjugation assays, we prepared equal numbers of donor and recipient cells across
experimental conditions. Recipient cells were either protected E. coli or V. cholerae with Cas9,
tracrRNA, and spacers to be validated, or unprotected cells with similar plasmids that lacked the
Cas9 machinery. Donor cells were E. coli MFDpir with the targeted protospacer plasmid or a
control untargeted plasmid. For each conjugal mating, we washed 1 mL of overnight donor and
recipient cell cultures by spinning down at 8000 x g for 3 min, resuspending in PBS, repeating
the wash, mixing the two resuspensions for another spin down, and transferring the final ~30 μL
of densely resuspended pellet onto LB agar plates as spots of 5-10 μL puddles. DAP was
supplemented in the final resuspension media prior to transferring onto the agar plate. The spots
were allowed to air dry for 10 min before incubating face up (i.e., agar on the bottom) at 37 °C for
5 h. The cells were then collected with 1 mL PBS and a sterile scraper. Dilutions were plated on
selective media to quantify total number of recipients, donors, and transconjugants.
To characterize the level of phage resistance conferred by Cas9, we infected normalized
densities of protected E. coli with equal titers of phages and counted the number of formed
plaques. We obtained equal cell densities by diluting an overnight culture and normalizing to an
OD600nm of 0.3 after several hours of growth. Then we added 2 μL of phage to 120 μL of cells,
mixed them in 1 mL of 0.6% top agar with appropriate antibiotics within 20 minutes, and poured
the mixture onto 3 mL of 1.5% solid agar. Replicate experiments were performed with different
phage dilutions. We measured Cas9 activity by comparing the number of plaques formed on a
protected strain to the number formed on a susceptible strain.
In CTXΦ transduction assays, AKI media was used for TCP induction of V. cholerae
Bah-2 (304). Cas9 activity was measured as the relative pCTX-Km transduction efficiency of
protected Bah-2 strains compared to an unprotected Bah-2 strain.
3.3.3 Results
3.3.3.1 Characterization of spacers against antibiotic resistance and toxins
We designed spacers targeting aminoglycoside resistance (aphA), beta-lactamase (bla),
Klebsiella pneumoniae carbapenemase (blakpc, (305)), New Delhi metallo-beta-lactamase
1 (blaNDM-1, (306)), vancomycin resistance (vanA and vanB, (307)), Shiga toxin (stx2A and
stx2B), the primase/helicase gene in phage T7, and the major capsid protein in phage T4. For V.
cholerae, we targeted tetracycline resistance genes carried on mobile genetic elements, such as
the conjugative plasmid RK2 and the integrating conjugative element SXT, which can spread
antibiotic resistance (308, 309). We also designed CRISPR spacers against cholera toxin (ctxA
and ctxB) and rstA, required for replication of phage CTXΦ. Overall, we observed three to five
orders of magnitude of Cas9-mediated protection in our transformation, conjugation, and
transduction assays in E. coli and V. cholerae for spacers targeting antibiotic resistance, toxin,
and phage genes (Table 3-11). All spacers were first validated in a plasmid transformation assay
in E. coli. Anti-phage spacers were further tested with T4 and T7 phage infection experiments in
E. coli and CTXΦ transduction assays in V. cholerae. We also carried out conjugation assays for
all V. cholerae spacers using E. coli MFDpir donors and V. cholerae Bah-2 recipients – a
representative assay is shown in Figure 3-12.
E. coli
Spacer       Sequence                PAM
Antibiotic resistance
aphA.1       CACTCATCCAATCTCACTGA    C
aphA.2       CTGCTGGACGAACTTTTCTA    A
bla          ACTTTAAAAGTGCTCATCAT    T
kpc.1        GCATTTTTGCCGTAACGGAT    G
NDM.1        GAAGTGTGCTGCCAGACATT    C
vanA.1       GCTGTTTCGGGCTGTGAGGT    C
vanB.1       GCGATTTCGGGCTGTGAGGT    C
Toxins
stx2A.1      CCCTCTTGAACATATATCTC    A
stx2A.2      CCCTGAGATATATGTTCAAG    A
stx2A.3      GGGAGAGGATGGTGTCAGAG    T
stx2B.4      AAACTGCACTTCAGCAAATC    C
Phage
T4.Y         AAGAACTTCCAACCGGTAAT    G
T7.7         TTCGGGAAGCACTTGTGGAA    T
T7.8         GATGCTTGAGGAGTCCGTTG    A

V. cholerae
Spacer       Sequence                PAM
Antibiotic resistance
RK2tetA.2    ATCTTGCTCGTCTCGCTGGC    C
SXTtetA.1    CGGCGAGTAAGATTAATGTA    G
SXTtetA.2    ATATTACTACTATCTCTTGC    A
Toxins
ctxA.1       TAAACAAAGGGAGCATTATA    T
ctxA.2       GGATTTGTTAGGCACGATGA    T
ctxA.3       CATCCATATATTTGGGAGTA    T
ctxA.4       TTTGTCTTTTAACTTTAGAT    T
ctxB.5       ATTATGATTAAATTAAAATT    T
ctxB.6       GAATCTATATGTTGACTACC    T
ctxA.7       TTTAACGTTAATGATGTATT    A
ctxA.8       CCTGATGAAATAAAGCAGTC    A
Phage
rstA.1       TTTTTGTCGATTATCTTGCT    T
Table 3-11 Validated spacers for E. coli and V. cholerae applications.
All spacers listed conferred three to five orders of magnitude of Cas9-mediated protection relative
to a control sequence across a variety of assays, including transformation, transduction, and
conjugation.
Figure 3-12 Example CRISPR spacer validation assay in V. cholerae.
In this test, we validated spacers against rstA, which is required for DNA replication of phage
CTXΦ. We constructed two E. coli MFDpir donors, one with the rstA protospacer (“rst”) and
the other without (“-”). We also prepared two V. cholerae Bah-2 recipients, one with a Cas9
plasmid targeting rstA (“!rst”) and another without the rstA spacer (“!con”). Then, we
performed four conjugal matings (i.e., all the combinations). The purpose of the control
conjugations is to account for possible differences in the background transfer rate of various
plasmids or differences in the ability of various recipients to receive a plasmid.
A transfer efficiency was calculated for each conjugation by dividing the number of
transconjugants by total recipients. Here, the normalized ratio of transfer frequencies between
the two plasmids into the control recipient was 10⁻³:10⁻³ = 1, as expected; this ratio is critical to check
in all conjugation experiments. The level of Cas9 activity was measured as the relative
normalized transfer efficiency of the properly targeted protospacer plasmid compared to the
non-targeted plasmid. In this experiment, the plasmid with a spacer targeting rstA provided
Bah-2 at least four orders of magnitude of protection against an invading plasmid carrying the
rstA sequence.
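The normalization described in this caption can be sketched in Python as follows. This is one plausible reading of the calculation, not the original analysis script: the dictionary layout, the function names, and all counts are hypothetical, and the formula assumes that the targeted-to-untargeted ratio in the protected recipient is divided by the same ratio in the control recipient.

```python
def transfer_efficiency(transconjugants: float, recipients: float) -> float:
    """Transconjugants per recipient for one conjugal mating."""
    return transconjugants / recipients


def relative_transfer(eff: dict) -> float:
    """Targeted vs. untargeted transfer into the protected recipient,
    normalized by the same ratio measured in the control recipient."""
    background = eff[("targeted", "control")] / eff[("untargeted", "control")]
    protected = eff[("targeted", "protected")] / eff[("untargeted", "protected")]
    return protected / background


# Hypothetical (transconjugants, recipients) counts for the four matings.
example = {
    ("targeted", "control"): transfer_efficiency(1e5, 1e8),
    ("untargeted", "control"): transfer_efficiency(1e5, 1e8),
    ("targeted", "protected"): transfer_efficiency(10, 1e8),
    ("untargeted", "protected"): transfer_efficiency(1e5, 1e8),
}
print(relative_transfer(example))
```

In this made-up example the background ratio is 1 and the relative transfer is about 10⁻⁴, which would be read as four orders of magnitude of protection.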
3.3.3.2 Investigating stability and designing alternative CRISPR-Cas9 repeat sequences
Once we validated individual spacers, we sought to combine them into large multi-spacer
arrays to build broadly immunized E. coli and V. cholerae strains. In conjugation assays using V.
cholerae Bah-2 carrying Cas9 and one, three, or five different spacers against ctxA, we observed
comparable levels of protection against the introduction of a plasmid carrying the ctxA
protospacers. However, as we attempted to build even larger arrays, we encountered difficulties
synthesizing these highly repetitive constructs.
Furthermore, when we began to characterize escapees from our Cas9 assays, we found
that recombination at the repeat sequences was a common mode of escape. In the simplest case,
we began with an array of one spacer flanked by two repeats – this is the structure of native
CRISPR systems, though we demonstrated previously that a spacer-repeat is equivalent in
activity to a repeat-spacer-repeat format (303). In half of the clones we sequenced, we found
recombination that excised the spacer and one repeat (Figure 3-13A). With another larger array
of different spacers (numbered 1 through 11), we also found escapees with deletions of portions
of the array. For example, when spacer 5 was targeted, spacers 3 through 8 were deleted, and
when spacer 4 was targeted, spacers 3 through 5 or 4 through 9 were deleted (Figure 3-13B).
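To make the escape route concrete, the sketch below enumerates the deletions that recombination between two identical repeats could produce in an array and that would remove a targeted spacer. It assumes a simplified model in which every spacer is flanked by repeats on both sides and any pair of repeats can recombine; the function name and this model are illustrative assumptions, not part of the original work.

```python
def deletions_removing(n_spacers: int, target: int) -> list[tuple[int, int]]:
    """Enumerate contiguous spacer deletions that excise the target spacer.

    Spacers are numbered 1..n_spacers. Recombination between the repeat just
    upstream of spacer `first` and the repeat just downstream of spacer `last`
    deletes spacers first..last (inclusive).
    """
    if not 1 <= target <= n_spacers:
        raise ValueError("target must be between 1 and n_spacers")
    return [(first, last)
            for first in range(1, target + 1)
            for last in range(target, n_spacers + 1)]


# For the 11-spacer array, list the possible deletions when spacer 5 is targeted.
options = deletions_removing(11, 5)
print(len(options), (3, 8) in options)
```

Under this model, the deletions reported above (spacers 3 through 8 when spacer 5 was targeted, and spacers 3 through 5 or 4 through 9 when spacer 4 was targeted) are among the enumerated outcomes.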
Therefore, we designed alternative repeat sequences that would preserve Cas9 activity
but improve array stability and ease of synthesis. We used previously validated spacers and
transformation assays to benchmark our alternative repeat designs. First, we varied up to three
bases in the repeat without making changes in the tracrRNA sequence (Figure 3-14A), and found
they were comparable in Cas9 activity using the bla spacer. Then, we introduced more variations
in the repeat sequence and made corresponding changes in the base-paired positions in the
tracrRNA (Figure 3-14B); these maintained Cas9 activity in our transformation assays using a
control spacer sequence, ACTTTAAAAGTATTCGCCAT, which is four bases different from the
bla spacer.
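The stated four-base difference between the control spacer and the bla spacer (Table 3-11) can be checked with a short Hamming-distance comparison. The helper name below is hypothetical; the two sequences are the ones given in the text.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


BLA_SPACER = "ACTTTAAAAGTGCTCATCAT"      # bla spacer from Table 3-11
CONTROL_SPACER = "ACTTTAAAAGTATTCGCCAT"  # control spacer quoted in the text

print(mismatch_positions(BLA_SPACER, CONTROL_SPACER))
```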
Third, we shortened the repeat and tracrRNA lengths, from the native 36 nt to 28, 22, 16,
and 14 nt (Figure 3-15A). We found that 16 and 14 nt versions no longer provided Cas9 activity
with the control spacer, possibly because we destabilized the duplex structure from A6 to G9 of
the repeat (310). Thus, we focused on 18 nt and 16 nt variations that maintained the AGAG at
positions 6 to 9 while introducing a single mismatch in other regions (Figure 3-15B). Since the
16 nt version with a mismatch also disrupted Cas9 activity, this time in transformation assays
using the T4.Y spacer, we proceeded to use variants of 18 nt with either one or two mismatches
(Figure 3-16). We found that all of these designs functioned as well as the wild-type in
transformation assays with the T4.Y spacer.
Figure 3-13 Escapees recombine at repeat regions to excise the spacer.
A. Sequence view of recombined escapees from single spacer array in repeat-spacer-repeat format.
B. An array of 11 spacers (colored boxes) with wild-type (wt) repeat sequences (gray diamonds).
When protospacer 5 or 4 plasmids were introduced, escapees had lost portions of the array.
Figure 3-14 Alternative CRISPR repeats with base substitutions.
The wild-type SPcas9 repeat (crRNA) and tracrRNA are displayed at the top for comparison.
Mutations are in blue text and yellow highlighting.
Figure 3-15 Alternative CRISPR repeats with truncations.
The wild-type SPcas9 repeat (crRNA) and tracrRNA are displayed at the top for comparison.
Mutations are in blue text and yellow highlighting. Introduced mismatches between the crRNA
and tracrRNA are in red text.
Figure 3-16 Alternative CRISPR repeats with length 18 nt and one to two mismatches.
The modified 18 nt SPcas9 repeat (crRNA) and its tracrRNA are displayed at the top. Further
introduced mismatches between the crRNA and tracrRNA for other 18 nt length versions are
in red text.
3.3.3.3 Genome-copying for stable incorporation of engineered mobile elements in E. coli
We identified a recombination hotspot on the E. coli genome (at psiE) that was flanked
by two sets of appropriately oriented chi (χ) sites – basically “chi> chi> <chi <chi”, where “chi>”
is 5’-GCTGGTGG-3’ and “<chi” is 5’-CCACCAGC-3’ (Figure 3-17). To demonstrate genome-copying,
we constructed a plasmid with 2 kb homology arms to the region flanking a cassette
with SPcas9, tracrRNA, and spacers targeting two sites on psiE. To avoid self-cutting of the
plasmid, we recoded the protospacer regions within the 2 kb homology arms. We confirmed the
self-insertion of the cassette with PCR. We also performed a control transformation of the
plasmid into RecA-deficient E. coli; as expected, we did not observe viable transformants.
Figure 3-17 Stable incorporation of engineered Cas9 mobile elements in E. coli.
The cassette includes spacers targeting a recombination hotspot (flanked by χ sites). The dark
vertical lines in the repaired version indicate recoded protospacers to preclude self-cutting.
3.3.4 Discussion
Now that we have confirmed several spacers for antibiotic resistance genes and toxins in
E. coli and Vibrio cholerae, we can design more spacers to target many other clinically relevant
sequences (311). But to combine numerous spacers into one array requires novel CRISPR repeat
sequences to facilitate array-building and promote in vivo stability. We have shown viable
substitutions and truncations for single spacers; the next step is to construct arrays of these
alternative repeats flanking multiple spacers targeting various antibiotic resistance, toxin, and
phage genes. These arrays can then be incorporated in a Cas9 cassette for self-copying onto a
bacterial genome.
While we have shown the feasibility of genome-copying, there are important parameters
to characterize, such as the optimal number of chi sites and length of homology arms, since the
presence of more chi sites is synergistic for recombination (312). Given that the psiE site in E.
coli was a viable genome-copying site, we searched for similar regions with nested chi sites. For
notation, we labeled chi sites in this structure as “χ outer >”, “χ inner >”, “< χ inner”, and “< χ outer”.
For searching through the genome, we used three criteria: 1) the inner chi sites are within 1 kb of
each other, 2) the outer chi sites are within 8 kb of each other, and 3) the two chi sites on either
side in the same orientation are within 4 kb of each other.
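The three search criteria can be sketched in Python as shown below. This is a minimal illustration rather than the script used for the analysis: the function names are hypothetical, distances are measured between site start coordinates, and the four coordinates in each row of Table 3-12 are assumed to be listed in the order “χ outer >”, “χ inner >”, “< χ inner”, “< χ outer”.

```python
CHI_FWD = "GCTGGTGG"  # "chi>" as written in the text
CHI_REV = "CCACCAGC"  # "<chi", the reverse complement of CHI_FWD


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(genome: str, motif: str) -> list[int]:
    """Return the 0-based start of every occurrence of motif (overlaps allowed)."""
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions


def nested_chi_regions(fwd, rev, inner_max=1000, outer_max=8000, side_max=4000):
    """List (outer>, inner>, <inner, <outer) tuples meeting the three criteria."""
    regions = []
    for inner_f in fwd:
        for inner_r in rev:
            if not 0 < inner_r - inner_f <= inner_max:         # criterion 1
                continue
            for outer_f in fwd:
                if not 0 < inner_f - outer_f <= side_max:      # criterion 3, left
                    continue
                for outer_r in rev:
                    if not 0 < outer_r - inner_r <= side_max:  # criterion 3, right
                        continue
                    if outer_r - outer_f <= outer_max:         # criterion 2
                        regions.append((outer_f, inner_f, inner_r, outer_r))
    return sorted(regions)


# Chi-site coordinates near the kdp operon, taken from Table 3-12.
forward_sites = [719280, 722817]
reverse_sites = [723361, 723873, 725739]
for region in nested_chi_regions(forward_sites, reverse_sites):
    print(region)
```

For a full genome, `find_sites(genome, CHI_FWD)` and `find_sites(genome, CHI_REV)` would supply the two coordinate lists. Applied to the coordinates above, the sketch returns the two overlapping kdp-region entries, which illustrates why several table rows can describe essentially the same locus.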
Out of 1008 chi sites in E. coli K-12 MG1655, we found 30 unique “χ inner >” sites and 29
unique “< χ inner” sites satisfying the 1 kb requirement. For sites within 8 kb, there were 199
unique “χ outer >” sites and 225 unique “< χ outer” sites. Six regions had the nested structure, but
these reduce to four distinct loci because neighboring chi sites satisfied the criteria more than once (Table
3-12). We also considered the probiotic E. coli strain Nissle 1917, which exerts antagonistic
effects against pathogenic enterobacteria (313) through nutrient competition (267). Of the 1100
chi sites in E. coli Nissle 1917, we identified 35 unique “χ inner >” sites and 34 unique “< χ inner”
sites satisfying the 1 kb criterion. There were 230 unique “χ outer >” sites and 247 unique “< χ outer”
sites within 8 kb of each other. 15 genomic regions contained the nested structure; 6 were unique
(Table 3-13).
Besides copying the Cas9-enabled immunization vector into E. coli, our approach can be
applied to other species in the microbiota. This can be most readily achieved if the species-specific
chi site is already known; otherwise candidate chi sites can be identified with a statistical
model applied to the core genome (314). Then drawing upon methods for cross-species
conjugation as described in the previous section, we can promote active immunization of
targeted species in the native microbiota against antibiotic resistance, toxins, and other clinically
relevant genes carried on mobile genetic elements.
Genomic locations of nested χ sites
on MG1655 (GenBank: U00096.3)          Genes in region

719280, 722817, 723361, 723873         speF, ybfK, kdpE, kdpD, kdpC, kdpB
719280, 722817, 723361, 725739

1643621, 1643673, 1644198, 1644255     ydfU (part of cryptic prophage Qin/Kim)

1891128, 1892555, 1893108, 1894619     tsaB, yoaA, yoaB, yoaC, yoaH, pabB
1891128, 1892555, 1893108, 1895241

4237067, 4240099, 4240945, 4243580     yjbG, yjbH, yjbT, psiE, xylE, malG, malF

Table 3-12 Nested chi site regions in E. coli MG1655.
Genomic locations of nested χ sites on
Nissle 1917 (GenBank: CP007799.1)      Genes in region

765030, 768660, 769204, 771582         speF, kdpE, kdpD, kdpC, kdpB

1310494, 1311443, 1312297, 1313501     several hypothetical proteins, DNA primase
1310494, 1311443, 1312297, 1314360

1825808, 1829538, 1829741, 1831893     hypothetical proteins, tRNA-Val
1829538, 1831690, 1831893, 1833996

1975113, 1976540, 1977093, 1978604     ATP-dependent helicase, endoribonuclease LPSP,
1975113, 1976540, 1977093, 1979226     hypothetical proteins, pabB

2062892, 2066233, 2066848, 2067119     DNA adenine methylase, serine protease,
2062892, 2066233, 2066848, 2067971     antitermination protein, antirepressor, crossover
2062892, 2066233, 2066848, 2068122     junction endodeoxyribonuclease, adenine
2062892, 2066233, 2067119, 2067971     methyltransferase, GntR family transcriptional
2062892, 2066233, 2067119, 2068122     regulator, hypothetical proteins
2062892, 2066233, 2067119, 2070916

2945787, 2948884, 2949574, 2950449     L-aspartate oxidase, srmB, LysR family
2945787, 2948884, 2949574, 2953293     transcriptional regulator, grcA, uracil-DNA
                                       glycosylase

Table 3-13 Nested chi site regions in E. coli Nissle 1917.
3.3.5 Acknowledgements
This work was supported by US Department of Energy grant DE-FG02-02ER63445 (to
GMC) and the Wyss Institute for Biologically Inspired Engineering. SJY was supported by a
National Science Foundation Graduate Research Fellowship and KME by the Wyss Technology
Development Fellowship. We thank members of the Waldor lab, including Yoshi Yamaichi,
Brigid Davis, and Bill Robins, for assistance with V. cholerae methods.
Chapter 4
Replacing gut microbial strains with precision using phages and CRISPR
4.1 Background
Although probiotics are advertised as health-promoting, these bacteria have limited
efficacy due to their inability to persist in the gut for more than a few days because of
competition from endogenous bacteria already residing in all available niches. One approach to
displace native microbes is to administer antibiotics, but these are often too broadly acting,
harming bystander bacteria as collateral damage. The long-term goal of this chapter is to
modulate community composition by eliminating a specific native strain, thus emptying its niche
for an engineered version.
The use of bacteriophages for precise microbiota perturbations is promising for several
reasons. Phages are highly specific to their bacterial host species, have been naturally used by
bacteria to dominate a niche in the gut (315), and can be administered as a small cocktail to
deplete a large fraction of bacteria – a collection of four T4-like phages decreases E. coli by 60%
in the mouse gut (316). Furthermore, bacteriophages have been successfully applied in phage
therapy against pathogenic bacteria in Eastern Europe (317).
As described in the previous chapter, immunizing strains against acquiring pathogenic
elements from the environment is important for preventing the spread of toxin or antibiotic
resistance genes. With the rise of multidrug resistant pathogens in both hospital- and community-
acquired infections, there is increasing concern that unprotected probiotics or “naïve”
endogenous microbiota may be compromised and lead to continued spread of virulence factors
(318). We leverage a specific type of bacterial adaptive immune system, called CRISPR-Cas9,
which destroys foreign DNA carried on bacteriophages or plasmids. In native prokaryotic Type
II CRISPR systems, transcribed arrays are processed into CRISPR RNAs (crRNAs) that form a
complex with Cas9 and a trans-activating RNA (tracrRNA) (299). The crRNA guides Cas9 to
double-stranded DNA sequences called protospacers that match the sequence of the spacer and
are flanked by a protospacer adjacent motif (PAM) unique to the CRISPR system (319). If
spacer-protospacer base-pairing is a close match, Cas9 cuts both strands of DNA.
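As an illustration of the targeting rule just described, the following sketch scans both strands of a DNA sequence for 20-nt protospacers followed by an NGG PAM, the motif recognized by S. pyogenes Cas9. The function names and the toy sequence are hypothetical, and the sketch ignores mismatch tolerance and other factors that influence real spacer activity.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(target: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, PAM) for every NGG-flanked site.

    Start positions are 0-based and, for the "-" strand, are given relative
    to the reverse-complemented sequence.
    """
    hits = []
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for i in range(len(seq) - spacer_len - 2):
            pam = seq[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":  # N can be any base; the next two must be GG
                hits.append((strand, i, seq[i:i + spacer_len], pam))
    return hits


# Toy sequence: a 20-nt protospacer followed by a TGG PAM and one extra base.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "A"
for hit in find_protospacers(toy):
    print(hit)
```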
We propose a protocol in which a targeted endogenous bacterial species is depleted with
phages to benefit an engineered strain having Cas9-mediated phage immunity. This ecological
approach creates a competitive advantage for the protected cells, which will more stably colonize.
To prevent the introduced strain from losing its engineered function or acquiring pathogenic
elements, we designed a population cycling method to systematically replace the strain with
different versions (A, B, C, etc.) (Figure 4-1). The versions may be governed by phages with
naturally different host ranges, or recoded sequences in the same parent phage. The CRISPR-Cas
defense system in each bacterial version will carry corresponding spacers against its phage.
Therefore, the wild-type occupant is first eliminated by phage A. Then strain A (which is
immune to phage A but susceptible to all other versions) is introduced. After some time, phage B
is introduced to clear out strain A for subsequent colonization of strain B.
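The rotation logic can be written out as a simple schedule generator. This is only a sketch of the scheme in the preceding paragraph: the function name and step wording are hypothetical, and it assumes each resident strain is susceptible to the next phage in the series.

```python
from string import ascii_uppercase


def rotation_schedule(n_versions: int) -> list[str]:
    """List the alternating phage and strain administrations for n versions."""
    if not 1 <= n_versions <= len(ascii_uppercase):
        raise ValueError("n_versions must be between 1 and 26")
    steps = []
    resident = "wild-type"  # the current occupant of the niche
    for version in ascii_uppercase[:n_versions]:
        steps.append(f"administer phage {version} to deplete {resident}")
        steps.append(f"introduce strain {version} (immune to phage {version})")
        resident = f"strain {version}"
    return steps


for step in rotation_schedule(2):
    print(step)
```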
Figure 4-1 Strain rotation scheme using phage and corresponding
susceptible and Cas9-mediated resistant host strains.
We sought to demonstrate this approach with E. coli strains and T4-like phages that have
wide host ranges (320). Section 4.2 describes our pilot mouse experiments in which we tested
our best spacers at the time against bacteriophages T6 and RB15. Recognizing that we needed to
further optimize several aspects of the mouse experiment, such as the level of protection
conferred by Cas9 against phage infection, we began to characterize other anti-phage spacers.
For reasons not yet understood, CRISPR spacers vary greatly (up to six orders of
magnitude) in their ability to confer resistance against phage infection, even though the spacers
target the same phage gene. In Section 4.3, we investigated whether this was due to DNA
modifications found in T4-like phages. We discovered that Cas9 could provide resistance against
infection by phage T4, which has all its cytosines replaced with glucosyl hydroxymethylated
cytosines. Since DNA modification is one method for phage to escape host restriction, this result
suggests Cas9 can overcome a variety of DNA modifications and thus provide the cell with
protection against phage.
To develop a cocktail of phages to use in cyclic strain replacement, we needed to
characterize CRISPR spacers against different phages, which in turn required the availability of
genome sequences for T4-like phages of interest. In Section 4.4, we sequenced the genomes of
T4-like phages RB3, RB5, RB6, RB7, RB9, RB10, RB15, RB27, RB33, RB55, RB59, and RB68.
With sequenced phages and an understanding that phage-encoded DNA modifications
will not impede Cas9 activity, we returned to identifying high-activity CRISPR spacers against
phages. Section 4.5 describes our development of a high-throughput library selection for highly
effective spacers against several T4-like phages. These validated spacers can then be used in a
follow-up mouse experiment to demonstrate phage-assisted niche replacement in vivo.
4.2 Phage-assisted niche depletion in the murine gut
4.2.1 Introduction
In this section, we explore the feasibility of using bacteriophages to deplete a niche in the
microbiota and replace it with an engineered strain that is protected against phage infection by
Cas9-mediated immunity. We sought to selectively displace existing strains of E. coli in the
mouse gut with a probiotic E. coli strain, Nissle 1917 (EcN), which is marketed in Europe as a
supplement under the trade name Mutaflor®, and then replace EcN with an engineered EcN strain that
has phage immunity and is labeled with a distinct marker.
To minimize conditions that may limit phage-mediated depletion, particularly the
possibility that bacteria in physically isolated microenvironments such as crypts or mucus might
be shielded from phage depletion at a given time of phage introduction, we also studied the
effect of supplementing the drinking water with sugars and repeating phage dosing.
We conducted two mouse experiments based on the model of streptomycin treatment to
eliminate facultative anaerobes such as E. coli and its competitors in the mouse gut (321). In the
first study, we tested whether phages T6 and RB33 would deplete the starting EcN strain we
introduced to allow for colonization of a second phage-resistant EcN strain. Phage-resistance
was encoded using Cas9 and a CRISPR spacer that had activity against phage T6 and RB33. We
also tested whether sugar would enhance the effect of phage predation. Since phage-susceptible
bacteria can resist infection if they are in stationary phase, we tested the hypothesis that
delivering phages with a nutrient source would induce cells to enter exponential phase and thus
become vulnerable to phages. We gave mice drinking water with phages and 2% arabinose, a
sugar that is preferentially utilized by E. coli strains but poorly absorbed by animals (267). In the
second study, we tested the re-administration of phage to enhance depletion of phage-susceptible
cells and enrichment of phage-resistant strains.
4.2.2 Materials and methods
4.2.2.1 Strain construction and verification
Two plasmids were used in the study. The control plasmid contains aadA (for
streptomycin/spectinomycin resistance), an SC101 origin of replication, a non-fluorescent YFP
(R96A mutation) driven by the pLlacO promoter, lacIq, and the Streptococcus pyogenes Cas9
and its cognate tracrRNA. The anti-phage plasmid (denoted “!phage” hereafter) is identical to
the control plasmid except for two key features: YFP is fluorescent, and there is a CRISPR
spacer “T6.Y” encoded by the 20 nucleotide sequence 5’-AAGAACTTCCAACCAGTAAT-3’. For
proper Cas9 processing of the T6.Y CRISPR RNA, there is the JS23119 promoter upstream and
S. pyogenes CRISPR repeat downstream of the spacer.
For the first mouse experiment, the two plasmids were introduced into probiotic E. coli
strain Nissle 1917 to construct strains “EcN control” and “EcN !phage”. In the second
experiment, we transformed plasmids into E. coli B to make “EcB control” and “EcB !phage”.
We verified the activity of spacer T6.Y against phages T6 and RB33 using plaque assays
in both E. coli strains. In a typical plaque assay, we infected normalized densities of E. coli with
equal titers of phages, mixed them in 0.6% top agar to overlay on 1.5% solid agar, and counted
the number of plaques that formed. To calculate the level of protection conferred by the CRISPR
spacer, we divided the number of plaques formed on a protected strain by the number of plaques
formed on a susceptible strain.
4.2.2.2 Batch phage production
Small-scale phage stocks were first prepared by diluting an overnight bacterial host
culture at 1:100 into Luria broth, inoculating it with phage, and growing the culture for 2.5-5
hours. Phage lysates were purified by centrifugation at 8000 x g for 5 minutes at 4 °C to remove
cell debris. The supernatant was then filtered through a 0.45 μm membrane and stored at 4 °C.
To obtain highly concentrated phage for resuspension in drinking water for animal
experiments, we modified a protocol described for large-scale production of T4-like phages
(322). First, an overnight bacterial host culture (E. coli B) was diluted 1:500 into 1 L of LB,
grown to an OD600nm of 0.2, and inoculated with 1 mL of phage stock. The cultures were grown
on a shaker at 37 °C for 3-5 hours, depending on how much time each bacteriophage required to
clear the bacterial culture; phage T6 took 3 hours and phage RB33 4.5 hours. We periodically
checked the culture turbidity during the incubation period to confirm host bacterial strain growth
and subsequent phage lysis, which would be reflected by an initial increase in turbidity followed
by clearance.
Phage lysates were cleared by centrifugation at 4 °C for 10 min at 8000 x g using 500 mL
Nalgene bottles (Thermo Scientific, Waltham, MA) in a Sorvall RC-6 Plus Superspeed
Centrifuge with a F12S-6x500 LEX rotor (Thermo Scientific). For cultures with substantial cell
debris, we used smaller volumes in 50 mL conical tubes with a typical tabletop centrifuge (7100
x g for 8 min). Then, we passed the supernatant through a 0.45 µm filter membrane and pelleted
the phages by ultracentrifugation (28,880 x g for 1 hr at 4 °C) using 50 mL conical tubes in a
Sorvall RC-6 Plus centrifuge with a F13-14x50cy rotor (Thermo Scientific). The supernatant was
gently poured off into another collection bottle, and any remaining liquid was removed by pipette. The
small, white, opaque pellets of phage were resuspended in ddH2O at 1/100 of the starting volume
and stored at 4 °C.
To quantify the concentrated phage, we performed plaque assays with serial dilutions of
the resuspended phage pellets as well as the saved supernatants. Overall, we found that our
method achieves 100-1000X concentration. Of note, there were still viable phage particles in the
supernatant, albeit at lower concentrations (about 1000X lower than the resuspended pellets).
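The titer and fold-concentration arithmetic behind these plaque assays can be sketched as follows. The function names and all numbers are hypothetical, and the sketch assumes that concentration is expressed as the titer of the resuspended pellet divided by the titer of a reference sample such as the unconcentrated lysate.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, plated_volume_ml: float) -> float:
    """Phage titer from a plaque count.

    dilution:         fraction of the original sample in the plated dilution
                      (e.g., 1e-6 for a 10^-6 dilution).
    plated_volume_ml: volume of diluted phage added to the cells, in mL.
    """
    return plaques / (dilution * plated_volume_ml)


def fold_concentration(pellet_titer: float, reference_titer: float) -> float:
    """Ratio of the resuspended-pellet titer to a reference titer."""
    return pellet_titer / reference_titer


# Hypothetical example: 50 plaques from 2 uL of diluted phage in each case.
pellet = titer_pfu_per_ml(50, dilution=1e-8, plated_volume_ml=0.002)
lysate = titer_pfu_per_ml(50, dilution=1e-6, plated_volume_ml=0.002)
print(f"{pellet:.2e} PFU/mL, {fold_concentration(pellet, lysate):.0f}X")
```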
4.2.2.3 Animal experiments
All of the mice used in this study were handled in accordance with protocols approved by
the Harvard Medical Area Standing Committee on Animals (HMA IACUC). Female C57BL/6
mice (Charles River Laboratories, Wilmington, MA; 8-12 weeks of age) were individually
housed to prevent cross-contamination of bacteria and feces between cage mates. Experiments
were double-blinded.
To prepare E. coli for administration in the drinking water, 30 mL cultures
were grown to late exponential phase, spun down at 8,000 rpm for 5 min, washed with 10%
glycerol, spun down again, and resuspended in ddH2O with appropriate antibiotics or sugars
according to the animal protocol. All antibiotics and sugars used were filter-sterilized and
prepared as 10X or 1000X stock solutions in ddH2O to add to the drinking water.
Primer          Sequence                          Amplicon size    E. coli specificity
EcN_fim_f       CAATGCATGGGCTGATGATTCA            102 bp           Nissle
EcN_fim_r       ATACCCTTTTTTTGAAAACTTACCGAGATC
Ecoli_glcB_f    ATGAGTCAAACCATAACCCAGAGC          153 bp           Nissle, B, MG1655
Ecoli_glcB_r    ACGATTTTCTGGTGCCAGATCAT
Table 4-1 Primers to identify E. coli Nissle 1917.
Using both sets of primers, E. coli B would only show a 153 bp band, while E. coli Nissle would
show both 102 bp and 153 bp bands. Details on generating species-specific primers based on
potentially species-specific genes are described in Chapter 3.
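The band-pattern logic described in the caption of Table 4-1 can be written out as a small decision function. This is only a sketch: the function name and the wording of the returned labels are hypothetical, and the two "unexpected" outcomes are an assumption about how such results would be read, not a rule stated in this work.

```python
# Expected amplicon sizes (bp) for the two primer pairs in Table 4-1.
EXPECTED_BANDS = {"EcN_fim": 102, "Ecoli_glcB": 153}


def classify_colony(fim_band: bool, glcb_band: bool) -> str:
    """Interpret a colony PCR run with both primer pairs.

    fim_band  -- True if the 102 bp EcN_fim product is present
    glcb_band -- True if the 153 bp Ecoli_glcB product is present
    """
    if fim_band and glcb_band:
        return "E. coli Nissle 1917"
    if glcb_band:
        return "E. coli, not Nissle (e.g. B or MG1655)"
    if fim_band:
        # Assumption: a Nissle band without the general E. coli band is suspect.
        return "inconclusive"
    # Assumption: no product means a failed PCR or a non-E. coli colony.
    return "no amplification"


print(classify_colony(fim_band=True, glcb_band=True))
print(classify_colony(fim_band=False, glcb_band=True))
```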
Mouse experiment 1 (Figure 4-2):
On Day 1, mice were given streptomycin in their drinking water at a final concentration
of 5 mg/mL. The next day, the water was changed to a solution of EcN control cells at a final
concentration of 10^5 CFU/mL, supplemented with 2% sucrose and 0.1 mg/mL streptomycin.
Sucrose was included to boost the palatability of the water. The lower streptomycin
concentration (referred to as “low strep” hereafter) was necessary to ensure EcN cells maintained
the plasmid. The drinking water was switched to simply low strep on Day 3.
On Day 4, four different treatments were administered via another water change. All
water bottles contained low strep and 2% sucrose, with or without additional supplements. Group
A (n = 4 mice) received no additional supplements in the drinking water, Group B (n = 4 mice)
received 2% L-arabinose, Group C (n = 12 mice) received 10^10 PFU/mL of phages T6 and RB33,
and Group D (n = 12 mice) received 2% L-arabinose and 10^10 PFU/mL of phages T6 and RB33.
On Day 5, EcN !phage cells were administered in the drinking water at a final concentration of
10^8 CFU/mL, with 2% sucrose and low strep. The water was switched back to low strep only on
Day 6 and maintained for the remainder of the experiment (to Day 16).
Figure 4-2 Mouse experiment 1 design to test effect of phage and/or sugar.
Phages T6 and RB33 were used with or without arabinose as the sugar source.
Triangles represent days of fecal pellet collection and plating. Groups A and B: n =
4 mice each. Groups C and D: n = 12 mice each.
Mouse experiment 2 (Figure 4-3):
Streptomycin at 5 mg/mL final concentration in drinking water was re-administered to
the mice a day after the first experiment ended. The next day, now Day 2 of this second
experiment, the water was changed to a solution of EcB control cells at a final concentration of
10^5 CFU/mL with 0.1 mg/mL streptomycin. On Day 3, the water was switched to low strep only.
Since we noticed a mixture of two morphologies on the platings from Day 3, we
suspected that there may be remaining EcN cells in the mouse gut from the previous experiment.
Using EcN-specific primers (Table 4-1) in colony PCRs, we confirmed that two out of two
colonies per mouse for 15 different mice across various groups were actually EcN and not EcB.
Therefore, to properly test strain replacement, we proceeded with EcN !phage cells to displace
the EcN control population. On Day 4, a different treatment was administered to mice reassigned to the four different groups. Group A continued to receive low strep water, while
Groups B, C, and D all received 10^10 PFU/mL of phage T6. On Day 5, 10^8 CFU/mL of
EcN !phage cells were administered in low strep drinking water to all groups. On Day 6, Groups
A and B returned to receiving low strep only water, while Group C received a low re-dosing of
10^8 PFU/mL of phage T6 and Group D received a regular re-dosing of 10^10 PFU/mL of phage T6
for the next four days, after which both groups also returned to receiving low strep only water.
Figure 4-3 Mouse experiment 2 design to test effect of repeated phage dosing.
Phage T6 at two different concentrations was used in the repeat dosing after
introduction of EcN !phage. Triangles represent days of fecal pellet collection and
plating. Groups A and B: n = 4 mice each. Groups C and D: n = 12 mice each.
4.2.2.4 Plating
Fecal pellets were collected throughout the experiments to measure the efficacy of the
initial streptomycin treatment, colonization of control cells, depletion by phages, and relative
levels of control and !phage populations in the mouse gut across different treatment groups. On
the day of collection, pellets were weighed, resuspended in 1 mL 10% PBS (in ddH2O), and
homogenized at 4 °C for an hour with a tabletop vortexer fitted with an adapter for holding
multiple 1.5 mL Eppendorf tubes. Multiple dilutions were plated on MacConkey agar with 1%
lactose to quantify growth of all E. coli cells, including those native to the mouse (323).
MacConkey agar plates with 1% lactose, 0.1 mg/mL streptomycin, and 100 μM isopropyl β-D-1-thiogalactopyranoside (IPTG) were used to quantify administered YFP- versus YFP+ cells that
survived the in vivo mouse gut.
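Converting a colony count into CFU per gram of stool follows from the plating scheme above (pellet weighed, resuspended in 1 mL, diluted, plated). A minimal sketch, assuming a single countable plate per sample; the function name, the plated volume, and the example values are hypothetical.

```python
def cfu_per_gram(colonies, dilution_factor, volume_plated_ml,
                 pellet_mass_g, resuspension_volume_ml=1.0):
    """CFU per gram of stool from one countable plate.

    colonies               -- colonies counted on the plate
    dilution_factor        -- fold dilution of the homogenate that was plated
    volume_plated_ml       -- volume spread on the plate, in mL
    pellet_mass_g          -- weight of the fecal pellet, in g
    resuspension_volume_ml -- volume the pellet was resuspended in (1 mL here)
    """
    if min(dilution_factor, volume_plated_ml, pellet_mass_g) <= 0:
        raise ValueError("dilution, plated volume and pellet mass must be positive")
    cfu_per_ml = colonies * dilution_factor / volume_plated_ml
    total_cfu = cfu_per_ml * resuspension_volume_ml
    return total_cfu / pellet_mass_g


# Hypothetical example: 30 colonies from 0.1 mL of a 10^4-fold dilution, 50 mg pellet.
print(f"{cfu_per_gram(30, 1e4, 0.1, 0.05):.1e} CFU/g")
```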
4.2.3 Results
4.2.3.1 Spacer validation
We observed that the T6.Y spacer provided both E. coli Nissle 1917 and E. coli B with
6X and 67X protection against infection by phages RB33 and T6, respectively, compared to
unprotected E. coli (n = 4 replicate plaque assays). We proceeded to test whether this level of
protection against phage infection could allow EcN or EcB !phage strains to displace pre-colonized control strains under in vivo phage selective pressures.
4.2.3.2 Mouse experiment 1 with L-arabinose and phages T6 and RB33
We observed 10^6 CFU/g of stool of lactose-fermenting Gram-negative enteric bacilli in
the endogenous microbiota across the 32 mice at Day 0 (Figure 4-4). As expected, no CFU were
detectable on MacConkey lactose plates on Day 1 after the 5 mg/mL streptomycin treatment.
After the introduction of 10^5 CFU/mL of YFP- (EcN control) cells in the drinking water on Day
2, this population expanded to 10^9 CFU/g of stool on Day 3 in all four groups.
In Group B, where mice received only additional sugar (L-arabinose) on Day 4, the
biomass of YFP- cells increased by an order of magnitude. In Groups C and D, which received
phages on Day 4, YFP- cells decreased by two orders of magnitude compared to the starting
biomass on Day 3. However, regardless of the treatment, there was an overall decline of biomass
of E. coli across all groups, and the introduced YFP+ (EcN !phage) cells did not persist at more
than 10^6 CFU/g of stool beyond Day 7.
To measure the extent of strain replacement, we calculated the ratio of YFP+ to total cells
after phage and/or sugar treatments (Figure 4-5). For several mice (8 out of 12) in Group D, there
was an increase in the fraction of YFP+ cells in the first two days, but YFP+ cells were no longer
detectable at high ratios after the fifth day post-phage (Day 10). There was complete replacement
in mice 24 and 27 for at least one day – individual biomass values are shown for those two in
Figure 4-11 for added clarity. In contrast, there were only 2 out of 12 mice in Group C where
there was any increase in the ratio from Day 6 to Day 7. The low ratios in Group B are a result of
the YFP- bloom from the sugar only treatment. The data from Group A suggest that when a
second population is introduced, it can persist for one day, but will be nearly undetectable a few
days later, with the exception of mouse 32.
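The replacement metric used here, the ratio of YFP+ to total cells, can be sketched as below. The function name and the handling of plates with no colonies are assumptions made for illustration; the example counts are hypothetical.

```python
def yfp_positive_fraction(yfp_pos_cfu, yfp_neg_cfu):
    """Fraction of YFP+ cells out of all counted cells (YFP+ plus YFP-).

    Returns None when no colonies of either type were recovered, since the
    ratio is undefined in that case (an assumption, not the source's rule).
    """
    if yfp_pos_cfu < 0 or yfp_neg_cfu < 0:
        raise ValueError("counts must be non-negative")
    total = yfp_pos_cfu + yfp_neg_cfu
    if total == 0:
        return None
    return yfp_pos_cfu / total


# Hypothetical CFU/g values for one mouse on one day.
print(yfp_positive_fraction(3e6, 1e6))   # 0.75
print(yfp_positive_fraction(2e5, 0))     # 1.0, i.e. complete replacement
```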
Figure 4-4 Biomass of YFP- and YFP+ cells in mouse experiment 1.
The quantified population of YFP- (EcN control) cells and YFP+ (EcN !phage) cells are plotted as
boxplots at each day of fecal pellet collection. Whiskers on each boxplot represent the minimum
and maximum values. Raw data points are shown in supplementary Figure 4-8.
Figure 4-5 Fraction of replaced cells in mouse experiment 1.
The proportion of YFP+ cells out of total cells from quantitative culturing is
shown for each mouse in each group. For each mouse, the values are plotted
across four fecal pellet collection times after YFP+ introduction in the
drinking water (increasingly lighter shades of blue). Group A: control. Group
B: sugar only. Group C: phage only. Group D: sugar and phage.
4.2.3.3 Mouse experiment 2 with repeated phage T6 dosing
Since we assumed that the same 5 mg/mL streptomycin treatment would eliminate the
EcN cells we introduced in mice in the first experiment, we only began plating fecal pellets after
the introduction of EcB control cells in this second experiment. Given that EcB control cells
were YFP-, we suspected that there were residual EcN cells when we found YFP+ cells in a few
of the mice. Moreover, we noticed a mixture of small and large colonies in platings from several
mice. To determine if these cells were EcB or EcN, we performed PCRs on colonies using
diagnostic primer sets that could distinguish E. coli Nissle from other E. coli, and found that all
colonies we characterized were indeed residual EcN control cells from the previous experiment.
The presence of YFP+ cells after introduction of YFP- cells only at Day 3 as well as
overall loss of all E. coli over the course of the experiment confounded the interpretation of the
results (Figure 4-6). We could not determine if phage re-dosing improved colonization or
persistence. The most conclusive observation was that phage T6 was able to bring down YFP- cells by one to two orders of magnitude in Groups B, C, and D. Since there were no colonies on
many of the platings, we focused on results from individual mice with any non-zero YFP+
counts (Figure 4-10 in Supplementary section).
Figure 4-6 Biomass of YFP- and YFP+ cells in mouse experiment 2.
The quantified population of YFP- (control) cells and YFP+ (!phage) cells are plotted as
boxplots at each day of fecal pellet collection. Whiskers on each boxplot represent the
minimum and maximum values. Raw data points are shown in supplementary Figure 4-9.
Interestingly, YFP+ cells were able to colonize in two mice in Group A, without phage
treatment. These observations were consistent with the calculated ratios of YFP+ to total cells (Figure 4-7).
Besides mice 24 and 28, which were anomalies from the first experiment, there were only six
other mice with any YFP+ cells at any time point during this experiment. YFP- and YFP+ cells coexisted at roughly equivalent levels in mouse 32, though that was likely carried over from the
previous experiment. In Group B, only mouse 27 exhibited YFP+ enrichment after the first
phage dose. In Group D, only mouse 15 had YFP+ cells appear transiently during the second
phage dose.
Figure 4-7 Fraction of replaced cells in mouse experiment 2.
The proportion of YFP+ cells out of total cells from quantitative culturing is
shown for each mouse in each group across the three time points, the initial
value (orange) and two time points after introduction of YFP+ cells in the
drinking water (increasingly lighter shades of purple). Group A: control.
Group B: phage only. Group C: phage and low phage re-dose. Group D:
phage and regular phage re-dose.
4.2.4 Discussion
In the first experiment, we have preliminary evidence that L-arabinose may enhance
phage selection against phage-susceptible strains, but it appeared inconsistent across the 12 mice
in Group D (Figure 4-5). Furthermore, L-arabinose placed the YFP+ cells at a lower starting
biomass relative to the pre-colonized YFP- cells; there were roughly equivalent CFU/g of stool
values for YFP+ and YFP- cells at Day 6 in Groups A and C, which did not receive L-arabinose,
while the YFP- biomass was at least 10X lower at Day 6 in Groups B and D, which received L-arabinose (Figure 4-4). The sugar also accelerated the loss of YFP- cells in Group D compared to
Group C, at Days 10 and 16.
In the second mouse experiment, it is unclear whether repeated phage dosing enhanced
the colonization of YFP+ cells, since only one mouse from Group B (without repeat phage) and
one mouse from Group D (with repeated phage at full dose) exhibited new YFP+ enrichment.
From these studies, we have learned valuable lessons about how to carry out future
mouse experiments. First, we need to address the overall decline in the biomass of introduced E.
coli after we drop the concentration of streptomycin from 5 mg/mL to 0.1 mg/mL, which reflects
the recolonization of endogenous gut flora (324). We chose to maintain streptomycin in the
drinking water to ensure the engineered E. coli maintained our plasmid, yet we could not
continue with the 5 mg/mL concentration to keep endogenous gut microbiota in check because
the level of streptomycin resistance conferred by aadA on our plasmid was 0.05 to 0.2 mg/mL.
We also measured that 0.025 mg/mL streptomycin was too low to select against E. coli without
the aadA plasmid. For follow-up studies, we have already used recombineering to construct a
streptomycin-resistant EcN strain with the well-documented rpsL mutation (K43R; AAA to
AGA) that confers high levels of streptomycin resistance in E. coli (325). In fact, we have
confirmed that this EcN strR strain can grow in 5 mg/mL of streptomycin.
Besides improving streptomycin resistance of our strains and preventing biomass loss in
the streptomycin-treated mouse model, we also need to characterize more active CRISPR spacers
against the phages we use in the animal experiment. Spacer T6.Y only confers at most 100X
protection against infection by phage T6; this level of immunity is three to four orders of
magnitude lower than the high-activity spacers we later characterized for phage T4 (Section 4.3).
Using a high-throughput library screen (see Section 4.5) for effective anti-phage spacers, we
have validated six spacers against phage T6 that are four orders of magnitude more protective
than T6.Y. Using these new spacers, we can increase the phage resistance of our YFP+ EcN
strain, which should in turn improve its fitness advantage under phage selection to more stably
colonize the murine gut.
4.2.5 Acknowledgements
This work was supported by the Wyss Institute for Biologically Inspired Engineering.
SJY was also supported by a National Science Foundation Graduate Research Fellowship and
KME by the Wyss Technology Development Fellowship. We thank Amanda Graveline and
Andyna Vernet for assistance and training with mouse experiments.
4.2.6 Supplementary figures
Figure 4-8 Raw data points from mouse experiment 1.
Dots represent the mean for YFP+ (blue) or YFP- (orange) across mice for that day. Group A:
control. Group B: sugar only. Group C: phage only. Group D: sugar and phage.
Figure 4-9 Raw data points from mouse experiment 2.
Dots represent the mean for YFP+ (blue) or YFP- (orange) across mice for that day. Group A:
control. Group B: phage only. Group C: phage and low phage re-dose. Group D: phage and
regular phage re-dose.
Figure 4-10 Individual mouse data from experiment 2.
M# represents the mouse number, assigned to Group A/B/C/D labeled above. Group A:
control. Group B: phage only. Group C: phage and low phage re-dose. Group D: phage and
regular phage re-dose.
Figure 4-11 Raw data points for mice #24 and #27.
The top two graphs are from mouse experiment 1, in which mice 24 and 27
were in Group D (sugar and phage). The bottom two graphs are from mouse
experiment 2, in which mouse 24 was assigned to Group C (phage and low
phage re-dose) and mouse 27 to Group B (phage only).
4.3 CRISPR/Cas9-mediated phage resistance is not impeded by T4
DNA modifications
This section has been adapted from:
Stephanie J. Yaung, Kevin M. Esvelt, George M. Church. CRISPR/Cas9-mediated phage
resistance is not impeded by the DNA modifications of phage T4. PLOS ONE 9(6):e98811
(2014). Ref. (326)
4.3.1 Abstract
Bacteria rely on two known DNA-level defenses against their bacteriophage predators:
restriction-modification and Clustered Regularly Interspaced Short Palindromic Repeats
(CRISPR)-CRISPR-associated (Cas) systems. Certain phages have evolved countermeasures that
are known to block endonucleases. For example, phage T4 not only adds hydroxymethyl groups
to all of its cytosines, but also glucosylates them, a strategy that defeats almost all restriction
enzymes. We sought to determine whether these DNA modifications can similarly impede
CRISPR-based defenses.
In a bioinformatics search, we found naturally occurring CRISPR spacers that potentially
target phages known to modify their DNA. Experimentally, we show that the Cas9 nuclease from
the Type II CRISPR system of Streptococcus pyogenes can overcome a variety of DNA
modifications in Escherichia coli. The levels of Cas9-mediated phage resistance to bacteriophage
T4 and the mutant phage T4 gt, which contains hydroxymethylated but not glucosylated
cytosines, were comparable to phages with unmodified cytosines, T7 and the T4-like phage
RB49. Our results demonstrate that Cas9 is not impeded by N6-methyladenine, 5-methylcytosine,
5-hydroxymethylated cytosine, or glucosylated 5-hydroxymethylated cytosine.
4.3.2 Introduction
Bacteria utilize an assortment of anti-phage defense mechanisms, including two that act
at the nucleic acid level: restriction-modification and Clustered Regularly Interspaced Short
Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems. Some bacteriophages have
developed extensive modifications to their DNA that enable them to evade host restriction
endonucleases. For example, phage T4 replaces each cytosine with hydroxymethylated cytosine
(hmC), then glucosylates the hydroxymethyl group to form glucosylated hmC (ghmC) (327). The
bound glucose shelters the phage genome from the host’s modified cytosine restriction systems,
McrA, McrBC, and Mrr, which recognize methylcytosines and hmCs but not ghmCs (328).
CRISPR-Cas systems also function as endonucleases, though unlike restriction enzymes,
their recognition sites are programmable by CRISPR RNAs (crRNAs) (329). As an adaptive
immune system, CRISPR-Cas components incorporate fragments of DNA from invading viruses
or plasmids into arrays composed of spacers interspersed with repeats on the genome (330, 331).
In Type II CRISPR systems, transcribed arrays are processed into crRNAs that form a complex
with the RNA-guided Cas9 nuclease and a trans-activating RNA (tracrRNA) (299). The crRNA
guides the complex to double-stranded DNA “protospacer” sequences that match the sequence of
the spacer and are flanked by a “protospacer adjacent motif” (PAM) unique to the CRISPR
system (319). If spacer-protospacer base-pairing is a close match, Cas9 cuts both strands of DNA,
often eliminating the plasmid or phage. We sought to determine whether various DNA
modifications known to block restriction systems can similarly impede CRISPR-Cas defenses.
4.3.3 Materials and methods
4.3.3.1 Bioinformatics search
We derived a list of 1749 unique spacers from several sources: 49 E. coli strains with
CRISPR structures in the CRISPRdb database (http://crispr.u-psud.fr/crispr/, (332)), 72 strains in
the ECOR collection (333), 263 strains isolated from humans or animals in various regions of
France (334), and 194 Shiga toxin-producing E. coli (STEC) strains (335). CRISPR array
sequences were processed in CRISPRfinder (http://crispr.u-psud.fr/Server/, (336)) to extract
spacer sequences.
We performed BLASTn searches (http://blast.ncbi.nlm.nih.gov/, (286)) with a word size
of seven optimized for short sequences and an E-value of less than 0.1, which corresponded to
roughly at least 14 matched nucleotides in the T2/T4/T6 genomes search and at least 17 matched
nucleotides in the all T4-like genomes search. We screened hits by first looking for a
concentration of exact nucleotide matches at the 5’ end, which would be consistent with a seven-nucleotide “seed” region that does not tolerate mismatches (337). Outside the seed sequence, at
least five mismatches are tolerated (337), though the upper limit of tolerable mismatches has not
been characterized in the E. coli CRISPR system. We then checked for a properly oriented E.
coli Type I-E CRISPR PAM such as AAG, ATG, AGG, and GAG in the targeted sequence.
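The two screening criteria above (an exact match over the seven-nucleotide seed at the 5’ end, and a permissible PAM next to the candidate protospacer) can be sketched as a simple filter. This is an illustrative sketch only: it assumes the spacer and protospacer have already been aligned without gaps and written in the same orientation, and that the caller supplies the three flanking nucleotides read in the orientation in which the PAMs are listed. The function and field names are hypothetical.

```python
# PAMs named in the text for the E. coli Type I-E system.
PERMISSIBLE_PAMS = {"AAG", "ATG", "AGG", "GAG"}
SEED_LENGTH = 7  # seed region at the 5' end that does not tolerate mismatches


def screen_hit(spacer, protospacer, flank):
    """Summarize one ungapped spacer/protospacer alignment.

    spacer, protospacer -- equal-length sequences in the same orientation
    flank               -- the three nucleotides adjacent to the protospacer
    """
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must be aligned to equal length")
    s, p = spacer.upper(), protospacer.upper()
    seed_ok = s[:SEED_LENGTH] == p[:SEED_LENGTH]
    mismatches_outside_seed = sum(
        a != b for a, b in zip(s[SEED_LENGTH:], p[SEED_LENGTH:])
    )
    return {
        "seed_ok": seed_ok,
        "mismatches_outside_seed": mismatches_outside_seed,
        "pam_ok": flank.upper() in PERMISSIBLE_PAMS,
    }


# Toy sequences, not taken from any phage genome.
print(screen_hit("ACGTACGTAA", "ACGTACGTTT", "aag"))
```

How many mismatches outside the seed to accept is left to the caller, since the upper limit has not been characterized.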
4.3.3.2 Bacterial strains and plasmid construction
In addition to wild-type E. coli K-12 MG1655 and E. coli B, we used methyltransferase-deficient (dam–/dcm–) E. coli K-12 (ER2925, New England Biolabs, Ipswich, MA) and
restriction-deficient (mcrA– mcrBC– mrr– hsdR–) E. coli K-12 (ER1821, New England Biolabs).
E. coli were grown at 37 °C in LB broth and supplemented with antibiotics as needed at final
concentrations of 100 μg/mL spectinomycin, 30 μg/mL chloramphenicol, 300 μg/mL
erythromycin, and 100 μg/mL carbenicillin.
Cells expressing SpCas9 were constructed by transforming in DS-SPcas (Addgene
plasmid 48645, (303)), which encodes SpCas9 and its cognate tracrRNA on a backbone with a
cloDF13 origin of replication and aadA gene. In the dam/dcm methylation studies, we assembled
a compatible protospacer plasmid encoding all five of the target sequences with their PAMs; we
placed the control, dam1, and dcm1 sequences after a pBR322 origin of replication, and the
dam2 and dcm2 sequences after a bla gene. In the T7 infection assays, the spacer was expressed
on DS-SPcas such that there was no separate spacer plasmid. In all other experiments, we
maintained the designed spacer on a separate plasmid (based on PM-SP!TB, Addgene plasmid
48650, (303)) that expressed one spacer followed by the SpCas9 repeat on a backbone with a
p15a origin of replication and cat gene. When a different resistance marker was needed, we
switched cat with EryR.
4.3.3.3 Bacteriophage strains and propagation
Phage T7 stock was propagated in E. coli K-12 MG1655 and RB49 stock (obtained from
H. M. Krisch) propagated in E. coli B. Wild-type T4 stock was propagated in E. coli K-12
MG1655. Phage T4 gt (a gift from New England Biolabs) is T4 α-gt57 β-gt14, which does not
have functional α- and β-glucosyltransferases (338). Because the E. coli restriction system
recognizes and cleaves hmC, preventing T4 gt from plaquing efficiently, we conducted all
experiments involving this phage in the restriction-deficient E. coli K-12 host ER1821.
In phage stock preparation, an overnight bacterial host culture was diluted 1:100 in LB,
inoculated with phage, and grown for 2.5-5 hours (during which the turbidity of cultures rose and
then fell due to lysis). The lysates were spun down at 8000 x g for 5 minutes at 4 °C to remove
cell debris. The supernatant was filtered through a 0.45 μm membrane and stored at 4 °C.
4.3.3.4 Transformation assays
We prepared protospacer and spacer plasmids from a dam+/dcm+ strain, NEB Turbo
(New England Biolabs), and performed transformation assays using E. coli K-12 MG1655
bacteria containing the protospacer plasmid and DS-SPcas. After transforming equimolar
amounts of each spacer plasmid and selecting for all three plasmids (DS-SPcas, protospacer, and
spacer), we quantified the number of transformants relative to a transformed spacer plasmid that
did not target the protospacer plasmid. We also reversed the transformation order for one set of
experiments; that is, we transformed the protospacer plasmid into E. coli already carrying DS-SPcas and each spacer plasmid. We observed comparable numbers of transformants regardless of
order. We repeated the same transformations in methyltransferase-deficient E. coli K-12 using
equimolar unmethylated protospacer and spacer plasmids, which were prepared from E. coli K-12 dam–/dcm–. Again, for one set of experiments, we reversed the transformation order and noted
similar numbers of transformants.
4.3.3.5 Plaque assays and efficiency-of-plating calculations
To characterize the level of phage resistance conferred by Cas9, we infected normalized
densities of protected E. coli with equal titers of phages and counted the number of plaques.
Equal cell densities were obtained by diluting an overnight culture and normalizing to an
OD600nm of 0.3 after several hours of growth. We added 2 μL of phage to 120 μL of cells, mixed
them thoroughly in 1 mL of 0.6% top agar with appropriate antibiotics within 20 minutes, and
poured the mixture onto 3 mL of 1.5% solid agar. Independent experiments were performed with
different phage dilutions. To calculate an efficiency of plating (EOP), we divided the phage titer
from plating the phage on a protected strain by the phage titer from plating the phage on a
susceptible wild-type strain.
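The EOP calculation described above reduces to a ratio of two titers; a summary across independent replicates can then be taken as a median. A minimal sketch with hypothetical function names and made-up titers:

```python
from statistics import median


def efficiency_of_plating(titer_protected, titer_susceptible):
    """EOP: phage titer on the protected strain divided by the titer of the
    same phage stock on the susceptible wild-type strain."""
    if titer_susceptible <= 0:
        raise ValueError("titer on the susceptible strain must be positive")
    return titer_protected / titer_susceptible


def median_eop(protected_titers, titer_susceptible):
    """Median EOP over independent replicate platings on the protected strain."""
    return median(
        efficiency_of_plating(t, titer_susceptible) for t in protected_titers
    )


# Hypothetical titers in PFU/mL.
print(efficiency_of_plating(2e4, 2e9))        # about 1e-5
print(median_eop([1e3, 2e3, 4e3], 1e9))       # about 2e-6
```

An EOP of 1 means no protection; smaller values mean stronger resistance.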
4.3.4 Results
4.3.4.1 Natural spacers target phages with modified DNA
We began by attempting to discover naturally acquired spacers in bacteria that target
phages known to contain modified DNA. Only a handful of phage families have been identified
with completely modified DNA, including Bacillus subtilis phage PBS2, Synechococcus
elongatus phage S2L, and Escherichia coli phage T4 (339). Since CRISPR-Cas systems and
phages of E. coli have been better studied than those of the other bacterial hosts, we focused on
1749 unique E. coli spacers in available array sequences from the ECOR collection, Shiga toxin-producing E. coli (STEC), and other databases.
Upon searching for candidate protospacers in phages T2, T4, and T6, all of which contain
ghmC DNA (327), we found one hit that matched 25 of 32 nucleotides in T2’s gene 38, although
this spacer was only found in one human-associated E. coli (Figure 4-12A). In an expanded
search including T4-like phages, we identified another hit with 29 nucleotides matching phage
CC31’s gp35 (Figure 4-12B). CC31 is the only known non-T-even type phage with predicted
glucosyltransferase genes (340), which are required for generating ghmC from hmC. This spacer
was found in many different E. coli isolates.
Figure 4-12 Native E. coli spacers target phage with modified DNA.
In a BLASTn search, 1749 unique spacers from sequenced E. coli CRISPR arrays were queried
against T4-like phage genomes. (A) Spacer S641 matches 25 of 32 nucleotides in phage T2. The
putative protospacer has a permissible E. coli CRISPR PAM AAG and the matching
nucleotides are concentrated at the 5’ end as a seed sequence. The spacer originated from the
CRISPR1 locus of E. coli strain 579, a human-associated isolate from France. (B) Spacer S134
matches 29 of 32 nucleotides in phage CC31. While the protospacer in phage CC31 has five
nucleotides inserted in the center of the sequence, there are 15 exactly matched nucleotides at
the 5’ end in addition to 14 matched nucleotides after the insertion. The PAM GAG and
strongly matched seed region suggest it is a plausible E. coli CRISPR target. This spacer was
found in several strains, including E. coli C str. ATCC 8739, ECOR strains 17 through 21, one
farm pig and two human fecal samples in France, duck and cattle fecal samples in Australia
(341), and enterotoxigenic E. coli (ETEC) strain UMNK88. The spacer and matching
protospacer are in blue, the transcribed CRISPR RNA (crRNA) in bold black, and PAM
sequence in red.
The potential presence of natural spacers targeting phage with modified DNA suggests
that CRISPR-Cas systems may overcome this form of phage defense. To test this hypothesis, we
explored the extent to which the Type II-A Streptococcus pyogenes Cas9 (SpCas9), the most
commonly used CRISPR-Cas system for genome engineering, is able to cleave various forms of
modified DNA.
4.3.4.2 Cas9 cuts N6-methyladenine and 5-methylcytosine in E. coli
DNA adenine methyltransferase (dam) methylates the adenine in 5’-GATC-3’, while
DNA cytosine methyltransferase (dcm) methylates the internal cytosine in 5’-CCTGG-3’ and 5’-CCAGG-3’ in E. coli. We designed target sequences containing one to two dam or dcm sites as
well as a control target sequence with no methylation sites (Figure 4-13A). We prepared spacer
and protospacer plasmids from a dam+/dcm+ strain and selected for the coexistence of each
spacer and its targeted protospacer in transformation assays using dam+/dcm+ cells expressing
SpCas9. All targeted sequences yielded 10^2- to 10^3-fold fewer transformants than the non-targeted
control regardless of whether they contained dam or dcm methylation sites (Figure 4-13B). We
observed similar values in methyltransferase-deficient (dam–/dcm–) E. coli K-12, in which all
plasmids were prepared from a dam–/dcm– strain and were thus unmethylated. Overall, we
detected no difference in Cas9 activity on adenine-methylated, cytosine-methylated, and
unmethylated target sequences. These results are consistent with reports showing adenine
methylation does not affect CRISPR-mediated phage resistance in Streptococcus thermophilus
(342) and cytosine methylation does not affect SpCas9 activity on sequences with CpG sites in
human cells (343).
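When designing or checking target sequences of this kind, the dam and dcm recognition sites named above can be located with a short scan. This is a sketch with hypothetical names and a toy sequence, not the design procedure used here. Because GATC and CCWGG (W = A or T) are each their own reverse complement, scanning one strand is sufficient.

```python
import re

# Recognition sites stated in the text: dam 5'-GATC-3'; dcm 5'-CCTGG-3' and 5'-CCAGG-3'.
# Lookaheads are used so that overlapping occurrences would also be reported.
DAM_SITE = re.compile(r"(?=GATC)")
DCM_SITE = re.compile(r"(?=CC[AT]GG)")


def methylation_sites(sequence):
    """Return 0-based start positions of dam and dcm sites in a DNA sequence."""
    seq = sequence.upper()
    return {
        "dam": [m.start() for m in DAM_SITE.finditer(seq)],
        "dcm": [m.start() for m in DCM_SITE.finditer(seq)],
    }


# Toy sequence for illustration only.
print(methylation_sites("AAGATCTTCCAGGAACCTGGGATC"))
# {'dam': [2, 20], 'dcm': [8, 15]}
```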
Figure 4-13 Cas9 cuts methylated cytosines and adenosines in E. coli.
(A) Synthetic targets were designed to contain one to two dam (orange) or dcm (blue) sites. A
control unmethylated sequence (+) was included. The PAM sequence NGG for SpCas9 recognition
is underlined. (B) In serial transformations, we selected for the coexistence of DS-SPcas, the
protospacer plasmid, and each spacer plasmid. The number of transformants was divided by the
number of colonies resulting from a control transformation using a spacer plasmid (-) that did not
target the protospacer plasmid. This relative number of transformants is plotted for E. coli K-12
and E. coli K-12 dam–/dcm– from three independent experiments. Lines represent the median.
4.3.4.3 Cas9 provides resistance against phages T7 and T4-like RB49 with unmodified DNA
We next tested the ability of SpCas9 to provide resistance to lytic phages without DNA
modifications by constructing spacers against phages T7 and RB49, neither of which contains
modified DNA. RB49 is a T4-like phage that is missing hydroxymethylase and β-glucosyltransferase, which are required for modifying cytosine to hmC and hmC to β-ghmC,
respectively (344). We designed four spacers: two targeting the gene encoding the
primase/helicase enzyme of T7 (Figure 4-14A) and two targeting the gene encoding the major
capsid protein of RB49 (gp23), which is one of the most conserved regions across T-even phages
(344) (Figure 4-14C). We transformed each spacer-encoding plasmid into SpCas9-expressing E.
coli K-12 MG1655 and E. coli B to create strains protected from T7 and RB49 infection. We
challenged these strains with phage to calculate an efficiency of plating (EOP) compared to
unprotected strains; representative plaque plates are included (Figure 4-14A and Figure 4-14C).
In E. coli B, T7 had an EOP of 10^-3 on cells expressing spacer 1 or 2 relative to cells
without spacers (Figure 4-14B). In E. coli K-12, spacer 1 reduced sensitivity to T7 infection by
four orders of magnitude, though spacer 2 only lowered sensitivity by one order of magnitude for
unknown reasons. RB49 had an EOP of 10^-6 on E. coli B with spacer 1 or 2, and an EOP of 10^-5
on E. coli K-12 with spacer 1 or 2 (Figure 4-14D). The decreased plaquing efficiencies of T7 and
RB49 on protected strains reflect Cas9 activity against invading unmodified phage DNA.
Figure 4-14 Cas9 reduces E. coli susceptibility to phages T7 and RB49.
(A) Spacers against T7 were targeted against the primase/helicase gene (gene 4A and 4B). The PAM
is underlined in the sequence and shown as a black box in the diagram showing the orientation and
location of the protospacer (white box) on the gene. In a representative T7 plaque assay of
protected and unprotected strains, there is substantial lysis on wild-type (wt) E. coli K-12, visible
plaquing on cells with spacer 2 (sp 2), and no plaques on cells with spacer 1 (sp 1). (B) The
efficiency of plating of T7 was calculated for each protected strain relative to the unprotected wild-type strain. Independent replicates of E. coli B (n = 4, 3, 3) and E. coli K-12 (n = 5, 5, 7) are plotted.
Lines represent the median. (C) Spacers against RB49 were constructed against the major capsid
protein (gp23). In a typical RB49 plaque assay, there is notable lysis on wild-type E. coli B, some
plaques on cells with spacer 1, and a few plaques on cells protected with spacer 2. (D) The efficiency
of plating of RB49 was quantified for each protected strain relative to the unprotected wild-type
strain. Shown are independent replicates of E. coli B (n = 5, 3, 3) and E. coli K-12 (n = 3, 3, 3). Lines
represent the median.
4.3.4.4 Cas9 provides resistance against mutant phage T4 with hmC DNA and wild-type T4
with ghmC DNA
Having established that Cas9 can confer resistance against non-modified phage, we
proceeded to challenge it with T4 phage containing either hmC or ghmC DNA. During
replication, wild-type T4 synthesizes hmC, which contains a hydroxymethyl group attached to
the C5 position of cytosine, by using hydroxymethylated dCTP serially converted from dCTP
(345). Then phage-encoded glucosyltransferases add a glucose group to the hydroxymethyl
group in α- or β-configuration (346) (Figure 4-15A). To investigate Cas9 activity against T4
without glucosylated DNA, we included mutant phage “T4 gt”, which has hmC rather than
ghmC due to non-functional glucosyltransferases (338). By using restriction enzymes with
varying sensitivity to modified cytosines (according to REBASE, http://rebase.neb.com/), we
confirmed that our stocks of phage T4 had ghmC, phage T4 gt had hmC, and phage RB49 did
not have ghmC or hmC (Figure 4-16).
Since T4 gp23 is homologous to gp23 from RB49, we modified our two spacers against
RB49 to match the sequences of T4, and also designed an additional spacer (Figure 4-15B). We
tested these spacers using efficiency-of-plating experiments as before; representative plaque
plates are shown (Figure 4-15C). Assays involving T4 gt used restriction-less E. coli K-12
because wild-type K-12 restricts hmC DNA; the EOP of T4 gt on E. coli K-12 MG1655 is 10^-4
relative to T4 on MG1655 (Figure 4-17). In restriction-less E. coli K-12, T4 gt exhibited an
EOP of 10^-6 to 10^-5 on cells carrying any one of the three spacers (Figure 4-15D). Wild-type T4
displayed an EOP of 10^-5 on E. coli K-12 MG1655 with spacer 1 or 2, and an EOP of 10^-3 on
cells expressing spacer 3. On E. coli B carrying any of the three spacers, T4 had an EOP of 10^-6 to 10^-4. As
the reductions in EOP for both T4 gt and wild-type T4 were comparable to those observed for
the non-modified T4-like phage RB49, our results demonstrate that SpCas9 is not impeded by
hydroxymethylation or glucosyl-hydroxymethylation of phage DNA.
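The EOP values reported above are ratios of phage titers on protected versus unprotected hosts, summarized by the median across replicates. A minimal sketch of that calculation follows; the function names and the titers in the usage note are hypothetical and are not data from this work.

```python
from statistics import median


def efficiency_of_plating(titer_protected, titer_unprotected):
    """EOP = titer (PFU/mL) on the protected strain / titer on the unprotected strain."""
    if titer_unprotected <= 0:
        raise ValueError("unprotected titer must be positive")
    return titer_protected / titer_unprotected


def median_eop(replicates):
    """Median EOP over independent replicates.

    replicates: iterable of (titer_protected, titer_unprotected) pairs,
    each pair measured in the same experiment.
    """
    return median(efficiency_of_plating(p, u) for p, u in replicates)
```

For example, three hypothetical replicates with protected titers of 2e3, 1e4, and 5e3 PFU/mL against an unprotected titer of 1e9 PFU/mL give a median EOP of 5e-6.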
Figure 4-15 Cas9 reduces E. coli susceptibility to phages T4 and T4 gt.
(A) The structures of cytosine and modified cytosines are shown. T4 gt has 100%
hydroxymethylated cytosines (hmCs). T4 has 100% glucosyl-hydroxymethylated cytosines (ghmCs),
specifically 70% α- and 30% β-ghmCs. The ghmC structure shown is in the β-configuration. (B)
Spacers against T4 were also designed against the major capsid protein (gp23), which is
homologous to that of RB49. For comparison, the RB49 protospacers are aligned below in italics,
where dots indicate identical nucleotides. In the T4 sequences, the PAM is underlined. The PAM
(black box) and protospacer (white box) are represented on the gene. (C) In a typical plaque assay
with T4 gt (left plate), there was complete lysis on wild-type (wt) restriction-less (r-l) E. coli K-12
and few plaques on cells with spacers 1, 2, or 3 (sp 1, sp 2, or sp 3). In an assay with T4 (right plate),
there was complete lysis on wild-type E. coli K-12 MG1655, numerous plaques on cells with spacer
1 or 3, and about a dozen on spacer 2. (D) The efficiency of plating of T4 and T4 gt was quantified
for each protected strain relative to the unprotected wild-type strain. Independent replicates of
restriction-less E. coli K-12 (n = 5, 3, 3, 5), E. coli K-12 (n = 4, 4, 5, 6), and E. coli B (n = 5, 3, 3, 3)
are plotted. Lines represent the median.
Figure 4-16 Restriction digest of phages.
Phage DNA was extracted by using the Qiagen Blood and Tissue Kit on 200 μL of phage stock. 10
or 20 U of each enzyme (1 μL) was added to 5 μL of 10X CutSmart Buffer (NEB) in a 50 μL
reaction volume containing approximately 100 ng of phage RB49 or T4 DNA, or 800 ng of T4 gt
DNA. The reactions were incubated at 37 °C for 4 hours before visualizing on a 1% agarose gel
stained with SYBR Gold. As expected, DraI cuts all three phage DNAs (RB49, T4 gt, and T4); HpaII and NheI are
sensitive to methylated cytosines and only cut RB49; and XbaI has 50% activity on hmC and
partially cuts T4 gt. Blue text denotes cutting.
4.3.5 Discussion
Our discovery that S. pyogenes Cas9 is insensitive to methylation, hydroxymethylation,
and glucosyl-hydroxymethylation renders it unique among current genome-targeting
technologies, as both zinc-fingers (ZFs) and transcription activator-like (TAL) effectors can be
engineered to discriminate 5-methylcytosine from cytosine (347, 348). This difference may be
useful for biotechnological applications.
In our bioinformatics search for candidate natural spacers, we were only able to identify
two possible sequences against T4-like phages. This type of bioinformatics search is hampered
by the currently limited knowledge of specificity and tolerability of mutations in both the
acquisition and interference stages of CRISPR systems. While this paper was under review,
Fineran et al. published a report exploring the robustness of the E. coli CRISPR system, in which
degenerate target regions with up to 13 mutations in the protospacer and PAM can promote
“priming,” a positive-feedback mechanism to incorporate new spacers based on mutated or
outdated spacers (349). This suggests more lenient bioinformatics searches would be allowable.
Furthermore, our search is limited by available sequences of E. coli and phages known to modify
their DNA, as well as the possibility that these isolates do not encounter T4-like phages in their
environments. Future searches may provide additional evidence of CRISPR-based immunity to
DNA-modifying phages.
Interestingly, we observed that different spacers conferred differing levels of resistance
against phage infection. Since mutations in the protospacer or PAM can allow phage to escape
(350, 351), we sequenced Cas9-targeted regions of plaques that appeared on protected strains.
Indeed, T4 and T7 plaques on protected E. coli had mutated one nucleotide in the PAM, or one
to two nucleotides in the protospacer (Table 4-2). Less effective spacers may be targeting
sequences that are more readily mutated, though we cannot rule out the non-mutually exclusive
possibility that Cas9 acts more slowly on certain sequences and thus allows phage-induced lysis
to outpace Cas9-enabled protection. In S. thermophilus CRISPR1 and CRISPR3 systems, the
uncut phage genome can still be observed in bacteriophage-insensitive mutants (352, 353).
Further investigation of how some but not all phage DNA molecules escape Cas9 cutting during
phage infection is needed.
While phages may inactivate CRISPR proteins (354) or encode their own CRISPR-Cas
systems (355), we have demonstrated that DNA modifications that normally circumvent bacterial
restriction systems do not impede Type II CRISPR systems. Our findings may help explain why
DNA modifications remain uncommon among bacteriophages characterized to date whereas
nearly half of bacteria have CRISPR structures (332).
Figure 4-17 Efficiency of plating of T4 gt on wild-type E. coli K-12.
Calculated relative to either T4 infecting E. coli K-12 or T4 gt infecting restriction-less E. coli K-12,
phage T4 gt forms plaques on E. coli K-12 four orders of magnitude less efficiently (red data
points). As a general comparison of restriction-modification versus Cas9-mediated protection, Cas9
provides around an order of magnitude greater resistance to phage infection on average, though the
level of resistance varies by sequence (blue data points). Independent replicates (n = 11, 5, 5, 5, 4, 15)
are plotted; lines represent the median. Cas9+ data were compiled from experiments with various
spacer sequences as described in the text.
Phage   Target     Spacer-PAM sequence       Mutation in phage         Host
T4      spacer 1   ATATCGAAAGCAATCAGGTTAGG   ATATCGAAAGCAATCACGTTAGG   ER1821
T4 gt   spacer 1   ATATCGAAAGCAATCAGGTTAGG   ATATCGAAAGCAATCAGGTTAGC   ER1821
T4      spacer 2   AAGAACTTCCAACCGGTAATGGG   AAGAACTTCCAACCGGTAATGGC   MG1655
T4      spacer 2   AAGAACTTCCAACCGGTAATGGG   AAGAACTTCCAACCGGTAATGGC   MG1655
T4      spacer 3   GATGCTGATGCTGAACTGTCTGG   GATGCTGATGCTGAACTGTCTGA   MG1655
T4      spacer 3   GATGCTGATGCTGAACTGTCTGG   GAAGCTGATGCTGAACTGTCTGG   MG1655
T4 gt   spacer 3   GATGCTGATGCTGAACTGTCTGG   GATGCTGATGCTGAACTGTCTGT   ER1821
T7      spacer 1   TTCGGGAAGCACTTGTGGAATGG   TTCGGGAAGCACTTGTGGAATGT   MG1655
T7      spacer 1   TTCGGGAAGCACTTGTGGAATGG   TTCGGGAAGCACTTGTGGAATTG   MG1655
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   MG1655
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   MG1655
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   B
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   B
Table 4-2 Phage escapee analysis.
13 plaques that formed on Cas9-protected host E. coli strains were sequenced at targeted sites to
identify mutations. PAM sequences are underlined. Mutations are bolded and double-underlined.
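Because the underlining and bolding that mark the PAM and the mutated bases are easy to lose, the comparison in Table 4-2 can also be expressed programmatically. The sketch below is illustrative only: the function name is hypothetical, and it assumes each sequence is a 20 nt protospacer followed by a 3 nt PAM (NGG), so that positions 21 to 23 are reported as PAM.

```python
def classify_escape_mutations(target, escapee, pam_length=3):
    """Compare a spacer-PAM target with the sequence found in an escapee phage.

    Returns a list of (position, original_base, mutant_base, region) tuples,
    with 1-based positions and region equal to "protospacer" or "PAM".
    Assumption: the last pam_length bases of each sequence are the PAM.
    """
    if len(target) != len(escapee):
        raise ValueError("sequences must have equal length (substitutions only)")
    pam_start = len(target) - pam_length
    mutations = []
    for i, (ref, alt) in enumerate(zip(target, escapee)):
        if ref != alt:
            region = "PAM" if i >= pam_start else "protospacer"
            mutations.append((i + 1, ref, alt, region))
    return mutations
```

Applied to rows of Table 4-2, the T4 spacer 1 escapee on ER1821 carries a single protospacer substitution (G to C at position 17), while the T7 spacer 2 escapees carry two protospacer substitutions (positions 14 and 18).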
4.3.6 Acknowledgements
This work was supported by US Department of Energy grant DE-FG02-02ER63445 (to
GMC) and the Wyss Institute for Biologically Inspired Engineering. SJY was supported by a
National Science Foundation Graduate Research Fellowship and KME by the Wyss Technology
Development Fellowship.
4.4 Complete genome sequences of 11 T4-like bacteriophages
This section has been adapted from:
Stephanie J. Yaung, Kevin M. Esvelt, George M. Church. Complete Genome Sequences of T4-like Bacteriophages RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68.
Genome Announcements 3(1):e01122-14 (2015). Ref. (356)
4.4.1 Abstract
T4-like bacteriophages have been explored for phage therapy and are model organisms
for phage genomics and evolution. Here we describe the sequencing of 11 T4-like phages. We
find high nucleotide similarity among T4, RB55, and RB59; RB32 and RB33; and RB3, RB5,
RB6, RB7, RB9, and RB10.
4.4.2 Genome announcement
Complete sequences of T4-like myoviruses would enhance studies of phage evolution
and genomics as well as biotechnology applications involving phage cocktails. In this study, we
sequenced RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68. “RB”
phages were originally isolated by Rosina Berry in 1964 from six sewage treatment plants in
Long Island, New York for studies on T-even phage speciation (357).
We prepared phage lysates as previously described (326) from host Escherichia coli B
(CGSC 5365), extracted DNA with the Phage DNA Isolation Kit (Bio-world, Dublin, OH), and
sequenced the samples as paired-end 250 bp reads on the MiSeq instrument (Illumina, San Diego,
CA). 789,300 (RB6) to 3,932,449 (RB7) paired reads were generated per sample. On average,
82.8% pairs survived quality control and trimming with Trimmomatic (358). Insert sizes were
~330 bp; the median coverage of sequenced phages was 2,966 X, ranging from 259 X (RB55) to
6,985 X (RB7). We performed de novo assembly using Velvet (359) version 1.2.08 with k-mer
lengths of K51, K57, and K63, and were able to obtain a single ~168 kbp contig from at least one
of the assemblies. We used Geneious version 7.1.7 for post-assembly processing and filled any
assembly gaps by iterative mapping of reads to the scaffold.
The circularly permuted linear double-stranded DNA genomes of the 11 RB phages have
lengths of ~168 kbp. Approximately 270 open reading frames (ORFs) per phage were predicted
with Glimmer 3 (360). Annotations were transferred from published genomes of T4 and T4-like
phages with at least 98% similarity. Remaining ORFs were annotated by lowering the similarity
cutoff to 70% or performing BLAST searches (286). Eight to ten tRNAs were predicted in each
genome by tRNAscan-SE 1.21 (http://lowelab.ucsc.edu/tRNAscan-SE/, (361)). Following the
convention in T4-like phages, we oriented completed genomes to start with rIIA.
The sequenced phages share similar genome organization and nucleotide identity. Using
progressiveMauve alignment (362), we found that RB7, RB27, RB33, and RB68 are 73-86%
similar to one another, and are ~75% identical to T4. Furthermore, RB33 shares 99.93%
similarity with RB32. RB55 and RB59 are 99.8% similar to T4 and are 99.96% identical to each
other. We noted a high nucleotide similarity (99.99%) amongst RB3, RB5, RB6, RB7, RB9, and
RB10. RB5 differs from RB6 by four bases (one nonsynonymous, one synonymous, and two
intergenic); the nonsynonymous difference occurs in the baseplate wedge subunit and tail pin,
gene product 11 (gp11). RB7 and RB9 differ by three nucleotides (two nonsynonymous and one
intergenic); the two nonsynonymous bases are in the baseplate hub subunit tail length
determinator (gp29) and hypothetical protein NrdC.4. The extent to which these differences
affect host range is unclear given limited data on the total number but not exact profile of
susceptible E. coli strains within the ECOR collection for each phage (320). Relationships
between genome and host range variation could provide insights into mechanisms of host
specificity.
Nucleotide sequence accession numbers. Genome sequences have been deposited in GenBank.
Accession numbers are listed in Table 4-3.
Strain                      Accession no.   Genome size (bp)   Coverage (X)   No. of CDSs   No. of tRNAs
Enterobacteria phage RB3    KM606994        168,402            2,831          273           10
Enterobacteria phage RB5    KM606995        168,394            3,449          271           10
Enterobacteria phage RB6    KM606996        168,394            1,474          271           10
Enterobacteria phage RB7    KM606997        168,395            6,985          272           10
Enterobacteria phage RB9    KM606998        168,395            2,826          272           10
Enterobacteria phage RB10   KM606999        168,401            2,798          272           10
Enterobacteria phage RB27   KM607000        165,179            2,966          271           10
Enterobacteria phage RB33   KM607001        166,007            3,355          274           8
Enterobacteria phage RB55   KM607002        168,896            259            272           8
Enterobacteria phage RB59   KM607003        168,966            3,158          276           8
Enterobacteria phage RB68   KM607004        168,401            3,187          276           9
Table 4-3 Genome features of the sequenced strains
4.4.3 Acknowledgements
This work was supported by NSF Small Business/ERC Collaborative Opportunity grant
IIP-1256446 to Ginkgo Bioworks and GMC. SJY was supported by an NSF Graduate Research
Fellowship and KME by the Wyss Technology Development Fellowship.
DNA preparation and sequencing were completed at the Molecular Biology Core
Facilities of the Dana-Farber Cancer Institute. Analysis was performed on the Orchestra cluster
supported by the Harvard Medical School Research Information Technology Group. We thank
Henry M. Krisch for the phages.
4.5 Generating effective CRISPR spacers against bacteriophages
4.5.1 Introduction
In this section, we describe a high-throughput library selection for effective spacers
against T4-like phages. To demonstrate the approach, we focused on a subset of phages,
particularly T6, RB15, RB33, and RB69, which infect a large number of E. coli strains ((320)
and Figure 4-18). We included RB69 since it shares less sequence similarity with the other
phages in this study (Table 4-4). Furthermore, we were interested in testing different phages,
because we noticed that a highly effective spacer (with an efficiency of plating, or EOP, less than
10^-4) against one phage could be similarly effective at a homologous region in another phage. For
example, based on the T4 spacer 2 we characterized in Section 4.3, which we will call spacer
T4.Y in this section, we discovered that homologous spacers in phages RB49 and RB69 were
similarly effective (Figure 4-19).
Figure 4-18 Host range of T4-like phages.
The graph depicts the number of strains each phage could infect, based
on 72 strains in the ECOR collection and 4 laboratory E. coli strains.
Adapted from Ref. (320).
        T6     RB15   RB33
RB69    82%    80%    80%
RB33    97%    96%
RB15    97%
Table 4-4 Pairwise similarity of phages T6, RB15, RB33, and RB69.
Values were calculated using BLAST (megablast default settings) at
http://blast.ncbi.nlm.nih.gov/ (286).
Figure 4-19 Spacer Y confers protection in phages T4, RB49, and RB69.
Sequence differences from the T4.Y spacer are bolded for spacers RB49.Y and RB69.Y.
All three target a homologous region in the major capsid protein (gp23) gene.
Nevertheless, we found many ineffective (EOP ~ 10^-2 to 1) spacers. Given the immense
variation in spacer activity against phage infection (Figure 4-21), we tested a large library of
spacers against T4-like phages to identify effective anti-phage activity. In developing a selection
method, we observed inconsistent results using chemostats and batch culture. Thus, we used
phage-embedded soft agar that allows for isolation of more effective spacers. After validating the
approach with a mock selection experiment, we proceeded to construct a library of over 12,000
spacers targeting phages T6, RB15, RB33, and RB69. Using high-throughput sequencing to
determine which spacers were enriched, we were able to identify and confirm top spacers against
each phage.
4.5.2 Materials and methods
4.5.2.1 Strains and constructs
Phage T4 was obtained from the Coli Genetic Stock Center (CGSC); phage T6 from
DSMZ; and phages RB15, RB33, and RB69 from H. M. Krisch. Wild-type E. coli K-12
MG1655 and E. coli B (CGSC 5365) were used for phage propagation and selection experiments.
Methods for phage propagation and plaque assays to determine the level of Cas9-mediated
resistance to phage infection were conducted as described in Section 4.3. Highly competent E.
coli NEB Turbo (New England Biolabs, Ipswich, MA) were used for plasmid library
construction. In general, E. coli were grown at 37 °C in LB broth and supplemented with
antibiotics as needed at final concentrations of 100 μg/mL spectinomycin, 30 μg/mL
chloramphenicol, and 100 μg/mL carbenicillin.
Cells expressing SpCas9 were constructed by transforming in DS-SPcas (Addgene
plasmid 48645, (21)), which encodes SpCas9 and its cognate tracrRNA on a backbone with a
cloDF13 origin of replication and aadA gene. We maintained the designed spacer on a separate
plasmid (based on PM-SP!TB, Addgene plasmid 48650, (21)) that expressed one spacer
followed by the SpCas9 repeat on a backbone with a p15a origin of replication and cat gene. For
most of our experiments, we used the single guide RNA (sgRNA) form, instead of the dual RNA
format (with a tracrRNA and a crRNA expressed from the spacer and repeat). In these cases we
removed the tracrRNA portion from DS-SPcas to construct plasmid DS-cas9-nt. We also
swapped the spacer and repeat on PM-SP!TB with the equivalent sgRNA sequence. For the
mock selection experiment, we assembled a compatible plasmid with a pBR322 origin of
replication and a bla gene, and if needed to label a strain, GFP.
In constructing the spacer library for high-throughput screening against phage, we
designed a destination vector for Golden Gate assembly (Figure 4-20). The vector is similar to
the sgRNA format of plasmid PM-SP!TB, except that it has a stronger promoter (J23119 instead
of J23110), and instead of a spacer, there is an RBS and GFP coding sequence flanked by BsaI
sites. This design permits quick screening for backbone (GFP+) versus candidate clones (GFP-)
after assembly with spacer inserts prepared to contain compatible overhangs.
Figure 4-20 Library construction and sequencing design.
4.5.2.2 Spacer library design
Below are steps we took to generate an oligonucleotide library that could be synthesized
as a 79 nt CustomArray Oligo Pool (CustomArray, Inc, Bothell, WA):
1. Find all possible NGGs and store the 20 nt upstream as candidate spacers for phages
T6, RB15, RB33, and RB69.
2. Filter by melting temperature, keeping spacers between 50 and 57 °C as calculated by the
oligoprop function in MATLAB. The range was set based on effective spacers against
phages T4, T7, and RB49 we characterized previously in Section 4.3.
3. Drop spacers with GGGG homopolymers.
4. Search and filter out spacers with off-targets to E. coli. Look at E. coli strains K-12
MG1655 and Nissle 1917 and allow one mismatch in the 15 nt closest to the PAM.
5. Rank the remaining spacers by secondary structure. Calculate the minimum free
energy (MFE) of each spacer using the entire sgRNA sequence using ViennaRNA
(363). Select the top 12,472 for synthesis (~4000 to 5000 per phage).
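Steps 1 and 3 above can be sketched in a few lines. This is an illustration under stated assumptions, not the script used for the library: the function names are hypothetical, the genome is treated as a linear string (candidates spanning the ends of a circularly permuted genome are ignored), and both strands are scanned. The melting temperature filter (oligoprop), the E. coli off-target search, and the ViennaRNA ranking in steps 2, 4, and 5 are not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_spacers(genome, spacer_len=20):
    """Step 1: collect the 20 nt immediately 5' of every NGG on either strand."""
    genome = genome.upper()
    candidates = set()
    for strand in (genome, reverse_complement(genome)):
        # i indexes the first G of the GG in an NGG PAM; the N sits at i - 1,
        # so the spacer occupies strand[i - 1 - spacer_len : i - 1].
        for i in range(spacer_len + 1, len(strand) - 1):
            if strand[i:i + 2] == "GG":
                spacer = strand[i - 1 - spacer_len:i - 1]
                if set(spacer) <= set("ACGT"):  # skip ambiguous bases
                    candidates.add(spacer)
    return candidates


def drop_gggg(spacers):
    """Step 3: discard spacers containing a GGGG homopolymer."""
    return {s for s in spacers if "GGGG" not in s}
```

As a check against a sequence from this chapter, scanning the 23 nt T4 spacer 1 target from Table 4-2 (ATATCGAAAGCAATCAGGTTAGG) returns the single candidate ATATCGAAAGCAATCAGGTT.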
Since we aimed to construct one input library per phage, but some spacers hit multiple
phages, we devised a pooling strategy that would allow us to selectively combine spacers that hit
a phage regardless of its potential cross-reactivity with other phages. For example, for the T6
library, we needed to combine spacers hitting only T6, spacers hitting both T6 and one of the
other three phages, spacers hitting T6 and two of the other three phages, and spacers hitting all
four phages. This amounted to a total of 15 different sub-pools for four phages (since we
excluded the combinatorial case of 4 choose 0 in which spacers do not hit any phage), and
required pooling 8 different sub-pools for each phage.
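The sub-pool counts follow from enumerating the non-empty subsets of the four phages. The sketch below reproduces that count and, using the sub-pool sizes as read from Table 4-5, the resulting size of each per-phage library. The function names are hypothetical and the code only illustrates the bookkeeping.

```python
from itertools import combinations

PHAGES = ("T6", "RB15", "RB33", "RB69")

# Sub-pool sizes as read from Table 4-5, keyed by the set of phages targeted.
SUBPOOL_SIZES = {
    frozenset({"T6"}): 1493,
    frozenset({"RB15"}): 1404,
    frozenset({"RB33"}): 1550,
    frozenset({"RB69"}): 4936,
    frozenset({"T6", "RB15"}): 586,
    frozenset({"T6", "RB33"}): 478,
    frozenset({"T6", "RB69"}): 15,
    frozenset({"RB15", "RB33"}): 577,
    frozenset({"RB15", "RB69"}): 26,
    frozenset({"RB33", "RB69"}): 12,
    frozenset({"RB15", "RB33", "RB69"}): 21,
    frozenset({"T6", "RB33", "RB69"}): 9,
    frozenset({"T6", "RB15", "RB69"}): 18,
    frozenset({"T6", "RB15", "RB33"}): 1287,
    frozenset({"T6", "RB15", "RB33", "RB69"}): 60,
}


def nonempty_subsets(items):
    """All non-empty subsets of items (2^n - 1 of them)."""
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def subpools_for(phage):
    """Sub-pools that must be combined to build the library for one phage."""
    return [s for s in nonempty_subsets(PHAGES) if phage in s]


def library_size(phage):
    """Number of spacers in the pooled input library for one phage."""
    return sum(SUBPOOL_SIZES[s] for s in subpools_for(phage))
```

With these numbers there are 15 sub-pools in total, 8 per phage, and the pooled libraries contain 3,946 (T6), 3,979 (RB15), 3,994 (RB33), and 5,097 (RB69) spacers.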
The barcodes we used were derived from species-unique primers from Chapter 3 for non-E. coli species and tweaked for no or minimal predicted secondary structure. The barcodes and
primers are listed in Table 4-5. We then designed the oligos to contain appropriate barcodes and
BsaI recognition sites, which are underlined:
barcode + TGGTGCCGGTCTCATAGC + spacer + GTTTAGAGACCAGCCGTTGTG
name         sequence                                  total members
lib_f        CTTTATATCTAATATACAATGGTGCCGGTCTCATAGC     12472
lib_r        CTCATAAACATTAAAAACACAACGGCTGGTCTCTAAAC

name         sequence                   target phage(s)          total members
lib_fN1.1    GACCTTTGATAGTTACAGCGTGG    T6                       1493
lib_fN1.2    ACCATCTTCTATTGAAACGCTGG    RB15                     1404
lib_fN1.3    TGGAGAAGAAGTCGGGAATGT      RB33                     1550
lib_fN1.4    AAGTATCACTAAGCCGCATGTG     RB69                     4936
lib_fN2.12   GGAGACAGGACATCAACTTTTGG    T6, RB15                 586
lib_fN2.13   GACGCTTATGGTTAGAACCTTGG    T6, RB33                 478
lib_fN2.14   AGTAACGGAGATAGTGAAGATGG    T6, RB69                 15
lib_fN2.23   GGTATCCTGGATTACACGAATGG    RB15, RB33               577
lib_fN2.24   GGTACTTACGTCAACTGGAATGG    RB15, RB69               26
lib_fN2.34   GGACAAGTATAAAGGGGAAGTGG    RB33, RB69               12
lib_fN3.o1   GCGGAGTTCTATAGTATGGCTG     RB15, RB33, RB69         21
lib_fN3.o2   TAATCATTAAACCCGCTGCTGG     T6, RB33, RB69           9
lib_fN3.o3   TGTTGCATCCTTCGTTGAATGG     T6, RB15, RB69           18
lib_fN3.o4   TGGGTAGTTTTGAGTTTTGGTGG    T6, RB15, RB33           1287
lib_fN4      ACGAGATACTTCAGTTCGGCT      T6, RB15, RB33, RB69     60
Table 4-5 Primers for amplifying sub-pools of oligonucleotides based on barcodes.
The barcode portion is underlined. Primer lib_r is paired with each of the other 15 forward
primers. Pairing it with primer lib_f will amplify all oligos.
4.5.2.3 Spacer library construction and selection on phage-embedded agar
Using 0.3 μM of primers for each sub-pool (Table 4-5) and 10 ng of the oligo library as
template, we performed PCRs (KAPA HiFi HotStart ReadyMix PCR Kit, Kapa Biosystems,
Wilmington, MA) consisting of 25 cycles of 15 s annealing at 64 °C and 10 s extension. For some
samples, we used 20 cycles and 10 s annealing, or decreased the primer concentration to 0.1 μM.
The products were purified with MinElute purification (QIAGEN), quantified by Nanodrop, and
pooled by the expected frequency of each PCR sub-pool in the final library for each phage.
The pooled PCRs were combined with the destination vector in 15 μL Golden Gate
reactions, each composed of:
3 μL of destination plasmid (33 ng/μL)
3 μL of the pooled PCR products for each phage
1.5 μL 10X T4 ligase buffer (NEB)
0.15 μL 100X BSA (NEB)
1 μL BsaI (NEB)
1 μL T4 ligase (NEB; 400,000 U/mL)
5.35 μL ddH2O
Golden Gate assemblies were carried out for 25 cycles of 3 min at 37 °C and 4 min at
16 °C, followed by 1 cycle of 5 min at 50 °C and 5 min at 80 °C. Then 5 μL of each was
transformed into 25 μL of NEB Turbo chemically competent cells. After 2 hours of recovery in
250 μL SOC, 10 μL was plated on LB+chloramphenicol agar for characterization while the rest
of the culture was diluted into 3 mL LB+chloramphenicol for overnight growth. Plasmids were
isolated from 2 mL of culture the next day (QIAprep Spin Miniprep Kit, QIAGEN).
From the platings of recovered assemblies, we found 5-7% GFP+ cells across the
different libraries. When we picked some clones for Sanger sequencing (Genewiz, South
Plainfield, NJ) to verify appropriate incorporation of our spacers, we discovered that 31% of
clones had an indel in the 20 nt spacer region of the plasmid. This was in the expected range of
oligo synthesis, which has a 0.5-1% error rate per cycle, corresponding to 1 - 0.995^79 ≈ 33% for
oligos of length 79.
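The estimate above is the probability that at least one of the 79 coupling cycles introduces an error, assuming errors are independent with a fixed per-cycle rate. A short sketch of that arithmetic follows; the function name is hypothetical.

```python
def fraction_with_error(per_cycle_error, length):
    """Probability that an oligo of the given length carries at least one error,
    assuming independent errors at a constant per-cycle rate."""
    if not 0.0 <= per_cycle_error <= 1.0:
        raise ValueError("per_cycle_error must be between 0 and 1")
    return 1.0 - (1.0 - per_cycle_error) ** length
```

At the lower bound of 0.5% per cycle this gives about 33% for a 79 nt oligo, matching the figure quoted in the text; at 1% per cycle the same formula gives roughly 55%.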
To prepare cells for phage selection experiments, we transformed 100 ng of each library
into 50 μL of electrocompetent E. coli cells. We used both E. coli B and MG1655 strains that
carried the DS-cas9-nt plasmid. Cells were recovered in 500 μL SOC for 2 hours. Aliquots were
plated for verification, while the rest of the culture was diluted into 5 mL with antibiotics
(spectinomycin+chloramphenicol) for overnight recovery.
Fresh phage stocks with titers of ~10^9 PFU/mL were used for preparing phage-embedded
agar at 10 μL phage/mL soft agar. 6 mL of the mixture with appropriate antibiotics was poured
onto one-well rectangular petri plates, which already contained 6-7 mL of cooled regular LB
agar on the bottom. After the soft agar cooled, we gently spread 60 μL of cells (10 μL cells per
mL of soft agar) on top, let the plates dry, and inverted them for incubation overnight at 37 °C.
Two of the selections (for phages T6 and RB69) were performed in duplicate for host strain E.
coli B. The next day, 3 mL of PBS buffer was used to gently but quickly scrape off cells without
disturbing the phage-embedded soft agar if possible. The final 1-1.5 mL of recovered mixture
was kept on ice until all samples were ready for plasmid extraction (Miniprep, QIAGEN).
4.5.2.4 Library sequencing and analysis
The extracted plasmids were used as template for nested PCRs to amplify the spacer
sequences for high-throughput sequencing. We pooled each of the four input phage libraries by E.
coli strain, which resulted in a total of two input samples to sequence (B_in and M_in). The ten
output libraries were kept separate in individual PCRs. We designed primers for two sequential
PCRs – the first was an inner PCR to amplify the spacer region and the second was an outer PCR
to add compatible sequencing indices (Table 4-6). We used ~50 ng of plasmids in the first 20 μL
PCR with 25 cycles of 15 s annealing at 62 °C and 10 s extension. The second PCR was modified
to anneal at 65 °C. Custom sequencing primers (Table 4-7) were used for a paired-end 2 x 30 bp
run on the MiSeq instrument (Illumina, San Diego, CA) at the Molecular Biology Core Facilities
of the Dana-Farber Cancer Institute.
First (inner) PCR primers:
L1_initial_f     GAAGCCGTTCTCGATGGACGAcagctagctcagtcctaggtataa
L2_initial_f     AGGGAACTGAAAGTGGTGGATGTGcagctagctcagtcctaggtataa
L1L2_initial_r   GACGGACAGACGGcaagttgataacggactagcctta

Second (outer) PCR primers:
L1_F             AATGATACGGCGACCACCGAGATCTACACCCTGCGGAAGCCGTTCTCGATGGACGA
input_1_R        CAAGCAGAAGACGGCATACGAGATATTACTCGGACGGACAGACGGcaagttga
T6_M_R           CAAGCAGAAGACGGCATACGAGATCGCTCATTGACGGACAGACGGcaagttga
RB15_M_R         CAAGCAGAAGACGGCATACGAGATATTCAGAAGACGGACAGACGGcaagttga
RB33_M_R         CAAGCAGAAGACGGCATACGAGATCTGAAGCTGACGGACAGACGGcaagttga
RB69_M_R         CAAGCAGAAGACGGCATACGAGATCGGCTATGGACGGACAGACGGcaagttga
T6_B2_R          CAAGCAGAAGACGGCATACGAGATTCTCGCGCGACGGACAGACGGcaagttga
L2_F             AATGATACGGCGACCACCGAGATCTACACGGAAGGTAGGGAACTGAAAGTGGTGGATGTG
input_2_R        CAAGCAGAAGACGGCATACGAGATTCCGGAGAGACGGACAGACGGcaagttga
T6_B_R           CAAGCAGAAGACGGCATACGAGATGAGATTCCGACGGACAGACGGcaagttga
RB15_B_R         CAAGCAGAAGACGGCATACGAGATGAATTCGTGACGGACAGACGGcaagttga
RB33_B_R         CAAGCAGAAGACGGCATACGAGATTAATGCGCGACGGACAGACGGcaagttga
RB69_B_R         CAAGCAGAAGACGGCATACGAGATTCCGCGAAGACGGACAGACGGcaagttga
RB69_B2_R        CAAGCAGAAGACGGCATACGAGATAGCGATAGGACGGACAGACGGcaagttga
Table 4-6 Primers for amplifying libraries for high-throughput sequencing.
The first PCR uses primer L1_initial_f or L2_initial_f with L1L2_initial_r. The second PCR uses
primer L1_F or L2_F with appropriate _R primers for each sample. Sequencing indices are in red.
Read 1 sequencing primers:
Set1_r1          CGTTCTCGATGGACGAcagctagctcagtcctaggtataatgctagc
Set2_r1          CTGAAAGTGGTGGATGTGcagctagctcagtcctaggtataatgctagc

Index read sequencing primer:
Set1and2_index   tagaaatagcaagttaaaataaggctagtccgttatcaacttgCCGTCTGTCCGTC

Read 2 sequencing primer:
Set1and2_r2      GACGGACAGACGGcaagttgataacggactagccttattttaacttgctatttcta
Table 4-7 Custom sequencing primers.
From the 1.1 to 1.6 million paired reads per sample, we obtained 0.8 to 1.4 million paired
reads per sample (72-90%) after post-processing, which consisted of merging the reads using
SeqPrep (https://github.com/jstjohn/SeqPrep) and custom Python scripts that retained reads with
an exact 10 nt match to the 3’ end of the sgRNA backbone (Figure 4-20) and an exact match
with the remaining 20 nt to the designed spacer library. Any 20 nt sequences that did not match
to the expected library accounted for very few reads and could be discarded. Statistical analysis
of enriched spacers was performed using edgeR (364). Spacers with a false discovery rate (FDR)
below 5% were considered significant.
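The read-filtering step can be sketched as follows. This is not the custom script used here; it is a minimal illustration that assumes each merged read contains the 20 nt spacer immediately followed by a fixed 10 nt backbone anchor. The function name, the read layout, and the anchor sequence used in the usage note are all hypothetical.

```python
from collections import Counter


def count_library_spacers(merged_reads, library, anchor, spacer_len=20):
    """Count reads that carry a designed spacer next to the backbone anchor.

    A read is kept only if (a) it contains the anchor exactly and
    (b) the spacer_len bases immediately 5' of the anchor exactly match
    a member of the designed library. All other reads are discarded.
    """
    library = set(library)
    counts = Counter()
    for read in merged_reads:
        pos = read.find(anchor)
        if pos < spacer_len:  # anchor missing (-1) or too close to the read start
            continue
        spacer = read[pos - spacer_len:pos]
        if spacer in library:
            counts[spacer] += 1
    return counts
```

For instance, with a placeholder anchor such as "GTTTAAGAGC", a read of the form TAGC + spacer + anchor is counted once for that spacer, whereas a read with a single mismatch in either the spacer or the anchor is dropped.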
4.5.2.5 Top spacer validation
We ordered two oligos for each spacer we sought to validate to ligate into the sgRNA
backbone vector. The oligos were of the form: 5’-TAGC-(20 nt spacer)-3’ and 5’-AAAC-(20 nt
reverse complement of spacer)-3’. Oligos were mixed in annealing buffer (10 mM Tris, 50 mM
NaCl, 1 mM EDTA, pH 7.5–8.0) and incubated at 95 °C for 3 min. After cooling to room
temperature, the oligos were combined with BsaI-digested backbone in 20 μL ligation reactions
(NEB T4 DNA Ligase), transformed into NEB Turbo cells, and sequenced for correct clones.
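The two oligos for each spacer follow directly from the stated form: a TAGC overhang on the top strand and an AAAC overhang on the reverse complement. A small sketch, with hypothetical function names, is given below.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def validation_oligos(spacer):
    """Return (top, bottom) oligos, both written 5' to 3', for one 20 nt spacer.

    top    = TAGC + spacer
    bottom = AAAC + reverse complement of spacer
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be 20 nt of A, C, G, T")
    return "TAGC" + spacer, "AAAC" + reverse_complement(spacer)
```

For the T4 spacer 2 sequence from Table 4-2 (AAGAACTTCCAACCGGTAAT), this yields TAGCAAGAACTTCCAACCGGTAAT and AAACATTACCGGTTGGAAGTTCTT.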
4.5.3 Results
4.5.3.1 Mock selection
To validate the selection approach, we started with a mixture of eight strains, each
carrying one ineffective spacer against T4 (Figure 4-21). We spiked in 1% of GFP-labeled cells
that had an effective spacer. After one round of selection on the phage agar, we observed ~70%
GFP+ colonies, though this enrichment did not improve with a second round of selection in
which we re-introduced the surviving cells onto the same concentrations of phage (Figure 4-22).
We used this phage-embedded agar as a one-round selection method on our larger spacer library.
4.5.3.2 Library selection
Relative to the input library, the abundance of spacers after phage selection ranged from
1000X depletion to 1000X enrichment, though median log2 fold changes were near zero across samples (Figure
4-23). We also found potential differences between host strains; while relative fold changes
correlated (R^2 ~ 0.5) for phages T6 and RB15, values were not as consistent between E. coli B
and MG1655 for RB33 and RB69 (Figure 4-24). Therefore, we ranked statistically significant
spacers from analyses run separately for each strain as well as run on both strains taken together
(Table 4-8). We selected four to six top spacers for each phage to validate experimentally.
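The fold changes discussed here are computed as described for Figure 4-23: counts are normalized to the sample total, and the log base 2 ratio of output to input is taken per spacer. The sketch below illustrates only that normalization; it does not reproduce the edgeR statistics, the function name is hypothetical, and spacers with zero counts in either sample are skipped as a simplifying assumption rather than given a pseudocount.

```python
import math


def log2_fold_changes(input_counts, output_counts):
    """Per-spacer log2 fold change of normalized output vs. normalized input.

    input_counts, output_counts: dicts mapping spacer -> read count.
    Spacers with zero input or zero output counts are omitted, since the
    ratio is undefined without a pseudocount.
    """
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    if in_total == 0 or out_total == 0:
        raise ValueError("both samples need at least one read")
    result = {}
    for spacer, n_in in input_counts.items():
        n_out = output_counts.get(spacer, 0)
        if n_in == 0 or n_out == 0:
            continue
        result[spacer] = math.log2((n_out / out_total) / (n_in / in_total))
    return result
```

With hypothetical counts of 100, 100, and 800 in the input and 800, 100, and 100 after selection, the three spacers receive log2 fold changes of 3, 0, and -3, respectively.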
Figure 4-21 Mock library composition of T4 spacers.
Plaque assays testing 24 different T4 spacers. Dark quadrants represent completely lysed host
cells from the phage infection. Light quadrants represent phage-resistance and therefore cell
growth. Intermediate levels of immunity correspond to visible plaques (dark spots) formed by
the phage on host cells.
Figure 4-22 Mock library selection enriched for effective spacer.
As described for the larger library selection, we mixed phage with soft
agar and appropriate antibiotics to prepare the selection substrate. Cells
were then gently plated on top; surviving cells were collected and re-plated on fresh phage-embedded agar for a second round of selection.
Figure 4-23 Fold change of spacers after phage selection.
Counts for each spacer were normalized to the total counts in the sample for that
phage library. Then the log base 2 fold change was calculated for each post-selection sample relative to the input, matched for phage and host E. coli strain (B
or MG1655). Duplicate selections for strain B are labeled as B1 and B2.
Figure 4-24 Host strain differences across selection experiments.
Table 4-8 Features of top spacers used for validation assays.
The functional category of targeted genes is based on phage T4 annotation (365). n.s. = not
statistically significant.
While only top spacers are shown here and included in validation experiments, there were many
more that were significant (FDR < 0.05), had a log fold change greater than 1, and did not have
zero input counts. The number of spacers that fit these criteria for analyses run on E. coli B and
MG1655 data were 300 for T6, 461 for RB15, and 29 for RB69. In the separate strain evaluations
for RB33, there were 296 spacers for E. coli B and 251 for E. coli MG1655.
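The three criteria above (FDR < 0.05, log fold change greater than 1, and non-zero input counts) can be applied as a simple filter. The sketch below assumes each spacer is represented as a dictionary; the field names and the example records are hypothetical and are not taken from the selection data.

```python
def is_candidate(record, fdr_cutoff=0.05, logfc_cutoff=1.0):
    """Return True if a spacer meets all three criteria stated in the text."""
    return (
        record["fdr"] < fdr_cutoff
        and record["logFC"] > logfc_cutoff
        and record["input_count"] > 0
    )

# Hypothetical records for illustration only.
records = [
    {"spacer": "s1", "logFC": 5.2, "fdr": 0.001, "input_count": 40},
    {"spacer": "s2", "logFC": 3.0, "fdr": 0.200, "input_count": 12},   # not significant
    {"spacer": "s3", "logFC": 0.4, "fdr": 0.010, "input_count": 100},  # weak enrichment
    {"spacer": "s4", "logFC": 8.0, "fdr": 0.001, "input_count": 0},    # absent from input
]
candidates = [r["spacer"] for r in records if is_candidate(r)]
print(candidates)
```

Excluding spacers with zero input counts avoids inflated enrichment values, at the cost of possibly missing effective spacers that were not sampled in the input.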
4.5.3.3 Validation of top spacers
We performed initial validation by spotting cells containing a single spacer on phage-embedded agar (Figure 4-25). Since we were interested in potential cross-reactivity (i.e., a spacer
from phage A selection also being effective against phage B), we screened all cloned spacers
against all four phages as well as phage T4. Since the cell densities were normalized and
presumably phage were sufficiently well-distributed across the soft agar, we tallied crude counts
of plaques that formed on the plates (Figure 4-26). We then took the subset of apparently active
spacers and conducted a secondary validation with better quantification using plaque assays.
Many spacers provided four to six orders of magnitude of protection against phage lysis
compared to unprotected controls (Figure 4-27).
Figure 4-25 Initial validation of top spacers using phage-embedded agar.
(Left panel) Cells (E. coli B or MG1655) carrying each spacer were arrayed in half of a 96-well
plate. WT = wild-type, unprotected cells. LB = media only control. Various “Y” spacers were
additional controls.
(Right panel) The result for phage RB33 is shown here as a representative plate. Protected cells
were able to grow and form a visible spot on the agar.
Left half of imaged plate: Cells were normalized to OD600nm = 0.3. Each spot has approximately 10^5
cells and 10^5 phage (MOI ~ 1).
Right half of imaged plate: 10X diluted cells, corresponding to MOI ~ 10.
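As a worked check of the multiplicity of infection (MOI) values in this caption, using only the approximate per-spot numbers given above (the symbols N_phage and N_cells are notation introduced here for illustration):

```latex
\mathrm{MOI} = \frac{N_{\text{phage}}}{N_{\text{cells}}}, \qquad
\frac{10^{5}}{10^{5}} \approx 1, \qquad
\frac{10^{5}}{10^{4}} \approx 10 .
```

Diluting the cells ten-fold while keeping the phage amount fixed therefore raises the MOI from about 1 to about 10.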
Figure 4-26 Semi-quantitative results of initial validation screen of top anti-phage spacers.
Top spacers from each phage selection are listed on the left and labeled as [phage].[spacer#],
while the infecting phage in the validation assay is listed across the top. “Y” spacers are
previously constructed strains that serve as comparison. Results for each E. coli strain, B and
MG1655 (abbreviated “M”), are separated. Each value represents the mean number of
plaques formed on the spot of E. coli cells from two experiments, one at a 10X more dilute cell
density than the other. The values are roughly colored from white to blue for increasing
immunity against phage; completely lysed E. coli are represented as white cells with no
numerical value in the table, whereas E. coli with no visible plaques are “0” in blue.
Figure 4-27 Quantitative validation of screened spacers using plaque assays.
For each spacer and infecting phage, the efficiency of plating (EOP) was
calculated relative to unprotected E. coli strains (“WT”). Smaller circles indicate
values near that assay’s detection limit (i.e., true values are further to the right).
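The efficiency of plating can be sketched as the ratio of the phage titer on spacer-carrying cells to the titer on unprotected cells. The example below is a minimal illustration under stated assumptions: the function name, the handling of the detection limit, and the titers are hypothetical and do not reproduce the measurements in this figure.

```python
import math

def log10_eop(pfu_on_spacer_strain, pfu_on_wt, detection_limit_pfu=1.0):
    """Return (log10 EOP, at_detection_limit).

    EOP = titer on the spacer-carrying strain / titer on unprotected WT cells.
    If the observed titer is below the detection limit, the limit is
    substituted, so the reported value is only an upper bound on the EOP.
    """
    if pfu_on_wt <= 0:
        raise ValueError("WT titer must be positive")
    at_limit = pfu_on_spacer_strain < detection_limit_pfu
    effective_titer = max(pfu_on_spacer_strain, detection_limit_pfu)
    return math.log10(effective_titer / pfu_on_wt), at_limit

# Hypothetical titers (PFU/mL) for illustration only.
value, limited = log10_eop(1e4, 1e9)
print(f"log10(EOP) = {value:.1f}, at detection limit: {limited}")

bound, limited_bound = log10_eop(0, 1e9, detection_limit_pfu=1e2)
print(f"log10(EOP) <= {bound:.1f}, at detection limit: {limited_bound}")
```

Flagging limit-bounded values separately mirrors the caption's distinction between measured EOP values and those that are only bounded by the assay.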
4.5.4 Discussion
We demonstrated that phage-embedded agar can selectively enrich for high-activity anti-phage spacers. The most active spacers conferred protection at efficiencies of plating ranging
from 10^-4 to 10^-6. Interestingly, not all top spacers provided phage resistance in the validation
assays. It is possible that the less effective spacers resulted from an expansion of receptor mutant
populations, or that not enough selection pressure (i.e., phage) was applied. For instance, we
used 10X less phage in the RB33 selection in the E. coli B strain, and neither of the top
spacers (RB33.1 and RB33.4) that we attempted to validate from that experiment provided
protection against RB33 infection. The most successful selections in this study were with phage
T6, in which all six of the top spacers provided phage resistance at EOP ~10^-5.
Some spacers were broadly active and provided resistance against infection by other
phages. These included T6.1, T6.2, T6.4, and RB69.5. Since the RB69.5 sequence has exact
matches to the genomes of phages T6, RB15, and RB33, its cross-reactivity is not unexpected.
However, the three T6 spacers have mismatches to homologous regions in the RB15, RB33, and
RB69 genomes (Table 4-9). This suggests that in future studies, it would be worthwhile to screen
all selected spacers against all phages of interest in the follow-up validation, not just the expected
target phage on which the selection was performed. Furthermore, it suggests that it may be
sufficient to generate a simpler library, without the need for barcoding several sub-pools as we
did here, since a member of another phage library could provide protection. One caveat to this
alternative approach would be to ensure sufficient sequencing coverage of the entire library. As
we observed from our pooled input library sequencing results, low counts could skew calculated
enrichment values; we decided to exclude spacers that had zero counts in the input sample, and
may have thereby missed effective spacers that happened to not be sequenced in the input.
Nevertheless, a critical consideration for continuing to use a sub-pooling design is that for
our ultimate application for phage-assisted population cycling in a microbial community, we
would like to employ orthogonal spacers and phages to control replacement of different bacterial
strains. This requires phage specificity in high-activity spacers. Using the results from this work,
if we were to use phages T6, RB15, RB33, and RB69, we would select the following phage-specific spacers: T6.3 and T6.5 against T6, RB15.3 and RB15.4 against RB15, spacer RB33.3
against RB33, and spacer RB69.3 against RB69. However, if we were interested in using a cross-reactive spacer, for example to introduce two phages at once, we
would select spacer T6.6 against phages T6 and RB15, or spacer RB69.6 against phages RB15
and RB69.
Spacer   Phage   Protospacer             PAM
T6.1     T6      GCAATCGACTAATCCAGAAT    GGG
         RB15    ACAATCGACTAATCCAGAAT    GGG
         RB33    ACAATCGACTAATCCAGAAT    GGG
         RB69    ACAATCAACTAAACCAGAAT    GGG
T6.2     T6      TTGAACCATACACTGCTATT    TGG
         RB15    TTGAACCATATACTGCTATT    TGG
         RB33    TTGAACCATATACTGCTATT    TGG
T6.4     T6      ATTAATGGTCTTCCTGTTGT    AGG
         RB15    ATTAACGGTCTTCCTGTTGT    TGG
         RB33    ATTAACGGTCTTCCTGTTGT    TGG
T6.6     T6      TTAACTCTCGCTCGCATAGT    AGG
         RB15    TTAACTCTTGCTCGCATAGT    AGG
Table 4-9 Sequence analysis of cross-reactive spacers.
For each spacer, the homologous regions in other
phages are listed below, where mismatches are in bold
and underlined.
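The mismatches in Table 4-9 can be located programmatically by comparing each spacer with its homologous protospacers. The sketch below uses the sequences listed in the table and assumes that the T6 row of each group is the spacer sequence itself; the function name is hypothetical. Positions are 1-based from the PAM-distal end, so position 20 is adjacent to the PAM.

```python
def mismatch_positions(spacer, protospacer):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(spacer, protospacer)) if a != b]

# Sequences from Table 4-9: spacer (T6 row) and homologous regions in other phages.
table_4_9 = {
    "T6.1": ("GCAATCGACTAATCCAGAAT", {
        "RB15": "ACAATCGACTAATCCAGAAT",
        "RB33": "ACAATCGACTAATCCAGAAT",
        "RB69": "ACAATCAACTAAACCAGAAT",
    }),
    "T6.2": ("TTGAACCATACACTGCTATT", {
        "RB15": "TTGAACCATATACTGCTATT",
        "RB33": "TTGAACCATATACTGCTATT",
    }),
    "T6.4": ("ATTAATGGTCTTCCTGTTGT", {
        "RB15": "ATTAACGGTCTTCCTGTTGT",
        "RB33": "ATTAACGGTCTTCCTGTTGT",
    }),
    "T6.6": ("TTAACTCTCGCTCGCATAGT", {
        "RB15": "TTAACTCTTGCTCGCATAGT",
    }),
}

mismatches = {
    (spacer_name, phage): mismatch_positions(spacer_seq, homolog)
    for spacer_name, (spacer_seq, homologs) in table_4_9.items()
    for phage, homolog in homologs.items()
}
for key, positions in mismatches.items():
    print(key, positions)
```

For these sequences, every homologous region differs from the spacer at one position, except the RB69 region for spacer T6.1, which differs at three positions.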
Table 4-10 Comparison of quantified spacer activity with library selection data.
Raw counts are included for each input and output library by phage. With the exception of phage
RB33, all analyses here were based on considering E. coli B and MG1655 strains together in edgeR.
The log base 2 fold change (logFC) and false discovery rate (FDR) are reported. FDR values less
than 0.05 are in red. Any validated phage resistance activity is relative to an unprotected control
and reported as log base 10 of the efficiency of plating (EOP).
Interestingly, when we re-examined the log fold change and FDR values for any cross-reactive spacers that were present in the full list of selected spacers for each phage, we found that
they were usually several fold lower in enrichment or not statistically significant (Table 4-10).
Moreover, by including some quantitative data for less effective spacers, we observed that a
meaningful log fold change cutoff could be ~4, though that does not always hold up, such as for
spacer RB69.5 in the selections with phage RB15 and phage RB33.
We also noticed that the top spacers almost all contained the nucleotide “T” at the
position closest to the PAM (Table 4-10). To investigate this further, we calculated the
nucleotide frequencies at each position of the 20 nt spacer for both input and output libraries
(Figure 4-28). In general, there is already a slight enrichment for T across all positions in the
input libraries, but further enrichment of T is most noticeable at the last position in phage T6 in
both E. coli strains. A similar trend is seen in RB15 for both strains, RB33 for E. coli B, and
RB69 for E. coli B. Amongst our validated spacers, only spacer RB69.6 does not have a T in the
final position; it has an A instead. Further validation of other effective spacers in quantitative
plaque assays is needed to examine this finding.
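A per-position nucleotide frequency calculation of the kind described above can be sketched as follows. This is an illustrative, unweighted version (each distinct spacer counts once), which may differ from how the library-level frequencies in Figure 4-28 were computed; the function name is hypothetical. The example input is the four T6 spacer sequences listed in Table 4-9.

```python
from collections import Counter

def position_frequencies(spacers, length=20):
    """Return a list with one {nucleotide: frequency} dict per position."""
    counts = [Counter() for _ in range(length)]
    used = 0
    for seq in spacers:
        if len(seq) != length:
            continue  # ignore sequences that are not full-length spacers
        used += 1
        for i, nucleotide in enumerate(seq):
            counts[i][nucleotide] += 1
    if used == 0:
        raise ValueError("no spacers of the expected length")
    return [{nt: c[nt] / used for nt in "ACGT"} for c in counts]

# T6 spacers T6.1, T6.2, T6.4, and T6.6 as listed in Table 4-9.
t6_spacers = [
    "GCAATCGACTAATCCAGAAT",
    "TTGAACCATACACTGCTATT",
    "ATTAATGGTCTTCCTGTTGT",
    "TTAACTCTCGCTCGCATAGT",
]
frequencies = position_frequencies(t6_spacers)
print("Position 20 (PAM-proximal):", frequencies[-1])
```

All four of these spacers end in T, so the T frequency at the PAM-proximal position is 1.0 in this small example.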
For future studies, several parameters should be considered, including the use of different phage
concentrations in the selection and validation. Higher phage concentrations during the
selection could better enrich for highly effective spacers, and a range of phage titers in
the validation assays could reveal a gradient of anti-phage activity. Specifically, more spacers
that are mildly effective (EOP ~ 10^-2) could be validated and matched back to the selection data.
Furthermore, spacer libraries should use an sgRNA format for consistent selection. By
coincidence, spacer RB69.3 is the same sequence as RB69.Y, which we had previously picked
based on effective “Y” spacers targeting the same region in other T4-like phages. This provided
insight into effects of promoter strength and the dual RNA versus sgRNA format, since the
plasmid encoding RB69.Y has a weaker promoter (J23110 instead of J23119) and expresses the
CRISPR RNA from a spacer-repeat array – the tracrRNA is encoded by the DS-SPcas plasmid.
Although phage resistance is comparable between RB69.3 and RB69.Y in E. coli B, the weaker
promoter and possibly less efficient RNA processing (as two RNAs must come together in the
cell) render the RB69.Y version less stable and less effective in E. coli MG1655 (Figure 4-27).
Finally, we mapped enriched spacers back to the phage genomes to investigate whether
particular genes or regions were sensitive to Cas9-mediated cleavage. We discovered that
commonly targeted genes encoded proteins for nucleotide metabolism, DNA packaging, and
structural elements such as tail fibers and head vertices. dNTP synthesis is particularly important
because the rate of dNTP synthesis is limiting in DNA replication (366). In phage T4, the
initiation of DNA replication is controlled by the synthesis of ribonucleotide reductase, a
tetrameric enzyme (α2β2), which appears at 4.8 min after infection and is the last enzyme
available for the dNTP synthetase complex. We found spacers against nrdA, which encodes the α
subunit, enriched in our selections using T6 (Figure 4-29) and RB15 (Figure 4-30). Moreover, in
RB69 (Figure 4-32), several spacers target nrdD and nrdG, which are involved in anaerobic de
novo synthesis of deoxyribonucleotides (367). In T6 and RB15, enriched spacers also targeted
DNA terminase (gp17); in T4, this protein is required for DNA packaging, in which it cleaves
and packs DNA into phage proheads (368). Several other regions with spacer enrichment
encoded structural proteins, such as long tail fibers in RB33 (Figure 4-31), short tail fibers in T6
and RB15, and various head proteins in RB69. All of these genes are essential. Thus, our
selection assay can be used for identifying effective CRISPR spacers for phage resistance
applications as well as studying essential genes and phage biology.
Figure 4-28 Nucleotide frequencies at each position in the spacer sequence across libraries.
For each library, the nucleotide frequencies are calculated for each position based on all spacers
that matched to the given phage library.
Figure 4-29 Enriched regions on the phage T6 genome.
A few regions (A, B, and C) are highlighted in the bottom panels. Of the top spacers validated in
this study, T6.3 is located at position 85,799 and T6.4 at 87,647 in region A. Spacer T6.2 is at
93,432 in region B. The other spacers are elsewhere on the genome: T6.1 at 98,415, T6.5 at 109,061,
and T6.6 at 60,188.
Figure 4-30 Enriched regions on the phage RB15 genome.
Some regions (A, B, C, and D) are displayed at higher resolution in the bottom panels. Spacer
RB15.1 is at position 3,150 in region A, RB15.4 at 148,337 in region C, and RB15.3 at 140,131 in
region D.
Figure 4-31 Enriched regions on the phage RB33 genome.
The data shown here are from E. coli MG1655. Regions A, B, C, and D are highlighted in the bottom
panels. Spacer RB33.3 is located at position 150,362 in region D. We included spacer RB33.2 (at
3,158 in region A) in validation, but it did not confer protection against RB33 infection.
Figure 4-32 Enriched regions on the phage RB69 genome.
A few enriched regions are shown in higher resolution in the lower panels. Spacer RB69.3 is located
at position 109,036 and RB69.5 at 109,717 in region C. RB69.4 is at 89,565 in region B. Elsewhere
on the genome, spacer RB69.6 is at position 158,867.
4.5.5 Acknowledgements
This work was supported by NSF Small Business/ERC Collaborative Opportunity grant
IIP-1256446 (to Ginkgo Bioworks and GMC), US Department of Energy grant DE-FG02-02ER63445 (to GMC), and the Wyss Institute for Biologically Inspired Engineering. SJY was
also supported by a National Science Foundation Graduate Research Fellowship and KME by
the Wyss Technology Development Fellowship.
DNA preparation and sequencing were completed at the Molecular Biology Core
Facilities of the Dana-Farber Cancer Institute. Analysis was performed on the Orchestra cluster
supported by the Harvard Medical School Research Information Technology Group.
Chapter 5
Conclusions and outlook on
microbiome engineering
The human body is naturally colonized by a vast number of microbes, collectively called
the human microbiota, whose genes constitute the human microbiome. These microbes
benefit the human host by extracting otherwise inaccessible nutrients, helping to develop the
immune system, and protecting the host against pathogen colonization (3, 35, 38–40, 42, 46). Yet
dysbiosis, or an imbalance between protective and harmful gut flora, can lead to human disease
(369). For instance, disturbances to the homeostasis between intestinal microbial antigens and
the host’s immune system may bring about type 1 diabetes and inflammatory bowel disease (50,
52). Next-generation sequencing has enabled systematic studies of the microbial and genetic
composition of the human microbiota, but we still know relatively little about the function of
these microbes and their genes. We set out to study what to edit and how to edit the microbiome.
In Chapter 2, we described a novel approach for functional discovery of genetic elements
in the microbiota and identified fitness genes conferring an advantage in the mammalian host.
Genes characterized in this manner can enable more competitive nutrient utilization, as we
demonstrated, or provide other benefits, depending on the in vivo selection conditions. Such
selected genes can then be introduced onto mobile genetic vectors or engineered strains to restore
microbial imbalances or enhance the fitness of our engineered elements. In Chapter 3, we
established foundational tools for working with complex microbial systems, exploring
microbiota gene delivery, and actively immunizing native gut flora against acquiring pathogenic
elements. In Chapter 4, we harnessed bacteriophages for precise manipulation of endogenous
microbiota; a near-term application would be to selectively deplete native E. coli with a set of
phages and introduce an enhanced probiotic Nissle 1917 that is not only resistant to those phages,
but also immune to acquiring Shiga toxin and multiple antibiotic resistance genes.
Overall, the future is bright for microbiome engineering, given innovations across various
fields. First, we are now ever better at reading and writing “omes,” enabled by technological
advances and cost reductions in high-throughput sequencing and DNA synthesis (370). In our
work, this has allowed for temporal functional metagenomics, as well as a generalized approach
for addressing large-scale biological questions based on synthesizing libraries of sequences for
selection experiments and subsequent sequencing. Given the complex interplay between the
microbiota and human host, these methods will continue to be invaluable for interrogating
different omes (e.g., DNA, RNA, epigenome) in conjunction with other omics data (e.g.,
metabolites, antibodies). Second, the advent of precise genome engineering tools, most recently,
CRISPR-Cas9, has transformed prospects for gene-based therapies. While precision editing is
undoubtedly of interest for clinical applications, it also allows for more precise methods to study
the microbiota and its impact on human health. Third, in light of antibiotic overuse in the clinic,
there has been renewed interest in phage therapy, which has been a form of personalized
medicine in Eastern Europe, where patients receive phage preparations tailored to their infection
based on samples sent to phage collection centers. Our efforts leverage all
of these advances for editing the genomes of endogenous species or replacing them with
protective versions that could immunize microbiota ecosystems against pathogenicity and
dysbiosis as well as sense and secrete therapeutic molecules. Clearly, from improving
understanding to enhancing engineering of the microbiota, we are closer to realizing the vision of
precise and even personalized in vivo editing of the human microbiome.
Figure 5-1 Engineering microbiomes from diseased to healthy states.
Efforts to engineer microbiomes rely on the abilities to sequence,
discover, and edit. Although one may consider these sequentially,
from metagenomics sequencing to functional gene discovery to
precise genome editing, these processes can be thought of as general
tools that enhance one another for more powerful studies that
improve biological understanding and build better therapeutics.
Bibliography
1.
Yaung SJ, Church GM, Wang HH (2014) Recent Progress in Engineering Human-Associated
Microbiomes. Methods Mol Biol 1151:69–74.
2.
Huttenhower C et al. (2012) Structure, function and diversity of the healthy human microbiome.
Nature 486:207–214.
3.
Ley RE, Peterson DA, Gordon JI (2006) Ecological and evolutionary forces shaping microbial
diversity in the human intestine. Cell 124:837–48.
4.
Turnbaugh PJ et al. (2007) The human microbiome project. Nature 449:804–10.
5.
Nicholson JK, Holmes E, Wilson ID (2005) Gut microorganisms, mammalian metabolism and
personalized health care. Nat Rev Microbiol 3:431–438.
6.
Dethlefsen L, McFall-Ngai M, Relman DA (2007) An ecological and evolutionary perspective on
human-microbe mutualism and disease. Nature 449:811–8.
7.
Qin J et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing.
Nature 464:59–65.
8.
Kaeberlein T, Lewis K, Epstein SS (2002) Isolating “uncultivable” microorganisms in pure culture
in a simulated natural environment. Science 296:1127–9.
9.
Hayes CS, Aoki SK, Low DA (2010) Bacterial contact-dependent delivery systems. Annu Rev
Genet 44:71–90.
10.
Bassler BL, Losick R (2006) Bacterially speaking. Cell 125:237–46.
11.
Walker AW et al. (2008) The species composition of the human intestinal microbiota differs
between particle-associated and liquid phase communities. Environ Microbiol 10:3275–83.
12.
Smillie CS et al. (2011) Ecology drives a global network of gene exchange connecting the human
microbiome. Nature 480:2–5.
13.
Bradshaw DJ, Homer KA, Marsh PD, Beighton D (1994) Metabolic cooperation in oral microbial
communities during growth on mucin. Microbiology 140:3407–12.
14.
Falony G, Vlachou A, Verbrugghe K, De Vuyst L (2006) Cross-feeding between Bifidobacterium
longum BB536 and acetate-converting, butyrate-producing colon bacteria during growth on
oligofructose. Appl Environ Microbiol 72:7835–41.
15.
Salazar N, Gueimonde M, Hernández-Barranco AM, Ruas-Madiedo P, de los Reyes-Gavilán CG
(2008) Exopolysaccharides produced by intestinal Bifidobacterium strains act as fermentable
substrates for human intestinal bacteria. Appl Environ Microbiol 74:4737–4745.
16.
Gibson GR et al. (1990) Alternative pathways for hydrogen disposal during fermentation in the
human colon. Gut 31:679–683.
17.
Dabard J et al. (2001) Ruminococcin A, a New Lantibiotic Produced by a Ruminococcus gnavus
Strain Isolated from Human Feces. Appl Environ Microbiol 67:4111–4118.
18.
Santagati M, Scillato M, Patanè F, Aiello C, Stefani S (2012) Bacteriocin-producing oral
streptococci and inhibition of respiratory pathogens. FEMS Immunol Med Microbiol.
19.
Gillor O, Etzion A, Riley MA (2008) The dual role of bacteriocins as anti- and probiotics. Appl
Microbiol Biotechnol 81:591–606.
20.
Davey ME, O'Toole GA (2000) Microbial biofilms: from ecology to molecular genetics. Microbiol
Mol Biol Rev 64:847–67.
21.
Marsh PD, Moter A, Devine DA (2011) Dental plaque biofilms: communities, conflict and
control. Periodontol 2000 55:16–35.
22.
Boles BR, Thoendel M, Singh PK (2004) Self-generated diversity produces “insurance effects” in
biofilm communities. Proc Natl Acad Sci U S A 101:16630–5.
23.
Stewart PS, Franklin MJ (2008) Physiological heterogeneity in biofilms. Nat Rev Microbiol
6:199–210.
24.
Frost LS, Leplae R, Summers AO, Toussaint A (2005) Mobile genetic elements: the agents of
open source evolution. Nat Rev Microbiol 3:722–32.
25.
Gogarten JP, Townsend JP (2005) Horizontal gene transfer, genome innovation and evolution. Nat
Rev Microbiol 3:679–87.
26.
Norman A, Hansen LH, Sørensen SJ (2009) Conjugative plasmids: vessels of the communal gene
pool. Philos Trans R Soc Lond B Biol Sci 364:2275–89.
27.
Jones B V, Marchesi JR (2007) Accessing the mobile metagenome of the human gut microbiota.
Mol Biosyst 3:749–58.
28.
Dobrindt U, Hochhut B, Hentschel U, Hacker J (2004) Genomic islands in pathogenic and
environmental microorganisms. Nat Rev Microbiol 2:414–424.
29.
Baquero F (2004) From pieces to patterns: evolutionary engineering in bacterial pathogens. Nat
Rev Microbiol 2:510–518.
30.
Salyers AA (1993) Gene transfer in the mammalian intestinal tract. Curr Opin Biotechnol 4:294–
298.
31.
Reid G et al. (2010) Microbiota restoration: natural and supplemented recovery of human
microbial communities. Nat Rev Microbiol 9:27–38.
32.
Koenig JE et al. (2010) Succession of microbial consortia in the developing infant gut
microbiome. Proc Natl Acad Sci U S A 108 Suppl :4578–85.
33.
Van den Abbeele P, Van de Wiele T, Verstraete W, Possemiers S (2011) The host selects mucosal
and luminal associations of coevolved gut microorganisms: a novel concept. FEMS Microbiol Rev
35:681–704.
34.
Giraud A et al. (2008) Dissecting the genetic components of adaptation of Escherichia coli to the
mouse gut. PLoS Genet 4:e2.
35.
Gill SR et al. (2006) Metagenomic analysis of the human distal gut microbiome. Science
312:1355–1359.
36.
Bäckhed F, Ley RE, Sonnenburg JL, Peterson DA, Gordon JI (2005) Host-Bacterial Mutualism in
the Human Intestine. Science 307:1915–1920.
37.
Guarner F, Malagelada J-R (2003) Gut flora in health and disease. Lancet 361:512–519.
38.
Stappenbeck TS, Hooper L V, Gordon JI (2002) Developmental regulation of intestinal
angiogenesis by indigenous microbes via Paneth cells. Proc Natl Acad Sci U S A 99:15451–5.
39.
Rakoff-Nahoum S, Paglino J, Eslami-Varzaneh F, Edberg S, Medzhitov R (2004) Recognition of
commensal microflora by toll-like receptors is required for intestinal homeostasis. Cell 118:229–
41.
40.
Hooper L V (2004) Bacterial contributions to mammalian gut development. Trends Microbiol
12:129–134.
41.
Pryde SE, Duncan SH, Hold GL, Stewart CS, Flint HJ (2002) The microbiology of butyrate
formation in the human colon. FEMS Microbiol Lett 217:133–9.
42.
Round JL, Mazmanian SK (2010) Inducible Foxp3+ regulatory T-cell development by a
commensal bacterium of the intestinal microbiota. Proc Natl Acad Sci U S A 107:12204–12209.
43.
Wu GD et al. (2011) Linking long-term dietary patterns with gut microbial enterotypes. Science
334:105–8.
44.
Serino M et al. (2012) Metabolic adaptation to a high-fat diet is associated with a change in the gut
microbiota. Gut 61:543–553.
45.
Honda K, Littman DR (2011) The Microbiome in Infectious Disease and Inflammation. Annu Rev
Immunol 30:759–795.
46.
Ley RE et al. (2005) Obesity alters gut microbial ecology. Proc Natl Acad Sci U S A 102:11070–5.
47.
Turnbaugh PJ, Bäckhed F, Fulton L, Gordon JI (2008) Diet-induced obesity is linked to marked
but reversible alterations in the mouse distal gut microbiome. Cell Host Microbe 3:213–23.
48.
Murphy EF et al. (2010) Composition and energy harvesting capacity of the gut microbiota:
relationship to diet, obesity and time in mouse models. Gut 59:1635–42.
49.
Cerf-Bensussan N, Gaboriau-Routhiau V (2010) The immune system and the gut microbiota:
friends or foes? Nat Rev Immunol 10:735–44.
50.
Wen L et al. (2008) Innate immunity and intestinal microbiota in the development of Type 1
diabetes. Nature 455:1109–13.
51.
Lee YK, Menezes JS, Umesaki Y, Mazmanian SK (2010) Proinflammatory T-cell responses to gut
microbiota promote experimental autoimmune encephalomyelitis. Proc Natl Acad Sci U S A
108:Suppl 1:4615–22.
52.
Abraham C, Cho JH (2009) Inflammatory bowel disease. N Engl J Med 361:2066–78.
53.
Hong P-Y et al. (2010) Comparative analysis of fecal microbiota in infants with and without
eczema. PLoS One 5:e9964.
54.
Saulnier DM et al. (2011) Gastrointestinal microbiome signatures of pediatric patients with
irritable bowel syndrome. Gastroenterology 141:1782–1791.
55.
Claesson MJ et al. (2011) Composition, variability, and temporal stability of the intestinal
microbiota of the elderly. Proc Natl Acad Sci U S A 108 Suppl:4586–91.
56.
Yatsunenko T et al. (2012) Human gut microbiome viewed across age and geography. Nature
486:222–227.
57.
Spor A, Koren O, Ley R (2011) Unravelling the effects of the environment and host genotype on
the gut microbiome. Nat Rev Microbiol 9:279–290.
58.
Nell S, Suerbaum S, Josenhans C (2010) The impact of the microbiota on the pathogenesis of
IBD: lessons from mouse infection models. Nat Rev Microbiol 8:564–77.
59.
Sokol H et al. (2009) Low counts of Faecalibacterium prausnitzii in colitis microbiota. Inflamm
Bowel Dis 15:1183–1189.
60.
Manichanh C et al. (2006) Reduced diversity of faecal microbiota in Crohn’s disease revealed by a
metagenomic approach. Gut 55:205–211.
61.
He T et al. (2008) The role of colonic metabolism in lactose intolerance. Eur J Clin Invest 38:541–
7.
62.
He T et al. (2006) Colonic fermentation may play a role in lactose intolerance in humans. J Nutr
136:58.
63.
Tehrani AB, Nezami BG, Gewirtz A, Srinivasan S (2012) Obesity and its associated disease: a
role for microbiota? Neurogastroenterol Motil 24:305–311.
64.
Everard A et al. (2011) Responses of Gut Microbiota and Glucose and Lipid Metabolism to
Prebiotics in Genetic Obese and Diet-Induced Leptin-Resistant Mice. Diabetes 60:1–12.
65.
Giongo A et al. (2010) Toward defining the autoimmune microbiome for type 1 diabetes. ISME J
5:1–10.
66.
Wu H-J et al. (2010) Gut-residing segmented filamentous bacteria drive autoimmune arthritis via
T helper 17 cells. Immunity 32:815–27.
67.
Lam V et al. (2012) Intestinal microbiota determine severity of myocardial infarction in rats.
FASEB J:1–9.
68.
Wardwell LH, Huttenhower C, Garrett WS (2011) Current concepts of the intestinal microbiota
and the pathogenesis of infection. Curr Infect Dis Rep 13:28–34.
69.
Gori A et al. (2008) Early impairment of gut function and gut flora supporting a role for alteration
of gastrointestinal mucosa in human immunodeficiency virus pathogenesis. J Clin Microbiol
46:757–8.
70.
Stecher B, Hardt W-D (2008) The role of microbiota in infectious disease. Trends Microbiol
16:107–114.
71.
Walk ST, Young VB (2008) Emerging Insights into Antibiotic-Associated Diarrhea and
Clostridium difficile Infection through the Lens of Microbial Ecology. Interdiscip Perspect Infect
Dis 2008:125081.
72.
Vrieze A et al. (2010) The environment within: how gut microbiota may influence metabolism and
body composition. Diabetologia 53:606–13.
73.
Hou JK, Abraham B, El-Serag H (2011) Dietary intake and risk of developing inflammatory
bowel disease: a systematic review of the literature. Am J Gastroenterol 106:563–573.
74.
Fava F, Lovegrove JA, Gitau R, Jackson KG, Tuohy KM (2006) The gut microbiota and lipid
metabolism: implications for human health and coronary heart disease. Curr Med Chem 13:3005–
21.
75.
Wang Z et al. (2011) Gut flora metabolism of phosphatidylcholine promotes cardiovascular
disease. Nature 472:57–63.
76.
Dobkin JF, Saha JR, Butler VP, Neu HC, Lindenbaum J (1983) Digoxin-inactivating bacteria:
identification in human gut flora. Science 220:325–327.
77.
Clayton TA, Baker D, Lindon JC, Everett JR, Nicholson JK (2009) Pharmacometabonomic
identification of a significant host-microbiome metabolic interaction affecting human drug
metabolism. Proc Natl Acad Sci U S A 106:14728–33.
78.
Wallace BD et al. (2010) Alleviating Cancer Drug Toxicity by Inhibiting a Bacterial Enzyme.
Science 330:831–835.
79.
Marsh P (1994) Microbial ecology of dental plaque and its significance in health and disease. Adv
Dent Res 8:263.
80.
Azarpazhooh A, Leake JL (2006) Systematic review of the association between respiratory
diseases and oral health. J Periodontol 77:1465–82.
81.
Ford PJ et al. (2007) Anti-P. gingivalis Response Correlates with Atherosclerosis. J Dent Res
86:35–40.
82.
Li L, Messas E, Batista ELL, Levine R, Amar S (2002) Porphyromonas gingivalis Infection
Accelerates the Progression of Atherosclerosis in a Heterozygous Apolipoprotein E-Deficient
Murine Model. Circulation 105:861–867.
83.
Koren O et al. (2010) Human oral, gut, and plaque microbiota in patients with atherosclerosis.
Proc Natl Acad Sci U S A 108:4592–8.
84.
Haug MC, Tanner SA, Lacroix C, Stevens MJA, Meile L (2011) Monitoring horizontal antibiotic
resistance gene transfer in a colonic fermentation model. FEMS Microbiol Ecol 78:210–9.
85.
Nelson KE et al. (2010) A catalog of reference genomes from the human microbiome. Science
328:994–9.
86.
The Human Microbiome Project Consortium (2012) A framework for human microbiome research. Nature 486:215–221.
87.
De Filippo C et al. (2010) Impact of diet in shaping gut microbiota revealed by a comparative
study in children from Europe and rural Africa. Proc Natl Acad Sci U S A 107:14691–6.
88.
Peterson DA, Frank DN, Pace NR, Gordon JI (2008) Metagenomic approaches for defining the
pathogenesis of inflammatory bowel diseases. Cell Host Microbe 3:417–27.
89.
Larsen N et al. (2010) Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults. PLoS One 5:e9085.
90.
Yang F et al. (2012) Saliva microbiomes distinguish caries-active from healthy human
populations. ISME J 6:1–10.
91.
Kong HH et al. (2012) Temporal shifts in the skin microbiome associated with disease flares and
treatment in children with atopic dermatitis. Genome Res 22:850–859.
92.
Keijser BJF et al. (2008) Pyrosequencing analysis of the Oral Microflora of healthy adults. J Dent
Res 87:1016–1020.
93.
Gao Z, Tseng C, Pei Z, Blaser MJ (2007) Molecular analysis of human forearm superficial skin
bacterial biota. Proc Natl Acad Sci U S A 104:2927–2932.
94.
Park J, Kerner A, Burns MA, Lin XN (2011) Microdroplet-enabled highly parallel co-cultivation
of microbial communities. PLoS One 6:e17019.
95.
Bollmann A, Lewis K, Epstein SS (2007) Incubation of environmental samples in a diffusion
chamber increases the diversity of recovered isolates. Appl Environ Microbiol 73:6386–6390.
96.
Thomas CM, Nielsen KM (2005) Mechanisms of, and barriers to, horizontal gene transfer between
bacteria. Nat Rev Microbiol 3:711–21.
97.
Lorenz MG, Wackernagel W (1994) Bacterial gene transfer by natural genetic transformation in
the environment. Microbiol Rev 58:563–602.
98.
Wirth R, Friesenegger A, Fiedler S (1989) Transformation of various species of gram-negative
bacteria belonging to 11 different genera by electroporation. MGG Mol Gen Genet 216:175–177.
99.
Sanford JC, Smith FD, Russell JA (1993) Optimizing the biolistic process for different biological
applications. Methods Enzymol 217:483–509.
100.
Wyber JA, Andrews J, D’Emanuele A (1997) The use of sonication for the efficient delivery of
plasmid DNA into cells. Pharm Res 14:750–756.
101.
Swords WE (2003) Chemical transformation of E. coli. Methods Mol Biol 235:49–53.
102.
Thomson AM, Flint HJ (1989) Electroporation induced transformation of Bacteroides ruminicola
and Bacteroides uniformis by plasmid DNA. FEMS Microbiol Lett 52:101–4.
103.
Calvin NM, Hanawalt PC (1988) High-efficiency transformation of bacterial cells by
electroporation. J Bacteriol 170:2796–2801.
104.
Goodman AL et al. (2009) Identifying genetic determinants needed to establish a human gut
symbiont in its habitat. Cell Host Microbe 6:279–89.
105.
Phillips-Jones MK (1995) Introduction of recombinant DNA into Clostridium spp. Methods Mol
Biol 47:227–35.
106.
Bouillaut L, McBride SM, Sorg JA (2011) Genetic manipulation of Clostridium difficile. Curr
Protoc Microbiol Chapter 9:Unit 9A.2.
107.
Jennert KC, Tardif C, Young DI, Young M (2000) Gene transfer to Clostridium cellulolyticum
ATCC 35319. Microbiology 146 Pt 12:3071–80.
108.
Young DI, Evans VJ, Jefferies JR (1999) Genetic Methods in Clostridia. Methods in Microbiology
29:191–207.
109.
Cocconcelli PS, Ferrari E, Rossi F, Bottazzi V (1992) Plasmid transformation of Ruminococcus
albus by means of high-voltage electroporation. FEMS Microbiol Lett 73:203–7.
110.
Van Pijkeren J-P et al. (2012) High efficiency recombineering in lactic acid bacteria. Nucleic
Acids Res 40:1–13.
111.
Damelin LH, Mavri-Damelin D, Klaenhammer TR, Tiemessen CT (2010) Plasmid transduction
using bacteriophage Phi(adh) for expression of CC chemokines by Lactobacillus gasseri ADH.
Appl Environ Microbiol 76:3878–85.
112.
Lizier M, Sarra PG, Cauda R, Lucchini F (2010) Comparison of expression vectors in
Lactobacillus reuteri strains. FEMS Microbiol Lett 308:8–15.
113.
Ljungh Å, Wadström T eds. (2009) Lactobacillus molecular biology: from genomics to probiotics
(Caister Academic Press, Norfolk, UK).
114.
Sørvig E, Mathiesen G, Naterstad K, Eijsink VGH, Axelsson L (2005) High-level, inducible gene
expression in Lactobacillus sakei and Lactobacillus plantarum using versatile expression vectors.
Microbiology 151:2439–2449.
115.
Thompson K, Collins MA (1996) Improvement in electroporation efficiency for Lactobacillus
plantarum by the inclusion of high concentrations of glycine in the growth medium. J Microbiol
Methods 26:73–79.
116.
Shepard BD, Gilmore MS (1995) Electroporation and efficient transformation of Enterococcus
faecalis grown in high concentrations of glycine. Methods Mol Biol 47:217–226.
117.
Holo H, Nes IF (1995) Transformation of Lactococcus by electroporation. Methods Mol Biol
47:195–199.
118.
Biswas I, Jha JK, Fromm N (2008) Shuttle expression plasmids for genetic studies in
Streptococcus mutans. Microbiology 154:2275–2282.
119.
McLaughlin RE, Ferretti JJ (1995) Electrotransformation of Streptococci. Methods Mol Biol
47:185–193.
120.
Lee JC (1995) Electrotransformation of Staphylococci. Methods Mol Biol 47:209–216.
121.
Alexander JE, Andrew PW, Jones D, Roberts IS (1990) Development of an optimized system for
electroporation of Listeria species. Lett Appl Microbiol 10:179–181.
122.
Kuramitsu HK, Chi B, Ikegami A (2005) Genetic manipulation of Treponema denticola. Curr
Protoc Microbiol Chapter 12:Unit 12B.2.
123.
Hyde JA, Weening EH, Skare JT (2011) Genetic transformation of Borrelia burgdorferi. Curr
Protoc Microbiol Supplement:1–17.
124.
Rosa P, Stevenson B, Tilly K (1999) Genetic Methods in Borrelia and Other Spirochaetes.
Methods in Microbiology 29.
125.
Mayo B, van Sinderen D eds. (2010) Bifidobacteria: Genomics and Molecular Aspects (Caister
Academic Press, Norfolk, UK).
126.
Yeung MK, Kozelsky CS (1994) Transformation of Actinomyces spp. by a gram-negative broad-host-range plasmid. J Bacteriol 176:4173–4176.
127.
Parish T, Brown AC (2009) Mycobacteria Protocols eds Parish T, Brown AC (Humana Press,
Totowa, NJ).
128.
Sassetti CM, Boyd DH, Rubin EJ (2001) Comprehensive identification of conditionally essential
genes in mycobacteria. Proc Natl Acad Sci U S A 98:12712–12717.
129.
Luijk N Van et al. (2002) Genetics and molecular biology of propionibacteria. Lait 82:45–57.
130.
Binet R, Maurelli AT (2009) Transformation and isolation of allelic exchange mutants of
Chlamydia psittaci using recombinant DNA introduced by electroporation. Proc Natl Acad Sci U S
A 106:292–297.
131.
Bélanger M, Rodrigues P, Progulske-Fox A (2007) Genetic manipulation of Porphyromonas
gingivalis. Curr Protoc Microbiol Chapter 13:Unit13C.2.
132.
Flint HJ, Martin JC, Thomson AM (2000) in Electrotransformation of Bacteria, eds Eynard N,
Teissié J, pp 140–149.
133.
Salyers AA, Shoemaker NB, Nikolich MP (1992) METHOD AND MATERIALS FOR
INTRODUCING DNA INTO PREVOTELLA RUMINICOLA.
134.
Bacic MK, Smith CJ (2008) Laboratory maintenance and cultivation of bacteroides species. Curr
Protoc Microbiol Chapter 13:Unit 13C.1.
135.
Salyers AA et al. (1999) Genetic Methods for Bacteroides Species. Methods in Microbiology
29:229–249.
136.
Smith CJ (1995) Genetic transformation of Bacteroides spp. using electroporation. Methods Mol
Biol 47:161–169.
137.
Kinder Haake S, Yoder S, Gerardo SH (2006) Efficient gene transfer and targeted mutagenesis in
Fusobacterium nucleatum. Plasmid 55:27–38.
138.
Segal ED (1995) Electroporation of Helicobacter pylori. Methods Mol Biol 47:179–184.
139.
Taylor D (1992) Genetics of Campylobacter and Helicobacter. Annu Rev Microbiol:35–64.
140.
Rachek LI et al. (2000) Transformation of Rickettsia prowazekii to Erythromycin Resistance
Encoded by the Escherichia coli ereB Gene. J Bacteriol 182:3289–3291.
141.
McQuiston J, Schurig G (1995) Transformation of Brucella species with suicide and broad host-range plasmids. Methods Mol Biol 47:143–148.
142.
Scarlato V, Ricci S, Rappuoli R, Pizza M (1996) in Microbial Genome Methods, ed Adolph KW
(CRC Press), pp 247–262.
143.
Bogdan JA, Minetti CASA, Blake MS (2002) A one-step method for genetic transformation of
non-piliated Neisseria meningitidis. J Microbiol Methods 49:97–101.
144.
Genco CA, Knapp JS, Clark VL (1984) Conjugation of Plasmids of Neisseria gonorrhoeae to other
Neisseria Species: Potential Reservoirs for the β-Lactamase Plasmid. J Infect Dis 150:397–401.
145.
O’Dwyer C et al. (2005) A novel neisserial shuttle plasmid: a useful new tool for meningococcal
research. FEMS Microbiol Lett 251:143–7.
146.
Dennis JJ, Sokol PA (1995) Electrotransformation of Pseudomonas. Methods Mol Biol 47:125–
133.
147.
Kleckner N (1981) Transposable elements in prokaryotes. Annu Rev Genet 15:341–404.
148.
Goodman AL, Wu M, Gordon JI (2011) Identifying microbial fitness determinants by insertion
sequencing using genome-wide transposon mutant libraries. Nat Protoc 6:1969–1980.
149.
Van Opijnen T, Bodi KL, Camilli A (2009) Tn-seq: high-throughput parallel sequencing for
fitness and genetic interaction studies in microorganisms. Nat Methods 6:767–72.
150.
Gawronski JD, Wong SM, Giannoukos G, Ward D V, Akerley BJ (2009) Tracking insertion
mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes
required in the lung. Proc Natl Acad Sci U S A 106:16422–16427.
151.
Langridge GC et al. (2009) Simultaneous assay of every Salmonella Typhi gene using one million
transposon mutants. Genome Res 19:2308–2316.
152.
Sommer MOA, Dantas G, Church GM (2009) Functional characterization of the antibiotic
resistance reservoir in the human microflora. Science 325:1128–31.
153.
Warner JR, Reeder PJ, Karimpour-Fard A, Woodruff LB, Gill RT (2010) Rapid profiling of a
microbial genome using mixtures of barcoded oligonucleotides. Nat Biotechnol 28:856–862.
154.
Sandoval NR et al. (2012) Strategy for directing combinatorial genome engineering in Escherichia
coli. Proc Natl Acad Sci U S A.
155.
Wang HH et al. (2009) Programming cells by multiplex genome engineering and accelerated
evolution. Nature 460:894–8.
156.
Wang HH et al. (2012) Genome-scale promoter engineering by coselection MAGE. Nat Methods
9:591–593.
157.
Carr PA et al. (2012) Enhanced multiplex genome engineering through co-operative
oligonucleotide co-selection. Nucleic Acids Res 40:e132.
158.
Wang HH, Church GM (2011) Multiplexed genome engineering and genotyping methods
applications for synthetic biology and metabolic engineering. Methods Enzym 498:409–426.
159.
Sharan SK, Thomason LC, Kuznetsov SG, Court DL (2009) Recombineering: a homologous
recombination-based method of genetic engineering. Nat Protoc 4:206–223.
160.
Isaacs FJ et al. (2011) Precise manipulation of chromosomes in vivo enables genome-wide codon
replacement. Science 333:348–353.
161.
Swingle B et al. (2010) Oligonucleotide recombination in Gram-negative bacteria. Mol Microbiol
75:138–148.
162.
Swingle B, Bao Z, Markel E, Chambers A, Cartinhour S (2010) Recombineering using RecTE
from Pseudomonas syringae. Appl Environ Microbiol 76:4960–4968.
163.
Van Kessel JC, Hatfull GF (2007) Recombineering in Mycobacterium tuberculosis. Nat Methods
4:147–152.
164.
Sonnenburg JL, Angenent LT, Gordon JI (2004) Getting a grip on things: how do communities of
bacterial symbionts become established in our intestine? Nat Immunol 5:569–73.
165.
Faith JJ, McNulty NP, Rey FE, Gordon JI (2011) Predicting a human gut microbiota’s response to
diet in gnotobiotic mice. Science 333:101–4.
166.
Hosoda K et al. (2011) Cooperative adaptation to establishment of a synthetic bacterial mutualism.
PLoS One 6:e17105.
167.
Shou W, Ram S, Vilar JM (2007) Synthetic cooperation in engineered yeast populations. Proc
Natl Acad Sci U S A 104:1877–1882.
168.
Wintermute EH, Silver PA (2010) Emergent cooperation in microbial metabolism. Mol Syst Biol
6:407.
169.
Mee JM, Wang HH (2012) Engineering Ecosystems and Synthetic Ecologies. Mol BioSyst
8:2470–2483.
170.
Saeidi N et al. (2011) Engineering microbes to sense and eradicate Pseudomonas aeruginosa, a
human pathogen. Mol Syst Biol 7:521.
171.
Duan F, March JC (2010) Engineered bacterial communication prevents Vibrio cholerae virulence
in an infant mouse model. Proc Natl Acad Sci U S A 107:11260–11264.
172.
Steidler L, Hans W, Schotte L, Neirynck S (2000) Treatment of Murine Colitis by Lactococcus
lactis Secreting Interleukin-10. Science 289:1352–1355.
173.
Steidler L, Rottiers P, Coulie B (2009) Actobiotics as a novel method for cytokine delivery. Ann N
Y Acad Sci 1182:135–45.
174.
Duncan SH et al. (2003) Effects of alternative dietary substrates on competition between human
colonic bacteria in an anaerobic fermentor system. Appl Environ Microbiol 69:1136–42.
175.
Leitch ECM, Walker AW, Duncan SH, Holtrop G, Flint HJ (2007) Selective colonization of
insoluble substrates by human faecal bacteria. Environ Microbiol 9:667–679.
176.
Macfarlane GT, Hay S, Gibson GR (1989) Influence of mucin on glycosidase, protease and
arylamidase activities of human gut bacteria grown in a 3-stage continuous culture system. J Appl
Bacteriol 66:407–17.
177.
Molly K, Woestyne M, Verstraete W (1993) Development of a 5-step multi-chamber reactor as a
simulation of the human intestinal microbial ecosystem. Appl Microbiol Biotechnol 39:254–258.
178.
Possemiers S, Verthé K, Uyttendaele S, Verstraete W (2004) PCR-DGGE-based quantification of
stability of the microbial community in a simulator of the human intestinal microbial ecosystem.
FEMS Microbiol Ecol 49:495–507.
179.
Pratten J (2007) Growing oral biofilms in a constant depth film fermentor (CDFF). Curr Protoc
Microbiol Chapter 1:Unit 1B.5.
180.
Ready D (2002) Composition and antibiotic resistance profile of microcosm dental plaques before
and after exposure to tetracycline. J Antimicrob Chemother 49:769–775.
181.
Roberts AP, Pratten J, Wilson M, Mullany P (1999) Transfer of a conjugative transposon, Tn5397
in a model oral biofilm. FEMS Microbiol Lett 177:63–66.
182.
Roberts AP et al. (2001) Transfer of Tn916-like elements in microcosm dental plaques.
Antimicrob Agents Chemother 45:2943–2946.
183.
Kim HJ, Huh D, Hamilton G, Ingber DE (2012) Human Gut-on-a-Chip inhabited by
microbial flora that experiences intestinal peristalsis-like motions and flow. Lab Chip:2165–2174.
184.
Foster JS, Kolenbrander PE (2004) Development of a multispecies oral bacterial community in a
saliva-conditioned flow cell. Appl Environ Microbiol 70:4340.
185.
Doucet-Populaire F, Trieu-Cuot P, Dosbaa I, Andremont A, Courvalin P (1991) Inducible transfer
of conjugative transposon Tn1545 from Enterococcus faecalis to Listeria monocytogenes in the
digestive tracts of gnotobiotic mice. Antimicrob Agents Chemother 35:185–7.
186.
Launay A, Ballard SA, Johnson PDR, Grayson ML, Lambert T (2006) Transfer of Vancomycin
Resistance Transposon Tn1549 from Clostridium symbiosum to Enterococcus spp. in the Gut of
Gnotobiotic Mice. Antimicrob Agents Chemother 50:1054–1062.
187.
Turnbaugh PJ et al. (2009) The effect of diet on the human gut microbiome: a metagenomic
analysis in humanized gnotobiotic mice. Sci Transl Med 1:6ra14.
188.
Lalla E et al. (2003) Oral infection with a periodontal pathogen accelerates early atherosclerosis in
apolipoprotein E-null mice. Arterioscler Thromb Vasc Biol 23:1405–11.
189.
Sellon RK et al. (1998) Resident enteric bacteria are necessary for development of spontaneous
colitis and immune system activation in interleukin-10-deficient mice. Infect Immun 66:5224–
5231.
190.
Caricilli AM et al. (2011) Gut microbiota is a key modulator of insulin resistance in TLR 2
knockout mice. PLoS Biol 9:e1001212.
191.
Vijay-Kumar M et al. (2010) Metabolic syndrome and altered gut microbiota in mice lacking Toll-like receptor 5. Science 328:228–31.
192.
Deng W, Vallance BA, Li Y, Puente JL, Finlay BB (2003) Citrobacter rodentium translocated
intimin receptor (Tir) is an essential virulence factor needed for actin condensation, intestinal
colonization and colonic hyperplasia in mice. Mol Microbiol 48:95–115.
193.
Newman J, Zabel B, Jha S, Schauer D (1999) Citrobacter rodentium espB is necessary for signal
transduction and for infection of laboratory mice. Infect Immun 67:6019–6025.
194.
Alex P et al. (2009) Distinct cytokine patterns identified from multiplex profiles of murine DSS
and TNBS-induced colitis. Inflamm Bowel Dis 15:341–52.
195.
Oz HS, Puleo DA (2011) Animal Models for Periodontal Disease. J Biomed Biotechnol 2011:1–8.
196.
Naglik JR, Fidel PL, Odds FC (2008) Animal models of mucosal Candida infection. FEMS
Microbiol Lett 283:129–139.
197.
McBride BC, van der Hoeven JS (1981) Role of interbacterial adherence in colonization of the oral
cavities of gnotobiotic rats infected with Streptococcus mutans and Veillonella alcalescens. Infect
Immun 33:467–472.
198.
Mahowald MA et al. (2009) Characterizing a model human gut microbiota composed of members
of its two dominant bacterial phyla. Proc Natl Acad Sci U S A 106:5859–64.
199.
Sonnenburg JL, Chen CTL, Gordon JI (2006) Genomic and metabolic studies of the impact of
probiotics on a model gut symbiont and host. PLoS Biol 4:e413.
200.
Lewis NE, Nagarajan H, Palsson BO (2012) Constraining the metabolic genotype-phenotype
relationship using a phylogeny of in silico methods. Nat Rev Microbiol 10:291–305.
201.
Zomorrodi AR, Maranas CD (2012) OptCom: a multi-level optimization framework for the
metabolic modeling and analysis of microbial communities. PLoS Comput Biol 8:e1002363.
202.
Mahadevan R, Edwards JS, Doyle 3rd FJ (2002) Dynamic flux balance analysis of diauxic growth
in Escherichia coli. Biophys J 83:1331–1340.
203.
Greenblum S, Turnbaugh PJ, Borenstein E (2012) Metagenomic systems biology of the human gut
microbiome reveals topological shifts associated with obesity and inflammatory bowel disease.
Proc Natl Acad Sci U S A 109:594–599.
204.
Zhuang K et al. (2011) Genome-scale dynamic modeling of the competition between Rhodoferax
and Geobacter in anoxic subsurface environments. ISME J 5:305–316.
205.
Taffs R et al. (2009) In silico approaches to study mass and energy flows in microbial consortia: a
syntrophic case study. BMC Syst Biol 3:114.
206.
Turnbaugh PJ et al. (2009) A core gut microbiome in obese and lean twins. Nature 457:480–4.
207.
Rohlke F, Surawicz CM, Stollman N (2010) Fecal flora reconstitution for recurrent Clostridium
difficile infection: results and methodology. J Clin Gastroenterol 44:567–70.
208.
Miele E et al. (2009) Effect of a probiotic preparation (VSL#3) on induction and maintenance of
remission in children with ulcerative colitis. Am J Gastroenterol 104:437–443.
209.
Gionchetti P et al. (2003) Prophylaxis of pouchitis onset with probiotic therapy: a double-blind,
placebo-controlled trial. Gastroenterology 124:1202–1209.
210.
Mimura T et al. (2004) Once daily high dose probiotic therapy (VSL#3) for maintaining remission
in recurrent or refractory pouchitis. Gut 53:108–114.
211.
Culligan EP, Hill C, Sleator RD (2009) Probiotics and gastrointestinal disease: successes,
problems and future prospects. Gut Pathog 1:19.
212.
Sartor RB (2004) Therapeutic manipulation of the enteric microflora in inflammatory bowel
diseases: antibiotics, probiotics, and prebiotics. Gastroenterology 126:1620–1633.
213.
Cronin M et al. (2010) Orally administered bifidobacteria as vehicles for delivery of agents to
systemic tumors. Mol Ther 18:1397–407.
214.
Fu G-F et al. (2005) Bifidobacterium longum as an oral delivery system of endostatin for gene
therapy on solid liver cancer. Cancer Gene Ther 12:133–40.
215.
Li X et al. (2003) Bifidobacterium adolescentis as a delivery system of endostatin for cancer gene
therapy: selective inhibitor of angiogenesis and hypoxic tumor growth. Cancer Gene Ther 10:105–
11.
216.
Duan F, Curtis KL, March JC (2008) Secretion of insulinotropic proteins by commensal bacteria:
rewiring the gut to treat diabetes. Appl Environ Microbiol 74:7437–8.
217.
Rao S et al. (2005) Toward a live microbial microbicide for HIV: commensal bacteria secreting an
HIV fusion inhibitor peptide. Proc Natl Acad Sci U S A 102:11993–8.
218.
Braat H et al. (2006) A phase I trial with transgenic bacteria expressing interleukin-10 in Crohn’s
disease. Clin Gastroenterol Hepatol 4:754–9.
219.
Degnan FH (2008) The US Food and Drug Administration and probiotics: regulatory
categorization. Clin Infect Dis 46 Suppl 2:S133–6; discussion S144–51.
220.
Yaung SJ et al. (2015) Improving microbial fitness in the mammalian gut by in vivo temporal
functional metagenomics. Mol Syst Biol 11:788–788.
221.
Peterson J et al. (2009) The NIH Human Microbiome Project. Genome Res 19:2317–23.
222.
Walker AW, Duncan SH, Louis P, Flint HJ (2014) Phylogeny, culturing, and metagenomics of the
human gut microbiota. Trends Microbiol 22:267–74.
223.
Healy FG et al. (1995) Direct isolation of functional genes encoding cellulases from the microbial
consortia in a thermophilic, anaerobic digester maintained on lignocellulose. Appl Microbiol
Biotechnol 43:667–674.
224.
Stein J, Marsh T, Wu K, Shizuya H, DeLong E (1996) Characterization of uncultivated
prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic
marine archaeon. J Bacteriol 178:591–599.
225.
Rondon MR et al. (2000) Cloning the Soil Metagenome: a Strategy for Accessing the Genetic and
Functional Diversity of Uncultured Microorganisms. Appl Environ Microbiol 66:2541–2547.
226.
Tasse L et al. (2010) Functional metagenomics to mine the human gut microbiome for dietary
fiber catabolic enzymes. Genome Res 20:1605–12.
227.
Cecchini DA et al. (2013) Functional metagenomics reveals novel pathways of prebiotic
breakdown by human gut bacteria. PLoS One 8:e72766.
228.
Gloux K et al. (2011) A metagenomic β-glucuronidase uncovers a core adaptive function of the
human intestinal microbiome. Proc Natl Acad Sci U S A 108 Suppl 1:4539–46.
229.
Culligan EP, Sleator RD, Marchesi JR, Hill C (2012) Functional metagenomics reveals novel salt
tolerance loci from the human gut microbiome. ISME J 6:1916–25.
230.
Lakhdari O et al. (2010) Functional metagenomics: a high throughput screening method to
decipher microbiota-driven NF-κB modulation in the human gut. PLoS One 5:1–10.
231.
Xu J et al. (2003) A genomic view of the human-Bacteroides thetaiotaomicron symbiosis. Science
299:2074–6.
232.
Sonnenburg JL et al. (2005) Glycan foraging in vivo by an intestine-adapted bacterial symbiont.
Science 307:1955–9.
233.
Bjursell MK, Martens EC, Gordon JI (2006) Functional genomic and metabolic studies of the
adaptations of a prominent adult human gut symbiont, Bacteroides thetaiotaomicron, to the
suckling period. J Biol Chem 281:36269–79.
234.
Martens EC, Chiang HC, Gordon JI (2008) Mucosal glycan foraging enhances fitness and
transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe 4:447–57.
235.
Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of
short DNA sequences to the human genome. Genome Biol 10:R25.
236.
Trapnell C et al. (2013) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31:46–53.
237.
Li H et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–
9.
238.
Haardt M, Kempf B, Faatz E, Bremer E (1995) The osmoprotectant proline betaine is a major
substrate for the binding-protein-dependent transport system ProU of Escherichia coli K-12. Mol
Gen Genet 246:783–6.
239.
Usui Y et al. (2012) Investigating the effects of perturbations to pgi and eno gene expression on
central carbon metabolism in Escherichia coli using (13)C metabolic flux analysis. Microb Cell
Fact 11:87.
240.
Winson MK et al. (1998) Engineering the luxCDABE genes from Photorhabdus luminescens to
provide a bioluminescent reporter for constitutive and promoter probe plasmids and mini-Tn5
constructs. FEMS Microbiol Lett 163:193–202.
241.
Jost L (2006) Entropy and diversity. Oikos 113:363–375.
242.
Schloss PD et al. (2009) Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol
75:7537–7541.
243.
Bar-Joseph Z, Gerber G, Jaakkola T, Gifford D, Simon I (2003) Continuous representations of
time series gene expression data. J Comput Biol 3-4:341–356.
244.
Byers JP, Sarver JG (2009) in Pharmacology: Principles and Practice, eds Hacker M, Messer W,
Bachmann K (Elsevier), pp 201–277.
245.
Valvano M, Messner P, Kosma P (2002) Novel pathways for biosynthesis of nucleotide-activated
glycero-manno-heptose precursors of bacterial glycoproteins and cell surface polysaccharides.
Microbiology 148:1979–1989.
246.
Kneidinger B et al. (2002) Biosynthesis pathway of ADP-L-glycero-β-D-manno-heptose in
Escherichia coli. J Bacteriol 184:363–369.
247.
Wang L et al. (2010) Divergence of biochemical function in the HAD superfamily: D-glycero-D-manno-heptose-1,7-bisphosphate phosphatase (GmhB). Biochemistry 49:1072–81.
248.
Chiang S, Mekalanos J (1999) rfb mutations in Vibrio cholerae do not affect surface production of
toxin-coregulated pili but still inhibit intestinal colonization. Infect Immun 67:976–980.
249.
Burns S, Hull S (1998) Comparison of Loss of Serum Resistance by Defined Lipopolysaccharide
Mutants and an Acapsular Mutant of Uropathogenic Escherichia coli O75:K5. Infect Immun
66:4244–4253.
250.
Wexler HM, Tenorio E, Pumbwe L (2009) Characteristics of Bacteroides fragilis lacking the
major outer membrane protein, OmpA. Microbiology 155:2694–706.
251.
Sato K et al. (2010) OmpA variants affecting the adherence of ulcerative colitis-derived
Bacteroides vulgatus. J Med Dent Sci 57:55–64.
252.
Soulas C et al. (2000) Cutting Edge: Outer Membrane Protein A (OmpA) Binds to and Activates
Human Macrophages. J Immunol 165:2335–2340.
253.
Mehra R, Drabble W (1981) Dual Control of the gua Operon of Escherichia coli K12 by Adenine
and Guanine Nucleotides. J Gen Microbiol 123:27–37.
254.
Ratnayake-Lecamwasam M, Serror P, Wong K-W, Sonenshein A (2001) Bacillus subtilis CodY
represses early-stationary-phase genes by sensing GTP levels. Genes Dev 15:1093–1103.
255.
Buckstein MH, He J, Rubin H (2008) Characterization of nucleotide pools as a function of
physiological state in Escherichia coli. J Bacteriol 190:718–26.
256.
Pang B et al. (2012) Defects in purine nucleotide metabolism lead to substantial incorporation of
xanthine and hypoxanthine into DNA and RNA. Proc Natl Acad Sci U S A 109:2319–24.
257.
Sonnenburg ED et al. (2010) Specificity of polysaccharide use in intestinal bacteroides species
determines diet-induced microbiota alterations. Cell 141:1241–52.
258.
Blattner FR et al. (1997) The complete genome sequence of Escherichia coli K-12. Science
277:1453–62.
259.
Keseler IM et al. (2011) EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic
Acids Res 39:D583–90.
260.
Gaudin HM, Silverman PM (1993) Contributions of promoter context and structure to regulated
expression of the F plasmid traY promoter in Escherichia coli K-12. Mol Microbiol 8:335–342.
261.
Guan L, Murphy FD, Kaback HR (2002) Surface-exposed positions in the transmembrane helices
of the lactose permease of Escherichia coli determined by intermolecular thiol cross-linking. Proc
Natl Acad Sci U S A 99:3475–80.
262.
Soupene E et al. (2003) Physiological Studies of Escherichia coli Strain MG1655: Growth
Defects and Apparent Cross-Regulation of Gene Expression. J Bacteriol 185:5611–5626.
263.
Weickert MJ, Adhya S (1992) Isorepressor of the gal Regulon in Escherichia coli. J Mol Biol
226:69–83.
264.
Weickert MJ, Adhya S (1993) Control of transcription of gal repressor and isorepressor genes in
Escherichia coli. J Bacteriol 175:251–8.
265.
Juge N (2012) Microbial adhesins to gastrointestinal mucus. Trends Microbiol 20:30–9.
266.
Freter R, Brickner H, Botney M, Cleven D, Aranki A (1983) Mechanisms that control bacterial
populations in continuous-flow culture models of mouse large intestinal flora. Infect Immun
39:676.
267.
Maltby R, Leatham-Jensen MP, Gibson T, Cohen PS, Conway T (2013) Nutritional basis for
colonization resistance by human commensal Escherichia coli strains HS and Nissle 1917 against
E. coli O157:H7 in the mouse intestine. PLoS One 8:e53957.
268.
Lee SM et al. (2013) Bacterial colonization factors control specificity and stability of the gut
microbiota. Nature 501:426–9.
269.
Rakoff-Nahoum S, Coyne MJ, Comstock LE (2014) An ecological network of polysaccharide
utilization among human intestinal symbionts. Curr Biol 24:40–9.
270.
Ringel Y, Quigley EMM, Lin HC (2012) Using Probiotics in Gastrointestinal Disorders. Am J
Gastroenterol Suppl 1:34–40.
271.
Bermúdez-Humarán LG, Kharrat P, Chatel J-M, Langella P (2011) Lactococci and lactobacilli as
mucosal delivery vectors for therapeutic proteins and DNA vaccines. Microb Cell Fact 10 Suppl
1:S4.
272.
Motta J-P et al. (2012) Food-grade bacteria expressing elafin protect against inflammation and
restore colon homeostasis. Sci Transl Med 4:158ra144.
273.
Wells JM, Mercenier A (2008) Mucosal delivery of therapeutic and prophylactic molecules using
lactic acid bacteria. Nat Rev Microbiol 6:349–62.
274.
Lawley TD, Walker AW (2013) Intestinal colonization resistance. Immunology 138:1–11.
275.
Bäckhed F et al. (2012) Defining a healthy human gut microbiome: current concepts, future
directions, and clinical applications. Cell Host Microbe 12:611–22.
276.
Cohen ML (1992) Epidemiology of drug resistance: implications for a post-antimicrobial era.
Science 257:1050–5.
277.
Shoemaker N, Vlamakis H, Hayes K, Salyers A (2001) Evidence for extensive resistance gene
transfer among Bacteroides spp. and among Bacteroides and other genera in the human colon.
Appl Environ Microbiol 67:561–568.
278.
Zhang F, Luo W, Shi Y, Fan Z, Ji G (2012) Should we standardize the 1,700-year-old fecal
microbiota transplantation? Am J Gastroenterol 107:1755; author reply p.1755–6.
279.
Brandt LJ (2013) American Journal of Gastroenterology Lecture: Intestinal microbiota and the
role of fecal microbiota transplant (FMT) in treatment of C. difficile infection. Am J Gastroenterol
108:177–85.
280.
Hehemann J-H et al. (2010) Transfer of carbohydrate-active enzymes from marine bacteria to
Japanese gut microbiota. Nature 464:908–12.
281.
Stecher B et al. (2012) Gut inflammation can boost horizontal gene transfer between pathogenic
and commensal Enterobacteriaceae. Proc Natl Acad Sci U S A 109:1269–74.
282.
Dionisio F, Matic I, Radman M, Rodrigues OR, Taddei F (2002) Plasmids spread very fast in
heterogeneous bacterial communities. Genetics 162:1525–32.
283.
Pansegrau W et al. (1994) Complete nucleotide sequence of Birmingham IncP alpha plasmids. J
Mol Biol 239:623–663.
284.
Smith CJ, Rogers MMB, McKee ML (1992) Heterologous gene expression in Bacteroides fragilis.
Plasmid 27:141–54.
285.
Rasmussen JL, Odelson DA, Macrina FL (1987) Complete nucleotide sequence of insertion
element IS4351 from Bacteroides fragilis. J Bacteriol 169:3573–80.
286.
Altschul SF et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res 25:3389–402.
287.
Guiney DG, Hasegawa P, Davis CE (1984) Plasmid transfer from Escherichia coli to Bacteroides
fragilis: differential expression of antibiotic resistance phenotypes. Proc Natl Acad Sci U S A
81:7203–6.
288.
Garrigues-Jeanjean N, Wittmer A, Ouriet MM, Duval-Iflah Y (1999) Transfer of the shuttle
vector pRRI207 between Escherichia coli and Bacteroides spp. in vitro and in vivo in the digestive
tract of axenic mice and in gnotoxenic mice inoculated with a human microflora. FEMS Microbiol
Ecol 29:33–43.
289.
Trieu-Cuot P, Carlier C, Martin P, Courvalin P (1987) Plasmid transfer by conjugation from
Escherichia coli to Gram-positive bacteria. FEMS Microbiol Lett 48:289–294.
290.
Trieu-Cuot P, Carlier C, Courvalin P (1988) Conjugative plasmid transfer from Enterococcus
faecalis to Escherichia coli. J Bacteriol 170:4388–91.
291.
Shkoporov AN et al. (2008) Characterization of plasmids from human infant Bifidobacterium
strains: sequence analysis and construction of E. coli-Bifidobacterium shuttle vectors. Plasmid
60:136–48.
292.
Horvath P, Barrangou R (2010) CRISPR/Cas, the immune system of bacteria and archaea. Science
327:167–70.
293.
Palmer KL, Gilmore MS (2010) Multidrug-resistant enterococci lack CRISPR-cas. MBio
1:e00227–10.
294.
Groman NB (1953) The relation of bacteriophage to the change of Corynebacterium diphtheriae
from avirulence to virulence. Science 117:297–9.
295.
Freeman V (1951) Studies on the virulence of bacteriophage-infected strains of Corynebacterium
diphtheriae. J Bacteriol 61:675–688.
296.
Betley M, Mekalanos J (1985) Staphylococcal enterotoxin A is encoded by phage. Science
456:233–235.
297.
Acheson DWK et al. (1998) In vivo transduction with shiga toxin 1-encoding phage. Infect Immun
66:4496–4498.
298.
Waldor M, Mekalanos J (1996) Lysogenic conversion by a filamentous phage encoding cholera
toxin. Science 272:1910–1914.
299.
Deltcheva E et al. (2011) CRISPR RNA maturation by trans-encoded small RNA and host factor
RNase III. Nature 471:602–7.
300.
Bikard D, Hatoum-Aslan A, Mucida D, Marraffini LA (2012) CRISPR interference can prevent
natural transformation and virulence acquisition during in vivo bacterial infection. Cell Host
Microbe 12:177–86.
301.
Cheng K, Smith G (1984) Recombinational hotspot activity of Chi-like sequences. J Mol Biol
151:371–377.
302.
Pearson GD, Woods A, Chiang SL, Mekalanos JJ (1993) CTX genetic element encodes a site-specific recombination system and an intestinal colonization factor. Proc Natl Acad Sci U S A
90:3750–3754.
303.
Esvelt KM et al. (2013) Orthogonal Cas9 proteins for RNA-guided gene regulation and editing.
Nat Methods 10:1116–21.
304.
Iwanaga M, Yamamoto K (1985) New medium for the production of cholera toxin by Vibrio
cholerae O1 biotype El Tor. J Clin Microbiol 22:405–8.
305.
Arnold R et al. (2012) Emergence of Klebsiella pneumoniae Carbapenemase (KPC)-Producing
Bacteria. South Med J 104:40–45.
306.
Moellering Jr RC (2010) NDM-1—A Cause for Worldwide Concern. N Engl J Med 363:2377–
2379.
307.
Courvalin P (2006) Vancomycin Resistance in Gram-Positive Cocci. Clin Infect Dis 42:S25–34.
308.
Dalsgaard A, Forslund A, Sandvang D, Arntzen L, Keddy K (2001) Vibrio cholerae O1 outbreak
isolates in Mozambique and South Africa in 1998 are multiple-drug resistant, contain the SXT
element and the aadA2 gene located on class 1 integrons. J Antimicrob Chemother 48:827–38.
309.
Beaber JW, Burrus V, Hochhut B, Waldor MK (2002) Comparison of SXT and R391, two
conjugative integrating elements: Definition of a genetic backbone for the mobilization of
resistance determinants. Cell Mol Life Sci 59:2065–2070.
310.
Nishimasu H et al. (2014) Crystal structure of Cas9 in complex with guide RNA and target DNA.
Cell 156:935–49.
311.
Van Hoek AHAM et al. (2011) Acquired antibiotic resistance genes: an overview. Front
Microbiol 2:203.
312.
Friedman-Ohana R, Karunker I, Cohen A (1998) Chi-dependent intramolecular recombination in
Escherichia coli. Genetics 148:545–57.
313.
Rund SA, Rohde H, Sonnenborn U, Oelschlaeger TA (2013) Antagonistic effects of probiotic
Escherichia coli Nissle 1917 on EHEC strains of serotype O104:H4 and O157:H7. Int J Med
Microbiol 303:1–8.
314.
Halpern D et al. (2007) Identification of DNA motifs implicated in maintenance of bacterial core
genomes by predictive modeling. PLoS Genet 3:1614–21.
315.
Duerkop BA, Clements CV, Rollins D, Rodrigues JLM, Hooper LV (2012) A composite
bacteriophage alters colonization by an intestinal commensal bacterium. Proc Natl Acad Sci U S A
109:17621–6.
316.
Chibani-Chennoufi S et al. (2004) In vitro and in vivo bacteriolytic activities of Escherichia coli
phages: implications for phage therapy. Antimicrob Agents Chemother 48:2558–2569.
317.
Abedon ST, Kuhl SJ, Blasdel BG, Kutter EM (2011) Phage treatment of human infections.
Bacteriophage 1:66–85.
318.
Rossmann FS et al. (2015) Phage-mediated Dispersal of Biofilm and Distribution of Bacterial
Virulence Genes Is Induced by Quorum Sensing. PLOS Pathog 11:e1004653.
319.
Mojica FJM, Díez-Villaseñor C, García-Martínez J, Almendros C (2009) Short motif sequences
determine the targets of the prokaryotic CRISPR defence system. Microbiology 155:733–40.
320.
Kutter E (2009) Phage host range and efficiency of plating. Methods Mol Biol 501:141–9.
321.
Myhal ML, Laux DC, Cohen PS (1982) Relative colonizing abilities of human fecal and K-12
strains of Escherichia coli in the large intestines of streptomycin-treated mice. Eur J Clin
Microbiol 1:186–92.
322.
Bourdin G et al. (2014) Amplification and purification of T4-like Escherichia coli phages for
phage therapy: from laboratory to pilot scale. Appl Environ Microbiol 80:1469–76.
323.
MacConkey A (1905) Lactose-Fermenting Bacteria in Faeces. J Hyg (Lond) 5:333–79.
324.
Kotula JW et al. (2014) Programmable bacteria detect and record an environmental signal in the
mammalian gut. Proc Natl Acad Sci U S A 111:4838–43.
325.
Timms AR, Steingrimsdottir H, Lehmann AR, Bridges BA (1992) Mutant sequences in the rpsL
gene of Escherichia coli B/r: mechanistic implications for spontaneous and ultraviolet light
mutagenesis. Mol Gen Genet 232:89–96.
326.
Yaung SJ, Esvelt KM, Church GM (2014) CRISPR/Cas9-Mediated Phage Resistance Is Not
Impeded by the DNA Modifications of Phage T4. PLoS One 9:e98811.
327.
Lehman IR, Pratt EA (1960) On the structure of the glucosylated hydroxymethylcytosine
nucleotides of coliphages T2, T4, and T6. J Biol Chem 235:3254–9.
328.
Kelleher J, Raleigh E (1991) A novel activity in Escherichia coli K-12 that directs restriction of
DNA modified at CG dinucleotides. J Bacteriol 173:5220–3.
329.
Jinek M et al. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial
immunity. Science 337:816–21.
330.
Barrangou R et al. (2007) CRISPR provides acquired resistance against viruses in prokaryotes.
Science 315:1709–12.
331.
Marraffini LA, Sontheimer EJ (2008) CRISPR interference limits horizontal gene transfer in
staphylococci by targeting DNA. Science 322:1843–5.
332.
Grissa I, Vergnaud G, Pourcel C (2007) The CRISPRdb database and tools to display CRISPRs
and to generate dictionaries of spacers and repeats. BMC Bioinformatics 8:172.
333.
Díez-Villaseñor C, Almendros C, García-Martínez J, Mojica FJM (2010) Diversity of CRISPR
loci in Escherichia coli. Microbiology 156:1351–61.
334.
Touchon M et al. (2011) CRISPR distribution within the Escherichia coli species is not suggestive
of immunity-associated diversifying selection. J Bacteriol 193:2460–7.
335.
Toro M et al. (2014) Association of Clustered Regularly Interspaced Short Palindromic Repeat
(CRISPR) Elements with Specific Serotypes and Virulence Potential of Shiga Toxin-Producing
Escherichia coli. Appl Environ Microbiol 80:1411–20.
336.
Grissa I, Vergnaud G, Pourcel C (2007) CRISPRFinder: a web tool to identify clustered regularly
interspaced short palindromic repeats. Nucleic Acids Res 35:W52–7.
337.
Semenova E et al. (2011) Interference by clustered regularly interspaced short palindromic repeat
(CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A 108:10098–103.
338.
Georgopoulos CP (1967) Isolation and preliminary characterization of T4 mutants with
nonglucosylated DNA. Biochem Biophys Res Commun 28:179–184.
339.
Warren RAJ (1980) Modified bases in bacteriophage DNAs. Annu Rev Microbiol 34:137–58.
340.
Petrov VM, Ratnayaka S, Nolan JM, Miller ES, Karam JD (2010) Genomes of the T4-related
bacteriophages as windows on microbial genome evolution. Virol J 7:292.
341.
Sheludchenko MS, Huygens F, Hargreaves MH (2010) Highly discriminatory single-nucleotide
polymorphism interrogation of Escherichia coli by use of allele-specific real-time PCR and
eBURST analysis. Appl Environ Microbiol 76:4337–45.
342.
Dupuis M-È, Villion M, Magadán AH, Moineau S (2013) CRISPR-Cas and restriction-modification
systems are compatible and increase phage resistance. Nat Commun 4:2087.
343.
Hsu PD et al. (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol
31:827–32.
344.
Monod C, Repoila F, Kutateladze M, Tétart F, Krisch HM (1997) The genome of the pseudo T-even
bacteriophages, a diverse group that resembles T4. J Mol Biol 267:237–49.
345.
Snyder L, Gold L, Kutter E (1976) A gene of bacteriophage T4 whose product prevents true late
transcription on cytosine-containing T4 DNA. Proc Natl Acad Sci U S A 73:3098–102.
346.
Kornberg S, Zimmerman S, Kornberg A (1961) Glucosylation of deoxyribonucleic acid by
enzymes from bacteriophage-infected Escherichia coli. J Biol Chem 236:1487–1493.
347.
Choo Y (1998) Recognition of DNA methylation by zinc fingers. Nat Struct Biol 5:264–265.
348.
Valton J et al. (2012) Overcoming transcription activator-like effector (TALE) DNA binding
domain sensitivity to cytosine methylation. J Biol Chem 287:38427–32.
349.
Fineran PC et al. (2014) Degenerate target sites mediate rapid primed CRISPR adaptation. Proc
Natl Acad Sci U S A 111:E1629–38.
350.
Deveau H et al. (2008) Phage response to CRISPR-encoded resistance in Streptococcus
thermophilus. J Bacteriol 190:1390–1400.
351.
Levin BR, Moineau S, Bushman M, Barrangou R (2013) The population and evolutionary
dynamics of phage and bacteria with CRISPR-mediated immunity. PLoS Genet 9:e1003312.
352.
Magadán AH, Dupuis M-È, Villion M, Moineau S (2012) Cleavage of phage DNA by the
Streptococcus thermophilus CRISPR3-Cas system. PLoS One 7:e40913.
353.
Garneau JE et al. (2010) The CRISPR/Cas bacterial immune system cleaves bacteriophage and
plasmid DNA. Nature 468:67–71.
354.
Bondy-Denomy J, Pawluk A, Maxwell KL, Davidson AR (2013) Bacteriophage genes that
inactivate the CRISPR/Cas bacterial immune system. Nature 493:429–32.
355.
Seed KD, Lazinski DW, Calderwood SB, Camilli A (2013) A bacteriophage encodes its own
CRISPR/Cas adaptive response to evade host innate immunity. Nature 494:489–91.
356.
Yaung SJ, Esvelt KM, Church GM (2015) Complete Genome Sequences of T4-Like
Bacteriophages RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68.
Genome Announc 3:e01122–14.
357.
Russell R (1967) Speciation among the T-even bacteriophages. Dissertation (California Institute of
Technology).
358.
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence
data. Bioinformatics 30:2114–2120.
359.
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn
graphs. Genome Res 18:821–9.
360.
Delcher AL, Bratke KA, Powers EC, Salzberg SL (2007) Identifying bacterial genes and
endosymbiont DNA with Glimmer. Bioinformatics 23:673–9.
361.
Schattner P, Brooks AN, Lowe TM (2005) The tRNAscan-SE, snoscan and snoGPS web servers
for the detection of tRNAs and snoRNAs. Nucleic Acids Res 33:W686–9.
362.
Darling AE, Mau B, Perna NT (2010) progressiveMauve: multiple genome alignment with gene
gain, loss and rearrangement. PLoS One 5:e11147.
363.
Lorenz R et al. (2011) ViennaRNA Package 2.0. Algorithms Mol Biol 6:26.
364.
Robinson MD, McCarthy DJ, Smyth GK (2009) edgeR: A Bioconductor package for differential
expression analysis of digital gene expression data. Bioinformatics 26:139–140.
365.
Miller ES et al. (2003) Bacteriophage T4 genome. Microbiol Mol Biol Rev 67:86–156.
366.
Greenberg GR, He P, Hilfinger J, Tseng MJ (1994) in Molecular biology of bacteriophage T4, pp
14–27.
367.
Young P, Ohman M, Sjoberg BM (1994) Bacteriophage T4 gene 55.9 encodes an activity required
for anaerobic ribonucleotide reduction. J Biol Chem 269:27815–27818.
368.
Black LW, Showe MK, Steven AC (1994) in Molecular biology of bacteriophage T4, pp 218–258.
369.
Tamboli CP, Neut C, Desreumaux P, Colombel JF (2004) Dysbiosis in inflammatory bowel
disease. Gut 53:1–4.
370.
Church GM (2013) Reading and writing omes. Mol Syst Biol 9:642.