Towards in vivo editing of the human microbiome

by Stephanie J. Yaung

S.B. Biological Engineering, Massachusetts Institute of Technology, 2010
S.B. Management Science, Massachusetts Institute of Technology, 2010

Submitted to the Harvard-MIT Division of Health Sciences and Technology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Medical Engineering and Medical Physics at the Massachusetts Institute of Technology

June 2015

© 2015 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Harvard-MIT Division of Health Sciences and Technology, May 8, 2015

Certified by: George M. Church, PhD, Professor of Genetics, Harvard Medical School, Thesis Supervisor

Accepted by: Emery N. Brown, MD, PhD, Professor of Computational Neuroscience and Health Sciences and Technology; Director, Harvard-MIT Program in Health Sciences and Technology

Abstract

The human microbiota consists of 100 trillion microbial cells that naturally inhabit the body and harbors a rich reservoir of genetic elements collectively called the microbiome. Efforts based on metagenomic sequencing of microbiomes associated with healthy and diseased individuals have revealed vast effects of microbiota on human health. However, compared to the expanding amount of sequence data, little is known about the function of these microbes and their genes.
Furthermore, current clinical approaches to modify the microbiota face several challenges, including colonization resistance in competitive environments such as the gut, and imprecise ecological perturbations using antibiotics and fecal transplants. The fundamental objective of this research is to develop safe methods to genetically edit the microbiome in vivo to promote human health. The abilities to introduce commensally fit strains and to control specificity of microbial modulations are critical steps towards ecological engineering of healthy microbiota. This thesis describes strategies to investigate, propagate, and ultimately engineer desired functions in microbiota. In particular, we developed a temporal functional metagenomics method to identify genes that improved microbial fitness in the mammalian gut in vivo. We also built foundational tools for delivering genetic elements and immunizing endogenous microbiota against acquiring antibiotic resistance and toxins. In addition to leveraging bacterial conjugation and the prokaryotic defense system CRISPR-Cas9, we employed bacteriophages for depleting native strains to empty the niche for an engineered version. Our work enables applications in engineering probiotic strains with augmented fitness and anti-pathogenesis properties, tempering host autoimmunity, and combating hospital-acquired infections and enteric diseases.

Thesis Supervisor: George M. Church, PhD
Title: Professor of Genetics, Harvard Medical School

Dedication

To my parents, Fangling Chang and Alan Tsu-I Yaung
To my grandparents, Chin-Wan Chang and Chin-Pin Chiu
To my uncle Fred Fang-Jen Chang, aunt Connie Tze-Mei Chen, and cousins Angela Chang and Bora Chang
for their endless love, encouragement, and support.

Acknowledgements

The first thing I learned in graduate school was that science is done by people. Science may be the pursuit of knowledge and objective truth, but the process of research and invention is really a human endeavor.
Therefore, I would like to thank the people who made this work possible, especially those who gave me honest counsel, valuable guidance, and good company during the ups and downs. First, I am indebted to my research advisor George Church for his inspiration, kindness, and insightful advice throughout my time in graduate school. I am grateful for the opportunity to be a part of the uniquely inventive environment that he has fostered by bringing together diverse people and resources. I would like to thank Eric Alm and Matt Waldor for serving on my thesis committee and providing generous support and constructive feedback. My graduate work would not exist without remarkable collaborators, including Harris Wang, Kevin Esvelt, Georg Gerber, Lynn Bry, and Matt Waldor. I sincerely thank them for the exceptional training and stimulating discussions. I am incredibly honored to have had the opportunity to work closely with Harris and Kevin, who are amazing scientists and thoughtful mentors. When I first joined the Church lab, Harris let me tag along on his new microbiome adventures. Kevin also saw potential and recruited me to be a coconspirator in some of his immense undertakings. They truly helped me launch and advance my graduate research career. I am also deeply grateful to Georg and Matt, who made time to meet with me and served as unofficial advisors in several aspects of my thesis research. I was also fortunate to work with skilled colleagues in these collaborations, including Jonathan Braff, Rose Deng, and Ning Li, who contributed significantly to portions of this work. I would also like to thank Pooja Jethani, a MIT UROP student, and Takahiro Yokoi, a visiting graduate student, for their hard work and contributions. In addition to crucial human capital, I must acknowledge the financial backing for my work, sponsored by the DOE, NIH, DARPA, and NSF through grants awarded to George, Harris, Georg, and Kevin, as well as the Wyss Institute. 
I am also thankful for support from the NSF Graduate Research Fellowship Program and the MIT Neurometrix Presidential Fellowship.

At a place like Harvard or MIT, having brainpower and willpower can become so commonplace that perhaps what stands out more is sincere compassion. In addition to my wonderful scientific collaborators, many others have contributed positively to my time in graduate school and have shown me great kindness. To past and present Church lab members, I will treasure the banter and profound and candid conversations we shared. In particular, I would like to thank Alex Chavez and Jon Scheiman for being like big brothers to me and keeping me grounded; Noah Davidsohn and Noah Taylor for being easygoing neighbors and a calm force when times were hectic; Dan Goodman and Vatsan Raman for always being supportive and caring; and Susan Byrne, Su Vora, Andie Smidler, Michael Napolitano, Alex Garruss, Nikolai Eroshenko, Prashant Mali, Jay Lee, Sri Kosuri, Eric Kelsic, Di Zhang, Mike Chou, John Aach, Sara Vassallo, Mike Mee, Henry Lee, Marc Lajoie, James DiCarlo, Xavier Rios, Alex Ng, Javier Fernández Juárez, Reza Kalhor, Marc Güell, Mike Sismour, Justin Feng, Anik Debnath, George Chao, Ben Stranges, Eswar Iyer, Raj Chari, Fred Vigneault, Sven Dietz, Bobby Dhadwar, Yu Wang, Noah Donoghue, Adam Marblestone, Evan Daugharthy, Uri Laserson, Adrian Briggs, Julie Norville, Barry Wanner, Dima Ter-Ovanesyan, Matthieu Landon, Jun Teramoto, Wei Leong Chew, Jamie Rogers, Nathan Johns, Chris Guzman, Joe Negri, Mirko Palla, Gleb Kuznetsov, Mingjie Dai, Margo Monroe, Joyce Yang, Madeline Ball, Arthur Sun, Jun Li, Luhan Yang, Po-yi Huang, Alex Hernandez-Siegel, Seth Shipman, Venky Soundararajan, Ido Bachelet, Chao Li, Rigel Chan, Tara Gianoulis, Josh Mosberg, Dan Mandell, Danny Levner, Charles Fracchia, Roger Conturie, Joe Davis, Yveta Masar, Meghan Radden, Laura Glass, Stan Fields, Frank Poelwijk, and several others for lending a hand, offering input, and adding colorful
memories to my Church lab experience. Being a part of the Harvard-MIT HST program, HMS Genetics, and the Wyss Institute, I would like to express my gratitude to those who kept everything running as smoothly as possible. At HST, I thank my academic advisor Richard Cohen for making sure I was on track throughout my graduate program, and Julie Greenberg, Laurie Ward, Traci Anderson, and Joe Stein for all their work behind the scenes. I would like to acknowledge Vonda Shannon, Ella Sexton, Scott Blackwell, Heidi Turcotte, and Terri Broderick at HMS Genetics, Kelly Seary at the HIM animal facility, and several individuals at the Wyss, including Susan Kelly, Jeanne Nisbet, Martin Montoya-Zavala, Joel Rivera-Cardona, Angel Velarde, Ngawang Sherpa, Amanda Graveline, Andyna Vernet, Matt Balestrieri, Rich Terry, Brian Turczyk, Marcelle Tuttle, and Ben Pruitt for their assistance. I also thank the Harvard MSI for providing a warm community for students in the microbial sciences. Furthermore, I am grateful to all the faculty, teaching staff, and fellow classmates in my HST courses, from Pathology to ICM at Mount Auburn Hospital, for the memorable learning experience and incomparable exposure to clinical medicine. I am especially thankful to fellow students in and outside of HST who were part of our weekly lunches, particularly George Xu, Luvena Ong, Hanlin Tang, James Kath, Thomas Graham, and Luis Barrera, with guest appearances by Sandeep Koshy, Vikram Juneja, and Helen Hou; I came to think of it as a support group that helped preserve our well-being. I owe much gratitude to Luvena for helping me survive graduate school; I am glad we joined neighboring labs at the Wyss and could keep each other afloat during the high tides, though I suspect I benefited much more, at least in substance from her culinary explorations. I wish to extend special thanks to George Xu, with whom I have shared a delightful and unique partnership in building Quantamerix. 
I also thank George and Jesse Engreitz for suggesting that I check out the Church lab during our first weeks in HST given my broad interest in technology development. To some more senior HST students, Pavitra Krishnaswamy, Dan Macaya, Alice Chen, Tim O’Shea, James Dahlman, Meghan Shan, Kay Furman, Ronn Friedlander, Nate Reticker-Flynn, Alex German, and Ryan Cooper, I would like to give thanks for reassuring me that things would turn out alright. At the Wyss, many other members of the Yin, Shih, Silver, Collins, Ingber, and Joshi labs improved my graduate work and experience, including Cameron Myhrvold, Marika Ziesack, Jaeseung Hahn, Bhavik Nathwani, Thomas Schlichthärle, Maartje Bastings, Andries van der Meer, Aishwarya Sukumar, Joanna Robaszewski, Ralf Jungmann, Buz Barstow, Nadia Cohen, Sauveur Jeanty, and Peter Nguyen. I am also grateful to members of the MIT Sidney-Pacific community for their camaraderie, including Tarun Jain, Sumit Dutta, Annie Chen, Naichun Chen, and Amy Bilton.

Many thanks to friends who kept me sane and connected to the real world at various points during my graduate studies: Yodit Tewelde, Erika Sandford, Orrin Barnhart, Rebecca Rich, Rebecca Gould, Mindy Eng, Michelle Princi, Heymian Wong, Nina Guo, Ruel Jerry, Jackie Holmes, Julie Paul, Alice Chi, Margaret Ding, Jackie Goldstein, Ana Chen, Hattie Chung, Lily Keung, Geena Márquez, Omar Abudayyeh, Eric Timmons, Jason Trigg, Seoung Yeon Kim, Ashley Chang, Julie Wu, Ankur Mandhania, Rakesh Popli, Sherry Wu, and many others.
I would also like to express my heartfelt gratitude to those who have shaped my path, specifically advisors and mentors from my undergraduate years at MIT, including Linda Griffith, Ram Sasisekharan, Mary Camerlengo, Steve Wasserman, Agi Stachowiak, Chuck Eesley, Venky Soundararajan, Anne Hunter, David Schauer, Tina Amarnani, and Wyman Li, and at my internships outside of MIT, including Camilla Forsberg, Conan Li, Odilo Mueller, Michael McNulty, David Deng, Dorothy Yang, Mark Van Cleve, Hirdesh Uppal, Kyle Kolaja, George Zhou, Tseng-En Hu, Zhihao Lin, Yatin Gokarn, Elaine Tseng, Peter Matthews, and Frank Reynolds. They not only gave me advice and assistance at critical times, but also offered me opportunities to grow in many areas, such as leadership, communication, teamwork, and scientific maturity. To my teachers before MIT, I owe special thanks to Steve Bowen, Joyce Yamamoto, Bernadette Troyan, Diane Shires, and Bruce Compton for believing in me and encouraging me to be industrious, inquisitive, and innovative. Most of all, I am grateful to my family, to whom this thesis is dedicated. Their nurturing, unconditional support has been indispensable to my growth. They have taught me courage, resilience, integrity, and self-reflection; demonstrated giving without expectation; and shown me what it means to be conscientious, empathetic, and humble – I hope I have not disappointed. Moreover, I could not have asked for better parents; Mom and Dad have been my most ardent supporters, never wavering in their faith in my endurance and abilities to chase my dreams. Finally, I apologize for the laundry list nature and incompleteness of these acknowledgements – countless pages would be required to properly thank everyone and provide specific vignettes. Instead, I will end by saying, to everyone who has contributed, thank you for making me a better person and a better scientist. 
Contents

Abstract
Dedication
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Scope
1.2 Recent progress in engineering human-associated microbiomes
1.2.1 Abstract
1.2.2 Introduction
1.2.3 Microbiota, host, and disease
1.2.4 Enabling tools for engineering the microbiota
1.2.5 Perspectives
1.2.6 Acknowledgements
Chapter 2 Improving microbial fitness in the mammalian gut using in vivo temporal functional metagenomics sequencing
2.1 Abstract
2.2 Introduction
2.3 Materials and Methods
2.3.1 Bacterial strains and growth conditions
2.3.2 Library generation
2.3.3 Plasmid retention
2.3.4 In vitro selection
2.3.5 In vivo selection
2.3.6 Colony PCR and Sanger sequencing
2.3.7 DNA extraction and PCR amplification of inserts for Illumina sequencing
2.3.8 High-throughput sequencing and analysis of in vitro library selection data
2.3.9 High-throughput sequencing and processing of in vivo selection data
2.3.10 Statistical analyses of in vivo selection data
2.3.11 Whole genome sequencing of isolated clones from in vivo selection
2.3.12 Growth assays
2.4 Results
2.4.1 Library construction and characterization
2.4.2 In vitro stability and selection by media condition
2.4.3 In vivo library selection in germ-free mice
2.4.4 Characterization of in vivo library population dynamics
2.4.5 Genes showing transient selection during early gut colonization
2.4.6 Genes showing long-term selection during gut colonization
2.4.7 In vivo genomic stability of E. coli recipient strain
2.5 Discussion
2.6 Data Availability
2.7 Acknowledgements
Chapter 3 Delivering and maintaining genetic elements
3.1 Background
3.1.1 Limitations of current microbiota manipulations
3.1.2 Horizontal gene transfer
3.2 Engineering horizontal gene transfer networks
3.2.1 Introduction
3.2.2 Materials and Methods
3.2.3 Results
3.2.4 Discussion
3.2.5 Acknowledgements
3.3 Immunizing strains against acquisition of antibiotic resistance and toxins
3.3.1 Introduction
3.3.2 Materials and Methods
3.3.3 Results
3.3.4 Discussion
3.3.5 Acknowledgements
Chapter 4 Replacing gut microbial strains with precision using phages and CRISPR
4.1 Background
4.2 Phage-assisted niche depletion in the murine gut
4.2.1 Introduction
4.2.2 Materials and methods
4.2.3 Results
4.2.4 Discussion
4.2.5 Acknowledgements
4.2.6 Supplementary figures
4.3 CRISPR/Cas9-mediated phage resistance is not impeded by T4 DNA modifications
4.3.1 Abstract
4.3.2 Introduction
4.3.3 Materials and methods
4.3.4 Results
4.3.5 Discussion
4.3.6 Acknowledgements
4.4 Complete genome sequences of 11 T4-like bacteriophages
4.4.1 Abstract
4.4.2 Genome announcement
4.4.3 Acknowledgements
4.5 Generating effective CRISPR spacers against bacteriophages
4.5.1 Introduction
4.5.2 Materials and methods
4.5.3 Results
4.5.4 Discussion
4.5.5 Acknowledgements
Chapter 5 Conclusions and outlook on microbiome engineering
Bibliography

List of Figures

Figure 1-1 Framework for engineering human-associated microbiota.
Figure 1-2 Composition of the human gut microbiome during development with respect to microbial diversity and population stability.
Figure 1-3 Changes in the composition of human microbiota during disease states compared to healthy states.
Figure 1-4 Approaches to human microbiome engineering.
Figure 1-5 Genetic tractability of abundant or relevant human-associated microbial genera.
Figure 2-1 Experimental design.
Figure 2-2 Double digestion and PCR protocol for sequencing.
Figure 2-3 Technical reproducibility of library sequencing protocol.
Figure 2-4 Input library characterization.
Figure 2-5 Insert distribution over time in in vitro selection.
Figure 2-6 Insert distribution over time in in vivo selection.
Figure 2-7 In vivo selection experiments.
Figure 2-8 Distribution of mapped bases to each Bt gene by mouse.
Figure 2-9 COG functional categories of bases mapped to the entire Bt genome averaged across the five mice.
Figure 2-10 BT_1759 glycoside hydrolase selection kinetics.
Figure 2-11 BT_1759 glycoside hydrolase read mapping profile.
Figure 2-12 BT_1759 glycoside hydrolase functional characterization in sucrose media.
Figure 2-13 BT_0370 galactokinase and BT_371 glucose/galactose transporter.
Figure 2-14 Growth characterization of clones with genomic SNVs.
Figure 3-1 Maps of plasmids used in this study.
Figure 3-2 Triplicate design to minimize effects of evaporation in edge wells.
Figure 3-3 Conjugation mating experimental workflow.
Figure 3-4 Example validation for qPCR primer pair.
Figure 3-5 First set of growth curves by bacterial strain.
Figure 3-6 First set of growth curves by media condition.
Figure 3-7 Second set of growth curves by bacterial strain.
Figure 3-8 Second set of growth curves by media condition.
Figure 3-9 Third set of growth data.
Figure 3-10 Antibiotic resistance profiles of representative microbiota species.
Figure 3-11 Design of Cas9 cassette with genome-copying feature.
Figure 3-12 Example CRISPR spacer validation assay in V. cholerae.
Figure 3-13 Escapees recombine at repeat regions to excise the spacer.
Figure 3-14 Alternative CRISPR repeats with base substitutions.
Figure 3-15 Alternative CRISPR repeats with truncations.
Figure 3-16 Alternative CRISPR repeats with length 18 nt and one to two mismatches.
Figure 3-17 Stable incorporation of engineered Cas9 mobile elements in E. coli.
Figure 4-1 Strain rotation scheme using phage and corresponding susceptible and Cas9-mediated resistant host strains.
Figure 4-2 Mouse experiment 1 design to test effect of phage and/or sugar.
Figure 4-3 Mouse experiment 2 design to test effect of repeated phage dosing.
Figure 4-4 Biomass of YFP- and YFP+ cells in mouse experiment 1.
Figure 4-5 Fraction of replaced cells in mouse experiment 1.
Figure 4-6 Biomass of YFP- and YFP+ cells in mouse experiment 2.
Figure 4-7 Fraction of replaced cells in mouse experiment 2.
Figure 4-8 Raw data points from mouse experiment 1.
Figure 4-9 Raw data points from mouse experiment 2.
Figure 4-10 Individual mouse data from experiment 2.
Figure 4-11 Raw data points for mice #24 and #27.
Figure 4-12 Native E. coli spacers target phage with modified DNA.
Figure 4-13 Cas9 cuts methylated cytosines and adenosines in E. coli.
Figure 4-14 Cas9 reduces E. coli susceptibility to phages T7 and RB49.
Figure 4-15 Cas9 reduces E. coli susceptibility to phages T4 and T4 gt.
Figure 4-16 Restriction digest of phages.
Figure 4-17 Efficiency of plating of T4 gt on wild-type E. coli K-12.
Figure 4-18 Host range of T4-like phages.
Figure 4-19 Spacer Y confers protection in phages T4, RB49, and RB69.
Figure 4-20 Library construction and sequencing design.
Figure 4-21 Mock library composition of T4 spacers.
Figure 4-22 Mock library selection enriched for effective spacer.
Figure 4-23 Fold change of spacers after phage selection.
Figure 4-24 Host strain differences across selection experiments.
Figure 4-25 Initial validation of top spacers using phage-embedded agar.
Figure 4-26 Semi-quantitative results of initial validation screen of top anti-phage spacers.
Figure 4-27 Quantitative validation of screened spacers using plaque assays.
Figure 4-28 Nucleotide frequencies at each position in the spacer sequence across libraries.
Figure 4-29 Enriched regions on the phage T6 genome.
Figure 4-30 Enriched regions on the phage RB15 genome.
Figure 4-31 Enriched regions on the phage RB33 genome.
Figure 4-32 Enriched regions on the phage RB69 genome.
Figure 5-1 Engineering microbiomes from diseased to healthy states.

List of Tables

Table 2-1 Primers used in the study.
Table 2-2 Summary of sequencing metrics for in vitro experiments.
Table 2-3 Summary of sequencing metrics for in vivo experiments.
Table 2-4 Summary of metrics for whole genome sequencing of E. coli strains.
Table 2-5 Bt genes significantly enriched or depleted at Day 6 or 7 in vitro.
Table 2-6 Statistical testing of in vivo selection of Bt genes.
Table 2-7 Genetic variants in mouse-isolated clones identified by whole genome sequencing.
Table 3-1 Composition of first set of growth media.
Table 3-2 Composition of second set of growth media.
Table 3-3 Composition of third set of growth media.
Table 3-4 Composition of the “3:2pas” medium.
Table 3-5 List of species-specific primers.
Table 3-6 List of species-specific qPCR primers.
Table 3-7 Conjugation frequencies of pFD340 into Bacteroides.
Table 3-8 Secondary transfers from Bacteroides into E. coli.
Table 3-9 Conjugation frequencies of pBC003.
Table 3-10 Conjugation frequencies from literature.
Table 3-11 Validated spacers for E. coli and V. cholerae applications.
Table 3-12 Nested chi site regions in E. coli MG1655.
Table 3-13 Nested chi site regions in E. coli Nissle 1917.
Table 4-1 Primers to identify E. coli Nissle 1917.
Table 4-2 Phage escapee analysis.
Table 4-3 Genome features of the sequenced strains.
Table 4-4 Pairwise similarity of phages T6, RB15, RB33, and RB69.
Table 4-5 Primers for amplifying sub-pools of oligonucleotides based on barcodes.
Table 4-6 Primers for amplifying libraries for high-throughput sequencing.
Table 4-7 Custom sequencing primers.
Table 4-8 Features of top spacers used for validation assays.
Table 4-9 Sequence analysis of cross-reactive spacers.
Table 4-10 Comparison of quantified spacer activity with library selection data.

Chapter 1 Introduction

1.1 Scope

Microorganisms occupy a fascinating space in our world, not only in the environment, but also in our own bodies. With the increasingly large body of evidence that the microbes living in or on us, or microbiota, play a role in human health, one begins to wonder what these microbes do, how they may vary over time in a single person and across different individuals, and if they can be systematically and safely tuned to improve clinical outcomes. To better understand and potentially engineer microbes that have become associated with us, we have developed novel approaches to probe the function as well as precisely modulate the genetic content of these microbial residents.

This dissertation aims to address two main questions: what do we edit, and how do we edit? To answer the first, we must conduct functional gene discovery. Since the mammalian gut is home to the most densely populated microbial community characterized to date, we were interested in how microbes survive and persist in this highly competitive environment, where there are limited resources (e.g., nutrients) and intense pressures (e.g., host immune system). We developed a method called temporal functional metagenomics sequencing (TFUMseq), described in Chapter 2. As a starting point, we placed raw DNA of interest, from a well-defined “donor” genome, in another species that can express the heterologous pieces of DNA.
We then studied how the new DNA fragments can improve the performance of the “recipient” strain, specifically in the context of the mammalian gut. By introducing the recipient bacterial strain expressing the donor DNA fragments into mice, we identified genes contributing to improved fitness in vivo. This type of functional metagenomics approach opens the doors to more complex investigations into how different sources of donor DNA material interact with different mammalian environments and selections through various recipient strains; it is an avenue to gain insight into the in vivo dynamics of host-microbiota interaction.

To answer the second question of how to edit the microbiota, we must consider methods of gene delivery, and even cell-level perturbations to modulate microbial members and their DNA. We built several molecular tools that contribute to efforts in editing the microbiome, the genes that make up the microbiota. In one approach, as described in Chapter 3, bacterial strains serve as the delivery vehicle for introducing genetic material into native microbiota that could express a therapeutic protein or immunize the microbiota against acquisition of pathogenic elements from the environment. Given the inadequate amount of available data on how well DNA can transfer across different microbial species, we laid the groundwork for studying complex, defined microbial communities and rates of gene transfer. We present methods for molecular-based identification and growth-based selection of various microbiota species. Then, we harnessed the bacterial adaptive immune system, CRISPR-Cas9, to explore opportunities in lowering the likelihood of microbiota acquiring toxins or antibiotic resistance genes. We validated several antibiotic resistance and toxin sequences to target, tested designs for building large stable arrays of these sequences, and demonstrated the feasibility of an immunization payload that can copy itself into the bacterial chromosome.
These findings provide a critical foundation for stably delivering engineered elements into the endogenous microbiota in order to promote health by expressing therapeutic proteins or preventing pathogenesis.

In Chapter 4, we present another approach leveraging viruses that infect bacteria, or bacteriophages. These are essentially highly specific antibiotics that we apply to selectively vacate a niche in the native microbial community to allow for the colonization of an engineered bacterial strain. We piloted a mouse experiment where we tested key assumptions about phage selection pressures on targeted bacteria in vivo and phage resistance mediated by CRISPR-Cas9. To overcome challenges in identifying effective CRISPR spacers against phage, we investigated phage-encoded DNA modifications and genome sequences. We found that bulky DNA modifications do not impede Cas9 activity in the context of lytic phage infection. We also sequenced nearly a dozen T4-like phages in order to construct a large library of candidate CRISPR spacers; we designed a selection method using phages to enrich for effective anti-phage spacers. Our work enables novel targeted microbiome therapies by integrating the molecular precision of CRISPR-Cas9 with the strain-specificity of bacteriophages.

The remaining portions of this introductory chapter are a review of microbiome-related research and engineering efforts, and have been adapted from: Stephanie J. Yaung, George M. Church, Harris H. Wang. Recent Progress in Engineering Human-associated Microbiomes. Methods in Molecular Biology 1151:3-25 (2014). Ref. (1)

1.2 Recent progress in engineering human-associated microbiomes

1.2.1 Abstract

Recent progress in molecular biology and genetics opens up the possibility of engineering a variety of biological systems, from single-cellular to multi-cellular organisms.
The consortia of microbes that reside on the human body, the human-associated microbiota, are particularly interesting as targets for forward engineering and manipulation due to their relevance in health and disease. New technologies in analysis and perturbation of the human microbiota will lead to better diagnostic and therapeutic strategies against diseases of microbial origin or pathogenesis. Here, we discuss recent advances that are bringing us closer to realizing the true potential of an engineered human-associated microbial community.

1.2.2 Introduction

Of the 100 trillion cells in the human body, 90% are microbes that naturally inhabit various body sites, including the gastrointestinal tract, nasal and oral cavities, urogenital area, and skin (2). An individual’s colon is home to 10^11–10^12 microbial cells/mL, the greatest density compared to any microbial habitat characterized to date (3). Many studies, such as the Human Microbiome Project and MetaHIT, have probed the vast effects of microbiota on human health and disease (2, 4–6). In addition to metagenomic sequencing (7), traditional methods of studying cells in isolation are important for elucidating molecular bases of microbial activity. However, cells do not exist in single-species cultures in nature. In fact, some species are only culturable in the presence of other microorganisms (8). This interdependence for survival amongst microbial species in a community attests to the importance of intercellular interactions, both microbe-microbe and host-microbe. Despite the fact that the human microbiota is composed of many individual microbes, these individuals work in concert to perform tasks that rival in complexity those of more sophisticated multicellular systems. Thus, the human-associated microbiome presents a ripe opportunity for forward engineering to potentially improve human health (Figure 1-1). Here, we review recent advances in this area and outline potential avenues for future endeavors.
Figure 1-1 Framework for engineering human-associated microbiota. Engineering human-associated microbiota requires detailed understanding of processes that govern the natural propagation and retention of microbes in the host as well as environmental and adaptive pressures that drive the evolution of cells and communities.

1.2.3 Microbiota, host, and disease

Contrary to traditional views, microbes are social organisms that engage with the environment and other organisms in specific ways. Microbes participate in intercellular communication through contact-dependent signaling (9), quorum-sensing (10), metabolic cooperation or competition (6), spatiotemporal organization (11), and horizontal gene transfer (HGT) (12). Human-associated microbes produce byproducts that serve as substrates utilized by other resident bacteria (13–15). For instance, accumulated hydrogen gas from bacterial sugar fermentation is removed by acetogenic, methanogenic, and sulfate-reducing gut bacteria (16). In contrast to cross-feeding relationships, microbes under stress can release bacteriocins to suppress the growth of competitors (17–19). If microbes are members of a biofilm community, they benefit from physical protection from the environment, access to nutrients trapped and distributed through channels in the biofilm, development of syntrophic relationships with other members, and the ability to share and acquire genetic traits (20, 21). Microbial populations also genetically diversify to insure against possible unstable environmental conditions (22, 23). Moreover, multispecies communities harbor a dynamic gene pool consisting of mobile genetic elements, such as transposons, plasmids, and bacteriophages, which serve as a source of HGT to share beneficial functions with neighbors to preserve community stability (24–27). Densely populated communities such as the human gut are active sites for gene transfer and reservoirs for antibiotic resistance genes (12, 28–30).
Beyond microbe-microbe interactions, the microbiota co-evolves with the host as it develops, driving microbial adaptation (31–34). Core functions of microbiota benefit the host, such as extraction of otherwise inaccessible nutrients, immune system development, and protection against pathogen colonization (3, 35–38). Gut microbes are critical in intestinal angiogenesis, epithelial cell maturation, and immunological homeostasis (38–41). For example, the commensal Bacteroides fragilis produces polysaccharide A, which converts host CD4+ T cells into Foxp3+ Treg cells, producing IL-10 and inducing mucosal tolerance (42). Host diet, inflammatory responses, and aging also affect microbial community composition and function (43–46) (Figure 1-2). Indeed, aberrations in host genetics, immunology, and diet can lead to microbiota-associated human diseases. Diet-induced obesity in mice from a high-fat diet is characterized by enhanced energy harvest and an increased Firmicutes to Bacteroidetes ratio (47, 48). Furthermore, disruptions in the homeostasis between gut microbial antigens and host immunity can invoke allergy and autoimmunity, as in type 1 diabetes and multiple sclerosis (49–51). It is thought that inflammatory bowel disease (IBD) results from inappropriate immune responses to intestinal bacteria; genes identified in genome-wide association studies highlight the role of a host imbalance between pro-inflammatory and regulatory states (49, 52).

Figure 1-2 Composition of the human gut microbiome during development with respect to microbial diversity and population stability. Data compiled from recent studies from the literature: a, Hong 2010 (53); b, Saulnier 2011 (54); c, Claesson 2011 (55); d, Yatsunenko 2012 (56); e, Spor 2011 (57).
While the host selects for microbial communities that harvest nutrients and prime the immune system, irregular microbiota composition may cause disease (Figure 1-3), including IBD (58–60), lactose intolerance (61, 62), obesity (63, 64), type 1 diabetes (65), arthritis (66), myocardial infarction severity (67), and opportunistic infections by pathogens such as Clostridium difficile and HIV (68–71). Microbial gut metabolism links host diet not only to body composition and obesity (72), but also to chronic inflammatory states, such as IBD, type 2 diabetes, and cardiovascular disease (73–75). Intestinal microbes are also important in off-target drug metabolism, rendering digoxin, acetaminophen, and irinotecan less effective or even toxic (76–78). In the case of irinotecan, a chemotherapeutic used mainly for colon cancer, the drug is metabolized by β-glucuronidases of commensal gut bacteria into a toxic form that damages the intestinal lining and causes severe diarrhea. In the oral cavity, ecological shifts in dental plaque microbiota lead to caries (cavities), gingivitis, and periodontitis (79). Dental caries arise from acidic environments generated by acidogenic (acid-forming) and aciduric (acid-tolerant) bacteria, which metabolize sugar from the host diet. Translocation of oral bacteria into other tissues results in infections, and cytokines from inflamed gums released into the bloodstream stimulate systemic inflammation. Oral bacteria have been implicated in respiratory (80, 81) and cardiovascular diseases (82–84), though mechanisms remain unclear.

1.2.4 Enabling tools for engineering the microbiota

The human-associated microbial community presents a vast reservoir of non-mammalian genetic information that encodes a variety of functions essential to the mammalian host (85). Second-generation sequencing technologies have enabled us for the first time to systematically probe the genetic composition of these trillions of microbes that reside on the human body (2).
The ongoing effort by the Human Microbiome Project and MetaHIT to catalog dominant microbial strains from different body sites has generated useful reference genomes for many of the representative species (86). Metagenomic shotgun sequencing of whole microbial communities, such as those found in the gut, has yielded near-complete gene catalogs that describe the abundance and diversity of genes that contribute to maintenance and metabolism of the microbiota (7). In order to determine functional relationships between human-associated microbes and their concerted effect in the mammalian host, we rely on functional perturbation of the microbial community. These investigative avenues include genome-scale perturbation assays, specified community reconstitutions, and directed engineering through synthetic biology (Figure 1-4). Each approach provides us with a unique angle to attack the otherwise daunting challenge of deconvolving a highly intertwined set of microbial interactions in a very heterogeneous environment and a difficult-to-manipulate human host. Advances in both in vitro and in vivo host models have thus also facilitated research endeavors in this area, which we discuss in the following sections.

Figure 1-3 Changes in the composition of human microbiota during disease states compared to healthy states. Data compiled from recent studies from the literature: a, De Filippo 2010 (87); b, Peterson 2008 (88); c, Larsen 2010 (89); d, Yang 2012 (90); e, Kong 2012 (91); f, Keijser 2008 (92); g, Gao 2007 (93).

Figure 1-4 Approaches to human microbiome engineering. General approaches to engineer the human microbiome through design, quantitative modeling, genome-scale perturbation and analysis in in vitro and in vivo models, with the ultimate goal of producing demand-meeting applications to improve sensing, prevention, and treatment of diseases.
1.2.4.1 Challenges of building new genetic systems

Approaches to study the function of human-associated microbes by genetic manipulation rely on several fundamental capabilities, which often present the largest practical barriers to manipulating microbes genetically. First, individual microbes need to be isolated and cultured in the laboratory. Because microbes have a myriad of physiologies and require different nutritional supplements for growth, different media compositions and growth conditions need to be laboriously tested by trial and error to isolate and culture each microbe. These microbial culturing techniques date back to the times of Louis Pasteur and are still the dominant approach today. More recent microbial cultivation techniques use microfluidics and droplet technologies to enable the discovery of synergistic interactions between natural microbes that allow otherwise “unculturable” organisms to be grown in laboratory conditions (8, 94, 95). Upon successful microbial cultivation, the next limiting step of microbial genetic manipulation is the transformation of foreign DNA into cells. The passage of foreign DNA (e.g. plasmids, recombinant fragments) into the cell requires overcoming the physical barriers presented by the cell wall or membrane. This task is accomplished in nature through processes such as transduction by phage, conjugation and mating, or natural competence and DNA uptake (96, 97). Numerous laboratory techniques have been developed for microbial transformation, including electroporation (98), biolistics (99), sonication (100), and chemical or heat disruption (101). Electroporation, the most common of the laboratory transformation techniques, relies on a high-voltage electrical pulse applied to the bacterial sample, which is thought to transiently induce pores in the cell membrane that then enable extracellular DNA to diffuse into the cell.
Various protocols for electroporation of human-associated microbes have been described and are good starting points for developing genetic systems in these microbes (102, 103). Upon transformation of DNA into the cell, the DNA needs to either stably propagate intracellularly or integrate into the microbial host genome through recombination or other integration strategies. Inside the cell, stable propagation of episomal DNA such as plasmids requires DNA replication machinery that is compatible with the foreign DNA (96). Additionally, cells often use methylation and DNA modification and restriction systems to discern foreign versus host DNA through a primitive defensive mechanism that fights against viruses or other invading genetic elements. Nonetheless, these promiscuous genetic elements can often be used as a way to integrate foreign DNA into the chromosome and are often used for large-scale functional genomics (104). Taking all these parameters into consideration, we summarized the current genetic tractability of human-associated microbes with respect to culturability, availability of full genome sequences, transformation methods, and expression and manipulation systems (Figure 1-5). Expansion of these basic genetic tools is crucial to future functional studies of the human microbiota.

Figure 1-5 Genetic tractability of abundant or relevant human-associated microbial genera. Genetic tractability is evaluated here by the availability of means to introduce genetic material (e.g. transformation, conjugation, or transduction), vectors, expression systems, completed genomic sequences, and culturing methods. Circles of increasing sizes indicate more genetic tractability. Protocols and demonstrated methods for genetic manipulation are listed as follows: a. Clostridium: Phillips-Jones 1995, Jennert 2000, Young 1999, Bouillaut 2011 (105–108); b. Ruminococcus: Cocconcelli 1992 (109); c. Lactobacillus: van Pijkeren 2012, Ljungh 2009, Damelin 2010, Sorvig 2005, Thompson 1996, Lizier 2010 (110–115); d. Enterococcus: Shepard 1995 (116); e. Lactococcus: Holo 1995, van Pijkeren 2012 (110, 117); f. Streptococcus: McLaughlin 1995, Biswas 2008 (118, 119); g. Staphylococcus: Lee 1995 (120); h. Listeria: Alexander 1990 (121); i. Treponema: Kuramitsu 2005 (122); j. Borrelia: Hyde 2011, Rosa 1999 (123, 124); k. Bifidobacterium: Mayo 2010 (125); l. Actinomyces: Yeung 1994 (126); m. Mycobacterium: Parish 2009, Sassetti 2001 (127, 128); n. Propionibacterium: Luijk 2002 (129); o. Chlamydia: Binet 2009 (130); p. Porphyromonas: Belanger 2007 (131); q. Prevotella: Flint 2000, Salyers 1992 (132, 133); r. Bacteroides: Salyers 1999, Smith 1995, Bacic 2008 (134–136); s. Fusobacterium: Haake 2006 (137); t. Helicobacter: Taylor 1992, Segal 1995 (138, 139); u. Campylobacter: Taylor 1992 (139); v. Rickettsia: Rachek 2000 (140); w. Brucella: McQuiston 1995 (141); x. Bordetella: Scarlato 1996 (142); y. Neisseria: O'Dwyer 2005, Bogdon 2002, Genco 1984 (143–145); z. Pseudomonas: Dennis 1995 (146).

1.2.4.2 Genome-scale perturbations

Genome-scale perturbations are a class of genetic approaches that disrupt or perturb the expression of functional genes that contribute to relevant phenotypes of individual microbes. To dissect the function of different genes in the cell, we have relied heavily on the use of transposons, which are selfish genetic elements that can splice into and out of different locations of chromosomal DNA, thereby disrupting the coding sequence (147). This classical approach, known as transposon mutagenesis, has allowed us to isolate many genetic mutants whose disrupted genes give rise to interesting phenotypes that reflect the importance of those genes to the physiology of the organism.
Next-generation DNA sequencing has now enabled multiplexed genotyping of pools of transposon mutants by using molecular barcodes, which can then be applied to measure the effect of genome-scale perturbations in different environmental conditions. For example, techniques such as Insertion Sequencing (INSeq) (148) utilize the inverted repeat recognition of the Himar transposase, which is also a restriction site for the type II restriction enzyme MmeI, to generate paired 16-17 bp flanking genomic sequences around the transposon that can be sequenced in pools. Thus, the defined insertion location of every transposon in the library can be determined. By sequencing this pooled mutant library pre- and post-treatment with any number of environmental perturbations, one can probe the effects of different gene disruptions on the physiology of the cell in a multiplexed fashion. Similar techniques using other transposon systems such as Tn-seq (149), high-throughput insertion tracking by deep sequencing (HITS) (150), and transposon-directed insertion-site sequencing (TraDIS) (151) have also been developed. In addition to transposon-based systems, shotgun expression libraries have been useful in discovering functional DNA elements in genomic or metagenomic DNA. Shotgun expression libraries rely on physical shearing or restriction digestion of a donor DNA source into smaller DNA fragments that are then cloned into a gene expression vector and transformed into a host strain for functional analysis. A library of metagenomic DNA samples can, for example, be extracted from an environment and cloned into plasmids that are then expressed in E. coli. Selection and sequencing of the E. coli population for heterologous DNA that enables new functions leads to the discovery of new genetic elements that perform a particular function. This approach can easily identify functions such as antibiotic resistance (152), but has yielded less success with other functions.
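The pre- and post-treatment comparison that underlies pooled screens such as INSeq or Tn-seq can be illustrated with a short calculation. The following is only a minimal sketch, not the analysis pipeline of any of the cited methods: it assumes that reads have already been mapped and summed per gene, and the function name `log2_fold_changes`, the pseudocount, and the example counts are all hypothetical choices made for illustration.

```python
import math


def log2_fold_changes(pre_counts, post_counts, pseudocount=1.0):
    """Return log2(post frequency / pre frequency) for each gene.

    pre_counts and post_counts map a gene (or clone) identifier to the
    number of reads assigned to it before and after selection.
    A pseudocount (an illustrative assumption) avoids division by zero
    for mutants that drop out of the pool entirely.
    """
    genes = sorted(set(pre_counts) | set(post_counts))
    # Normalize by total library size so that sequencing depth cancels out.
    pre_total = sum(pre_counts.get(g, 0) for g in genes) + pseudocount * len(genes)
    post_total = sum(post_counts.get(g, 0) for g in genes) + pseudocount * len(genes)
    changes = {}
    for g in genes:
        pre_freq = (pre_counts.get(g, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(g, 0) + pseudocount) / post_total
        changes[g] = math.log2(post_freq / pre_freq)
    return changes


# Hypothetical read counts for three disrupted genes.
example_pre = {"geneA": 500, "geneB": 500, "geneC": 500}
example_post = {"geneA": 1200, "geneB": 450, "geneC": 10}
for gene, lfc in log2_fold_changes(example_pre, example_post).items():
    print(gene, round(lfc, 2))
```

A positive value indicates a mutant (or library clone) that was enriched under the tested condition and a negative value indicates depletion. A real analysis would additionally need replicates and a statistical model for count noise, which this sketch omits.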
Towards forward engineering of human-associated microbes, new genome engineering tools such as trackable multiplex recombineering (TRMR) (153, 154) and multiplex automated genome engineering (MAGE) enable efficient, site-specific modification of the genome (155–158). TRMR combines double-stranded homologous recombination (159) and molecular barcodes synthesized from DNA microarrays to generate populations of mutants that are trackable by microarray or sequencing. MAGE relies on introduction of pools of single-stranded oligonucleotides that target defined locations of the genome to introduce regulatory mutations (156) or coding modifications (160). These and other recombineering technologies are now being developed for a variety of other organisms, including gram-negative bacteria (161), lactic acid bacteria (110), Pseudomonas syringae (162), and Mycobacterium tuberculosis (163), and are likely to be very useful for engineering human-associated microbes.

1.2.4.3 Reconstituted communities

The community of microbes that make up the human microbiome can be considered a “pseudo-organ” of its own. These microbes interact with one another and the mammalian host in potentially highly complex ways that may be difficult to decipher even with tractable genetic systems (164). A direct approach to study these interactions is to build reconstituted communities of microbes derived from monoculture isolates in defined combinations. This de novo reconstitution approach to build synthetic communities has significant advantages over attempts to deconvolute natural communities. A reconstituted synthetic consortium presents a tractable level of complexity in terms of the number of interacting microbial species that we can track by sequencing and predict with quantitative models. In one such study, researchers inoculated 10 representative strains of the human microbiota into germ-free mice (165).
The mice were then fed defined diets of macronutrients consisting of proteins, fats, polysaccharides, and sugars. By tracking the abundance of the 10-member microbial consortium using high-throughput sequencing, the researchers could predict over 60% of the variation in species abundance as a result of diet perturbations. This avenue of investigation presents a viable approach to study the human microbiome and ways to analyze synthetically engineered microbiota. Engineered microbes have been utilized to reconstitute synthetic communities to investigate the role of metabolic exchange. One such important metabolic exchange is that of amino acids, as they are the essential constituents of proteins. Various syntrophic cross-feeding communities have been described using auxotrophic E. coli and yeast strains that require different amino acid supplementation for growth (166–168). In these syntrophic systems, metabolites that are exchanged across different biosynthetic pathways promote more syntrophic growth than those exchanged along the same pathway, an effect that is also related to the cost of biosynthesis of the amino acid metabolites. Amino acid exchange is likely a large player in driving metabolism of microbial communities, as a substantial fraction of all microbes are missing biosynthesis of various metabolites and thus require growth on richer and more complex substrates that are found in the gut (169).

1.2.4.4 Microbial engineering through synthetic biology

New approaches are now utilizing synthetic biology to engineer human-associated microbiota to improve health and metabolism as well as monitor and fight diseases. These efforts focus on developing genetic circuits that actuate in an engineered host cell such as E. coli, which can then sense and respond to changes in its environment and to the presence of particular pathogens.
For example, to detect the human opportunistic pathogen Pseudomonas aeruginosa, which often causes chronic cystic fibrosis infections and colonizes the gastrointestinal tract, E. coli was engineered to detect the small diffusible molecule that is excreted by P. aeruginosa through the quorum sensing pathway (170). An engineered synthetic circuit was placed in non-pathogenic E. coli, which, in the presence of high-density P. aeruginosa, triggered a self-lysis program that released a narrow-spectrum bacteriocin that specifically killed the P. aeruginosa strain. Similar strategies have also been demonstrated to detect and respond to Vibrio cholerae infection using engineered E. coli that sense autoinducer-1 (AI1) molecules from the V. cholerae quorum sensing pathway (171). These strategies appear to yield improved survival rates against microbial pathogenesis in murine models. Quorum sensing systems, which normally help microbes detect local cell density, have been further enhanced to improve robustness and performance, enabling coupled short-range and long-range feedback circuits for microbial communication across large distances in an engineered community. Other microbes have been successfully engineered to perform specific functions on human-associated surfaces such as the mucosal layer of the gut epithelium. Numerous diseases that occur along the intestinal tract are targets of such engineered approaches. For example, the probiotic strain Lactococcus lactis has been engineered to secrete recombinant human interleukin-10 in the gastrointestinal tract to reduce colitis (172, 173). Other future applications of engineered probiotics include enhancing catabolism of nutrients (e.g. lactose and gluten), modulation of the immune system, and removal of pathogens by selective toxin release (170).
1.2.4.5 In vitro host models

To probe and engineer the human-associated microbial community, various in vitro models have been developed, ranging from traditional batch culturing in chemostats to microfluidic systems that incorporate host cells. Single-vessel chemostats inoculated with fecal samples from healthy individuals have helped identify horizontal gene transfer (174) and selective bacterial colonization on different carbohydrate substrates (175, 176). A multichamber continuous culture system mimicking spatial, nutritional, and pH properties of different GI tract regions can be used to investigate stabilization dynamics (177–179). Similarly, the constant-depth film fermenter resembles oral biofilm (180) and has enabled studies on biofilm formation, antibiotic resistance, and horizontal gene transfer in a multispecies oral community (181, 182). To incorporate mammalian cells in studying host-microbial interactions, organ-on-a-chip microfluidic devices have been recently used. In one version of such a system, a gut-on-a-chip device, the microfluidic channel is coated with extracellular matrix and lined by human intestinal epithelial (Caco-2) cells. This system mimics intestinal flow and peristaltic motion, recapitulates columnar epithelium polarization and intestinal villi formation, and supports the growth of commensal Lactobacillus rhamnosus GG (183). These microdevices offer an opportunity to investigate host-microbiota interactions in a well-controlled manner and in physiologically relevant conditions. Inoculating with native microbiota samples provides a method to overcome the uncultivability of many microbes as well as to study collective activity and discover novel functions without a priori knowledge of community composition. However, starting with a predefined microbial community allows a controlled setting better suited for testing engineered systems.
In one study analyzing the dynamics of a community representing the four main gut phyla in a chemostat, the authors propose that intrinsic microbial interactions, rather than host selective pressure, play a role in the observed colonization pattern, which was similar to what has been documented in the human gut. Similar models have been developed for oral microbiota studies. The use of predefined oral microbial inocula has helped elucidate metabolic cooperation in batch culture (13) and community development in saliva-conditioned flow cells (184).

1.2.4.6 In vivo host models

In order to move into in vivo animal models that more closely represent the physiology of the human host environment, researchers have extensively utilized murine models including germ-free, gnotobiotic, and conventionally-raised mice. Gnotobiotic animals are born in aseptic conditions and reared in a sterile environment where they are exposed only to known microbial species; technically, germ-free mice are a type of gnotobiotic mice that have not been exposed to any microbes. Similar to in vitro systems, mice can be inoculated with either a natural microbiota sample or a predefined microbial community. Fecal samples, as well as oral swab and saliva samples, can then be collected from gnotobiotic mice for biochemical analysis and species quantification of gut and oral cavity microbiota. In vivo models have been used to study the transmission of antibiotic resistance in the mouse gut (185, 186) and colonization resistance in the oral cavity. Furthermore, the choice of the inoculum donor offers opportunities to compare different host selection pressures and microbial community responses. Microbiota can be transplanted not only from conventionally-raised to germ-free animals of the same species, but also inter-species, as in human microbiota into mouse, called humanized gnotobiotic mice (187).
In one study, transplants from zebrafish gut microbiota into germ-free mice and mouse gut microbiota into germ-free zebrafish revealed that the resulting community conformed to the native host composition, demonstrating host selection (188). Altering host diet, environment, or genetic background can also enable studies in host-microbiota interactions. One method to gain insight into the role of microbial communities in disease is to utilize mice with recapitulated pathologies. For example, IL-10-/-, ob-/-, apoE-/-, and TLR2-/- or TLR5-/- mice are models for colitis, obesity, hypercholesterolemia, and metabolic syndrome, respectively (47, 188–191). To generate antigen- or pathogen-specific phenotypes, mice can be infected with Salmonella typhimurium to study colitis, or Citrobacter rodentium as a model for attaching and effacing pathogens, such as enterohemorrhagic E. coli (192, 193). Furthermore, murine models with chemically induced inflammation can be tools to study chronic mucosal inflammation; dextran sodium sulfate (DSS) can induce ulcerative colitis and trinitrobenzene sulfonic acid (TNBS) can stimulate Crohn’s disease (194). To investigate oral microbiota, there are periodontal disease (195) and oral infection models (196, 197); gnotobiotic rodents can also be fed a high-sucrose cariogenic diet to promote plaque formation. Germ-free mice inoculated with defined microbes are informative models for analyzing microbial colonization and metabolic adaptation (198). For example, resident bacteria and probiotic strains adapt their substrate utilization: in the presence of Bifidobacterium longum, Bifidobacterium animalis, or Lactobacillus casei, Bacteroides thetaiotaomicron diversified its carbohydrate utilization by shifting metabolism from mucosal glycans to dietary plant polysaccharides (199).
Furthermore, the effect of different diets on microbial community composition can be studied, as in gnotobiotic mice inoculated with ten sequenced gut bacterial species and fed with various levels of casein, cornstarch, sucrose, and corn oil to represent protein, polysaccharide, sugar, and fat content in the diet, respectively (165).

1.2.4.7 Computational frameworks for human-microbiomics

Over the past several decades, a large number of theoretical and quantitative models have been developed to describe the cell and its behavior. Constraint-based models are used to describe metabolism of individual cells using stoichiometric representation of metabolic reactions and optimization constraints (200). Approaches such as Flux Balance Analysis (FBA) enable the analysis of metabolism under steady-state assumptions by linear optimization solution methods. These methods are now being scaled to ecosystems of cells. Recent developments using multilevel objective optimization (201) and dynamic systems (202) enable the modeling of synthetic ecosystems of three or more members. Using metagenomic data of the gut microbiome, Greenblum et al. generated a community-level metabolic reconstruction network of the microbiota and discovered topological variations that are associated with obesity and inflammatory bowel disease, giving rise to low diversity and differences in community composition (203). For models that account for systems dynamics, population abundance and metabolite concentrations can be solved independently through different FBA models that are iterated at each time step. This approach, called dynamic multi-species metabolic modeling (DMMM), can capture scenarios of resource competition, leading to the identification of limiting metabolites (204). Other complementary models include elementary mode analysis (EMA) (205), which enables quantitative analysis of microbial ecosystems in a multicellular fashion.
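For readers less familiar with constraint-based modeling, the FBA problem mentioned above can be written as a linear program. The notation here is the generic textbook form and is an assumption of this sketch, not the specific formulation of the cited works: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example, a biomass reaction), and v_min and v_max are flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}.
\end{aligned}
```

Dynamic multi-species extensions iterate such a problem for each species at every time step. One common way to express the update, again as an illustrative assumption rather than the exact scheme of reference (204), uses an explicit Euler step in which X_i is the biomass of species i, \mu_i is its FBA-predicted growth rate, C_j is the concentration of shared metabolite j, and v_{ij} is the exchange flux of metabolite j by species i:

```latex
\begin{aligned}
X_i(t + \Delta t) &= X_i(t)\,\bigl(1 + \mu_i\,\Delta t\bigr), \\
C_j(t + \Delta t) &= C_j(t) + \sum_i v_{ij}\,X_i(t)\,\Delta t .
\end{aligned}
```

In this picture, resource competition arises because the uptake bounds of each species at the next step depend on the shared concentrations C_j, so a metabolite that is driven toward zero becomes limiting for every species that consumes it.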
1.2.5 Perspectives

Reframing the microbiota community as a core set of genes, not a core set of species, opens a new front in the microbiome engineering design space. In a metagenomic study of 154 individuals, no single gut bacterial phylotype was detected at an abundant frequency amongst all the samples, a finding that is consistent with the idea that the core human gut microbiome may not be best defined by prominent species but by abundantly shared genes and functions (206). We propose that manipulation at the gene, genome, and ultimately metagenome level offers the ability for precise multicellular engineering of desirable traits in human-associated microbiota. Besides controlled perturbations of the microbiome to advance our understanding of host-microbiota interactions, metagenome-scale tools enable novel developments in diagnostics and therapeutics. From biosensors on the skin to reporters in the gut, there are several opportunities in monitoring the health and disease status of the human host, such as sensing nutritional deficiencies, immune imbalances, environmental toxins, or invading pathogens. Prophylactic and therapeutic avenues for human-microbiome engineering include modifying community composition, tuning metabolic activity, mediating microbe-microbe relationships, and modulating host-microbe interactions. Two current microbiota-associated treatments have shown clinical efficacy: 1) fecal transplants for recurrent Clostridium difficile infection (207) and 2) probiotics for pouchitis, which is inflammation of the ileal pouch that is created after surgical removal of the colon in ulcerative colitis patients (208–210). The main challenges are, for fecal transplants, transmission of undesirable agents from donor feces to the recipient gut and, for probiotics, native colonization resistance that would impair infiltration and growth of new species (211, 212).
Nevertheless, these successful approaches demonstrate the potential benefits of leveraging natural microorganisms and entire microbial communities. In fact, coupling organismal and functional gene level approaches would be a powerful way to engineer the native microbiota. Microbiome engineering enables multiscale systems design for the synthesis of nutrients and vitamins, enhanced digestion of gluten and lactose, decreased acidity of the oral cavity, targeted elimination of multi-drug resistant pathogens, and microbial modulation of the host immune system. As vehicles for drug delivery, commensal bacteria designed to express heterologous genes have been explored for treating cancer (213–215), diabetes (216), HIV (217), and IBD (172). For example, IL-10 has immunomodulatory effects in IBD, but requires localized delivery at the intestinal lining to avoid the toxic side effects and low efficacy of systemic IL-10 injection. Ingestion of modified Lactococcus lactis that secretes recombinant IL-10 is safe and effective in animal models, and has been promising in human clinical trials for IBD (173, 218). Finally, besides addressing clinical safety and efficacy criteria for FDA regulatory approval (219), overall safety precautions are critical considerations to minimize unintentional risks in releasing genetically modified material into the natural environment. Rational design, such as creating auxotrophic microbes (173), for robust stability, non-pathogenicity, and containment of recombinant genetic systems will be essential in microbiome engineering.

1.2.6 Acknowledgements

H.H.W. acknowledges the generous support from the National Institutes of Health Director’s Early Independence Award (grant 1DP5OD009172-01). S.J.Y. acknowledges support from the National Science Foundation Graduate Research Fellowship and the MIT Neurometrix Presidential Graduate Fellowship. G.M.C. acknowledges support from the Department of Energy Genomes to Life Center (Grant DE-FG02-02ER63445).
Chapter 2 Improving microbial fitness in the mammalian gut using in vivo temporal functional metagenomics sequencing

This chapter has been adapted from: Stephanie J. Yaung, Luxue Deng, Ning Li, Jonathan L. Braff, George M. Church, Lynn Bry, Harris H. Wang, Georg K. Gerber. Improving microbial fitness in the mammalian gut using in vivo temporal functional metagenomics. Molecular Systems Biology 11(3):788 (2015). Ref. (220)

2.1 Abstract

Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present TFUMseq (Temporal FUnctional Metagenomics sequencing), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germ-free mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

2.2 Introduction

The mammalian gastrointestinal (GI) tract is a hostile environment for poorly adapted microbes.
Nonetheless, diverse groups of microbes have evolved to prosper in the GI tract, in the setting of intense interspecies competition, physical and chemical stressors, and the host immune system (3, 6). These microorganisms also support the normal homeostatic functions of the host by helping to extract nutrients, stimulate the immune system, and provide protection against colonization by pathogens (3, 35, 36, 38, 40). Next-generation sequencing has enabled systematic studies of the mammalian microbiota, and great strides have been made in characterizing the structure of bacterial communities and their genetic potential in vivo. For instance, the Human Microbiome Project (HMP) (2, 4, 221) and MetaHIT (7) have generated maps of bacterial species abundances throughout the human body, reference genomes, and catalogs of more than 100 million microbial genes assembled from shotgun sequencing of in vivo communities. Although these studies have generated vast amounts of descriptive data, the functions of most bacterial genes in these collections remain poorly characterized or wholly unknown. Traditional methods to characterize the functions of microbial genes require the isolation, cultivation, and introduction of foreign DNA into a recipient organism. However, an estimated 60-80% of mammalian-associated microbiota species remain uncultivated (222). Even after successful culture and introduction of genetic material into a microorganism, the DNA must integrate into the microbial genome or be maintained episomally. This requires known compatible replication and restriction-modification systems, which may not be feasible for many microbes. If these barriers can be overcome, standard low-throughput methods for functional characterization of genes may be employed, or newer approaches such as transposon mutagenesis could be coupled with next-generation sequencing. 
In this latter approach, random locations on the genome are disrupted with a transposon containing a selectable marker; the resulting library is subjected to selection conditions and deep sequenced to determine enriched and depleted mutants (149). A limitation of this approach is that essential genes or those that are important to cell fitness are difficult to assay, since inactivation of these genes by transposon mutagenesis would be lethal to the organism under study. An additional constraint is that transposon mutagenesis may disrupt the expression of bystander genes that are near the relevant locus, thus causing confounding phenotypic effects.

Here, we employ an alternative approach, by building large-scale shotgun expression libraries that can confer a gain of function in the recipient bacterial strain. Our approach uses physical shearing or restriction digestion of donor DNA to generate fragments that are cloned into an expression vector and transformed into the recipient bacterial strain, for high-throughput functional screening to identify genes that confer a fitness advantage in a particular context. This approach has the advantage that the donor organism need not be readily culturable or genetically manipulable in the laboratory; moreover, it allows investigation of essential genes or those conferring a fitness advantage synergistic with the recipient organism. Functional metagenomics using environmental samples was first established for communities derived from lignocellulosic feedstocks (223), seawater (224), and soil (225).
The use of shotgun libraries for functional metagenomics of mammalian-associated microbiota has been demonstrated ex vivo, such as by growing the library in media with different substrates to characterize carbohydrate active enzymes (226), prebiotic metabolism (227), glucuronidase activity (228), salt tolerance (229), and antibiotic resistance genes (152), or by using filtered lysates of the library to screen for signal modulation in mammalian cell cultures (230). This metagenomic shotgun library approach has yet to be carried out on a large scale in vivo.

To demonstrate our TFUMseq (Temporal FUnctional Metagenomics sequencing) approach, we used high-coverage genetic fragments from the genome of the fully sequenced human gut commensal Bacteroides thetaiotaomicron (Bt) (231) and cloned the fragments into a plasmid library in an Escherichia coli K-12 strain. We chose Bt because it is a common commensal strain in the human gut that persistently colonizes and possesses a broad and well-characterized repertoire of catabolic activities, such as sensing polysaccharides and redirecting metabolism to forage on host versus dietary glycans (232–234). We subjected the TFUMseq library to in vitro and in vivo selective pressures, collected output samples at different time points for high-throughput sequencing, and used computational methods to reconstruct the population dynamics of clones harboring donor genes (Figure 2-1).

Our work is an advance over previous studies in two major aspects. First, to our knowledge, our study is the first to employ shotgun expression libraries for functional metagenomics in vivo. Important features of the mammalian gut are difficult to recapitulate in vitro, such as the host immune response. Thus, in vivo experiments are essential for investigating the function of commensal microbiota genes in the host.
Second, our study leverages high-throughput sequencing and computational methods to generate detailed dynamics of the entire population subject to selection over time. This kinetic information is crucial for understanding succession events during the inherently dynamic and complex process of host colonization.

Figure 2-1 Experimental design. (Left panel) Map of the library backbone vector. The vector was linearized and ligated to sheared fragments of donor genome to generate the heterologous insert library. (Right panels) Passaging of the E. coli library in two liquid media conditions (top) and inoculation of the library or a control luciferase plasmid into germ-free (GF) mice (bottom). Small boxes across the time line denote sample collection points. Arrows indicate deep-sequenced samples.

2.3 Materials and Methods

2.3.1 Bacterial strains and growth conditions

Bacteroides thetaiotaomicron VPI-5482 (ATCC # 29148) was grown anaerobically in a rich medium based on supplemented Brain Heart Infusion. The genomic library was maintained in an Escherichia coli K-12 strain, NEB Turbo (New England Biolabs, Ipswich, MA). E. coli strains were grown in Luria broth (LB) and supplemented with carbenicillin (final concentration 100 μg/mL) as needed. For anaerobic growth, an anaerobic jar (GasPak System, Becton Dickinson, Franklin Lakes, NJ) was used. Mouse chow (MC) filtrate was prepared by adding 150 mL deionized water to 8 g of crushed mouse chow (Mouse Breeding Diet 5021, LabDiet, St. Louis, MO). The mixture was heated at 95 °C for 30 minutes with mixing, passed through a 0.22 μm filter, and autoclaved. The sterility of the MC filtrate was confirmed by incubating at 37 °C in aerobic and anaerobic conditions and observing no growth after several days.

2.3.2 Library generation

Bacteroides thetaiotaomicron genomic DNA was isolated (DNeasy Blood & Tissue Kit, Qiagen, Venlo, Netherlands), fragmented by sonication to 3-5 kb (Covaris E210, Covaris, Woburn, MA), and size-selected and extracted by gel electrophoresis (Pippin Prep, Sage Sciences). The fragments were end-repaired (End-It DNA End-Repair Kit, Epicentre, Madison, WI) and cloned into a PCR-amplified GMV1c backbone vector via blunt-end ligation. The reaction was transformed into NEB Turbo electrocompetent E. coli cells (New England Biolabs). The library size was quantified by counting colonies formed on selective media (LB carbenicillin) after plating a fraction of the transformed cells. To assess the size of inserts successfully cloned into the library, we picked colonies for PCR amplification using primers ver2_f/r (Table 2-1) that flanked the insert site. We further confirmed the presence of inserts by submitting amplified inserts for Sanger sequencing (Genewiz, South Plainfield, NJ) and aligning sequences with the donor B. thetaiotaomicron genome.

2.3.3 Plasmid retention

Individual stool pellets from Days 0.75, 1.5, 1.75, 2.5, 4, 10, 14, 21, 25, and 28 were homogenized in 10% PBS and plated on LB agar with or without carbenicillin (carb). To obtain accurate counts, colony platings were performed in triplicate and repeated at 100X dilutions if the plates were overgrown. Plasmid retention was calculated as the number of colonies grown on LB-carb plates divided by the number of colonies grown on LB-only plates.

2.3.4 In vitro selection

After inoculating the library in LB or MC broth, the cultures were passaged by diluting at 20X into fresh media. LB cultures were grown in aerobic conditions with shaking and passaged every day for two weeks. MC cultures were grown in anaerobic conditions without shaking and passaged every two days for two weeks, since the cultures took more time to reach saturation compared to the LB condition.
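
The plasmid-retention ratio described in Section 2.3.3 can be illustrated with a short Python sketch. This is not the analysis code used in the study; the function name, the averaging of triplicate plates, and the optional dilution factors are assumptions made for illustration only.

```python
from statistics import mean


def plasmid_retention(carb_counts, lb_counts, carb_dilution=1.0, lb_dilution=1.0):
    """Estimate the fraction of cells that still carry the plasmid.

    carb_counts / lb_counts: colony counts from replicate plates with and
    without carbenicillin (e.g. triplicates). The dilution arguments are a
    hypothetical convenience for plates that were re-plated at a higher
    dilution (e.g. 100.0 for a 100X dilution).
    """
    if not carb_counts or not lb_counts:
        raise ValueError("at least one plate count is required per condition")
    carb_cfu = mean(carb_counts) * carb_dilution  # plasmid-bearing colonies
    lb_cfu = mean(lb_counts) * lb_dilution        # all colonies
    if lb_cfu == 0:
        raise ValueError("no colonies on non-selective plates")
    return carb_cfu / lb_cfu


if __name__ == "__main__":
    # Made-up triplicate counts, for illustration only.
    print(plasmid_retention([80, 90, 100], [100, 100, 100]))
```

Averaging the replicates before taking the ratio is one reasonable choice; taking the ratio per replicate pair and then averaging would be an equally defensible alternative.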

2.3.5 In vivo selection

All of the mice used in this study were handled in accordance with protocols approved by the Harvard Medical Area Standing Committee on Animals (HMA IACUC). Male C57BL/6 mice, 6–8 weeks of age, were used. The mice were bred in the Center for Clinical and Translational Metagenomics facility and maintained in germ-free conditions prior to the experiments. Germ-free mice were orally gavaged with ~2 × 10^8 CFU of bacteria in a volume of 200 μL on Day 0. Mice inoculated with the library were separately housed. Fecal pellets were collected at 0.5, 0.75, 1.5, 1.75, 2.5, 3, 4, 7, 10, 14, 21, 25, and 28 days post-inoculation and stored at -80 °C in 10% PBS buffer.

2.3.6 Colony PCR and Sanger sequencing

Individual colonies were isolated from stool samples streaked onto LB agar with carbenicillin (100 μg/mL). Colonies were grown overnight at 37 °C in a 96-well plate with 200 μL of LB+carbenicillin. 0.8 μL of the culture was added to a total PCR reaction volume of 20 μL. The PCR mix (KAPA HiFi HotStart ReadyMix PCR Kit, Kapa Biosystems, Wilmington, MA) contained primers ver2_f/r (Table 2-1) that flanked the insert site. PCR amplicons were submitted for sequencing (Genewiz) and the insert sequence was mapped back to the B. thetaiotaomicron genome using BLASTn. Primers for genotyping the galK locus on the E. coli genome are listed in Table 2-1. The presence or absence of IS2 in galK was confirmed using primers galK16_chk_f/r that flanked the expected insertion site in galK.

Name            Sequence (5’ -> 3’)
A_L             AGGACGCACTGACCGAATT
A_R             TTTATTTGATGCCTCTAGCACGC
ver2_f          TTTACTTTGCAGGGCTTCCC
ver2_r          ACTGAGCCTTTCGTTTTATTTGATG
galK16_chk_f    CCTGCCACTCACACCATTCAG
galK16_chk_r    TGGGCGCATCGAGGGA
GMV_amp_f       AACAAGCTTGATATCGAATTCCTGC
GMV_amp_r       GACGGTACCTTTCTCCTCTTTAATGA

Table 2-1 Primers used in the study.

2.3.7 DNA extraction and PCR amplification of inserts for Illumina sequencing

DNA was extracted from collected samples in the in vitro experiment using the DNeasy Blood & Tissue Kit (Qiagen). Inserts were PCR amplified using primers ver2_f/r (Table 2-1) in KAPA HiFi HotStart Mix (Kapa Biosystems) and purified with Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN) at a beads:sample volumetric ratio of 0.5:1. The amplicons were prepared for sequencing using the Nextera kit (Illumina, San Diego, CA). For all fecal samples from the in vivo experiment, the QIAamp DNA Stool Mini Kit (Qiagen) was used. Isolated DNA was digested with PspXI and AvrII enzymes (New England Biolabs) prior to purification with QIAquick PCR Purification Kit (Qiagen) and subsequent PCR amplification with primers A_L and A_R (Table 2-1) in KAPA HiFi HotStart Mix. The PCR reaction was purified with AMPure beads at a beads:sample volumetric ratio of 0.5:1.

Initially, in our sequencing of the in vitro samples, we observed a high fraction (30-45%) of reads mapping to the backbone vector and fewer reads (20%) mapping to the B. thetaiotaomicron genome. Given the large (>3 kb) insert sizes of these libraries, traditional amplification methods evidently over-amplify the smaller vector backbone (2 kb), thereby overwhelming vectors containing actual genomic inserts. We therefore optimized the sample preparation protocol by incorporating a double digestion strategy prior to PCR amplification of the inserts (Figure 2-2). The two restriction sequences selected were the least common (of all available sites on the plasmid) in the B. thetaiotaomicron genome. With this new protocol, in our subsequent in vivo sequencing, we observed <4% of reads mapping to the backbone vector and >90% of reads mapping to the B. thetaiotaomicron genome.

Figure 2-2 Double digestion and PCR protocol for sequencing.
(A) We used restriction sites PspXI and AvrII that flanked the insert site on the backbone vector prior to PCR-amplification of the insert with primers A_L and A_R. These two enzymes had a minimal number of restriction sites (29 for PspXI and 62 for AvrII) in the B. thetaiotaomicron genome. The gel shows the result of library PCR with or without double digestion. Double digestion appears to eliminate the dominating band corresponding to the backbone vector. (B) PCR amplicons were prepared for sequencing by the Nextera kit and size selection. After trimming off any backbone sequence, which would be present on end fragments, we mapped the reads back to the B. thetaiotaomicron genome.

2.3.8 High-throughput sequencing and analysis of in vitro library selection data

Samples were sequenced on the MiSeq (Illumina) instrument at the Molecular Biology Core Facilities of the Dana-Farber Cancer Institute. Metrics for this sequencing run are provided in Table 2-2. Due to the PCR amplification protocol prior to optimization (see previous section), we observed large amounts of E. coli plasmid DNA in our sequencing reads. To maximize the reads aligned to the B. thetaiotaomicron genome, we aggressively trimmed low quality bases and removed sequences mapping to the E. coli genome or with length shorter than 20 nt. The reference genome of B. thetaiotaomicron (NC_004663 and NC_004703) was downloaded from the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/). Due to the aggressive preprocessing of reads described, the length of trimmed sequences was shorter than 50 nt. Therefore, Bowtie (235) was applied instead of Bowtie2 for higher sensitivity. Default parameters were used for building a Bowtie index with the B. thetaiotaomicron chromosome and plasmid sequences. Paired-end reads were aligned to the reference genome with parameter -X 300 using Bowtie. SAM files from the Bowtie alignment were converted to indexed and sorted BAM files using SAMtools (237).
Cuffdiff (236) was applied to test for differential representation of genes (i.e., the library grown in rich medium at time 0 versus the library grown in rich medium at Day 7, and the library grown in MC medium at time 0 versus the library grown in MC medium at Day 6).

Media condition   Timepoint (day)   Paired raw reads   Paired trimmed reads
input library     0                 5675769            5671724
LB aerobic        7                 8226863            8222571
MC anaerobic      6                 6672316            6668006

Table 2-2 Summary of sequencing metrics for in vitro experiments. Paired-end reads of 250 nt length were generated on the MiSeq instrument.

2.3.9 High-throughput sequencing and processing of in vivo selection data

B. thetaiotaomicron genomic DNA inserts were amplified from isolated E. coli plasmids using our improved PCR protocol (Figure 2-2). After Nextera sequencing library preparation, paired-end reads of 101 nt length were generated on the HiSeq 2500 (Illumina) instrument at the Baylor College of Medicine Alkek Center for Metagenomics and Microbiome Research. Metrics for this sequencing run are provided in Table 2-3. All reads passed quality control (base quality >30) using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). To eliminate plasmid DNA sequences in reads, the reads were trimmed using custom Perl scripts that removed all flanking regions matching 15 bp of the plasmid DNA on the 5’ and 3’ ends of the insert fragment. Reads shorter than 20 bp after trimming were discarded, and the remaining forward and reverse reads were re-matched as pairs.

Sequencing reads were mapped onto the reference genome of B. thetaiotaomicron using Bowtie2 (235). Default parameters were used for building the Bowtie2 index using the B. thetaiotaomicron chromosome and plasmid sequences, and for aligning reads to the reference sequence. SAM files generated from Bowtie2 alignment were converted to indexed and sorted BAM files using SAMtools (237).
In SAMtools, ‘mpileup’ with parameter ‘-B’ was used to obtain the depth of coverage of the reference genome. Across all samples, the mean of the mapped bases to the B. thetaiotaomicron genome was 1.17 × 10^9, with a minimum of 4.31 × 10^8 and maximum of 2.49 × 10^9 bases per sample.

Mouse                       Timepoint (day)   Paired raw reads   Paired trimmed reads
1                           1.5               6863895            6859853
1                           1.75              7759215            7751437
1                           2.5               7365279            7362905
1                           3                 5818449            5818061
1                           4                 5531304            5530269
1                           7                 5662825            5662593
1                           10                6698408            6697569
1                           14                4723483            4723238
1                           21                6687984            6687697
1                           28                5839481            5839205
2                           0.5               7228693            7225763
2                           1.5               8918192            8916633
2                           1.75              5888569            5887816
2                           2.5               8785555            8784833
2                           3                 4342612            4342118
2                           4                 6837676            6835603
2                           7                 3967877            3967384
2                           10                8601605            8601131
2                           14                5849784            5849571
2                           21                4556363            4556235
2                           28                7655382            7655090
3                           0.5               5010698            5005943
3                           1.5               6155822            6155291
3                           1.75              12620630           12619054
3                           2.5               4684094            4683664
3                           3                 6101414            6100076
3                           4                 5942854            5942213
3                           7                 4172374            4171934
3                           10                4801895            4801689
3                           14                2164395            2164338
3                           21                4812865            4812691
3                           28                7188952            7188598
4                           0.5               4533216            4526163
4                           1.5               4334677            4331837
4                           1.75              7468959            7466817
4                           2.5               4985966            4984355
4                           3                 5103796            5102782
4                           4                 9074708            9073032
4                           7                 6749347            6748350
4                           10                5542079            5541171
4                           14                4411316            4410997
4                           21                7574141            7573540
4                           28                6262455            6262057
5                           0.5               7060331            7051476
5                           1.5               5588814            5583481
5                           1.75              4776273            4771776
5                           2.5               6594126            6590706
5                           3                 5617337            5616357
5                           4                 6090548            6089696
5                           7                 8025711            8024678
5                           10                4246859            4246609
5                           14                5401171            5400825
5                           21                4771095            4770757
5                           28                3923280            3922897
input library replicate 1                     7305414            7303389
input library replicate 2                     3852825            3849575

Table 2-3 Summary of sequencing metrics for in vivo experiments. Of the 56 samples sequenced, two were the input library, 44 were from four mice with 11 time-point stool collections, and 10 were from one mouse with a 10 time-point collection. Paired-end reads of 101 nt length were generated on the HiSeq instrument.
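
The read-trimming step of Section 2.3.9 was performed with custom Perl scripts that are not reproduced in this chapter. The following Python sketch only illustrates the general idea under simplifying assumptions: the flank sequences are hypothetical placeholders (not the actual GMV1c backbone sequence), only exact matches in one orientation are considered, and the function name is invented for illustration.

```python
MIN_LEN = 20    # reads shorter than this after trimming are discarded (from the text)
FLANK_LEN = 15  # length of the plasmid flank used for matching (from the text)

# Hypothetical 15-bp backbone sequences bordering the insert site.
UPSTREAM_FLANK = "ACGTACGTACGTACG"
DOWNSTREAM_FLANK = "TTTTTGGGGGCCCCC"


def trim_read(read, upstream=UPSTREAM_FLANK, downstream=DOWNSTREAM_FLANK):
    """Strip plasmid-derived sequence from a read.

    Everything up to and including the upstream flank is removed, as is
    everything from the downstream flank onward. Returns the trimmed read,
    or None if fewer than MIN_LEN bases remain.
    """
    start = read.rfind(upstream)
    if start != -1:
        read = read[start + len(upstream):]
    end = read.find(downstream)
    if end != -1:
        read = read[:end]
    return read if len(read) >= MIN_LEN else None


if __name__ == "__main__":
    insert = "GATTACAGATTACAGATTACAGATTACA"
    print(trim_read("CC" + UPSTREAM_FLANK + insert))      # insert only
    print(trim_read(insert[:10] + DOWNSTREAM_FLANK))      # None (too short)
```

A production version would also need to handle reverse-complemented flanks and sequencing errors within the flank, and would have to re-pair forward and reverse reads after filtering.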

2.3.10 Statistical analyses of in vivo selection data

Analyses were performed using custom functions written in Matlab (MathWorks, Natick, MA). The effective positional diversity (EPD), a genome-wide measure of the diversity of library representation, was calculated as the exponentiated Shannon entropy over all positions in the reference sequence:

$$\mathrm{EPD}(t) = \exp\left(-\sum_{i=1}^{P} r_{ti} \ln r_{ti}\right)$$

Here, r_ti represents the fraction of reads at time t mapping to nucleotide i in a reference sequence totaling P nucleotides (e.g., the Bt genome).

The time-averaged relative abundance (TA-RA), a gene-level measure of library selection, was calculated using the formula: Here, t1 and t2 denote the bounds of the time-interval of interest, and fg represents a continuous-time function for gene g. The function fg was estimated as follows. We fit a cubic smoothing spline, using the Matlab function csaps, applied to the log fold change in Fragments Per Kilobase per Million mapped reads (FPKM) for gene g at each time-point t (i.e., the FPKM value at time-point t divided by the FPKM value for the gene in the starting library). FPKM values were generated using Cufflinks (236) with parameter --max-bundle-frags 40000000. The smoothing spline was used to account for non-uniform temporal sampling and noise in the data.

The time-averaged normalized effective coverage (TA-NEC), a gene-level measure of coverage, was calculated using the formula: Here, lg denotes the length of gene g, and hg represents a continuous-time function for gene g. The function hg was estimated as follows. We fit a cubic smoothing spline, using the Matlab function csaps, applied to the effective coverage, EC(g,t) for the gene at each time-point: Here, sg denotes the start of the gene.

To detect genes with significantly higher than expected selection, we performed a one-sided t-test on Box-Cox transformed TA-RA and TA-NEC values, and corrected for multiple hypothesis testing using the Matlab function mafdr.
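
As an illustration of the effective positional diversity, the short Python sketch below computes the exponentiated Shannon entropy from per-position read counts. The study's analyses were written in Matlab; this function, its name, and its input format (a list of per-nucleotide read counts) are assumptions made for illustration.

```python
import math


def effective_positional_diversity(read_counts):
    """Exponentiated Shannon entropy of per-position read fractions.

    read_counts: non-negative read counts, one per nucleotide position of
    the reference. Positions with zero reads contribute nothing, following
    the usual convention that 0 * ln(0) = 0.
    """
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("no mapped reads")
    entropy = 0.0
    for count in read_counts:
        if count > 0:
            r = count / total
            entropy -= r * math.log(r)
    return math.exp(entropy)


if __name__ == "__main__":
    # Perfectly even coverage of a toy 1000-nt reference gives EPD of about 1000.
    print(effective_positional_diversity([5] * 1000))
    # All reads on a single position give EPD = 1.
    print(effective_positional_diversity([0, 0, 42, 0]))
```

The two toy cases bracket the behavior described in the text: even coverage yields a value near the reference length, while selection for a few loci drives the value down toward the number of positions that remain represented.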
To estimate the relevant null hypotheses for the t-tests, while taking into account possible biases due to differential representation of genes in the input library, we used a robust regression algorithm (Matlab function robustfit) in which the input library value served as the independent variable, and the TA-RA or TA-NEC value served as the dependent variable.

2.3.11 Whole genome sequencing of isolated clones from in vivo selection

Whole genome sequencing of E. coli recipient isolates from seven Day 7 clones, six Day 28 clones, and two Day 28 luciferase control clones was performed on the MiSeq (Illumina) instrument after Nextera (Illumina) sequencing library preparation at the Molecular Biology Core Facilities of the Dana-Farber Cancer Institute. Metrics for this sequencing run are provided in Table 2-4. The raw data was processed with Millstone (http://churchlab.github.io/millstone), which combines BWA alignment, GATK for BAM realignment and cleanup, and SnpEff for variant effect prediction. Reads were aligned to E. coli K-12 DH10B as well as MG1655 to identify any variants not in common with the starting library strain NEB Turbo. The average genome coverage of each sequenced strain ranged from 20 to 140X. Alignments were also performed against the F plasmid (which is present in the starting recipient strain) and a library plasmid with the expected insert (as confirmed by Sanger-sequencing of individual clones).

Sample                                Paired raw reads
NEB Turbo control                     357506
Day 7 Mouse 1 clone 3                 376194
Day 7 Mouse 1 clone 1                 785257
Day 7 Mouse 2 clone 5                 1291948
Day 7 Mouse 3 clone 1                 1257935
Day 7 Mouse 4 clone 4                 1133157
Day 7 Mouse 5 clone 2                 1049238
Day 7 Mouse 5 clone 4                 170610
Day 28 Mouse 1 clone 1                1112924
Day 28 Mouse 2 clone 1                574731
Day 28 Mouse 3 clone 1                543707
Day 28 Mouse 4 clone 1                618713
Day 28 Mouse 5 clone 1                169684
Day 28 Mouse 5 clone 4                687173
Day 28 Mouse 7 clone 1 lux control    1194434
Day 28 Mouse 10 clone 2 lux control   931238

Table 2-4 Summary of metrics for whole genome sequencing of E. coli strains. Paired-end reads of 300 nt length were generated on the MiSeq instrument.

2.3.12 Growth assays

Cells were pre-conditioned by growth in minimal media (M9) supplemented with 0.2% glucose. Then, 1 μL of the culture was inoculated into a final volume of 200 μL of M9 supplemented with 0.2% (unless otherwise noted) of a sole carbon source, such as glucose, lactose, galactose, or sucrose. When needed, MacConkey base agar with a final concentration of 1% lactose or galactose was also used to characterize lactose or galactose utilization.

2.4 Results

2.4.1 Library construction and characterization

A 2.2 kb E. coli expression vector, GMV1c, was constructed to include the strong constitutive promoter pL and a ribosomal binding site upstream of the cloning site for input DNA fragments (Figure 2-1). We cloned in 2-5 kb fragments of donor genomic DNA from Bt, and generated a library of ~100,000 members, corresponding to >50X coverage of the donor genome. We sequenced the library on the Illumina HiSeq instrument to confirm sufficient coverage of the Bt genome (Figure 2-3 and Figure 2-4A). The distribution of member insert sizes in the input library was verified to be centered around 2-3 kb (Figure 2-4B), a size range allowing for the full-length representation of almost all Bt genes.

Figure 2-3 Technical reproducibility of library sequencing protocol.
The input library was prepared in duplicate for deep sequencing using our double digestion and PCR strategy. The coverage for each gene in replicate 1 is plotted against that of replicate 2.

Figure 2-4 Input library characterization. (A) Even coverage of the Bacteroides thetaiotaomicron genome. The blue and purple lines represent per-base coverage values for the chromosome and native B. thetaiotaomicron p5482 plasmid, respectively. The histogram (top right) shows the distribution of genes by their coverage (normalized to gene length). (B) Insert size distribution of library. (C) Plasmid retention calculated by comparing number of colonies on LB vs. LB+carbenicillin plates from in vitro passaging experiments in aerobic LB or anaerobic mouse chow (MC) filtrate.

2.4.2 In vitro stability and selection by media condition

To determine vector stability in vitro, we performed serial batch passaging of cells carrying GMV1c every one to two days over two weeks in two media conditions: aerobic Luria broth (LB) and anaerobic mouse-chow filtrate (MC). We expected the MC medium and anaerobic conditions to better reflect aspects of the nutritional content and oxygenation status in the mouse gut than the rich LB medium in aerobic conditions. In both conditions, the vector was maintained in over 80% of library members without antibiotic selection throughout two weeks of in vitro passaging (~70 generations) (Figure 2-4C), suggesting general stability of the medium copy vector (~40 copies per cell). Clones harboring the empty vector (i.e., plasmid with no Bt insert) were the most fit library members: in both LB and MC conditions, these clones initially constituted 70% of the library and increased to 90% by the end of two weeks (Figure 2-5), albeit at a slower rate in anaerobic MC (Figure 2-6).

Figure 2-5 Insert distribution over time in in vitro selection. Distribution of inserts in the initial library and at various time points in the (A) in vitro and (B) in vivo experiments.
Multiple colonies were picked from each mouse and the total insert sizes were tabulated for each time point.

Figure 2-6 Insert distribution over time in in vivo selection. Distribution of inserts in the initial library and at various time points in the (A) in vitro and (B) in vivo experiments. Multiple colonies were picked from each mouse and the total insert sizes were tabulated for each time point.

To identify Bt genes with differential in vitro selection in LB and MC conditions relative to the input library, we isolated DNA from Day 0 and Day 6 or 7 cultures, amplified the inserts by PCR for deep sequencing on the Illumina MiSeq platform and used computational methods to determine donor genes that were differentially enriched or depleted. In each condition, we found a number of significantly enriched Bt genes (Table 2-5). At Day 7 in aerobic LB, enriched genes included metabolic enzymes, such as chitobiase (BT_0865), which degrades chitin, and stress response proteins, such as glycine betaine/L-proline transport system permease (BT_1750), which is involved in the import of osmoprotectants glycine betaine or proline that mitigate effects of high osmolarity (238). At Day 6 in anaerobic MC, a different set of genes was significantly enriched, particularly the locus consisting of endo-1,4-beta-xylanase (BT_0369), galactokinase (BT_0370), glucose/galactose transporter (BT_0371), and aldose 1-epimerase (BT_0372). These results highlight that our functional metagenomics approach is able to enrich for likely bioactive donor genes that improve fitness of the recipient cells in in vitro passaging conditions. Enolase (BT_4572), the only common hit among annotated genes in both media conditions, was found to be depleted relative to the input library. This enzyme catalyzes the penultimate step of glycolysis, and its overexpression may be toxic in E. coli (239).

Gene      Gene product                                          log2(fold change)   q value

Enrichment at Day 6 in anaerobic MC passaging
BT_0370   galactokinase                                         3.51                8.30E-05
BT_0372   aldose 1-epimerase                                    3.21                8.30E-05
BT_0371   glucose/galactose transporter                         3.59                1.53E-04
BT_0478   hypothetical protein                                  3.59                3.70E-04
BT_0369   endo-1,4-beta-xylanase D                              2.51                2.63E-03

Enrichment at Day 7 in aerobic LB passaging
BT_1750   glycine betaine/L-proline transport system permease   9.18                0.00E+00
BT_2055   biopolymer transport protein                          3.64                3.03E-05
BT_4358   hypothetical protein                                  2.84                6.31E-03
BT_1922   N-acetylmuramoyl-L-alanine amidase                    2.62                7.24E-03
BT_0659   hypothetical protein                                  2.59                7.24E-03
BT_4333   hypothetical protein                                  2.57                7.84E-03
BT_2054   hypothetical protein                                  2.90                8.97E-03
BT_0757   beta-galactosidase                                    3.00                1.44E-02
BT_0660   hypothetical protein                                  2.36                1.48E-02
BT_2732   hypothetical protein                                  2.48                1.48E-02
BT_2843   integrase                                             3.34                1.51E-02
BT_3927   hypothetical protein                                  2.43                1.56E-02
BT_3612   FKBP-type peptidylprolyl isomerase                    2.23                2.36E-02
BT_3821   5,10-methylenetetrahydrofolate reductase              2.31                2.36E-02
BT_0865   chitobiase                                            2.39                2.36E-02
BT_0973   hypothetical protein                                  2.21                2.36E-02
BT_0676   N-acetylglucosamine-6-phosphate deacetylase           2.17                2.59E-02
BT_2408   LuxR family transcriptional regulator                 2.22                2.59E-02
BT_1038   hypothetical protein                                  2.24                2.59E-02
BT_3985   hypothetical protein                                  2.16                2.59E-02
BT_2917   hypothetical protein                                  2.16                2.80E-02
BT_0972   oxidoreductase                                        2.13                3.17E-02
BT_1923   O-acetylhomoserine (thiol)-lyase                      2.43                3.35E-02
BT_1006   nitroreductase                                        2.10                3.64E-02
BT_1004   hypothetical protein                                  2.11                3.94E-02
BT_4544   transposase                                           1.99                4.64E-02
BT_2379   hypothetical protein                                  5.81                4.64E-02
BT_0974   hypothetical protein                                  2.08                4.64E-02
BT_0011   hypothetical protein                                  2.00                4.64E-02
BT_0510   heme biosynthesis protein                             2.22                4.64E-02

Depletion at Day 6 in anaerobic MC passaging
BT_1771   cell surface protein                                  -3.25               3.70E-04
BT_4572   phosphopyruvate hydratase (enolase)                   -3.48               1.32E-03
BT_2959   hypothetical protein                                  -2.97               5.18E-03
BT_3089   hypothetical protein                                  -2.73               7.92E-03
BT_3528   hypothetical protein                                  -2.95               1.69E-02
BT_2051   hypothetical protein                                  -4.21               1.69E-02
BT_3577   hypothetical protein                                  -2.16               2.01E-02

Depletion at Day 7 in aerobic LB passaging
BT_4572   phosphopyruvate hydratase (enolase)                   -3.17               2.78E-03
BT_1538   hemagglutinin                                         -3.46               7.24E-03
BT_3395   acetylglutamate kinase                                -2.42               1.48E-02
BT_1817   RNA polymerase ECF-type sigma factor                  -2.19               2.36E-02
BT_1818   hypothetical protein                                  -2.54               2.36E-02
BT_2961   hypothetical protein                                  -2.32               2.59E-02
BT_4571   RNA polymerase ECF-type sigma factor                  -3.91               3.31E-02
BT_2959   hypothetical protein                                  -2.18               3.45E-02

Table 2-5 Bt genes significantly enriched or depleted at Day 6 or 7 in vitro. Statistically significant genes (q < 0.05) enriched (blue) or depleted (red) at Day 6 or 7 relative to Day 0 in the in vitro passaging experiment are listed for the anaerobic mouse chow (MC) and aerobic Luria broth (LB) conditions.

2.4.3 In vivo library selection in germ-free mice

To investigate in vivo gene selection in our library, we inoculated two cohorts of C57BL/6 male 6-8 week old germ-free mice (n=5 per group) and maintained the mice for 28 days under gnotobiotic conditions. One cohort was colonized with our library; the other cohort with a control GMV1c vector carrying the 5.9 kb luciferase operon (luxCDABE from Photorhabdus luminescens, Winson et al, 1998). Fecal pellets were collected on days 0.5, 0.75, 1.5, 1.75, 2.5, 3, 4, 7, 10, 14, 21, 25, and 28 after inoculation. To determine in vivo vector stability, we plated fecal pellets on LB, on which E. coli either with or without vectors would grow, and on LB+carbenicillin, selective for E. coli harboring our vectors. Strains carrying the luciferase vector dropped by ~100,000-fold by Day 28 compared to the earliest plated time-point (18 hours), presumably due to negative selective pressures from the energy consumption of the vector-borne luciferase in E. coli (Figure 2-7A).
In contrast, our library was well-maintained in vivo throughout the 28 days of the experiment, suggesting at most a minimal fitness cost to maintain the Bt insert library. Furthermore, unlike in the in vitro experiment, where clones containing the empty vector were enriched over time, these clones were virtually absent by the end of the in vivo experiments (Figure 2-6), suggesting positive selection had taken place.

Figure 2-7 In vivo selection experiments. (A) Plasmid retention calculated by comparing number of colonies on LB vs. LB+carbenicillin plates from mouse fecal samples. n = 5 mice; error bars = standard deviation. (B) Effective positional coverage across the entire Bt genome for each mouse begins with essentially even coverage of the Bt genome of ~6 Mb, but drops rapidly over the experimental time-course, representative of selection at specific loci.

2.4.4 Characterization of in vivo library population dynamics

To characterize the entire in vivo selected library over time, we extracted DNA from collected stool samples, PCR amplified the donor inserts, prepared sequencing libraries of the amplicons, sequenced libraries on the Illumina HiSeq 2500 instrument, and used computational techniques to detect selected genes in the donor genome that were uniformly covered over time by more than the expected background number of sequencing reads. Each sample resulted in ~7 million 101 nt paired-end reads (Table 2-3) that were mapped back to the donor genome (Figure 2-2). We also employed Sanger-sequencing of vectors from clones directly isolated from stool samples to confirm deep-sequencing results and obtain insights into the structure of full-length inserts.
To obtain a genome-wide view of library selection over time and across the different mice, we calculated an information theoretic measure, termed effective positional diversity, similar to that commonly used to quantify population diversity in macroscopic and microscopic ecology studies (241, 242) (Figure 2-7B). This measure, equal to the exponentiated Shannon entropy over all positions in the Bt genome, reflects how many positions in the donor genome are evenly represented in the population. Effective positional diversity values of the initial library were ~6 Mb, indicating essentially even coverage of the entire Bt genome. From Day 1.75 to Day 7 and continuing until the end of the experiment at Day 28, there was a rapid decline in effective positional diversity, which signifies expansion in the population of clones harboring inserts at a limited number of Bt genomic loci. To explore the kinetics of gene selection in vivo, we plotted the percentage of sequencing reads mapped to genes in the Bt genome over time, and examined genes constituting >0.2% of total reads. As noted, prior to inoculation, the read coverage was even over the entire Bt genome and corresponded to <0.2% per gene. A visualization of Bt gene selection for each mouse is shown in Figure 2-8. By 36 hours post-inoculation, five genes, alpha-L-arabinofuranosidase, endo-1,4-beta-xylanase, galactokinase, glucose/galactose transporter, and aldose 1-epimerase (BT_0368 to BT_0372), comprised over half of the reads mapped. At Day 2.5, glucose/galactose transporter (BT_1758) and glycoside hydrolase (BT_1759) became noticeable and continued to increase until they saturated all reads at Day 14. Then, fructokinase (BT_1757) emerged and stabilized at around 6% of the reads throughout the remaining two weeks of the experiment. These observations are generally consistent across all five mice, though the selection kinetics varied slightly (Figure 2-8). 
For example, the transition from galactokinase and glucose/galactose transporter (BT_0370 and BT_0371) to glycoside hydrolase (BT_1759) occurred four days earlier in Mouse 5 than in Mouse 2, and the emergence of fructokinase (BT_1757) was detectable only in Mice 2, 4, and 5.

Figure 2-8 Distribution of mapped bases to each Bt gene by mouse. For each mouse and time point, ~10^9 sequenced bases were mapped to the B. thetaiotaomicron genome. Of those mapped bases, the percentage mapping to each gene is shown. Genes with < 0.2% are grouped together (dark gray bars). Specific genes >= 0.2% that were present in clones in one mouse but not in the others are indicated in smaller font and colored differently.

In terms of functional groups rather than individual genes, of the 51.4% of Bt genes with COG annotations, those related to carbohydrate transport and metabolism comprised 10% of the input library. Averaged across the five mice, these carbohydrate transport and metabolism genes increased to 25% of reads on Day 0.5, 72% on Day 1.5, and essentially 100% by Day 7 (Figure 2-9), suggesting the importance of carbohydrate transport and metabolism in in vivo fitness.

Figure 2-9 COG functional categories of bases mapped to the entire Bt genome averaged across the five mice.

To rigorously determine the Bt genes that are differentially represented in the population over time and to localize putatively selected regions to specific genes, we applied information theoretic and statistical techniques for longitudinal analysis (243). In our analyses, transient dominance of clones in vivo is of particular interest as different genes may confer fitness advantages at distinct stages of host colonization. Further, our experiments capture competition among ~100,000 strains harboring distinct genetic fragments, rather than traditional binary competition experiments.
Thus, we are interested in not only clones harboring Bt fragments that show an increase over time in relative abundance, but also those clones that show a significantly slower rate of depletion than other clones. To methodically detect these effects, for every Bt gene, we computed two measures: (1) time-averaged relative abundance (TA-RA), and (2) time-averaged normalized effective coverage (TA-NEC). The TA-RA value is conceptually similar to a time-integrated pharmacological dose value (244); in our analysis, it represents the average “dose” of a particular donor gene, relative to all other donor genes present in vivo over a period of time. The TA-NEC value quantifies the fraction of the gene that is effectively covered by reads over a period of time. These measures are important to evaluate in tandem, since bystander genetic loci may be differentially abundant in clones (i.e., high TA-RA values) simply because they are contiguous with genes under selection; however, these loci are likely to be detectable as spurious (i.e., low TA-NEC values) because they will often include only fragments of genes.

2.4.5 Genes showing transient selection during early gut colonization

We found 13 Bt genes during the early stage of gut colonization (up to Day 4) with significantly larger than expected TA-RA and TA-NEC values (q-values < 0.05; Table 2-6). These genes include those coding for enzymes involved in synthesis of extracellular capsular polysaccharides and lipopolysaccharides (LPS), specifically D-glycero-alpha-D-manno-heptose-1,7-bisphosphate 7-phosphatase (gmhB) (BT_0477) and dTDP-4-dehydrorhamnose reductase (rfbD; rmlD) (BT_1730). There are two biosynthesis pathways of nucleotide-activated glycero-manno-heptose that result in either L-β-D-heptose or D-α-D-heptose, which serve as precursors or subunits in LPS, S-layer glycoproteins, and capsular polysaccharides (245). The E. coli GmhB is critical for complete synthesis of the LPS core (246).
The selection for Bt gmhB could allow E. coli to expand its extracellular glycoprotein display, since E. coli GmhB is highly selective for β-anomers while Bt GmhB prefers α-anomers during hydrolysis of D-glycero-D-manno-heptose 1β,7-bisphosphate (247). BT_1730 (rfbD; rmlD) functions in dTDP-rhamnose biosynthesis, which contributes to production of O-antigen, a repetitive glycan polymer in LPS, and potentially other cell-membrane components. Deletion of rmlD in Vibrio cholerae results in a severe defect in colonization of an infant mouse model (248), and uropathogenic E. coli lacking functional RmlD lose serum resistance (249). Thus, expressing Bt rmlD could allow the recipient E. coli to alter its antigenicity or resistance to host factors that would impede its initial colonization of the mammalian gut.

Gene      Annotation                                                              TA-RA q-value         TA-NEC q-value
BT_0297   outer membrane lipoprotein SilC                                         3.06E-02              4.08E-04
BT_0370   galactokinase                                                           1.14E-03 (3.25E-03)   1.14E-03 (3.14E-03)
BT_0371   glucose/galactose transporter                                           5.94E-06 (6.95E-09)   3.50E-02 (4.21E-05)
BT_0477   D-glycero-alpha-D-manno-heptose-1,7-bisphosphate 7-phosphatase (gmhB)   1.67E-02              1.32E-02
BT_0478   hypothetical protein                                                    1.77E-03              2.47E-02
BT_1510   hypothetical protein                                                    3.86E-02              1.10E-03
BT_1511   outer membrane protein OmpA                                             4.38E-02              7.33E-04
BT_1730   dTDP-4-dehydrorhamnose reductase (rfbD; rmlD)                           3.86E-02              3.45E-04
BT_1731   hypothetical protein                                                    4.38E-02              8.80E-03
BT_1757   fructokinase                                                            1.58E-02              2.50E-03
BT_1759   glycoside hydrolase                                                     1.19E-02 (2.48E-07)   2.58E-04 (1.21E-09)
BT_1771   cell surface protein                                                    4.38E-02              4.00E-03
BT_4265   GMP synthase (guaA)                                                     3.86E-02              3.33E-02

Table 2-6 Statistical testing of in vivo selection of Bt genes. Genes demonstrating significant in vivo selection profiles were determined via statistical testing of time-averaged relative abundance (TA-RA) and time-averaged normalized effective coverage (TA-NEC) values up to either Day 4 or Day 28 of host colonization. Genes showing significant selection up to Day 4 are in white. Genes showing significant selection up to both Day 4 and Day 28 are highlighted in orange. q-values for Day 28 are listed in parentheses.

Several other genes with membrane-associated functions also showed increased selection at Day 4, including outer membrane lipoprotein SilC (BT_0297), cell surface protein (BT_1771), and outer membrane protein OmpA (BT_1511). These genes could confer increased capabilities for E. coli to attach to the mucosal surface of the mammalian GI tract, or increased adaptations to the gut chemical environment. For instance, Bacteroides fragilis lacking OmpA are more sensitive to SDS, high salt, and oxygen exposure (250). In Bacteroides vulgatus, OmpA additionally plays a role in intestinal adherence (251), and in Klebsiella pneumoniae, it activates macrophages (252). Since nucleotide pools are tightly controlled in E. coli (253), the selection for Bt GMP synthase guaA (BT_4265) may substantially affect intracellular guanine concentration, translation regulation, and cell signaling. Inhibiting GMP synthase induces stationary phase genes in Bacillus subtilis (254), and nucleotide concentrations drop when E. coli transition from growth to stationary phase (255). These observations suggest that a copy of heterologous guaA could enable escape of native tight regulation of the guanine pool to prolong the cell’s exponential growth phase. Moreover, extra GMP synthase may further protect E. coli from incorporating mutagenic deaminated nucleobases that would interfere with RNA function and gene expression (256).

2.4.6 Genes showing long-term selection during gut colonization

We found three Bt genes over the entire period of colonization (up to Day 28) with significantly larger than expected TA-RA and TA-NEC values (q-values < 0.05; Table 2-6); these genes also showed significant selection during early colonization (up to Day 4).
All three genes are involved in sugar metabolism and transport, suggesting they may act to unlock more nutrient resources for E. coli in the gut. We performed in vitro experiments, described below, to further characterize the functions of these strongly selected loci, centered around a Bt glycoside hydrolase (BT_1759) and galactokinase (BT_0370).

2.4.6.1 Glycoside hydrolase (BT_1759)

From Day 1.5 to Day 3 in the high-throughput sequencing data, we observed sharply positive selection of glycoside hydrolase (BT_1759), which stabilized and continued to be strongly selected for from Day 4 to Day 28 across all mice (Figure 2-10). We confirmed these results with Sanger sequencing, which additionally allowed us to identify exact junctions and directionality of isolated inserts. In clones from Days 7, 14, and 28, we observed the primary selected insert to be 2.5 kb in length, beginning four nucleotides after the annotated glycoside hydrolase (BT_1759) start codon, and ending about one-third of the way into the downstream gene (glucose/galactose transporter). Notably, we also detected other inserts containing different 5’ truncated versions of the glycoside hydrolase in the late time points, both in our high-throughput and Sanger sequencing data (Figure 2-11).

Figure 2-10 BT_1759 glycoside hydrolase selection kinetics. Graphs show Fragments Per Kilobase Mapped (FPKM) fold change and normalized effective coverage of genes BT_1757, BT_1758, and BT_1759. m1-5 = Mouse 1-5.

Figure 2-11 BT_1759 glycoside hydrolase read mapping profile. The chart shows reads to each base in the region with deep sequencing and Sanger sequencing of isolated clones (below, length of inserts are to scale to the gene map). Read values are the mean across five mice and were normalized to 1 billion mapped bases per run to compare across time points. Sanger sequencing was performed on ten clones per mouse at Day 7 and eight clones per mouse at Day 28. nt = nucleotides.

Sonnenburg et al.
previously demonstrated that periplasmic BT_1759 in Bt hydrolyzes smaller fructooligosaccharides and sucrose (257). To functionally characterize BT_1759 and surrounding genes when heterologously expressed in E. coli, we cloned the CDS of each into the backbone vector and transformed it into the starting E. coli strain. None of the full-length genes conferred growth in M9 minimal media with sucrose as the sole carbon source (Figure 2-12). However, clones isolated from mice on Day 28 were able to metabolize sucrose. Furthermore, retransformation of the DNA vectors from these clones into the starting E. coli strain also conferred growth on sucrose, indicating that the phenotype was plasmid-borne. Interestingly, sucrose utilization was enabled when we reconstituted the 4 nt truncation found in many of the Day 7 and Day 28 Sanger-sequenced clones into the starting E. coli strain. These results suggest that the truncation allows for appropriate processing of the signal sequence to express and localize the Bt enzyme in the periplasmic space of E. coli, where sucrose is capable of entering by diffusion. Figure 2-12 BT_1759 glycoside hydrolase functional characterization in sucrose media. Three sets of strains were studied in minimal media with sucrose as the sole carbon source: 1) starting E. coli strains transformed with the CDS of each gene cloned into the backbone vector, 2) E. coli clones directly isolated from stool samples, and 3) starting E. coli strains re-transformed with individual plasmids isolated from stool samples. All clones isolated at Day 28 carried the BT_1759 locus. Lines represent the mean. 
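The read-mapping profile for this locus (Figure 2-11) reports per-base read values that were normalized to 1 billion mapped bases per run and then averaged across the five mice. The sketch below illustrates one way such a normalization could be computed; it is an illustration under stated assumptions, not the pipeline used in this work. It assumes each run provides a list of per-base depths for the region plus the total number of bases mapped to the Bt genome in that run, and the function names `normalize_depth` and `mean_profile` are hypothetical.

```python
def normalize_depth(depth, total_mapped_bases, scale=1_000_000_000):
    """Rescale per-base depths to a common total of `scale` mapped bases.

    `depth` is a list of read depths over a region; `total_mapped_bases`
    is the total number of mapped bases in the same run (assumed input).
    """
    if total_mapped_bases <= 0:
        raise ValueError("total_mapped_bases must be positive")
    factor = scale / total_mapped_bases
    return [d * factor for d in depth]


def mean_profile(profiles):
    """Position-wise mean across equally long per-mouse profiles."""
    if not profiles:
        raise ValueError("at least one profile is required")
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all profiles must cover the same positions")
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(length)]


# Illustrative numbers only: two runs sequenced to different depths.
mouse_a = normalize_depth([200, 400, 400], total_mapped_bases=2_000_000_000)
mouse_b = normalize_depth([150, 150, 300], total_mapped_bases=500_000_000)
print(mean_profile([mouse_a, mouse_b]))
```

Rescaling each run before averaging keeps a deeply sequenced sample from dominating the mean, which is what makes profiles from different time points comparable.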
2.4.6.2 Galactokinase (BT_0370), glucose/galactose transporter (BT_0371), and native galactokinase reversion

In contrast to the selection profile of glycoside hydrolase (BT_1759), the galactokinase (BT_0370) and glucose/galactose transporter (BT_0371) exhibited an earlier increase in abundance that peaked at Day 2.5 and gradually declined over the remainder of the experiment (Figure 2-13A). We observed a similar trend in Day 7 clones by Sanger sequencing, with no clones containing BT_0370 or BT_0371 present at Day 28 (Figure 2-13B). We confirmed that individually cloned BT_0370 and BT_0371 genes confer galactose utilization in the starting E. coli strain when grown using M9 minimal media supplemented with 0.5% galactose as the sole carbon source (Figure 2-13C). To our surprise, E. coli isolated from mouse stool at later time points were able to grow on galactose even though they carried plasmids with glycoside hydrolase (BT_1759), and not the Bt galactose utilization genes (BT_0370 and BT_0371). However, strains retransformed with BT_1759 were unable to grow on galactose, suggesting that the stool-isolated strains gained the capability to use galactose through mutations independent of the expression plasmid, namely in the recipient E. coli genome. After confirmation that our starting E. coli strain was galK- due to the presence of an insertion sequence (IS2), we hypothesized that stool isolates reverted to galK+ via loss of IS2. In stool-isolated clones from Days 7, 14, and 28, we found that the galK reversion occurred after Day 7 and was found in >75% of clones in four of five mice at Day 14 (Figure 2-13D). Interestingly, E. coli harboring the insert library exhibited accelerated galK reversion in the mouse gut; in the luciferase control mice, there was an overall reversion rate of only 50% by Day 28, as opposed to 100% in the mice that had been inoculated with the Bt library.
The genomic galK reversion by ~Day 14 suggests that there is early selection for Bt galactokinase (BT_0370), but this foreign gene is subsequently lost as the recipient E. coli regain native galactokinase activity, which seems to have a fitness advantage over the heterologously expressed Bt galactokinase gene.

Figure 2-13 BT_0370 galactokinase and BT_0371 glucose/galactose transporter. (A) Selection kinetics by Fragments Per Kilobase Mapped (FPKM) fold change and normalized effective coverage of genes BT_0368, BT_0369, BT_0370, BT_0371, and BT_0372. (B) Mapped reads to each base in the region with deep sequencing and Sanger sequencing of isolated clones (below). Read values are the mean across five mice and were normalized to 1 billion mapped bases per run to compare across time points. Isolation of individual clones allowed for insert size profiling at Day 7. Screened isolates from Day 28 did not reveal any galactokinase inserts. (C) Functional characterization in minimal media with galactose as the sole carbon source. Three sets of strains were studied: 1) starting E. coli strains transformed with the CDS of each gene cloned into the backbone vector, 2) E. coli clones directly isolated from stool samples, and 3) starting E. coli strains re-transformed with individual plasmids isolated from stool samples. All clones isolated at Day 28 carried the BT_1759 locus. Lines represent the mean. (D) Genotyping of the background E. coli genome at the galK locus. 20 clones isolated from mice inoculated with the library at each of the indicated time-points were screened, while 30 clones isolated from the lux control mice at Day 28 were screened.

2.4.7 In vivo genomic stability of E. coli recipient strain

Given the observed genomic galK reversion, we investigated whether other changes occurred in the E. coli genome over the course of our in vivo experiments.
Genomic stability of bacterial cells in the gastrointestinal tract in vivo has not been characterized in great detail and microbial mutation rates in vivo are also not well-characterized. We performed whole genome sequencing of 13 E. coli isolates from stool of mice inoculated with the library and two E. coli isolates from stool of mice inoculated with the control luciferase construct. Of the isolates from mice that were inoculated with the library, seven were from Day 7 samples containing either BT_1759 or BT_0370 inserts and six were from Day 28 samples containing BT_1759 inserts. In addition to searching for variants in the E. coli genome, we also looked for variants on the library plasmid with the known insert locus, and the F plasmid, which was present in the starting E. coli strain. Overall, we found single nucleotide variants (SNVs) in only three of the 15 isolates (Table 2-7). One of the isolates from a luciferase control mouse harbored three mutations. One SNV was in the coding sequence of adenylate cyclase cyaA, while the other two SNVs were in intergenic regions, between tRNAs lysW and valZ, and between traJ and traY on the F plasmid. The functional effects of these SNVs, if any, are unclear. The operon structure of the tRNA region may be lysT-valT-lysW-valZ-lysYZQ (258), or valZ-lysY could be a separate operon as predicted in EcoCyc (259), in which case the SNV could affect transcription of the downstream tRNAs. As for the traY promoter variant, the -35 hexamer has been documented to be TTTACC (260). The SNV T>C changes it to CTTACC, which could weaken the promoter to decrease expression of TraY, a DNA-binding protein involved in initiation of DNA transfer during conjugation. One E. coli isolate from a library-inoculated mouse had a genomic change that conferred increased growth on galactose. 
Isolate 1 from Mouse 1 on Day 28 had a mutation in the lactose/melibiose:H+ symporter, lacY (F27S), which is a missense mutation in the first transmembrane region (261). We did not observe phenotypic differences on MacConkey-lactose plates, since the E. coli recipient strain has a deletion in lacZ, and thus all of our isolates were Lac-. However, the lacY (F27S) mutant reached a higher density in M9 galactose compared to other Day 28 isolates, which also carried the same plasmid-borne Bt glycoside hydrolase (Figure 2-14A). This clone also grew to a greater density than E. coli recipient strains in which we had cloned the Bt galactokinase operon (BT_0370-BT_0372) (Figure 2-14B). The lacY transporter can transport galactose in addition to lactose, and lacY mutants have been shown previously to confer faster growth of E. coli MG1655 on galactose (262).

Sample                                Insert locus & size (kb)   Genomic galK+/-   Variant position on E. coli genome   Variant impact & coverage
NEB Turbo control                     -                          galK-
Day 7 Mouse 1 clone 3                 BT_0370 (4.0)              galK-             SNV 2976657 G>T                      galR (R20L) [34/34 reads]
Day 7 Mouse 1 clone 1                 BT_1759 (2.5)              galK-
Day 7 Mouse 2 clone 5                 BT_0370 (4.3)              galK-
Day 7 Mouse 3 clone 1                 BT_1759 (3.1)              galK-
Day 7 Mouse 4 clone 4                 BT_0370 (4.1)              galK-
Day 7 Mouse 5 clone 2                 BT_1759 (2.5)              galK+
Day 7 Mouse 5 clone 4                 BT_1759 (3.1)              galK-
Day 28 Mouse 1 clone 1                BT_1759 (2.5)              galK+             SNV 363100 A>G                       lacY (F27S) [173/173 reads]
Day 28 Mouse 2 clone 1                BT_1759 (2.5)              galK+
Day 28 Mouse 3 clone 1                BT_1759 (2.5)              galK+
Day 28 Mouse 4 clone 1                BT_1759 (2.5)              galK+
Day 28 Mouse 5 clone 1                BT_1759 (2.5)              galK+
Day 28 Mouse 5 clone 4                BT_1759 (4.2)              galK+
Day 28 Mouse 7 clone 1 lux control    -                          galK+
Day 28 Mouse 10 clone 2 lux control   -                          galK-             1. SNV 3991675 G>A                   1. cyaA (G175S) [122/122 reads]
                                                                                   2. SNV 780994 G>A                    2. intergenic, between lysW and valZ [108/108 reads]
                                                                                   3. F plasmid SNV 67772               3. traY promoter (-35) T>C [229/229 reads]

Table 2-7 Genetic variants in mouse-isolated clones identified by whole genome sequencing.

Remarkably, we found an interaction between the E. coli genome and a heterologously expressed Bt gene. Isolate 3 from Mouse 1 on Day 7 had an SNV in the galactose repressor, galR (R20L), in its DNA binding domain (263). E. coli GalR binds operator sequences upstream of the galETK operon (264), and the substitution of leucine for arginine could disrupt binding. Using MacConkey-galactose plates, we found that the galR (R20L) isolate was Gal+, whereas a similar Day 7 clone, which also had a genomic galK- genotype and a Bt galactokinase (BT_0370) insert but no galR SNV, exhibited a Gal- phenotype. Since the BT_0370 inserts in the Day 7 clones were not identical (Figure 2-14B), we re-transformed the plasmids into the starting E. coli strain to confirm the phenotype and rule out effects from an underlying chromosomal galR mutation. In M9 galactose medium, the galR (R20L) mutant grew to a higher cell density than a wild-type galR strain with the same Bt galactokinase plasmid (Figure 2-14C). These findings indicate that the E. coli genome had co-evolved with the in vivo selection of plasmids carrying Bt genes for galactose utilization.

We found no mutations in the library plasmids or Bt genes, and, aside from loss of the IS2 element in the galK gene, all other IS elements were intact on the E. coli genome. In aggregate, these small numbers of variants (~10^-8 mutations per bp per day) in the E. coli recipient strain suggest that outside of genetic loci with selective pressures exerted upon them, the organism remained genetically stable in the mammalian gut over the course of our experiment.

Figure 2-14 Growth characterization of clones with genomic SNVs.
Growth curves over 42 hours at 37 °C in M9 with 0.2% galactose and carbenicillin of (A) mouse-isolated clones from Day 28 and (B) BT_0369, BT_0370, BT_0371, BT_0372, and BT_0370-BT_0372 cloned into the starting recipient E. coli strain. The mean of four replicates is plotted in filled circles; error bars represent the standard deviation. (C) Endpoint optical density after 96 hours of growth. Two mouse-isolated strains with the BT_0370 insert were compared to isogenic strains transformed with those plasmids (4.0 or 4.3 kb insert). The strain with the galR SNV is shown in red. Lines represent the mean.

2.5 Discussion

We have demonstrated the use of TFUMseq for high-throughput in vivo screening of genetic fragments from an entire donor genome from a commensal microbe to increase the fitness of a phylogenetically distant bacterial species in the mammalian gut. To our knowledge, this is the first demonstration of temporal functional metagenomics using shotgun libraries applied to the in vivo mammalian gut environment. Our findings attest to the value of a time-series approach, as the shifts in population dynamics of clones harboring different gene fragments would not have been discovered if we had only obtained endpoint data. Further, we introduced computational methods using information theoretic measures and statistical longitudinal analysis techniques that allowed us to identify and localize significant selection of donor genes over time. In this demonstration of the TFUMseq approach using an E. coli plasmid library of Bt genes, we uncovered sequential selection of clones with different carbohydrate utilization genes, first for galactose and then for sucrose metabolism. Galactose plays a substantial role in selection in our experiment, as all three of the observed E. coli genomic mutations (in galK, lacY, and galR) affected galactose utilization, and we observed selection for Bt galactokinase (BT_0370) and glucose/galactose transporter (BT_0371) in vivo.
Galactose is a component of the hemicellulose that makes up part of the 15.2% neutral detergent fiber in mouse chow, although galactose composition was not explicitly provided by the manufacturer. Galactose is also a component of mammalian mucin in the GI tract (265). However, our observation that in vitro selection occurs for the BT_0370 and BT_0371 galactose utilization locus in MC medium indicates that the mouse chow diet itself is providing sufficient galactose to exert selective pressure at least in part. During our in vivo colonization experiments, once E. coli restored native galactokinase (galK) activity in its genome through loss of IS2, Bt genes that catabolized a second carbon source, sucrose, became dominant. Sucrose is a dominant simple carbohydrate in mouse chow, present at 0.71% (w/w) in comparison to 0.22% for glucose and fructose. Per Freter’s nutrient-niche hypothesis, which described substrate-level competition and substrate-limited population levels (266), our results suggest that galactose is preferred over sucrose, and that a clone capable of utilizing both carbon sources will outcompete clones capable of only using one of the sources. Nutrient-based niches have been documented in the mammalian GI tract, including the varying sugar preferences among commensal and pathogenic E. coli strains (267), and polysaccharide utilization loci (PULs) in Bacteroides species that promote long-term colonization (268). In fact, the enterohemorrhagic E. coli strain EDL933 can use sucrose, while commensal E. coli strains K-12 MG1655, HS, and Nissle 1917 cannot (267). Incorporating sucrose utilization, such as through the truncated Bt glycoside hydrolase (BT_1759) identified in this study, could enhance retention of probiotic E. coli strains. Pre-colonization with sucrose-utilizing probiotic strains to occupy the sucrose niche could also be an effective strategy to resist pathogen colonization.
Bt has been investigated previously using transposon mutagenesis systems coupled to mouse gut colonization experiments (104), facilitating comparison of our results to the prior study. Goodman et al. found no difference in abundances of galactokinase (BT_0370) mutants in vitro but BT_0370 mutants were underrepresented in vivo. In contrast, in our study, the Bt galactokinase was selected for not only in vivo, but also in vitro. Furthermore, Goodman et al. found dTDP-4-dehydrorhamnose reductase (BT_1730) and GMP synthase (BT_4265) mutants were underrepresented both in vitro and in vivo. However, in our study, BT_1730 and BT_4265 seemed to confer fitness only in vivo. The in vitro discrepancies may be a result of slightly different culturing and media conditions. The in vivo results are in agreement for BT_0370, BT_1730, and BT_4265, though the other genes we identified in our experiments were not significantly altered in representation in the transposon mutagenesis experiments, highlighting the different capabilities of the two approaches. Overall, we expect TFUMseq to be a powerful tool for engineering commensal microbes with new or enhanced capabilities, as it provides a general approach to functionally identifying genes from metagenomic DNA that enhance microbial fitness in vivo. Going forward, there are two primary considerations for designing future TFUMseq experiments: the choice of the bacterial strain to receive the donor plasmid library, and the mammalian host environment. In this study, we used a cloning strain of E. coli as the recipient bacteria, which enabled the generation of a robust, high-quality library. This strain has inactivated restriction systems, thus preventing underrepresentation of DNA inserts in the library that may contain otherwise recognized methylated sites from the donor source.
Further, the lack of prior host-adaptation of this laboratory strain in vivo, in comparison to a wild-type adapted commensal strain, allows for stronger selection signals from clones harboring functional donor genes. As we saw, the recipient strain also plays a role in the co-evolution of the insert library and the bacterial genome. We observed a genomic change, specifically the galK reversion, driving the shift in library selection from Bt galactokinase (BT_0370) to Bt glycoside hydrolase (BT_1759). Furthermore, we found single nucleotide variations in E. coli galR and lacY loci that boosted galactose utilization in clones harboring functional Bt genes. Given that co-evolution drives genomic changes in the recipient strain, using a well-characterized recipient strain facilitates mechanistic interpretation of these changes. The state of the mammalian host is also a critical variable in our approach. In this work, germ-free mice were mono-associated with the library. We expect that the results of in vivo selection may differ when mice are pre-colonized with a microbiota due to changes in nutrient availability and other ecological interactions, including competition or syntrophy. For instance, co-colonization experiments demonstrated that probiotic strains and commensal bacteria have adaptive substrate utilization. Bt shifts its metabolism from mucosal glycans to dietary plant polysaccharides when in the presence of Bifidobacterium animalis, Bifidobacterium longum, or Lactobacillus casei (199). Bacteroides species are also known to engage in public-goods based syntrophy by releasing outer membrane vesicles (OMVs) that contain surface glycoside hydrolases or polysaccharide lyases (269). These enzymes catabolize large polysaccharides into smaller units, which can then be utilized by other species in the community.
Given the complexities of multispecies bacterial communities, TFUMseq’s ability to track large numbers of clones over time will be important for detecting relevant genes that confer a fitness advantage within dynamically changing communities. Our results suggest several future studies using TFUMseq. Replication of our experiments in additional cohorts of mice would be valuable. In this study, mice were separately caged in the same gnotobiotic isolator, and we employed meticulous techniques to avoid cross-contamination. We did not observe evidence of isolates being exchanged between mice, and in fact, saw unique selection patterns for each mouse (Figure 2-8) and were able to isolate different clones carrying non-identical fragments from different mice. Nonetheless, future experiments in which our study is repeated in a different gnotobiotic isolator would be useful to characterize the variability of the entire process. Further, it would be of interest to understand the influence of host genetics and nutrition on the selection of genes in our library, which could be investigated by repeating our study using different strains of mice or placing mice on different diets such as high-fat/high-sugar chow. Also, potential investigations could use total metagenomic DNA from stool samples, rather than DNA from cultured organisms. Another area of interest would be probing community composition and dynamics of selection in different regions of the gut. These studies would provide insights into biogeographical niches coupled with temporal data provided by our method. TFUMseq could also be used to build a better probiotic strain. One could incorporate a metagenomic plasmid library into a probiotic strain and introduce the strain into a complex host-bacterial community to isolate genes that increase the strain’s fitness in vivo. We have already identified sucrose utilization as an important and feasible trait to incorporate into an enhanced probiotic strain.
Ultimately, TFUMseq-based studies could enable the rational design of probiotic or commensal strains for various clinical applications, such as resisting pathogen colonization, compensating for a high-fat/high-sucrose Western diet, or tempering host autoimmunity.

2.6 Data Availability

All sequencing data generated in this study are publicly available at NCBI SRA under accession number SRP051326. Detailed protocols and calculated effective gene coverage and FPKM values for each gene and mouse from the in vivo experiment are available online at http://msb.embopress.org/content/11/3/788 as Supplementary Materials and Datasets.

2.7 Acknowledgements

This work was supported by grants from the Harvard Digestive Diseases Center (Pilot and Feasibility Grant to GKG, under grant P30DK034854), the National Institutes of Health Director’s Early Independence Award (grant 1DP5OD009172-01 to HHW), the US Department of Energy (grant DE-FG02-02ER63445 to GMC), and the Wyss Institute for Biologically Inspired Engineering. SJY also acknowledges support from the National Science Foundation Graduate Research Fellowship and the MIT Neurometrix Presidential Graduate Fellowship. GKG also acknowledges support from the Brigham and Women’s Department of Pathology.

Chapter 3 Delivering and maintaining genetic elements

3.1 Background

This chapter explains our work in two areas: 1) introducing genetic elements into complex microbial systems, and 2) limiting the transfer of elements that pose a threat by immunizing native microbial strains. Our efforts to transfer genetic elements, to target specific species or particular genes, and to stably maintain the elements depend on bacterial conjugation and the prokaryotic adaptive immune system, CRISPR-Cas9.
Since conjugation is thought to be most relevant in highly dense communities such as the human gut, we were interested in harnessing conjugative plasmids as a better alternative to current microbiota manipulations, including administering probiotics, antibiotics, and fecal transplants. (Limitations of these approaches are described in more detail in Section 3.1.1, while a brief primer on horizontal gene transfer is presented in Section 3.1.2.) To enable more precise and longer-lasting treatments, we investigated the potential of delivering self-transmissible vectors that carried Cas9 cassettes, which could not only prevent cell uptake of pathogenesis genes, but also copy themselves into specific strains to more stably immunize the cell.

In Section 3.2, we propose to mobilize a vector from a donor strain into the native microbiota to circumvent the problem of colonization resistance. These vectors can be utilized to transfer immunomodulators or fitness genes identified in Chapter 2. In order to measure conjugation rates, we developed universal media to co-culture different representative microbiota species, profiled antibiotic resistance, and generated molecular identification and quantification methods for each species. These efforts form the basis for characterizing transfer efficiencies of different conjugative plasmids across various bacterial species.

In Section 3.3, we introduce a Cas9 payload that inhibits the acquisition of antibiotic resistance and toxin genes. This approach immunizes cells against entry of these mobile genetic elements and actively eliminates any that are already present in the cell. While building large arrays of CRISPR spacers that would target multiple sequences, we encountered issues with synthesizing and maintaining these highly repetitive constructs (due to the same CRISPR repeat sequence that must be interspersed between the spacers).
Therefore, we built alternate CRISPR structures that would lower the instability from recombination at the repeat sequences. Then, we demonstrated the feasibility of a “genome-copying” version of the Cas9 payload that can integrate itself into a pre-defined location on the bacterial genome. Combining efficient conjugative plasmids for a particular donor-recipient pair of strains and the genome-copying Cas9 cassette provides enhanced long-term stability of the engineered payload in the recipient population.

3.1.1 Limitations of current microbiota manipulations

In spite of the popularity of probiotics (i.e., live microbes that may promote human health), few studies have proven their clinical benefit, with examples only in specific clinical conditions such as prevention of pouchitis and atopic dermatitis (270). Unless native microbiota are cleared, probiotic strains do not colonize the gut and provide only transient effects for the length of their passage through the GI tract. Similarly, probiotic strains engineered as vehicles for immuno-modulators (e.g., IL-10) and various mucosal vaccines are designed to survive long enough in the GI tract to deliver their payload (172, 271–273). Colonization resistance (CR) describes the ability of a stably established microbiota to prevent introduced species from colonizing the same niche. While the mechanisms of CR are incompletely understood, it is thought that the native microbiota deplete nutrients, directly inhibit pathogens, and stimulate host defenses (274).

Antibiotics are commonly prescribed for bacterial infections, and may be effective against Clostridium difficile infection and IBD (275). However, widespread antibiotic use has led to rapid dissemination of antibiotic resistance genes through MGEs (276, 277). Antibiotics nonspecifically kill native microbiota while selecting for antibiotic-resistant strains that can now expand into cleared niches.
This is a major concern in hospitals, where healthcare providers and common facilities can be transmission vectors, and patients may already be immunocompromised. Compounded by the diminishing arsenal of novel antibiotic compounds, multidrug-resistant strains of opportunistic pathogens are an increasing threat in both hospital- and community-acquired infections.

Documented as early as the 4th century by a Chinese doctor, fecal microbiota transplantation (FMT) has received renewed interest in the past few years as a cure for refractory C. difficile infection (278). Current FMT practice involves a colonoscopy procedure in which a homogenized stool filtrate prepared from a healthy donor is infused into the patient’s colon (279) to restore microbial diversity and stability, though more work is needed to characterize long-term colonization and address concerns of pathogen transmission.

3.1.2 Horizontal gene transfer

Lateral or horizontal gene transfer (HGT) is the acquisition of genetic material from an organism that is not the recipient’s parent. Bacterial DNA transfer can occur via conjugation, transduction, or transformation. Conjugation occurs via direct cell-to-cell contact, transduction via bacteriophage, and transformation via naked DNA uptake from the environment by induced or naturally competent cells. Multispecies communities harbor a dynamic gene pool consisting of mobile genetic elements (MGEs), such as transposons, plasmids, and bacteriophages, which serve as a source of HGT to share beneficial functions with neighbors to preserve community stability (24, 25). Dense communities are active sites for gene transfer and reservoirs for antibiotic resistance genes (12, 28–30, 280). In particular, bacterial conjugation can occur frequently in the densely populated mammalian gut (281, 282).
3.2 Engineering horizontal gene transfer networks

3.2.1 Introduction

Of the 10^11 cells in the human body, 90% are microbes that naturally inhabit the gastrointestinal tract, oral cavity, skin, and other mucosal surfaces. This commensal microbial community, called the microbiota, has intricate effects on human health, such as the development and function of the host’s metabolism and immune system. The human gut is home to the most densely populated microbial community characterized to date and is an active site of horizontal gene transfer. We propose to control DNA transfer using bacterial conjugation, the transfer of genetic material between bacteria in close contact. This work contributes to the broader goal of engineering the human microbiome for prophylactic and therapeutic applications. Mobile genetic elements, such as broadly conjugative plasmids, would be useful vehicles for delivering and propagating heterologous genes in the microbiota in a controlled manner.

To allow for simultaneous growth of different species in conjugal matings, we developed universal growth media that were optimized for pH and supplementation of amino acids, sugars, fatty acids, vitamins, and minerals. For each species, we characterized the antibiotic resistance profile, which is used for selection. To confirm the identity of each strain, we designed species-specific primers to perform colony PCR on individual isolates after a conjugation and confirmed the expected band size of the amplicon by gel electrophoresis. Furthermore, we developed quantitative real-time PCR primers to quantify each species, which enables measuring transfer rates in mixed cultures.

The conjugation system we used is based on RK2 (283), a member of the IncPα plasmid family, whose only required cis-acting element for DNA transfer is the transfer origin (oriT). We successfully transferred plasmids carrying RK2-oriT from E. coli into Gram-negative species Bacteroides fragilis, B. thetaiotaomicron, B.
vulgatus, and B. uniformis, and Gram-positive species Enterococcus faecalis, Lactobacillus reuteri, and Streptococcus mutans. We also demonstrated the ability of B. fragilis and B. vulgatus to transfer the plasmid back into E. coli.

3.2.2 Materials and Methods

3.2.2.1 Strains and constructs

Our studies included the following strains: Escherichia coli K-12 MG1655, Bifidobacterium adolescentis ATCC 15703, Bacteroides fragilis ATCC 25285, Bacteroides thetaiotaomicron ATCC 29148, Bacteroides uniformis ATCC 8492, Bacteroides vulgatus ATCC 8482, Enterococcus faecalis ATCC 29200, Lactobacillus reuteri ATCC 23272, Lactobacillus rhamnosus GG ATCC 53103, Lactobacillus paracasei ATCC 25302, Salmonella enterica, Streptococcus mutans ATCC 700610, and Streptococcus sanguinis ATCC BAA-1455. These strains will be abbreviated MG, Bado, Bfra, Bthe, Buni, Bvul, Efae, Lreu, Lrha, Lpar, Sent, Smut, and Ssan, respectively, hereafter. MG strains with the conjugative plasmid RK2 are referred to as MGRK2. For conjugations, we used E. coli strain S17-1λpir (a gift from Andy Goodman).

We focused on conjugative plasmid pFD340 (284) (a gift from C. Jeffrey Smith) and a newly constructed plasmid pBC003 (Figure 3-1). pFD340 is an E. coli-Bacteroides shuttle vector constructed by merging RK2-oriT, E. coli origin of replication pBR322, and a selectable marker (bla) that functions in E. coli with a Bacteroides cryptic plasmid pBI143 (284) and a selectable marker that functions in Bacteroides, namely ermFS from the Bacteroides transposon Tn4351 (285). pBC003 combines the RK2-oriT, Bacteroides origin of replication pBI143, E. coli origin of replication pBR322, bla marker, and ermFS marker from pFD340 with the origin of replication from plasmid pAMbeta1 (a gift from Todd Klaenhammer) and ermBP marker from pTRKH3-ldhGFP (Addgene plasmid 27167, (112)). pAMbeta1 is a broad host range plasmid in Gram-positives.
While ermFS is known to confer erythromycin resistance in Bacteroides, we included ermBP for erythromycin selection in Gram-positives.

Figure 3-1 Maps of plasmids used in this study.

3.2.2.2 Microbiological selection methods

We tested three sets of media formulations. Their compositions are listed in detail in Table 3-1, Table 3-2, and Table 3-3. To minimize the effect of evaporation in wells on the edges of our 96-well plates, we systematically altered the order of strains and media conditions across three plates, as depicted in Figure 3-2. The final 1X concentrations of antibiotics or other selective compounds we used in our study were: carbenicillin 50 μg/mL, chloramphenicol 20 μg/mL, cefoxitin 20 μg/mL, 2-deoxy-d-galactose 0.05%, erythromycin 25 μg/mL, gentamicin 200 μg/mL, kanamycin 50 μg/mL, rifampicin 20 μg/mL, sodium dodecyl sulfate 0.01%, spectinomycin 100 μg/mL, tetracycline 10 μg/mL, and trimethoprim 10 μg/mL.

Table 3-1 Composition of first set of growth media.

Table 3-2 Composition of second set of growth media.

Table 3-3 Composition of third set of growth media.

Figure 3-2 Triplicate design to minimize effects of evaporation in edge wells.

3.2.2.3 Molecular identification

We wrote custom Perl scripts to interface with locally installed BLAST (286) to design potentially species-specific primers that could be validated experimentally. Our algorithm entailed the following steps:

1. Download nucleic acid and amino acid sequences of all genes in each strain.
2. Perform “all-against-all” BLASTp, i.e., BLASTp between all pairs of sequences in the set of strains.
3. Create a table of genes not listed in the hit table (since hits correspond to highly similar sequences); this minimizes inter-species and intra-species non-specificity.
4. Select the longest genes for each strain to proceed with primer design.
5. Optimize primers for melting temperature (Tm), amplicon length, and species-specificity.
   a. Amplicon length and Tm are critical for PCR and qPCR considerations. For screening by PCR, one may consider designing different sized bands to easily identify different strains by running PCR products out on a gel. For qPCR, all amplicons were designed to be ~100 bp in length.
   b. To check species-specificity, run BLASTn of the candidate primers against the database of strains of interest, as well as the entire NCBI database.

3.2.2.4 Conjugation experiments

Prior to conjugation, donor and recipient strains were grown to saturation. E. coli strains were grown aerobically in LB at 37 °C; all other strains were grown anaerobically (GasPak 100 System, Becton Dickinson, Franklin Lakes, NJ) in rich medium, such as supplemented Brain Heart Infusion or 3:2pas (Table 3-4). In a typical conjugation experiment (Figure 3-3), 1 mL of each saturated culture per conjugal mating was washed by spinning down at 5000 rpm for 2.5 min and resuspending the pellet in the rich medium. This was repeated once to wash out antibiotics. Then equal volumes of resuspended donor and recipient strains were combined and spun down again. The concentrated mixture was plated out as three 25 μL puddles on rich medium agar without antibiotics. The puddles were allowed to air-dry for 10 min on the bench. Then plates were incubated agar-down at 37 °C for 5 hours aerobically and protected from light. The puddles were collected with 1 mL of media, using a sterile cell scraper to gently detach cells from the agar surface. The collected mixture was plated at various dilutions on selection agar plates that would quantify all donors, all recipients, and all transconjugants. To calculate a transfer frequency, the number of transconjugants was divided by the number of recipients. To further confirm the stability of the transferred plasmid in transconjugants, isolated colonies were re-streaked onto fresh plates.
The presence of the plasmid and the identity of the strain were confirmed by PCR using species- or plasmid-specific primers.

Figure 3-3 Conjugation mating experimental workflow.

3.2.3 Results

3.2.3.1 Differential growth media and antibiotics

We selected representative microbiota species (7) to use in our growth media studies, which explored various components, such as supplements and pH. From growth curves grouped by strain (Figure 3-5) and by media (Figure 3-6), we found that in the first set of media conditions, FF, HHB, FFB, and FFBC provided sufficient nutrients for all 11 microbial species. These results were consistent with the second set of media conditions, in which we explored the effect of pH (Figure 3-7 by strain and Figure 3-8 by media). FF media at pH 6.5, 7.0, and 7.5, in addition to defined medium EZm, allowed growth of all species. However, the more complex medium, HB, which was supplemented with vitamins, mineral mix, and fatty acids, proved detrimental to growth of several species. Therefore, we investigated the simpler HHB medium from the first set of experiments and changed the ratio of BHI to MRS mix from 1:1 to 3:2. Comparing variations of this medium and defined medium AZ (Figure 3-9), we determined that the optimal universal growth medium would be “3:2pas” (Table 3-4). Using 3:2pas, we profiled the antibiotic resistance of each strain (Figure 3-10).

3.2.3.2 Species-specific PCR and qPCR primers

We developed molecular identification and quantification methods for representative microbiota species. Validated PCR primers for each are listed in Table 3-5. For real-time qPCR, standard curves were validated for ten species (representative standard shown for B. vulgatus in Figure 3-4). A threshold cycle number was directly converted to an equivalent OD value, which was used to calculate the number of cells. Cell numbers were calibrated from serial dilutions and plating on solid agar to count colony forming units.
We confirmed linearity of the standard curves and specificity by analyzing the melting curve. Final qPCR primers are listed in Table 3-6.

Figure 3-4 Example validation for qPCR primer pair.

Figure 3-5 First set of growth curves by bacterial strain. n = 3; error bars: min and max; blank values for each media condition have been subtracted.

Figure 3-6 First set of growth curves by media condition. n = 3; error bars: min and max; blank values for each media condition have been subtracted.

Figure 3-7 Second set of growth curves by bacterial strain. n = 3; error bars: min and max; blank values for each media condition have been subtracted.

Figure 3-8 Second set of growth curves by media condition. n = 3; error bars: min and max; blank values for each media condition have been subtracted.

Figure 3-9 Third set of growth data. Anaerobic growth after 48 hours. AZ = rich defined media from Teknova based on MOPS buffer and amino acids. 3:2 = rich undefined media in a 3:2 mix of Brain Heart Infusion to de Man, Rogosa & Sharpe. a = Vitamin K1 & Hemin. p = undefined protein mix including peptone. S = sugars maltose, fructose, & cellobiose. V = vitamin & mineral mix from ATCC. n = 3

Table 3-4 Composition of the “3:2pas” medium.

Figure 3-10 Antibiotic resistance profiles of representative microbiota species. Example dose response curves are shown in the top panel. The table shows minimum inhibitory concentrations (MIC) for 12 drugs derived from dose response curves using five dosage levels (0.1, 0.5, 1, 5, 10 X) and three timepoints (0, 2, 5 days). Antibiotic minimal inhibitory concentrations. Abbreviations and 1X concentrations: carb: carbenicillin 50 μg/mL, chlor: chloramphenicol 20 μg/mL, cfx: cefoxitin 20 μg/mL, 2dog: 2-deoxy-d-galactose 0.05%, erm: erythromycin 25 μg/mL, gent: gentamicin 200 μg/mL, kan: kanamycin 50 μg/mL, rif: rifampicin 20 μg/mL, sds: sodium dodecyl sulfate 0.01%, spec: spectinomycin 100 μg/mL, tet: tetracycline 10 μg/mL, tri: trimethoprim 10 μg/mL

Primer        Sequence                           Amplicon size
Bado_purB_f   GTGTCTCACGACTTCCCAACC              315 bp
Bado_purB_r   GGAAACCACACTTTGCAGCC               315 bp
Bfra_pATP_f   ATGAATTCAACTTTTGACATACGCAG         400 bp
Bfra_pATP_r   CTGAAACCCCATATAGTTGCATGG           400 bp
Bthe_silC_f   ATGACCTTTATATCTAATATACAATCGGTAGC   500 bp
Bthe_silC_r   AGAAAGATAGCCAGGCCAATAATG           500 bp
Buni_hypo_f   ATGATAAGCAAACCTCACGGTCT            599 bp
Buni_hypo_r   ATCGCTCCCTCCTTATTGATGG             599 bp
Bvul_hyp1_f   ATGGACATTAGTTCTATATTATGGGGCT       700 bp
Bvul_hyp1_r   TCTTAAGTTTCGTATTGGTTCTAACCTC       700 bp
Ecoli_glcB_f  ATGAGTCAAACCATAACCCAGAGC           153 bp
Ecoli_glcB_r  ACGATTTTCTGGTGCCAGATCAT            153 bp
Efae_ASP_f    ATGAAAAAAATGTTTAGTTTTGAGTTTTGGC    1200 bp
Efae_ASP_r    TAACAATTCAATATTTCCAAACGAATGCAC     1200 bp
Lpar_ABCt_f   ATGAAGTTAGATTTGGAACTACGCC          400 bp
Lpar_ABCt_r   TAAAGGTATGACCTTGCGGATGA            400 bp
Lreu_rihA_f   ATGTTAGATATCCTGGATTACACGAAACA      100 bp
Lreu_rihA_r   CCCTAAATTCAGTCGGTTTTTCAAGAT        100 bp
Lrha_PrtP_f   ATGCAAACAAAAAGGAAAGGGCTAT          700 bp
Lrha_PrtP_r   CCTTTTTAGTATCACTAAGCCGCATG         700 bp
RK2_trbL_f    ATGAAAATCCAGACTAGAGCTGCC           194 bp
RK2_trbL_r    CTAATCACGGTCAAGGTCCAGAA            194 bp
Sent_tviC_f   ATGAATTTAATGAAATCGTCAGGGATGTTT     1000 bp
Sent_tviC_r   TGAAGATTACGGACCGAAGTTGG            1000 bp
Smut_purE_f   ATGAAAAACAGACTGCTATTTTTAGAAGGT     250 bp
Smut_purE_r   CTTCCTCATGTGTCGGCAAAATAG           250 bp
Ssan_dexS_f   ATGAAAAAACAAGTTTCTTACAAGCAGC       850 bp
Ssan_dexS_r   CGATATTGTTTAATGGCAAGGCTTG          850 bp

Table 3-5 List of species-specific primers.
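As a rough illustration of the primer properties considered in Section 3.2.2.3 (length, GC content, and melting temperature), the following Python sketch computes them for one primer pair from Table 3-5. It is not the Perl pipeline used in this work. The function names are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is assumed here only as a crude Tm estimate; dedicated primer-design tools generally rely on nearest-neighbor thermodynamic models instead.

```python
# Illustrative sketch only; function names are hypothetical and not from the thesis.

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting temperature estimate (Wallace rule): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to locate a reverse primer on the forward strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def summarize(primers: dict) -> dict:
    """Length, GC fraction, and rough Tm for each named primer."""
    return {
        name: {
            "length": len(seq),
            "gc": round(gc_fraction(seq), 2),
            "tm": wallace_tm(seq),
        }
        for name, seq in primers.items()
    }


# Primer pair for B. adolescentis copied from Table 3-5.
PRIMERS = {
    "Bado_purB_f": "GTGTCTCACGACTTCCCAACC",
    "Bado_purB_r": "GGAAACCACACTTTGCAGCC",
}

if __name__ == "__main__":
    for name, props in summarize(PRIMERS).items():
        print(name, props)
```

A check of this kind would only flag primer pairs with mismatched Tm or extreme GC content; species-specificity still has to be assessed by BLASTn and confirmed experimentally, as described in the methods.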
Primer        Sequence
Bado_98_f     CTTGGTACTTACCTCAACTGGAA
Bado_98_r     TTGGAGAAGAAGTCGGGAATG
Bfra_101_f    TATAAAAGCACGGAGATAGTGAAGA
Bfra_101_r    ACGAGATACTTCAGTTCGGC
Bthe_100_f    GACCTTTATATCTAATATACAATCGGTAGC
Bthe_100_r    GATAGTTACAGCGAGTACCGTG
Buni_100_f2   ATGTTTTTAATGTTTATGAGCGCTTG
Buni_100_r2   ACATACCATCTTCTATTGAAACGC
Bvul_99_f     TGGACATTAGTTCTATATTATGGGGC
Bvul_99_r     ACGTTGTTTTATCCTTCGTTGAA
Ecoli_102_f3  TTGATATCGGTATTGCCAGTTAAAC
Ecoli_102_r3  CATATAGGTGTCGTAAGCATGAAC
Efae_99_f     ATGAAAAAAATGTTTAGTTTTGAGTTTTGG
Efae_99_r     AATACTAATCATTAAACCCGCTGC
Lpar_100_f    ATGAAGTTAGATTTGGAACTACGC
Lpar_100_r    GTTGATTAAAATCCGCTAAGATCGTA
Lreu_100_f    ATGTTAGATATCCTGGATTACACGAA
Lreu_100_r    CCCTAAATTCAGTCGGTTTTTCAA
Lrha_100_f    GGTCTAATTACAAGTATAAAGGGGAAG
Lrha_100_r    TTTTTAGTATCACTAAGCCGCATG
Smut_100_f    AAAAACAGACTGCTATTTTTAGAAGGT
Smut_100_r    CAGGAGACAGGACATCAACTTT

Table 3-6 List of species-specific qPCR primers.

3.2.3.3 Rates of conjugation

First, we tested transfer rates of pFD340, which is an E. coli-Bacteroides shuttle vector constructed by merging RK2-oriT, an E. coli origin of replication, and a selectable marker that functions in E. coli with a Bacteroides cryptic plasmid pBI143 (284) and a selectable marker that functions in Bacteroides, namely ermFS from the Bacteroides transposon Tn4351 (285). For these conjugations, we used a 1:1 donor:recipient ratio and incubated the matings for 18 h. To select for Bacteroides, we used Brucella agar with kanamycin and vancomycin (BKV); to select for Bacteroides transconjugants, we added erythromycin. The conjugation rates are around 10^-4 in B. fragilis, B. thetaiotaomicron, and B. uniformis, though the rate is a few orders of magnitude lower in B. vulgatus (Table 3-7).

Donor: E. coli S17-1λpir pFD340
Recipient                      log10(transfer frequency)
Bacteroides fragilis           -4.7
Bacteroides thetaiotaomicron   -3.7
Bacteroides uniformis          -4.4
Bacteroides vulgatus           -7.2

Table 3-7 Conjugation frequencies of pFD340 into Bacteroides.
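The log10 transfer frequencies reported in these tables follow from the calculation given in Section 3.2.2.4 (transconjugants divided by recipients). A minimal Python sketch of that arithmetic is shown below, assuming colony counts from selective plates at known dilutions. The function names and the example counts are hypothetical and are not data from this study.

```python
import math

# Illustrative sketch only; names and example numbers are hypothetical.

def cfu_per_ml(colonies: int, dilution: float, plated_volume_ml: float) -> float:
    """Convert a plate count to CFU/mL.

    dilution is the fraction of the original suspension that was plated
    (e.g. 1e-4 for a 10,000-fold dilution).
    """
    if dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return colonies / (dilution * plated_volume_ml)


def log10_transfer_frequency(transconjugants: float, recipients: float):
    """log10(transconjugants / recipients); None means 'not detected'."""
    if recipients <= 0:
        raise ValueError("recipient count must be positive")
    if transconjugants <= 0:
        return None
    return math.log10(transconjugants / recipients)


if __name__ == "__main__":
    # Hypothetical counts: 20 transconjugant colonies at 10^-1, 100 uL plated;
    # 100 recipient colonies at 10^-5, 100 uL plated.
    t = cfu_per_ml(colonies=20, dilution=1e-1, plated_volume_ml=0.1)
    r = cfu_per_ml(colonies=100, dilution=1e-5, plated_volume_ml=0.1)
    print(round(log10_transfer_frequency(t, r), 1))
```

Returning None for zero transconjugant colonies mirrors the "not detected" entries in the tables; in practice the detection limit depends on how many recipient cells were plated.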
In secondary transfers, we grew verified and purified B. fragilis and B. vulgatus transconjugants from the previous conjugation to serve as new donors of pFD340. Using LB agar with carbenicillin to select for E. coli transconjugants, we measured transfer efficiencies of around 10^-7 into the original E. coli donor strain and almost the same from B. vulgatus into E. coli MG1655 (Table 3-8).

Donor                          Recipient            log10(transfer frequency)
Bacteroides fragilis pFD340    E. coli S17-1λpir    -6.2
Bacteroides fragilis pFD340    E. coli MG1655       -9.5
Bacteroides vulgatus pFD340    E. coli S17-1λpir    -7.7
Bacteroides vulgatus pFD340    E. coli MG1655       -8.4

Table 3-8 Secondary transfers from Bacteroides into E. coli.

For plasmid pBC003, we measured transfer rates into E. faecalis, L. reuteri, and S. mutans around 10^-8, and could not detect conjugation into B. fragilis and B. thetaiotaomicron (Table 3-9). For E. faecalis, our selection medium was 3:2pas supplemented with gentamicin and erythromycin; for L. reuteri and S. mutans, we used MRS agar with erythromycin.

Donor: E. coli S17-1λpir pBC003
Recipient                      log10(transfer frequency)
Bacteroides fragilis           not detected
Bacteroides thetaiotaomicron   not detected
Enterococcus faecalis          -8.4
Lactobacillus reuteri          -7.7
Streptococcus mutans           -8.0

Table 3-9 Conjugation frequencies of pBC003.

3.2.4 Discussion

To investigate and differentiate species in a complex microbial system by microbiology and molecular biology methods, we successfully developed growth media, antibiotic selections, and species-specific primers. With regard to conjugation rates, our findings are in line with prior frequencies in the literature (Table 3-10), though direct comparisons are difficult given that conjugation depends on numerous factors, such as the donor and recipient strains, media conditions, plasmids, and conjugation parameters. In particular, conjugation can vary by the growth phase of donors and recipients, their relative ratios, conjugation mixture density, and mating time.
Shuttle vectors are often constructed by adding an origin of replication and a selection marker that function in E. coli to isolated cryptic plasmids from a species of interest; an origin of transfer is also included if conjugation will be used to introduce the plasmid. We expanded upon this strategy and combined multiple replication and mobilization machinery, though this may have rendered the pBC003 plasmid unstable in certain species; we observed lower rates of transfer with pBC003 than with the simpler pFD340 from E. coli to Bacteroides species. In future studies, it may be useful to isolate other cryptic plasmids in order to expand the set of possible origins of replication to test, or even explore ways to engineer conjugative transposons, such as Tn916 (86).

Plasmid features pBR322-oriR + RK2-oriT + pCP1 pRRI2 + pUC-oriR (+ RK2-oriT?) R6K-oriR + RK2-oriT + transposase pBR322 + RK2-oriT pAMbeta1-oriR pAMbeta1 pBR322 + RK2-oriT + pAMbeta1-oriR-tra RK2-oriT + ColE1 + pB44-rep Donor Recipient log10(Freq.) Ref. E. coli Bacteroides fragilis -6 (287) Bacteroides uniformis Bacteroides vulgatus Bacteroides thetaiotaomicron E. coli Enterococcus faecalis -4.3 -4.6 (288) -5.4 (104) -1.7 -6.3 (289) Enterococcus faecalis -2 (290) E. coli -8.3 (290) Bifidobacterium breve Bifidobacterium bifidium -6 E. coli E. coli E. coli Enterococcus faecalis Enterococcus faecalis E. coli -6 (291)

Table 3-10 Conjugation frequencies from literature. Plasmid features indicate the relevant origins and proteins for replication and conjugation on the vector used in the specific donor and recipient mating associated with the listed log10(transfer efficiency).

3.2.5 Acknowledgements

This work was supported by grants from the National Institutes of Health Director’s Early Independence Award (grant 1DP5OD009172-01 to HHW), the US Department of Energy (grant DE-FG02-02ER63445 to GMC), and the Wyss Institute for Biologically Inspired Engineering.
SJY also acknowledges support from the National Science Foundation Graduate Research Fellowship and the MIT Neurometrix Presidential Graduate Fellowship. We thank Tara Gianoulis for bioinformatics assistance, Pooja Jethani for contributions to qPCR primer design and validation, Mary Delaney and Andrea DuBois for guidance on anaerobic culturing and techniques, Marc Lajoie for defined media, and Andy Goodman for conjugation advice.

3.3 Immunizing strains against acquisition of antibiotic resistance and toxins

3.3.1 Introduction

Many prokaryotes use Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) genes to limit horizontal gene transfer (HGT) from the environment. In these CRISPR-Cas systems, an RNA-guided protein complex recognizes a target sequence (or “protospacer”) on an invading plasmid or phage genome. The host-encoded sequence for transcribing the RNA guide is called a spacer; spacer acquisition renders microbes immune to subsequent viral infection or plasmid transfer (292). In fact, strains without CRISPR-Cas more readily acquire plasmids and pathogenicity islands or become infected with bacteriophage (293). This is concerning given that bacteriophage can carry genes encoding virulence factors, such as diphtheria toxin on phage β in Corynebacterium diphtheriae (294, 295), enterotoxin A on phage PS42-D in Staphylococcus aureus (296), Shiga toxin (stx) on lambdoid phages in pathogenic Escherichia coli (297), and cholera toxin (ctx) on phage CTXΦ in Vibrio cholerae (298).

In this section, we harness CRISPR-Cas9 and HGT to prevent the acquisition of undesirable antibiotic resistance or pathogenesis genes, as well as demonstrate a method to more stably introduce our engineered elements that expands upon the conjugative plasmids used in the previous section.
First, using the Streptococcus pyogenes CRISPR-Cas9 system (299), we validated spacers that prevent the transfer of several antibiotic resistance and toxin genes. Critically, we targeted multiple sequences within a target gene in order to diminish the likelihood of escape that could arise from mutations. We demonstrated applications of Cas9-mediated immunization in E. coli (against Shiga toxin and numerous clinically relevant beta-lactamases) and V. cholerae (against cholera toxin, to enhance live attenuated cholera vaccines).

Second, over the course of constructing and testing multiple CRISPR spacers, we encountered difficulties with synthesis and stability. The natural structure of CRISPR is an array of spacer sequences flanked by the same repeat sequence. Not only were we limited in what commercial gene synthesis services could construct for these largely repetitive sequences, but we also observed that the repeats allowed for recombination and subsequent escape of targeted species through the loss of CRISPR spacers. We tested a variety of mutations in the repeat sequence that would not compromise Cas9 function; these alternative CRISPR repeats could then be interspersed in an array to minimize recombination risk and improve constructability.

Third, we were interested in leveraging cell-to-cell transfer via conjugation as described in the previous section to propagate the Cas9 cassette to native microbiota. Instead of simply using Cas9 to eliminate pathogens as suggested by others (300), we present a novel method to permanently immunize endogenous microbes using a “genome-copying” version of the cassette (Figure 3-11). This design contains spacers against a recombination hotspot on the genome, defined by nearby crossover hotspot instigator, or Chi (χ), sites (301), and has homology arms to serve as a donor repair template during homologous recombination. Once transferred into a recipient cell, the cassette copies itself into the bacterial chromosome.
The continued distribution of the mobile element in the population will expand the proportion of immunized cells and limit the spread of Cas9-targeted sequences such as antibiotic resistance genes. Here, we demonstrated successful genome-copying in E. coli.

Figure 3-11 Design of Cas9 cassette with genome-copying feature. A Cas9 cassette contains spacers targeting a region “A” on the genome and is flanked by homology arms for the genomic target site. The cassette can be carried on a conjugative plasmid or a prophage to propagate and replace genetic elements in a microbial community.

3.3.2 Materials and Methods

3.3.2.1 Strains and plasmids

Chemically competent E. coli (NEB Turbo, New England Biolabs, Ipswich, MA) were used for routine cloning. Wild-type E. coli K-12 MG1655 and E. coli B were also used for CRISPR assays. For studies with V. cholerae, we used the El Tor strain Bah-2 (E7946 ΔattRS1, (302)). For conjugations, we used E. coli MFDpir (a gift from Jean-Marc Ghigo), which is a diaminopimelic acid (DAP) auxotroph and free of the Mu prophage (85). Phage T4 and T7 stocks were propagated in E. coli K-12. pCTX-Km, the replicative form of CTX-KmΦ (ΔctxAB), was prepared from V. cholerae O395 (298).

E. coli were grown at 37 °C in LB broth and supplemented with antibiotics as needed at final concentrations of 100 μg/mL spectinomycin, 30 μg/mL chloramphenicol, 300 μg/mL erythromycin, 50 μg/mL kanamycin, 100 μg/mL carbenicillin, and 10 μg/mL tetracycline. V. cholerae were grown at 37 °C in LB broth and supplemented with antibiotics as needed at final concentrations of 50 μg/mL kanamycin, 100 μg/mL carbenicillin, and 10 μg/mL tetracycline.

E. coli cells expressing SpCas9 were constructed by transforming in DS-SPcas (Addgene plasmid 48645, (303)), which encodes SpCas9 and its cognate tracrRNA on a backbone with a cloDF13 origin of replication and aadA gene.
We assembled compatible protospacer plasmids encoding target or control sequences with their PAMs on a plasmid with a pBR322 origin of replication and a bla gene or a plasmid with a colE1 origin and kan marker. In some assays, the spacer was expressed on DS-SPcas such that there was no separate spacer plasmid. In all other experiments, we maintained the designed spacer on a separate plasmid (based on PM-SP!TB, Addgene plasmid 48650, (303)) that expressed one spacer followed by the SpCas9 repeat on a backbone with a p15a origin of replication and cat gene.

V. cholerae cells expressing SpCas9 were constructed by encoding SpCas9, the tracrRNA, and spacers on a backbone with a pBR322 origin of replication, bla gene, and RK2-oriT sequence for conjugation from E. coli MFDpir into the strain. Targeted protospacers were placed on plasmids with an SC101 origin of replication, tetracycline resistance, and RK2-oriT for conjugation from E. coli MFDpir into the strain.

3.3.2.2 Spacer validation

Spacers were validated by transformation, conjugation, or phage infection. In general, “protected” cells were first prepared by introducing the plasmid(s) that encoded Cas9, tracrRNA, and the spacer of interest. Then, we transformed equimolar amounts of either the targeted protospacer plasmid or a control untargeted protospacer plasmid into the protected cells. After selecting for the co-existence of all plasmids (Cas9, spacer, and protospacer), we quantified Cas9 activity as the number of transformants in the targeted protospacer plasmid condition relative to the non-targeted plasmid.

In conjugation assays, we prepared equal numbers of donor and recipient cells across experimental conditions. Recipient cells were either protected E. coli or V. cholerae with Cas9, tracrRNA, and spacers to be validated, or unprotected cells with similar plasmids that lacked the Cas9 machinery. Donor cells were E. coli MFDpir with the targeted protospacer plasmid or a control untargeted plasmid.
For each conjugal mating, we washed 1 mL of overnight donor and recipient cell cultures by spinning down at 8000 x g for 3 min, resuspending in PBS, repeating the wash, mixing the two resuspensions for another spin down, and transferring the final ~30 μL of densely resuspended pellet onto LB agar plates as spots of 5-10 μL puddles. DAP was supplemented in the final resuspension media prior to transferring onto the agar plate. The spots were allowed to air dry for 10 min before incubating face up (i.e., agar on the bottom) at 37°C for 5 h. The cells were then collected with 1 mL PBS and a sterile scraper. Dilutions were plated on selective media to quantify total number of recipients, donors, and transconjugants.

To characterize the level of phage resistance conferred by Cas9, we infected normalized densities of protected E. coli with equal titers of phages and counted the number of formed plaques. We obtained equal cell densities by diluting an overnight culture and normalizing to an OD600nm of 0.3 after several hours of growth. Then we added 2 μL of phage to 120 μL of cells, mixed them in 1 mL of 0.6% top agar with appropriate antibiotics within 20 minutes, and poured the mixture onto 3 mL of 1.5% solid agar. Replicate experiments were performed with different phage dilutions. We measured Cas9 activity by comparing the number of plaques formed on a protected strain to the number formed on a susceptible strain.

In CTXΦ transduction assays, AKI media was used for TCP induction of V. cholerae Bah-2 (304). Cas9 activity was measured as the relative pCTX-Km transduction efficiency of protected Bah-2 strains compared to an unprotected Bah-2 strain.
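The readouts described in this section all reduce to a ratio between a targeted (or protected) condition and its control. The following is a minimal sketch of that arithmetic, not the analysis code used in this work. The function names are hypothetical, transfer efficiency is assumed to be transconjugants per recipient, and expressing protection as a negative log10 ratio is an illustrative convention.

```python
from math import log10


def transfer_efficiency(transconjugants, recipients):
    """Transconjugants per recipient for one mating (assumed definition)."""
    if recipients <= 0:
        raise ValueError("recipient count must be positive")
    return transconjugants / recipients


def relative_activity(targeted, control):
    """Ratio of a targeted/protected readout to its control readout.

    The readout can be transformant counts, plaque counts, or transfer
    or transduction efficiencies, as long as both values share units.
    """
    if control <= 0:
        raise ValueError("control readout must be positive")
    return targeted / control


def orders_of_protection(targeted, control):
    """Protection expressed as orders of magnitude (-log10 of the ratio).

    Returns infinity when nothing was recovered in the targeted condition;
    in practice the assay's detection limit bounds this value.
    """
    ratio = relative_activity(targeted, control)
    if ratio == 0:
        return float("inf")
    return -log10(ratio)


if __name__ == "__main__":
    # Hypothetical plaque counts: protected strain versus susceptible strain.
    print(orders_of_protection(30, 3e5))
```

Keeping the ratio and the log transform in separate functions lets the same code report either a fold change or orders of magnitude, depending on how the result is stated.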
3.3.3 Results

3.3.3.1 Characterization of spacers against antibiotic resistance and toxins

We designed spacers targeting aminoglycoside resistance (aphA), beta-lactamase (bla), Klebsiella pneumoniae carbapenemase (blakpc, (305)), New Delhi metallo-beta-lactamase 1 (blaNDM-1, (306)), vancomycin resistance (vanA and vanB, (307)), Shiga toxin (stx2A and stx2B), the primase/helicase gene in phage T7, and the major capsid protein in phage T4. For V. cholerae, we targeted tetracycline resistance genes carried on mobile genetic elements, such as the conjugative plasmid RK2 and the integrating conjugative element SXT, which can spread antibiotic resistance (308, 309). We also designed CRISPR spacers against cholera toxin (ctxA and ctxB) and rstA, required for replication of phage CTXΦ.

Overall, we observed three to five orders of magnitude of Cas9-mediated protection in our transformation, conjugation, and transduction assays in E. coli and V. cholerae for spacers targeting antibiotic resistance, toxin, and phage genes (Table 3-11). All spacers were first validated in a plasmid transformation assay in E. coli. Anti-phage spacers were further tested with T4 and T7 phage infection experiments in E. coli and CTXΦ transduction assays in V. cholerae. We also carried out conjugation assays for all V. cholerae spacers using E. coli MFDpir donors and V. cholerae Bah-2 recipients – a representative assay is shown in Figure 3-12.

E. coli
Spacer       Sequence                PAM
Antibiotic resistance
aphA.1       CACTCATCCAATCTCACTGA    C
aphA.2       CTGCTGGACGAACTTTTCTA    A
bla          ACTTTAAAAGTGCTCATCAT    T
kpc.1        GCATTTTTGCCGTAACGGAT    G
NDM.1        GAAGTGTGCTGCCAGACATT    C
vanA.1       GCTGTTTCGGGCTGTGAGGT    C
vanB.1       GCGATTTCGGGCTGTGAGGT    C
Toxins
stx2A.1      CCCTCTTGAACATATATCTC    A
stx2A.2      CCCTGAGATATATGTTCAAG    A
stx2A.3      GGGAGAGGATGGTGTCAGAG    T
stx2B.4      AAACTGCACTTCAGCAAATC    C
Phage
T4.Y         AAGAACTTCCAACCGGTAAT    G
T7.7         TTCGGGAAGCACTTGTGGAA    T
T7.8         GATGCTTGAGGAGTCCGTTG    A

V. cholerae
Spacer       Sequence                PAM
Antibiotic resistance
RK2tetA.2    ATCTTGCTCGTCTCGCTGGC    C
SXTtetA.1    CGGCGAGTAAGATTAATGTA    G
SXTtetA.2    ATATTACTACTATCTCTTGC    A
Toxins
ctxA.1       TAAACAAAGGGAGCATTATA    T
ctxA.2       GGATTTGTTAGGCACGATGA    T
ctxA.3       CATCCATATATTTGGGAGTA    T
ctxA.4       TTTGTCTTTTAACTTTAGAT    T
ctxB.5       ATTATGATTAAATTAAAATT    T
ctxB.6       GAATCTATATGTTGACTACC    T
ctxA.7       TTTAACGTTAATGATGTATT    A
ctxA.8       CCTGATGAAATAAAGCAGTC    A
Phage
rstA.1       TTTTTGTCGATTATCTTGCT    T

Table 3-11 Validated spacers for E. coli and V. cholerae applications. All spacers listed conferred three to five orders of magnitude of Cas9-mediated protection relative to a control sequence across a variety of assays, including transformation, transduction, and conjugation.

Figure 3-12 Example CRISPR spacer validation assay in V. cholerae. In this test, we validated spacers against rstA, which is required for DNA replication of phage CTXΦ. We constructed two E. coli MFDpir donors, one with the rstA protospacer (“rst”) and the other without (“-”). We also prepared two V. cholerae Bah-2 recipients, one with a Cas9 plasmid targeting rstA (“!rst”) and another without the rstA spacer (“!con”). Then, we performed four conjugal matings (i.e., all the combinations). The purpose of the control conjugations is to account for possible differences in the background transfer rate of various plasmids or differences in the ability of various recipients to receive a plasmid. A transfer efficiency was calculated for each conjugation by dividing the number of transconjugants by total recipients. Here, the normalized ratio of transfer frequencies between the two plasmids into the control recipient was 10⁻³:10⁻³ = 1 as expected; this ratio is critical to check in all conjugation experiments. The level of Cas9 activity was measured as the relative normalized transfer efficiency of the properly targeted protospacer plasmid compared to the non-targeted plasmid.
In this experiment, the plasmid with a spacer targeting rstA provided Bah-2 at least four orders of magnitude of protection against an invading plasmid carrying the rstA sequence.

3.3.3.2 Investigating stability and designing alternative CRISPR-Cas9 repeat sequences

Once we validated individual spacers, we sought to combine them into large multi-spacer arrays to build broadly immunized E. coli and V. cholerae strains. In conjugation assays using V. cholerae Bah-2 carrying Cas9 and one, three, or five different spacers against ctxA, we observed comparable levels of protection against the introduction of a plasmid carrying the ctxA protospacers. However, as we attempted to build even larger arrays, we encountered difficulties synthesizing these highly repetitive constructs. Furthermore, when we began to characterize escapees from our Cas9 assays, we found that recombination at the repeat sequences was a common mode of escape. In the simplest case, we began with an array of one spacer flanked by two repeats – this is the structure of native CRISPR systems, though we demonstrated previously that a spacer-repeat is equivalent in activity to a repeat-spacer-repeat format (303). In half of the clones we sequenced, we found recombination that excised the spacer and one repeat (Figure 3-13A). With another larger array of different spacers (numbered 1 through 11), we also found escapees with deletions of portions of the array. For example, when spacer 5 was targeted, spacers 3 through 8 were deleted, and when spacer 4 was targeted, spacers 3 through 5 or 4 through 9 were deleted (Figure 3-13B).

Therefore, we designed alternative repeat sequences that would preserve Cas9 activity but improve array stability and ease of synthesis. We used previously validated spacers and transformation assays to benchmark our alternative repeat designs.
First, we varied up to three bases in the repeat without making changes in the tracrRNA sequence (Figure 3-14A), and found they were comparable in Cas9 activity using the bla spacer. Then, we introduced more variations in the repeat sequence and made corresponding changes in the base-paired positions in the tracrRNA (Figure 3-14B); these maintained Cas9 activity in our transformation assays using a control spacer sequence, ACTTTAAAAGTATTCGCCAT, which is four bases different from the bla spacer. Third, we shortened the repeat and tracrRNA lengths, from the native 36 nt to 28, 22, 16, and 14 nt (Figure 3-15A). We found that 16 and 14 nt versions no longer provided Cas9 activity with the control spacer, possibly because we destabilized the duplex structure from A6 to G9 of the repeat (310). Thus, we focused on 18 nt and 16 nt variations that maintained the AGAG at positions 6 to 9 while introducing a single mismatch in other regions (Figure 3-15B). Since the 16 nt version with a mismatch also disrupted Cas9 activity, this time in transformation assays using the T4.Y spacer, we proceeded to use variants of 18 nt with either one or two mismatches (Figure 3-16). We found that all of these designs functioned as well as the wild-type in transformation assays with the T4.Y spacer.

Figure 3-13 Escapees recombine at repeat regions to excise the spacer. A. Sequence view of recombined escapees from single spacer array in repeat-spacer-repeat format. B. An array of 11 spacers (colored boxes) with wild-type (wt) repeat sequences (gray diamonds). When protospacer 5 or 4 plasmids were introduced, escapees had lost portions of the array.

Figure 3-14 Alternative CRISPR repeats with base substitutions. The wild-type SPcas9 repeat (crRNA) and tracrRNA are displayed at the top for comparison. Mutations are in blue text and yellow highlighting.

Figure 3-15 Alternative CRISPR repeats with truncations.
The wild-type SPcas9 repeat (crRNA) and tracrRNA are displayed at the top for comparison. Mutations are in blue text and yellow highlighting. Introduced mismatches between the crRNA and tracrRNA are in red text.

Figure 3-16 Alternative CRISPR repeats with length 18 nt and one to two mismatches. The modified 18 nt SPcas9 repeat (crRNA) and its tracrRNA are displayed at the top. Further introduced mismatches between the crRNA and tracrRNA for other 18 nt length versions are in red text.

3.3.3.3 Genome-copying for stable incorporation of engineered mobile elements in E. coli

We identified a recombination hotspot on the E. coli genome (at psiE) that was flanked by two sets of appropriately oriented chi (χ) sites – basically “chi> chi> <chi <chi”, where “chi>” is 5’-GCTGGTGG-3’ and “<chi” is 5’-CCACCAGC-3’ (Figure 3-17). To demonstrate genome-copying, we constructed a plasmid with 2 kb homology arms to the region flanking a cassette with SPcas9, tracrRNA, and spacers targeting two sites on psiE. To avoid self-cutting of the plasmid, we recoded the protospacer regions within the 2 kb homology arms. We confirmed the self-insertion of the cassette with PCR. We also performed a control transformation of the plasmid into RecA-deficient E. coli; as expected, we did not observe viable transformants.

Figure 3-17 Stable incorporation of engineered Cas9 mobile elements in E. coli. The cassette includes spacers targeting a recombination hotspot (flanked by χ sites). The dark vertical lines in the repaired version indicate recoded protospacers to preclude self-cutting.

3.3.4 Discussion

Now that we have confirmed several spacers for antibiotic resistance genes and toxins in E. coli and Vibrio cholerae, we can design more spacers to target many other clinically relevant sequences (311). But to combine numerous spacers into one array requires novel CRISPR repeat sequences to facilitate array-building and promote in vivo stability.
We have shown viable substitutions and truncations for single spacers; the next step is to construct arrays of these alternative repeats flanking multiple spacers targeting various antibiotic resistance, toxin, and phage genes. These arrays can then be incorporated in a Cas9 cassette for self-copying onto a bacterial genome. While we have shown the feasibility of genome-copying, there are important parameters to characterize, such as the optimal number of chi sites and length of homology arms, since the presence of more chi sites is synergistic for recombination (312). Given that the psiE site in E. coli was a viable genome-copying site, we searched for similar regions with nested chi sites. For notation, we labeled chi sites in this structure as “χ outer >”, “χ inner >”, “< χ inner”, and “< χ outer”. For searching through the genome, we used three criteria: 1) the inner chi sites are within 1 kb of each other, 2) the outer chi sites are within 8 kb of each other, and 3) the two chi sites on either side in the same orientation are within 4 kb of each other. Out of 1008 chi sites in E. coli K-12 MG1655, we found 30 unique “χ inner >” sites and 29 unique “< χ inner” sites satisfying the 1 kb requirement. For sites within 8 kb, there were 199 unique “χ outer >” sites and 225 unique “< χ outer” sites. There were 6 regions with the nested structure, which were essentially 4 due to multiple chi sites satisfying the criteria nearby (Table 3-12). We also considered the probiotic E. coli strain Nissle 1917, which exerts antagonistic effects against pathogenic enterobacteria (313) through nutrient competition (267). Of the 1100 chi sites in E. coli Nissle 1917, we identified 35 unique “χ inner >” sites and 34 unique “< χ inner” sites satisfying the 1 kb criterion. There were 230 unique “χ outer >” sites and 247 unique “< χ outer” sites within 8 kb of each other. 15 genomic regions contained the nested structure; 6 were unique (Table 3-13). 
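The three search criteria above can be sketched as a short script. This is an illustrative reconstruction rather than the code used for Tables 3-12 and 3-13. It assumes a linear top-strand sequence (no wrap-around at the origin of a circular genome), measures distances between motif start positions, and reports 1-based coordinates. The function names and the default thresholds as keyword arguments are hypothetical.

```python
from bisect import bisect_left, bisect_right

CHI_FWD = "GCTGGTGG"  # "chi>" as written on the top strand
CHI_REV = "CCACCAGC"  # "<chi", the reverse complement of chi>


def find_sites(genome, motif):
    """Return sorted 1-based start positions of every motif occurrence."""
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = genome.find(motif, start + 1)
    return positions


def nested_chi_regions(genome, inner_max=1000, outer_max=8000, side_max=4000):
    """List (outer>, inner>, <inner, <outer) start positions meeting all criteria.

    1) inner sites within inner_max of each other,
    2) outer sites within outer_max of each other,
    3) same-orientation sites on each side within side_max of each other.
    """
    genome = genome.upper()
    fwd = find_sites(genome, CHI_FWD)
    rev = find_sites(genome, CHI_REV)
    regions = []
    for fi in fwd:
        # Candidate "< inner" sites downstream of this "inner >" site.
        for ri in rev[bisect_right(rev, fi):bisect_right(rev, fi + inner_max)]:
            # Candidate "outer >" sites upstream, within side_max of fi.
            outers_left = fwd[bisect_left(fwd, fi - side_max):bisect_left(fwd, fi)]
            # Candidate "< outer" sites downstream, within side_max of ri.
            outers_right = rev[bisect_right(rev, ri):bisect_right(rev, ri + side_max)]
            for fo in outers_left:
                for ro in outers_right:
                    if ro - fo <= outer_max:
                        regions.append((fo, fi, ri, ro))
    return regions


if __name__ == "__main__":
    # Toy sequence with one nested structure; a real run would load a genome
    # sequence (for example, from a FASTA file) into a single string.
    toy = ("A" * 10 + CHI_FWD + "A" * 100 + CHI_FWD + "A" * 50
           + CHI_REV + "A" * 100 + CHI_REV + "A" * 10)
    print(nested_chi_regions(toy))
```

Every combination of sites that satisfies the criteria is reported, so one genomic neighborhood can yield several overlapping rows; these would then be collapsed by hand or by grouping on shared coordinates.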
Besides copying the Cas9-enabled immunization vector into E. coli, our approach can be applied to other species in the microbiota. This can be most readily achieved if the species-specific chi site is already known; otherwise candidate chi sites can be identified with a statistical model applied to the core genome (314). Then drawing upon methods for cross-species conjugation as described in the previous section, we can promote active immunization of targeted species in the native microbiota against antibiotic resistance, toxins, and other clinically relevant genes carried on mobile genetic elements.

Genomic locations of nested χ sites on MG1655 (GenBank: U00096.3)
Genomic locations                        Genes in region
719280, 722817, 723361, 723873           speF, ybfK, kdpE, kdpD, kdpC, kdpB
719280, 722817, 723361, 725739
1643621, 1643673, 1644198, 1644255       ydfU (part of cryptic prophage Qin/Kim)
1891128, 1892555, 1893108, 1894619       tsaB, yoaA, yoaB, yoaC, yoaH, pabB
1891128, 1892555, 1893108, 1895241
4237067, 4240099, 4240945, 4243580       yjbG, yjbH, yjbT, psiE, xylE, malG, malF

Table 3-12 Nested chi site regions in E. coli MG1655.
Genomic locations of nested χ sites on Nissle 1917 (GenBank: CP007799.1)
Genomic locations                        Genes in region
765030, 768660, 769204, 771582           speF, kdpE, kdpD, kdpC, kdpB
1310494, 1311443, 1312297, 1313501       several hypothetical proteins, DNA primase
1310494, 1311443, 1312297, 1314360
1825808, 1829538, 1829741, 1831893       hypothetical proteins, tRNA-Val
1829538, 1831690, 1831893, 1833996
1975113, 1976540, 1977093, 1978604       ATP-dependent helicase, endoribonuclease LPSP, hypothetical proteins, pabB
1975113, 1976540, 1977093, 1979226
2062892, 2066233, 2066848, 2067119       DNA adenine methylase, serine protease, antitermination protein, antirepressor, crossover junction endodeoxyribonuclease, adenine methyltransferase, GntR family transcriptional regulator, hypothetical proteins
2062892, 2066233, 2066848, 2067971
2062892, 2066233, 2066848, 2068122
2062892, 2066233, 2067119, 2067971
2062892, 2066233, 2067119, 2068122
2062892, 2066233, 2067119, 2070916
2945787, 2948884, 2949574, 2950449       L-aspartate oxidase, srmB, LysR family transcriptional regulator, grcA, uracil-DNA glycosylase
2945787, 2948884, 2949574, 2953293

Table 3-13 Nested chi site regions in E. coli Nissle 1917.

3.3.5 Acknowledgements

This work was supported by US Department of Energy grant DE-FG02-02ER63445 (to GMC) and the Wyss Institute for Biologically Inspired Engineering. SJY was supported by a National Science Foundation Graduate Research Fellowship and KME by the Wyss Technology Development Fellowship. We thank members of the Waldor lab, including Yoshi Yamaichi, Brigid Davis, and Bill Robins, for assistance with V. cholerae methods.

Chapter 4
Replacing gut microbial strains with precision using phages and CRISPR

4.1 Background

Although probiotics are advertised as health-promoting, these bacteria have limited efficacy because they cannot persist in the gut for more than a few days, owing to competition from endogenous bacteria already residing in all available niches.
One approach to displace native microbes is to administer antibiotics, but these are often too broadly acting, harming bystander bacteria as collateral damage. The long-term goal of this chapter is to modulate community composition by eliminating a specific native strain, thus emptying its niche for an engineered version.

The use of bacteriophages for precise microbiota perturbations is promising for several reasons. Phages are highly specific to their bacterial host species, have been naturally used by bacteria to dominate a niche in the gut (315), and can be administered as a small cocktail to deplete a large fraction of bacteria – a collection of four T4-like phages decreases E. coli by 60% in the mouse gut (316). Furthermore, bacteriophages have been successfully applied in phage therapy against pathogenic bacteria in Eastern Europe (317).

As described in the previous chapter, immunizing strains against acquiring pathogenic elements from the environment is important for preventing the spread of toxin or antibiotic resistance genes. With the rise of multidrug resistant pathogens in both hospital- and community-acquired infections, there is increasing concern that unprotected probiotics or “naïve” endogenous microbiota may be compromised and lead to continued spread of virulence factors (318). We leverage a specific type of bacterial adaptive immune system, called CRISPR-Cas9, which destroys foreign DNA carried on bacteriophages or plasmids. In native prokaryotic Type II CRISPR systems, transcribed arrays are processed into CRISPR RNAs (crRNAs) that form a complex with Cas9 and a trans-activating RNA (tracrRNA) (299). The crRNA guides Cas9 to double-stranded DNA sequences called protospacers that match the sequence of the spacer and are flanked by a protospacer adjacent motif (PAM) unique to the CRISPR system (319). If spacer-protospacer base-pairing is a close match, Cas9 cuts both strands of DNA.
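To make the targeting rule concrete, the sketch below lists candidate protospacers in a DNA sequence. It is a simplified illustration, not the spacer design procedure used in this work. It assumes the canonical S. pyogenes Cas9 PAM 5'-NGG-3' immediately 3' of a 20 nt protospacer, a detail the text above does not spell out. It requires an exact 20 nt candidate and ignores mismatch tolerance. The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(target, spacer_len=20):
    """Return (strand, offset, protospacer, pam) for every NGG-flanked site.

    Offsets are 0-based and refer to the strand being scanned, so "-" strand
    offsets index into the reverse complement of the input.
    """
    target = target.upper()
    # Zero-width lookahead so overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    # Hypothetical 27 nt target containing a single NGG-flanked site.
    for hit in find_protospacers("TT" + "ACGT" * 5 + "TGG" + "TT"):
        print(hit)
```

A candidate found this way is only a starting point; as the later sections discuss, spacers against the same gene can differ widely in the protection they confer, so each one still needs experimental validation.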
We propose a protocol in which a targeted endogenous bacterial species is depleted with phages to benefit an engineered strain having Cas9-mediated phage immunity. This ecological approach creates a competitive advantage for the protected cells, which will more stably colonize. To prevent the introduced strain from losing its engineered function or acquiring pathogenic elements, we designed a population cycling method to systematically replace the strain with different versions (A, B, C, etc.) (Figure 4-1). The versions may be governed by phages with naturally different host ranges, or recoded sequences in the same parent phage. The CRISPR-Cas defense system in each bacterial version will carry corresponding spacers against its phage. Therefore, the wild-type occupant is first eliminated by phage A. Then strain A (which is immune to phage A but susceptible to all other versions) is introduced. After some time, phage B is introduced to clear out strain A for subsequent colonization of strain B.

Figure 4-1 Strain rotation scheme using phage and corresponding susceptible and Cas9-mediated resistant host strains.

We sought to demonstrate this approach with E. coli strains and T4-like phages that have wide host ranges (320). Section 4.2 describes our pilot mouse experiments in which we tested our best spacers at the time against bacteriophages T6 and RB15. Recognizing that we needed to further optimize several aspects of the mouse experiment, such as the level of protection conferred by Cas9 against phage infection, we began to characterize other anti-phage spacers. For reasons not yet understood, CRISPR spacers vary greatly (up to six orders of magnitude) in their ability to confer resistance against phage infection, even though the spacers target the same phage gene. In Section 4.3, we investigated whether this was due to DNA modifications found in T4-like phages.
We discovered that Cas9 could provide resistance against infection by phage T4, which has all its cytosines replaced with glucosyl hydroxymethylated cytosines. Since DNA modification is one method for phage to escape host restriction, this result suggests Cas9 can overcome a variety of DNA modifications and thus provide the cell with protection against phage.

To develop a cocktail of phages to use in cyclic strain replacement, we needed to characterize CRISPR spacers against different phages, which in turn required the availability of genome sequences for T4-like phages of interest. In Section 4.4, we sequenced the genomes of T4-like phages RB3, RB5, RB6, RB7, RB9, RB10, RB15, RB27, RB33, RB55, RB59, and RB68.

With sequenced phages and an understanding that phage-encoded DNA modifications will not impede Cas9 activity, we returned to identifying high-activity CRISPR spacers against phages. Section 4.5 describes our development of a high-throughput library selection for highly effective spacers against several T4-like phages. These validated spacers can then be used in a follow-up mouse experiment to demonstrate phage-assisted niche replacement in vivo.

4.2 Phage-assisted niche depletion in the murine gut

4.2.1 Introduction

In this section, we explore the feasibility of using bacteriophages to deplete a niche in the microbiota and replace it with an engineered strain that is protected against phage infection by Cas9-mediated immunity. We sought to selectively eliminate existing strains of E. coli in the mouse gut with a probiotic E. coli strain, Nissle 1917 (EcN), a supplement marketed in Europe under the trade name Mutaflor®, and then replace EcN with an engineered EcN strain that has phage immunity and is labeled with a distinct marker.
To minimize conditions that may limit phage-mediated depletion, particularly the possibility that bacteria in physically isolated microenvironments such as crypts or mucus might be shielded from phage depletion at a given time of phage introduction, we also studied the effect of supplementing the drinking water with sugars and repeating phage dosing. We conducted two mouse experiments based on the model of streptomycin treatment to eliminate facultative anaerobes such as E. coli and its competitors in the mouse gut (321). In the first study, we tested whether phages T6 and RB33 would deplete the starting EcN strain we introduced to allow for colonization of a second phage-resistant EcN strain. Phage-resistance was encoded using Cas9 and a CRISPR spacer that had activity against phage T6 and RB33. We also tested whether sugar would enhance the effect of phage predation. Since phage-susceptible bacteria can resist infection if they are in stationary phase, we tested the hypothesis that delivering phages with a nutrient source would induce cells into exponential phase and thus become vulnerable to phages. We gave mice drinking water with phages and 2% arabinose, a sugar that is preferentially utilized by E. coli strains but poorly absorbed by animals (267). In the second study, we tested the re-administration of phage to enhance depletion of phage-susceptible cells and enrichment of phage-resistant strains. 4.2.2 Materials and methods 4.2.2.1 Strain construction and verification Two plasmids were used in the study. The control plasmid contains aadA (for streptomycin/spectinomycin resistance), an SC101 origin of replication, a non-fluorescent YFP (R96A mutation) driven by the pLlacO promoter, lacIq, and the Streptococcus pyogenes Cas9 and its cognate tracrRNA. 
The anti-phage plasmid (denoted “!phage” hereafter) is identical to the control plasmid except for two key features: YFP is fluorescent, and there is a CRISPR spacer “T6.Y” encoded by the 20 nucleotide sequence 5’-AAGAACTTCCAACCAGTAAT-3’. For proper Cas9 processing of the T6.Y CRISPR RNA, there is the JS23119 promoter upstream and S. pyogenes CRISPR repeat downstream of the spacer. For the first mouse experiment, the two plasmids were introduced into probiotic E. coli strain Nissle 1917 to construct strains “EcN control” and “EcN !phage”. In the second experiment, we transformed plasmids into E. coli B to make “EcB control” and “EcB !phage”.

We verified the activity of spacer T6.Y against phages T6 and RB33 using plaque assays in both E. coli strains. In a typical plaque assay, we infected normalized densities of E. coli with equal titers of phages, mixed them in 0.6% top agar to overlay on 1.5% solid agar, and counted the number of plaques that formed. To calculate the level of protection conferred by the CRISPR spacer, we divided the number of plaques formed on a protected strain by the number of plaques formed on a susceptible strain.

4.2.2.2 Batch phage production

Small-scale phage stocks were first prepared by diluting an overnight bacterial host culture at 1:100 into Luria broth, inoculating it with phage, and growing the culture for 2.5-5 hours. Phage lysates were purified by centrifugation at 8000 x g for 5 minutes at 4°C to remove cell debris. The supernatant was then filtered through a 0.45 μm membrane and stored at 4°C.

To obtain highly concentrated phage for resuspension in drinking water for animal experiments, we modified a protocol described for large-scale production of T4-like phages (322). First, an overnight bacterial host culture (E. coli B) was diluted 1:500 into 1 L of LB, grown to an OD600nm of 0.2, and inoculated with 1 mL of phage stock.
The cultures were grown on a shaker at 37°C for 3-5 hours, depending on how much time each bacteriophage required to clear the bacterial culture; phage T6 took 3 hours and phage RB33 4.5 hours. We periodically checked the culture turbidity during the incubation period to confirm host bacterial strain growth and subsequent phage lysis, which would be reflected by an initial increase in turbidity followed by clearance. Phage lysates were cleared by centrifugation at 4°C for 10 min at 8000 x g using 500 mL Nalgene bottles (Thermo Scientific, Waltham, MA) in a Sorvall RC-6 Plus Superspeed Centrifuge with a F12S-6x500 LEX rotor (Thermo Scientific). For cultures with substantial cell debris, we used smaller volumes in 50 mL conical tubes with a typical tabletop centrifuge (7100 x g for 8 min). Then, we passed the supernatant through a 0.45 µm filter membrane and pelleted the phages by ultracentrifugation (28,880 x g for 1 hr at 4°C) using 50 mL conical tubes in a Sorvall RC-6 Plus centrifuge with a F13-14x50cy rotor (Thermo Scientific). The supernatant was gently poured off into another collection bottle and any remaining liquid was removed by pipet. The small white, opaque pellets of phage were resuspended in ddH2O at 1/100 of the starting volume and stored at 4°C.

To quantify the concentrated phage, we performed plaque assays with serial dilutions of the resuspended phage pellets as well as the saved supernatants. Overall, we found that our method achieves 100-1000X concentration. Of note, there were still viable phage particles in the supernatant, albeit at lower concentrations (about 1000X lower than the resuspended pellets).

4.2.2.3 Animal experiments

All of the mice used in this study were handled in accordance with protocols approved by the Harvard Medical Area Standing Committee on Animals (HMA IACUC).
Female C57BL/6 mice (Charles River Laboratories, Wilmington, MA; 8-12 weeks of age) were individually housed to prevent cross-contamination of bacteria and feces between cage mates. Experiments were double-blinded. To prepare E. coli for administration in the drinking water, 30 mL of cultures were grown to late exponential phase, spun down at 8,000 rpm for 5 min, washed with 10% glycerol, spun down again, and resuspended in ddH2O with appropriate antibiotics or sugars according to the animal protocol. All antibiotics and sugars used were filter-sterilized and prepared as 10X or 1000X stock solutions in ddH2O to add to the drinking water.

Primer          Sequence                          Amplicon size    E. coli specificity
EcN_fim_f       CAATGCATGGGCTGATGATTCA            102 bp           Nissle
EcN_fim_r       ATACCCTTTTTTTGAAAACTTACCGAGATC
Ecoli_glcB_f    ATGAGTCAAACCATAACCCAGAGC          153 bp           Nissle, B, MG1655
Ecoli_glcB_r    ACGATTTTCTGGTGCCAGATCAT

Table 4-1 Primers to identify E. coli Nissle 1917. Using both sets of primers, E. coli B would only show a 153 bp band, while E. coli Nissle would show both 102 bp and 153 bp bands. Details on generating species-specific primers based on potentially species-specific genes are described in Chapter 3.

Mouse experiment 1 (Figure 4-2): On Day 1, mice were given streptomycin in their drinking water at a final concentration of 5 mg/mL. The next day, the water was changed to a solution of EcN control cells at a final concentration of 10⁵ CFU/mL, supplemented with 2% sucrose and 0.1 mg/mL streptomycin. Sucrose was included to boost the palatability of the water. The lower streptomycin concentration (referred to as “low strep” hereafter) was necessary to ensure EcN cells maintained the plasmid. The drinking water was switched to simply low strep on Day 3. On Day 4, four different treatments were administered via another water change. All water bottles contained low strep and 2% sucrose, with or without additional supplements.
Group A (n = 4 mice) received no additional supplements in the drinking water, Group B (n = 4 mice) received 2% L-arabinose, Group C (n = 12 mice) received 10¹⁰ PFU/mL of phages T6 and RB33, and Group D (n = 12 mice) received 2% L-arabinose and 10¹⁰ PFU/mL of phages T6 and RB33. On Day 5, EcN !phage cells were administered in the drinking water at a final concentration of 10⁸ CFU/mL, with 2% sucrose and low strep. The water was switched back to low strep only on Day 6 and maintained for the remainder of the experiment (to Day 16).

Figure 4-2 Mouse experiment 1 design to test effect of phage and/or sugar. Phages T6 and RB33 were used with or without arabinose as the sugar source. Triangles represent days of fecal pellet collection and plating. Groups A and B: n = 4 mice each. Groups C and D: n = 12 mice each.

Mouse experiment 2 (Figure 4-3): Streptomycin at 5 mg/mL final concentration in drinking water was re-administered to the mice a day after the first experiment ended. The next day, now Day 2 of this second experiment, the water was changed to a solution of EcB control cells at a final concentration of 10⁵ CFU/mL with 0.1 mg/mL streptomycin. On Day 3, the water was switched to low strep only. Since we noticed a mixture of two morphologies on the platings from Day 3, we suspected that there may be remaining EcN cells in the mouse gut from the previous experiment. Using EcN-specific primers (Table 4-1) in colony PCRs, we confirmed that two out of two colonies per mouse for 15 different mice across various groups were actually EcN and not EcB. Therefore, to properly test strain replacement, we proceeded with EcN !phage cells to displace the EcN control population. On Day 4, a different treatment was administered to mice reassigned to the four different groups. Group A continued to receive low strep water, while Groups B, C, and D all received 10¹⁰ PFU/mL of phage T6.
On Day 5, 10⁸ CFU/mL of EcN !phage cells were administered in low strep drinking water to all groups. On Day 6, Groups A and B returned to receiving low strep only water, while Group C received a low re-dosing of 10⁸ PFU/mL of phage T6 and Group D received a regular re-dosing of 10¹⁰ PFU/mL of phage T6 for the next four days, after which both groups also returned to receiving low strep only water.

Figure 4-3 Mouse experiment 2 design to test effect of repeated phage dosing. Phage T6 at two different concentrations were used in the repeat dosing after introduction of EcN !phage. Triangles represent days of fecal pellet collection and plating. Groups A and B: n = 4 mice each. Groups C and D: n = 12 mice each.

4.2.2.4 Plating

Fecal pellets were collected throughout the experiments to measure the efficacy of the initial streptomycin treatment, colonization of control cells, depletion by phages, and relative levels of control and !phage populations in the mouse gut across different treatment groups. On the day of collection, pellets were weighed, resuspended in 1 mL 10% PBS (in ddH2O), and homogenized at 4°C for an hour with a tabletop vortexer fitted with an adapter for holding multiple 1.5 mL Eppendorf tubes. Multiple dilutions were plated on MacConkey agar with 1% lactose to quantify growth of all E. coli cells, including those native to the mouse (323). MacConkey agar plates with 1% lactose, 0.1 mg/mL streptomycin, and 100 μM isopropyl β-D-1-thiogalactopyranoside (IPTG) were used to quantify administered YFP- versus YFP+ cells that survived the in vivo mouse gut.

4.2.3 Results

4.2.3.1 Spacer validation

We observed that the T6.Y spacer provided both E. coli Nissle 1917 and E. coli B with 6X and 67X protection against infection by phages RB33 and T6, respectively, compared to unprotected E. coli (n = 4 replicate plaque assays).
We proceeded to test whether this level of protection against phage infection could allow EcN or EcB !phage strains to displace precolonized control strains under in vivo phage selective pressures.

4.2.3.2 Mouse experiment 1 with L-arabinose and phages T6 and RB33

We observed 10^6 CFU/g of stool of lactose-fermenting Gram-negative enteric bacilli in the endogenous microbiota across the 32 mice at Day 0 (Figure 4-4). As expected, no CFU were detectable on MacConkey lactose plates on Day 1 after the 5 mg/mL streptomycin treatment. After the introduction of 10^5 CFU/mL of YFP- (EcN control) cells in the drinking water on Day 2, this population expanded to 10^9 CFU/g of stool on Day 3 in all four groups. In Group B, where mice received only additional sugar (L-arabinose) on Day 4, the biomass of YFP- cells increased by an order of magnitude. In Groups C and D, which received phages on Day 4, YFP- cells decreased by two orders of magnitude compared to the starting biomass on Day 3. However, regardless of the treatment, there was an overall decline of biomass of E. coli across all groups, and the introduced YFP+ (EcN !phage) cells did not persist at more than 10^6 CFU/g of stool beyond Day 7.

To measure the extent of strain replacement, we calculated the ratio of YFP+ to total cells after phage and/or sugar treatments (Figure 4-5). For several mice (8 out of 12) in Group D, there was an increase in the fraction of YFP+ cells in the first two days, but YFP+ cells were no longer detectable at high ratios after the fifth day post-phage (Day 10). There was complete replacement in mice 24 and 27 for at least one day; individual biomass values are shown for those two in Figure 4-11 for added clarity. In contrast, there were only 2 out of 12 mice in Group C where there was any increase in the ratio from Day 6 to Day 7. The low ratios in Group B are a result of the YFP- bloom from the sugar only treatment.
The data from Group A suggest that when a second population is introduced, it can persist for one day, but will be nearly undetectable a few days later, with the exception of mouse 32.

Figure 4-4 Biomass of YFP- and YFP+ cells in mouse experiment 1. The quantified populations of YFP- (EcN control) cells and YFP+ (EcN !phage) cells are plotted as boxplots at each day of fecal pellet collection. Whiskers on each boxplot represent the minimum and maximum values. Raw data points are shown in supplementary Figure 4-8.

Figure 4-5 Fraction of replaced cells in mouse experiment 1. The proportion of YFP+ cells out of total cells from quantitative culturing is shown for each mouse in each group. For each mouse, the values are plotted across four fecal pellet collection times after YFP+ introduction in the drinking water (increasingly lighter shades of blue). Group A: control. Group B: sugar only. Group C: phage only. Group D: sugar and phage.

4.2.3.3 Mouse experiment 2 with repeated phage T6 dosing

Since we assumed that the same 5 mg/mL streptomycin treatment would eliminate the EcN cells we introduced in mice in the first experiment, we only began plating fecal pellets after the introduction of EcB control cells in this second experiment. Given that EcB control cells were YFP-, we suspected that there were residual EcN cells when we found YFP+ cells in a few of the mice. Moreover, we noticed a mixture of small and large colonies in platings from several mice. To determine if these cells were EcB or EcN, we performed PCRs on colonies using diagnostic primer sets that could distinguish E. coli Nissle from other E. coli, and found that all colonies we characterized were indeed residual EcN control cells from the previous experiment.

The presence of YFP+ cells at Day 3, after introduction of YFP- cells only, as well as the overall loss of all E. coli over the course of the experiment confounded the interpretation of the results (Figure 4-6).
We could not determine if phage re-dosing improved colonization or persistence. The most conclusive observation was that phage T6 was able to bring down YFP- cells by one to two orders of magnitude in Groups B, C, and D. Since there were no colonies on many of the platings, we focused on results from individual mice with any non-zero YFP+ counts (Figure 4-10 in Supplementary section).

Figure 4-6 Biomass of YFP- and YFP+ cells in mouse experiment 2. The quantified populations of YFP- (control) cells and YFP+ (!phage) cells are plotted as boxplots at each day of fecal pellet collection. Whiskers on each boxplot represent the minimum and maximum values. Raw data points are shown in supplementary Figure 4-9.

Interestingly, YFP+ cells were able to colonize in two mice in Group A, without phage treatment. These observations were consistent with the calculated ratios of YFP+ to total cells (Figure 4-7). Besides mice 24 and 28, which were anomalies from the first experiment, there were only six other mice with any YFP+ cells at any time point during this experiment. YFP- and YFP+ cells coexisted at roughly equivalent levels in mouse 32, though that was likely carried over from the previous experiment. In Group B, only mouse 27 exhibited YFP+ enrichment after the first phage dose. In Group D, only mouse 15 had YFP+ cells appear transiently during the second phage dose.

Figure 4-7 Fraction of replaced cells in mouse experiment 2. The proportion of YFP+ cells out of total cells from quantitative culturing is shown for each mouse in each group across the three time points, the initial value (orange) and two time points after introduction of YFP+ cells in the drinking water (increasingly lighter shades of purple). Group A: control. Group B: phage only. Group C: phage and low phage re-dose. Group D: phage and regular phage re-dose.
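The quantities reported in these results (CFU/g of stool and the fraction of YFP+ cells out of total cells) follow from simple arithmetic on plate counts. The Python sketch below illustrates one way to compute them. It is not the analysis code used in this work: the function names, the plated volume, and the example numbers are hypothetical, and the only parameter taken from the text is the 1 mL resuspension volume.

```python
def cfu_per_gram(colonies, dilution_factor, volume_plated_ml,
                 pellet_mass_g, resuspension_volume_ml=1.0):
    """Estimate CFU per gram of stool from a single plate count.

    colonies               -- colonies counted on the plate
    dilution_factor        -- fold dilution of the homogenate (1e4 for a 10^-4 dilution)
    volume_plated_ml       -- volume spread on the plate (assumed; not stated in the text)
    pellet_mass_g          -- weighed mass of the fecal pellet
    resuspension_volume_ml -- pellets were resuspended in 1 mL
    """
    if pellet_mass_g <= 0 or volume_plated_ml <= 0:
        raise ValueError("pellet mass and plated volume must be positive")
    # Concentration in the undiluted homogenate
    cfu_per_ml = colonies * dilution_factor / volume_plated_ml
    # Total CFU in the homogenate, normalized to pellet mass
    return cfu_per_ml * resuspension_volume_ml / pellet_mass_g


def yfp_positive_fraction(yfp_pos_cfu, yfp_neg_cfu):
    """Fraction of YFP+ cells out of total (YFP+ plus YFP-) cells.

    Returns None when no colonies of either type were recovered,
    since the ratio is undefined in that case.
    """
    total = yfp_pos_cfu + yfp_neg_cfu
    if total == 0:
        return None
    return yfp_pos_cfu / total


if __name__ == "__main__":
    # Hypothetical plate: 50 colonies at a 10^-4 dilution, 0.1 mL plated, 50 mg pellet
    density = cfu_per_gram(50, 1e4, 0.1, 0.05)
    print(f"{density:.2e} CFU/g of stool")
    print(yfp_positive_fraction(3e6, 1e6))
```

Returning None for plates with no colonies keeps mice with undetectable E. coli from being reported as having a ratio of zero, which matters here because many platings yielded no colonies.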
4.2.4 Discussion

In the first experiment, we have preliminary evidence that L-arabinose may enhance phage selection against phage-susceptible strains, but it appeared inconsistent across the 12 mice in Group D (Figure 4-5). Furthermore, L-arabinose placed the YFP+ cells at a lower starting biomass relative to the pre-colonized YFP- cells; there were roughly equivalent CFU/g of stool values for YFP+ and YFP- cells at Day 6 in Groups A and C, which did not receive L-arabinose, while the YFP- biomass was at least 10X lower at Day 6 in Groups B and D, which received L-arabinose (Figure 4-4). And the sugar accelerated the loss of YFP- cells in Group D compared to Group C, at Days 10 and 16. In the second mouse experiment, it is unclear whether repeated phage dosing enhanced the colonization of YFP+ cells, since only one mouse from Group B (without repeat phage) and one mouse from Group D (with repeated phage at full dose) exhibited new YFP+ enrichment.

From these studies, we have learned valuable lessons about how to carry out future mouse experiments. First, we need to address the overall decline in the biomass of introduced E. coli after we drop the concentration of streptomycin from 5 mg/mL to 0.1 mg/mL, which reflects the recolonization of endogenous gut flora (324). We chose to maintain streptomycin in the drinking water to ensure the engineered E. coli maintained our plasmid, yet we could not continue with the 5 mg/mL concentration to keep endogenous gut microbiota in check because the level of streptomycin resistance conferred by aadA on our plasmid was 0.05 to 0.2 mg/mL. We also measured that 0.025 mg/mL streptomycin was too low to select against E. coli without the aadA plasmid. For follow-up studies, we have already used recombineering to construct a streptomycin-resistant EcN strain with the well-documented rpsL mutation (K43R; AAA to AGA) that confers high levels of streptomycin resistance in E. coli (325).
In fact, we have confirmed that this EcN strR strain can grow in 5 mg/mL of streptomycin.

Besides improving streptomycin resistance of our strains and preventing biomass loss in the streptomycin-treated mouse model, we also need to characterize more active CRISPR spacers against the phages we use in the animal experiment. Spacer T6.Y only confers at most 100X protection against infection by phage T6; this level of immunity is three to four orders of magnitude lower than the high-activity spacers we later characterized for phage T4 (Section 4.3). Using a high-throughput library screen (see Section 4.5) for effective anti-phage spacers, we have validated six spacers against phage T6 that are four orders of magnitude more protective than T6.Y. Using these new spacers, we can increase the phage resistance of our YFP+ EcN strain, which should in turn improve its fitness advantage under phage selection to more stably colonize the murine gut.

4.2.5 Acknowledgements

This work was supported by the Wyss Institute for Biologically Inspired Engineering. SJY was also supported by a National Science Foundation Graduate Research Fellowship and KME by the Wyss Technology Development Fellowship. We thank Amanda Graveline and Andyna Vernet for assistance and training with mouse experiments.

4.2.6 Supplementary figures

Figure 4-8 Raw data points from mouse experiment 1. Dots represent the mean for YFP+ (blue) or YFP- (orange) across mice for that day. Group A: control. Group B: sugar only. Group C: phage only. Group D: sugar and phage.

Figure 4-9 Raw data points from mouse experiment 2. Dots represent the mean for YFP+ (blue) or YFP- (orange) across mice for that day. Group A: control. Group B: phage only. Group C: phage and low phage re-dose. Group D: phage and regular phage re-dose.

Figure 4-10 Individual mouse data from experiment 2. M# represents the mouse number, assigned to Group A/B/C/D labeled above. Group A: control. Group B: phage only.
Group C: phage and low phage re-dose. Group D: phage and regular phage re-dose.

Figure 4-11 Raw data points for mice #24 and #27. The top two graphs are from mouse experiment 1, in which mice 24 and 27 were in Group D (sugar and phage). The bottom two graphs are from mouse experiment 2, in which mouse 24 was assigned to Group C (phage and low phage re-dose) and mouse 27 to Group B (phage only).

4.3 CRISPR/Cas9-mediated phage resistance is not impeded by T4 DNA modifications

This section has been adapted from: Stephanie J. Yaung, Kevin M. Esvelt, George M. Church. CRISPR/Cas9-mediated phage resistance is not impeded by the DNA modifications of phage T4. PLOS ONE 9(6):e98811 (2014). Ref. (326)

4.3.1 Abstract

Bacteria rely on two known DNA-level defenses against their bacteriophage predators: restriction-modification and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems. Certain phages have evolved countermeasures that are known to block endonucleases. For example, phage T4 not only adds hydroxymethyl groups to all of its cytosines, but also glucosylates them, a strategy that defeats almost all restriction enzymes. We sought to determine whether these DNA modifications can similarly impede CRISPR-based defenses. In a bioinformatics search, we found naturally occurring CRISPR spacers that potentially target phages known to modify their DNA. Experimentally, we show that the Cas9 nuclease from the Type II CRISPR system of Streptococcus pyogenes can overcome a variety of DNA modifications in Escherichia coli. The levels of Cas9-mediated phage resistance to bacteriophage T4 and the mutant phage T4 gt, which contains hydroxymethylated but not glucosylated cytosines, were comparable to phages with unmodified cytosines, T7 and the T4-like phage RB49. Our results demonstrate that Cas9 is not impeded by N6-methyladenine, 5-methylcytosine, 5-hydroxymethylated cytosine, or glucosylated 5-hydroxymethylated cytosine.
4.3.2 Introduction

Bacteria utilize an assortment of anti-phage defense mechanisms, including two that act at the nucleic acid level: restriction-modification and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems. Some bacteriophages have developed extensive modifications to their DNA that enable them to evade host restriction endonucleases. For example, phage T4 replaces each cytosine with hydroxymethylated cytosine (hmC), then glucosylates the hydroxymethyl group to form glucosylated hmC (ghmC) (327). The bound glucose shelters the phage genome from the host’s modified cytosine restriction systems, McrA, McrBC, and Mrr, which recognize methylcytosines and hmCs but not ghmCs (328).

CRISPR-Cas systems also function as endonucleases, though unlike restriction enzymes, their recognition sites are programmable by CRISPR RNAs (crRNAs) (329). As an adaptive immune system, CRISPR-Cas components incorporate fragments of DNA from invading viruses or plasmids into arrays composed of spacers interspersed with repeats on the genome (330, 331). In Type II CRISPR systems, transcribed arrays are processed into crRNAs that form a complex with the RNA-guided Cas9 nuclease and a trans-activating RNA (tracrRNA) (299). The crRNA guides the complex to double-stranded DNA “protospacer” sequences that match the sequence of the spacer and are flanked by a “protospacer adjacent motif” (PAM) unique to the CRISPR system (319). If spacer-protospacer base-pairing is a close match, Cas9 cuts both strands of DNA, often eliminating the plasmid or phage. We sought to determine whether various DNA modifications known to block restriction systems can similarly impede CRISPR-Cas defenses.

4.3.3 Materials and methods

4.3.3.1 Bioinformatics search

We derived a list of 1749 unique spacers from several sources: 49 E.
coli strains with CRISPR structures in the CRISPRdb database (http://crispr.u-psud.fr/crispr/, (332)), 72 strains in the ECOR collection (333), 263 strains isolated from humans or animals in various regions of France (334), and 194 Shiga toxin-producing E. coli (STEC) strains (335). CRISPR array sequences were processed in CRISPRfinder (http://crispr.u-psud.fr/Server/, (336)) to extract spacer sequences. We performed BLASTn searches (http://blast.ncbi.nlm.nih.gov/, (286)) with a word size of seven optimized for short sequences and an E-value of less than 0.1, which corresponded to at least roughly 14 matched nucleotides in the T2/T4/T6 genomes search and at least 17 matched nucleotides in the all T4-like genomes search. We screened hits by first looking for a concentration of exact nucleotide matches at the 5’ end, which would be consistent with a seven-nucleotide “seed” region that does not tolerate mismatches (337). Outside the seed sequence, at least five mismatches are tolerated (337), though the upper limit of tolerable mismatches has not been characterized in the E. coli CRISPR system. We then checked for a properly oriented E. coli Type I-E CRISPR PAM such as AAG, ATG, AGG, and GAG in the targeted sequence.

4.3.3.2 Bacterial strains and plasmid construction

In addition to wild-type E. coli K-12 MG1655 and E. coli B, we used methyltransferase-deficient (dam–/dcm–) E. coli K-12 (ER2925, New England Biolabs, Ipswich, MA) and restriction-deficient (mcrA– mcrBC– mrr– hsdR–) E. coli K-12 (ER1821, New England Biolabs). E. coli were grown at 37°C in LB broth and supplemented with antibiotics as needed at final concentrations of 100 μg/mL spectinomycin, 30 μg/mL chloramphenicol, 300 μg/mL erythromycin, and 100 μg/mL carbenicillin. Cells expressing SpCas9 were constructed by transforming in DS-SPcas (Addgene plasmid 48645, (303)), which encodes SpCas9 and its cognate tracrRNA on a backbone with a cloDF13 origin of replication and aadA gene.
In the dam/dcm methylation studies, we assembled a compatible protospacer plasmid encoding all five of the target sequences with their PAMs; we placed the control, dam1, and dcm1 sequences after a pBR322 origin of replication, and the dam2 and dcm2 sequences after a bla gene. In the T7 infection assays, the spacer was expressed on DS-SPcas such that there was no separate spacer plasmid. In all other experiments, we maintained the designed spacer on a separate plasmid (based on PM-SP!TB, Addgene plasmid 48650, (303)) that expressed one spacer followed by the SpCas9 repeat on a backbone with a p15a origin of replication and cat gene. When a different resistance marker was needed, we switched cat with EryR.

4.3.3.3 Bacteriophage strains and propagation

Phage T7 stock was propagated in E. coli K-12 MG1655 and RB49 stock (obtained from H. M. Krisch) was propagated in E. coli B. Wild-type T4 stock was propagated in E. coli K-12 MG1655. Phage T4 gt (a gift from New England Biolabs) is T4 α-gt57 β-gt14, which does not have functional α- and β-glucosyltransferases (338). Because the E. coli restriction system recognizes and cleaves hmC, preventing T4 gt from plaquing efficiently, we conducted all experiments involving this phage in the restriction-deficient E. coli K-12 host ER1821. In phage stock preparation, an overnight bacterial host culture was diluted 1:100 in LB, inoculated with phage, and grown for 2.5-5 hours (during which the turbidity of cultures rose and then fell due to lysis). The lysates were spun down at 8000 x g for 5 minutes at 4°C to remove cell debris. The supernatant was filtered through a 0.45 μm membrane and stored at 4°C.

4.3.3.4 Transformation assays

We prepared protospacer and spacer plasmids from a dam+/dcm+ strain, NEB Turbo (New England Biolabs), and performed transformation assays using E. coli K-12 MG1655 bacteria containing the protospacer plasmid and DS-SPcas.
After transforming equimolar amounts of each spacer plasmid and selecting for all three plasmids (DS-SPcas, protospacer, and spacer), we quantified the number of transformants relative to a transformed spacer plasmid that did not target the protospacer plasmid. We also reversed the transformation order for one set of experiments; that is, we transformed the protospacer plasmid into E. coli already carrying DS-SPcas and each spacer plasmid. We observed comparable numbers of transformants regardless of order. We repeated the same transformations in methyltransferase-deficient E. coli K-12 using equimolar unmethylated protospacer and spacer plasmids, which were prepared from E. coli K-12 dam–/dcm–. Again, for one set of experiments, we reversed the transformation order and noted similar numbers of transformants.

4.3.3.5 Plaque assays and efficiency-of-plating calculations

To characterize the level of phage resistance conferred by Cas9, we infected normalized densities of protected E. coli with equal titers of phages and counted the number of plaques. Equal cell densities were obtained by diluting an overnight culture and normalizing to an OD600nm of 0.3 after several hours of growth. We added 2 μL of phage to 120 μL of cells, mixed them thoroughly in 1 mL of 0.6% top agar with appropriate antibiotics within 20 minutes, and poured the mixture onto 3 mL of 1.5% solid agar. Independent experiments were performed with different phage dilutions. To calculate an efficiency of plating (EOP), we divided the phage titer from plating the phage on a protected strain by the phage titer from plating the phage on a susceptible wild-type strain.

4.3.4 Results

4.3.4.1 Natural spacers target phages with modified DNA

We began by attempting to discover naturally acquired spacers in bacteria that target phages known to contain modified DNA.
Only a handful of phage families have been identified with completely modified DNA, including Bacillus subtilis phage PBS2, Synechococcus elongatus phage S2L, and Escherichia coli phage T4 (339). Since CRISPR-Cas systems and phages of E. coli have been better studied than those of the other bacterial hosts, we focused on 1749 unique E. coli spacers in available array sequences from the ECOR collection, Shiga toxin-producing E. coli (STEC), and other databases. Upon searching for candidate protospacers in phages T2, T4, and T6, all of which contain ghmC DNA (327), we found one hit that matched 25 of 32 nucleotides in T2’s gene 38, although this spacer was only found in one human-associated E. coli (Figure 4-12A). In an expanded search including T4-like phages, we identified another hit with 29 nucleotides matching phage CC31’s gp35 (Figure 4-12B). CC31 is the only known non-T-even type phage with predicted glucosyltransferase genes (340), which are required for generating ghmC from hmC. This spacer was found in many different E. coli isolates.

Figure 4-12 Native E. coli spacers target phage with modified DNA. In a BLASTn search, 1749 unique spacers from sequenced E. coli CRISPR arrays were queried against T4-like phage genomes. (A) Spacer S641 matches 25 of 32 nucleotides in phage T2. The putative protospacer has a permissible E. coli CRISPR PAM AAG and the matching nucleotides are concentrated at the 5’ end as a seed sequence. The spacer originated from the CRISPR1 locus of E. coli strain 579, a human-associated isolate from France. (B) Spacer S134 matches 29 of 32 nucleotides in phage CC31. While the protospacer in phage CC31 has five nucleotides inserted in the center of the sequence, there are 15 exactly matched nucleotides at the 5’ end in addition to 14 matched nucleotides after the insertion. The PAM GAG and strongly matched seed region suggest it is a plausible E. coli CRISPR target. This spacer was found in several strains, including E. coli C str.
ATCC 8739, ECOR strains 17 through 21, one farm pig and two human fecal samples in France, duck and cattle fecal samples in Australia (341), and enterotoxigenic E. coli (ETEC) strain UMNK88. The spacer and matching protospacer are in blue, the transcribed CRISPR RNA (crRNA) in bold black, and PAM sequence in red.

The potential presence of natural spacers targeting phage with modified DNA suggests that CRISPR-Cas systems may overcome this form of phage defense. To test this hypothesis, we explored the extent to which the Type II-A Streptococcus pyogenes Cas9 (SpCas9), the most commonly used CRISPR-Cas system for genome engineering, is able to cleave various forms of modified DNA.

4.3.4.2 Cas9 cuts N6-methyladenine and 5-methylcytosine in E. coli

DNA adenine methyltransferase (dam) methylates the adenine in 5’-GATC-3’, while DNA cytosine methyltransferase (dcm) methylates the internal cytosine in 5’-CCTGG-3’ and 5’-CCAGG-3’ in E. coli. We designed target sequences containing one to two dam or dcm sites as well as a control target sequence with no methylation sites (Figure 4-13A). We prepared spacer and protospacer plasmids from a dam+/dcm+ strain and selected for the coexistence of each spacer and its targeted protospacer in transformation assays using dam+/dcm+ cells expressing SpCas9. All targeted sequences yielded 10^2- to 10^3-fold fewer transformants than the non-targeted control regardless of whether they contained dam or dcm methylation sites (Figure 4-13B). We observed similar values in methyltransferase-deficient (dam–/dcm–) E. coli K-12, in which all plasmids were prepared from a dam–/dcm– strain and were thus unmethylated. Overall, we detected no difference in Cas9 activity on adenine-methylated, cytosine-methylated, and unmethylated target sequences.
These results are consistent with reports showing adenine methylation does not affect CRISPR-mediated phage resistance in Streptococcus thermophilus (342) and cytosine methylation does not affect SpCas9 activity on sequences with CpG sites in human cells (343).

Figure 4-13 Cas9 cuts methylated cytosines and adenosines in E. coli. (A) Synthetic targets were designed to contain one to two dam (orange) or dcm (blue) sites. A control unmethylated sequence (+) was included. The PAM sequence NGG for SpCas9 recognition is underlined. (B) In serial transformations, we selected for the coexistence of DS-SPcas, the protospacer plasmid, and each spacer plasmid. The number of transformants was divided by the number of colonies resulting from a control transformation using a spacer plasmid (-) that did not target the protospacer plasmid. This relative number of transformants is plotted for E. coli K-12 and E. coli K-12 dam–/dcm– from three independent experiments. Lines represent the median.

4.3.4.3 Cas9 provides resistance against phages T7 and T4-like RB49 with unmodified DNA

We next tested the ability of SpCas9 to provide resistance to lytic phages without DNA modifications by constructing spacers against phages T7 and RB49, neither of which contains modified DNA. RB49 is a T4-like phage that is missing hydroxymethylase and β-glucosyltransferase, which are required for modifying cytosine to hmC and hmC to β-ghmC, respectively (344). We designed four spacers: two targeting the gene encoding the primase/helicase enzyme of T7 (Figure 4-14A) and two targeting the gene encoding the major capsid protein of RB49 (gp23), which is one of the most conserved regions across T-even phages (344) (Figure 4-14C). We transformed each spacer-encoding plasmid into SpCas9-expressing E. coli K-12 MG1655 and E. coli B to create strains protected from T7 and RB49 infection.
We challenged these strains with phage to calculate an efficiency of plating (EOP) compared to unprotected strains; representative plaque plates are included (Figure 4-14A and Figure 4-14C). In E. coli B, T7 had an EOP of 10^-3 on cells expressing spacer 1 or 2 relative to cells without spacers (Figure 4-14B). In E. coli K-12, spacer 1 reduced sensitivity to T7 infection by four orders of magnitude, though spacer 2 only lowered sensitivity by one order of magnitude for unknown reasons. RB49 had an EOP of 10^-6 on E. coli B with spacer 1 or 2, and an EOP of 10^-5 on E. coli K-12 with spacer 1 or 2 (Figure 4-14D). The decreased plaquing efficiencies of T7 and RB49 on protected strains reflect Cas9 activity against invading unmodified phage DNA.

Figure 4-14 Cas9 reduces E. coli susceptibility to phages T7 and RB49. (A) Spacers against T7 were targeted against the primase/helicase gene (gene 4A and 4B). The PAM is underlined in the sequence and shown as a black box in the diagram showing the orientation and location of the protospacer (white box) on the gene. In a representative T7 plaque assay of protected and unprotected strains, there is substantial lysis on wild-type (wt) E. coli K-12, visible plaquing on cells with spacer 2 (sp 2), and no plaques on cells with spacer 1 (sp 1). (B) The efficiency of plating of T7 was calculated for each protected strain relative to the unprotected wild-type strain. Independent replicates of E. coli B (n = 4, 3, 3) and E. coli K-12 (n = 5, 5, 7) are plotted. Lines represent the median. (C) Spacers against RB49 were constructed against the major capsid protein (gp23). In a typical RB49 plaque assay, there is notable lysis on wild-type E. coli B, some plaques on cells with spacer 1, and a few plaques on cells protected with spacer 2. (D) The efficiency of plating of RB49 was quantified for each protected strain relative to the unprotected wild-type strain. Shown are independent replicates of E. coli B (n = 5, 3, 3) and E.
coli K-12 (n = 3, 3, 3). Lines represent the median.

4.3.4.4 Cas9 provides resistance against mutant phage T4 with hmC DNA and wild-type T4 with ghmC DNA

Having established that Cas9 can confer resistance against non-modified phage, we proceeded to challenge it with T4 phage containing either hmC or ghmC DNA. During replication, wild-type T4 synthesizes hmC, which contains a hydroxymethyl group attached to the C5 position of cytosine, by using hydroxymethylated dCTP serially converted from dCTP (345). Then phage-encoded glucosyltransferases add a glucose group to the hydroxymethyl group in α- or β-configuration (346) (Figure 4-15A). To investigate Cas9 activity against T4 without glucosylated DNA, we included mutant phage “T4 gt”, which has hmC rather than ghmC due to non-functional glucosyltransferases (338). By using restriction enzymes with varying sensitivity to modified cytosines (according to REBASE, http://rebase.neb.com/), we confirmed that our stocks of phage T4 had ghmC, phage T4 gt had hmC, and phage RB49 did not have ghmC or hmC (Figure 4-16).

Since T4 gp23 is homologous to gp23 from RB49, we modified our two spacers against RB49 to match the sequences of T4, and also designed an additional spacer (Figure 4-15B). We tested these spacers using efficiency-of-plating experiments as before; representative plaque plates are shown (Figure 4-15C). Assays involving T4 gt used restriction-less E. coli K-12 because wild-type K-12 restricts hmC DNA; the EOP of T4 gt on E. coli K-12 MG1655 is 10^-4 compared to T4 on MG1655 (Figure 4-17). In restriction-less E. coli K-12, T4 gt exhibited an EOP of 10^-6 to 10^-5 on cells carrying any one of the three spacers (Figure 4-15D). Wild-type T4 displayed an EOP of 10^-5 on E. coli K-12 MG1655 with spacers 1 or 2, and an EOP of 10^-3 on cells expressing spacer 3. On E. coli B with any of the three spacers, T4 had an EOP of 10^-6 to 10^-4.
As the reductions in EOP for both T4 gt and wild-type T4 were comparable to those for the non-modified T4-like phage RB49, our results demonstrate that SpCas9 is not impeded by hydroxymethylation or glucosyl-hydroxymethylation of phage DNA.

Figure 4-15 Cas9 reduces E. coli susceptibility to phages T4 and T4 gt. (A) The structures of cytosine and modified cytosines are shown. T4 gt has 100% hydroxymethylated cytosines (hmCs). T4 has 100% glucosyl-hydroxymethylated cytosines (ghmCs), specifically 70% α- and 30% β-ghmCs. The ghmC structure shown is in the β-configuration. (B) Spacers against T4 were also designed against the major capsid protein (gp23), which is homologous to that of RB49. For comparison, the RB49 protospacers are aligned below in italics, where dots indicate identical nucleotides. In the T4 sequences, the PAM is underlined. The PAM (black box) and protospacer (white box) are represented on the gene. (C) In a typical plaque assay with T4 gt (left plate), there was complete lysis on wild-type (wt) restriction-less (r-l) E. coli K-12 and few plaques on cells with spacers 1, 2, or 3 (sp 1, sp 2, or sp 3). In an assay with T4 (right plate), there was complete lysis on wild-type E. coli K-12 MG1655, numerous plaques on cells with spacer 1 or 3, and about a dozen on spacer 2. (D) The efficiency of plating of T4 and T4 gt was quantified for each protected strain relative to the unprotected wild-type strain. Independent replicates of restriction-less E. coli K-12 (n = 5, 3, 3, 5), E. coli K-12 (n = 4, 4, 5, 6), and E. coli B (n = 5, 3, 3, 3) are plotted. Lines represent the median.

Figure 4-16 Restriction digest of phages. Phage DNA was extracted by using the Qiagen Blood and Tissue Kit on 200 μL of phage stock. 10 or 20 U of each enzyme (1 μL) was added to 5 μL of 10X CutSmart Buffer (NEB) in a 50 μL reaction volume containing approximately 100 ng of phage RB49 or T4 DNA, or 800 ng of T4 gt DNA.
The reactions were incubated at 37°C for 4 hours before visualizing on a 1% agarose gel stained with SYBR Gold. As expected, DraI cuts all three DNAs (RB49, T4 gt, and T4); HpaII and NheI are sensitive to methylated cytosines and only cut RB49; and XbaI has 50% activity on hmC and partially cuts T4 gt. Blue text denotes cutting.

4.3.5 Discussion

Our discovery that S. pyogenes Cas9 is insensitive to methylation, hydroxymethylation, and glucosyl-hydroxymethylation renders it unique among current genome-targeting technologies, as both zinc-fingers (ZFs) and transcription activator-like (TAL) effectors can be engineered to discriminate 5-methylcytosine from cytosine (347, 348). This difference may be useful for biotechnological applications.

In our bioinformatics search for candidate natural spacers, we were only able to identify two possible sequences against T4-like phages. This type of bioinformatics search is hampered by the currently limited knowledge of specificity and tolerability of mutations in both the acquisition and interference stages of CRISPR systems. While this paper was under review, Fineran et al. published a report exploring the robustness of the E. coli CRISPR system, in which degenerate target regions with up to 13 mutations in the protospacer and PAM can promote “priming,” a positive-feedback mechanism to incorporate new spacers based on mutated or outdated spacers (349). This suggests more lenient bioinformatics searches would be allowable. Furthermore, our search is limited by available sequences of E. coli and phages known to modify their DNA, as well as the possibility that these isolates do not encounter T4-like phages in their environments. Future searches may provide additional evidence of CRISPR-based immunity to DNA-modifying phages.

Interestingly, we observed that different spacers conferred differing levels of resistance against phage infection.
Since mutations in the protospacer or PAM can allow phage to escape (350, 351), we sequenced Cas9-targeted regions of plaques that appeared on protected strains. Indeed, T4 and T7 plaques on protected E. coli had mutated one nucleotide in the PAM, or one to two nucleotides in the protospacer (Table 4-2). Less effective spacers may be targeting sequences that are more readily mutated, though we cannot rule out the non-mutually exclusive possibility that Cas9 acts more slowly on certain sequences and thus allows phage-induced lysis to outpace Cas9-enabled protection. In S. thermophilus CRISPR1 and CRISPR3 systems, the uncut phage genome can still be observed in bacteriophage-insensitive mutants (352, 353). Further investigation of how some but not all phage DNA molecules escape Cas9 cutting during phage infection is needed.

While phages may inactivate CRISPR proteins (354) or encode their own CRISPR-Cas systems (355), we have demonstrated that DNA modifications that normally circumvent bacterial restriction systems do not impede Type II CRISPR systems. Our findings may help explain why DNA modifications remain uncommon among bacteriophages characterized to date whereas nearly half of bacteria have CRISPR structures (332).

Figure 4-17 Efficiency of plating of T4 gt on wild-type E. coli K-12. Calculated relative to either T4 infecting E. coli K-12 or T4 gt infecting restriction-less E. coli K-12, phage T4 gt forms plaques on E. coli K-12 four orders of magnitude less efficiently (red data points). As a general comparison of restriction-modification versus Cas9-mediated protection, Cas9 provides around an order of magnitude greater resistance to phage infection on average, though the level of resistance varies by sequence (blue data points). Independent replicates (n = 11, 5, 5, 5, 4, 15) are plotted; lines represent the median. Cas9+ data were compiled from experiments with various spacer sequences as described in the text.
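The efficiency of plating used throughout Figures 4-15 and 4-17 is a ratio of plaque titers: the titer on the test (protected) strain divided by the titer on the reference (unprotected) strain. The following is a minimal sketch of that arithmetic. The function names and the plaque counts in the usage note are hypothetical and are not taken from the experiments reported here.

```python
import math


def titer_pfu_per_ml(plaques: int, dilution_factor: float, volume_plated_ml: float) -> float:
    """Plaque-forming units per mL of undiluted stock from one countable plate."""
    return plaques * dilution_factor / volume_plated_ml


def efficiency_of_plating(titer_test: float, titer_reference: float) -> float:
    """EOP: titer on the test strain relative to the reference strain."""
    if titer_reference <= 0:
        raise ValueError("reference titer must be positive")
    return titer_test / titer_reference


def log10_eop(titer_test: float, titer_reference: float) -> float:
    """EOP on a log10 scale; undefined when no plaques form on the test strain."""
    eop = efficiency_of_plating(titer_test, titer_reference)
    if eop <= 0:
        raise ValueError("no plaques on the test strain; report the detection limit instead")
    return math.log10(eop)
```

As a usage example with made-up numbers, 30 plaques from 0.1 mL of a 10⁶-fold dilution on the unprotected strain and 30 plaques from 0.1 mL of a 10²-fold dilution on a protected strain give titers of 3 × 10⁸ and 3 × 10⁴ PFU/mL, that is, an EOP of 10⁻⁴.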
Phage   Target     Spacer-PAM sequence       Mutation in phage         Host
T4      spacer 1   ATATCGAAAGCAATCAGGTTAGG   ATATCGAAAGCAATCACGTTAGG   ER1821
T4 gt   spacer 1   ATATCGAAAGCAATCAGGTTAGG   ATATCGAAAGCAATCAGGTTAGC   ER1821
T4      spacer 2   AAGAACTTCCAACCGGTAATGGG   AAGAACTTCCAACCGGTAATGGC   MG1655
T4      spacer 2   AAGAACTTCCAACCGGTAATGGG   AAGAACTTCCAACCGGTAATGGC   MG1655
T4      spacer 3   GATGCTGATGCTGAACTGTCTGG   GATGCTGATGCTGAACTGTCTGA   MG1655
T4      spacer 3   GATGCTGATGCTGAACTGTCTGG   GAAGCTGATGCTGAACTGTCTGG   MG1655
T4 gt   spacer 3   GATGCTGATGCTGAACTGTCTGG   GATGCTGATGCTGAACTGTCTGT   ER1821
T7      spacer 1   TTCGGGAAGCACTTGTGGAATGG   TTCGGGAAGCACTTGTGGAATGT   MG1655
T7      spacer 1   TTCGGGAAGCACTTGTGGAATGG   TTCGGGAAGCACTTGTGGAATTG   MG1655
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   MG1655
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   MG1655
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   B
T7      spacer 2   GATGCTTGAGGAGTCCGTTGAGG   GATGCTTGAGGAGACCGCTGAGG   B

Table 4-2 Phage escapee analysis. 13 plaques that formed on Cas9-protected host E. coli strains were sequenced at targeted sites to identify mutations. PAM sequences are underlined. Mutations are bolded and double-underlined.

4.3.6 Acknowledgements

This work was supported by US Department of Energy grant DE-FG02-02ER63445 (to GMC) and the Wyss Institute for Biologically Inspired Engineering. SJY was supported by a National Science Foundation Graduate Research Fellowship and KME by the Wyss Technology Development Fellowship.

4.4 Complete genome sequences of 11 T4-like bacteriophages

This section has been adapted from: Stephanie J. Yaung, Kevin M. Esvelt, George M. Church. Complete Genome Sequences of T4-like Bacteriophages RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68. Genome Announcements 3(1):e01122-14 (2015). Ref. (356)

4.4.1 Abstract

T4-like bacteriophages have been explored for phage therapy and are model organisms for phage genomics and evolution. Here we describe the sequencing of 11 T4-like phages.
We find high nucleotide similarity among T4, RB55, and RB59; RB32 and RB33; and RB3, RB5, RB6, RB7, RB9, and RB10.

4.4.2 Genome announcement

Complete sequences of T4-like myoviruses would enhance studies of phage evolution and genomics as well as biotechnology applications involving phage cocktails. In this study, we sequenced RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68. “RB” phages were originally isolated by Rosina Berry in 1964 from six sewage treatment plants in Long Island, New York for studies on T-even phage speciation (357).

We prepared phage lysates as previously described (326) from host Escherichia coli B (CGSC 5365), extracted DNA with the Phage DNA Isolation Kit (Bio-world, Dublin, OH), and sequenced the samples as paired-end 250 bp reads on the MiSeq instrument (Illumina, San Diego, CA). Between 789,300 (RB6) and 3,932,449 (RB7) paired reads were generated per sample. On average, 82.8% of pairs survived quality control and trimming with Trimmomatic (358). Insert sizes were ~330 bp; the median coverage of sequenced phages was 2,966 X, ranging from 259 X (RB55) to 6,985 X (RB7). We performed de novo assembly using Velvet (359) version 1.2.08 with k-mer lengths of 51, 57, and 63, and were able to obtain a single ~168 kbp contig from at least one of the assemblies. We used Geneious version 7.1.7 for post-assembly processing and filled any assembly gaps by iterative mapping of reads to the scaffold.

The circularly permuted linear double-stranded DNA genomes of the 11 RB phages have lengths of ~168 kbp. Approximately 270 open reading frames (ORFs) per phage were predicted with Glimmer 3 (360). Annotations were transferred from published genomes of T4 and T4-like phages with at least 98% similarity. Remaining ORFs were annotated by lowering the similarity cutoff to 70% or performing BLAST searches (286). Eight to ten tRNAs were predicted in each genome by tRNAscan-SE 1.21 (http://lowelab.ucsc.edu/tRNAscan-SE/, (361)).
Following the convention in T4-like phages, we oriented completed genomes to start with rIIA.

The sequenced phages share similar genome organization and nucleotide identity. Using progressiveMauve alignment (362), we found that RB7, RB27, RB33, and RB68 are 73-86% similar to one another, and are ~75% identical to T4. Furthermore, RB33 shares 99.93% similarity with RB32. RB55 and RB59 are 99.8% similar to T4 and are 99.96% identical to each other. We noted a high nucleotide similarity (99.99%) amongst RB3, RB5, RB6, RB7, RB9, and RB10. RB5 differs from RB6 by four bases (one nonsynonymous, one synonymous, and two intergenic); the nonsynonymous difference occurs in the baseplate wedge subunit and tail pin, gene product 11 (gp11). RB7 and RB9 differ by three nucleotides (two nonsynonymous and one intergenic); the two nonsynonymous bases are in the baseplate hub subunit tail length determinator (gp29) and hypothetical protein NrdC.4. The extent to which these differences affect host range is unclear, because the available data report only the total number, not the exact profile, of susceptible E. coli strains within the ECOR collection for each phage (320). Relationships between genome and host range variation could provide insights into mechanisms of host specificity.

Nucleotide sequence accession numbers. Genome sequences have been deposited in GenBank. Accession numbers are listed in Table 4-3.

Strain                       Accession no.   Genome size (bp)   Coverage (X)   No. of CDSs   No. of tRNAs
Enterobacteria phage RB3     KM606994        168,402            2,831          273           10
Enterobacteria phage RB5     KM606995        168,394            3,449          271           10
Enterobacteria phage RB6     KM606996        168,394            1,474          271           10
Enterobacteria phage RB7     KM606997        168,395            6,985          272           10
Enterobacteria phage RB9     KM606998        168,395            2,826          272           10
Enterobacteria phage RB10    KM606999        168,401            2,798          272           10
Enterobacteria phage RB27    KM607000        165,179            2,966          271           10
Enterobacteria phage RB33    KM607001        166,007            3,355          274           8
Enterobacteria phage RB55    KM607002        168,896            259            272           8
Enterobacteria phage RB59    KM607003        168,966            3,158          276           8
Enterobacteria phage RB68    KM607004        168,401            3,187          276           9

Table 4-3 Genome features of the sequenced strains

4.4.3 Acknowledgements

This work was supported by NSF Small-Business/ERC Collaborative Opportunity grant IIP-1256446 to Ginkgo Bioworks and GMC. SJY was supported by an NSF Graduate Research Fellowship and KME by the Wyss Technology Development Fellowship. DNA preparation and sequencing were completed at the Molecular Biology Core Facilities of the Dana-Farber Cancer Institute. Analysis was performed on the Orchestra cluster supported by the Harvard Medical School Research Information Technology Group. We thank Henry M. Krisch for the phages.

4.5 Generating effective CRISPR spacers against bacteriophages

4.5.1 Introduction

In this section, we describe a high-throughput library selection for effective spacers against T4-like phages. To demonstrate the approach, we focused on a subset of phages, particularly T6, RB15, RB33, and RB69, which infect a large number of E. coli strains ((320) and Figure 4-18). We included RB69 since it shares less sequence similarity with the other phages in this study (Table 4-4). Furthermore, we were interested in testing different phages, because we noticed that a highly effective spacer (with an efficiency of plating, or EOP, less than 10⁻⁴) against one phage could be similarly effective at a homologous region in another phage.
For example, based on the T4 spacer 2 we characterized in Section 4.3, which we will call spacer T4.Y in this section, we discovered that homologous spacers in phages RB49 and RB69 were similarly effective (Figure 4-19).

Figure 4-18 Host range of T4-like phages. The graph depicts the number of strains each phage could infect, based on 72 strains in the ECOR collection and 4 laboratory E. coli strains. Adapted from Ref. (320).

        T6     RB15   RB33
RB69    82%    80%    80%
RB33    97%    96%
RB15    97%

Table 4-4 Pairwise similarity of phages T6, RB15, RB33, and RB69. Values were calculated using BLAST (megablast default settings) at http://blast.ncbi.nlm.nih.gov/ (286).

Figure 4-19 Spacer Y confers protection against phages T4, RB49, and RB69. Sequence differences from the T4.Y spacer are bolded for spacers RB49.Y and RB69.Y. All three target a homologous region in the major capsid protein (gp23) gene.

Nevertheless, we found many ineffective (EOP ~ 10⁻² to 1) spacers. Given the immense variation in spacer activity against phage infection (Figure 4-21), we tested a large library of spacers against T4-like phages to identify effective anti-phage activity. In developing a selection method, we observed inconsistent results using chemostats and batch culture. Thus, we used phage-embedded soft agar, which allows for isolation of more effective spacers. After validating the approach with a mock selection experiment, we proceeded to construct a library of over 12,000 spacers targeting phages T6, RB15, RB33, and RB69. Using high-throughput sequencing to determine which spacers were enriched, we were able to identify and confirm top spacers against each phage.

4.5.2 Materials and methods

4.5.2.1 Strains and constructs

Phage T4 was obtained from the Coli Genetic Stock Center (CGSC); phage T6 from DSMZ; and phages RB15, RB33, and RB69 from H. M. Krisch. Wild-type E. coli K-12 MG1655 and E. coli B (CGSC 5365) were used for phage propagation and selection experiments.
Phage propagation and plaque assays to determine the level of Cas9-mediated resistance to phage infection were conducted as described in Section 4.3. Highly competent E. coli NEB Turbo (New England Biolabs, Ipswich, MA) were used for plasmid library construction. In general, E. coli were grown at 37 °C in LB broth and supplemented with antibiotics as needed at final concentrations of 100 μg/mL spectinomycin, 30 μg/mL chloramphenicol, and 100 μg/mL carbenicillin.

Cells expressing SpCas9 were constructed by transforming in DS-SPcas (Addgene plasmid 48645, (21)), which encodes SpCas9 and its cognate tracrRNA on a backbone with a cloDF13 origin of replication and aadA gene. We maintained the designed spacer on a separate plasmid (based on PM-SP!TB, Addgene plasmid 48650, (21)) that expressed one spacer followed by the SpCas9 repeat on a backbone with a p15a origin of replication and cat gene. For most of our experiments, we used the single guide RNA (sgRNA) form, instead of the dual RNA format (with a tracrRNA and a crRNA expressed from the spacer and repeat). In these cases we removed the tracrRNA portion from DS-SPcas to construct plasmid DS-cas9-nt. We also swapped the spacer and repeat on PM-SP!TB with the equivalent sgRNA sequence. For the mock selection experiment, we assembled a compatible plasmid with a pBR322 origin of replication and a bla gene, and, if needed to label a strain, GFP.

In constructing the spacer library for high-throughput screening against phage, we designed a destination vector for Golden Gate assembly (Figure 4-20). The vector is similar to the sgRNA format of plasmid PM-SP!TB, except that it has a stronger promoter (J23119 instead of J23110), and instead of a spacer, there is an RBS and GFP coding sequence flanked by BsaI sites. This design permits quick screening for backbone (GFP+) versus candidate clones (GFP-) after assembly with spacer inserts prepared to contain compatible overhangs.
Figure 4-20 Library construction and sequencing design.

4.5.2.2 Spacer library design

Below are the steps we took to generate an oligonucleotide library that could be synthesized as a 79 nt CustomArray Oligo Pool (CustomArray, Inc, Bothell, WA):

1. Find all possible NGGs and store the 20 nt upstream as candidate spacers for phages T6, RB15, RB33, and RB69.
2. Filter by melting temperature, keeping spacers between 50 and 57 °C as calculated by the oligoprop function in MATLAB. The range was set based on effective spacers against phages T4, T7, and RB49 that we characterized previously in Section 4.3.
3. Drop spacers with GGGG homopolymers.
4. Search and filter out spacers with off-targets to E. coli. Look at E. coli strains K-12 MG1655 and Nissle 1917 and allow one mismatch in the 15 nt closest to the PAM.
5. Rank the remaining spacers by secondary structure. Calculate the minimum free energy (MFE) of each spacer in the context of the entire sgRNA sequence using ViennaRNA (363). Select the top 12,472 for synthesis (~4000 to 5000 per phage).

Since we aimed to construct one input library per phage, but some spacers hit multiple phages, we devised a pooling strategy that would allow us to selectively combine all spacers that hit a phage regardless of their potential cross-reactivity with other phages. For example, for the T6 library, we needed to combine spacers hitting only T6, spacers hitting both T6 and one of the other three phages, spacers hitting T6 and two of the other three phages, and spacers hitting all four phages. This amounted to a total of 2⁴ − 1 = 15 different sub-pools for four phages (since we excluded the combinatorial case of 4 choose 0 in which spacers do not hit any phage), and required pooling 2³ = 8 different sub-pools for each phage. The barcodes we used were derived from species-unique primers from Chapter 3 for non-E. coli species and tweaked for no or minimal predicted secondary structure. The barcodes and primers are listed in Table 4-5.
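Steps 1 and 3 of the design procedure above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the library: the function names are hypothetical, scanning both strands is an assumption (the text does not state it), the genome is treated as a linear string, and the melting-temperature, off-target, and MFE filters (steps 2, 4, and 5) are not reproduced.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_spacers(genome: str, spacer_len: int = 20):
    """Yield (strand, spacer) for each NGG PAM with a full-length upstream spacer.

    Assumption: both strands are scanned and the sequence is linear, so PAMs
    closer than spacer_len to the 5' end of either strand are skipped.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        # Zero-width lookahead so that overlapping PAMs (e.g. in GGG) are all found.
        for match in re.finditer(r"(?=[ACGT]GG)", seq):
            pam_start = match.start()
            if pam_start >= spacer_len:
                yield strand, seq[pam_start - spacer_len:pam_start]


def passes_homopolymer_filter(spacer: str) -> bool:
    """Step 3: reject spacers that contain a GGGG run."""
    return "GGGG" not in spacer
```

For example, scanning the 23 nt T4 spacer 1 target from Table 4-2 (`ATATCGAAAGCAATCAGGTTAGG`) yields the single plus-strand candidate `ATATCGAAAGCAATCAGGTT`; the internal AGG is skipped because fewer than 20 nt lie upstream of it.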
We then designed the oligos to contain appropriate barcodes and BsaI recognition sites, which are underlined:

barcode + TGGTGCCGGTCTCATAGC + spacer + GTTTAGAGACCAGCCGTTGTG

name    sequence                                 total members
lib_f   CTTTATATCTAATATACAATGGTGCCGGTCTCATAGC    12472
lib_r   CTCATAAACATTAAAAACACAACGGCTGGTCTCTAAAC

name         sequence                  target phage(s)        total members
lib_fN1.1    GACCTTTGATAGTTACAGCGTGG   T6                     1493
lib_fN1.2    ACCATCTTCTATTGAAACGCTGG   RB15                   1404
lib_fN1.3    TGGAGAAGAAGTCGGGAATGT     RB33                   1550
lib_fN1.4    AAGTATCACTAAGCCGCATGTG    RB69                   4936
lib_fN2.12   GGAGACAGGACATCAACTTTTGG   T6, RB15               586
lib_fN2.13   GACGCTTATGGTTAGAACCTTGG   T6, RB33               478
lib_fN2.14   AGTAACGGAGATAGTGAAGATGG   T6, RB69               15
lib_fN2.23   GGTATCCTGGATTACACGAATGG   RB15, RB33             577
lib_fN2.24   GGTACTTACGTCAACTGGAATGG   RB15, RB69             26
lib_fN2.34   GGACAAGTATAAAGGGGAAGTGG   RB33, RB69             12
lib_fN3.o1   GCGGAGTTCTATAGTATGGCTG    RB15, RB33, RB69       21
lib_fN3.o2   TAATCATTAAACCCGCTGCTGG    T6, RB33, RB69         9
lib_fN3.o3   TGTTGCATCCTTCGTTGAATGG    T6, RB15, RB69         18
lib_fN3.o4   TGGGTAGTTTTGAGTTTTGGTGG   T6, RB15, RB33         1287
lib_fN4      ACGAGATACTTCAGTTCGGCT     T6, RB15, RB33, RB69   60

Table 4-5 Primers for amplifying sub-pools of oligonucleotides based on barcodes. The barcode portion is underlined. Primer lib_r is paired with each of the other 15 forward primers. Pairing it with primer lib_f will amplify all oligos.

4.5.2.3 Spacer library construction and selection on phage-embedded agar

Using 0.3 μM of primers for each sub-pool (Table 4-5) and 10 ng of the oligo library as template, we performed PCRs (KAPA HiFi HotStart ReadyMix PCR Kit, Kapa Biosystems, Wilmington, MA) consisting of 25 cycles of 15 s annealing at 64 °C and 10 s extension. For some samples, we used 20 cycles and 10 s annealing, or decreased the primer concentration to 0.1 μM. The products were purified with MinElute purification (QIAGEN), quantified by Nanodrop, and pooled by the expected frequency of each PCR sub-pool in the final library for each phage.
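The pooling arithmetic can be made concrete with the member counts in Table 4-5. The sketch below enumerates the 15 non-empty sub-pools, picks the 8 that contain a given phage, and computes each sub-pool's share of that phage's library. Interpreting "expected frequency" as the share of library members is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

PHAGES = ("T6", "RB15", "RB33", "RB69")

# Members per sub-pool, copied from Table 4-5.
SUBPOOL_SIZES = {
    frozenset({"T6"}): 1493,
    frozenset({"RB15"}): 1404,
    frozenset({"RB33"}): 1550,
    frozenset({"RB69"}): 4936,
    frozenset({"T6", "RB15"}): 586,
    frozenset({"T6", "RB33"}): 478,
    frozenset({"T6", "RB69"}): 15,
    frozenset({"RB15", "RB33"}): 577,
    frozenset({"RB15", "RB69"}): 26,
    frozenset({"RB33", "RB69"}): 12,
    frozenset({"RB15", "RB33", "RB69"}): 21,
    frozenset({"T6", "RB33", "RB69"}): 9,
    frozenset({"T6", "RB15", "RB69"}): 18,
    frozenset({"T6", "RB15", "RB33"}): 1287,
    frozenset({"T6", "RB15", "RB33", "RB69"}): 60,
}


def all_subpools(phages=PHAGES):
    """Every non-empty combination of target phages (2**n - 1 sub-pools)."""
    return [
        frozenset(combo)
        for k in range(1, len(phages) + 1)
        for combo in combinations(phages, k)
    ]


def pooling_fractions(phage: str):
    """Return (library size, {sub-pool: share}) for one phage's input library."""
    sizes = {pool: n for pool, n in SUBPOOL_SIZES.items() if phage in pool}
    total = sum(sizes.values())
    return total, {pool: n / total for pool, n in sizes.items()}
```

With the Table 4-5 counts, the per-phage libraries contain 3,946 (T6), 3,979 (RB15), 3,994 (RB33), and 5,097 (RB69) spacers, in line with the ~4000 to 5000 per phage stated in the design steps.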
The pooled PCRs were combined with the destination vector in 15 μL Golden Gate reactions, each composed of:

3 μL of destination plasmid (33 ng/μL)
3 μL of the pooled PCRs for each phage
1.5 μL 10X T4 ligase buffer (NEB)
0.15 μL 100X BSA (NEB)
1 μL BsaI (NEB)
1 μL T4 ligase (NEB; 400,000 U/mL)
5.35 μL ddH2O

Golden Gate assemblies were carried out for 25 cycles of 3 min at 37 °C and 4 min at 16 °C, followed by 1 cycle of 5 min at 50 °C and 5 min at 80 °C. Then 5 μL of each was transformed into 25 μL of NEB Turbo chemically competent cells. After 2 hours of recovery in 250 μL SOC, 10 μL was plated on LB+chloramphenicol agar for characterization while the rest of the culture was diluted into 3 mL LB+chloramphenicol for overnight growth. Plasmids were isolated from 2 mL of culture the next day (QIAprep Spin Miniprep Kit, QIAGEN).

From the platings of recovered assemblies, we found 5-7% GFP+ cells across the different libraries. When we picked some clones for Sanger sequencing (Genewiz, South Plainfield, NJ) to verify appropriate incorporation of our spacers, we discovered that 31% of clones had an indel in the 20 nt spacer region of the plasmid. This was in the expected range for oligo synthesis, which has a 0.5-1% error rate per cycle: at 0.5% per cycle, the fraction of 79 nt oligos carrying at least one error is 1 − 0.995^79 ≈ 33%.

To prepare cells for phage selection experiments, we transformed 100 ng of each library into 50 μL of electrocompetent E. coli cells. We used both E. coli B and MG1655 strains that carried the DS-cas9-nt plasmid. Cells were recovered in 500 μL SOC for 2 hours. Aliquots were plated for verification, while the rest of the culture was diluted into 5 mL with antibiotics (spectinomycin+chloramphenicol) for overnight recovery.

Fresh phage stocks with titers of ~10⁹ PFU/mL were used for preparing phage-embedded agar at 10 μL phage/mL soft agar.
6 mL of the mixture with appropriate antibiotics was poured onto one-well rectangular petri plates, which already contained 6-7 mL of cooled regular LB agar on the bottom. After the soft agar cooled, we gently spread 60 μL of cells (10 μL cells per mL of soft agar) on top, let the plates dry, and inverted them for incubation overnight at 37 °C. Two of the selections (for phages T6 and RB69) were performed in duplicate for host strain E. coli B. The next day, 3 mL of PBS buffer was used to gently but quickly scrape off cells without disturbing the phage-embedded soft agar if possible. The final 1-1.5 mL of recovered mixture was kept on ice until all samples were ready for plasmid extraction (Miniprep, QIAGEN).

4.5.2.4 Library sequencing and analysis

The extracted plasmids were used as template for nested PCRs to amplify the spacer sequences for high-throughput sequencing. We pooled each of the four input phage libraries by E. coli strain, which resulted in a total of two input samples to sequence (Bin and Min). The ten output libraries were kept separate in individual PCRs. We designed primers for two sequential PCRs: the first was an inner PCR to amplify the spacer region and the second was an outer PCR to add compatible sequencing indices (Table 4-6). We used ~50 ng of plasmids in the first 20 μL PCR with 25 cycles of 15 s annealing at 62 °C and 10 s extension. The second PCR was modified to anneal at 65 °C. Custom sequencing primers (Table 4-7) were used for a paired-end 2 x 30 bp run on the MiSeq instrument (Illumina, San Diego, CA) at the Molecular Biology Core Facilities of the Dana-Farber Cancer Institute.
First (inner) PCR primers:
L1_initial_f     GAAGCCGTTCTCGATGGACGAcagctagctcagtcctaggtataa
L2_initial_f     AGGGAACTGAAAGTGGTGGATGTGcagctagctcagtcctaggtataa
L1L2_initial_r   GACGGACAGACGGcaagttgataacggactagcctta

Second (outer) PCR primers:
L1_F        AATGATACGGCGACCACCGAGATCTACACCCTGCGGAAGCCGTTCTCGATGGACGA
input_1_R   CAAGCAGAAGACGGCATACGAGATATTACTCGGACGGACAGACGGcaagttga
T6_M_R      CAAGCAGAAGACGGCATACGAGATCGCTCATTGACGGACAGACGGcaagttga
RB15_M_R    CAAGCAGAAGACGGCATACGAGATATTCAGAAGACGGACAGACGGcaagttga
RB33_M_R    CAAGCAGAAGACGGCATACGAGATCTGAAGCTGACGGACAGACGGcaagttga
RB69_M_R    CAAGCAGAAGACGGCATACGAGATCGGCTATGGACGGACAGACGGcaagttga
T6_B2_R     CAAGCAGAAGACGGCATACGAGATTCTCGCGCGACGGACAGACGGcaagttga
L2_F        AATGATACGGCGACCACCGAGATCTACACGGAAGGTAGGGAACTGAAAGTGGTGGATGTG
input_2_R   CAAGCAGAAGACGGCATACGAGATTCCGGAGAGACGGACAGACGGcaagttga
T6_B_R      CAAGCAGAAGACGGCATACGAGATGAGATTCCGACGGACAGACGGcaagttga
RB15_B_R    CAAGCAGAAGACGGCATACGAGATGAATTCGTGACGGACAGACGGcaagttga
RB33_B_R    CAAGCAGAAGACGGCATACGAGATTAATGCGCGACGGACAGACGGcaagttga
RB69_B_R    CAAGCAGAAGACGGCATACGAGATTCCGCGAAGACGGACAGACGGcaagttga
RB69_B2_R   CAAGCAGAAGACGGCATACGAGATAGCGATAGGACGGACAGACGGcaagttga

Table 4-6 Primers for amplifying libraries for high-throughput sequencing. The first PCR uses primer L1_initial_f or L2_initial_f with L1L2_initial_r. The second PCR uses primer L1_F or L2_F with appropriate _R primers for each sample. Sequencing indices are in red.

Read 1 sequencing primers:
Set1_r1          CGTTCTCGATGGACGAcagctagctcagtcctaggtataatgctagc
Set2_r1          CTGAAAGTGGTGGATGTGcagctagctcagtcctaggtataatgctagc

Index read sequencing primer:
Set1and2_index   tagaaatagcaagttaaaataaggctagtccgttatcaacttgCCGTCTGTCCGTC

Read 2 sequencing primer:
Set1and2_r2      GACGGACAGACGGcaagttgataacggactagccttattttaacttgctatttcta

Table 4-7 Custom sequencing primers.
From the 1.1 to 1.6 million paired reads per sample, we obtained 0.8 to 1.4 million paired reads per sample (72-90%) after post-processing, which consisted of merging the reads using SeqPrep (https://github.com/jstjohn/SeqPrep) and custom Python scripts that retained reads with an exact 10 nt match to the 3’ end of the sgRNA backbone (Figure 4-20) and an exact match of the remaining 20 nt to the designed spacer library. Any 20 nt sequences that did not match the expected library accounted for very few reads and could be discarded. Statistical analysis of enriched spacers was performed using edgeR (364). Spacers with a false discovery rate (FDR) below 5% were considered significant.

4.5.2.5 Top spacer validation

For each spacer we sought to validate, we ordered two oligos to ligate into the sgRNA backbone vector. The oligos were of the form: 5’-TAGC-(20 nt spacer)-3’ and 5’-AAAC-(20 nt reverse complement of spacer)-3’. Oligos were mixed in annealing buffer (10 mM Tris, 50 mM NaCl, 1 mM EDTA, pH 7.5–8.0) and incubated at 95 °C for 3 min. After cooling to room temperature, the oligos were combined with BsaI-digested backbone in 20 μL ligation reactions (NEB T4 DNA Ligase), transformed into NEB Turbo cells, and sequenced to identify correct clones.

4.5.3 Results

4.5.3.1 Mock selection

To validate the selection approach, we started with a mixture of eight strains, each carrying one ineffective spacer against T4 (Figure 4-21). We spiked in 1% of GFP-labeled cells that had an effective spacer. After one round of selection on the phage agar, we observed ~70% GFP+ colonies, though this enrichment did not improve with a second round of selection in which we re-introduced the surviving cells onto the same concentrations of phage (Figure 4-22). We therefore used this phage-embedded agar as a one-round selection method on our larger spacer library.
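The read-filtering step described in Section 4.5.2.4 (exact match to a 10 nt backbone anchor plus exact match of the 20 nt spacer to the designed library) can be sketched as follows. This is not the custom script used in this work. The layout of the merged read (20 nt spacer followed directly by the 10 nt anchor), the function name, and the anchor sequence in the usage note are assumptions made for illustration.

```python
from collections import Counter


def count_spacers(merged_reads, library, anchor, spacer_len=20):
    """Count reads whose spacer is in the library and whose tail equals the anchor.

    Assumed layout of each merged read: <spacer (spacer_len nt)><anchor>.
    Reads of any other length, with a non-matching anchor, or with a spacer
    that is not an exact library member are dropped.
    """
    library = set(library)
    expected_len = spacer_len + len(anchor)
    counts = Counter()
    for read in merged_reads:
        if len(read) != expected_len:
            continue
        spacer, tail = read[:spacer_len], read[spacer_len:]
        if tail == anchor and spacer in library:
            counts[spacer] += 1
    return counts
```

As a usage example, with a two-member library and a placeholder anchor such as `ACGTACGTAC` (hypothetical, not the real backbone sequence), a read with a single mismatch in either the spacer or the anchor is discarded rather than assigned to the nearest library member, which mirrors the exact-match criterion in the text.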
4.5.3.2 Library selection

Relative to the input library, the abundance of spacers after phage selection ranged from 1000X depletion to 1000X enrichment, though median log fold changes were near zero across samples (Figure 4-23). We also found potential differences between host strains; while relative fold changes correlated (R² ~ 0.5) for phages T6 and RB15, values were not as consistent between E. coli B and MG1655 for RB33 and RB69 (Figure 4-24). Therefore, we ranked statistically significant spacers from analyses run separately for each strain as well as run on both strains taken together (Table 4-8). We selected four to six top spacers for each phage to validate experimentally.

Figure 4-21 Mock library composition of T4 spacers. Plaque assays testing 24 different T4 spacers. Dark quadrants represent completely lysed host cells from the phage infection. Light quadrants represent phage resistance and therefore cell growth. Intermediate levels of immunity correspond to visible plaques (dark spots) formed by the phage on host cells.

Figure 4-22 Mock library selection enriched for effective spacer. As described for the larger library selection, we mixed phage with soft agar and appropriate antibiotics to prepare the selection substrate. Cells were then gently plated on top; surviving cells were collected and replated on fresh phage-embedded agar for a second round of selection.

Figure 4-23 Fold change of spacers after phage selection. Counts for each spacer were normalized to the total counts in the sample for that phage library. Then the log base 2 fold change was calculated for each post-selection sample relative to the input, matched for phage and host E. coli strain (B or MG1655). Duplicate selections for strain B are labeled as B1 and B2.

Figure 4-24 Host strain differences across selection experiments.

Table 4-8 Features of top spacers used for validation assays. The functional category of targeted genes is based on phage T4 annotation (365). n.s. = not statistically significant.

While only top spacers are shown here and included in validation experiments, there were many more that were significant (FDR < 0.05), had a log fold change greater than 1, and did not have zero input counts. For analyses run on the combined E. coli B and MG1655 data, the numbers of spacers that fit these criteria were 300 for T6, 461 for RB15, and 29 for RB69. In the separate strain evaluations for RB33, there were 296 spacers for E. coli B and 251 for E. coli MG1655.

4.5.3.3 Validation of top spacers

We performed initial validation by spotting cells containing a single spacer on phage-embedded agar (Figure 4-25). Since we were interested in potential cross-reactivity (i.e., a spacer from phage A selection also being effective against phage B), we screened all cloned spacers against all four phages as well as phage T4. Since the cell densities were normalized and presumably phage were sufficiently well-distributed across the soft agar, we tallied crude counts of plaques that formed on the plates (Figure 4-26). We then took the subset of apparently active spacers and conducted a secondary validation with better quantification using plaque assays. Many spacers provided four to six orders of magnitude of protection against phage lysis compared to unprotected controls (Figure 4-27).

Figure 4-25 Initial validation of top spacers using phage-embedded agar. (Left panel) Cells (E. coli B or MG1655) carrying each spacer were arrayed in half of a 96-well plate. WT = wild-type, unprotected cells. LB = media only control. Various “Y” spacers were additional controls. (Right panel) The result for phage RB33 is shown here as a representative plate. Protected cells were able to grow and form a visible spot on the agar. Left half of imaged plate: Cells were normalized to OD600nm = 0.3. Each spot has approximately 10⁵ cells and 10⁵ phage (MOI ~ 1). Right half of imaged plate: 10X diluted cells, which corresponded to an MOI of ~10.
Figure 4-26 Semi-quantitative results of initial validation screen of top anti-phage spacers. Top spacers from each phage selection are listed on the left and labeled as [phage].[spacer#], while the infecting phage in the validation assay is listed across the top. “Y” spacers are previously constructed strains that serve as comparison. Results for each E. coli strain, B and MG1655 (abbreviated “M”), are separated. Each value represents the mean number of plaques formed on the spot of E. coli cells from two experiments, one at a 10X more dilute cell density than the other. The values are roughly colored from white to blue for increasing immunity against phage; completely lysed E. coli are represented as white cells with no numerical value in the table, whereas E. coli with no visible plaques are “0” in blue.

Figure 4-27 Quantitative validation of screened spacers using plaque assays. For each spacer and infecting phage, the efficiency of plating (EOP) was calculated relative to unprotected E. coli strains (“WT”). Smaller circles indicate values near that assay’s detection limit (i.e., true values are further to the right).

4.5.4 Discussion

We demonstrated that phage-embedded agar can selectively enrich for high-activity anti-phage spacers. The most active spacers conferred protection at efficiencies of plating ranging from 10⁻⁴ to 10⁻⁶. Interestingly, not all top spacers provided phage resistance in the validation assays. It is possible that the less effective spacers resulted from an expansion of receptor mutant populations, or that not enough selection pressure (i.e., phage) was applied. For instance, we used 10X less phage in the RB33 selection in the E. coli B strain, and neither of the top spacers (RB33.1 and RB33.4) we attempted to validate from that experiment provided protection against RB33 infection.
The most successful selections in this study were with phage T6, in which all six of the top spacers provided phage resistance at EOP ~10⁻⁵.

Some spacers were broadly active and provided resistance against infection by other phages. These included T6.1, T6.2, T6.4, and RB69.5. Since the RB69.5 sequence has exact matches to the genomes of phages T6, RB15, and RB33, its cross-reactivity is not unexpected. However, the three T6 spacers have mismatches to homologous regions in the RB15, RB33, and RB69 genomes (Table 4-9). This suggests that in future studies, it would be worthwhile to screen all selected spacers against all phages of interest in the follow-up validation, not just the expected target phage on which the selection was performed. Furthermore, it suggests that it may be sufficient to generate a simpler library, without the need for barcoding several sub-pools as we did here, since a member of another phage library could provide protection. One caveat to this alternative approach is the need to ensure sufficient sequencing coverage of the entire library. As we observed from our pooled input library sequencing results, low counts could skew calculated enrichment values; we decided to exclude spacers that had zero counts in the input sample, and may have thereby missed effective spacers that happened to not be sequenced in the input.

Nevertheless, a critical consideration for continuing to use a sub-pooling design is that for our ultimate application, phage-assisted population cycling in a microbial community, we would like to employ orthogonal spacers and phages to control replacement of different bacterial strains. This requires phage specificity in high-activity spacers. Using the results from this work, if we were to use phages T6, RB15, RB33, and RB69, we would select the following phage-specific spacers: T6.3 and T6.5 against T6, RB15.3 and RB15.4 against RB15, spacer RB33.3 against RB33, and spacer RB69.3 against RB69.
However, if we were interested in using a cross-reactive spacer, for example to introduce two phages at once, we would select spacer T6.6 against phages T6 and RB15, or spacer RB69.6 against phages RB15 and RB69.

Spacer  Phage  Protospacer           PAM
T6.1    T6     GCAATCGACTAATCCAGAAT  GGG
        RB15   ACAATCGACTAATCCAGAAT  GGG
        RB33   ACAATCGACTAATCCAGAAT  GGG
        RB69   ACAATCAACTAAACCAGAAT  GGG
T6.2    T6     TTGAACCATACACTGCTATT  TGG
        RB15   TTGAACCATATACTGCTATT  TGG
        RB33   TTGAACCATATACTGCTATT  TGG
T6.4    T6     ATTAATGGTCTTCCTGTTGT  AGG
        RB15   ATTAACGGTCTTCCTGTTGT  TGG
        RB33   ATTAACGGTCTTCCTGTTGT  TGG
T6.6    T6     TTAACTCTCGCTCGCATAGT  AGG
        RB15   TTAACTCTTGCTCGCATAGT  AGG

Table 4-9 Sequence analysis of cross-reactive spacers. For each spacer, the homologous regions in other phages are listed below, where mismatches are in bold and underlined.

Table 4-10 Comparison of quantified spacer activity with library selection data. Raw counts are included for each input and output library by phage. With the exception of phage RB33, all analyses here were based on considering E. coli B and MG1655 strains together in edgeR. The log base 2 fold change (logFC) and false discovery rate (FDR) are reported. FDR values less than 0.05 are in red. Any validated phage resistance activity is relative to an unprotected control and reported as log base 10 of the efficiency of plating (EOP).

Interestingly, when we re-examined the log fold change and FDR values for any cross-reactive spacers that were present in the full list of selected spacers for each phage, we found that they were usually several-fold lower in enrichment or not statistically significant (Table 4-10). Moreover, by including some quantitative data for less effective spacers, we observed that a meaningful log fold change cutoff could be ~4, though that does not always hold, as for spacer RB69.5 in the selections with phage RB15 and phage RB33.
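The mismatches summarized in Table 4-9 can be recomputed directly from the listed sequences. The short Python sketch below does so, assuming that the T6 row of each group is the spacer sequence itself and that the sequences in the table are paired with the phages in the order listed. The function and variable names are illustrative. Positions are numbered 1 to 20 from the 5' end, so position 20 is the one adjacent to the PAM.

```python
def mismatch_positions(spacer: str, protospacer: str) -> list[int]:
    """Return 1-based positions at which `protospacer` differs from `spacer`."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have the same length")
    pairs = zip(spacer.upper(), protospacer.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


# Protospacer sequences transcribed from Table 4-9 (PAMs omitted).
TABLE_4_9 = {
    "T6.1": {"T6": "GCAATCGACTAATCCAGAAT", "RB15": "ACAATCGACTAATCCAGAAT",
             "RB33": "ACAATCGACTAATCCAGAAT", "RB69": "ACAATCAACTAAACCAGAAT"},
    "T6.2": {"T6": "TTGAACCATACACTGCTATT", "RB15": "TTGAACCATATACTGCTATT",
             "RB33": "TTGAACCATATACTGCTATT"},
    "T6.4": {"T6": "ATTAATGGTCTTCCTGTTGT", "RB15": "ATTAACGGTCTTCCTGTTGT",
             "RB33": "ATTAACGGTCTTCCTGTTGT"},
    "T6.6": {"T6": "TTAACTCTCGCTCGCATAGT", "RB15": "TTAACTCTTGCTCGCATAGT"},
}

for spacer_name, regions in TABLE_4_9.items():
    spacer = regions["T6"]  # assumed to be the spacer sequence
    for phage, protospacer in regions.items():
        if phage != "T6":
            print(spacer_name, phage, mismatch_positions(spacer, protospacer))
```

For these sequences, every homologous region in RB15 and RB33 differs from its T6 spacer at a single position, and the RB69 region homologous to T6.1 differs at three positions.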
We also noticed that the top spacers almost all contained the nucleotide “T” at the position closest to the PAM (Table 4-10). To investigate this further, we calculated the nucleotide frequencies at each position of the 20 nt spacer for both input and output libraries (Figure 4-28). In general, there is already a slight enrichment for T across all positions in the input libraries, but further enrichment of T is most noticeable at the last position in phage T6 in both E. coli strains. A similar trend is seen in RB15 for both strains, RB33 for E. coli B, and RB69 for E. coli B. Among our validated spacers, only spacer RB69.6 lacks a T in the final position; it has an A instead. Further validation of other effective spacers in quantitative plaque assays is needed to examine this finding.

For future studies, several parameters should be considered, including the phage concentrations used in the selection and validation. Higher phage concentrations during the selection could better enrich for highly effective spacers. Likewise, a range of phage titers in the validation assays could reveal a gradient of anti-phage activity. Specifically, more spacers that are mildly effective (EOP ~10⁻²) could be validated and matched back to the selection data. Furthermore, spacer libraries should use an sgRNA format for consistent selection. By coincidence, spacer RB69.3 is the same sequence as RB69.Y, which we had previously picked based on effective “Y” spacers targeting the same region in other T4-like phages. This provided insight into the effects of promoter strength and the dual RNA versus sgRNA format, since the plasmid encoding RB69.Y has a weaker promoter (J23110 instead of J23119) and expresses the CRISPR RNA from a spacer-repeat array, with the tracrRNA encoded by the DS-SPcas plasmid. Although phage resistance is comparable between RB69.3 and RB69.Y in E.
coli B, the weaker promoter and possibly less efficient RNA processing (as two RNAs must come together in the cell) render the RB69.Y version less stable and less effective in E. coli MG1655 (Figure 4-27).

Finally, we mapped enriched spacers back to the phage genomes to investigate whether particular genes or regions were sensitive to Cas9-mediated cleavage. We discovered that commonly targeted genes encoded proteins for nucleotide metabolism, DNA packaging, and structural elements such as tail fibers and head vertices. dNTP synthesis is particularly important because its rate is limiting for DNA replication (366). In phage T4, the initiation of DNA replication is controlled by the synthesis of ribonucleotide reductase, a tetrameric enzyme (α2β2), which appears at 4.8 min after infection and is the last enzyme available for the dNTP synthetase complex. We found spacers against nrdA, which encodes the α subunit, enriched in our selections using T6 (Figure 4-29) and RB15 (Figure 4-30). Moreover, in RB69 (Figure 4-32), several spacers target nrdD and nrdG, which are involved in anaerobic de novo synthesis of deoxyribonucleotides (367). In T6 and RB15, enriched spacers also targeted DNA terminase (gp17); in T4, this protein is required for DNA packaging, in which it cleaves and packs DNA into phage proheads (368). Several other regions with spacer enrichment encoded structural proteins, such as long tail fibers in RB33 (Figure 4-31), short tail fibers in T6 and RB15, and various head proteins in RB69. All of these genes are essential. Thus, our selection assay can be used for identifying effective CRISPR spacers for phage resistance applications as well as for studying essential genes and phage biology.

Figure 4-28 Nucleotide frequencies at each position in the spacer sequence across libraries. For each library, the nucleotide frequencies are calculated for each position based on all spacers that matched to the given phage library.
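The per-position nucleotide frequencies described for Figure 4-28 amount to a column-wise tally over aligned 20 nt spacers. The Python sketch below shows one way to compute them. It is an illustration, not the analysis code used for the figure: the function name is hypothetical, and the optional `weights` argument (for example, sequencing read counts per spacer) is an assumption, since the text does not state whether spacers were weighted. The toy input is the four T6 spacer sequences from Table 4-9, not a full library.

```python
from collections import Counter
from typing import Optional, Sequence


def positional_frequencies(spacers: Sequence[str],
                           weights: Optional[Sequence[float]] = None,
                           length: int = 20) -> list[dict[str, float]]:
    """Frequency of A, C, G, and T at each spacer position (index 0 = 5' end).

    Each spacer counts once unless `weights` supplies a per-spacer weight.
    """
    if weights is None:
        weights = [1.0] * len(spacers)
    if len(weights) != len(spacers):
        raise ValueError("weights and spacers must have the same length")
    counts = [Counter() for _ in range(length)]
    for seq, weight in zip(spacers, weights):
        if len(seq) != length:
            raise ValueError(f"expected {length} nt spacer, got {len(seq)}")
        for position, nucleotide in enumerate(seq.upper()):
            counts[position][nucleotide] += weight
    total = sum(weights)
    return [{nt: column[nt] / total for nt in "ACGT"} for column in counts]


# Toy input: the T6.1, T6.2, T6.4, and T6.6 spacers listed in Table 4-9.
t6_spacers = [
    "GCAATCGACTAATCCAGAAT",
    "TTGAACCATACACTGCTATT",
    "ATTAATGGTCTTCCTGTTGT",
    "TTAACTCTCGCTCGCATAGT",
]
frequencies = positional_frequencies(t6_spacers)
print("PAM-proximal position:", frequencies[-1])
```

Running the same function on the input and output libraries separately, and comparing the two at each position, gives the kind of enrichment comparison discussed above.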
Figure 4-29 Enriched regions on the phage T6 genome. A few regions (A, B, and C) are highlighted in the bottom panels. Of the top spacers validated in this study, T6.3 is located at position 85,799 and T6.4 at 87,647 in region A. Spacer T6.2 is at 93,432 in region B. The other spacers are elsewhere on the genome: T6.1 at 98,415, T6.5 at 109,061, and T6.6 at 60,188.

Figure 4-30 Enriched regions on the phage RB15 genome. Some regions (A, B, C, and D) are displayed at higher resolution in the bottom panels. Spacer RB15.1 is at position 3,150 in region A, RB15.4 at 148,337 in region C, and RB15.3 at 140,131 in region D.

Figure 4-31 Enriched regions on the phage RB33 genome. This figure is based on MG1655 data. Regions A, B, C, and D are highlighted in the bottom panels. Spacer RB33.3 is located at position 150,362 in region D. We included spacer RB33.2 (at 3,158 in region A) in validation, but it did not confer protection against RB33 infection.

Figure 4-32 Enriched regions on the phage RB69 genome. A few enriched regions are shown at higher resolution in the lower panels. Spacer RB69.3 is located at position 109,036 and RB69.5 at 109,717 in region C. RB69.4 is at 89,565 in region B. Elsewhere on the genome, spacer RB69.6 is at position 158,867.

4.5.5 Acknowledgements

This work was supported by NSF Small-Business/ERC Collaborative Opportunity grant IIP-1256446 (to Ginkgo Bioworks and GMC), US Department of Energy grant DE-FG02-02ER63445 (to GMC), and the Wyss Institute for Biologically Inspired Engineering. SJY was also supported by a National Science Foundation Graduate Research Fellowship and KME by the Wyss Technology Development Fellowship. DNA preparation and sequencing were completed at the Molecular Biology Core Facilities of the Dana-Farber Cancer Institute. Analysis was performed on the Orchestra cluster supported by the Harvard Medical School Research Information Technology Group.
Chapter 5 Conclusions and outlook on microbiome engineering

The human body is naturally colonized by a vast number of microbes, collectively called the human microbiota, whose genes constitute the human microbiome. These microbes benefit the human host by extracting otherwise inaccessible nutrients, helping to develop the immune system, and protecting the host against pathogen colonization (3, 35, 38–40, 42, 46). Yet dysbiosis, or an imbalance between protective and harmful gut flora, can lead to human disease (369). For instance, disturbances to the homeostasis between intestinal microbial antigens and the host’s immune system may bring about type 1 diabetes and inflammatory bowel disease (50, 52). Next-generation sequencing has enabled systematic studies of the microbial and genetic composition of the human microbiota, but we still know relatively little about the function of these microbes and their genes.

We set out to study what to edit and how to edit the microbiome. In Chapter 2, we described a novel approach for functional discovery of genetic elements in the microbiota and identified fitness genes conferring an advantage in the mammalian host. Genes characterized in this manner can enable more competitive nutrient utilization, as we demonstrated, or provide other benefits, depending on the in vivo selection conditions. Such selected genes can then be introduced on mobile genetic vectors or into engineered strains to restore microbial imbalances or enhance the fitness of our engineered elements. In Chapter 3, we established foundational tools for working with complex microbial systems, exploring microbiota gene delivery, and actively immunizing native gut flora against acquiring pathogenic elements. In Chapter 4, we harnessed bacteriophages for precise manipulation of endogenous microbiota; a near-term application would be to selectively deplete native E.
coli with a set of phages and introduce an enhanced probiotic Nissle 1917 that is not only resistant to those phages, but also immune to acquiring Shiga toxin and multiple antibiotic resistance genes.

Overall, the future is bright for microbiome engineering, given innovations across various fields. First, we are now ever better at reading and writing “omes,” enabled by technological advances and cost reductions in high-throughput sequencing and DNA synthesis (370). In our work, this has allowed for temporal functional metagenomics, as well as a generalized approach for addressing large-scale biological questions based on synthesizing libraries of sequences for selection experiments and subsequent sequencing. Given the complex interplay between the microbiota and human host, these methods will continue to be invaluable for interrogating different omes (e.g., DNA, RNA, epigenome) in conjunction with other omics data (e.g., metabolites, antibodies). Second, the advent of precise genome engineering tools, most recently CRISPR-Cas9, has transformed prospects for gene-based therapies. While precision editing is undoubtedly of interest for clinical applications, it also allows for more precise methods to study the microbiota and its impact on human health. Third, in light of antibiotic overuse in the clinic, there has been renewed interest in phage therapy, which has served as a form of personalized medicine in Eastern Europe: patients can receive phage therapy tailored to their infection based on results from samples sent to phage collection centers. Our efforts leverage all of these advances for editing the genomes of endogenous species or replacing them with protective versions that could immunize microbiota ecosystems against pathogenicity and dysbiosis as well as sense and secrete therapeutic molecules.
Clearly, from improving understanding to enhancing engineering of the microbiota, we are closer to realizing the vision of precise and even personalized in vivo editing of the human microbiome.

Figure 5-1 Engineering microbiomes from diseased to healthy states. Efforts to engineer microbiomes rely on the abilities to sequence, discover, and edit. Although one may consider these sequentially, from metagenomic sequencing to functional gene discovery to precise genome editing, these processes can be thought of as general tools that enhance one another for more powerful studies that improve biological understanding and build better therapeutics.

Bibliography

1. Yaung SJ, Church GM, Wang HH (2014) Recent Progress in Engineering Human-Associated Microbiomes. Methods Mol Biol 1151:69–74.
2. Huttenhower C et al. (2012) Structure, function and diversity of the healthy human microbiome. Nature 486:207–214.
3. Ley RE, Peterson DA, Gordon JI (2006) Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell 124:837–48.
4. Turnbaugh PJ et al. (2007) The human microbiome project. Nature 449:804–10.
5. Nicholson JK, Holmes E, Wilson ID (2005) Gut microorganisms, mammalian metabolism and personalized health care. Nat Rev Microbiol 3:431–438.
6. Dethlefsen L, McFall-Ngai M, Relman DA (2007) An ecological and evolutionary perspective on human-microbe mutualism and disease. Nature 449:811–8.
7. Qin J et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65.
8. Kaeberlein T, Lewis K, Epstein SS (2002) Isolating “uncultivable” microorganisms in pure culture in a simulated natural environment. Science 296:1127–9.
9. Hayes CS, Aoki SK, Low DA (2010) Bacterial contact-dependent delivery systems. Annu Rev Genet 44:71–90.
10. Bassler BL, Losick R (2006) Bacterially speaking. Cell 125:237–46.
11. Walker AW et al.
(2008) The species composition of the human intestinal microbiota differs between particle-associated and liquid phase communities. Environ Microbiol 10:3275–83.
12. Smillie CS et al. (2011) Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480:2–5.
13. Bradshaw DJ, Homer KA, Marsh PD, Beighton D (1994) Metabolic cooperation in oral microbial communities during growth on mucin. Microbiology 140:3407–12.
14. Falony G, Vlachou A, Verbrugghe K, De Vuyst L (2006) Cross-feeding between Bifidobacterium longum BB536 and acetate-converting, butyrate-producing colon bacteria during growth on oligofructose. Appl Environ Microbiol 72:7835–41.
15. Salazar N, Gueimonde M, Hernández-Barranco AM, Ruas-Madiedo P, de los Reyes-Gavilán CG (2008) Exopolysaccharides produced by intestinal Bifidobacterium strains act as fermentable substrates for human intestinal bacteria. Appl Env Microbiol 74:4737–4745.
16. Gibson GR et al. (1990) Alternative pathways for hydrogen disposal during fermentation in the human colon. Gut 31:679–683.
17. Dabard J et al. (2001) Ruminococcin A, a New Lantibiotic Produced by a Ruminococcus gnavus Strain Isolated from Human Feces. Appl Environ Microbiol 67:4111–4118.
18. Santagati M, Scillato M, Patanè F, Aiello C, Stefani S (2012) Bacteriocin-producing oral streptococci and inhibition of respiratory pathogens. FEMS Immunol Med Microbiol.
19. Gillor O, Etzion A, Riley MA (2008) The dual role of bacteriocins as anti- and probiotics. Appl Microbiol Biotechnol 81:591–606.
20. Davey ME, O’Toole GA (2000) Microbial biofilms: from ecology to molecular genetics. Microbiol Mol Biol Rev 64:847–67.
21. Marsh PD, Moter A, Devine DA (2011) Dental plaque biofilms: communities, conflict and control. Periodontol 2000 55:16–35.
22. Boles BR, Thoendel M, Singh PK (2004) Self-generated diversity produces “insurance effects” in biofilm communities. Proc Natl Acad Sci U S A 101:16630–5.
23.
Stewart PS, Franklin MJ (2008) Physiological heterogeneity in biofilms. Nat Rev Microbiol 6:199–210.
24. Frost LS, Leplae R, Summers AO, Toussaint A (2005) Mobile genetic elements: the agents of open source evolution. Nat Rev Microbiol 3:722–32.
25. Gogarten JP, Townsend JP (2005) Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol 3:679–87.
26. Norman A, Hansen LH, Sørensen SJ (2009) Conjugative plasmids: vessels of the communal gene pool. Philos Trans R Soc Lond B Biol Sci 364:2275–89.
27. Jones B V, Marchesi JR (2007) Accessing the mobile metagenome of the human gut microbiota. Mol Biosyst 3:749–58.
28. Dobrindt U, Hochhut B, Hentschel U, Hacker J (2004) Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol 2:414–424.
29. Baquero F (2004) From pieces to patterns: evolutionary engineering in bacterial pathogens. Nat Rev Microbiol 2:510–518.
30. Salyers AA (1993) Gene transfer in the mammalian intestinal tract. Curr Opin Biotechnol 4:294–298.
31. Reid G et al. (2010) Microbiota restoration: natural and supplemented recovery of human microbial communities. Nat Rev Microbiol 9:27–38.
32. Koenig JE et al. (2010) Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A 108 Suppl:4578–85.
33. Van den Abbeele P, Van de Wiele T, Verstraete W, Possemiers S (2011) The host selects mucosal and luminal associations of coevolved gut microorganisms: a novel concept. FEMS Microbiol Rev 35:681–704.
34. Giraud A et al. (2008) Dissecting the genetic components of adaptation of Escherichia coli to the mouse gut. PLoS Genet 4:e2.
35. Gill SR et al. (2006) Metagenomic analysis of the human distal gut microbiome. Science 312:1355–1359.
36. Bäckhed F, Ley RE, Sonnenburg JL, Peterson DA, Gordon JI (2005) Host-Bacterial Mutualism in the Human Intestine. Science 307:1915–1920.
37. Guarner F, Malagelada J-R (2003) Gut flora in health and disease. Lancet 361:512–519.
38.
Stappenbeck TS, Hooper L V, Gordon JI (2002) Developmental regulation of intestinal angiogenesis by indigenous microbes via Paneth cells. Proc Natl Acad Sci U S A 99:15451–5. 39. Rakoff-Nahoum S, Paglino J, Eslami-Varzaneh F, Edberg S, Medzhitov R (2004) Recognition of commensal microflora by toll-like receptors is required for intestinal homeostasis. Cell 118:229– 41. 40. Hooper L V (2004) Bacterial contributions to mammalian gut development. Trends Microbiol 12:129–134. 41. Pryde SE, Duncan SH, Hold GL, Stewart CS, Flint HJ (2002) The microbiology of butyrate formation in the human colon. FEMS Microbiol Lett 217:133–9. 42. Round JL, Mazmanian SK (2010) Inducible Foxp3+ regulatory T-cell development by a commensal bacterium of the intestinal microbiota. Proc Natl Acad Sci U S A 107:12204–12209. 43. Wu GD et al. (2011) Linking long-term dietary patterns with gut microbial enterotypes. Science 334:105–8. 176 44. Serino M et al. (2012) Metabolic adaptation to a high-fat diet is associated with a change in the gut microbiota. Gut 61:543–553. 45. Honda K, Littman DR (2011) The Microbiome in Infectious Disease and Inflammation. Annu Rev Immunol 30:759–795. 46. Ley RE et al. (2005) Obesity alters gut microbial ecology. Proc Natl Acad Sci U S A 102:11070–5. 47. Turnbaugh PJ, Bäckhed F, Fulton L, Gordon JI (2008) Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell Host Microbe 3:213–23. 48. Murphy EF et al. (2010) Composition and energy harvesting capacity of the gut microbiota: relationship to diet, obesity and time in mouse models. Gut 59:1635–42. 49. Cerf-Bensussan N, Gaboriau-Routhiau V (2010) The immune system and the gut microbiota: friends or foes? Nat Rev Immunol 10:735–44. 50. Wen L et al. (2008) Innate immunity and intestinal microbiota in the development of Type 1 diabetes. Nature 455:1109–13. 51. 
Lee YK, Menezes JS, Umesaki Y, Mazmanian SK (2010) Proinflammatory T-cell responses to gut microbiota promote experimental autoimmune encephalomyelitis. Proc Natl Acad Sci U S A 108:Suppl 1:4615–22. 52. Abraham C, Cho JH (2009) Inflammatory bowel disease. N Engl J Med 361:2066–78. 53. Hong P-Y et al. (2010) Comparative analysis of fecal microbiota in infants with and without eczema. PLoS One 5:e9964. 54. Saulnier DM et al. (2011) Gastrointestinal microbiome signatures of pediatric patients with irritable bowel syndrome. Gastroenterology 141:1782–1791. 55. Claesson MJ et al. (2011) Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc Natl Acad Sci U S A 108 Suppl:4586–91. 56. Yatsunenko T et al. (2012) Human gut microbiome viewed across age and geography. Nature 486:222–227. 57. Spor A, Koren O, Ley R (2011) Unravelling the effects of the environment and host genotype on the gut microbiome. Nat Rev Microbiol 9:279–290. 58. Nell S, Suerbaum S, Josenhans C (2010) The impact of the microbiota on the pathogenesis of IBD: lessons from mouse infection models. Nat Rev Microbiol 8:564–77. 177 59. Sokol H et al. (2009) Low counts of Faecalibacterium prausnitzii in colitis microbiota. Inflamm Bowel Dis 15:1183–1189. 60. Manichanh C et al. (2006) Reduced diversity of faecal microbiota in Crohn’s disease revealed by a metagenomic approach. Gut 55:205–211. 61. He T et al. (2008) The role of colonic metabolism in lactose intolerance. Eur J Clin Invest 38:541– 7. 62. He T et al. (2006) Colonic fermentation may play a role in lactose intolerance in humans. J Nutr 136:58. 63. Tehrani AB, Nezami BG, Gewirtz A, Srinivasan S (2012) Obesity and its associated disease: a role for microbiota? Neurogastroenterol Motil 24:305–311. 64. Everard A et al. (2011) Responses of Gut Microbiota and Glucose and Lipid Metabolism to Prebiotics in Genetic Obese and Diet-Induced Leptin-Resistant Mice. Diabetes 60:1–12. 65. Giongo A et al. 
(2010) Toward defining the autoimmune microbiome for type 1 diabetes. ISME J 5:1–10. 66. Wu H-J et al. (2010) Gut-residing segmented filamentous bacteria drive autoimmune arthritis via T helper 17 cells. Immunity 32:815–27. 67. Lam V et al. (2012) Intestinal microbiota determine severity of myocardial infarction in rats. FASEB J:1–9. 68. Wardwell LH, Huttenhower C, Garrett WS (2011) Current concepts of the intestinal microbiota and the pathogenesis of infection. Curr Infect Dis Rep 13:28–34. 69. Gori A et al. (2008) Early impairment of gut function and gut flora supporting a role for alteration of gastrointestinal mucosa in human immunodeficiency virus pathogenesis. J Clin Microbiol 46:757–8. 70. Stecher B, Hardt W-D (2008) The role of microbiota in infectious disease. Trends Microbiol 16:107–114. 71. Walk ST, Young VB (2008) Emerging Insights into Antibiotic-Associated Diarrhea and Clostridium difficile Infection through the Lens of Microbial Ecology. Interdiscip Perspect Infect Dis 2008:125081. 72. Vrieze A et al. (2010) The environment within: how gut microbiota may influence metabolism and body composition. Diabetologia 53:606–13. 178 73. Hou JK, Abraham B, El-Serag H (2011) Dietary intake and risk of developing inflammatory bowel disease: a systematic review of the literature. Am J Gastroenterol 106:563–573. 74. Fava F, Lovegrove JA, Gitau R, Jackson KG, Tuohy KM (2006) The gut microbiota and lipid metabolism: implications for human health and coronary heart disease. Curr Med Chem 13:3005– 21. 75. Wang Z et al. (2011) Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature 472:57–63. 76. Dobkin JF, Saha JR, Butler VP, Neu HC, Lindenbaum J (1983) Digoxin-inactivating bacteria: identification in human gut flora. Science 220:325–327. 77. Clayton TA, Baker D, Lindon JC, Everett JR, Nicholson JK (2009) Pharmacometabonomic identification of a significant host-microbiome metabolic interaction affecting human drug metabolism. 
Proc Natl Acad Sci U S A 106:14728–33. 78. Wallace BD et al. (2010) Alleviating Cancer Drug Toxicity by Inhibiting a Bacterial Enzyme. Science 330:831–835. 79. Marsh P (1994) Microbial ecology of dental plaque and its significance in health and disease. Adv Dent Res 8:263. 80. Azarpazhooh A, Leake JL (2006) Systematic review of the association between respiratory diseases and oral health. J Periodontol 77:1465–82. 81. Ford PJ et al. (2007) Anti-P. gingivalis Response Correlates with Atherosclerosis. J Dent Res 86:35–40. 82. Li L, Messas E, Batista ELL, Levine R, Amar S (2002) Porphyromonas gingivalis Infection Accelerates the Progression of Atherosclerosis in a Heterozygous Apolipoprotein E-Deficient Murine Model. Circulation 105:861–867. 83. Koren O et al. (2010) Human oral, gut, and plaque microbiota in patients with atherosclerosis. Proc Natl Acad Sci U S A 108:4592–8. 84. Haug MC, Tanner SA, Lacroix C, Stevens MJA, Meile L (2011) Monitoring horizontal antibiotic resistance gene transfer in a colonic fermentation model. FEMS Microbiol Ecol 78:210–9. 85. Nelson KE et al. (2010) A catalog of reference genomes from the human microbiome. Science 328:994–9. 86. Human T et al. (2012) A framework for human microbiome research. Nature 486:215–221. 179 87. De Filippo C et al. (2010) Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa. Proc Natl Acad Sci U S A 107:14691–6. 88. Peterson DA, Frank DN, Pace NR, Gordon JI (2008) Metagenomic approaches for defining the pathogenesis of inflammatory bowel diseases. Cell Host Microbe 3:417–27. 89. Larsen N et al. (2010) Gut microbiota in human adults with type 2 diabetes differs from nondiabetic adults. PLoS One 5:e9085. 90. Yang F et al. (2012) Saliva microbiomes distinguish caries-active from healthy human populations. ISME J 6:1–10. 91. Kong HH et al. 
(2012) Temporal shifts in the skin microbiome associated with disease flares and treatment in children with atopic dermatitis. Genome Res 22:850–859. 92. Keijser BJF et al. (2008) Pyrosequencing analysis of the Oral Microflora of healthy adults. J Dent Res 87:1016–1020. 93. Gao Z, Tseng C, Pei Z, Blaser MJ (2007) Molecular analysis of human forearm superficial skin bacterial biota. Proc Natl Acad Sci U S A 104:2927–2932. 94. Park J, Kerner A, Burns MA, Lin XN (2011) Microdroplet-enabled highly parallel co-cultivation of microbial communities. PLoS One 6:e17019. 95. Bollmann A, Lewis K, Epstein SS (2007) Incubation of environmental samples in a diffusion chamber increases the diversity of recovered isolates. Appl Environ Microbiol 73:6386–6390. 96. Thomas CM, Nielsen KM (2005) Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol 3:711–21. 97. Lorenz MG, Wackernagel W (1994) Bacterial gene transfer by natural genetic transformation in the environment. Microbiol Rev 58:563–602. 98. Wirth R, Friesenegger A, Fiedler S (1989) Transformation of various species of gram-negative bacteria belonging to 11 different genera by electroporation. MGG Mol Gen Genet 216:175–177. 99. Sanford JC, Smith FD, Russell JA (1993) Optimizing the biolistic process for different biological applications. Methods Enzym 217:483–509. 100. Wyber JA, Andrews J, D’Emanuele A (1997) The use of sonication for the efficient delivery of plasmid DNA into cells. Pharm Res 14:750–756. 101. Swords WE (2003) Chemical transformation of E. coli. Methods Mol Biol 235:49–53. 180 102. Thomson AM, Flint HJ (1989) Electroporation induced transformation of Bacteroides ruminicola and Bacteroides uniformis by plasmid DNA. FEMS Microbiol Lett 52:101–4. 103. Calvin NM, Hanawalt PC (1988) High-efficiency transformation of bacterial cells by electroporation. J Bacteriol 170:2796–2801. 104. Goodman AL et al. 
(2009) Identifying genetic determinants needed to establish a human gut symbiont in its habitat. Cell Host Microbe 6:279–89. 105. Phillips-Jones MK (1995) Introduction of recombinant DNA into Clostridium spp. Methods Mol Biol 47:227–35. 106. Bouillaut L, McBride SM, Sorg JA (2011) Genetic manipulation of Clostridium difficile. Curr Protoc Microbiol Chapter 9:Unit 9A.2. 107. Jennert KC, Tardif C, Young DI, Young M (2000) Gene transfer to Clostridium cellulolyticum ATCC 35319. Microbiology 146 Pt 12:3071–80. 108. Young DI, Evans VJ, Jefferies JR (1999) Genetic Methods in Clostridia. Methods in Microbology 29:191–207. 109. Cocconcelli PS, Ferrari E, Rossi F, Bottazzi V (1992) Plasmid transformation of Ruminococcus albus by means of high-voltage electroporation. FEMS Microbiol Lett 73:203–7. 110. Van Pijkeren J-P et al. (2012) High efficiency recombineering in lactic acid bacteria. Nucleic Acids Res 40:1–13. 111. Damelin LH, Mavri-Damelin D, Klaenhammer TR, Tiemessen CT (2010) Plasmid transduction using bacteriophage Phi(adh) for expression of CC chemokines by Lactobacillus gasseri ADH. Appl Environ Microbiol 76:3878–85. 112. Lizier M, Sarra PG, Cauda R, Lucchini F (2010) Comparison of expression vectors in Lactobacillus reuteri strains. FEMS Microbiol Lett 308:8–15. 113. Ljungh Å, Wadström T eds. (2009) Lactobacillus molecular biology: from genomics to probiotics (Caister Academic Press, Norfolk, UK). 114. Sørvig E, Mathiesen G, Naterstad K, Eijsink VGH, Axelsson L (2005) High-level, inducible gene expression in Lactobacillus sakei and Lactobacillus plantarum using versatile expression vectors. Microbiology 151:2439–2449. 115. Thompson K, Collins MA (1996) Improvement in electroporation efficiency for Lactobacillus plantarum by the inclusion of high concentrations of glycine in the growth medium. J Microbiol Methods 26:73–79. 181 116. 
Shepard BD, Gilmore MS (1995) Electroporation and efficient transformation of Enterococcus faecalis grown in high concentrations of glycine. Methods Mol Biol 47:217–226. 117. Holo H, Nes IF (1995) Transformation of Lactococcus by electroporation. Methods Mol Biol 47:195–199. 118. Biswas I, Jha JK, Fromm N (2008) Shuttle expression plasmids for genetic studies in Streptococcus mutans. Microbiology 154:2275–2282. 119. McLaughlin RE, Ferretti JJ (1995) Electrotransformation of Streptococci. Methods Mol Biol 47:185–193. 120. Lee JC (1995) Electrotransformation of Staphylococci. Methods Mol Biol 47:209–216. 121. Alexander JE, Andrew PW, Jones D, Roberts IS (1990) Development of an optimized system for electroporation of Listeria species. Lett Appl Microbiol 10:179–181. 122. Kuramitsu HK, Chi B, Ikegami A (2005) Genetic manipulation of Treponema denticola. Curr Protoc Microbiol Chapter 12:Unit 12B.2. 123. Hyde JA, Weening EH, Skare JT (2011) Genetic transformation of Borrelia burgdorferi. Curr Protoc Microbiol Supplement:1–17. 124. Rosa P, Stevenson B, Tilly K (1999) Genetic Methods in Borrelia and Other Spirochaetes. Methods in Microbology 29. 125. Mayo B, van Sinderen D eds. (2010) Bifidobacteria: Genomics and Molecular Aspects (Caister Academic Press, Norfolk, UK). 126. Yeung MK, Kozelsky CS (1994) Transformation of Actinomyces spp. by a gram-negative broadhost-range plasmid. J Bacteriol 176:4173–4176. 127. Parish T, Brown AC (2009) Mycobacteria Protocols eds Parish T, Brown AC (Humana Press, Totowa, NJ). 128. Sassetti CM, Boyd DH, Rubin EJ (2001) Comprehensive identification of conditionally essential genes in mycobacteria. Proc Natl Acad Sci U S A 98:12712–12717. 129. Luijk N Van et al. (2002) Genetics and molecular biology of propionibacteria. Lait 82:45–57. 130. Binet R, Maurelli AT (2009) Transformation and isolation of allelic exchange mutants of Chlamydia psittaci using recombinant DNA introduced by electroporation. Proc Natl Acad Sci U S A 106:292–297. 
131. Bélanger M, Rodrigues P, Progulske-Fox A (2007) Genetic manipulation of Porphyromonas gingivalis. Curr Protoc Microbiol Chapter 13:Unit13C.2.
132. Flint HJ, Martin JC, Thomson AM (2000) in Electrotransformation of Bacteria, eds Eynard N, Teissié J, pp 140–149.
133. Salyers AA, Shoemaker NB, Nikolich MP (1992) METHOD AND MATERIALS FOR INTRODUCING DNA INTO PREVOTELLA RUMINICOLA.
134. Bacic MK, Smith CJ (2008) Laboratory maintenance and cultivation of bacteroides species. Curr Protoc Microbiol Chapter 13:Unit 13C.1.
135. Salyers AA et al. (1999) Genetic Methods for Bacteroides Species. Methods in Microbiology 29:229–249.
136. Smith CJ (1995) Genetic transformation of Bacteroides spp. using electroporation. Methods Mol Biol 47:161–169.
137. Kinder Haake S, Yoder S, Gerardo SH (2006) Efficient gene transfer and targeted mutagenesis in Fusobacterium nucleatum. Plasmid 55:27–38.
138. Segal ED (1995) Electroporation of Helicobacter pylori. Methods Mol Biol 47:179–184.
139. Taylor D (1992) Genetics of Campylobacter and Helicobacter. Annu Rev Microbiol:35–64.
140. Rachek LI et al. (2000) Transformation of Rickettsia prowazekii to Erythromycin Resistance Encoded by the Escherichia coli ereB Gene. J Bacteriol 182:3289–3291.
141. McQuiston J, Schurig G (1995) Transformation of Brucella species with suicide and broad host-range plasmids. Methods Mol Biol 47:143–148.
142. Scarlato V, Ricci S, Rappuoli R, Pizza M (1996) in Microbial Genome Methods, ed Adolph KW (CRC Press), pp 247–262.
143. Bogdan JA, Minetti CASA, Blake MS (2002) A one-step method for genetic transformation of non-piliated Neisseria meningitidis. J Microbiol Methods 49:97–101.
144. Genco CA, Knapp JS, Clark VL (1984) Conjugation of Plasmids of Neisseria gonorrhoeae to other Neisseria Species: Potential Reservoirs for the β-Lactamase Plasmid. J Infect Dis 150:397–401.
145. O’Dwyer C et al.
(2005) A novel neisserial shuttle plasmid: a useful new tool for meningococcal research. FEMS Microbiol Lett 251:143–7. 183 146. Dennis JJ, Sokol PA (1995) Electrotransformation of Pseudomonas. Methods Mol Biol 47:125– 133. 147. Kleckner N (1981) Transposable elements in prokaryotes. Annu Rev Genet 15:341–404. 148. Goodman AL, Wu M, Gordon JI (2011) Identifying microbial fitness determinants by insertion sequencing using genome-wide transposon mutant libraries. Nat Protoc 6:1969–1980. 149. Van Opijnen T, Bodi KL, Camilli A (2009) Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat Methods 6:767–72. 150. Gawronski JD, Wong SM, Giannoukos G, Ward D V, Akerley BJ (2009) Tracking insertion mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes required in the lung. Proc Natl Acad Sci U S A 106:16422–16427. 151. Langridge GC et al. (2009) Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants. Genome Res 19:2308–2316. 152. Sommer MOA, Dantas G, Church GM (2009) Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325:1128–31. 153. Warner JR, Reeder PJ, Karimpour-Fard A, Woodruff LB, Gill RT (2010) Rapid profiling of a microbial genome using mixtures of barcoded oligonucleotides. Nat Biotechnol 28:856–862. 154. Sandoval NR et al. (2012) Strategy for directing combinatorial genome engineering in Escherichia coli. Proc Natl Acad Sci U S A. 155. Wang HH et al. (2009) Programming cells by multiplex genome engineering and accelerated evolution. Nature 460:894–8. 156. Wang HH et al. (2012) Genome-scale promoter engineering by coselection MAGE. Nat Methods 9:591–593. 157. Carr PA et al. (2012) Enhanced multiplex genome engineering through co-operative oligonucleotide co-selection. Nucleic Acids Res 40:e132. 158. 
Wang HH, Church GM (2011) Multiplexed genome engineering and genotyping methods applications for synthetic biology and metabolic engineering. Methods Enzym 498:409–426. 159. Sharan SK, Thomason LC, Kuznetsov SG, Court DL (2009) Recombineering: a homologous recombination-based method of genetic engineering. Nat Protoc 4:206–223. 160. Isaacs FJ et al. (2011) Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333:348–353. 184 161. Swingle B et al. (2010) Oligonucleotide recombination in Gram-negative bacteria. Mol Microbiol 75:138–148. 162. Swingle B, Bao Z, Markel E, Chambers A, Cartinhour S (2010) Recombineering using RecTE from Pseudomonas syringae. Appl Env Microbiol 76:4960–4968. 163. Van Kessel JC, Hatfull GF (2007) Recombineering in Mycobacterium tuberculosis. Nat Methods 4:147–152. 164. Sonnenburg JL, Angenent LT, Gordon JI (2004) Getting a grip on things: how do communities of bacterial symbionts become established in our intestine? Nat Immunol 5:569–73. 165. Faith JJ, McNulty NP, Rey FE, Gordon JI (2011) Predicting a human gut microbiota’s response to diet in gnotobiotic mice. Science 333:101–4. 166. Hosoda K et al. (2011) Cooperative adaptation to establishment of a synthetic bacterial mutualism. PLoS One 6:e17105. 167. Shou W, Ram S, Vilar JM (2007) Synthetic cooperation in engineered yeast populations. Proc Natl Acad Sci U S A 104:1877–1882. 168. Wintermute EH, Silver PA (2010) Emergent cooperation in microbial metabolism. Mol Syst Biol 6:407. 169. Mee JM, Wang HH (2012) Engineering Ecosystems and Synthetic Ecologies. Mol BioSyst 8:2470–2483. 170. Saeidi N et al. (2011) Engineering microbes to sense and eradicate Pseudomonas aeruginosa, a human pathogen. Mol Syst Biol 7:521. 171. Duan F, March JC (2010) Engineered bacterial communication prevents Vibrio cholerae virulence in an infant mouse model. Proc Natl Acad Sci U S A 107:11260–11264. 172. 
Steidler L, Hans W, Schotte L, Neirynck S (2000) Treatment of Murine Colitis by Lactococcus lactis Secreting Interleukin-10. Science 289:1352–1355. 173. Steidler L, Rottiers P, Coulie B (2009) Actobiotics as a novel method for cytokine delivery. Ann N Y Acad Sci 1182:135–45. 174. Duncan SH et al. (2003) Effects of alternative dietary substrates on competition between human colonic bacteria in an anaerobic fermentor system. Appl Environ Microbiol 69:1136–42. 175. Leitch ECM, Walker AW, Duncan SH, Holtrop G, Flint HJ (2007) Selective colonization of insoluble substrates by human faecal bacteria. Environ Microbiol 9:667–679. 185 176. Macfarlane GT, Hay S, Gibson GR (1989) Influence of mucin on glycosidase, protease and arylamidase activities of human gut bacteria grown in a 3-stage continuous culture system. J Appl Bacteriol 66:407–17. 177. Molly K, Woestyne M, Verstraete W (1993) Development of a 5-step multi-chamber reactor as a simulation of the human intestinal microbial ecosystem. Appl Microbiol Biotechnol 39:254–258. 178. Possemiers S, Verthé K, Uyttendaele S, Verstraete W (2004) PCR-DGGE-based quantification of stability of the microbial community in a simulator of the human intestinal microbial ecosystem. FEMS Microbiol Ecol 49:495–507. 179. Pratten J (2007) Growing oral biofilms in a constant depth film fermentor (CDFF). Curr Protoc Microbiol Chapter 1:Unit 1B.5. 180. Ready D (2002) Composition and antibiotic resistance profile of microcosm dental plaques before and after exposure to tetracycline. J Antimicrob Chemother 49:769–775. 181. Roberts AP, Pratten J, Wilson M, Mullany P (1999) Transfer of a conjugative transposon, Tn5397 in a model oral biofilm. FEMS Microbiol Lett 177:63–66. 182. Roberts AP et al. (2001) Transfer of Tn916-like elements in microcosm dental plaques. Antimicrob agents 45:2943–2946. 183. 
Kim HJ, Huh D, Hamilton G, Ingber DE, Links DA (2012) Human Gut-on-a-Chip inhabited by microbial flora that experiences intestinal peristalsis-like motions and flow. Lab Chip:2165–2174. 184. Foster JS, Kolenbrander PE (2004) Development of a multispecies oral bacterial community in a saliva-conditioned flow cell. Appl Environ Microbiol 70:4340. 185. Doucet-Populaire F, Trieu-Cuot P, Dosbaa I, Andremont A, Courvalin P (1991) Inducible transfer of conjugative transposon Tn1545 from Enterococcus faecalis to Listeria monocytogenes in the digestive tracts of gnotobiotic mice. Antimicrob Agents Chemother 35:185–7. 186. Launay A, Ballard SA, Johnson PDR, Grayson ML, Lambert T (2006) Transfer of Vancomycin Resistance Transposon Tn1549 from Clostridium symbiosum to Enterococcus spp . in the Gut of Gnotobiotic Mice. Antimicrob Agents Chemother 50:1054–1062. 187. Turnbaugh PJ et al. (2009) The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice. Sci Transl Med 1:6ra14. 188. Lalla E et al. (2003) Oral infection with a periodontal pathogen accelerates early atherosclerosis in apolipoprotein E-null mice. Arterioscler Thromb Vasc Biol 23:1405–11. 189. Sellon RK et al. (1998) Resident enteric bacteria are necessary for development of spontaneous 186 colitis and immune system activation in interleukin-10-deficient mice. Infect Immun 66:5224– 5231. 190. Caricilli AM et al. (2011) Gut microbiota is a key modulator of insulin resistance in TLR 2 knockout mice. PLoS Biol 9:e1001212. 191. Vijay-Kumar M et al. (2010) Metabolic syndrome and altered gut microbiota in mice lacking Tolllike receptor 5. Science 328:228–31. 192. Deng W, Vallance BA, Li Y, Puente JL, Finlay BB (2003) Citrobacter rodentium translocated intimin receptor (Tir) is an essential virulence factor needed for actin condensation, intestinal colonization and colonic hyperplasia in mice. Mol Microbiol 48:95–115. 193. 
Newman J, Zabel B, Jha S, Schauer D (1999) Citrobacter rodentium espB is necessary for signal transduction and for infection of laboratory mice. Infect Immun 67:6019–6025. 194. Alex P et al. (2009) Distinct cytokine patterns identified from multiplex profiles of murine DSS and TNBS-induced colitis. Inflamm Bowel Dis 15:341–52. 195. Oz HS, Puleo DA (2011) Animal Models for Periodontal Disease. J Biomed Biotechnol 2011:1–8. 196. Naglik JR, Fidel PL, Odds FC (2008) Animal models of mucosal Candida infection. FEMS Microbiol Lett 283:129–139. 197. Mcbride BC, van der Hoeven JS (1981) Role of interbacterial adherence in colonization of the oral cavities of gnotobiotic rats infected with Streptococcus mutans and Veillonella alcalescens. Infect Immun 33:467–472. 198. Mahowald MA et al. (2009) Characterizing a model human gut microbiota composed of members of its two dominant bacterial phyla. Proc Natl Acad Sci U S A 106:5859–64. 199. Sonnenburg JL, Chen CTL, Gordon JI (2006) Genomic and metabolic studies of the impact of probiotics on a model gut symbiont and host. PLoS Biol 4:e413. 200. Lewis NE, Nagarajan H, Palsson BO (2012) Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat Rev Microbiol 10:291–305. 201. Zomorrodi AR, Maranas CD (2012) OptCom: a multi-level optimization framework for the metabolic modeling and analysis of microbial communities. PLoS Comput Biol 8:e1002363. 202. Mahadevan R, Edwards JS, Doyle 3rd FJ (2002) Dynamic flux balance analysis of diauxic growth in Escherichia coli. Biophys J 83:1331–1340. 203. Greenblum S, Turnbaugh PJ, Borenstein E (2012) Metagenomic systems biology of the human gut 187 microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc Natl Acad Sci U S A 109:594–599. 204. Zhuang K et al. (2011) Genome-scale dynamic modeling of the competition between Rhodoferax and Geobacter in anoxic subsurface environments. ISME J 5:305–316. 205. 
Taffs R et al. (2009) In silico approaches to study mass and energy flows in microbial consortia: a syntrophic case study. BMC Syst Biol 3:114. 206. Turnbaugh PJ et al. (2009) A core gut microbiome in obese and lean twins. Nature 457:480–4. 207. Rohlke F, Surawicz CM, Stollman N (2010) Fecal flora reconstitution for recurrent Clostridium difficile infection: results and methodology. J Clin Gastroenterol 44:567–70. 208. Miele E et al. (2009) Effect of a probiotic preparation (VSL#3) on induction and maintenance of remission in children with ulcerative colitis. Am J Gastroenterol 104:437–443. 209. Gionchetti P et al. (2003) Prophylaxis of pouchitis onset with probiotic therapy: a double-blind, placebo-controlled trial. Gastroenterology 124:1202–1209. 210. Mimura T et al. (2004) Once daily high dose probiotic therapy (VSL#3) for maintaining remission in recurrent or refractory pouchitis. Gut 53:108–114. 211. Culligan EP, Hill C, Sleator RD (2009) Probiotics and gastrointestinal disease: successes, problems and future prospects. Gut Pathog 1:19. 212. Sartor RB (2004) Therapeutic manipulation of the enteric microflora in inflammatory bowel diseases: antibiotics, probiotics, and prebiotics. Gastroenterology 126:1620–1633. 213. Cronin M et al. (2010) Orally administered bifidobacteria as vehicles for delivery of agents to systemic tumors. Mol Ther 18:1397–407. 214. Fu G-F et al. (2005) Bifidobacterium longum as an oral delivery system of endostatin for gene therapy on solid liver cancer. Cancer Gene Ther 12:133–40. 215. Li X et al. (2003) Bifidobacterium adolescentis as a delivery system of endostatin for cancer gene therapy: selective inhibitor of angiogenesis and hypoxic tumor growth. Cancer Gene Ther 10:105– 11. 216. Duan F, Curtis KL, March JC (2008) Secretion of insulinotropic proteins by commensal bacteria: rewiring the gut to treat diabetes. Appl Environ Microbiol 74:7437–8. 217. Rao S et al. 
(2005) Toward a live microbial microbicide for HIV: commensal bacteria secreting an HIV fusion inhibitor peptide. Proc Natl Acad Sci U S A 102:11993–8. 188 218. Braat H et al. (2006) A phase I trial with transgenic bacteria expressing interleukin-10 in Crohn’s disease. Clin Gastroenterol Hepatol 4:754–9. 219. Degnan FH (2008) The US Food and Drug Administration and probiotics: regulatory categorization. Clin Infect Dis 46 Suppl 2:S133–6; discussion S144–51. 220. Yaung SJ et al. (2015) Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics. Mol Syst Biol 11:788–788. 221. Peterson J et al. (2009) The NIH Human Microbiome Project. Genome Res 19:2317–23. 222. Walker AW, Duncan SH, Louis P, Flint HJ (2014) Phylogeny, culturing, and metagenomics of the human gut microbiota. Trends Microbiol 22:267–74. 223. Healy FG et al. (1995) Direct isolation of functional genes encoding cellulases from the microbial consortia in a thermophilic, anaerobic digester maintained on lignocellulose. Appl Microbiol Biotechnol 43:667–674. 224. Stein J, Marsh T, Wu K, Shizuya H, DeLong E (1996) Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J Bacteriol 178:591–599. 225. Rondon MR et al. (2000) Cloning the Soil Metagenome : a Strategy for Accessing the Genetic and Functional Diversity of Uncultured Microorganisms. 66:2541–2547. 226. Tasse L et al. (2010) Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes. Genome Res 20:1605–12. 227. Cecchini DA et al. (2013) Functional metagenomics reveals novel pathways of prebiotic breakdown by human gut bacteria. PLoS One 8:e72766. 228. Gloux K et al. (2011) A metagenomic β-glucuronidase uncovers a core adaptive function of the human intestinal microbiome. Proc Natl Acad Sci U S A 108 Suppl :4539–46. 229. 
Culligan EP, Sleator RD, Marchesi JR, Hill C (2012) Functional metagenomics reveals novel salt tolerance loci from the human gut microbiome. ISME J 6:1916–25. 230. Lakhdari O et al. (2010) Functional metagenomics: a high throughput screening method to decipher microbiota-driven NF-κB modulation in the human gut. PLoS One 5:1–10. 231. Xu J et al. (2003) A genomic view of the human-Bacteroides thetaiotaomicron symbiosis. Science 299:2074–6. 232. Sonnenburg JL et al. (2005) Glycan foraging in vivo by an intestine-adapted bacterial symbiont. 189 Science 307:1955–9. 233. Bjursell MK, Martens EC, Gordon JI (2006) Functional genomic and metabolic studies of the adaptations of a prominent adult human gut symbiont, Bacteroides thetaiotaomicron, to the suckling period. J Biol Chem 281:36269–79. 234. Martens EC, Chiang HC, Gordon JI (2008) Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe 4:447–57. 235. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. 236. Trapnell C et al. (2013) Differential analysis of gene regulation at transcript resolution with RNAseq. Nat Biotechnol 31:46–53. 237. Li H et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078– 9. 238. Haardt M, Kempf B, Faatz E, Bremer E (1995) The osmoprotectant proline betaine is a major substrate for the binding-protein-dependent transport system ProU of Escherichia coli K-12. Mol Gen Genet 246:783–6. 239. Usui Y et al. (2012) Investigating the effects of perturbations to pgi and eno gene expression on central carbon metabolism in Escherichia coli using (13)C metabolic flux analysis. Microb Cell Fact 11:87. 240. Winson MK et al. (1998) Engineering the luxCDABE genes from Photorhabdus luminescens to provide a bioluminescent reporter for constitutive and promoter probe plasmids and mini-Tn5 constructs. 
FEMS Microbiol Lett 163:193–202. 241. Jost L (2006) Entropy and diversity. Oikos 113:363–375. 242. Schloss PD et al. (2009) Introducing mothur: Open-source, platform-independent, communitysupported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537–7541. 243. Bar-Joseph Z, Gerber G, Jaakkola T, Gifford D, Simon I (2003) Continuous representations of time series gene expression data. J Comput Biol 3-4:341–356. 244. Byers JP, Sarver JG (2009) in Pharmacology: Principles and Practice, eds Hacker M, Messer W, Bachmann K (Elsevier), pp 201–277. 245. Valvano M, Messner P, Kosma P (2002) Novel pathways for biosynthesis of nucleotide-activated glycero-manno-heptose precursors of bacterial glycoproteins and cell surface polysaccharides. 190 Microbiology 148:1979–1989. 246. Kneidinger B et al. (2002) Biosynthesis pathway of ADP-L-glycero-β-D-manno-heptose in Escherichia coli. J Bacteriol 184:363–369. 247. Wang L et al. (2010) Divergence of biochemical function in the HAD superfamily: D-glycero-Dmanno-heptose-1,7-bisphosphate phosphatase (GmhB). Biochemistry 49:1072–81. 248. Chiang S, Mekalanos J (1999) rfb mutations in Vibrio cholerae do not affect surface production of toxin-coregulated pili but still inhibit intestinal colonization. Infect Immun 67:976–980. 249. Burns S, Hull S (1998) Comparison of Loss of Serum Resistance by Defined Lipopolysaccharide Mutants and an Acapsular Mutant of UropathogenicEscherichia coli O75: K5. Infect Immun 66:4244–4253. 250. Wexler HM, Tenorio E, Pumbwe L (2009) Characteristics of Bacteroides fragilis lacking the major outer membrane protein, OmpA. Microbiology 155:2694–706. 251. Sato K et al. (2010) OmpA variants affecting the adherence of ulcerative colitis-derived Bacteroides vulgatus. J Med Dent Sci 57:55–64. 252. Soulas C et al. (2000) Cutting Edge: Outer Membrane Protein A (OmpA) Binds to and Activates Human Macrophages. J Immunol 165:2335–2340. 253. 
Mehra R, Drabble W (1981) Dual Control of the gua Operon of Escherichia coli K12 by Adenine and Guanine Nucleotides. J Gen Microbiol 123:27–37. 254. Ratnayake-Lecamwasam M, Serror P, Wong K-W, Sonenshein A (2001) Bacillus subtilis CodY represses early-stationary-phase genes by sensing GTP levels. Genes Dev 15:1093–1103. 255. Buckstein MH, He J, Rubin H (2008) Characterization of nucleotide pools as a function of physiological state in Escherichia coli. J Bacteriol 190:718–26. 256. Pang B et al. (2012) Defects in purine nucleotide metabolism lead to substantial incorporation of xanthine and hypoxanthine into DNA and RNA. Proc Natl Acad Sci U S A 109:2319–24. 257. Sonnenburg ED et al. (2010) Specificity of polysaccharide use in intestinal bacteroides species determines diet-induced microbiota alterations. Cell 141:1241–52. 258. Blattner FR et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277:1453–62. 259. Keseler IM et al. (2011) EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Res 39:D583–90. 191 260. Gaudin HM, Silverman PM (1993) Contributions of promoter context and structure to regulated expression of the F plasmid fraV promoter in Escherichia coii K-12. Mol Microbiol 8:335–342. 261. Guan L, Murphy FD, Kaback HR (2002) Surface-exposed positions in the transmembrane helices of the lactose permease of Escherichia coli determined by intermolecular thiol cross-linking. Proc Natl Acad Sci U S A 99:3475–80. 262. Soupene E et al. (2003) Physiological Studies of Escherichia coli Strain MG1655 : Growth Defects and Apparent Cross-Regulation of Gene Expression. J Bacteriol 185:5611–5626. 263. Weickert MJ, Adhyat S (1992) Isorepressor of the gal Regulon in Escherichia coli. J Mol Biol 226:69–83. 264. Weickert MJ, Adhya S (1993) Control of transcription of gal repressor and isorepressor genes in Escherichia coli. J Bacteriol 175:251–8. 265. Juge N (2012) Microbial adhesins to gastrointestinal mucus. 
Trends Microbiol 20:30–9. 266. Freter R, Brickner H, Botney M, Cleven D, Aranki A (1983) Mechanisms that control bacterial populations in continuous-flow culture models of mouse large intestinal flora. Infect Immun 39:676. 267. Maltby R, Leatham-Jensen MP, Gibson T, Cohen PS, Conway T (2013) Nutritional basis for colonization resistance by human commensal Escherichia coli strains HS and Nissle 1917 against E. coli O157:H7 in the mouse intestine. PLoS One 8:e53957. 268. Lee SM et al. (2013) Bacterial colonization factors control specificity and stability of the gut microbiota. Nature 501:426–9. 269. Rakoff-Nahoum S, Coyne MJ, Comstock LE (2014) An ecological network of polysaccharide utilization among human intestinal symbionts. Curr Biol 24:40–9. 270. Ringel Y, Quigley EMM, Lin HC (2012) Using Probiotics in Gastrointestinal Disorders. Am J Gastroenterol Suppl 1:34–40. 271. Bermúdez-Humarán LG, Kharrat P, Chatel J-M, Langella P (2011) Lactococci and lactobacilli as mucosal delivery vectors for therapeutic proteins and DNA vaccines. Microb Cell Fact 10 Suppl 1:S4. 272. Motta J-P et al. (2012) Food-grade bacteria expressing elafin protect against inflammation and restore colon homeostasis. Sci Transl Med 4:158ra144. 273. Wells JM, Mercenier A (2008) Mucosal delivery of therapeutic and prophylactic molecules using lactic acid bacteria. Nat Rev Microbiol 6:349–62. 192 274. Lawley TD, Walker AW (2013) Intestinal colonization resistance. Immunology 138:1–11. 275. Bäckhed F et al. (2012) Defining a healthy human gut microbiome: current concepts, future directions, and clinical applications. Cell Host Microbe 12:611–22. 276. Cohen ML (1992) Epidemiology of drug resistance: implications for a post-antimicrobial era. Science 257:1050–5. 277. Shoemaker N, Vlamakis H, Hayes K, Salyers A (2001) Evidence for extensive resistance gene transfer among Bacteroides spp. and among Bacteroides and other genera in the human colon. Appl Environ Microbiol 67:561–568. 278. 
Zhang F, Luo W, Shi Y, Fan Z, Ji G (2012) Should we standardize the 1,700-year-old fecal microbiota transplantation? Am J Gastroenterol 107:1755; author reply p.1755–6. 279. Brandt LJ (2013) American Journal of Gastroenterology Lecture: Intestinal microbiota and the role of fecal microbiota transplant (FMT) in treatment of C. difficile infection. Am J Gastroenterol 108:177–85. 280. Hehemann J-H et al. (2010) Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota. Nature 464:908–12. 281. Stecher B et al. (2012) Gut inflammation can boost horizontal gene transfer between pathogenic and commensal Enterobacteriaceae. Proc Natl Acad Sci U S A 109:1269–74. 282. Dionisio F, Matic I, Radman M, Rodrigues OR, Taddei F (2002) Plasmids spread very fast in heterogeneous bacterial communities. Genetics 162:1525–32. 283. Pansegrau W et al. (1994) Complete nucleotide sequence of Birmingham IncP alpha plasmids. J Mol Biol 239:623–663. 284. Smith CJ, Rogers MMB, McKee ML (1992) Heterologous gene expression in Bacteroides fragilis. Plasmid 27:141–54. 285. Rasmussen JL, Odelson D a, Macrina FL (1987) Complete nucleotide sequence of insertion element IS4351 from Bacteroides fragilis. J Bacteriol 169:3573–80. 286. Altschul SF et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–402. 287. Guiney DG, Hasegawa P, Davis CE (1984) Plasmid transfer from Escherichia coli to Bacteroides fragilis: differential expression of antibiotic resistance phenotypes. Proc Natl Acad Sci U S A 81:7203–6. 193 288. Garrigues-Jeanjean N, Wittmer A, Ouriet MM., Duval-Iflah Y (1999) Transfer of the shuttle vector pRRI207 between Escherichia coli and Bacteroides spp. in vitro and in vivo in the digestive tract of axenic mice and in gnotoxenic mice inoculated with a human microflora. FEMS Microbiol Ecol 29:33–43. 289. 
Trieu-Cuot P, Carlier C, Martin P, Courvalin P (1987) Plasmid transfer by conjugation from Escherichia coli to Gram-positive bacteria. FEMS Microbiol Lett 48:289–294. 290. Trieu-Cuot P, Carlier C, Courvalin P (1988) Conjugative plasmid transfer from Enterococcus faecalis to Escherichia coli. J Bacteriol 170:4388–91. 291. Shkoporov AN et al. (2008) Characterization of plasmids from human infant Bifidobacterium strains: sequence analysis and construction of E. coli-Bifidobacterium shuttle vectors. Plasmid 60:136–48. 292. Horvath P, Barrangou R (2010) CRISPR/Cas, the immune system of bacteria and archaea. Science 327:167–70. 293. Palmer KL, Gilmore MS (2010) Multidrug-resistant enterococci lack CRISPR-cas. MBio 1:e00227–10. 294. Groman NB (1953) The relation of bacteriophage to the change of Corynebacterium diphtheriae from avirulence to virulence. Science 117:297–9. 295. Freeman V (1951) Studies on the virulence of bacteriophage-infected strains of Corynebacterium diphtheriae. J Bacteriol 61:675–688. 296. Betley M, Mekalanos J (1985) Staphylococcal enterotoxin A is encoded by phage. Science 456:233–235. 297. Acheson DWK et al. (1998) In vivo transduction with shiga toxin 1-encoding phage. Infect Immun 66:4496–4498. 298. Waldor M, Mekalanos J (1996) Lysogenic conversion by a filamentous phage encoding cholera toxin. Science 272:1910–1914. 299. Deltcheva E et al. (2011) CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471:602–7. 300. Bikard D, Hatoum-Aslan A, Mucida D, Marraffini LA (2012) CRISPR interference can prevent natural transformation and virulence acquisition during in vivo bacterial infection. Cell Host Microbe 12:177–86. 301. Cheng K, Smith G (1984) Recombinational hotspot activity of Chi-like sequences. J Mol Biol 194 151:371–377. 302. Pearson GD, Woods A, Chiang SL, Mekalanos JJ (1993) CTX genetic element encodes a sitespecific recombination system and an intestinal colonization factor. 
Proc Natl Acad Sci U S A 90:3750–3754. 303. Esvelt KM et al. (2013) Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat Methods 10:1116–21. 304. Iwanaga M, Yamamoto K (1985) New medium for the production of cholera toxin by Vibrio cholerae O1 biotype El Tor. J Clin Microbiol 22:405–8. 305. Arnold R et al. (2012) Emergence of Klebsiella pneumoniae Carbapenemase (KPC)- Producing Bacteria. South Med J 104:40–45. 306. Moellering Jr RC (2010) NDM-1—A Cause for Worldwide Concern. N Engl J Med 363:2377– 2379. 307. Courvalin P (2006) Vancomycin Resistance in Gram-Positive Cocci. Clin Infect Dis 42:S25–34. 308. Dalsgaard A, Forslund A, Sandvang D, Arntzen L, Keddy K (2001) Vibrio cholerae O1 outbreak isolates in Mozambique and South Africa in 1998 are multiple-drug resistant, contain the SXT element and the aadA2 gene located on class 1 integrons. J Antimicrob Chemother 48:827–38. 309. Beaber JW, Burrus V, Hochhut B, Waldor MK (2002) Comparison of SXT and R391, two conjugative integrating elements: Definition of a genetic backbone for the mobilization of resistance determinants. Cell Mol Life Sci 59:2065–2070. 310. Nishimasu H et al. (2014) Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156:935–49. 311. Van Hoek AHAM et al. (2011) Acquired antibiotic resistance genes: an overview. Front Microbiol 2:203. 312. Friedman-Ohana R, Karunker I, Cohen A (1998) Chi-dependent intramolecular recombination in Escherichia coli. Genetics 148:545–57. 313. Rund SA, Rohde H, Sonnenborn U, Oelschlaeger TA (2013) Antagonistic effects of probiotic Escherichia coli Nissle 1917 on EHEC strains of serotype O104:H4 and O157:H7. Int J Med Microbiol 303:1–8. 314. Halpern D et al. (2007) Identification of DNA motifs implicated in maintenance of bacterial core genomes by predictive modeling. PLoS Genet 3:1614–21. 195 315. 
Duerkop BA, Clements C V, Rollins D, Rodrigues JLM, Hooper L V (2012) A composite bacteriophage alters colonization by an intestinal commensal bacterium. Proc Natl Acad Sci U S A 109:17621–6. 316. Chibani-Chennoufi S et al. (2004) In vitro and in vivo bacteriolytic activities of Escherichia coli phages: implications for phage therapy. Antimicrob Agents Chemother 48:2558–2569. 317. Abedon ST, Kuhl SJ, Blasdel BG, Kutter EM (2011) Phage treatment of human infections. Bacteriophage 1:66–85. 318. Rossmann FS et al. (2015) Phage-mediated Dispersal of Biofilm and Distribution of Bacterial Virulence Genes Is Induced by Quorum Sensing. PLOS Pathog 11:e1004653. 319. Mojica FJM, Díez-Villaseñor C, García-Martínez J, Almendros C (2009) Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology 155:733–40. 320. Kutter E (2009) Phage host range and efficiency of plating. Methods Mol Biol 501:141–9. 321. Myhal ML, Laux DC, Cohen PS (1982) Relative colonizing abilities of human fecal and K 12 strains of Escherichia coli in the large intestines of streptomycin-treated mice. Eur J Clin Microbiol 1:186–92. 322. Bourdin G et al. (2014) Amplification and purification of T4-like escherichia coli phages for phage therapy: from laboratory to pilot scale. Appl Environ Microbiol 80:1469–76. 323. Macconkey A (1905) Lactose-Fermenting Bacteria in Faeces. J Hyg (Lond) 5:333–79. 324. Kotula JW et al. (2014) Programmable bacteria detect and record an environmental signal in the mammalian gut. Proc Natl Acad Sci U S A 111:4838–43. 325. Timms AR, Steingrimsdottir H, Lehmann AR, Bridges BA (1992) Mutant sequences in the rpsL gene of Escherichia coli B/r: mechanistic implications for spontaneous and ultraviolet light mutagenesis. Mol Gen Genet 232:89–96. 326. Yaung SJ, Esvelt KM, Church GM (2014) CRISPR/Cas9-Mediated Phage Resistance Is Not Impeded by the DNA Modifications of Phage T4. PLoS One 9:e98811. 327. 
Lehman IR, Pratt EA (1960) On the structure of the glucosylated hydroxymethylcytosine nucleotides of coliphages T2, T4, and T6. J Biol Chem 235:3254–9.
328. Kelleher J, Raleigh E (1991) A novel activity in Escherichia coli K-12 that directs restriction of DNA modified at CG dinucleotides. J Bacteriol 173:5220–3.
329. Jinek M et al. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337:816–21.
330. Barrangou R et al. (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709–12.
331. Marraffini LA, Sontheimer EJ (2008) CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science 322:1843–5.
332. Grissa I, Vergnaud G, Pourcel C (2007) The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics 8:172.
333. Díez-Villaseñor C, Almendros C, García-Martínez J, Mojica FJM (2010) Diversity of CRISPR loci in Escherichia coli. Microbiology 156:1351–61.
334. Touchon M et al. (2011) CRISPR distribution within the Escherichia coli species is not suggestive of immunity-associated diversifying selection. J Bacteriol 193:2460–7.
335. Toro M et al. (2014) Association of Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) Elements with Specific Serotypes and Virulence Potential of Shiga Toxin-Producing Escherichia coli. Appl Environ Microbiol 80:1411–20.
336. Grissa I, Vergnaud G, Pourcel C (2007) CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res 35:W52–7.
337. Semenova E et al. (2011) Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A 108:10098–103.
338. Georgopoulos CP (1967) Isolation and preliminary characterization of T4 mutants with nonglucosylated DNA. Biochem Biophys Res Commun 28:179–184.
339. Warren RAJ (1980) Modified bases in bacteriophage DNAs. Annu Rev Microbiol 34:137–58.
340. Petrov VM, Ratnayaka S, Nolan JM, Miller ES, Karam JD (2010) Genomes of the T4-related bacteriophages as windows on microbial genome evolution. Virol J 7:292.
341. Sheludchenko MS, Huygens F, Hargreaves MH (2010) Highly discriminatory single-nucleotide polymorphism interrogation of Escherichia coli by use of allele-specific real-time PCR and eBURST analysis. Appl Environ Microbiol 76:4337–45.
342. Dupuis M-È, Villion M, Magadán AH, Moineau S (2013) CRISPR-Cas and restriction-modification systems are compatible and increase phage resistance. Nat Commun 4:2087.
343. Hsu PD et al. (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31:827–32.
344. Monod C, Repoila F, Kutateladze M, Tétart F, Krisch HM (1997) The genome of the pseudo T-even bacteriophages, a diverse group that resembles T4. J Mol Biol 267:237–49.
345. Snyder L, Gold L, Kutter E (1976) A gene of bacteriophage T4 whose product prevents true late transcription on cytosine-containing T4 DNA. Proc Natl Acad Sci U S A 73:3098–102.
346. Kornberg S, Zimmerman S, Kornberg A (1961) Glucosylation of deoxyribonucleic acid by enzymes from bacteriophage-infected Escherichia coli. J Biol Chem 236:1487–1493.
347. Choo Y (1998) Recognition of DNA methylation by zinc fingers. Nat Struct Biol 5:264–265.
348. Valton J et al. (2012) Overcoming transcription activator-like effector (TALE) DNA binding domain sensitivity to cytosine methylation. J Biol Chem 287:38427–32.
349. Fineran PC et al. (2014) Degenerate target sites mediate rapid primed CRISPR adaptation. Proc Natl Acad Sci U S A 111:E1629–38.
350. Deveau H et al. (2008) Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J Bacteriol 190:1390–1400.
351. Levin BR, Moineau S, Bushman M, Barrangou R (2013) The population and evolutionary dynamics of phage and bacteria with CRISPR-mediated immunity. PLoS Genet 9:e1003312.
352. Magadán AH, Dupuis M-È, Villion M, Moineau S (2012) Cleavage of phage DNA by the Streptococcus thermophilus CRISPR3-Cas system. PLoS One 7:e40913.
353. Garneau JE et al. (2010) The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468:67–71.
354. Bondy-Denomy J, Pawluk A, Maxwell KL, Davidson AR (2013) Bacteriophage genes that inactivate the CRISPR/Cas bacterial immune system. Nature 493:429–32.
355. Seed KD, Lazinski DW, Calderwood SB, Camilli A (2013) A bacteriophage encodes its own CRISPR/Cas adaptive response to evade host innate immunity. Nature 494:489–91.
356. Yaung SJ, Esvelt KM, Church GM (2015) Complete Genome Sequences of T4-Like Bacteriophages RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68. Genome Announc 3:e01122–14.
357. Russell R (1967) Speciation among the T-even bacteriophages. Dissertation (California Institute of Technology).
358. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
359. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–9.
360. Delcher AL, Bratke KA, Powers EC, Salzberg SL (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673–9.
361. Schattner P, Brooks AN, Lowe TM (2005) The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 33:W686–9.
362. Darling AE, Mau B, Perna NT (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5:e11147.
363. Lorenz R et al. (2011) ViennaRNA Package 2.0. Algorithms Mol Biol 6:26.
364. Robinson MD, McCarthy DJ, Smyth GK (2009) edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140.
365. Miller ES et al. (2003) Bacteriophage T4 genome. Microbiol Mol Biol Rev 67:86–156.
366. Greenberg GR, He P, Hilfinger J, Tseng MJ (1994) in Molecular biology of bacteriophage T4, pp 14–27.
367. Young P, Ohman M, Sjoberg BM (1994) Bacteriophage T4 gene 55.9 encodes an activity required for anaerobic ribonucleotide reduction. J Biol Chem 269:27815–27818.
368. Black LW, Showe MK, Steven AC (1994) in Molecular biology of bacteriophage T4, pp 218–258.
369. Tamboli CP, Neut C, Desreumaux P, Colombel JF (2004) Dysbiosis in inflammatory bowel disease. Gut 53:1–4.
370. Church GM (2013) Reading and writing omes. Mol Syst Biol 9:642.