The Eco-evolutionary Dynamics of Extrachromosomal Elements in Environmental Vibrio

by

Hong Xue
B.S., Sichuan University (1998)

Submitted to the Department of Civil and Environmental Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2014

©2014 Massachusetts Institute of Technology. All rights reserved

Signature of Author: Department of Civil and Environmental Engineering, August 22, 2014

Certified by: Martin F. Polz, Professor of Civil and Environmental Engineering, Thesis Advisor

Accepted by: Heidi M. Nepf, Chair, Departmental Committee for Graduate Students

Submitted to the Department of Civil and Environmental Engineering on August 22, 2014 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Environmental Biology

ABSTRACT

Plasmids and other extrachromosomal elements (ECEs) are recognized as key factors mediating horizontal gene transfer; however, their diversity and dynamics among ecologically structured host populations in the wild remain poorly understood. Here we take a population-genomic approach to determine carriage of different types of ECEs in a recently established model for ecologically and genetically cohesive bacterial populations, asking whether different ECE types (i) are primarily associated with host phylogeny or ecology, (ii) have distinct transfer (and loss) patterns, and (iii) display different microevolutionary dynamics.
We employed two models of environmental bacterial populations: a Vibrio cholerae population isolated from a coastal brackish pond (Oyster Pond, Woods Hole, MA), and diverse co-existing Vibrio populations comprising several species from Plum Island Sound (Ipswich, MA). A novel filamentous phage, VCYΦ, was detected at high frequency (>40%) in a collection of 531 isolates of V. cholerae. VCYΦ occurs both in a host-genome integrative form (IF) and in a plasmid-like replicative form (RF). The relative frequency of each form differed among isolates from portions of the pond displaying different salinities, suggesting a potential impact of host habitat on the biology of bacteriophages. Using the second model, we isolated 187 ECEs from 660 isolates previously categorized into 25 different ecologically and genetically cohesive populations. We identified the following elements: 22 bacteriophages, and 24 conjugative, 38 mobilizable and 103 so-called non-transmissible ECEs. While mobilizable ECEs require co-occurring conjugative plasmids for successful transfer, non-transmissible ECEs do not encode any genes for self-transfer. We further found that ECEs were significantly enriched in free-living cells, suggesting an association of ECEs with the host environment. The finding of phage as a major and stable ECE component is surprising, and the absence of any integrase genes suggests that these phages persist as prophages without integrating into the host genome. Finally, our data show that a type of plasmid previously defined as "nontransmissible" appears to be the most common among Vibrio ECEs, and that these plasmids have been transferred recently and frequently among distantly related populations through mechanisms yet to be uncovered. Overall, this study suggests a dynamic mobile gene pool with high turnover among host populations.

Thesis Supervisor: Martin F. Polz
Title: Professor of Civil and Environmental Engineering

ACKNOWLEDGEMENTS

I want to express my deepest appreciation to my mentor Dr. Martin F.
Polz, for the inspiration, generosity and patience he extended to me along this long journey that is finally coming to an end. Martin has never doubted my ability and has always encouraged me to keep learning and never give up. I truly appreciate and value everything Martin has taught me and am very proud to be Martin's student. I thank all my thesis committee members, Dr. Edward Delong, Dr. Eric Alm and Dr. Janelle Thompson, for their valuable advice and support. I also particularly thank my coworkers Dr. Otto Cordero and Dr. Francisco Camas, and our collaborators Dr. William Trimble, Dr. Julien Guglielmini and Dr. Eduardo P. C. Rocha, for their enthusiasm in developing pioneering tools that helped tremendously with this project. Without their efforts, this study could never have reached so far. I sincerely thank my fellow student Katherine Kauffman, for never hesitating to share her thoughts about my research and for always telling me to believe in myself. My gratitude also extends to my dear colleagues, Dr. Young Boucher, Ms. Yan Xu, Dr. Dana Hunt, Dr. Sarah Preheim, Michael Cutler and all Polz lab members, for making this lab such a loving family that I am honored to have been a part of. I would also like to express my special thanks to Dr. Matthew Waldor, Edward H. Kass Professor of Medicine at Harvard Medical School, for offering me the precious opportunity to participate in his research. His kindness, generosity and great sense of humor have helped me gain confidence. I am honored to have worked with Mingshu Zhan from the EUROP program, and Rafal Sledziewski from the Research Science Institute 2008 program. They both showed their eagerness to explore and to try, and it is my honor to have mentored them. Lastly, I want to thank my parents Yun Chao and Baoping Xue, for never showing any doubt about my ability, decisions and stubbornness, and for never stopping me from fulfilling my dream.
I thank my wife, Su Xu, for helping with my thesis and creating a positive environment at home. I am proud to be the father of my two wonderful boys Yuran and Roman who make me happy whenever and wherever I go.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 High Frequency of a Novel Filamentous Phage, VCYΦ, within an Environmental Vibrio cholerae Population
Chapter 3 Diversity and Dynamics of Extrachromosomal Elements among Ecologically-Defined Host Populations
Chapter 4 Conclusions and Future Directions

List of Figures

Chapter One
Figure 1. Comparison of two explanations for unexpected phylogenetic distribution
Figure 2. Overview of plasmids and conjugative transfer in the horizontal spread of genes
Figure 3. Schematic models for type I, II and III partitioning system
Figure 4. Schematic view of the genetic constitution of transmissible plasmids (A) and some essential interactions in the process of conjugation (B)
Chapter Two
Figure 1. Genome organization of VCYΦ phage
Figure 2. Electron micrograph of VCYΦ phage particles
Figure 3.
attP site of VCYΦ and attB site of integration of VCYΦ into chromosome II of strain 4A01LW1
Chapter Three
Figure 1. Distribution of ECEs among Vibrio hosts
Figure 2. Distribution of the number of ECEs per strain for all Vibrio isolates with at least one ECE
Figure 3. ECE family diversity as a function of family size, ECE size and classes
Figure 4. ECE family distribution across the Vibrio phylogeny
Figure 5. ECE genome cluster network
Figure 6. Sequence alignment of ECEs in three representative clusters from the network analysis

List of Tables

Chapter One
Table 1. Summary of characterized Vibrio plasmids
Chapter Two
Table 1. List of primers used in this study
Table 2. Frequency of the IF and RF of VCYΦ phage
Chapter Three
Table 1. An extended bar code set beyond the Roche Titanium-compatible bar code kit
Table 2. The number, GC content and size of contigs
Table 3. Classification, host and population of ECEs
Table 4. Summary of strains carrying ECEs

1. Chapter One: Introduction

1.1. Horizontal Gene Transfer (HGT)

The determination of evolutionary relationships among microbes was enabled by comparison of 16S rRNA sequences.
These proved useful for the analysis of phylogenetic relationships among all living organisms (1), since ribosomal RNAs are universally distributed among all three domains of life: Archaea, Bacteria and Eukarya. As a component of the translation system, the 16S rRNA maintains high functional constancy, and its gene can be readily amplified by PCR and sequenced (2). However, the increasing availability of sequence data from other genes has shown that these can reveal dramatically different evolutionary relationships among species, causing incongruent signals (3) (Figure 1). While other hypotheses were being developed to explain such incongruence, Smith et al. (1992) proposed that horizontal gene transfer (HGT) might be an important contributing factor (4). In fact, the very first evidence of HGT was the observation that virulence determinants could be transferred between pneumococci in infected mice (5). This was later shown to be the consequence of the uptake of genetic material through transformation, which we now know is one of the principal HGT mechanisms; however, HGT was long dismissed as an event that occurs only occasionally under specific conditions. A breakthrough took place when comparative genomics of bacteria and archaea revealed that a significant number of genes in bacteria were acquired from distantly related species (6, 7). For example, it was suggested that more than 20% of the ORFs in the genome of the bacterium Thermotoga maritima are homologous to genes of archaeal species, indicating frequent cross-species HGT events (8, 9). Moreover, genomic analyses have detected several microbial species containing two types of rRNA operons of different origins (10).

Figure 1. Comparison of two explanations for unexpected phylogenetic distribution. a.
The presence of a gene with characteristics that are typical for an unrelated group can be due to horizontal gene transfer (HGT, arrow). b. An alternative explanation is an ancient gene duplication (*) followed by differential gene loss (x). The more sister lineages have only the typical gene, the more independent gene-loss events must be postulated under this scenario. Gogarten J and Townsend J, Nature Reviews Microbiology, Volume 3, September 2005 (Reprinted by permission from Nature Publishing Group).

Horizontal gene transfer can be detected by several approaches, based on atypical nucleotide composition, anomalous phylogenetic distribution, differences in gene content among closely related species, and incongruent phylogenetic trees (9, 11-13). In theory, HGT should cause incongruent phylogenetic trees when different genes are compared. Although other factors, such as poor data and different species sampling for different genes, should not be ruled out, incongruent trees have been the most reliable way of identifying HGT. For example, Martinez et al. detected phylogenetic incongruence between a tree based on 16S rRNA and one based on 48 PIB-type ATPase sequences, suggesting an ancient gene transfer from a member of the [-proteobacteria (14).

In addition, atypical gene composition, referred to as compositional bias of codons or nucleotides (15-17), has also been used to identify HGT. In general, sequences that belong to the same genome share common patterns in G+C composition and in the usage of codons and oligonucleotides, patterns that are shaped by natural selection and mutational bias (18, 19). Because each species follows its own characteristic evolutionary path, its genome acquires a distinctive compositional signature, which makes it possible to detect HGT (20). Genes displaying a base composition that is not typical of their host strains may have originated from distantly related donor organisms.

HGT often results in appearance of a new gene in a particular species.
If a gene is present in only some strains of a particular species, one might suspect involvement of HGT. However, such an uneven distribution pattern can also be caused by gene loss or rapid sequence divergence (11). Therefore, distribution patterns alone might not be a reliable basis for assessing HGT. Another method, which examines the patterns of best matches to different species, is advantageous in speed and automatability but comes at the cost of accuracy, and is therefore not widely used (21).

1.2. Mechanisms of HGT

HGT can occur through three principal mechanisms: natural transformation, transduction and conjugative transfer. Natural transformation refers to stable uptake of DNA, including plasmids and chromosomal DNA fragments, under natural growth conditions (22-24). It most often occurs when bacteria are exposed to environmental changes, such as changes in temperature or nutrient supply, or experience high cell density. To be stable, the uptake needs to be followed by integration of the alien DNA into the host genome (25). Newly acquired genes can have deleterious effects on the recipient cells, which may then not survive selection, whereas genes that offer a selective advantage may improve the survival rate and hence expand in the population.

Transduction is a process in which genetic material is transferred from donor cells to recipient cells via viral infection. Unlike in transformation, DNA in bacteriophage transduction is protected during the transfer, and no cell-to-cell contact is involved (26, 27). Depending on the type of DNA material that is transferred, transduction is referred to as generalized or specialized. In the former case, bacterial DNA fragments can be randomly packaged into phage particles. These infective particles can eject bacterial DNA into new host cells where the DNA might recombine with the chromosome.
In specialized transduction, a temperate phage that has inserted into the host chromosome can excise imprecisely, taking with it a piece of the host genome that borders the prophage. This can lead to high transfer and recombination rates of these pieces of DNA (28).

Conjugative transfer involves direct cell-to-cell contact, in which the DNA released from the donor cell is channeled through a cell junction into the recipient cell. Conjugative transfer is often associated with circular DNA plasmids. As shown in the step-by-step illustration of conjugative transfer in Figure 2 (29), transfer of plasmids is facilitated by a pore complex temporarily formed at the tight cell junction. After the transfer process is completed, the complex collapses until the next transfer occurs. After the plasmids enter the recipient cells, they may either remain independent of the host chromosome or be integrated into it by recombination. Plasmids that do not become part of the chromosome may replicate independently using replication machinery encoded by the plasmids themselves.

Figure 2. Overview of plasmids and conjugative transfer in the horizontal spread of genes. In the donor, the events depicted are: a, integration of the plasmid into the chromosome by recombination between insertion sequence elements; b, movement of a transposable element through a circular intermediate from the chromosome to the plasmid; c, initiation of rolling-circle replication at the mating-pair apparatus. In the recipient cell, the events depicted are: d, recircularization; e, attack by restriction endonucleases (scissors); f, replication; g, integration into the chromosome by an illegitimate Campbell recombination; h, recombination between transferred chromosomal DNA and the resident chromosome. Thomas C and Nielsen K, Nature Reviews Microbiology, Volume 3, September 2005 (Reprinted by permission from Nature Publishing Group).

1.3. Plasmids as a vector of HGT

1.3.1.
Overview of plasmids and their impact on microbial ecology and evolution

Plasmids are extra-chromosomal, autonomously replicating genetic units that play important roles in the ecology and evolution of microbes and are present in all three domains of life (30). The diverse characteristics of plasmids, including size, host range, genetic composition and function, have a significant impact on bacterial diversity, habitat association and adaptation, and ultimately microbial evolution (31). As detailed above, many plasmids can move among host cells through conjugation and are thus a key element of a gene pool that can be rapidly gained and lost by bacterial host populations. The mosaic nature of plasmids is widely accepted, especially the fact that they may carry genes of varied evolutionary ancestry. In the course of evolution, plasmids have developed their own replication, transfer and maintenance systems to ensure their survival in the host strains, and they are therefore considered 'selfish' genetic elements.

Plasmids can harbor a wide variety of genes, including backbone genes (encoding transfer, maintenance and incompatibility) and accessory genes (encoding many different functions, including resistance, detoxification and metabolism) (32). While the backbone genes are essential for transfer, maintenance and incompatibility, the accessory genes can offer new features to the host strains, such as enhanced fitness under specific environmental conditions. However, the functions of a large portion of plasmid genomes remain unknown.

1.3.2. Structure of plasmids

1.3.2.1. Replication system

Many plasmids have developed a complete replication system to ensure successful vertical transmission from mother to daughter cells. The replication module can determine the copy number of a plasmid and is essential to its survival and to the likelihood of transmission into new host cells.
Plasmids of narrow host range and broad host range have developed replication systems that use different types of machinery provided by the plasmids in association with components produced by the host (33).

Initially, information about the mechanisms of narrow-host-range plasmid replication was largely obtained by examining representative plasmids from the Enterobacteriaceae. It was very recently found that some narrow-host-range plasmids can also replicate in non-enteric bacterial species (34). Within this group, two subgroups are distinguished depending on whether a plasmid-encoded replication protein (Rep protein) is involved. An example of Rep-independent replication is ColE1, a 6.6 kb E. coli plasmid that requires host-produced proteins for initiation of replication, such as DNA-dependent RNA polymerase, RNase H, DNA Pol I, DNA gyrase and topoisomerase, but no Rep proteins. Replication proceeds in a θ-shaped manner, either uni- or bidirectionally (35), and starts with binding of these host proteins at the origin (ori), located in a 0.6 kb region of the plasmid. In contrast, other enterobacterial plasmids unrelated to ColE1 utilize a replicon containing different structural components. For example, pSC101 plasmids, obtained from Salmonella panama, contain the following four components: a gene coding for the Rep protein, clusters of direct repeats (iterons), binding sites for the DnaA protein and A+T rich sequences (36). Initiation of replication involves binding of the RepA protein to the iteron region to begin the formation of a replisome. The DnaA protein, produced by the host, then recognizes its binding sites in the plasmid origin and recruits additional proteins to complete the replisome.

In contrast to these narrow-host-range plasmids, broad-host-range plasmids are capable of replication and maintenance in diverse, unrelated bacteria and have therefore developed more complex replication systems (37-39).
Plasmids in this category can be classified, based on their incompatibility, into several large groups such as IncC, IncJ, IncP, IncQ and IncW (40-44). Incompatibility refers to the inability of two or more plasmids with the same replicon system to coexist in the same cell. It is thought that this incompatibility is based on competition for replication, and that the inferior competitor is eventually lost by the host cell. The best-studied examples are in the IncQ group (RSF1010, R1162 and R300B) (45, 46). These are all multicopy plasmids with nearly identical structure but isolated from different bacterial hosts. In this group, replication initiation requires RepA, RepB and RepC, the latter of which recognizes the origin and binds to the iterons of the larger cis region. Their replication does not depend on DnaA, as is the case for the pSC101 plasmid, but requires DNA Pol III and gyrase, similar to ColE1 plasmids. The presence of plasmid-encoded Rep proteins and the independence of replication from the host-produced DnaA protein are important contributing factors to the broad-host-range character of these plasmids.

In addition to the traditional classes of replication system described above, an RNA-dependent system has recently been reported in marine bacteria, represented by the plasmid pB1067 of Vibrio nigripulchritudo (47). The ori region in this type of plasmid does not encode a Rep protein to initiate DNA replication; rather, it encodes two RNAs, RNA I and RNA II, which contain complementary sequences and are transcribed from opposite DNA strands. RNA I is the smaller RNA, about 68 nucleotides long; it functions as the negative regulator and is also an essential determinant of plasmid compatibility. The longer RNA, ranging in size between 250 and 500 nucleotides, is termed RNA II and remains inactive when it forms a complex with RNA I through the single-stranded loop regions of the two RNAs, which inhibits replication.
Prior to this discovery, a similar RNA-based replication mechanism had only been observed for ColE1 and related plasmids from the Enterobacteriaceae (35). Replication of plasmids of this type also involves RNA II; however, RNA II activity is inhibited by binding of an antisense RNA molecule rather than RNA I as in the case of pB1067. Sequence and structure analysis of these RNAs revealed that the two types of replicons employed by pB1067 and the ColE1 plasmid are not related, indicating that they may have emerged through independent evolutionary pathways.

In spite of these differences, all plasmids employ a replication system that involves genetic elements carried on the plasmids themselves as well as proteins produced by the host strains. The particular combination of these elements gives plasmids their different features, such as low or high copy number in the host strain and a restricted or broad host range. However, they all share one simple goal: to ensure transmission into the daughter cells.

1.3.2.2. Maintenance system

In addition to the replication systems, plasmids employ a variety of mechanisms to ensure their maintenance in the host strains: partitioning systems, postsegregational host killing systems, and site-specific resolution systems. These mechanisms can be differentially associated with host species or plasmid types; however, they all serve the same ultimate goal, to enhance the survival rate of plasmids (48).

Partitioning systems are typically found in low-copy-number plasmids and are usually small with a relatively simple organization (49). These systems share a central mechanism of using nucleotide-driven cytomotive filaments to relocate replicated plasmids (48). To date, most partitioning systems identified consist of a DNA-binding site (par site), an adaptor protein, and a nucleotide-binding motor protein. The specific function of the adaptor protein is to recognize the par site.
The motor protein, by interacting with the adaptor protein-DNA complex, helps to distribute plasmids so that each daughter cell contains at least one plasmid after division. Partitioning systems, depending on the composition of the motor proteins, can be classified into three categories. Type I encodes an NTPase called ParA and a centromere-binding protein (CBP) called ParB, which function together in a "pulling" manner, as illustrated in Figure 3 (48). Type II, the best understood type, encodes ParM and ParR, which function in a "pushing" style through an insertional polymerization mechanism. Type III, which has recently been characterized from Bacillus thuringiensis pBtoxis, encodes TubZ and TubR, which function together in a "tramming" style (Figure 3) (48), a mechanism that is distinct from both Type I and Type II. In brief, a TubR multimer first recognizes the C-termini of TubZ-GTP filaments, and the captured TubR is then transported to the cell pole. Once this complex reaches the cell membrane, the TubZ filament undergoes a conformational change to release TubR and can be recycled for the next round of transport.

Postsegregational killing systems, also named toxin-antitoxin (TA) systems, ensure plasmid stability by killing plasmid-free daughter cells (50). So far, the toxins characterized in all classified TA systems are proteins, whereas antitoxins can be either proteins or small RNAs. Under normal circumstances, the antitoxin is expressed at a much higher level than the toxin so that toxin action is inhibited, enhancing the survival rate of the host cells.

Figure 3. Schematic models for type I, II and III partitioning system. (a) Type I partition utilizes the host cell nucleoid as a 'track' for NTPase-ATP binding and polymerization (square).
When the NTPase-ATP polymer encounters a ParB-centromere partition complex (shown as a circle), that is, the ParB-attached plasmid, the NTPase activity is activated, resulting in dissociation of capping ParA-ADP subunits (triangles) and polymer retraction. The ParB-plasmid is either pulled along in the retreating ParA polymer or is attracted and diffuses toward the moving polymer. The ultimate outcome is the dynamic equi-distribution of ParB-plasmids at opposite ends of the nucleoid. (b) Type II partition uses a pushing or insertional polymerization mode of segregation. In this model, the dynamically unstable ParM filaments are stabilized and propagate only when each end is captured by a ParR-centromere partition complex. The polymer continues to grow upon addition of ParM-ATP or ParM-GTP subunits to the ParR-ParM interface. The outcome is redistribution of replicated plasmids to opposite poles. (c) Type III partition employs a tram mechanism of partition. TubR binds the centromere, serving as a high local concentration of binding sites for the C-terminal flexible domains emanating from treadmilling TubZ filaments. Once captured, the TubR-plasmid is transported to the cell pole by the treadmilling TubZ filaments. Upon reaching the membrane the TubZ filament bends, likely dumping its TubR-plasmid cargo, and reverses direction. Now traveling in the opposite direction, the TubZ filament binds another TubR-plasmid cargo and carries it to the opposite pole. Schumacher M, Current Opinion in Structural Biology, Volume 39, 2012 (Reprinted by permission from Elsevier Limited).

If the plasmid is lost during cell division, the plasmid-free daughter cells are no longer protected from toxin action and are killed by the activated toxin, which interferes with key intracellular processes such as translation, cytoskeleton synthesis, cell membrane and cell wall biosynthesis, and replication (51). Depending on the molecular
Depending on the molecular -23- nature of the antitoxin as well as the type of interaction with the toxin, TA modules are classified into 5 different types (Type I to V) (52). The antitoxins of type I and III are both small non-coding RNAs but differ in the mode of interaction with the toxin. In type I, the antitoxin down-regulates toxin production by base pair matching with the stable toxin mRNA (53). As a consequence, toxin mRNA cannot bind to the ribosome, preventing further translation of the toxin from its mRNA. In contrast, type III systems achieve suppression of the toxin binding not through inhibiting translation but by directly binding to toxin proteins (54). Antitoxins in all other 3 classes are small proteins that interact with the toxin by forming a protein-protein complex (type II), interfering with cytoskeleton assembly (type IV) (55) and preventing translation of toxin (type V) (56). In addition to the above systems, plasmids are frequently found to utilize a DNA sitespecific resolution systems to be maintained in the host strains (49). Plasmids can remain in cells with more than one copy, which increases the chances of plasmids to be replicated. However, this beneficial trait can also cause instability of the plasmids due to high rate of formation of dimmers and even multimers through recombination with each other (49). These multimers can be very unstable and may eventually be lost from the host cells. To prevent the loss due to multimerization, plasmids make use of a host cell-encoded enzyme complex that converts multimers into monomers by recognizing a cer site typically found in plasmids with high copy number (57). Resolution of multimeric forms of plasmids is facilitated by site-specific recombination that occurs at the duplicated replicon sites through the enzymatic reaction catalyzed by recombinase. 
Most plasmids encode their own recombinase that is specific for the target recombination sites, with the exception of a few that rely on a host-encoded recombination system.

1.3.2.3. Conjugation system

In addition to the replication and maintenance systems, some plasmids, also called mobile conjugative elements (MCE), utilize conjugative systems to allow horizontal transmission (58). Typically, the full set of conjugative components of a conjugative plasmid comprises four parts: an origin of transfer (oriT), a relaxase, a type IV secretion system (T4SS) and a type IV coupling protein (T4CP) (59). Plasmids equipped with the full set of components are identified as self-transmissible or conjugative plasmids, whereas plasmids equipped with a minimal set, including only the origin of transfer, a relaxase and one or more nicking-accessory proteins, are often referred to as mobilizable plasmids (60). Here we focus mainly on the structural organization of a conjugative plasmid. The process of conjugation is initiated when the relaxase recognizes the origin of transfer (oriT) and catalyzes cleavage at this site, producing the DNA strand to be transferred. Transport of the plasmid into the recipient cell is facilitated by the T4SS, a membrane-associated complex of 12 to 30 proteins (61). The protein complex forms a mating channel for single-stranded DNA to pass through. The DNA is then released into the recipient cell by the T4CP, a protein complex that is attached to the inner cell membrane and that interacts with both the T4SS and the secretion substrate (62). It has been postulated that the T4CP may function as a DNA pump during conjugative transfer (63).
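The distinction drawn above between conjugative and mobilizable plasmids amounts to a presence/absence rule over the four conjugation components. A minimal Python sketch of that rule follows. The record type `TransferGenes`, its field names and the function `classify_mobility` are hypothetical names introduced here for illustration, and treating every plasmid that lacks an oriT or a relaxase as non-transmissible is a simplifying assumption rather than the annotation procedure used in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferGenes:
    """Presence/absence of conjugation components on one plasmid (hypothetical record)."""
    has_oriT: bool
    has_relaxase: bool
    has_t4cp: bool
    has_t4ss: bool


def classify_mobility(genes: TransferGenes) -> str:
    """Assign a mobility class from the component sets described in the text."""
    # Assumption: without both an oriT and a relaxase, transfer cannot be initiated.
    if not (genes.has_oriT and genes.has_relaxase):
        return "non-transmissible"
    # The full set (oriT, relaxase, T4CP, T4SS) makes a plasmid self-transmissible.
    if genes.has_t4cp and genes.has_t4ss:
        return "conjugative"
    # The minimal set, with or without a T4CP, depends on a co-resident plasmid.
    return "mobilizable"
```

For example, `classify_mobility(TransferGenes(True, True, False, False))` returns `"mobilizable"`, because the plasmid can be nicked at its oriT but has no mating channel of its own.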
Mobilizable plasmids, which are not self-transmissible because they lack the functions required for mating pair formation, usually carry genes encoding relaxosome components and the origin of transfer (oriT), a short DNA sequence required in cis for a plasmid to be conjugatively transmissible (Figure 4). Initiation of DNA transfer by the relaxase follows a mechanism similar to that of conjugative plasmids. The subsequent transfer process relies on conjugative components expressed by co-resident self-transmissible (conjugative) plasmids in the same host strain (60).

Figure 4. Schematic view of the genetic constitution of transmissible plasmids (A) and some essential interactions in the process of conjugation (B). (A) Self-transmissible or conjugative plasmids code for the four components of a conjugative apparatus: an origin of transfer (oriT) (violet), a relaxase (R) (red), a type IV coupling protein (T4CP) (green), and a type IV secretion system (T4SS) (blue). The T4SS is, in fact, a complex of 12 to 30 proteins, depending on the system (see text). Mobilizable plasmids contain just a MOB module (with or without the T4CP) and need the MPF of a coresident conjugative plasmid to become transmissible by conjugation. (B) The relaxase cleaves a specific site within oriT, and this step starts conjugation. The DNA strand that contains the relaxase protein covalently bound to its 5' end is displaced by an ongoing conjugative DNA replication process. The relaxase interacts with the T4CP and then with other components of the T4SS. As a result, it is transported to the recipient cell, with the DNA threaded to it. Subsequently, the DNA is pumped into the recipient by the ATPase activity of the T4CP (Smillie C, et al. Microbiology and Molecular Biology Reviews, Volume 74, 2010; reprinted by permission from ASM Press).

1.4.
Phage with plasmid structure

Bacteriophage are tremendously abundant on earth, with an estimated 10^31 phage particles in the biosphere (64, 65). Phage infection occurs frequently, with up to 10^23 infections every second, suggesting that bacteriophage are a highly dynamic biological force (66). To date, phage sequence information has been obtained by different approaches, including analysis of laboratory-isolated phages, viral metagenomics and prophage mining (67). These methods complement each other and have enriched our knowledge of phage. Phages carry genetic material in different forms: RNA (68), single-stranded (ss) DNA (69) and double-stranded (ds) DNA (70), the last of which appears to account for the vast majority of bacteriophage. The genome sizes of these dsDNA phage range from 3 kbp to 500 kbp (67), and the size of the capsid into which the genome is packaged varies accordingly. Because the genome size of a phage, or the amount of DNA packaged into the capsid, can directly determine virion infectivity, loss or acquisition of DNA can have an immediate influence on bacteriophage evolution (67).

Comparative analysis reveals that different regions of phage genomes display distinct evolutionary histories, suggesting that phage genomes are highly mosaic (71). One possible explanation is that HGT and recombination play an important role. Some phages show features very similar to those of plasmids, which are known vehicles of HGT. For example, the lambdoid phage N15 of E. coli, unlike many typical temperate phage, is not integrated into the host chromosome. Instead, it is maintained as extrachromosomal, self-replicating DNA with covalently closed ends, a structural organization typically employed by plasmids (72). In another case, bacteriophage P1 is maintained as a low-copy-number plasmid prophage in host strains (73).
P1 phage can express the recombinase Cre, which assists with plasmid maintenance functions such as resolving plasmid multimers and maintaining low copy number. Perhaps some phages have evolved to carry their own replication system while maintaining their infection system. The mosaic nature of these phages, which is similar to that observed in plasmids, may reflect frequent HGT that occurs not only in bacterial populations but also in phages.

1.5. Vibrio as a model system

Vibrio (Vibrionaceae) are gram-negative gamma-Proteobacteria that have long been used as models for studying heterotrophic processes in the ocean (74, 75). Vibrio are motile, metabolically and ecologically versatile members of coastal plankton. Cultivation-dependent and -independent studies have consistently detected Vibrio at high densities in and/or on marine macroorganisms, including fish, corals, mollusks, sea grass, shrimp and zooplankton (75, 76). They have also been found free-living in the water column as well as associated with various types of organic particles and organisms (77). While some strains are well-known human pathogens, such as V. cholerae, V. parahaemolyticus and V. vulnificus, most Vibrio are non-pathogenic or occasional pathogens of marine organisms.

Several properties of Vibrio make them a good model for studying plasmid diversity: culturability, taxonomic breadth and ecological diversity. Among environmental bacteria, Vibrio are one of the few groups that can be easily cultured in the laboratory, and the isolation tools established so far have proven to be efficient. In this study, we performed our analyses on two Vibrio collections previously obtained by our laboratory from two separate geographical locations in both spring and fall.
From the surface water of Oyster Pond and its lagoon in Woods Hole, MA, only Vibrio cholerae strains were collected, whereas a much higher diversity of Vibrio strains was isolated from the surface water of Plum Island Estuary in Ipswich, MA. Isolation of the cells, classification and further characterization have been described in detail previously (78). In brief, samples from both locations were separated into four size fractions, each containing microorganisms and organic material and/or larger organisms of different origins. Fractions are considered to be enriched in zooplankton if larger than 63 µm, enriched in organic particles if between 5 and 63 µm, and enriched in larger cells or cells attached to very small particulate matter if between 1 and 5 µm. Free-living cells were obtained from the 0.22 to 1 µm size range. In collaboration with the Alm lab, our lab developed the AdaptML model, which predicted six habitats from the samples collected, based on the season and size fraction of each strain in relation to their individual distribution. Through further analyses of the six habitats in relation to the phylogenetic relationships of the strains, all strains were grouped into 25 populations.

1.6. ECEs in Vibrio

Studies on the diversity and distribution of ECEs in Vibrio have focused primarily on pathogenic strains because of evidence that ECEs contribute to virulence. In addition, ECEs are a major player in the spread of antibiotic resistance among Vibrio, particularly among Vibrio cholerae strains. Nonetheless, some recent studies have shown that ECEs can be identified in frequently studied Vibrio species, including environmental strains (76). To date, the complete sequences of 37 plasmids identified from Vibrio have been deposited in GenBank (Table 1). Some strains carry only one plasmid while others carry more than one type. The size of these plasmids varies considerably, ranging from 2 to 250 kbp.
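The size fractionation described in Section 1.5 amounts to a simple binning rule, sketched below. This is an illustration only: the function name `size_fraction` and the label strings are hypothetical, and the handling of values that fall exactly on a boundary (assigned here to the larger fraction) is an assumption, since the text does not specify it.

```python
def size_fraction(size_um: float) -> str:
    """Map a particle or cell size in micrometres to one of the four fractions.

    Thresholds (63, 5, 1 and 0.22 um) follow the text; boundary handling
    is an assumption made for this sketch.
    """
    if size_um >= 63:
        return "zooplankton-enriched"
    if size_um >= 5:
        return "organic-particle-enriched"
    if size_um >= 1:
        return "large cells or small-particle-attached"
    if size_um >= 0.22:
        return "free-living"
    raise ValueError("below the 0.22 um cutoff; not retained in any fraction")


if __name__ == "__main__":
    for size in (100, 20, 2, 0.5):
        print(size, "um ->", size_fraction(size))
```

The habitat predictions mentioned in the text combine this fraction with the sampling season; that step is not reproduced here.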
Studies that focus on pathogenic Vibrio strains have shown that they may require plasmids to cause disease in fish and various invertebrates. For example, the 65 kbp plasmid pJM1 detected in V. anguillarum was shown to cause fatal hemorrhagic septicemic disease in salmon and other fish (79). Complete genomic sequence analysis revealed that the genetic components of pJM1 encode proteins involved in the production of a siderophore, a key element in V. anguillarum pathogenicity. Other plasmids have also been linked to Vibrio virulence, such as the 11.2 kbp plasmid pSFn1 in V. nigripulchritudo (80). Vibrio strains possessing this plasmid can cause disease in shrimp. Similarly, the coral pathogen V. shilonii was found to harbor a 13.4 kbp plasmid, pAK1, whose genetic composition is similar to that of pSFn1 (81). Overall, this suggests that plasmids are an important virulence factor in Vibrio strains that cause disease in a variety of ocean life.

Table 1. Summary of characterized Vibrio plasmids. All data were obtained from GenBank as of June 25, 2014.

Plasmid name    Host                                  Accession number   Size (bp)   Topology
pVAE259         Vibrio alginolyticus                  NC_013178          6075        circular
pJv             Vibrio anguillarum                    NC_019325          5982        circular
pJM1            Vibrio anguillarum 775                NC_005250          65009       circular
unnamed         Vibrio campbellii ATCC BAA-1116       NC_022271          89003       circular
pVIBHAR         Vibrio campbellii ATCC BAA-1116       NC_009777          89008       circular
pVCG4.1         Vibrio cholerae                       NC_010910          2163        circular
pVCG1.2         Vibrio cholerae                       NC_010899          2357        circular
pVCG1.1         Vibrio cholerae                       NC_010897          4439        circular
pTLC            Vibrio cholerae                       NC_004982          4719        circular
pSIO1           Vibrio cholerae                       NC_006860          4906        circular
pVCR94deltaX    Vibrio cholerae                       NC_023291          120572      circular
pVC             Vibrio cincinnatiensis                NC_019241          6309        circular
unnamed         Vibrio coralliilyticus                NC_020451          26631       circular
pES100          Vibrio fischeri ES114                 NC_006842          45849       circular
pMJ100          Vibrio fischeri MJ11                  NC_011185          179459      circular
pBD146          Vibrio fluvialis                      NC_011797          7472        circular
pVCR1           Vibrio harveyi                        NC_021808          9615        circular
pVCR1           Vibrio harveyi                        NC_023279          9615        circular
pSFn1           Vibrio nigripulchritudo               NC_010733          11237       linear
VIBNI_pA        Vibrio nigripulchritudo               NC_015156          247271      circular
pZY5            Vibrio parahaemolyticus               NC_012859          3504        circular
pSA19           Vibrio parahaemolyticus               NC_002088          4839        circular
unnamed         Vibrio parahaemolyticus               NC_021292          7138        circular
pO3K6           Vibrio parahaemolyticus               NC_002473          8784        circular
pAK1            Vibrio shilonii                       NC_010734          13415       linear
p09022A         Vibrio sp. 09022                      NC_010114          31036       circular
p0908           Vibrio sp. 0908                       NC_010113          81413       circular
p23023          Vibrio sp. 23023                      NC_010112          52527       circular
pPS41           Vibrio sp. 41                         NC_004961          6886        circular
pTC68           Vibrio sp. TC68                       NC_008690          7847        circular
pVT1            Vibrio tapetis                        NC_010614          82266       circular
pMP1            Vibrio vulnificus                     NC_012758          7628        circular
pC4602-1        Vibrio vulnificus                     NC_009702          56628       circular
pC4602-2        Vibrio vulnificus                     NC_009703          66946       circular
pR99            Vibrio vulnificus                     NC_009701          68446       circular
unnamed         Vibrio vulnificus VVyb1(BT3)          NZ_CM001801        39190       circular
pYJ016          Vibrio vulnificus YJ016               NC_005128          48508       circular

While these studies focused on the identification of plasmids in pathogenic Vibrio strains, only a few reports have addressed the diversity of plasmids in environmental Vibrio strains. For example, the presence of plasmids has been assessed in several environmental Vibrio strains, including three coastal strains of V. fluvialis, V. mediterranei and V. campbellii (82). Plasmids of different sizes, ranging from 31 to 81 kbp, were detected in all three strains, and all plasmids had similar G+C contents. Sequence analysis revealed that the three plasmids encode different proteins, suggesting that the genetic organization and transfer mechanisms of plasmids are diverse even in a small group of Vibrio strains (82). In another study, several small plasmids were characterized from V.
parahaemolyticus; however, these plasmids encode only hypothetical proteins and their roles in the host are still unknown. Although these studies suggest that plasmids are fairly frequent in Vibrio and can transfer potentially beneficial functions to the host, there is little comprehensive information on the diversity, evolutionary dynamics, abundance and environmental association of plasmids.

1.7. Goals of this thesis

The overall goal of this study is to provide a deeper understanding of the diversity of plasmids (and other extrachromosomal elements) and their distribution among Vibrio populations in the environment. We seek to address the following questions:

(1) What is the diversity of the ECE backbone systems encoding basic replication, maintenance, incompatibility and transfer processes?
(2) What is the relationship between plasmids and their hosts, and is their occurrence associated with the environment?
(3) What are the distribution patterns and eco-evolutionary dynamics of plasmids among Vibrio populations?

References

1. Woese CR, Kandler O, Wheelis ML. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A 87:4576-4579.
2. Woese CR. 1987. Bacterial evolution. Microbiological reviews 51:221-271.
3. Gogarten JP, Townsend JP. 2005. Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol 3:679-687.
4. Smith MW, Feng DF, Doolittle RF. 1992. Evolution by acquisition: the case for horizontal gene transfers. Trends in biochemical sciences 17:489-493.
5. Griffith F. 1928. The Significance of Pneumococcal Types. The Journal of hygiene 27:113-159.
6. Nakamura Y, Itoh T, Matsuda H, Gojobori T. 2004. Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nature genetics 36:760-766.
7. Vocke C, Bastia D. 1983. Primary structure of the essential replicon of the plasmid pSC101.
Proc Natl Acad Sci U S A 80:6557-6561.
8. Garcia-Vallve S, Romeu A, Palau J. 2000. Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res 10:1719-1725.
9. Ragan MA. 2001. Detection of lateral gene transfer among microbial genomes. Curr Opin Genet Dev 11:620-626.
10. Kunnimalaiyaan M, Stevenson DM, Zhou Y, Vary PS. 2001. Analysis of the replicon region and identification of an rRNA operon on pBM400 of Bacillus megaterium QM B1551. Mol Microbiol 39:1010-1021.
11. Eisen JA. 2000. Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr Opin Genet Dev 10:606-611.
12. Koonin EV, Makarova KS, Aravind L. 2001. Horizontal gene transfer in prokaryotes: quantification and classification. Annual review of microbiology 55:709-742.
13. Ragan MA. 2001. On surrogate methods for detecting lateral gene transfer. FEMS microbiology letters 201:187-191.
14. Martinez RJ, Wang Y, Raimondo MA, Coombs JM, Barkay T, Sobecky PA. 2006. Horizontal gene transfer of PIB-type ATPases among bacteria isolated from radionuclide- and metal-contaminated subsurface soils. Appl Environ Microbiol 72:3111-3118.
15. Lawrence JG, Ochman H. 1998. Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci U S A 95:9413-9417.
16. Gogarten JP, Doolittle WF, Lawrence JG. 2002. Prokaryotic evolution in light of gene transfer. Molecular biology and evolution 19:2226-2238.
17. Clarke GD, Beiko RG, Ragan MA, Charlebois RL. 2002. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. J Bacteriol 184:2072-2080.
18. Medigue C, Rouxel T, Vigier P, Henaut A, Danchin A. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol 222:851-856.
19. Lawrence JG, Ochman H. 1997. Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44:383-397.
20. Daubin V, Lerat E, Perriere G. 2003. The source of laterally transferred genes in bacterial genomes. Genome Biol 4:R57.
21. Eisen JA. 1995. The RecA protein as a model molecule for molecular systematic studies of bacteria: comparison of trees of RecAs and 16S rRNAs from the same species. J Mol Evol 41:1105-1123.
22. Chen I, Dubnau D. 2004. DNA uptake during bacterial transformation. Nat Rev Microbiol 2:241-249.
23. Dubnau D. 1999. DNA uptake in bacteria. Annual review of microbiology 53:217-244.
24. Dubeikovskii AN, Boronin AM. 1990. [Localization in Escherichia coli of transcribed regions of the broad host range plasmid pBS222]. Molekuliarnaia genetika, mikrobiologiia i virusologiia:27-29.
25. de Vries J, Wackernagel W. 2002. Integration of foreign DNA during natural transformation of Acinetobacter sp. by homology-facilitated illegitimate recombination. Proc Natl Acad Sci U S A 99:2094-2099.
26. Wommack KE, Colwell RR. 2000. Virioplankton: viruses in aquatic ecosystems. Microbiology and molecular biology reviews: MMBR 64:69-114.
27. Ashelford KE, Day MJ, Fry JC. 2003. Elevated abundance of bacteriophage infecting bacteria in soil. Appl Environ Microbiol 69:285-289.
28. Heuer H, Smalla K. 2007. Horizontal gene transfer between bacteria. Environmental biosafety research 6:3-13.
29. Thomas CM, Nielsen KM. 2005. Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol 3:711-721.
30. van Elsas JD, Bailey MJ. 2002. The ecology of transfer of mobile genetic elements. FEMS microbiology ecology 42:187-197.
31. Johnson TJ, Nolan LK. 2009. Pathogenomics of the virulence plasmids of Escherichia coli. Microbiology and molecular biology reviews: MMBR 73:750-774.
32. Boltner D, MacMahon C, Pembroke JT, Strike P, Osborn AM. 2002. R391: a conjugative integrating mosaic comprised of phage, plasmid, and transposon elements. J Bacteriol 184:5158-5169.
33. Chattoraj DK, Snyder KM, Abeles AL. 1985. P1 plasmid replication: multiple functions of RepA protein at the origin.
Proc Natl Acad Sci U S A 82:2588-2592.
34. Rakowski SA, Filutowicz M. 2013. Plasmid R6K replication control. Plasmid 69:231-242.
35. del Solar G, Giraldo R, Ruiz-Echevarria MJ, Espinosa M, Diaz-Orejas R. 1998. Replication and control of circular bacterial plasmids. Microbiology and molecular biology reviews: MMBR 62:434-464.
36. Ingmer H, Miller C, Cohen SN. 2001. The RepA protein of plasmid pSC101 controls Escherichia coli cell division through the SOS response. Mol Microbiol 42:519-526.
37. Kolatka K, Kubik S, Rajewska M, Konieczny I. 2010. Replication and partitioning of the broad-host-range plasmid RK2. Plasmid 64:119-134.
38. Leao SC, Matsumoto CK, Carneiro A, Ramos RT, Nogueira CL, Junior JD, Lima KV, Lopes ML, Schneider H, Azevedo VA, da Costa da Silva A. 2013. Correction: The Detection and Sequencing of a Broad-Host-Range Conjugative IncP-1beta Plasmid in an Epidemic Strain of subsp. PLoS One 8.
39. Leao SC, Matsumoto CK, Carneiro A, Ramos RT, Nogueira CL, Lima JD, Jr., Lima KV, Lopes ML, Schneider H, Azevedo VA, da Costa da Silva A. 2013. The detection and sequencing of a broad-host-range conjugative IncP-1beta plasmid in an epidemic strain of Mycobacterium abscessus subsp. bolletii. PLoS One 8:e60746.
40. Uga H, Matsunaga F, Wada C. 1999. Regulation of DNA replication by iterons: an interaction between the ori2 and incC regions mediated by RepE-bound iterons inhibits DNA replication of mini-F plasmid in Escherichia coli. The EMBO journal 18:3856-3867.
41. van Zyl U, Deane SM, Rawlings DE. 2003. Analysis of the mobilization region of the broad-host-range IncQ-like plasmid pTC-F14 and its ability to interact with a related plasmid, pTF-FC2. J Bacteriol 185:6104-6111.
42. Pembroke JT, Murphy DB. 2000. Isolation and analysis of a circular form of the IncJ conjugative transposon-like elements, R391 and R997: implications for IncJ incompatibility. FEMS microbiology letters 187:133-138.
43. Jacquet MA, Ehrlich R. 1985.
In vivo and in vitro effect of mutations in tetA promoter from pSC101: insertion of poly(dA.dT) stretch in the spacer region does not inactivate the promoter. Biochimie 67:987-997.
44. Fernandez-Lopez R, Garcillan-Barcia MP, Revilla C, Lazaro M, Vielva L, de la Cruz F. 2006. Dynamics of the IncW genetic backbone imply general trends in conjugative plasmid evolution. FEMS microbiology reviews 30:942-966.
45. Sakai H, Komano T. 1996. DNA replication of IncQ broad-host-range plasmids in gram-negative bacteria. Bioscience, biotechnology, and biochemistry 60:377-382.
46. Loftie-Eaton W, Rawlings DE. 2012. Diversity, biology and evolution of IncQ-family plasmids. Plasmid 67:15-34.
47. Le Roux F, Davis BM, Waldor MK. 2011. Conserved small RNAs govern replication and incompatibility of a diverse new plasmid family from marine bacteria. Nucleic Acids Res 39:1004-1013.
48. Schumacher MA. 2012. Bacterial plasmid partition machinery: a minimalist approach to survival. Curr Opin Struct Biol 22:72-79.
49. Hsu CC, Chen CW. 2010. Linear plasmid SLP2 is maintained by partitioning, intrahyphal spread, and conjugal transfer in Streptomyces. J Bacteriol 192:307-315.
50. Ogura T, Hiraga S. 1983. Mini-F plasmid genes that couple host cell division to plasmid proliferation. Proc Natl Acad Sci U S A 80:4784-4788.
51. Unterholzner SJ, Poppenberger B, Rozhon W. 2013. Toxin-antitoxin systems: Biology, identification, and application. Mobile genetic elements 3:e26219.
52. Guglielmini J, Szpirer C, Milinkovitch MC. 2008. Automated discovery and phylogenetic analysis of new toxin-antitoxin systems. BMC Microbiol 8:104.
53. Brantl S. 2012. Bacterial type I toxin-antitoxin systems. RNA biology 9:1488-1490.
54. Blower TR, Short FL, Rao F, Mizuguchi K, Pei XY, Fineran PC, Luisi BF, Salmond GP. 2012. Identification and classification of bacterial Type III toxin-antitoxin systems encoded in chromosomal and plasmid genomes. Nucleic Acids Res 40:6158-6173.
55. Masuda H, Tan Q, Awano N, Wu KP, Inouye M. 2012. YeeU enhances the bundling of cytoskeletal polymers of MreB and FtsZ, antagonizing the CbtA (YeeV) toxicity in Escherichia coli. Mol Microbiol 84:979-989.
56. Wang X, Lord DM, Cheng HY, Osbourne DO, Hong SH, Sanchez-Torres V, Quiroga C, Zheng K, Herrmann T, Peti W, Benedik MJ, Page R, Wood TK. 2012. A new type V toxin-antitoxin system where mRNA for toxin GhoT is cleaved by antitoxin GhoS. Nature chemical biology 8:855-861.
57. Tolmasky ME, Colloms S, Blakely G, Sherratt DJ. 2000. Stability by multimer resolution of pJHCMW1 is due to the Tn1331 resolvase and not to the Escherichia coli Xer system. Microbiology 146 (Pt 3):581-589.
58. Davison J. 1999. Genetic exchange between bacteria in the environment. Plasmid 42:73-91.
59. Garcillan-Barcia MP, Francia MV, de la Cruz F. 2009. The diversity of conjugative relaxases and its application in plasmid classification. FEMS microbiology reviews 33:657-687.
60. Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EP, de la Cruz F. 2010. Mobility of plasmids. Microbiology and molecular biology reviews: MMBR 74:434-452.
61. Alvarez-Martinez CE, Christie PJ. 2009. Biological diversity of prokaryotic type IV secretion systems. Microbiology and molecular biology reviews: MMBR 73:775-808.
62. Llosa M, Zunzunegui S, de la Cruz F. 2003. Conjugative coupling proteins interact with cognate and heterologous VirB10-like proteins while exhibiting specificity for cognate relaxosomes. Proc Natl Acad Sci U S A 100:10465-10470.
63. Tato I, Matilla I, Arechaga I, Zunzunegui S, de la Cruz F, Cabezon E. 2007. The ATPase activity of the DNA transporter TrwB is modulated by protein TrwA: implications for a common assembly mechanism of DNA translocating motors. The Journal of biological chemistry 282:25569-25576.
64. Hatfull GF. 2008. Bacteriophage genomics. Curr Opin Microbiol 11:447-453.
65. Hendrix RW, Hatfull GF, Smith MC. 2003. Bacteriophages with tails: chasing their origins and evolution. Research in microbiology 154:253-257.
66. Suttle CA. 2007. Marine viruses--major players in the global ecosystem. Nat Rev Microbiol 5:801-812.
67. Hatfull GF, Hendrix RW. 2011. Bacteriophages and their genomes. Current opinion in virology 1:298-303.
68. Friedman SD, Genthner FJ, Gentry J, Sobsey MD, Vinje J. 2009. Gene mapping and phylogenetic analysis of the complete genome from 30 single-stranded RNA male-specific coliphages (family Leviviridae). J Virol 83:11233-11243.
69. Werten S. 2013. Identification of the ssDNA-binding protein of bacteriophage T5: Implications for T5 replication. Bacteriophage 3:e27304.
70. Ackermann HW. 2007. 5500 Phages examined in the electron microscope. Archives of virology 152:227-243.
71. Hendrix RW, Smith MC, Burns RN, Ford ME, Hatfull GF. 1999. Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc Natl Acad Sci U S A 96:2192-2197.
72. Ravin NV. 2011. N15: the linear phage-plasmid. Plasmid 65:102-109.
73. Lobocka MB, Rose DJ, Plunkett G, 3rd, Rusin M, Samojedny A, Lehnherr H, Yarmolinsky MB, Blattner FR. 2004. Genome of bacteriophage P1. J Bacteriol 186:7032-7068.
74. Reen FJ, Almagro-Moreno S, Ussery D, Boyd EF. 2006. The genomic code: inferring Vibrionaceae niche specialization. Nat Rev Microbiol 4:697-704.
75. Takemura AF, Chien DM, Polz MF. 2014. Associations and dynamics of Vibrionaceae in the environment, from the genus to the population level. Frontiers in microbiology 5:38.
76. Hazen TH, Pan L, Gu JD, Sobecky PA. 2010. The contribution of mobile genetic elements to the evolution and ecology of Vibrios. FEMS microbiology ecology 74:485-499.
77. Rivera IN, Souza KM, Souza CP, Lopes RM. 2012. Free-living and plankton-associated vibrios: assessment in ballast water, harbor areas, and coastal ecosystems in Brazil. Frontiers in microbiology 3:443.
78. Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. 2008. Resource partitioning and sympatric differentiation among closely related bacterioplankton.
Science 320:1081-1085.
79. Naka H, Dias GM, Thompson CC, Dubay C, Thompson FL, Crosa JH. 2011. Complete genome sequence of the marine fish pathogen Vibrio anguillarum harboring the pJM1 virulence plasmid and genomic comparison with other virulent strains of V. anguillarum and V. ordalii. Infection and immunity 79:2889-2900.
80. Walling E, Vourey E, Ansquer D, Beliaeff B, Goarant C. 2010. Vibrio nigripulchritudo monitoring and strain dynamics in shrimp pond sediments. Journal of applied microbiology 108:2003-2011.
81. Reynaud Y, Saulnier D, Mazel D, Goarant C, Le Roux F. 2008. Correlation between detection of a plasmid and high-level virulence of Vibrio nigripulchritudo, a pathogen of the shrimp Litopenaeus stylirostris. Appl Environ Microbiol 74:3038-3047.
82. Hazen TH, Wu D, Eisen JA, Sobecky PA. 2007. Sequence characterization and comparative analysis of three plasmids isolated from environmental Vibrio spp. Appl Environ Microbiol 73:7703-7710.

CHAPTER TWO

High Frequency of a Novel Filamentous Phage, VCYΦ, within an Environmental Vibrio cholerae Population

Hong Xue, Yan Xu, Yan Boucher and Martin F. Polz

Reprinted by permission from Applied and Environmental Microbiology. Copyright 2012 ASM Press, Washington DC.

Xue, H., Y. Xu, Y. Boucher & M.F. Polz, (2012) High frequency of a novel filamentous phage, VCY phi, within an environmental Vibrio cholerae population. Appl Environ Microbiol 78: 28-33

Hong Xue A; Yan Xu A,*; Yan Boucher &; Martin F. Polz $

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts

A These authors contributed equally to this work.
* Present address: Tong Ji University, Shanghai, P.R. China
& Department of Biological Sciences, University of Alberta, CW405, Edmonton, AB, T6G 2E9, Canada
$ Corresponding author.
Mailing address: Massachusetts Institute of Technology, 48-421, 77 Massachusetts Ave., Cambridge, MA 02139. Phone: (617) 253-7128. Fax: (617) 258-8850. E-mail: mpolz@mit.edu

2. Chapter Two: High Frequency of a Novel Filamentous Phage, VCYΦ, within an Environmental Vibrio cholerae Population

2.1. Abstract

Environmental Vibrio cholerae strains isolated from a coastal brackish pond (Oyster Pond, Woods Hole, MA) carried a novel filamentous phage, VCYΦ, which can exist as a host genome-integrated form (IF) and a plasmid-like replicative form (RF). Outside the cell, the phage displays morphology typical of Inovirus, with filamentous particles ~1.8 µm in length and 7 nm in width. Four independent RF isolates had identical genomes except for 8 single nucleotide polymorphisms (SNPs) clustered in two regions. The overall genome size is 7,103 bp with 11 putative ORFs, organized into three functional modules (replication, structure and assembly, and regulation). VCYΦ shares sequence similarity with other filamentous phages (including the cholera disease-associated CTXΦ) in a highly mosaic manner, indicating evolution by horizontal gene transfer and recombination. VCYΦ integrates in the vicinity of the putative translation initiation factor Sui1 in chromosome II of V. cholerae. A screen of 531 closely related host isolates showed that ~40% harbored phage, with 27% and 13% carrying the IF and RF, respectively. The relative frequency of RF and IF differed among strains isolated from the pond or lagoon of Oyster Pond, suggesting that host habitat influences the intracellular phage biology. The overall high prevalence within the host population shows that filamentous phages can be an important component of the environmental biology of V. cholerae.

2.2. Introduction

Filamentous phages of the genus Inovirus are unusual among bacterial viruses in that they do not lyse host cells when new phage particles are produced. Instead, new virions are packaged on the cell surface and extruded (24).
These virions contain ssDNA that typically enters new hosts via a variety of pili positioned on the cell surface (26). Inside the host, inoviruses can persist as a circular, double-stranded replicative form (RF); alternatively, they can integrate into the host chromosome by a variety of mechanisms, including phage-encoded transposases (19) and host-encoded XerC/D (11, 13), which normally resolve chromosome dimers. Production of new, single-stranded phage DNA can proceed via rolling circle replication from the RF. The genomes of inoviruses are composed of modules that encode genome replication, virion structure and assembly, and regulation (3); additionally, like many other phages, inoviruses can undergo extensive recombination, often picking up new genes in the process, so that they may act as important gene transfer mechanisms among hosts (7, 9).

Vibrio cholerae, an environmental bacterium that includes strains capable of eliciting the diarrheal disease cholera, has become somewhat of a model for studying Inovirus biology and diversity. This is because an important pathogenicity factor, the cholera toxin (CT), is encoded and transferred by the filamentous phage CTXΦ (21). Infection is mediated by recognition of a type IV pilus (toxin coregulated pilus), and the phage genome can irreversibly integrate into the host chromosome at one of two dif sites (dif1 and dif2), which are the target of XerC/D-mediated recombination with phage att sites (attP) and are present on V. cholerae chromosomes 1 and 2, respectively (22). Different variants of CTXΦ are specific for either dif1 or dif2, where they can integrate as single or tandem copies (6). A number of additional filamentous phages have been described for V. cholerae, including VEJΦ (3), VGJΦ (4), KSF-1Φ (9), VSKΦ (17), VSKKΦ, fs1Φ (23), fs2Φ (8), Vf33Φ (27) and 493Φ (16).
Importantly, it has recently been shown that several filamentous phages display cooperative interactions, and that a process of sequential infection, involving two satellite and three helper phages, may have been important in the evolution of V. cholerae strains associated with the seventh pandemic (11). Here we characterize a novel filamentous phage, designated VCYΦ, from an environmental V. cholerae population. We also show that VCYΦ had a remarkably widespread distribution in the host population it originated from and that the prevalence of RF vs. IF in host cells appears to be influenced by host habitat and lifestyle.

2.3. Materials and Methods

2.3.1. V. cholerae isolation and propagation

Vibrio cholerae strains were isolated from surface water of Oyster Pond, Woods Hole, MA, and from the lagoon connecting the pond to the coastal ocean, on September 8, 2008. The water temperature and salinity were 24.5 and 26°C, and 4 and 5 ppt, for the pond and lagoon, respectively. Particle-associated and free-living bacterial populations were collected by sequential filtration of water samples onto filters with different size cutoffs following the protocol in (14). For the largest fraction, which is enriched in zooplankton, three replicate water samples of ~100 L each were filtered through a 63 µm plankton net (Wildlife Supply Company) and the filtrate collected for strain isolation in the lab. For the remaining three size fractions, three replicate 1 L samples, which had been prefiltered to remove the 63 µm fraction, were collected and transported to the lab for further processing. In the laboratory, all materials retained on 63 µm filters were homogenized using a tissue grinder (VWR Scientific) and vortexed for 20 minutes at low speed.
The replicate 1-L water samples from which the >63 µm fraction had been removed were sequentially filtered through 5, 1 and 0.2 µm pore size filters. The 63-5 and 5-1 µm size fractions were collected using gravity filtration to avoid breakdown of fragile particles; for these, filtration was repeated with sterile seawater to further remove cells not attached to particles. Subsequently, all filters were placed into 50 ml conical tubes containing 45 ml sterile seawater and vortexed for 20 minutes at low speed to break up particles and resuspend bacterial cells. Supernatants were used for isolation of V. cholerae by concentrating serial dilutions onto 0.2 µm Supor-200 filters (Pall) using gentle vacuum pressure. These filters were then placed onto agar plates containing Vibrio-selective Thiosulfate Citrate Bile Salts Sucrose medium (BD Difco) with 2% NaCl (marine TCBS). Single colonies were picked and re-streaked three times, alternating between Tryptic Soy Broth (TSB) (BD Bacto) with 2% NaCl and marine TCBS media, to obtain pure strains. For all subsequent analyses, the stock cultures were used to avoid unequal treatment of strains. V. cholerae was identified by partial sequencing of the mdh gene as described in (1). For routine propagation, strains were grown overnight in Luria-Bertani (LB) broth (Difco) at 25 °C in a shaking bath (180 rpm). Phage was originally detected as a plasmid-like band in genomic DNA preparations analyzed on agarose gels.

2.3.2. DNA isolation and sequencing

DNA was extracted from V. cholerae for sequencing of the replicative, plasmid-like form (RF) of VCYΦ and to determine the insertion site of the integrative form (IF) in the host chromosome. To obtain RF DNA, plasmid-like genomes were isolated from 2 ml of overnight culture of V. cholerae strains 10E09PW02, 10F04PW02, 5G03LW63 and 11H04LW5 using the Qiaprep Spin Miniprep kit (Qiagen Inc.).
Subsequently, DNA was electrophoretically separated on 0.8% agarose gels, and the bands corresponding to the RF were cut out and purified using gel extraction kits (Qiagen Inc.). RF DNA from strain 10E09PW02 was tagged with barcode A6-B15 (Table 1), while DNA from the remaining three RFs was combined and tagged with barcode A4-B14 for Illumina sequencing.

An Illumina sequencing protocol (25) was modified to allow for small plasmid library preparation as follows. DNA libraries were prepared by shearing about 1 µg RF DNA in a volume of 50 µl into fragments with an average length of ~400 bp. This was done using 14 cycles of alternating 30-second ultrasonic bursts and 30-second pauses in a 4 °C water bath in a Bioruptor UCD-200 (Diagenode). The fragments were then end-repaired and phosphorylated using the End-Repair kit (New England Biolabs). The products were subjected to a ligation reaction with a 10-fold molar excess of Illumina adapters (Table 1) using the Quick Ligation kit (New England Biolabs). The ligation product was separated on 1.5% agarose gels, and fragments of 300-500 bp were purified with 10 µl EB buffer using the Qiagen MinElute Reaction Cleanup kit (Qiagen Inc.). The fragments were nick translated with Bst polymerase (New England Biolabs) in a final volume of 30 µl. Eight replicate 2 µl reaction products were used without further purification in PCR amplifications using Phusion Hot Start High-Fidelity DNA polymerase (New England Biolabs), and reaction progress was monitored on a Bio-Rad Opticon real-time PCR instrument. The reactions were stopped in the late logarithmic amplification phase and the DNA from the replicate reactions was pooled. To generate the ready-to-sequence DNA, libraries were subjected to an additional gel purification step to remove adapter dimers and residual primers. The quality and size distribution of the DNA libraries were checked by Agilent Bioanalyzer DNA-1000 assays (Agilent Technologies, Inc.).
The two libraries were pooled with 34 other libraries that carried different barcodes for deconvolution after sequencing. The samples were loaded onto an Illumina GAIIx sequencer, and the resulting data were analyzed using the Illumina pipeline 1.4.0 to generate fastq files. Sequences were reconstructed and annotated using NextGen 1.9 (Softgenetics Inc.) and DNAmaster software (http://cobamide2.bio.pitt.edu), respectively.

To determine the host chromosomal region of phage insertion in strains 4A03LW1 and 4B03LW1, a walking PCR protocol (18, 20) was used, taking advantage of the fact that the attP site is split during insertion of phage into the host chromosome. Biotinylated primers, PCRwalking-biotin and PCRwalking-biotin-asp (Table 1), facing outwards from the predicted attP site were designed and used to obtain single-stranded PCR products. In a typical reaction, 20 ng host DNA containing integrated VCYΦ was mixed with 0.5 pmoles biotinylated primer and 0.5 U Platinum Taq Hi-Fidelity (Invitrogen). Amplification used a three-step cycling program (94 °C for 30 s; 45 °C for 30 s; 68 °C for 5 min) for 35 cycles. The extension products were captured on streptavidin beads (Promega), purified and stored in 1x terminal deoxynucleotidyl transferase buffer. A polyG tail was added to the purified extension products by incubation with 4 mM dGTP and 4 U TdT enzyme (Promega) at 37 °C in a shaking bath (200 rpm) for 2 hours. The polyG-tailed products were made double stranded using the PCRwalking-anchor-C12 and PCRwalking-nest-asp primers (Table 1). The PCR products were separated on a 1% agarose gel, and fragments 2-4 kb in size were purified using the Qiaquick gel extraction kit (Qiagen). The purified DNA fragments were re-amplified with primers PCRwalking-nest and PCRwalking-anchor (as PCRwalking-anchor-C12 but lacking the run of 12 C) (Table 1).

Table 1. List of primers used in this study

Primer                        Sequence                                                                   Reference
Illumina adapter A4 up        5'/5AmMC6/ACACTCTTTCCCTACACGACGCTCTTCCGATCTGCAGG-3'                        This study
Illumina adapter A4 down      5'CCTGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAC/3AmM/-3'                        This study
Illumina adapter A6 up        5'/5AmMC6/ACACTCTTTCCCTACACGACGCTCTTCCGATCTAATTC-3'                        This study
Illumina adapter A6 down      5'GAATTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAC/3AmM/-3'                        This study
Illumina adapter B14 up       5'TACTGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGC/3AmM/-3'                         This study
Illumina adapter B14 down     5'/5AmMC6/CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCAGTA-3'                        This study
Illumina adapter B15 up       5'AGCAGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGC/3AmM/-3'                         This study
Illumina adapter B15 down     5'/5AmMC6/CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCTGCT-3'                        This study
Illumina_amp_1                5'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'            (25)
Illumina_amp_2                5'AAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3'          (25)
VCYint_F                      5'-TTAACATTGTCAAATGATAAATATG-3'                                            This study
VCYint_R                      5'-ATAATCAACTGATAATGTTGCAAAC-3'                                            This study
PCRwalking-biotin             5'-biotin-CAACACAGCCCATTATTTTAGCCCC-3'                                     This study
PCRwalking-biotin-asp         5'-biotin-CATTTCACCATTTTATATTGCGCGT-3'                                     This study
PCRwalking-biotin-nest        5'-CATTTCACCATTATATTGCGCGT-3'                                              This study
PCRwalking-biotin-nest-asp    5'-TCTGAACTGTTAGACGCCTACAAAA-3'                                            This study
PCRwalking-anchor-C12         5'CCACGCGTCGACTAGTAATTCCCCCCCCCCCCDN-3'                                    This study
PCRwalking-anchor             5'-CCACGCGTCGACTAGTAATT-3'                                                 This study
VCYΦ_Seq_2                    5'-ATATCAATGCTTTGCGGTGGTCTAG-3'                                            This study
VCYΦ_Seq_1                    5'-TCGATTCATTGTTAAAACTCCCAAAATCG-3'                                        This study

The DNA was purified again by agarose gel electrophoresis and the Qiaquick gel extraction kit, and was sequenced using the Sanger method with either primer VCYΦ_Seq_1 or VCYΦ_Seq_2 (Table 1). To test whether the phage DNA was in single-stranded form, DNA from phage particles was isolated as described by Faruque et al. (9) and digested with DNase I.

2.3.3. PCR-based phage identification and screen for RF or IF in host cells
Because different strains were used for sequencing and electron microscopy of phage, and to ensure that the RF and IF represent the same phage, we devised specific PCR primers targeting a gene (ORF9) currently unique to VCYΦ (Table 1). To identify host isolates containing the RF and/or IF, a PCR protocol was devised that can differentiate the two forms. This was achieved by designing one set of primers flanking the attB site in the bacterial chromosome (primers VCYint_F and VCYint_R; Table 1), which is split during integration, so that these primers only yield a product for strains not carrying the IF of VCYΦ. Similarly, a second set of primers flanking the attP site of the phage (primers VCYΦ_Seq_1 and VCYΦ_Seq_2; Table 1) was used to confirm the presence of the RF of VCYΦ. For identification of the IF, we used primer VCYint_F together with VCYΦ_Seq_1 or VCYΦ_Seq_2, which produces a 190 bp or 150 bp PCR product if VCYΦ is integrated into the attB site. In a typical reaction, 20 ng genomic DNA or 2 µl of a 1:10 diluted culture was used as template and mixed with 0.4 µM RF-specific or IF-specific primers and the polymerase mixture provided in the Qiagen HotStarTaq Master Mix Kit. The cycling program was: initial denaturation at 95 °C for 15 min; 30 cycles of denaturation at 94 °C for 30 s, annealing at 52 °C for 30 s and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. The PCR products were separated on a 1.8% agarose gel prepared with 0.5x TBE buffer.

2.3.4. Electron microscopy

To prepare phage for electron microscopy (EM), V. cholerae strain 7D07PW5, which carries the RF of VCYΦ, was grown overnight in 100 ml LB medium at room temperature in a shaking water bath (180 rpm). The supernatant containing phage was collected by centrifuging the culture at 8,000 x g for 15 min and subsequently filtering it through a 0.22 µm pore size filter.
A 100-µl aliquot of the filtered supernatant was spread on an LB agarose plate to confirm sterility. To precipitate the phage particles, NaCl and polyethylene glycol 6,000 were added to the filtrate to final concentrations of 2.5 and 5%, respectively. The mixture was incubated on ice for 30 min, followed by centrifugation at 13,000 x g for 30 min. The phage-containing pellet was collected and resuspended in 500 µl phosphate buffered saline. For EM, purified phage particles were negatively stained with 4% (w/v) uranyl acetate and mounted on freshly prepared Formvar grids. Phage samples were photographed under an FEI Tecnai Spirit Transmission Electron Microscope. The average length and width of the phage were determined from six individual particles.

2.3.5. Nucleotide sequence accession numbers

The genome sequence of VCYΦ from strain 10E09PW02 has been deposited in GenBank with accession number JN848801.

2.4. Results and Discussion

2.4.1. Characterization of the replicative form (RF) of filamentous phage VCYΦ

Among a collection of 531 environmental V. cholerae isolates from Oyster Pond, 77 contained putative episomal elements when screened by gel electrophoresis. Seven strains contained episomal elements of variable sizes; the remaining seventy, however, appeared similar in size. Restriction endonuclease analysis of a subset of 10 of these elements using BamHI, EcoRI and PstI revealed identical patterns, suggesting a closely related, double-stranded plasmid-like element of approximately 7 kbp in size (data not shown). Subsequent genome sequencing of 4 of these 7 kbp plasmids suggested that they are the replicative form (RF) of a new filamentous phage, which we call VCYΦ (Figure 1). The whole genome of VCYΦ consists of 7,103 nucleotides (Figure 1) with a G+C content of 41.8 mol %. Among the 4 sequenced genomes, only 8 SNPs, clustered in two regions, were evident: 3 SNPs in a 7 bp stretch (SNP-A in Figure 1) and 5 SNPs in a 14 bp stretch (SNP-B in Figure 1).
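The two summary statistics quoted here, G+C content and the clustering of SNPs among aligned genomes, are simple to compute. The following is only a minimal sketch on made-up toy sequences: the function names `gc_content`, `snp_positions` and `snp_span` and the example strings are assumptions introduced for illustration, and the code does not use the VCYΦ genome or reproduce the original analysis.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def snp_positions(aligned: list[str]) -> list[int]:
    """Return 0-based alignment columns at which the sequences differ."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def snp_span(positions: list[int]) -> int:
    """Length of the stretch (in bp) that contains all given SNP columns."""
    return positions[-1] - positions[0] + 1


# Toy alignment (hypothetical, not VCYΦ data).
toy = ["ATGCATGCAT", "ATGCTTGCAT", "ATGCATGGAT"]
snps = snp_positions(toy)
print(gc_content(toy[0]), snps, snp_span(snps))
```

Applied to a real alignment of the four RF genomes, the same functions would list the variable columns, from which stretches such as SNP-A and SNP-B could be read off.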
Overall, the phage contains 11 open reading frames (ORF1 to ORF11), predicted by BLAST search. Of these ORFs, 9 are homologous to protein-coding genes previously reported from other filamentous phages, including KSF-1Φ (9) and VGJΦ (4). Based on similarity in sequence and organization to these other phages, the ORFs of VCYΦ can be classified into functional modules for replication, structure and assembly, and regulation (Figure 1).

The putative replication module, composed of ORF1 to ORF3 (Figure 1), maps to the same position as rstA and rstB in CTXΦ (8) and as gII and gV in M13 phage (2). ORF2 and ORF3 share amino acid sequence similarity with putative phage replication proteins and putative ssDNA-binding proteins (26). We therefore suggest that ORF2 and ORF3 play similar roles in VCYΦ. A hypothetical gene, ORF1, is associated with the replication module based on its map position and the overlap of its stop codon with ORF2; however, its function remains unknown.

Figure 1. Genome organization of VCYΦ phage. Linear ORF maps of VGJΦ (7,542 bp), KSF-1Φ (7,107 bp) and VCYΦ (7,103 bp) phage were aligned based on their modular structure. ORFs or genes are represented by arrows oriented in the direction of transcription. Black, white and light grey arrows represent replication, structural and assembly, and regulation modules, respectively. Dark grey arrows represent unknown ORFs. The attP sequences of VGJΦ and two SNP regions (SNP-A and SNP-B) are also indicated.

The putative structural and assembly module consists of ORF4 to ORF8 (Figure 1), each similar in size, genome position and sequence to the corresponding capsid proteins of other filamentous phages (3, 4, 8, 9).
For instance, ORF6 is similar in size and genome position to gIII of CTXΦ, which encodes the minor capsid protein pIII that recognizes and interacts with receptors and coreceptors. The protein encoded by ORF8 is similar to the pI protein of Lf phage of Xanthomonas campestris (5) and the Zot protein of CTXΦ, which are both required for viral particle packaging and secretion.

ORF10 and ORF11 likely encode regulatory proteins constituting the third module. Both ORFs are oriented in the opposite direction to the rest of the ORFs and exhibit homology to ORF136 and ORF154 of VGJΦ (4), which encode a potential regulatory protein and a repressor protein, respectively.

Finally, ORF9 is a conserved hypothetical protein whose function has not been established. Its location between the structure/assembly and regulation modules is the same as that of ctxA and ctxB in CTXΦ. However, ORF9 does not share homology with these genes, which code for the cholera toxin (CT), an important pathogenicity determinant in the diarrheal disease cholera. Based on these comparisons, it seems likely that ORF9 is not associated with any of the three modules and may provide an additional but currently unknown function to the phage.

As in other sequenced filamentous phages from Vibrio strains (4, 9, 10), the three modules of VCYΦ appear to be evolutionary mosaics assembled by horizontal gene transfer. At the nucleic acid level, the structure and assembly module of VCYΦ and that of filamentous phage KSF-1Φ from V. cholerae share 76% identity (9), while the regulatory module of VCYΦ displays only low similarity to that of KSF-1Φ; instead it is 80% similar to the corresponding module of phage VGJΦ (4). The hybrid genome of VCYΦ thus confirms that horizontal gene transfer, possibly by co-infection, is a significant driving force in the evolution of filamentous vibriophages.

2.4.2. Characterization of the viral particle

A filamentous phage structure was detected by electron microscopy in precipitates obtained from filtered supernatant of strain 7D07PW5, which had been shown to contain the 7 kbp plasmid-like structure in the gel assay. These phage-like particles were 1.762 ± 0.016 µm in length and 7 nm in width (n=6) (Figure 2). The size of VCYΦ is similar to that typically found in the genus Inovirus (28) (0.8-2 µm in length and 6-7 nm in width), including fs2Φ (15) and KSF-1Φ (9).

Figure 2. Electron micrograph of VCYΦ phage particles (panels A and B). Phage particles were isolated from the culture supernatant of strain 7D07PW5. The scale bars in both panels represent 100 nm.

2.4.3. Comparison of RF and IF by PCR screen

To gain further confidence that the RF and IF of the phage detected in host cells represent the same phage, we used a specific PCR assay targeting ORF9 (Table 1), which is currently unique to VCYΦ. This gave positive results for all 10 strains assayed, including those used for sequencing and electron microscopy. Together with the PCR assays differentiating IF and RF, this suggests that the phages present in the Oyster Pond isolates were highly similar.

2.4.4. Characterization of the integration site of VCYΦ

Because several filamentous phages have been shown to integrate into the V. cholerae chromosome, we investigated whether VCYΦ has this ability. We first identified a putative 28 nucleotide-long attP site by comparison with VGJΦ (4), since the regulation module of this phage differs from that of VCYΦ by only 20% at the nucleotide level (Figure 3). This comparison also revealed potential binding sites for XerC and XerD, which mediate phage-host recombination (22). The XerC site of VCYΦ differs from that of VGJΦ in four nucleotides, whereas the XerD sites are identical. As in VGJΦ, the XerC and XerD sites of VCYΦ are separated by 7 nucleotides.
The attP sequences of some filamentous phages from Vibrio are homologous, suggesting that the attP structure is important for recognition and recombination by XerC and XerD.

To characterize the phage integration site in the host chromosome (attB), we first developed a PCR screen to distinguish V. cholerae strains containing either the IF or RF of VCYΦ. Strain 4A01LW1, for which this analysis detected an integrated phage, was chosen for further characterization of the chromosomal location of the attB site using walking PCR. Sequence analysis of PCR products and comparison with the genome sequence of V. cholerae O1 N16961 (12) identified a putative attB site, 28 bp long and identical to the dif1 site of V. cholerae O1 N16961 except for a single nucleotide position (Figure 3A). However, unlike the dif1 site of V. cholerae O1 N16961, which is located on chromosome I, the 1,054 bp region flanking the attB site in strain 4A01LW1 shared sequence similarity (84%) with chromosome II. In this strain, the attB site is located between a putative transposase and sui1, which encodes a translation initiation factor (Figure 3B). This suggests that VCYΦ, like other filamentous phages in Vibrio strains, uses the XerC and XerD recombination system to integrate into chromosomal dif-like sequences.

[Figure 3: panel A, alignment of attP-VCYΦ, attP-VGJΦ, dif1 (V. cholerae) and attB-VCYΦ with the XerC and XerD sites marked; panel B, map of the attB region between a transposase gene and sui1.]

Figure 3. attP site of VCYΦ and attB site of integration of VCYΦ into chromosome II of strain 4A01LW1. (A) Sequence alignment of the attP regions of VCYΦ and VGJΦ (AY242528), and of the attB region of strain 4A01LW1 and dif1 of V. cholerae N16961. (B) Schematic representation of the integration region of chromosome II of strain 4A01LW1. The attB sequence region is also indicated.

2.4.5. Distribution of VCYΦ across the environmental V. cholerae population

Using data from gel analysis and the PCR-based screens for either RF or IF, we further investigated the frequency of the two forms of VCYΦ across a large collection of V. cholerae isolates from Oyster Pond. This showed that 220 (41.4%) of a total of 531 isolates contained either the RF or IF of VCYΦ (Table 2), and that seventy (13.2%) of the 531 strains carried only the 7 kbp RF of VCYΦ, as suggested by gel electrophoresis (Table 2). Because the IF was not detectable by the IF-specific PCR assay in any of these strains, the RF of VCYΦ appears to be able to replicate without integration into the chromosome. Overall, this suggests a remarkably high prevalence of this phage within this environmental V. cholerae population.

Table 2. Frequency of the IF and RF of VCYΦ phage in a collection of 531 V. cholerae isolates from coastal Oyster Pond (MA) and its lagoon.

Habitat    No. of isolates    No. of isolates with RF¹    No. of strains with IF²    Total no. of strains containing VCYΦ phage
Pond       360                59 (16.4%)                  79 (21.9%)                 138 (38.3%)
Lagoon     171                11 (6.4%)                   71 (41.5%)                 82 (48.0%)
Total      531                70 (13.2%)                  150 (28.2%)                220 (41.4%)

¹ The RF was detected as a 7 kbp, plasmid-like band by agarose gel electrophoresis and by an additional RF-specific PCR assay that followed the IF-specific PCR assays.
² The IF was identified by IF-specific PCR assays.

Although it is impossible to know exactly how initial isolation has affected transition between RF and IF or loss of the phage, we suggest that the numbers provided represent a lower bound on the frequency of the phage in this environmental V. cholerae population. We found that after regrowth of strains from liquid stock cultures, 14% of strains had lost the RF, but in only one strain had the RF transitioned to the IF. Moreover, only a single loss of the IF was observed when 2 strains (4A01 and 4B03) were streaked from liquid media and a total of 39 single colonies were assayed.
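The scoring that underlies Table 2 can be written down as a small decision rule. The sketch below is an illustrative reading of the screen, not the authors' actual procedure: the names `Assay` and `score_isolate` are hypothetical, the example isolates are toy input, and two assumptions are made explicit in the comments, namely that any IF-specific product is scored as IF regardless of the RF assays, and that combinations the text does not describe are left as "unresolved".

```python
from collections import Counter
from typing import NamedTuple


class Assay(NamedTuple):
    """Screening results for one isolate (field names are illustrative)."""
    gel_band_7kb: bool  # ~7 kbp plasmid-like band seen on the agarose gel
    rf_pcr: bool        # product with the RF-specific (attP-flanking) primers
    if_pcr: bool        # product with the IF-specific primer pair


def score_isolate(a: Assay) -> str:
    """Assign an isolate to 'IF', 'RF', 'none' or 'unresolved'."""
    if a.if_pcr:
        # Assumption: an IF-specific product is scored as IF,
        # whatever the RF assays show.
        return "IF"
    if a.gel_band_7kb and a.rf_pcr:
        # Visible band confirmed by RF-specific PCR, no IF detectable.
        return "RF"
    if not (a.gel_band_7kb or a.rf_pcr):
        return "none"
    # Assumption: combinations not described in the text stay unscored.
    return "unresolved"


# Toy input, not data from the study.
isolates = [
    Assay(gel_band_7kb=True, rf_pcr=True, if_pcr=False),
    Assay(gel_band_7kb=False, rf_pcr=True, if_pcr=True),
    Assay(gel_band_7kb=False, rf_pcr=False, if_pcr=True),
    Assay(gel_band_7kb=False, rf_pcr=False, if_pcr=False),
]
tally = Counter(score_isolate(a) for a in isolates)
print(dict(tally))
```

Keeping the rule in one function makes it easy to see which assay combinations fall outside the categories reported in Table 2.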
This suggests that the RF and IF are moderately stable when strains are propagated, but also indicates that an even higher proportion of strains in the environmental population may have been harboring phage.

Another 150 (28.2%) of the 531 host isolates carried the IF of VCYΦ detectable by the IF-specific PCR screen, suggesting high prevalence of the integrated phage in the V. cholerae population. Although none of these strains showed a visible 7-kbp band by agarose gel electrophoresis, 113 also gave positive results with the RF-specific PCR screen (Table 2). Because lysogen repression is never absolute, it is possible that a small subpopulation in each culture tube had transitioned from the IF to the RF. These host strains were therefore scored as containing IF only, but the presence of small amounts of RF underscores that the exact proportions of strains containing RF and IF may have shifted during culturing. We therefore only stress trends in populations from different habitats, based on the assumption that these should be unaffected by transitions between the two forms during regrowth of strains, which was highly standardized (three streaks post isolation with subsequent analyses carried out from freezer stock cultures).

The overall frequency of the phage (RF and IF) was similar among host isolates from lagoon and pond (Table 2); however, the two habitats displayed different trends in the presence of IF and RF. While in the pond the frequencies of IF (22%) and RF (16%) were roughly equal, in the lagoon the vast majority of phage was in the IF (42%) rather than the RF (6%) (Table 2). As detailed above, we deem it unlikely that such a difference arose after isolation of host strains, so that environmental factors may play a role in the transition between RF and IF.
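One way a reader might check whether the pond and lagoon really differ in their RF:IF ratio is a 2x2 test of independence on the counts in Table 2. The following is a rough sketch rather than an analysis reported in this study: the helper `chi_square_2x2` is a hypothetical name, the test is a plain Pearson chi-square without continuity correction, and it treats isolates as independent observations, which may not hold for strains recovered from the same water sample.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Counts taken from Table 2 (phage-positive isolates only).
pond_rf, pond_if = 59, 79
lagoon_rf, lagoon_if = 11, 71

stat = chi_square_2x2(pond_rf, pond_if, lagoon_rf, lagoon_if)

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_5PCT_DF1 = 3.841
print(f"chi-square = {stat:.2f}; exceeds 5% critical value: {stat > CRITICAL_5PCT_DF1}")
```

A statistic above the critical value would be consistent with the habitat trend described in the text, although only culture-independent data could establish the underlying cause.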
The population in the lagoon may thus have been producing fewer phage particles than its equivalent in the pond; however, we emphasize that this observation will have to be verified by culture-independent methods in the future.

In summary, we have described a novel filamentous phage infecting V. cholerae, adding to the already considerable number of such phages in this species. We show that this phage had a surprisingly high prevalence in environmental host populations when sampled in late summer and that the transition between IF and RF may be influenced by environmental factors. Overall, filamentous phages appear to be an important factor in the environmental biology of V. cholerae and can affect a large fraction of the cells within a population.

2.5. Acknowledgements

This work was supported by grants from the National Science Foundation Evolutionary Ecology program, the National Science Foundation and National Institutes of Health co-sponsored Woods Hole Center for Oceans and Human Health, the Moore Foundation and the Department of Energy to MFP, as well as postdoctoral fellowships from the MIT-Merck alliance to YB. YX would like to acknowledge support from the Chinese Scholarship Council during her stay at MIT.

References:

1. Boucher, Y., O. X. Cordero, A. Takemura, D. E. Hunt, K. Schliep, E. Bapteste, P. Lopez, C. L. Tarr, and M. F. Polz. 2011. Local mobile gene pools rapidly cross species boundaries to create endemicity within global Vibrio cholerae populations. mBio 2:e00335-10.
2. Calendar, R. 1988. The Bacteriophages. Plenum Press, New York.
3. Campos, J., E. Martinez, Y. Izquierdo, and R. Fando. 2010. VEJΦ, a novel filamentous phage of Vibrio cholerae able to transduce the cholera toxin genes. Microbiology 156:108-115.
4. Campos, J., E. Martinez, E. Suzarte, B. L. Rodriguez, K. Marrero, Y. Silva, T. Ledon, R. del Sol, and R. Fando. 2003. VGJ phi, a novel filamentous phage of Vibrio cholerae, integrates into the same chromosomal site as CTX phi. J Bacteriol 185:5685-5696.
5. Chang, K. H., F. S. Wen, T. T. Tseng, N. T. Lin, M. T. Yang, and Y. H. Tseng. 1998. Sequence analysis and expression of the filamentous phage phi Lf gene I encoding a 48-kDa protein associated with host cell membrane. Biochem Biophys Res Commun 245:313-318.
6. Das, B., J. Bischerour, and F. X. Barre. 2011. Molecular mechanism of acquisition of the cholera toxin genes. Indian J Med Res 133:195-200.
7. Davis, B. M., and M. K. Waldor. 2003. Filamentous phages linked to virulence of Vibrio cholerae. Curr Opin Microbiol 6:35-42.
8. Ehara, M., S. Shimodori, F. Kojima, Y. Ichinose, T. Hirayama, M. J. Albert, K. Supawat, Y. Honma, M. Iwanaga, and K. Amako. 1997. Characterization of filamentous phages of Vibrio cholerae O139 and O1. FEMS Microbiol Lett 154:293-301.
9. Faruque, S. M., I. Bin Naser, K. Fujihara, P. Diraphat, N. Chowdhury, M. Kamruzzaman, F. Qadri, S. Yamasaki, A. N. Ghosh, and J. J. Mekalanos. 2005. Genomic sequence and receptor for the Vibrio cholerae phage KSF-1phi: evolutionary divergence among filamentous vibriophages mediating lateral gene transfer. J Bacteriol 187:4095-4103.
10. Faruque, S. M., and J. J. Mekalanos. 2003. Pathogenicity islands and phages in Vibrio cholerae evolution. Trends Microbiol 11:505-510.
11. Hassan, F., M. Kamruzzaman, J. J. Mekalanos, and S. M. Faruque. 2010. Satellite phage TLCphi enables toxigenic conversion by CTX phage through dif site alteration. Nature 467:982-985.
12. Heidelberg, J. F., J. A. Eisen, W. C. Nelson, R. A. Clayton, M. L. Gwinn, R. J. Dodson, D. H. Haft, E. K. Hickey, J. D. Peterson, L. Umayam, S. R. Gill, K. E. Nelson, T. D. Read, H. Tettelin, D. Richardson, M. D. Ermolaeva, J. Vamathevan, S. Bass, H. Qin, I. Dragoi, P. Sellers, L. McDonald, T. Utterback, R. D. Fleishmann, W. C. Nierman, O. White, S. L. Salzberg, H. O. Smith, R. R. Colwell, J. J. Mekalanos, J. C. Venter, and C. M. Fraser. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477-483.
13. Huber, K. E., and M. K. Waldor. 2002. Filamentous phage integration requires the host recombinases XerC and XerD. Nature 417:656-659.
14. Hunt, D. E., L. A. David, D. Gevers, S. P. Preheim, E. J. Alm, and M. F. Polz. 2008. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science 320:1081-1085.
15. Ikema, M., and Y. Honma. 1998. A novel filamentous phage, fs-2, of Vibrio cholerae O139. Microbiology 144 (Pt 7):1901-1906.
16. Jouravleva, E. A., G. A. McDonald, C. F. Garon, M. Boesman-Finkelstein, and R. A. Finkelstein. 1998. Characterization and possible functions of a new filamentous bacteriophage from Vibrio cholerae O139. Microbiology 144 (Pt 2):315-324.
17. Kar, S., R. K. Ghosh, A. N. Ghosh, and A. Ghosh. 1996. Integration of the DNA of a novel filamentous bacteriophage VSK from Vibrio cholerae O139 into the host chromosomal DNA. FEMS Microbiol Lett 145:17-22.
18. Katz, L. A., E. A. Curtis, M. Pfunder, and L. F. Landweber. 2000. Characterization of novel sequences from distantly related taxa by walking PCR. Mol Phylogenet Evol 14:318-321.
19. Kawai, M., I. Uchiyama, and I. Kobayashi. 2005. Genome comparison in silico in Neisseria suggests integration of filamentous bacteriophages by their own transposase. DNA Res 12:389-401.
20. Luo, P., T. Su, C. Hu, and C. Ren. 2011. A novel and simple PCR walking method for rapid acquisition of long DNA sequence flanking a known site in microbial genome. Mol Biotechnol 47:220-228.
21. McLeod, S. M., H. H. Kimsey, B. M. Davis, and M. K. Waldor. 2005. CTXphi and Vibrio cholerae: exploring a newly recognized type of phage-host cell relationship. Mol Microbiol 57:347-356.
22. McLeod, S. M., and M. K. Waldor. 2004. Characterization of XerC- and XerD-dependent CTX phage integration in Vibrio cholerae. Mol Microbiol 54:935-947.
23. Nakasone, N., Y. Honma, C. Toma, T. Yamashiro, and M. Iwanaga. 1998. Filamentous phage fs1 of Vibrio cholerae O139. Microbiol Immunol 42:237-239.
24. Rakonjac, J., J. Feng, and P. Model. 1999. Filamentous phage are released from the bacterial membrane by a two-step mechanism involving a short C-terminal fragment of pIII. J Mol Biol 289:1253-1265.
25. Rodrigue, S., A. C. Materna, S. C. Timberlake, M. C. Blackburn, R. R. Malmstrom, E. J. Alm, and S. W. Chisholm. 2010. Unlocking short read sequencing for metagenomics. PLoS One 5:e11840.
26. Stassen, A. P., R. H. Folmer, C. W. Hilbers, and R. N. Konings. 1994. Single-stranded DNA binding protein encoded by the filamentous bacteriophage M13: structural and functional characteristics. Mol Biol Rep 20:109-127.
27. Taniguchi, H., K. Sato, M. Ogawa, T. Udou, and Y. Mizuguchi. 1984. Isolation and characterization of a filamentous phage, Vf33, specific for Vibrio parahaemolyticus. Microbiol Immunol 28:327-337.
28. Welsh, L. C., D. A. Marvin, and R. N. Perham. 1998. Analysis of X-ray diffraction from fibres of Pf1 Inovirus (filamentous bacteriophage) shows that the DNA in the virion is not highly ordered. J Mol Biol 284:1265-1271.

CHAPTER THREE

Diversity and Dynamics of Extrachromosomal Elements among Ecologically Defined Host Populations

3. Chapter Three: The Eco-evolutionary Dynamics of Extrachromosomal Elements among Ecological Populations of Vibrionaceae

3.1. Abstract

Although plasmids and other extrachromosomal elements (ECEs) are recognized as key players in horizontal gene transfer, their diversity and dynamics among ecologically structured host populations in the wild remain poorly understood. Here we characterized 187 ECEs from 660 Vibrio isolates previously categorized into 25 ecologically and genetically cohesive populations from a coastal environment (Plum Island Sound, Ipswich, MA).
ECEs are unevenly distributed among host populations and occur at higher frequency in free-living cells, suggesting an influence of host lifestyle on ECE carriage. We detected 22 temperate but non-integrated bacteriophages, and 24 conjugative, 38 mobilizable and 103 non-transmissible plasmids, which all differ in their putative mode and probability of transmission. Based on sequence similarity, phages and plasmids were assigned to 2 and 29 different families, respectively. Many of the plasmid families contain low sequence diversity but are widespread among host populations, indicating high eco-evolutionary turnover. This includes non-transmissible ECEs, suggesting that these are misnamed and transferred by currently unknown mechanisms. Finally, analysis of recent gene transfer among ECEs suggests that plasmids are highly recombinogenic and represent an extensive network of gene transfer that has implications for horizontal gene transfer among distantly related hosts.

3.2. Introduction

With few exceptions, plasmids and other extrachromosomal elements (ECEs) have been studied with a purpose, i.e., as major conduits for the spread of resistance and virulence genes. Only recently have whole genome sequencing of microbial hosts and direct extraction of ECEs from environmental samples provided a less biased glimpse of their large and functionally often uncharacterized diversity (1, 2). For plasmids in particular, this has demonstrated the existence of large numbers of types that have been broadly categorized as conjugative, mobilizable and non-transmissible based on the presence (or absence) of functional genes related to the ability to transfer between host cells (3, 4). Much, however, remains to be learned about the ecological and evolutionary dynamics of these different types of plasmids, such as their host range, nucleotide and gene content diversity, and frequency and persistence within host populations in the wild (5).
Even less well studied are temperate phages that can manifest as ECEs, replicating as plasmid-like structures during the lysogenic phase of their lifecycle. Examples of such phage ECEs are some Tectiviridae (6), which have been found as linear plasmids in Bacillus species, and phage N15, which is a relative of lambda phage (7). Because plasmid and phage ECEs play roles as molecular symbionts or parasites, and can mediate horizontal gene exchange, their biology must ultimately be studied in the context of the host populations they invade; however, this has remained difficult due to the dearth of suitable model systems of ecologically and genotypically well-constrained bacterial populations.

Here we take a population-genomic approach to determine carriage of different types of ECEs in a recently established model for ecologically and genetically cohesive bacterial populations (8), asking whether different ECEs (i) are primarily associated with host phylogeny or ecology, (ii) show evidence for distinct transfer (and loss) patterns, and (iii) display different micro-evolutionary patterns.

We use marine bacteria of the family Vibrionaceae as our model for environmentally differentiated host populations. These have previously been identified as genotypic clusters with characteristic distribution among environmental samples, suggesting that they partition resources in the coastal ocean by differential occurrence among the free-living and associated (with suspended organic particles and zooplankton) fractions of bacterioplankton (9-11). Many of these populations do, however, co-occur on the surfaces and in the guts of larger marine animals, providing opportunity for transfer of ECEs via occasional contact. On the other hand, sampling during different seasons has revealed strong temporal differentiation with the same 'habitat' type often occupied by season-specific populations (9, 10).
Finally, recent analysis of recombination has indicated that these ecological populations display cohesive behavior in terms of gene flow, making it possible for adaptive genes to spread in a population-specific manner (12). Because of these properties, these clusters are hypothesized to represent natural populations and provide a platform to investigate the diversity and dynamics of ECEs.

To explore the diversity of ECEs within host populations, we screened a large collection of isolates obtained from the coastal ocean in the spring and fall of 2006 (8). We aimed at comprehensively sampling and sequencing all detectable ECEs of different sizes to obtain as unbiased a picture of ECE diversity as possible. ECE diversity was analyzed in a comparative genomic framework, integrating this analysis with both phylogenetic and habitat information of the bacterial populations in order to identify differential associations and dynamics.

Our data reveal surprising results about the diversity and distribution of ECEs among Vibrio populations and reject several basic hypotheses about eco-evolutionary dynamics based on prior literature. First, contrary to the expectation that, due to the requirement of cell-to-cell contact for transmission, plasmids should be predominantly associated with isolates recovered from biofilms, we show that they are significantly enriched in free-living, planktonic cells. Second, our data show that so-called non-transmissible plasmids, which are the most common type of ECEs in Proteobacteria (13), are, in spite of their presumed inability to self-mobilize, able to transfer rapidly and frequently. This suggests that currently unrecognized transfer mechanisms are at work. Finally, we show that plasmid-like temperate phages occur at considerable frequency within cells, suggesting that these play a more important environmental role than previously anticipated.

3.3. Methods

3.3.1.
Vibrio isolates and initial screening for plasmids

Isolates were selected from a previous collection of Vibrionaceae, carried out to ascertain co-existence of ecologically and genotypically structured populations in the same seawater samples (8). These isolates had been frozen after minimal handling for purification to reduce the potential for plasmid and gene loss. A total of 660 isolates from spring (4/28/06) and fall (9/6/06) samples were screened for ECE presence by gel electrophoresis of DNA that had been extracted using a modified QIAGEN plasmid purification kit. Briefly, 3 ml of buffers P1, P2 and P3 were sequentially added to a cell pellet, obtained after growth in 30 ml of 1% Tryptic Soy Broth (TSB) media with 2% NaCl (BD Bacto) for 20 hours. ECE DNA was further purified by phenol:chloroform extraction followed by precipitation with isopropanol. DNA was re-dissolved in 500 µl of TE buffer, of which 100 µl and 400 µl were used for detection of small plasmids (<20 kbp) on 0.8% agarose gels and of large plasmids (>20 kbp) on 1% agarose gels, respectively. The latter gels were run for 16-20 hours at 4 °C and at 5-10 V/cm. For each isolate, three replicate detection and isolation procedures were performed.

3.3.2. ECE sequencing

To obtain large enough quantities of purified ECE DNA for sequencing, isolation was performed from 200 ml cultures using the same protocol as above except that 4 ml of buffers P1, P2 and P3 were added. Small bands (<20 kbp) were electrophoretically separated and extracted from gels using the QIAEX II Gel Extraction Kit (Qiagen Inc.). Larger bands (>20 kbp) were cut from the gel and 2 volumes of water and 3 volumes of Buffer QX1 (Qiagen Inc.) were added. The gel slices were incubated at 50 °C for 10 min or more until the agarose was completely solubilized. Lastly, the ECE DNA was precipitated from the supernatant by adding 1/10 volume of 3 M sodium acetate (pH 5.2) and 0.7 volumes of room-temperature isopropanol.
For sequencing of small ECEs, DNA libraries were prepared using the modified Illumina sequencing protocol and multiplexed 36-fold using combinations of six barcodes as described previously (14, 15). Barcoded DNA libraries were normalized and mixed for sequencing on Illumina GAIIx sequencers and the data were analyzed using the Illumina pipeline 1.4.0 to generate fastq files. Libraries of ECE DNA from larger bands were prepared using Nextera DNA Sample Preparation Kits (Roche Titanium-compatible) (Epicentre, Inc.) with 36 adapters (Table 1). Thirty or 36 DNA libraries, each tagged with different adapters and containing the same amount of total DNA, were mixed and sequenced using the Roche GS FLX system.

3.3.3. Assembly of plasmid contigs

Plasmids sequenced with 454 were assembled using Mira v.3.4.1 (16). Plasmids sequenced with Illumina were assembled by trimming adapter sequences from the reads using Cutadapt (17), followed by running the velvet assembler (18). The above assembly procedure was partially automated with additional manual adaptation. To achieve the highest accuracy, multiple assemblies were run with varying parameters on subsets of the data, both random and conditional on kmer abundance. Assemblies that most closely matched the expectations of genome size and connectedness indicated by the kmer spectrum were chosen for further analysis. Assemblies that did not match the expectations well enough were repeated with adjusted parameters by trial and error.

3.3.4. Annotation of proteins

We annotated the ORFs and the corresponding function of the encoded proteins (in terms of FIGFAMS and Subsystems) using the RAST tools (19). Ten short bands for which no ORFs could be annotated were removed from any further analysis. From the total set of 4,751 proteins and 5 RNA genes annotated in the remaining bands, we built families of protein orthologs using orthoMCL (20).

3.3.5.
ECE identification from band contigs

Because each gel band can, in principle, contain more than one independent element, we developed a bioinformatics pipeline to identify independent ECEs. We constructed a network of sequence similarity based on the gene content overlap across all contigs from all bands. For any given pair of contigs a and b belonging to different bands we computed two similarity metrics based on the number of protein families shared between the contigs. Let n and m be the number of proteins encoded in each member of the pair (excluding duplications) and s be the number of shared protein families between the contigs. Local similarity (LS, 0 ≤ LS ≤ 1) between the contigs was defined as LS(a,b) = s/min(n,m). Global similarity (GS, 0 ≤ GS ≤ 1) was defined as GS(a,b) = s/max(n,m). While high values for LS are obtained when the protein family content of a contig is similar to that encoded in a segment of a larger contig, large GS values can only be obtained when the similar protein content involves a large fraction of the proteins encoded in each contig.

Next we performed the following ECE separation algorithm:

1) Contig families were built using MCL (21) to cluster contigs above a global similarity threshold of GS > 0.7;
2) A single contig was considered as an ECE if it was circularized and/or it belonged to one of the contig families defined in step 1;
3) For each given contig - even for those included in the single contig ECE set defined in step 2 - we considered whether it could be a fragment of a larger ECE consisting of two or more contigs from the same band. We therefore looked for contigs that matched ECEs from the same band in a way that suggested they could be part of a single larger, multi-contig ECE.
In that way, the set of potential references for the given contig (the query contig) was defined as the set constituted by any contig in our collection that i) was larger than the query contig (in terms of number of encoded proteins) and ii) had an LS above 0.5 with the query contig. To prioritize contigs of high quality as references, we removed any other contig from the set of potential references if some of the potential references were one-contig ECEs;
4) For each potential reference contig we computed its LS with each contig in the same band to which the query contig belonged (the query band). Contigs in the query band were ordered by the LS value in decreasing order and grouped sequentially. A joined global similarity (JGS) to the reference was computed at each contig grouping step;
5) We kept the group of contigs in the query band from which the largest JGS value was obtained (the optimal JGS) in step 4;
6) Steps 4 and 5 were repeated for each potential reference, and the reference giving the largest optimal JGS was considered as the best reference for the corresponding set of contigs in the query band. Such a set of contigs was proposed as a multi-contig ECE;
7) The proposed multi-contig ECE was validated by checking that the joined contigs had similar read coverage in the same host. We used SSAHA2 (22) for mapping the reads and SAMtools (23) to compute the mapping coverage;
8) Confirmed multi-contig ECEs were separated from the original band. We found one multi-contig ECE per band at most. Any contig that was used as a reference for a confirmed multi-contig ECE was included in the set of one-contig ECEs. Some contigs that were initially defined as single contig ECEs were removed from that category because they could be integrated in a larger (multi-contig) ECE.
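The two similarity metrics and the greedy grouping of steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: contigs are assumed to be represented as sets of protein-family identifiers (which removes duplications), the function names are hypothetical, and JGS is assumed here to be the GS between the reference and the union of the grouped contigs, since its exact formula is not given above.

```python
from typing import List, Set, Tuple


def local_similarity(a: Set[str], b: Set[str]) -> float:
    """LS(a,b) = s / min(n, m), with s the number of shared protein families."""
    if not a or not b:
        return 0.0  # contigs without annotated proteins carry no signal
    return len(a & b) / min(len(a), len(b))


def global_similarity(a: Set[str], b: Set[str]) -> float:
    """GS(a,b) = s / max(n, m)."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def best_contig_group(reference: Set[str],
                      band: List[Set[str]]) -> Tuple[List[int], float]:
    """Steps 4-5 (sketch): order the band's contigs by decreasing LS to the
    reference, add them one at a time, and keep the group with the highest
    joined global similarity (ASSUMED: GS of the union vs. the reference)."""
    order = sorted(range(len(band)),
                   key=lambda i: local_similarity(band[i], reference),
                   reverse=True)
    joined: Set[str] = set()
    group: List[int] = []
    best_group: List[int] = []
    best_jgs = 0.0
    for i in order:
        joined |= band[i]
        group.append(i)
        jgs = global_similarity(joined, reference)
        if jgs > best_jgs:
            best_group, best_jgs = list(group), jgs
    return best_group, best_jgs


# Toy data: family identifiers are hypothetical.
reference = {"f1", "f2", "f3", "f4", "f5", "f6"}
band = [{"f1", "f2", "f3"}, {"f4", "f5"}, {"x1", "x2", "x3", "x4"}]
group, jgs = best_contig_group(reference, band)
```

In this toy band the first two contigs together cover five of the six reference families, while adding the unrelated third contig lowers the joined score, so the sketch proposes the first two contigs as a candidate multi-contig ECE. Step 7 (read-coverage validation) is not covered here.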
Contigs in a band that were not previously defined as single contig ECEs and were not integrated in a multi-contig ECE using a reference were considered by default as part of the same, separate (low-evidence) ECE. Such a remnant ECE could have one or more contigs.

3.3.6. Classification of ECEs

For detecting conjugative and mobilizable ECEs, we used a substantially updated version of the protein profiles previously assembled in (24). For conjugative plasmids, we first searched for matches to TraU/VirB4 from each of the mating pair formation (MPF) system families defined previously (25), since this is the only protein that is associated with all known type IV secretion systems (T4SS) (or at least the only one sufficiently conserved in sequence). We then gathered all proteins found within a frame of -20/+20 ORFs around the TraU/VirB4 to determine whether a functional T4SS was present. For each MPF type, we carried out similarity searches between all proteins and clustered them into families. These families were aligned, analyzed and curated. We iterated based on criteria such as sensitivity and specificity, and then made multiple alignments that were used to build protein profiles with HMMER (26, 27). This led to a database of ~120 protein profiles associated with conjugation. These correspond in general to the known essential proteins in each system (although a few evolve too fast and give poor sequence similarity hits). The profiles were then searched using HMMER in the proteomes of ECEs. We filtered the results by using an E-value threshold of 0.01 and a coverage (ratio between the target and query lengths) of more than 0.5. Clusters are based on the findings in the entire replicon. For detecting phages, we blasted the contigs against the ACLAME Database of Mobile Elements v.
0.4 (28) using blastp with parameter E set at 1e-10, in which we unified the phage and prophage categories into a single "phage" category so that we ended up with two ECE categories: "phage" or "plasmid" proteins. If a given protein from our contigs had a match with at least one phage protein in ACLAME, and there were no matches with plasmid targets, the protein was classified as of phage origin. The same single classification was used to decide whether a protein was associated with a plasmid (whenever there was at least one plasmid match and no matches with phage targets). When there were matches with both at least one plasmid protein and one phage protein, proteins were classified as mixed (independent of the number of matches within each category). If the percentage of proteins recruited to phage was > 20% and also more than 1.5 times the percentage recruited to plasmid, we called the ECE a phage. All other ECEs were then called non-transmissible ECEs.

3.3.7. ECE families and network of gene sharing among families

For analysis of the history of ECE transfer among populations, ECE families were identified based on gene content and sequence similarity with cutoffs of 60% and 97% in order to integrate over more ancient and more recent transfer, respectively. These families were based on the contig families identified above (Ch.3.3.5.), after removal of those contigs that were integrated into multi-contig ECEs. Multi-contig ECEs were integrated in the same family as the corresponding reference contig; however, if the reference contig was not already a member of a family, a new family was created. ECEs defined as remnants were not part of any ECE family. To determine the relationship among the ECE genomes in terms of shared genes, ECEs were assigned to groups based on clustering of shared proteins.
In brief, open reading frames (ORFs) were identified using Glimmer 3.0 (29), followed by the clustering of ORF proteins using OrthoMCL (30, 31), in which a minimum 97% coverage of the longer sequence was required as well as an e-value cut-off of 10^-5. Whole genomes of ECEs were then clustered based on these shared proteins, using the FT ClustNSee clustering algorithm in Cytoscape (32-34).

3.4. Results and Discussion

3.4.1. ECE Characterization

Gel electrophoresis was employed to screen 660 Vibrio isolates for the presence of ECEs. This revealed 140 DNA bands ranging between 1 and 200 kbp distributed across 101 Vibrio strains (15.3% of all isolates). DNA extracted from each band was sequenced using next generation sequencing (Illumina or 454) and assembled into 270 contigs containing a total of 4,246,711 bp with a size range between 1 and 95 kbp (Table 2). The GC content within the contigs varied between 36% and 53% (average value of 44.3%). By using a novel bioinformatics pipeline, we were able to identify 187 ECEs from these contigs.

3.4.2. Proteins of ECEs

We annotated 4,751 protein, five 5S RNA and three tRNA genes from the 187 ECEs (supplementary table 1). With 2,992, the majority of proteins (63.0%) are hypotheticals according to standard RAST annotation. However, when comparing these results to annotation using the ACLAME database (version 0.4), which is a curated collection of prokaryotic mobile genetic elements from various sources (phages, plasmids, transposons and genomic islands) (28, 35), only 42.8% of proteins were categorized as hypothetical. This discrepancy demonstrates the value of curated databases for annotation but also that ECE function remains overall poorly characterized. Among the ~37% (1,759) of proteins with functional annotations, we identified 178 that are typically employed by plasmids for maintenance within their hosts (5).
These fell into the following categories: 50 resolvases, 29 replicases (36), 91 partitioning systems (37), 41 toxin-antitoxin systems (38), and 17 restriction-modification systems (39). The largest functional category was 296 proteins that belong to T4SS systems (25), whose major function is to mediate transfer of conjugative and mobilizable ECEs. We also identified 70 proteins belonging to T6SS systems (40), now recognized as a complex machinery that can pump effector proteins into recipient cells, playing a role in pathogenicity. Our finding of a relatively high incidence of T6SS proteins within Vibrio ECEs suggests that either T6SS play roles outside of pathogenicity or that many of the populations harbor potential pathogens (41).

In addition to the above systems, which do not offer the plasmids any additional function beyond their own maintenance and transfer, we also identified a few groups of proteins with potentially beneficial functions for the hosts. Among these are 14 proteins involved in amino acid metabolism, 21 in carbohydrate metabolism and another 21 in stress responses. To what extent these proteins are functional in pathways that may benefit the host, however, remains unknown at this point.

Interestingly, we also identified three 5S rRNAs on ECEs. That the detection of these genes is not an artifact of contamination with host chromosome is further confirmed by screening of these ECEs for the presence of 16S and 23S rRNAs, which are usually detected adjacent to 5S rRNAs within the same operon (42). Our results showed that the 5S rRNAs occurred alone, suggesting that these ECEs act as carriers for these genes. This observation further strengthens previous observations that conserved rRNA molecules can be spread by HGT with plasmids as vehicles (43, 44).

3.4.3.
Distribution of ECEs among isolates and populations

To investigate the distribution pattern of the ECEs among the 25 previously characterized Vibrio populations (8), we mapped ECE presence onto the phylogenetic tree of all 660 initially screened isolates. Our results show that ECEs are broadly but not evenly distributed among host populations (Figure 1). For example, no ECE was detected in groups 5 (V. anguillarum), 7 (V. fischeri/logei), 9 (V. breoganii) and 10 (V. sp.) whereas ECEs were detected in all four isolates in group 6 (V. sp.) and six of ten of the isolates in group 14 (V. kanaloae) (Figure 1). Absence of ECEs in our collection of V. anguillarum and V. fischeri is in conflict with previously published results by other research groups, where ECEs were detected in both species (45, 46). What causes this difference is unknown, but it may reflect previous study of ECEs in the context of environmental stress (such as heavy metal contamination) on bacterial populations, since the ECEs detected previously could be linked to detoxification and other stress related functions (47, 48).

Next, we asked whether the lifestyle of the Vibrio strains at the time of isolation showed any correlation with ECE presence. As mentioned above, association with one of four size fractions was used to ecologically categorize the isolates, where the smallest (<1 µm) fraction indicated a free-living lifestyle whereas presence in larger size fractions suggested attachment to particles or organisms. We found strong enrichment of ECEs in the free-living phase with 101 ECEs (54%) being associated with the smallest fraction. This association counters the intuition that particle associated bacteria, which live in more dense communities, are more prone to acquire mobile elements. A possible explanation for this discrepancy is that the high incidence of ECEs in small size fractions reflects stability within the host and/or environmental selection, rather than high transmission.
On the other hand, conjugational plasmids detected in this study are primarily of the F-type, which possess T4SS that, due to their thick, flexible pili, can mediate plasmid transfer in liquid better than on surfaces (see section 3.4.4).

Figure 1. Distribution of ECEs among Vibrio hosts. Vibrio phylogeny based on the hsp60 protein-coding gene. Colored rings indicate size fractions and dark bars indicate the presence of at least 1 ECE. Populations are labeled with numbers in the shaded areas. The closest named species to numbered populations are as follows: P1, Enterovibrio calviensis; P2, Enterovibrio norvegicus; P3, Vibrio ordalii; P4, Vibrio rumoiensis; P5, Vibrio anguillarum; P6, Vibrio sp.; P7, Vibrio fischeri/logei; P8, Vibrio fischeri; P9, Vibrio breoganii; P10, Vibrio sp.; P11, Vibrio splendidus cluster 1; P12, Vibrio sp.; P13, Vibrio crassostreae; P14, Vibrio kanaloae; P15, Vibrio cyclitrophicus; P16 and P17, Vibrio tasmaniensis; P18 to P25, Vibrio splendidus.

The frequency distribution of ECEs per host cell shows that nearly half carry a single ECE, about a quarter carry 2 ECEs and the remainder contain between 3 and 11 ECEs (Figure 2). Some unusual combinations of ECEs were detected in several isolates. For example, strain FF472 contains a phage element, two mobilizable, one conjugational and three non-transmissible ECEs. In a separate case, strain FF112 contains, in addition to a phage element, two different types of conjugative plasmids, one mobilizable plasmid and one non-transmissible ECE. While it is common to carry multiple plasmids within one cell, it is very unusual to find more than one type of conjugative plasmid in one isolate.

[Figure 2: bar chart; x-axis: number of ECEs per strain (1 to 11)]

Figure 2. Distribution of the number of ECEs per strain for all Vibrio isolates with at least one ECE.
Nearly 50% of the strains with ECEs have only 1 element; however, the distribution has a long tail, which includes one isolate with 11 ECEs.

3.4.4. Classification of ECEs

All 187 ECEs were classified into four types: conjugational, mobilizable and non-transmissible plasmids, and bacteriophage. Conjugational and mobilizable plasmids both contain relaxases while only the former also encode a T4SS, which is necessary for self-transmission (25). Based on these criteria, 24 were defined as conjugative ECEs. All of these contained at least five T4SS coding genes but only 12 also had a detectable relaxase gene. Thirty-eight putative mobilizable plasmids were identified based on the presence of only a relaxase gene (Table 2); these presumably require a T4SS to act in trans for transmission and hence at least occasional coexistence with a conjugative plasmid. Phages were identified by blasting the ECE sequences against the ACLAME Database. ECEs containing only genes annotated as of phage origin were defined as such. However, some ECEs were found to carry both plasmid and phage proteins, in which case we defined the ECE as a bacteriophage if phage proteins accounted for more than 20% of the total proteins and were more than 1.5 times as abundant as plasmid proteins. In total, 22 phage ECEs were identified in our collection based on these criteria. In addition to phages, and conjugative and mobilizable plasmids, there were 103 ECEs that did not meet the criteria of any of these three categories and were therefore, in accordance with the scheme proposed by Smillie et al. (13), defined as "non-transmissible" ECEs. Overall, the ECE composition in our Vibrio collection is, with 14.5% conjugative, 23.0% mobilizable and 55% non-transmissible ECEs (Table 3 and 4), slightly different from the Proteobacterial average, which is 20% conjugative, 30% mobilizable and 50% non-transmissible ECEs (3).
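The decision rules described in this section can be summarized as a small function. The following Python sketch is illustrative only: the function and argument names are hypothetical, and the order in which the rules are applied (T4SS first, then relaxase, then the phage test) is an assumption, since the text does not state how an ECE meeting several criteria at once was resolved.

```python
def classify_ece(has_t4ss: bool, has_relaxase: bool,
                 n_proteins: int, n_phage: int, n_plasmid: int) -> str:
    """Assign an ECE to one of the four classes used in this chapter.

    n_phage / n_plasmid: proteins with phage-only / plasmid-only ACLAME
    matches. Rule precedence is ASSUMED, not taken from the source.
    """
    if has_t4ss:
        # Conjugative: a T4SS is present (a relaxase was detected in only
        # 12 of the 24 conjugative ECEs, so it is not required here).
        return "conjugative"
    if has_relaxase:
        # Mobilizable: relaxase only, needs a T4SS supplied in trans.
        return "mobilizable"
    phage_pct = 100.0 * n_phage / n_proteins if n_proteins else 0.0
    plasmid_pct = 100.0 * n_plasmid / n_proteins if n_proteins else 0.0
    # Phage: > 20% phage proteins and > 1.5 times the plasmid percentage.
    if phage_pct > 20.0 and phage_pct > 1.5 * plasmid_pct:
        return "phage"
    return "non-transmissible"


# Hypothetical examples (counts are made up for illustration).
examples = [
    classify_ece(True, False, 50, 0, 10),
    classify_ece(False, True, 12, 0, 3),
    classify_ece(False, False, 20, 5, 2),   # 25% phage vs 10% plasmid
    classify_ece(False, False, 20, 5, 4),   # 25% phage vs 20% plasmid
]
```

Note that the last example fails the phage test even though a quarter of its proteins match phage targets, because 25% does not exceed 1.5 times the 20% recruited to plasmids; such an element falls into the non-transmissible class by default.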
We further categorized the types of T4SS found in conjugative plasmids since different types possess properties informative for interpretation of ECE mobility. Sequence comparison suggested that 16 T4SS were of the F type, 6 of the T type and 2 of the G type. To date, Type G T4SS has not been fully characterized and its presence has only been reported in integrative conjugative elements (ICEs) and very rarely in plasmids (http://dbmml.sjtu.edu.cn/SecReT4/index.php). Type F has thick, flexible pili that allow high frequency of conjugation in liquid whereas type T has rigid pili that allow high frequency conjugation only on solid surfaces (49). The prevalence of the F type is consistent with preferential occurrence of ECEs in free-living cells and suggests there is selection for mating in liquid rather than in biofilms. Moreover, the broadest-host range plasmids are found among type T, whereas F-like plasmids tend to be narrow host-range, suggesting that the conjugative plasmids in this dataset may be limited to fairly closely related hosts.

We also identified one ECE (m055) from strain FF112 that seems to contain 3 T4SS: 2 type F and 1 type T. This is unexpected since multiple T4SS in a single plasmid are atypical. Although this might point to an interesting biological function, it will have to be confirmed that ECE m055 does not represent an artifact assembled from multiple, independent ECEs. Additionally, the same isolate harbors ECE m058, which contains a type G T4SS, so that 3 types of T4SS co-exist within the same host. Taken together, this study provides the first evidence of coexistence of more than two types of conjugative plasmids in a single strain.

3.4.5. ECE families and analysis of transmissibility

Classification of all 187 ECEs into families according to their sequence similarity apportioned 93 of the ECEs among 31 multi-member families, while 94 ECEs remained singletons.
The numbers of ECEs detected within multi-member families ranged from 2 to 13 (Figure 3). Family I, a siderophore synthesis plasmid family, will be described in detail in the next section (Ch.3.4.6.). Family II contained a disproportionally large number of members, represented as the largest dot in Figure 3. Further analysis of the protein sequences of ECEs from family II identified 27 ORFs, among which three share high sequence similarity with phage genes and 12 other genes are homologous to phage genes, suggesting that this may be a new phage family that can propagate in a plasmid-like fashion.

[Figure 3: scatter plot; x-axis: Size (Kbp), logarithmic; point size: number of ECEs per family (2 to 13); legend: phage, mobilizable, non-transmissible, conjugative]

Figure 3. ECE family diversity as a function of family size, ECE size and classes. Average percentage identities are calculated for each family and plotted against the size of the element in base pairs. The size of the points indicates the number of members in the ECE family. The smallest size is 2 and the largest 13 (a widespread phage family). The two largest families are labeled as I and II, representing a siderophore plasmid family and a phage family, respectively.

The categorization into families also allows analysis of the distribution of specific types of ECEs among the Vibrio populations in order to infer host-range and evolutionary turnover. A network analysis of ECE distribution among potential hosts shows that while many multi-member ECE families are population-specific, about 63% are distributed across populations, suggesting broad host-ranges (Figure 4). This is especially the case for the multi-member phage families, which appear to be able to propagate in diverse hosts. Surprisingly, a large portion of the non-transmissible ECEs, however, also displays broad host range. Moreover, because most highly related ECEs are also in the non-transmissible category and because their nucleotide diversity is generally lower than that of their hosts, transfer, rather than vertical inheritance, appears to be the rule for this type of ECE. This pattern of primarily low nucleotide diversity among members of ECE families also suggests rapid evolutionary turnover, i.e., ECEs arise, spread and are lost frequently. Finally, considering that most of the closely related ECEs are classified as non-transmissible, the conclusion of rapid turnover is puzzling and suggests that a currently unrecognized, more direct transfer mechanism is at work.

[Figure 4: network diagrams, panels A and B; legend: non-transmissible, mobilizable, phage]

Figure 4. ECE family distribution across the Vibrio phylogeny. A) Network connections link genotypes sharing ECE families with an average percentage identity > 60% and their classification according to backbone genes and transmissibility. B) Network showing ECE families with an average percentage identity > 97% and their classification according to backbone genes and transmissibility.

3.4.6. Genome dynamics of ECEs

[Figure 5: network diagram; legend: proteins; ECE classes conjugative, mobilizable, non-transmissible, phages; hosts Vibrio crassostreae, Vibrio ordalii, Vibrio sp., Vibrio fischeri, Vibrio splendidus, Vibrio tasmaniensis, Vibrio cyclitrophicus, Vibrio kanaloae, Enterovibrio norvegicus, Enterovibrio calviensis-like; clusters I, II and III marked]

Figure 5. ECE genome cluster network. ECEs are connected through proteins (blue dots) shared by at least two ECEs at 97% sequence similarity. The diameter of an ECE symbol indicates the relative size of the genome of the ECE. Three ECE clusters (I, II and III) were selected using FT ClusterNSee as examples of ECE genome dynamics (see Figure 6).

Because of the seemingly rapid turnover of ECEs, we investigated to what extent ECEs themselves are evolutionarily stable entities by constructing a network of recently exchanged genes. To approximate relatively recent transfer, we first clustered genes into closely related families (97% in sequence identity) and then determined how many ECEs share these families.
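The construction of such a gene-sharing network can be sketched as follows. This is a minimal Python illustration rather than the workflow used here (which relied on OrthoMCL and Cytoscape): it assumes that genes have already been clustered into families at 97% identity, and the ECE and family identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_gene_network(
        ece_families: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Return edges between ECEs weighted by the number of gene families
    (pre-clustered at 97% identity, ASSUMED input) that both ECEs carry."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for ece, families in ece_families.items():
        for family in families:
            members[family].add(ece)
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for eces in members.values():
        # A family found in a single ECE creates no link.
        for a, b in combinations(sorted(eces), 2):
            edges[(a, b)] += 1
    return dict(edges)


def unconnected(ece_families: Dict[str, Set[str]],
                edges: Dict[Tuple[str, str], int]) -> List[str]:
    """ECEs that share no recently exchanged gene family with any other."""
    linked = {ece for pair in edges for ece in pair}
    return sorted(set(ece_families) - linked)


# Hypothetical toy input: four ECEs and their gene-family content.
toy = {
    "eceA": {"t4ss_1", "rep_1"},
    "eceB": {"t4ss_1", "rep_1", "sid_1"},
    "eceC": {"sid_1"},
    "eceD": {"cap_1"},
}
edges = shared_gene_network(toy)
isolated = unconnected(toy, edges)
```

Edge weights in this sketch count shared families, so strongly connected hubs show up as ECE pairs with high weights, whereas elements such as the toy "eceD" remain outside the network.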
This shows a network that suggests high incidence of gene exchange within a hub of strongly connected conjugative plasmids, which share many types of genes with T4SS being most frequent. Many non-transmissible and a few mobilizable plasmids are also strongly connected to this hub while links to the remainder of plasmids and, especially, phage are sparse, suggesting relatively minor gene exchange (Figure 5). Several plasmid families and one phage family are not connected to the network.

The structure and connections of non-transmissible and mobilizable plasmids within this network allow some conclusions as to their evolutionary origins, which we illustrate with the following examples. The first example is a cluster of non-transmissible plasmids (labeled I in Figure 5 and corresponding to family I in Figure 3), which share a cluster of siderophore genes with otherwise unrelated plasmids detected in several other Vibrio species (50, 51). The circular plasmids of family I contain three modules (Figure 6 I). The first encodes a plasmid partition system (blue ORFs) that allows the plasmid to be maintained in its host strain. The function of the second module (orange) remains unidentified. The third module (green) encodes a protein complex that shares high sequence identity with siderophore biosynthesis genes from a previously characterized plasmid family from V. vulnificus and V. parahaemolyticus (50, 51), suggesting that these genes are mobile among ECEs. Finally, plasmids within family I are somewhat atypical since they display higher sequence divergence, suggesting persistence over longer evolutionary times than many of the other non-transmissible plasmids, which consist of clusters of highly identical genomes.
[Figure 6: sequence alignments of the ECEs in clusters I, II and III, each with a base-pair scale bar and the corresponding network; subclusters of cluster III are labeled A to E]

Figure 6. Sequence alignment of ECEs in three representative clusters from the network analysis. Homologous sequence regions are highlighted in grey as in clusters I and III. The corresponding network of each cluster is also included. Cluster I is a highly conserved siderophore synthesis plasmid family. Cluster II contains two conjugative and one non-transmissible ECEs. Cluster III contains 23 mobilizable and one non-transmissible ECEs, which were grouped into 5 subclusters, indicated as A to E.

The second example (cluster II in Figure 6 II) illustrates that gene gain or loss may change one plasmid category into another. ECE m161 and m096 are conjugative plasmids that share almost all of the backbone genes (blue ORFs), which are responsible for conjugation. They differ, however, in two relatively large regions, which are present in m161 but absent in m096. These regions are also shared by the non-transmissible plasmid m031, which is overall more similar to m161 except for the lack of genes responsible for conjugative transfer. This indicates that non-transmissible plasmids may originate from conjugative plasmids and vice versa via gain or loss of T4SS and relaxase genes.

The final example (cluster III in Figure 6 III) illustrates a case where five unrelated families containing 23 mobilizable plasmids are connected only by a few shared backbone genes. Families A, B, C and D have mostly Type P relaxase genes in common while the remainder of their genomes is unrelated. Even more indirect is the connection of Family E to the other families in this group.
It shares Type C relaxase genes with ECE m052, which has some accessory genes in common with ECE m181. The latter is connected to Families A, B, C and D via Type P relaxase genes.

The above three examples illustrate that the overall architecture of plasmids can change over relatively short evolutionary timespans and that transitions between different categories of plasmids are possible via gain or loss of relaxase genes and T4SS. Considering the high mobility of ECEs among different host populations detected in this study, this may lead to rapid transfer and reassortment of functions of potential benefit to hosts.

3.5. Summary

In summary, we isolated 187 ECEs from 660 strains categorized into 25 different ecologically and genetically cohesive populations. We identified the following elements: 22 bacteriophages, 24 conjugative ECEs, 38 mobilizable ECEs and 103 non-transmissible ECEs. Mobilizable ECEs require co-occurring conjugative plasmids for successful transfer, whereas non-transmissible ECEs do not encode any genes for self-transfer. We further found that ECEs were significantly enriched in free-living Vibrio cells, suggesting an association of ECEs with the host environment. Our data show that non-transmissible plasmids appear to be the most common among Vibrio ECEs and that they may have been transferred recently and frequently through mechanisms yet to be uncovered. This suggests that these plasmids have most certainly been misnamed. The high incidence of putative temperate phages that appear to propagate as plasmids is surprising. Although plasmid-like phages have been described previously, the phages detected here appear novel, and their prevalence suggests a previously unanticipated role in the marine environment. Finally, the dynamic nature of ECE genomes offers their host strains a rich supply of external genetic material that may allow the rapid assembly of different functions.

Table 1.
An extended barcode set beyond the Roche Titanium-compatible barcode kit

MID name   Sequence (5'-3')
MID 13     CCATCTCATCCCTGCGTGTCTCCGACTCAGCATAGTAGTGAGATGTGTATAAGAGACAG
MID 14     CCATCTCATCCCTGCGTGTCTCCGACTCAGCGAGAGATACAGATGTGTATAAGAGACAG
MID 15     CCATCTCATCCCTGCGTGTCTCCGACTCAGATACGACGTAAGATGTGTATAAGAGACAG
MID 16     CCATCTCATCCCTGCGTGTCTCCGACTCAGTCACGTACTAAGATGTGTATAAGAGACAG
MID 17     CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTCTAGTACAGATGTGTATAAGAGACAG
MID 18     CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTACGTAGCAGATGTGTATAAGAGACAG
MID 19     CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTACTACTCAGATGTGTATAAGAGACAG
MID 20     CCATCTCATCCCTGCGTGTCTCCGACTCAGACGACTACAGAGATGTGTATAAGAGACAG
MID 21     CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTAGACTAGAGATGTGTATAAGAGACAG
MID 22     CCATCTCATCCCTGCGTGTCTCCGACTCAGTACGAGTATGAGATGTGTATAAGAGACAG
MID 23     CCATCTCATCCCTGCGTGTCTCCGACTCAGTACTCTCGTGAGATGTGTATAAGAGACAG
MID 24     CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGAGACGAGAGATGTGTATAAGAGACAG
MID 25     CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGTCGCTCGAGATGTGTATAAGAGACAG
MID 26     CCATCTCATCCCTGCGTGTCTCCGACTCAGACATACGCGTAGATGTGTATAAGAGACAG
MID 27     CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCGAGTATAGATGTGTATAAGAGACAG
MID 28     CCATCTCATCCCTGCGTGTCTCCGACTCAGACTACTATGTAGATGTGTATAAGAGACAG
MID 29     CCATCTCATCCCTGCGTGTCTCCGACTCAGACTGTACAGTAGATGTGTATAAGAGACAG
MID 30     CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACTATACTAGATGTGTATAAGAGACAG
MID 31     CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCGTCGTCTAGATGTGTATAAGAGACAG
MID 32     CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTACGCTATAGATGTGTATAAGAGACAG
MID 33     CCATCTCATCCCTGCGTGTCTCCGACTCAGATAGAGTACTAGATGTGTATAAGAGACAG
MID 34     CCATCTCATCCCTGCGTGTCTCCGACTCAGCACGCTACGTAGATGTGTATAAGAGACAG
MID 35     CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGTAGACGTAGATGTGTATAAGAGACAG
MID 36     CCATCTCATCCCTGCGTGTCTCCGACTCAGCGACGTGACTAGATGTGTATAAGAGACAG
MID 37     CCATCTCATCCCTGCGTGTCTCCGACTCAGTACACACACTAGATGTGTATAAGAGACAG
MID 38     CCATCTCATCCCTGCGTGTCTCCGACTCAGTACACGTGATAGATGTGTATAAGAGACAG

Table 2.
The number, GC content and size of contigs ECE ID Contig ID GC Contig Size m001 m002 m003 1 1 1 2 1 2 1 2 3 1 1 2 1 2 3 1 2 3 1 1 2 3 4 1 1 2 3 4 5 6 7 8 1 2 3 1 1 2 0.41 0.47 0.41 0.42 0.37 0.42 0.52 0.46 0.43 0.41 0.42 0.36 0.53 0.45 0.47 0.41 0.41 0.41 0.41 0.53 0.47 0.41 0.52 0.41 0.42 0.41 0.4 0.43 0.42 0.42 0.42 0.41 0.42 0.44 0.47 0.44 0.42 0.39 48.4 6.2 30.7 15.9 3.9 19 6.5 4.2 13.7 48.4 5.2 5.1 14.2 12.7 11.8 13.3 9.4 15 48.4 7.9 5.8 3.9 4.2 48.4 27 18 20.3 20 36.8 23.5 18.7 19 2.8 3.1 2.8 95.4 19.6 7.6 m004 m005 m006 m007 m008 m009 m010 m011 m012 m013 m014 m015 m016 m017 m018 m019 m020 m021 m022 m023 m024 m025 m026 m027 m028 m029 m030 m031 m032 m033 m034 m035 m036 m037 -90- 3 4 1 2 3 1 2 3 1 1 1 2 1 1 1 1 2 3 1 1 2 3 4 1 2 3 1 1 1 2 1 1 2 1 2 1 1 1 2 0.41 10.7 0.41 5.8 0.45 4.2 0.51 3.4 0.51 3.4 0.42 11 0.42 21.9 0.42 15.9 0.51 8.1 0.44 40.6 0.42 6.6 0.43 6.7 0.43 17 0.41 8.1 42.4 0.5 0.51 16.5 0.46 13.7 0.41 9.6 37.9 0.5 0.46 11.8 0.47 8.1 40.4 0.5 0.46 9 0.47 10.5 0.49 10 4.7 0.5 0.44 95.9 0.41 24 13.3 0.4 5.6 0.4 0.48 47.3 0.46 33.4 0.48 20.9 0.47 24.2 0.43 14.5 0.42 19 0.51 16 0.48 23 0.46 14 m038 m039 m040 m041 m042 m043 m044 m045 m046 m047 m048 m049 m05O m051 m052 m053 m054 m055 m056 m057 m058 m059 m060 m061 m062 m063 m064 m065 3 1 1 1 2 3 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 1 2 3 4 5 6 1 1 2 1 2 1 1 1 1 1 1 1 0.42 0.42 0.42 0.48 0.43 0.43 0.44 0.44 0.45 0.42 0.44 0.49 0.45 0.41 0.41 0.41 0.41 0.41 0.46 0.4 0.47 0.45 0.41 0.46 0.41 0.43 0.43 0.49 0.52 0.42 0.42 0.42 0.48 0.46 0.42 0.45 0.47 0.46 0.51 0.43 0.48 13.2 17.6 20.1 7 6.1 5.8 35 35 43.1 18.3 11.4 35.8 68.3 45.8 24 36.9 12.2 24.6 47.1 12.6 32.8 11.8 24.5 64.3 25.2 20.6 15.4 25.4 17.5 19.4 46.9 17.3 59.2 48.6 17.8 25.4 12.6 62.6 17.1 21.9 16.8 m066 m067 m068 m069 m070 m071 m072 m073 m074 m075 m076 m077 m078 m079 m080 m081 m082 m083 m084 m085 m086 m087 m088 m089 m090 m091 -91- 1 1 1 1 2 1 1 1 1 1 2 3 4 5 1 1 2 3 4 5 6 1 1 1 1 1 1 2 1 1 1 1 1 2 3 4 1 1 1 1 2 0.41 0.48 0.47 0.46 0.48 0.48 0.48 
0.42 0.45 0.49 0.44 0.47 0.51 0.44 0.39 0.51 0.51 0.45 0.46 0.52 0.46 0.42 0.42 0.46 0.43 0.36 0.42 0.44 0.45 0.39 0.41 0.5 0.44 0.44 0.44 0.44 0.38 0.38 0.44 0.42 0.41 24.5 48.1 12.6 32.8 21.7 16.6 48.2 20 43.1 9.5 3 11.3 5.6 3.5 5.2 1.7 2.7 2.5 1.5 2.3 2.5 2.8 9.6 2.1 5.1 5.8 2.6 2.4 2.1 3.6 2.5 2.2 3.5 5.1 2.5 4.4 3.5 2.1 5.5 1.7 1.8 m092 m093 m094 m095 m096 m097 m098 m099 mlOO m101 m102 m103 m104 m105 m106 m107 m108 m109 ml10 mill m112 m113 m114 ml15 m116 m117 m118 m119 m120 m121 m122 m123 m124 m125 3 1 1 1 1 1 1 2 1 1 1 1 1 1 2 3 4 5 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.43 0.51 0.41 0.4 0.41 0.39 0.42 0.52 0.4 0.44 0.44 0.42 0.38 0.49 0.48 0.5 0.48 0.5 0.44 0.42 0.39 0.4 0.39 0.42 0.45 0.45 0.37 0.4 0.46 0.42 0.47 0.48 0.52 0.47 0.41 0.43 0.4 0.37 0.37 0.4 0.42 m126 m127 m128 m129 m130 m131 2.8 37.6 2.9 6.9 4.5 23 4.7 6 3.3 3.2 3.6 4.8 6 17.3 14.8 19.3 14.1 10.8 13.6 2.7 3.7 2.3 5 8.6 36.6 43.3 5.8 3 4.7 1.5 3 7.6 12.2 5.3 3.5 4.8 3 5.8 5.8 3 4.8 m132 m133 m134 m135 m136 m137 m138 m139 m140 m141 m142 m143 m144 m145 m146 m147 m148 m149 m150 mi51 m152 m153 m154 m155 m156 m157 m158 m159 m160 m161 m162 -92- 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 0.37 0.52 0.47 0.44 0.44 0.39 0.42 0.4 0.42 0.4 0.42 0.42 0.44 0.42 0.42 0.39 0.44 0.39 0.42 0.39 0.39 0.39 0.44 0.42 0.43 0.42 0.42 0.5 0.41 0.44 0.42 0.44 0.46 0.39 0.43 0.4 0.42 0.42 0.4 0.4 0.43 5.8 12.1 10.6 13.6 13.6 4 7.6 4 8.3 2.5 2.8 5.1 8.5 5.1 5.1 7.1 13.6 2.5 5.1 2.6 2.6 2.5 8.5 54.4 12.1 3 1 10.6 14.1 17.2 51.9 75.1 2.4 2.6 4.4 3.2 13.5 53.1 3.2 31.2 6 m163 m164 m165 m166 m167 m168 m169 m170 m171 m172 m173 m174 1 1 2 3 4 1 1 1 1 1 1 1 1 1 1 0.44 0.43 0.41 0.44 0.47 0.4 0.4 0.39 0.41 0.5 0.43 0.5 0.42 0.42 0.42 12.8 18.2 74.5 55.6 22.7 3 5.4 3.3 4.8 42.4 8.4 42.4 4.4 4.4 5.6 m175 m176 m177 m178 m179 m180 m181 m182 m183 m184 m185 m186 m187 -93- 1 1 1 1 1 2 1 1 1 1 1 1 1 1 0.43 0.42 0.42 0.42 0.39 0.41 0.43 0.42 0.41 0.39 0.42 0.41 0.39 0.42 38.7 4.4 5.6 5.6 
1.7 2.4 8.4 7.2 2.9 3.3 4.4 2.9 3.4 2.3 Table 3. Classification, host and population of ECEs ECE ID m001 m002 m003 m004 m005 m006 m007 m008 m009 m010 m011 m012 Classification non-trans phage non-trans phage non-trans non-trans mobilization conjugation non-trans non-trans conjugation non-trans Host IS_120 1S_269 IS_269 5F_7 5S_122 5S_149 5S_22 5S_235 5S_239 5S_240 5S_5 IS_77 Season S S S F S S S S S S S S Fraction 1 1 1 5 5 5 5 5 5 5 5 1 Species Vibrio kanaloae Vibrio sp. Vibrio sp. Vibriofischeri Vibrio splendidus Vibrio kanaloae Vibrio splendidus cluster 1 Vibrio splendidus Vibrio kanaloae Vibrio kanaloae Vibrio splendidus Vibrio kanaloae m013 conjugation FF_110 F F Vibrio sp. m014 m015 m016 m017 m018 m019 m020 m021 m022 m023 m024 m025 m026 m027 m028 m029 m030 m031 m032 m033 m034 m035 m036 m037 m038 m039 non-trans non-trans conjugation phage non-trans conjugation conjugation non-trans non-trans non-trans phage conjugation phage conjugation non-trans non-trans phage non-trans non-trans mobilization non-trans phage conjugation conjugation phage phage FF_136 FF_145 FF_152 FF_152 FF_174 FF_174 FF_273 FF_286 FF_286 FF_286 FF_304 FF_308 FF_308 FF_375 FF_375 FF_61 IF_145 IF_145 IF_145 IF_145 1F_145 IF_145 1F_145 IF_146 IF_146 1F_146 F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F 1 1 1 1 1 1 1 1 1 1 Vibrio sp. Vibrio splendidus Vibrio sp. Vibrio sp. Vibrio tasmaniensis Vibrio tasmaniensis Vibrio sp. Vibrio sp. Vibrio sp. Vibrio sp. Vibrio sp. 
Vibrio splendidus Vibrio splendidus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio cyclitrophicus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis -94- m040 mobilization IF_146 F 1 Vibrio tasmaniensis m041 m042 m043 m044 m045 m046 m047 m048 m049 m05O m051 m052 m053 m054 mO55 m056 m057 m058 m059 m060 m061 m062 m063 m064 m065 m066 m067 m068 m069 m070 m071 m072 m073 m074 m075 m076 m077 m078 m079 m080 non-trans non-trans non-trans phage non-trans phage conjugation phage non-trans phage non-trans mobilization conjugation non-trans conjugation phage mobilization conjugation phage non-trans non-trans mobilization conjugation mobilization non-trans phage non-trans non-trans mobilization non-trans non-trans phage non-trans conjugation non-trans non-trans non-trans non-trans mobilization non-trans IF_189 IF_279 5S_118 5S_214 5S_214 55_214 5S_214 5S_214 5S_239 5S_239 5S_268 FF_112 FF_112 FF_112 FF_112 FF_112 FF_210 FF_307 FF_307 FF_351 FF_472 FF_472 FF_472 FF_472 FF_472 FF_472 FF_472 FF_482 FF_482 FF_482 FF_482 FF_482 ZS_101 1F_145 1F_145 1F_187 IF_253 1F_292 1F_292 1F_292 F F S S S S S S S S S F F F F F F F F F F F F F F F F F F F F F S F F F F F F F 1 1 5 5 5 5 5 5 5 5 5 F F F F F F F F F F F F F F F F F F F F F Z 1 1 1 1 1 1 1 Vibrio sp. Vibrio tasmaniensis Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio kanaloae Vibrio kanaloae Vibrio splendidus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio sp. Vibrio sp. Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Vibrio sp. Vibrio sp. Vibrio sp. Vibrio sp. Vibrio sp. 
Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis - 95- m081 m082 m083 m084 m085 m086 m087 m088 m089 m090 m091 m092 m093 m094 m095 mobilization non-trans mobilization non-trans non-trans non-trans non-trans mobilization mobilization non-trans non-trans phage non-trans non-trans mobilization FF_3 FF_3 FF_3 FF_3 FF_3 FF_7 ZF_221 ZF_221 ZF_221 5S_149 FF_113 FF_113 FF_113 FF_1 FF_1 F F F F F F F F F S F F F F F F F F F F F Z Z Z 5 F F F F F Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Enterovibrio norvegicus Vibriofischeri Vibriofischeri Vibriofischeri Vibrio kanaloae Enterovibrio calviensis-like Enterovibrio calviensis-like Enterovibrio calviensis-like Vibrio splendidus Vibrio splendidus m096 conjugation FF_291 F F Vibrio sp. m097 m098 m099 m100 m101 m102 m103 m104 m105 m106 m107 m108 m109 mI10 m111 ml 12 m113 ml 14 m 15 m116 m117 mobilization non-trans mobilization non-trans non-trans mobilization non-trans non-trans non-trans non-trans non-trans non-trans non-trans phage mobilization non-trans mobilization non-trans mobilization non-trans conjugation FF_304 FF_304 FF_32 FF_59 ZF_76 ZF_76 ZS_138 IF_148 IF_263 IF_279 IF_279 1F_279 1F_279 1F_97 5F_20 5F_20 5F_275 5F_275 5F_275 5S_242 FF_191 F F F F F F S F F F F F F F F F F F F S F F F F F Z Z Z 1 1 1 1 1 1 1 5 5 5 5 5 5 F Vibrio sp. Vibrio sp. Enterovibrio norvegicus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio splendidus Vibrio sp. Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio sp. 
Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio splendidus Vibrio splendidus m118 non-trans FF_191 F F Vibrio splendidus m119 m120 m121 non-trans mobilization non-trans ZF_193 ZF_193 ZF_45 F F F Z Z Z Vibrio tasmaniensis Vibrio tasmaniensis Vibrio sp. -96- m122 m123 m124 m125 m126 m127 m128 m129 m130 m131 m132 m133 m134 m135 m136 m137 m138 m139 m140 m141 m142 m143 m144 m145 m146 m147 m148 m149 m150 m151 m152 m153 m154 m155 m156 m157 m158 m159 m160 m161 m162 mobilization mobilization non-trans mobilization mobilization conjugation non-trans non-trans non-trans non-trans non-trans non-trans non-trans mobilization non-trans mobilization mobilization mobilization non-trans non-trans mobilization non-trans non-trans non-trans non-trans non-trans mobilization non-trans non-trans conjugation non-trans conjugation non-trans non-trans mobilization non-trans non-trans non-trans non-trans conjugation non-trans ZF_45 ZF_53 ZF_53 ZF_6 ZS_138 ZS_190 ZS_190 IF_164 IF_169 IF_243 1F_255 IF_255 1F_260 IF_260 IF_260 1F_260 IF_275 IF_283 1F_9 FF_1 FF_1 FF_1 FF_1 FF_1 FF_1 FF_146 FF_146 FF_164 FF_164 FF_172 FF_172 FF_1 FF_1 FF_1 FF_24 FF_24 FF_267 FF_267 FF_31 FF_32 FF_59 F F F F S S S F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F - Z Z Z Z Z Z Z 1 1 1 1 1 1 1 1 1 1 1 1 F F F F F F F F F F F F F F F F F F F F F F 97- Vibrio sp. Vibrio sp. Vibrio sp. Vibrio sp. Vibrio splendidus Vibrio splendidus cluster 1 Vibrio splendidus cluster 1 Enterovibrio norvegicus Vibrio tasmaniensis Vibrio crassostreae Vibrio crassostreae Vibrio crassostreae Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Enterovibrio norvegicus Vibrio splendidus Vibrio splendidus Vibrio tasmaniensis Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio sp. Vibrio sp. 
Enterovibrio calviensis-like Enterovibrio calviensis-like Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio splendidus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio ordalii Enterovibrio norvegicus Vibrio tasmaniensis m163 m164 m165 m166 m167 m168 m169 m170 m171 m172 m173 m174 m175 m176 m177 m178 m179 m180 m181 m182 m183 m184 m185 m186 m187 non-trans conjugation non-trans non-trans phage mobilization phage non-trans phage mobilization mobilization non-trans conjugation mobilization non-trans non-trans non-trans non-trans mobilization non-trans non-trans mobilization non-trans non-trans non-trans FF_59 FF_59 ZF_53 IS_139 FF_233 FF_249 FF_262 FF_266 FF_266 FF_268 FF_268 FF_268 FF_268 FF_268 FF_268 FF_268 FF_307 FF_371 FF_371 FF_376 FF_376 FF_376 FF_376 FF_376 FF_451 F F F S F F F F F F F F F F F F F F F F F F F F F -98- F F Z 1 F F F F F F F F F F F F F F F F F F F F F Vibrio tasmaniensis Vibrio tasmaniensis Vibrio sp. Vibrio crassostreae Vibrio tasmaniensis Vibrio tasmaniensis Enterovibrio norvegicus Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio sp. Vibrio sp. Vibrio sp. 
Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio tasmaniensis Vibrio ordalii

Table 4. Summary of strains carrying ECEs

Host Number of ECEs Phage Conjugative ECEs Mobilizable ECEs Non-transmissible ECEs

1F_145 1F_146 1F148 1F_164 1F_169 1F187 1F_189 1F_243 1F_253 1F255 1F260 1F_263 1F275 1F_279 1F_283 1F_292 1F_9 1F_97 IS_120 1S139 1S269 1S_77 5F20 5F275 5F_7 5S118 5S122 5S_149 5S214 5S22 9 4 1 1 1 1 1 1 1 2 4 1 1 5 1 3 1 1 1 1 2 1 2 3 1 1 1 2 5 1 2 2 2 1 1 1 5S235 1 5S239 5S240 5S242 5S268 3 1 1 1 1 11 1 5S_5 FF_1 FF_110 2 4 1 1 1 1 1 1 1 2 2 1 1 5 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 3 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 8 FF_112 FF_113 FF_136 FF_145 FF_146 FF_152 FF_164 FF_172 FF_174 FF_191 FF_210 FF_233 FF_24 FF_249 FF_262 FF_266 FF_267 FF_268 FF_273 FF_286 FF_291 FF_3 FF_304 FF_307 FF_308 FF_31 FF_32 FF_351 FF_371 FF_375 FF_376 FF_451 FF_472 FF_482 FF_59 FF_61 FF_7 ZF_193 ZF_221 ZF_45 ZF_53 5 3 1 1 2 2 2 2 2 2 1 1 2 1 1 2 2 7 1 3 1 5 3 3 2 1 2 1 2 2 5 1 7 5 4 1 1 2 3 2 3 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2 3 3 1 1 1 1 2 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 4 1 3 3 3 1 1 1 1 1 2 ZF_6 ZF_76 ZS_101 ZS_138 ZS_190 1 2 1 2 2 1 1 1

References

1. Polz MF, Alm EJ, Hanage WP. 2013. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends in Genetics 29:170-175.
2. Wiedenbeck J, Cohan FM. 2011. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiology Reviews 35:957-976.
3. Garcillan-Barcia MP, Alvarado A, de la Cruz F. 2011. Identification of bacterial plasmids based on mobility and plasmid population biology. FEMS Microbiology Reviews 35:936-956.
4. Francia MV, Varsaki A, Garcillan-Barcia MP, Latorre A, Drainas C, de la Cruz F. 2004. A classification scheme for mobilization regions of bacterial plasmids. FEMS Microbiology Reviews 28:79-100.
5. Phillips G, Funnell BE. 2004. Plasmid biology. ASM Press, Washington, D.C.
6. Kan S, Fornelos N, Schuch R, Fischetti VA. 2013. Identification of a ligand on the Wip1 bacteriophage highly specific for a receptor on Bacillus anthracis. J Bacteriol 195:4355-4364.
7. Ravin NV. 2011. N15: the linear phage-plasmid. Plasmid 65:102-109.
8. Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. 2008. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science 320:1081-1085.
9. Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. 2008. Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science 320:1081-1085.
10. Preheim SP, Boucher Y, Wildschutte H, David LA, Veneziano D, Alm EJ, Polz MF. 2011. Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environ Microbiol 13:265-275.
11. Preheim SP, Timberlake S, Polz MF. 2011. Merging taxonomy with ecological population prediction: a case study of Vibrionaceae. Appl Environ Microbiol 77:7195-7206.
12. Shapiro BJ, Polz MF. 2014. Ordering microbial diversity into ecologically and genetically cohesive units. Trends Microbiol.
13. Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EP, de la Cruz F. 2010. Mobility of plasmids. Microbiology and Molecular Biology Reviews 74:434-452.
14. Xue H, Xu Y, Boucher Y, Polz MF. 2012. High frequency of a novel filamentous phage, VCY phi, within an environmental Vibrio cholerae population. Appl Environ Microbiol 78:28-33.
15. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR, Alm EJ, Chisholm SW. 2010. Unlocking short read sequencing for metagenomics. PLoS One 5:e11840.
16. Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WE, Wetter T, Suhai S. 2004. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 14:1147-1159.
17. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17:10-12.
18. Zerbino DR. 2010. Using the Velvet de novo assembler for short-read sequencing technologies. Current Protocols in Bioinformatics Chapter 11:Unit 11.5.
19. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75.
20. Chen F, Mackey AJ, Stoeckert CJ, Jr., Roos DS. 2006. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 34:D363-368.
21. Dongen Sv. 2000. Graph Clustering by Flow Simulation. University of Utrecht.
22. Ning Z, Cox AJ, Mullikin JC. 2001. SSAHA: a fast search method for large DNA databases. Genome Res 11:1725-1729.
23. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.
24. Guglielmini J, Quintais L, Garcillan-Barcia MP, de la Cruz F, Rocha EP. 2011. The repertoire of ICE in prokaryotes underscores the unity, diversity, and ubiquity of conjugation. PLoS Genet 7:e1002222.
25. Guglielmini J, de la Cruz F, Rocha EP. 2013. Evolution of conjugation and type IV secretion systems. Molecular Biology and Evolution 30:315-331.
26. Finn RD, Clements J, Eddy SR. 2011. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39:W29-37.
27. Eddy SR. 2011. Accelerated Profile HMM Searches. PLoS Comput Biol 7:e1002195.
28. Leplae R, Lima-Mendez G, Toussaint A. 2010. ACLAME: a CLAssification of Mobile genetic Elements, update 2010. Nucleic Acids Res 38:D57-61.
29. Delcher AL, Bratke KA, Powers EC, Salzberg SL. 2007. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673-679.
30. Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, Shanmugam D, Roos DS, Stoeckert CJ, Jr. 2011. Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Current Protocols in Bioinformatics Chapter 6:Unit 6.12.1-19.
31. Li L, Stoeckert CJ, Jr., Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189.
32. Spinelli L, Gambette P, Chapple CE, Robisson B, Baudot A, Garreta H, Tichit L, Guenoche A, Brun C. 2013. Clust&See: a Cytoscape plugin for the identification, visualization and manipulation of network clusters. Bio Systems 113:91-95.
33. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. 2011. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27:431-432.
34. Kohl M, Wiese S, Warscheid B. 2011. Cytoscape: software for visualization and analysis of biological networks. Methods in Molecular Biology 696:291-303.
35. Leplae R, Hebrant A, Wodak SJ, Toussaint A. 2004. ACLAME: a CLAssification of Mobile genetic Elements. Nucleic Acids Res 32:D45-49.
36. Petersen J. 2011. Phylogeny and compatibility: plasmid classification in the genomics era. Archives of Microbiology 193:313-321.
37. Schumacher MA. 2012. Bacterial plasmid partition machinery: a minimalist approach to survival. Curr Opin Struct Biol 22:72-79.
38. Schuster CF, Bertram R. 2013. Toxin-antitoxin systems are ubiquitous and versatile modulators of prokaryotic cell fate. FEMS Microbiology Letters 340:73-85.
39. Vasu K, Nagaraja V. 2013. Diverse functions of restriction-modification systems in addition to cellular defense. Microbiology and Molecular Biology Reviews 77:53-72.
40. Silverman JM, Brunet YR, Cascales E, Mougous JD. 2012. Structure and regulation of the type VI secretion system. Annual Review of Microbiology 66:453-472.
41. Records AR. 2011. The type VI secretion system: a multipurpose delivery system with a phage-like machinery. Molecular Plant-Microbe Interactions 24:751-757.
42. Blattner FR, Plunkett G, 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1462.
43. Battermann A, Disse-Kromker C, Dreiseikelmann B. 2003. A functional plasmid-borne rrn operon in soil isolates belonging to the genus Paracoccus. Microbiology 149:3587-3593.
44. Kunnimalaiyaan M, Stevenson DM, Zhou Y, Vary PS. 2001. Analysis of the replicon region and identification of an rRNA operon on pBM400 of Bacillus megaterium QM B1551. Mol Microbiol 39:1010-1021.
45. Tiainen T, Pedersen K, Larsen JL. 1995. Ribotyping and plasmid profiling of Vibrio anguillarum serovar O2 and Vibrio ordalii. The Journal of Applied Bacteriology 79:384-392.
46. Dunn AK, Martin MO, Stabb EV. 2005. Characterization of pES213, a small mobilizable plasmid from Vibrio fischeri. Plasmid 54:114-134.
47. Reva O, Bezuidt O. 2012. Distribution of horizontally transferred heavy metal resistance operons in recent outbreak bacteria. Mobile Genetic Elements 2:96-100.
48. van Hal SJ, Wiklendt A, Espedido B, Ginn A, Iredell JR. 2009. Immediate appearance of plasmid-mediated resistance to multiple antibiotics upon antibiotic selection: an argument for systematic resistance epidemiology. J Clin Microbiol 47:2325-2327.
49. Bradley DE, Taylor DE, Cohen DR. 1980. Specification of surface mating systems among conjugative drug resistance plasmids in Escherichia coli K-12. J Bacteriol 143:1466-1470.
50. Naka H, Liu M, Actis LA, Crosa JH. 2013.
Plasmid- and chromosome-encoded siderophore anguibactin systems found in marine vibrios: biosynthesis, transport and evolution. Biometals 26:537-547.
51. Naka H, Lopez CS, Crosa JH. 2008. Reactivation of the vanchrobactin siderophore system of Vibrio anguillarum by removal of a chromosomal insertion sequence originated in plasmid pJM1 encoding the anguibactin siderophore system. Environ Microbiol 10:265-277.

4. Chapter Four: Conclusions and Future Directions

The overall goal of this study was to explore the diversity and dynamics of extrachromosomal elements (ECEs) within the context of ecologically cohesive host populations. Using two model systems, a Vibrio cholerae population from a brackish water pond and several Vibrio populations co-existing in the marine coastal environment, we found that ECEs are prevalent and diverse in Vibrio populations in the wild and include different types of plasmids and temperate phages. The distribution of ECEs correlates with the environmental distribution of their hosts, indicating that host ecology may influence ECE biology. In addition, we have shown for the first time that non-transmissible ECEs are the most common among Vibrio ECEs and may have been transferred recently and frequently, despite the fact that they lack the genetic components typically found in mobilizable or conjugative ECEs. Finally, the gene pool of the ECEs was found to be highly dynamic, with high levels of recent recombination among different types of ECEs.

4.1. Study on Vibrio cholerae model

A large collection of Vibrio cholerae isolates from a brackish water pond in Massachusetts was found to contain a novel filamentous phage, VCYϕ, at very high prevalence, with ~40% of cells being infected. These phages are part of the Inoviridae, which replicate by extrusion, a process that does not kill the host cells.
The phage occurred in both the host genome integrative form (IF) and the plasmid-like replicative form (RF). The high prevalence of the phage strongly suggests a potential impact on the host population. This view is further strengthened by our finding that the frequency of the two forms of VCYϕ differs between strains collected from the lagoon and the pond of the brackish water system, with the replicative form being much more prevalent in the latter. These findings suggest that filamentous phages can be an important component of the environmental biology of V. cholerae and are not limited to pathogenic strains.

However, several questions remain for further investigation. One question is whether the uneven distribution of the two forms of VCYϕ between the lagoon and the pond samples is due to ecological differences between the two locations. In particular, differences in salt concentration, pH and temperature may affect the physiology of Vibrio cholerae. Hence, the influence of these factors should be investigated to determine whether they lead to differences in the distribution and induction of VCYϕ. Integration of VCYϕ into the host chromosomes may involve enzymes and potential cofactors that catalyze and facilitate the process. We suggest an approach in which genome-wide gene expression patterns are analyzed in conjunction with phage dynamics. Alternatively, other factors may also be involved, such as intracellular localization, topological structure, and the binding affinity between the phage DNA and the relevant proteins and regulators.

4.2. Study on ecologically and genetically cohesive Vibrio model

Our study is the first to perform large-scale screening of ECEs in marine bacteria using Vibrio as a model. Identification of 187 ECEs in the 660 strains analyzed indicates that ECEs are prevalent in Vibrio. These ECEs occur in diverse types, including mobilizable, non-transmissible and conjugative plasmids, and bacteriophages.
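The four ECE categories named here can be expressed as a simple decision rule. The sketch below is an illustration under stated assumptions rather than the classification procedure of this study: the field names are hypothetical, phage assignment is reduced to a single flag, and an element carrying T4SS genes but no relaxase is treated as non-transmissible. The category labels follow Table 3, and the counts are those reported in Section 3.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EceGeneContent:
    """Hypothetical summary of the mobility-related genes found on one ECE."""
    has_relaxase: bool
    has_t4ss: bool
    has_phage_genes: bool = False  # assumption: one flag stands in for phage annotation


def classify_ece(content: EceGeneContent) -> str:
    """Assign one of the four categories used in Table 3 (illustrative rule)."""
    if content.has_phage_genes:
        return "phage"
    if content.has_relaxase and content.has_t4ss:
        return "conjugation"   # encodes both relaxase and T4SS
    if content.has_relaxase:
        return "mobilization"  # relaxase only; relies on a co-occurring conjugative plasmid
    return "non-trans"         # no recognizable self-transfer genes


# Counts per category as reported in Section 3.5.
REPORTED_COUNTS = {
    "phage": 22,
    "conjugation": 24,
    "mobilization": 38,
    "non-trans": 103,
}

total_eces = sum(REPORTED_COUNTS.values())
most_common_category = max(REPORTED_COUNTS, key=REPORTED_COUNTS.get)
non_trans_fraction = REPORTED_COUNTS["non-trans"] / total_eces
```

Under this rule, losing the T4SS genes moves an element from the conjugative to the mobilizable category, and losing the relaxase as well moves it to the non-transmissible category, which is the kind of transition between categories discussed in Chapter 3.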
These ECEs are highly dynamic, with most showing high ecological and evolutionary turnover. We also found that these ECEs are enriched in isolates obtained from the free-living rather than the particle-associated fractions, suggesting a correlation between ECE presence and host lifestyle. Our data also show that non-transmissible ECEs appear to be the most common among Vibrio ECEs and may have been transferred recently and frequently, indicating that the name for these types of ECEs is a misnomer. However, potential mechanisms of transfer remain to be uncovered. Finally, by assessing the genetic components of ECEs, we have found that ECEs are evolutionarily highly dynamic and subject to frequent gene exchange and rearrangement.

The work presented here opens several questions for future investigation. As in the V. cholerae example, environmental factors may select for different ECE behavior. However, our understanding of the relevant factors that differentiate particle-associated and free-living host populations is still incomplete. Nonetheless, environmental conditions may be a factor, with the free-living fraction being more variable in both the composition and the concentration of nutrients. It might therefore be speculated that hosts in a relatively unstable ecological niche may benefit from a dynamic pool of ECEs that could offer a source of additional genes, whereas ECEs may be less needed or beneficial in a relatively more stable niche such as particles.

A further important question is why the so-called non-transmissible ECEs appear to be subject to frequent transfer, at least on par with plasmids that carry genes encoding transfer mechanisms, even though non-transmissible ECEs do not encode the elements and machinery typically employed by mobilizable or conjugative plasmids or phages.
Therefore, one could argue that they may (1) represent new bacteriophages possessing a novel infection mechanism, (2) be transferred through a mechanism yet to be discovered, or (3) have been transferred through transformation. To determine whether they are a group of new temperate phages, induction experiments could be performed; if phage particles were obtained, these could be collected and their sequences compared to the collection of ECEs. Interestingly, evidence from other studies has shown that ECEs can be packaged into small vesicles that are secreted from donor cells and taken up by recipient cells. In our collection, we could use electron microscopy to first identify vesicles that could potentially carry ECEs and then perform genomic sequence analysis on the collected particles to determine the types of ECEs they might contain.

Taken together, our results show that ECEs are prevalent in natural Vibrio populations and that environmental factors may contribute to their diversity and distribution in distinct ecological niches. Although many questions remain to be addressed in further investigations, these results shed light on horizontal gene transfer and its role in microbial evolution.