Stage 0 sporulation gene A (spo0A) as a molecular marker to study diversity of endospore-forming Firmicutes

Tina Wunderlin, Thomas Junier, Ludovic Roussel-Delif, Nicole Jeanneret, Pilar Junier*

Laboratory of Microbiology, Institute of Biology, University of Neuchâtel, CH-2000 Neuchâtel, Switzerland

* Corresponding author. E-mail: pilar.junier@unine.ch

Supplementary Information

Experimental procedures

DNA extraction procedure
DNA from sediment samples was extracted using three different protocols:

Protocol 1) Standard extraction with in situ lysis of 0.5 g sediment using the MP FastDNA® SPIN Kit for Soil (MP Biomedicals, Santa Ana, CA, USA), following the manufacturer’s instructions.

Protocol 2) Repeated extraction using the MP FastDNA® SPIN Kit with the following modifications: 0.5 g sediment was subjected to three successive extractions with in situ lysis. The sample was first bead-beaten at 50 strokes per second with the TissueLyser LT (QIAGEN, Hilden, Germany) for 10 min and centrifuged, and 900 µl of supernatant was collected in a separate tube. Lysis buffer was again added to the sample, which was subjected to a second round of bead-beating for 5 min and centrifuged before the supernatant was collected. This procedure was repeated a third time. The three supernatants were then processed separately following the standard protocol. Finally, the three extracts were pooled, and the DNA was precipitated with 0.3 M Na-acetate and ethanol (99%), washed with ethanol (70%), and resuspended in sterile water.

Protocol 3) Indirect extraction, separating biomass from sediment particles prior to lysis. Three grams of sediment were homogenized with 15 ml of dispersing agent (1% Na-hexametaphosphate) using an Ultra-Turrax homogenizer (IKA, Staufen, Germany) at 15500 rpm for 2 min to separate cells from the sediment matrix.
Coarse particles were then removed from the slurry by centrifugation at 20 x g for 1 min, and the supernatant (containing the cells) was collected on a nitrocellulose membrane of 0.2 µm pore size (Whatman, Dassel, Germany). The cell separation step was then repeated. Filters were immediately frozen in liquid nitrogen and stored at -80°C. DNA was then extracted directly from the membrane with Protocol 2.

DNA yield was measured with a Qubit® 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA) using the Quant-iT dsDNA BR assay kit, following the manufacturer’s instructions. DNA quality was verified by agarose gel electrophoresis and by spectrophotometric absorbance at 260 and 230 nm using a NanoDrop ND-1000 (NanoDrop, Wilmington, DE, USA).

Quantification of gene copy numbers
Quantification of bacterial DNA in sediment extracts was performed by real-time quantitative PCR of the V3 region of the 16S rRNA gene with primers 338f and 520r (Ovreås et al., 1997). The qPCR mix contained 1 µL of 10-fold diluted DNA template (1.3 to 8.4 ng/µL), 0.3 µM of each primer and 10 µL of QuantiTect SYBR® Green PCR Kit (QIAGEN). The total reaction volume of 20 µL was reached with PCR-grade water. The qPCR was run on a Rotor-Gene™ 6000 instrument (QIAGEN) with the following program: enzyme activation at 95°C for 5 min, followed by 40 cycles of denaturation at 95°C for 5 s, annealing at 55°C for 15 s and extension at 72°C for 20 s. Thresholds (Th), Ct values, and derivatives of melting curves were determined using Rotor-Gene 6 software. All extracts were analyzed in triplicate. For quantification, three independent plasmid standard series with 300 to 3 000 000 gene copies/µL of the 16S rRNA gene of an environmental clone were included.

Quantification of the spo0A gene was performed as described above for the 16S rRNA gene but with the primers spo0A655f and spo0A923r (Bueche et al., 2013).
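The standard-curve quantification described above can be illustrated with a minimal Python sketch. It fits Ct against log10 of the standard concentration and back-calculates gene copies per µL of undiluted extract, correcting for the 10-fold template dilution. The function names and all Ct values are hypothetical illustrations, not code or data from this study, and the sketch assumes that standards and samples are added to the reaction in the same volume.

```python
import math


def fit_standard_curve(copies_per_ul, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies_per_ul]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means doubling every cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_per_ul_extract(ct, slope, intercept, dilution_factor=10.0):
    """Copies per µL of undiluted extract for a sample Ct.

    The curve gives copies per µL of the diluted template; multiplying by
    the dilution factor (10-fold in the text) recovers the extract value.
    """
    copies_in_diluted_template = 10 ** ((ct - intercept) / slope)
    return copies_in_diluted_template * dilution_factor


# Standard concentrations spanning the 16S rRNA range given in the text.
standards = [3e2, 3e3, 3e4, 3e5, 3e6]
# Hypothetical Ct values, invented for illustration only.
standard_cts = [29.9, 26.5, 23.2, 19.8, 16.4]

slope, intercept = fit_standard_curve(standards, standard_cts)
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.2f}")

# Hypothetical triplicate Ct values for one extract; average the estimates.
sample_cts = [24.1, 24.3, 24.0]
estimates = [copies_per_ul_extract(ct, slope, intercept) for ct in sample_cts]
print(f"mean copies/µL extract: {sum(estimates) / len(estimates):.3g}")
```

The same functions would apply to the spo0A assay by swapping in its own standard series.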
For spo0A, the qPCR mix contained 1 µL of 10-fold diluted DNA sample (1.3 to 8.4 ng/µL), 0.76 µM of each primer and 1 x QuantiTect SYBR® Green PCR Kit. The total reaction volume of 20 µL was reached with PCR-grade water. The program differed in annealing at 52°C for 30 s and extension at 72°C for 30 s. For quantification, three independent plasmid standard series with 30 to 300 000 gene copies/µL of the spo0A gene of B. subtilis were included.

Amplification and sequencing of the spo0A gene
One extract from each DNA extraction protocol from both sediments (Lake Geneva and Lake Baikal) was subjected to amplicon sequencing of the spo0A gene. Degenerate primers amplifying a 602 bp sequence of the spo0A gene were designed for this study. The primer sequences were spo0A166f (5’-GATATHATYATGCCDCATYT-3’) and spo0A748r (5’-GCNACCATHGCRATRAAYTC-3’). PCR reactions were performed with 0.5 ng DNA template, 1 x reaction buffer (TaKaRa Bio, Shiga, Japan), 3 mM MgCl2, 10 µg bovine serum albumin (BSA; New England Biolabs, Ipswich, MA, USA), 1 U of Ex Taq Polymerase (TaKaRa), 200 µM of each dNTP and 1 µM of each primer in a total reaction volume of 50 µl, completed with PCR-grade water. Negative controls (1 µl PCR-grade water) and positive controls (1 ng Paenibacillus alvei DNA template) were included in all reactions. Reactions were run on an Arktik Thermal Cycler (Thermo Fisher Scientific, Vantaa, Finland) with the following temperature program: initial denaturation at 94°C for 5 min; 10 cycles of denaturation at 94°C for 30 s, touchdown annealing for 30 s starting at 55°C and decreasing by 0.3°C per cycle, and elongation at 72°C for 1 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 52°C for 30 s and elongation at 72°C for 1 min; and a final extension at 72°C for 5 min.
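Two details of the amplification above lend themselves to a short worked example: the degeneracy of the primers, and the annealing temperatures of the touchdown cycles. The Python sketch below is illustrative only. The helper names are hypothetical, and the code uses only the standard IUPAC nucleotide codes and the values stated in the text (start at 55°C, decrease of 0.3°C per cycle, 10 cycles).

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def touchdown_annealing(start_c=55.0, step_c=0.3, cycles=10):
    """Annealing temperature (°C) of each touchdown cycle, first to last."""
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


print(degeneracy("GATATHATYATGCCDCATYT"))  # spo0A166f: H*Y*D*Y = 3*2*3*2 = 36
print(degeneracy("GCNACCATHGCRATRAAYTC"))  # spo0A748r: N*H*R*R*Y = 4*3*2*2*2 = 96
print(touchdown_annealing())               # 55.0 down to 52.3 in 0.3 steps
```

Under these values the last touchdown cycle anneals at 52.3°C, just above the 52°C used for the remaining 30 cycles.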
Prior to the amplification of environmental samples, the primers were tested on pure strains of endospore-forming and non-sporulating bacteria (Supplementary Table 2).

Amplified sediment samples were sent for barcoded amplicon sequencing with Roche GS FLX+ (Eurofins MWG Operon, Ebersberg, Germany).

Supplementary Figure 1. Comparative phylogenetic analysis of 16S rRNA gene sequences and six conserved sporulation-related genes (spo0A, spoIVB, spoVAC, spoVAD, spoVT and gpr) (spore proteome) for 27 spore-forming Firmicutes with a complete genome sequence reported and annotated. Alignments were constructed with MAFFT (Katoh et al. 2005) or MUSCLE (Edgar 2004) using default parameters. Multiple-FASTA alignments were converted to PHYLIP format with the seqret program from the EMBOSS package (Rice et al. 2000). Phylogenies were constructed from PHYLIP-formatted alignments with PhyML (Guindon and Gascuel 2003), using default parameters except for the following: JTT+Γ substitution model for proteins and GTR+Γ model for nucleic acids; 4 substitution rate categories; estimation of the shape parameter, proportion of invariants, and transition/transversion ratios (for nucleotides). Trees were processed (re-rooting, extracting topology, and plotting) with the Newick Utilities (Junier and Zdobnov 2010). Bootstrap values (percentage over 1000 samplings) are shown at the nodes of the trees.

Supplementary Figure 2. Phylogenetic reconstruction (above) and conservation profiles (below) for sequences of the stage 0 sporulation protein Spo0A. Conservation plots were made with the plotcon program from EMBOSS, a sliding-window program that computes a weighted average of the similarity scores for all residue pairs in each window. We used the default window size of 4 residues.

Supplementary Figure 3.
Alignment of the spo0A gene of Sulfobacillus acidophilus and Alicyclobacillus acidocaldarius Tc41 against spo0A of Bacillus subtilis 168. The two regions shown correspond to the forward primer 166f (left) and the reverse primer 748r (right) described in this study. Stars indicate 100% identity. Exclamation points highlight mismatches with the primer sequence.

Supplementary Figure 4. Cladogram of spo0A sequences from sediment of Lake Geneva extracted with protocols 1 (blue), 2 (yellow), and 3 (red). The nucleotide sequences were clustered into putative OTUs (identity of > 97%) with the pick_otus.py program from the QIIME package using the Uclust method (Caporaso et al. 2010), and a representative of each OTU was used to build the phylogeny. Phylogenies were constructed from PHYLIP-formatted alignments with PhyML (Guindon and Gascuel 2003), using default parameters. The trees were re-rooted, condensed according to DNA extraction protocol, and displayed with the Newick Utilities (Junier and Zdobnov 2010). Each branch represents a cluster of OTUs of > 97% sequence similarity. The closest relatives of the environmental sequences from the indirect extractions (protocol 3) were identified by protein BLAST (Altschul et al., 1997) of the translated protein sequences against a reference database of 581 Spo0A protein sequences from the InterPro site (Mulder et al. 2002). Classes of closest relatives are shown in color, with the identity ranges indicated as follows: <65% identity (-), 65-74% (<), 75-84% (~), 85-94% (#), >95% (+).
A, Bacillus amyloliquefaciens; B, B. methanolicus; C, Geobacillus sp. (strain WCH70); D, B. cereus subsp. cytotoxis (strain NVH 391-98); E, B. thuringiensis; F, Geobacillus thermodenitrificans (strain NG80-2); G, B. atrophaeus (strain 1942); H, B. subtilis; I, B. mycoides; J, B. pseudofirmus (strain OF4); K, Lysinibacillus sphaericus (strain C3-41); L, Brevibacillus laterosporus; M, Brevibacillus brevis (strain 47); N, Thermincola potens (strain JR); O, Desulfotomaculum acetoxidans (strain ATCC 49208); P, Desulfosporosinus orientis (strain ATCC 19365); Q, Thermosediminibacter oceani (strain ATCC BAA-1034); R, Syntrophobotulus glycolicus (strain DSM 8271); S, Heliobacterium modesticaldum (strain ATCC 51547); T, Clostridium clariflavum (strain DSM 19732); U, B. cereus; V, C. thermocellum; W, C. cellulovorans (strain ATCC 35296); X, C. cellulolyticum (strain ATCC 35319); Y, C. botulinum; Z, C. ljungdahlii (strain ATCC 55383); AA, C. perfringens; AB, C. sporogenes; AC, Alkaliphilus metalliredigens (strain QYMF); AD, A. oremlandii (strain OhILAs); AE, Desulfotomaculum kuznetsovii (strain DSM 6115); AF, Geobacillus sp. (strain Y412MC10); AG, Paenibacillus polymyxa; AH, P. mucilaginosus (strain KNP414).

Supplementary Figure 5. Cladogram of spo0A sequences from sediment of Lake Baikal extracted with protocols 1 (blue), 2 (yellow), and 3 (red). Each branch represents a cluster of OTUs of > 97% sequence similarity. Closest relatives are shown in letters around the tree together with identity ranges: <65% identity (-), 65-74% (<), 75-84% (~), 85-94% (#), >95% (+). For classes, see the legend of Supplementary Figure 4 and the following: AI, B. megaterium; AJ, B. licheniformis; AK, B. megaterium (strain DSM 319); AL, C. haemolyticum; AM, Paenibacillus sp. (strain JDR-2); AN, B. cellulosilyticus (strain ATCC 21833); AO, Sulfobacillus acidophilus (strain TPY); AP, S. acidophilus (strain ATCC 700253); AQ, Desulforudis audaxviator (strain MP104C); AR, C. butyricum; AS, C. kluyveri (strain ATCC 8527).

Supplementary Table 1. List of genome sequences from the 27 endospore-forming Firmicutes used in this study.
Complete and draft genome sequences were downloaded from the Comprehensive Microbial Resource (CMR, 24.0 data release, cmr.jcvi.org) and Integrated Microbial Genomes (IMG, 3.0, img.jgi.doe.gov) websites. Protein and nucleotide sequences of spore-related genes were obtained by searching for the role category/function "sporulation and germination" (CMR) and "sporulating" (IMG). Additional information on all retrieved genomes was obtained from the GenBank database (www.ncbi.nlm.nih.gov/genome). Clas.= taxonomic classification; B= Bacilli; C= Clostridia; T°= temperature range; M= mesophile; T= thermophile; P= psychrophile; H= hyperthermophile; Sp. genes= number of sporulation-related genes. The number of sporulation-related genes was retrieved from the available genome annotations.

| Name | Taxon ID | Clas. | Isolation | T° | Genes | CDS | rRNA | GC content (fraction) | Sp. genes | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| Bacillus amyloliquefaciens FZB42 | 326423 | B | Soil | M | 3814 | 3696 | 30 | 0.46 | 111 | (Chen et al. 2007) |
| Bacillus anthracis A0248 | 592021 | B | Human isolate | M | 5418 | 5291 | 33 | 0.35 | 190 | Unpublished |
| Bacillus anthracis Sterne | 260799 | B | Soil | M | 5521 | 5287 | 33 | 0.35 | 82 | Unpublished |
| Bacillus cereus 03BB102 | 572264 | B | Human blood | M | 5767 | 5621 | 42 | 0.35 | 183 | Unpublished |
| Bacillus cereus Zk | 288681 | B | Swab of a dead zebra carcass | M | 5134 | 5323 | 39 | 0.35 | 75 | (Han et al. 2006) |
| Bacillus licheniformis ATCC 14580 (Novozymes) | 279010 | B | Soil | M | 4420 | 4196 | 21 | 0.46 | 69 | (Rey et al. 2004) |
| Bacillus pumilus SAFR-032 | 315750 | B | Soil | M | 3823 | 3729 | 21 | 0.41 | 117 | (Gioia et al. 2007) |
| Bacillus subtilis 168 | 224308 | B | X-ray irradiated strain | M | 4298 | 4106 | 30 | 0.44 | 129 | (Kunst et al. 1997) |
| Bacillus thuringiensis Al Hakam | 412694 | B | Severe human tissue necrosis | M | 5050 | 4798 | 42 | 0.35 | 102 | (Challacombe et al. 2007) |
| Bacillus weihenstephanensis KBAB4 | 315730 | B | Soil | P | 5983 | 5831 | 42 | 0.35 | 115 | Unpublished |
| Geobacillus thermodenitrificans NG80-2 | 420246 | B | Water in oil reservoir formation | T | 3642 | 3471 | 30 | 0.49 | 101 | (Feng et al. 2007) |
| Lysinibacillus sphaericus C3-41 | 444177 | B | Mosquito breeding site | M | 4887 | 4771 | 31 | 0.37 | 96 | (Hu et al. 2008) |
| Alkaliphilus metalliredigens QYMF | 293826 | C | Borax leachate ponds | M | 5016 | 4801 | 31 | 0.37 | 83 | Unpublished |
| Alkaliphilus oremlandii OhILAs | 350688 | C | Freshwater USA | M | 2951 | 2836 | 26 | 0.36 | 75 | Unpublished |
| Candidatus Desulforudis audaxviator MP104C | 477974 | C | Fracture water from a borehole | M | 2293 | 2239 | 6 | 0.61 | 63 | (Chivian et al. 2008) |
| Carboxydothermus hydrogenoformans Z-2901 | 246194 | C | Hot swamp | H | 2707 | 2645 | 12 | 0.42 | 61 | (Wu et al. 2005) |
| Clostridium beijerinckii NCIMB 8052 | 290402 | C | Freshwater, Soil | M | 5290 | 5100 | 43 | 0.3 | 62 | Unpublished |
| Clostridium botulinum A2 Kyoto-F | 536232 | C | Infant botulism | M | 3978 | 3878 | 20 | 0.28 | 98 | Unpublished |
| Clostridium botulinum B Eklund 17B | 508765 | C | Marine sediments | M | 3639 | 3527 | 34 | 0.27 | 82 | Unpublished |
| Clostridium difficile 630 | 272563 | C | Clinical isolate | M | 3983 | 3777 | 32 | 0.29 | 63 | (Sebaihia et al. 2006) |
| Clostridium kluyveri DSM 555 | 431943 | C | Mud of a canal in Delft | M | 4073 | 3913 | 20 | 0.32 | 77 | (Seedorf et al. 2008) |
| Clostridium perfringens SM101 | 289380 | C | Soil | M | 2748 | 2578 | 30 | 0.28 | 67 | (Shimizu et al. 2002) |
| Heliobacterium modesticaldum Ice1 | 498761 | C | Hot spring microbial mats and volcanic soil | T | 3142 | 3001 | 30 | 0.57 | 82 | (Sattley et al. 2008) |
| Pelotomaculum thermopropionicum SI | 370438 | C | Thermophilic anaerobic sludge | T | 3018 | 2920 | 6 | 0.53 | 66 | Unpublished |
| Thermoanaerobacter pseudethanolicus ATCC 33223 | 340099 | C | Thermal springs | T | 2363 | 2291 | 16 | 0.35 | 76 | Unpublished |
| Thermoanaerobacter sp. X514 | 399726 | C | Deep subsurface sample | T | 2467 | 2397 | 16 | 0.35 | 79 | Unpublished |
| Desulfotomaculum reducens MI-1 | 349161 | C | Cr-contaminated marine sediment | M | 3324 | 3324 | 18 | 0.42 | 83 | (Junier et al. 2010) |

Supplementary Table 2. Orthologous genes found after bi-directional BLAST of the sporulation-related genes common to 27 genomes of endospore-forming Firmicutes. Protein lengths indicated for Bacillus subtilis as a reference were obtained from Stragier & Losick (1996).
| Name | Gene symbol | Function | Length in Bacillus subtilis (aa) |
|---|---|---|---|
| Stage 0 sporulation protein A | spo0A | Global transcription regulator for sporulation | 267 |
| Spore protease | gpr | Degradation of the small acid-soluble spore proteins (SASPs) during germination | 368 |
| Stage V sporulation protein T | spoVT | Global regulator activated by sigma G | 178 |
| Stage IV sporulation protein B | spoIVB | Protease that activates processing of the pro-sigma K factor | 425 |
| Stage V sporulation protein AD | spoVAD | Potential transmembrane protein with unknown function | 150 |
| Stage V sporulation protein AC | spoVAC | Potential transmembrane protein with unknown function | 338 |

References

Bueche M, Wunderlin T, Roussel-Delif L, Junier T, Sauvain L, Jeanneret N et al. (in press) Quantification of endospore-forming Firmicutes by qPCR with the functional gene spo0A. Appl Environ Microbiol.

Challacombe JF, Altherr MR, Xie G, Bhotika SS, Brown N, Bruce D et al. (2007) The complete genome sequence of Bacillus thuringiensis Al Hakam. J Bacteriol 189 (9):3680-3681.

Chen XH, Koumoutsi A, Scholz R, Eisenreich A, Schneider K, Heinemeyer I et al. (2007) Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat Biotechnol 25 (9):1007-1014.

Chivian D, Brodie EL, Alm EJ, Culley DE, Dehal PS, DeSantis TZ et al. (2008) Environmental genomics reveals a single-species ecosystem deep within earth. Science 322 (5899):275-278.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32 (5):1792-1797.

Feng L, Wang W, Cheng J, Ren Y, Zhao G, Gao C et al. (2007) Genome and proteome of long-chain alkane degrading Geobacillus thermodenitrificans NG80-2 isolated from a deep-subsurface oil reservoir. Proc Natl Acad Sci U S A 104 (13):5602-5607.

Gioia J, Yerrapragada S, Qin X, Jiang H, Igboeli OC, Muzny D et al. (2007) Paradoxical DNA repair and peroxide resistance gene conservation in Bacillus pumilus SAFR-032. PLoS One 2 (9):e928.
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52 (5):696-704.

Han CS, Xie G, Challacombe JF, Altherr MR, Bhotika SS, Brown N et al. (2006) Pathogenomic sequence analysis of Bacillus cereus and Bacillus thuringiensis isolates closely related to Bacillus anthracis. J Bacteriol 188 (9):3382-3390.

Hu X, Fan W, Han B, Liu H, Zheng D, Li Q et al. (2008) Complete genome sequence of the mosquitocidal bacterium Bacillus sphaericus C3-41 and comparison with those of closely related Bacillus species. J Bacteriol 190 (8):2892-2902.

Junier P, Junier T, Podell S, Sims DR, Detter JC, Lykidis A et al. (2010) The genome of the Gram-positive metal- and sulfate-reducing bacterium Desulfotomaculum reducens strain MI-1. Environ Microbiol.

Junier T, Zdobnov EM (2010) The Newick Utilities: high-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics.

Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33 (2):511-518.

Kunst F, Ogasawara N, Moszer I, Albertini AM, Alloni G, Azevedo V et al. (1997) The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390 (6657):249-256.

Rey MW, Ramaiya P, Nelson BA, Brody-Karpin SD, Zaretsky EJ, Tang M et al. (2004) Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species. Genome Biol 5 (10):R77.

Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16 (6):276-277.

Sattley WM, Madigan MT, Swingley WD, Cheung PC, Clocksin KM, Conrad AL et al. (2008) The genome of Heliobacterium modesticaldum, a phototrophic representative of the Firmicutes containing the simplest photosynthetic apparatus. J Bacteriol 190 (13):4687-4696.

Sebaihia M, Wren BW, Mullany P, Fairweather NF, Minton N, Stabler R et al. (2006) The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat Genet 38 (7):779-786.

Seedorf H, Fricke WF, Veith B, Bruggemann H, Liesegang H, Strittmatter A et al. (2008) The genome of Clostridium kluyveri, a strict anaerobe with unique metabolic features. Proc Natl Acad Sci U S A 105 (6):2128-2133.

Shimizu T, Ohtani K, Hirakawa H, Ohshima K, Yamashita A, Shiba T et al. (2002) Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater. Proc Natl Acad Sci U S A 99 (2):996-1001.

Wu M, Ren Q, Durkin AS, Daugherty SC, Brinkac LM, Dodson RJ et al. (2005) Life in hot carbon monoxide: the complete genome sequence of Carboxydothermus hydrogenoformans Z-2901. PLoS Genet 1 (5):e65.