Supplemental Information: Host Genetic and Environmental Effects on Mouse Intestinal Microbiota

James H. Campbell1, Carmen M. Foster1, Tatiana Vishnivetskaya1,2, Alisha G. Campbell1,3, Zamin K. Yang1, Ann Wymore1, Anthony V. Palumbo1, Elissa J. Chesler3,4 and Mircea Podar1,3*

1 Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA 37831
2 Center for Environmental Biotechnology, University of Tennessee, Knoxville, TN, USA 37996
3 Genome Science and Technology Program, University of Tennessee, Knoxville, TN, USA 37996
4 The Jackson Laboratory, Bar Harbor, ME, USA 04609

Subject Category: Microbe-microbe and microbe-host interactions

Running Title: Genetic effects on mouse gut microbiota

*Corresponding Author:
Mircea Podar
Biosciences Division
Oak Ridge National Laboratory
Oak Ridge, TN 37831
Phone: (865) 576-6144
Fax: (865) 576-8646
Email: podarm@ornl.gov

Supplemental Methods

Mice. Mice were fed an irradiated rodent diet (Purina 5053 or 5058) and were supplied with 100% fresh air brought into the facility through hospital-grade filters of 95% efficiency. Air introduced into the primary enclosure was drawn from the room, passed first through a roughing filter and then through a high-efficiency particulate air (HEPA) filter, and was exhausted via the facility exhaust system. In the primary barrier, all mice were housed in ventilated racks manufactured by Thoren Caging or Animal Care Systems (ACS). Thoren cages were kept under positive pressure by a supply air blower motor (50 air exchanges per hour). ACS cages were kept under negative pressure by the facility exhaust system (20-30 air changes per hour). ACS cages were exhausted through a HEPA-like filter at the front of the cage and had a solid lid designed to prevent unfiltered room air from entering the cage.
All cages were opened under a NuAire or Baker laminar flow workstation.

Mice were euthanized by cervical dislocation at the same time each day, and a 12-cm-long segment of jejunum starting 5 cm distal to the ligament of Treitz was dissected. The cecum was included in the dissection. Jejunum sections were flushed with ice-cold PBS, placed in RNAlater and stored for other analyses. Cecum contents were extruded manually, snap frozen in liquid nitrogen and stored at -80°C. Cecum tissue was flushed with ice-cold saline and either snap frozen or stored in RNAlater (Ambion) for gene expression analyses (to be reported elsewhere). All procedures were approved by the ORNL and University of Tennessee Animal Care and Use Committees.

Extraction of Microbial Genomic DNA. Microbial genomic DNA (gDNA) was extracted from mouse cecum contents using a protocol adapted from Ley et al (2008). Approximately 100 mg of cecum contents was added to a 2-ml screw-capped tube containing 1 g of silica/zirconia beads (0.1 mm; BioSpec Products; Bartlesville, OK), 500 µl of phenol:chloroform:isoamyl alcohol (25:24:1) and 210 µl of 20% SDS. The headspace was filled with cold DNA extraction buffer (200 mM Tris at pH 8, 200 mM NaCl, 20 mM EDTA). Bead tubes were attached to a MoBio vortex adapter and shaken horizontally at high speed for 10 min. The aqueous phase was extracted three times with phenol:chloroform:isoamyl alcohol (25:24:1) in phase lock gel tubes (Qiagen; Valencia, CA). Nucleic acids were precipitated with 1 vol of 7.5 M ammonium acetate and 2 vol of isopropanol, with incubation at -20°C for at least 1 hr. Precipitated nucleic acids were pelleted by centrifugation at 15,000 × g for 15 min and dissolved in TE buffer. RNase A digestion (100 U) was performed for 30 min at 37°C.
Genomic DNA was precipitated with 0.1 vol of 3 M sodium acetate (pH 5.5) and 3 vol of ethanol, with incubation at -20°C for at least 1 hr. DNA was again pelleted by centrifugation at 15,000 × g for 15 min; pellets were washed twice with 70% ethanol, air dried and dissolved in PCR-grade water. Mock extractions without cecum contents were used as negative controls.

Preparation and Pyrosequencing of SSU rRNA Gene Amplicon Libraries. Amplicon libraries of the V1-2 and V4 regions of the 16S SSU rRNA gene were obtained using similar methods. Amplification of the V1-2 region was performed in 50-µl reactions composed of 1× polymerase buffer (Invitrogen; Carlsbad, CA), 200 µM each dNTP, 3 mM MgSO4, 300 nM forward primer (MWG Operon; Huntsville, AL), 300 nM reverse primer mix (MWG Operon), 1 U of Platinum® Taq DNA Polymerase High Fidelity enzyme (Invitrogen) and 100 ng of gDNA. We used a modification of the 27F primer (Frank et al 2008) fused to 6-nucleotide multiplexing tags and to the 454 FLX sequencing primer A (5’-GCCTCCCTCGCGCCATCAGxxxxxxGTTTGATCMTGGCTCAG-3’), where xxxxxx represents the multiplexing tag and the sequence 3’ of the tag is the SSU rRNA gene-specific primer, and a single reverse primer (5’-GCCTTGCCAGCCCGCTCAGCTGCTGCCTYCCGTA-3’) modified from 342R (Weisburg et al 1991). Each amplification began with denaturation at 94°C for 2 min, followed by 25 cycles of 94°C for 20 sec, 53°C for 30 sec and 68°C for 45 sec, and ended with a final extension at 68°C for 3 min. V4 amplicons were generated in 50-µl reactions composed of 1× AccuPrime Pfx reaction mix (Invitrogen), 300 nM forward primer (Integrated DNA Technologies; Coralville, IA), 300 nM reverse primer mix (IDT), 1.5 U AccuPrime Pfx polymerase (Invitrogen) and 100 ng gDNA.
We used barcoded forward primers (5’-GCCTCCCTCGCGCCATCAGxxxxxxAYTGGGYDTAAAGNG-3’) and a mix of reverse primers (the FLX B adaptor sequence 5’-GCCTTGCCAGCCCGCTCAG fused to the rRNA gene sequences TACCRGGGTHTCTAATCC, TACCAGAGTATCTAATTC, CTACDSRGGTMTCTAATC or TACNVGGGTATCTAATCC-3’ in a 6:1:2:12 ratio, respectively), designed to cover most of the domain Bacteria (Cole et al 2009). Thermal profiles consisted of denaturation at 95°C for 2 min, followed by 27 cycles of 95°C for 15 s, 55°C for 30 s and 68°C for 45 s, and a final extension at 68°C for 3 min.

All amplicons were checked for quality on agarose gels and then purified from the amplification reactions using Agencourt AMPure reagents (Beckman Coulter; Danvers, MA). A final check of amplicon quality and quantity was performed on an Agilent Bioanalyzer (Santa Clara, CA) using DNA 1000 reagents. Sequencing was performed on a 454 FLX instrument (Roche; Indianapolis, IN) following the manufacturer’s recommendations.

Sequences were extracted from raw FASTA files using the RDP Pipeline Initial Process. V4 amplicons were quality controlled by requiring matches to both the forward and reverse primers (up to two mismatches each), a minimum sequence length of 200 nt and no ambiguous base calls. V1-2 amplicons were generally too long to be sequenced completely with FLX chemistry; therefore, only the forward primer was used for data processing, with a minimum sequence length of 200 nt and no primer mismatches allowed. Sequence yield and co-caging specifications are summarized in Table S1.

OTU-Based Sequence Analysis. Initially, mothur (v1.11.0) was used to further screen sequences for each SSU rRNA gene region, removing sequences with homopolymers longer than 8 nt.
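The read-screening criteria described above (primer matching with up to two mismatches for V4 or none for V1-2, a 200-nt minimum length, no ambiguous base calls, and no homopolymers longer than 8 nt) can be sketched as a single filter. This is a minimal, hypothetical Python illustration, not the RDP pipeline or mothur code actually used; for brevity it folds the separate mothur homopolymer step into the same function. The names `passes_qc`, `mismatches`, `revcomp` and `longest_homopolymer` are invented here, reads are assumed to be already demultiplexed with tags and adaptors removed, ambiguous calls are assumed to appear as N, and degenerate primer positions are matched using standard IUPAC codes.

```python
import re

# Hypothetical illustration of the screening criteria; not the RDP/mothur code used in the study.

# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of a (possibly degenerate) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, segment):
    """Count positions where the read base is not allowed by the degenerate primer."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, segment))


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def passes_qc(read, fwd, revs=(), max_mm=2, min_len=200, max_homopolymer=8):
    """Screen one demultiplexed read (tag and adaptors assumed already removed).

    fwd:  forward primer expected at the 5' end of the read.
    revs: reverse primers; if given, the reverse complement of at least one
          of them must be found at the 3' end of the read.
    """
    read = read.upper()
    if len(read) < max(min_len, len(fwd)) or "N" in read:
        return False  # too short, or contains an ambiguous base call
    if mismatches(fwd, read[: len(fwd)]) > max_mm:
        return False  # forward primer not found within the mismatch limit
    if revs and not any(
        len(read) >= len(r) and mismatches(revcomp(r), read[-len(r):]) <= max_mm
        for r in revs
    ):
        return False  # none of the reverse primers found at the 3' end
    return longest_homopolymer(read) <= max_homopolymer
```

Under these assumptions, a V1-2 read (forward primer only, no mismatches) would be screened with `passes_qc(read, "GTTTGATCMTGGCTCAG", max_mm=0)`, while a V4 read would use the forward primer `AYTGGGYDTAAAGNG` and pass the four reverse primers listed above as `revs`.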
Remaining sequences were aligned to the SILVA database using a Needleman-Wunsch method, and those mapping to incorrect regions of the alignment were also removed. The mothur implementation of ChimeraSlayer (Haas et al 2011) was then used to detect potentially chimeric sequences. These steps yielded quality-controlled sequence sets with unequal numbers of sequences per mouse. To control for unequal sequence coverage among individuals, sequences were separated by sample and each sample was subsampled to the minimum sequence number using the Perl script daisychopper.pl (http://www.genomics.ceh.ac.uk/GeneSwytch/Tools.html).

Amplicon libraries from both hypervariable regions of the SSU rRNA gene were subjected to stringent quality control that reduced the number of sequences analyzed (Table S1). Screening reduced V1-2 sequence numbers by 15% (from 345,742 to 293,928), with most (87.0%) of the purged sequences identified as chimeric. V4 sequence numbers were reduced by 26% (from 819,554 to 605,397), with 99.7% of the eliminated sequences identified as chimeric. Mean numbers of sequences per mouse were 4982 for V1-2 and 6640 for V4 libraries. Equal subsampling for UniFrac analyses was based on the minimum number of sequences observed for any mouse in each library, reducing V1-2 libraries to 1557 sequences per mouse and V4 libraries to 3128 sequences per mouse.

Identification of OTUs was performed in mothur (Schloss et al 2009) for each SSU rRNA gene region, following the general approach of Huse et al (2010). Remaining sequences were pre-clustered in mothur using "diffs=1", and a distance matrix was calculated for the pre-clusters. Data were then clustered using the average-neighbor method.

Phylogeny-Based Sequence Analysis.
Representatives of each OTU were collected for both the V1-2 and V4 regions at genetic distances of 0.03 and 0.05 using mothur and aligned using the RDP aligner. Phylogenetic trees were constructed either by neighbor-joining (Jukes-Cantor distances) in Geneious v5.4 or by maximum likelihood in RAxML-7.04, as described previously (Flores et al 2011). These trees and their originating sequences, as well as a general SSU rRNA bacterial reference tree (greengenes.lbl.gov), were used to map the entire V4 and V1-2 sequence datasets, or equally subsampled versions of them, for unweighted Fast UniFrac analysis (Hamady et al 2009). The different tree types and datasets were compared at different genetic distances to evaluate the variation explained in the Principal Coordinates Analysis (PCoA) and the intrastrain and interstrain differences. Final plots for all analyses were produced with Matlab, and trees were visualized and annotated using FigTree (v1.3.1). Data were also analyzed with respect to the taxonomic affiliation of the SSU rRNA gene fragments using the RDP Classifier at an 80% confidence threshold.

SUPPLEMENTAL LITERATURE CITED

Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ et al (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141-145.

Flores GE, Campbell JH, Kirshtein JD, Meneghin J, Podar M, Steinberg JI et al (2011). Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge. Environ Microbiol 13: 2158-2171.

Frank JA, Reich CI, Sharma S, Weisbaum JS, Wilson BA, Olsen GJ (2008). Critical evaluation of two primers commonly used for amplification of bacterial 16S rRNA genes. Appl Environ Microbiol 74: 2461-2470.

Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G et al (2011). Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21: 494-504.

Hamady M, Lozupone C, Knight R (2009). Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. ISME J.

Huse SM, Welch DM, Morrison HG, Sogin ML (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889-1898.

Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS et al (2008). Evolution of mammals and their gut microbes. Science 320: 1647-1651.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol.

Weisburg WG, Barns SM, Pelletier DA, Lane DJ (1991). 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol 173: 697-703.

TITLES AND LEGENDS OF SUPPLEMENTARY FIGURES

Figure S1. A. Diagram of the experimental design for mouse housing, sibling information and identity of each mouse used in the study, with strain and sex information. B. Overall workflow for the cecum microbiota characterization.

Figure S2. Taxonomic diversity detected in amplicon libraries of both primer pairs. Each bar represents the mean percentage (± SEM) of each phylum across all mice surveyed. Phyla represented by fewer than 100 sequences were omitted from this graph. The y-axis is log-scaled to better depict low-abundance phyla.

Figure S3. PCoA representation of UniFrac distances (0.05 genetic distance) from the V1-2 hypervariable region of the SSU rRNA gene. Samples were analyzed using all sequences and using sequences subsampled randomly for equal coverage across all mice (n = 59).

Figure S4. PCoA representation of UniFrac distances (0.05 genetic distance) from the V4 hypervariable region of the SSU rRNA gene. Samples were analyzed using all sequences and using sequences subsampled randomly for equal coverage across all mice (n = 94).

Figure S5. Hierarchical clustering (UPGMA) representation of OTU-based clustering (0.03 genetic distance) of data from the V1-2 hypervariable region of the SSU rRNA gene. Counts of each OTU within each mouse (n = 59) were standardized to percentages and square-root transformed, and a Bray-Curtis similarity matrix was calculated.

Figure S6. Hierarchical clustering (UPGMA) representation of OTU-based clustering (0.03 genetic distance) of data from the V4 hypervariable region of the SSU rRNA gene. Counts of each OTU within each mouse (n = 94) were standardized to percentages and square-root transformed, and a Bray-Curtis similarity matrix was calculated.

Figure S7. Jackknifed hierarchical clustering representation of UniFrac distances (0.05 genetic distance) of data from the V1-2 hypervariable region of the SSU rRNA gene. Sequences were subsampled randomly for equal coverage across all mice (n = 59).

Figure S8. Jackknifed hierarchical clustering representation of UniFrac distances (0.05 genetic distance) of data from the V4 hypervariable region of the SSU rRNA gene. Sequences were subsampled randomly for equal coverage across all mice (n = 94).

Figure S9. Box-and-whisker plot showing the effects of maternal lineage on gut bacterial communities within mouse strains. Distributions were formed by parsing strain-wise data from the larger Bray-Curtis dissimilarity matrix (V4 only) of mouse-by-mouse comparisons. To illustrate the effects of maternal lineage, intrastrain dissimilarities were separated into two groups: 1) pairwise distances of siblings and 2) pairwise distances of all non-siblings. Distributions of non-siblings were plotted and distances of siblings were superimposed onto these distributions (*). Each maternal lineage is represented by a different color within each strain. Outliers are denoted by red plus characters (+).

Figure S10. Box-and-whisker plot showing the effects of cohabitation on gut bacterial communities within mouse strains. Distributions were formed by parsing strain-wise data from the larger Bray-Curtis dissimilarity matrix (V4 only) of mouse-by-mouse comparisons. To illustrate this effect, intrastrain Bray-Curtis dissimilarities were separated into two groups: 1) pairwise distances of co-caged mice and 2) pairwise distances of all mice that were not co-caged. Distributions were plotted only for mice that were not co-caged, and distances of co-caged mice were superimposed onto these distributions (*). Each group of co-caged mice is represented by a different color within each strain. Outliers are denoted by red plus characters (+).
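The within-strain comparisons described for Figures S9 and S10 can be sketched as follows. This is a hypothetical Python illustration, not the Matlab code used for the figures. It assumes per-mouse OTU count vectors over a shared OTU list, and it applies the percentage and square-root transformation described for Figures S5 and S6 before computing Bray-Curtis dissimilarity; the S9 and S10 legends do not state which transformation was used. Each strain's pairwise distances are then split by an invented grouping label, such as a dam identifier for sibling comparisons or a cage identifier for co-caging comparisons.

```python
from itertools import combinations
from math import sqrt

# Hypothetical sketch of the Fig. S9/S10 partitioning; not the authors' code.


def sqrt_percent(counts):
    """Standardize OTU counts to percentages, then square-root transform."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [sqrt(100.0 * c / total) for c in counts]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def split_intrastrain(profiles, strain, group):
    """Split within-strain pairwise distances by a shared grouping label.

    profiles: {mouse_id: list of OTU counts}
    strain:   {mouse_id: strain name}
    group:    {mouse_id: hypothetical dam ID (Fig. S9) or cage ID (Fig. S10)}
    Returns {strain: (same_group_distances, other_distances)}.
    """
    transformed = {m: sqrt_percent(c) for m, c in profiles.items()}
    out = {}
    for a, b in combinations(sorted(profiles), 2):
        if strain[a] != strain[b]:
            continue  # keep intrastrain comparisons only
        same, other = out.setdefault(strain[a], ([], []))
        d = bray_curtis(transformed[a], transformed[b])
        (same if group[a] == group[b] else other).append(d)
    return out
```

Under these assumptions, calling `split_intrastrain` with dam labels would produce sibling and non-sibling distance lists per strain, as in Figure S9, and calling it with cage labels would produce co-caged and non-co-caged lists, as in Figure S10. The "other" lists form the plotted distributions, and the "same" lists are the distances superimposed on them.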