Divergence across diet, time, and populations rules out parallel evolution in the gut microbiomes of Trinidadian guppies

Karen E. Sullam, Benjamin E. R. Rubin, Christopher M. Dalton, Susan S. Kilham, Alexander S. Flecker, Jacob A. Russell

Supplementary Text and Figures

Supplemental Methods:

Additional information on sample processing and sequencing
Whole guppies were preserved in 95% ethanol for the bacterial analysis and in 70% ethanol for the gut length analysis. They were transported to Drexel University, where they were stored frozen at -20 °C until dissection 4-14 months later. After dissection, whole guts were immediately placed in Mo Bio PowerBead Tubes. A digital photograph of each dissected gut was taken for length measurement using ImageJ. Filter and sediment samples were collected in sterile Whirl-Pak® Sampling Bags, frozen at -20 °C for 3 months, and then stored at -80 °C for 12 months before extraction.
Eighty samples were sequenced across three separate runs at Research and Testing Labs. A subset of four samples from the 2010 Guanapo survey (two HP samples and two LP samples) was sequenced in the first run, multiplexed with samples not included in the present study. Thirty-six of the remaining 2010 survey samples were sequenced in the second run, while the third run included the 2011 survey samples, the samples from the dietary experiment, and one 2010 survey sample that had minimal coverage in the first sequencing run.

Additional information on dietary manipulation study
Males were included in the tanks because females decrease energy assimilation in the absence of male guppies (Reznick 1983), but only females were used for subsequent measurements. All tanks had the same male-to-female ratio (1:3), and all female guppies were reproductive throughout the entire experiment.
Additional information on classification
The BLAST algorithm, with the Greengenes database as a reference, was first used to taxonomically classify representative sequences. Any reads that were classified as chloroplasts or that failed to classify as bacteria were excluded from further analysis. It became apparent, however, that a number of these BLAST/Greengenes classifications at the phylum level deviated greatly from the results of BLAST searches against the NCBI database and from RDP classification. For this reason, we used UCLUST (in QIIME v. 1.8) to classify our OTUs; all but 82 OTUs were assigned by this method. BLAST classification against the Greengenes database appeared to classify these 82 remaining OTUs adequately, so our overall classification was a hybrid of the two methods.

Additional information on OTU genotyping
We built a pipeline to detect genotype variation within the six most dominant OTUs in the dataset, and focused on two of them (OTUs 4447 and 5760) whose samples had been sequenced in the same batch, to eliminate potentially confounding batch effects. First, to reduce the computational load of aligning all sequences in each OTU, we removed non-unique sequences using the dereplication function of USEARCH v. 6.0.307 (Edgar 2010). Dereplicated sequences were aligned with MUSCLE v. 3.8.31 (Edgar 2004). For the OTUs with very large numbers of unique sequences (>17,000; OTUs 2023 and 4447), we used the "-maxiters 2" option, because attempting full alignments caused MUSCLE to crash; default parameters were used for all other OTUs. Alignments were then repopulated with the non-unique sequences removed during dereplication. To reduce problems introduced by sequencing errors, homopolymers and their surrounding gaps were masked from the alignments before further analysis.
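The dereplication and repopulation steps above amount to bookkeeping around the aligner. The Python sketch below illustrates only that bookkeeping, under stated assumptions: the function names `dereplicate` and `repopulate` are hypothetical, and the alignment itself (done with MUSCLE in the pipeline) is stood in for by a hand-written dictionary of aligned sequences.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into unique sequences plus their abundances."""
    counts = Counter(reads)
    # dict.fromkeys keeps the first-seen order of each unique sequence
    uniques = list(dict.fromkeys(reads))
    return uniques, counts


def repopulate(aligned_uniques, counts):
    """Expand an alignment of unique sequences back to one row per original read.

    aligned_uniques maps each unaligned unique sequence to its gapped, aligned
    form (in a real pipeline this mapping would come from the aligner output).
    """
    rows = []
    for seq, aligned in aligned_uniques.items():
        # Sanity check: removing alignment gaps must recover the original sequence
        assert aligned.replace("-", "") == seq, f"alignment altered {seq!r}"
        rows.extend([aligned] * counts[seq])
    return rows


reads = ["ACGT", "ACGT", "AGT", "ACGT"]
uniques, counts = dereplicate(reads)
# Hypothetical stand-in for the aligner: each unique sequence mapped to an aligned form
aligned = {"ACGT": "ACGT", "AGT": "A-GT"}
rows = repopulate(aligned, counts)
```

Only the two unique sequences need to be aligned; the four-row alignment is rebuilt from the stored counts, which is what makes dereplication worthwhile for OTUs with many identical reads.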
A commonly encountered suspicious alignment pattern with the following three characteristics was also masked: (1) two adjacent sites shared one identical non-gap allele, (2) one allele at one of the sites was a gap, and (3) the frequency of the gap was within 10% of the frequency of the shared allele at the site without a gap. This pattern suggested that the two apparently variable adjacent sites were actually two misaligned sites. For example, site one: -/G and site two: G/A, as produced by MUSCLE, is likely a misalignment of the correct alignment site one: G/G and site two: -/A. Although these sites potentially contain useful information, they were excluded to maintain genotype quality. We extended the filter to similarly suspicious situations that followed the same pattern, except that both bases were shared between the two adjacent sites (e.g. site one: A/G, site two: G/A). This latter pattern less clearly represents misalignment, but on closer examination these sites were invariably surrounded by combinations of gaps and nucleotides that suggested alignment error. In addition, the quality of all alignments dropped precipitously after several hundred bases, so each OTU alignment was examined and trimmed at the position where clear misalignments became common. Although these exclusions may have reduced the amount of true variation that we could identify, the indel errors inherent in 454 sequencing and the difficulty of accurately aligning such large numbers of sequences made this procedure necessary. We ran the pipeline separately on each OTU to identify the appropriate quality-control parameters and the poorly aligned sites to exclude. All parameters for each OTU are given in Supplementary Table 4.

Many sequences could not be assigned an allele at every variable site, leading to incomplete genotypes.
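The three-part rule for masking suspicious adjacent sites can be sketched in Python as follows. This is an illustration under assumptions, not the pipeline's code: the alignment is a list of equal-length strings with "-" for gaps, "within 10%" is interpreted as an absolute difference of at most 0.10 in allele frequency, and the function names are hypothetical. The extended filter for sites that share both bases is not included.

```python
from collections import Counter


def column_freqs(alignment, i):
    """Relative frequency of each character (bases and '-') at alignment column i."""
    column = [row[i] for row in alignment]
    return {char: n / len(column) for char, n in Counter(column).items()}


def is_gap_shift_pair(alignment, i, tol=0.10):
    """Flag columns i and i+1 if they match the three-part suspicious pattern.

    Assumption: 'within 10%' is read as an absolute frequency difference <= tol.
    """
    left, right = column_freqs(alignment, i), column_freqs(alignment, i + 1)
    if len(left) < 2 or len(right) < 2:  # both sites must be variable
        return False
    for gap_site, other_site in ((left, right), (right, left)):
        # (2) exactly one of the two sites carries a gap allele
        if "-" not in gap_site or "-" in other_site:
            continue
        # (1) non-gap alleles shared between the adjacent sites
        shared = (set(gap_site) & set(other_site)) - {"-"}
        for allele in shared:
            # (3) gap frequency close to the shared allele's frequency at the gap-free site
            if abs(gap_site["-"] - other_site[allele]) <= tol:
                return True
    return False


def suspicious_columns(alignment, tol=0.10):
    """Return the set of column indices to mask."""
    masked = set()
    for i in range(len(alignment[0]) - 1):
        if is_gap_shift_pair(alignment, i, tol):
            masked.update((i, i + 1))
    return masked


# The example from the text: site one -/G and site two G/A
example = ["-GT", "-GT", "GAT", "GAT", "GAT"]
print(sorted(suspicious_columns(example)))  # prints [0, 1]
```

Here the gap at site one and the G at site two both occur at frequency 0.4, so both columns are masked, while the monomorphic third column is left alone.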
Incomplete genotypes arose from low-frequency bases that likely represented sequencing errors, alignment errors, or masked sequence data. Therefore, when an incomplete genotype was missing data at just a single site but was otherwise identical to one and only one complete genotype, it was assigned that complete genotype. Although this approach could collapse some truly unique genotypes, most of these incomplete genotypes likely represented the genotype to which they were assigned; at the very least, they were more closely related to the assigned genotype than to any other. Sequences with incomplete genotypes that did not meet these requirements were discarded. Finally, genotypes present at a frequency of less than 0.1% across the entire dataset were also excluded, to further minimize the inclusion of sequencing and alignment artifacts.
For each OTU analyzed, only samples with more than 25 reads of that OTU were included in the genotyping analysis. For OTU 4447, two samples from 2011 were excluded to focus on site variation within the 2010 samples. For OTU 5760, one HP Aripo sample and one LP Marianne sample were excluded to focus on populations with n > 1.

Results:
Enterotyping Analysis
All fish gut samples, including those from the wild and from the dietary study, were optimally partitioned into six enterotypes (Supplementary Table 5). These groupings appeared to correspond to the dominance of certain OTUs: wild fish either had one of three OTUs (OTUs 2023, 4447, and 5998) as a dominant bacterium or fell into the sixth partition, in which no bacterium was strongly dominant. The two partitions that encompassed lab-reared fish were dominated by two other OTUs (OTUs 1229 and 1106).
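The supplement does not state how the six partitions were obtained, so the sketch below does not reproduce the clustering. It only illustrates, under stated assumptions, one way to check the correspondence between partitions and dominant OTUs: average each OTU's relative abundance within a partition and report the top OTU. The function name, the input layout, and the `min_share` cutoff for "strong dominance" are hypothetical.

```python
from collections import defaultdict


def dominant_otu_per_partition(rel_abund, partition, min_share=0.5):
    """Summarize which OTU dominates each partition.

    rel_abund: sample -> {otu: relative abundance within that sample}
    partition: sample -> partition (enterotype) label
    min_share: hypothetical cutoff for calling the top OTU 'dominant'
    Returns label -> (top OTU, its mean relative abundance, passes min_share).
    """
    profiles_by_label = defaultdict(list)
    for sample, label in partition.items():
        profiles_by_label[label].append(rel_abund[sample])

    summary = {}
    for label, profiles in profiles_by_label.items():
        otus = sorted(set().union(*profiles))  # sorted for deterministic tie-breaking
        # OTUs absent from a sample count as zero abundance in that sample
        means = {otu: sum(p.get(otu, 0.0) for p in profiles) / len(profiles) for otu in otus}
        top = max(otus, key=means.get)
        summary[label] = (top, means[top], means[top] >= min_share)
    return summary
```

A partition whose top OTU falls below the cutoff would correspond to a grouping "in which no bacterium was strongly dominant"; the cutoff value itself is a judgment call.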
The two OTUs that dominated the lab-reared fish, and seemingly shaped their enterotype grouping, were not abundant in wild fish but were still present in them. For example, 13 of 55 wild fish (8 LP and 5 HP fish) harbored the Spirochaeta-derived OTU 1106 that dominated the guts of lab-reared LP fish. The Entomoplasmatales-derived OTU 1229, which was common in lab-reared HP fish, was found in 6 of 55 wild fish (5 HP and 1 LP fish).

OTU member genotyping
The genotypic, or strain, composition of the two dominant OTUs varied across one or more scales. For OTU 4447, the Marianne LP and Quare LP populations differed from the other sampling locations with sufficient representation of this OTU, namely Aripo HP, Aripo LP, and Guanapo LP (Supplementary Figure 3A). OTU 5760 showed variation in rare strains between two ecotypes from separate streams (Supplementary Figure 3B).

Reznick DN (1983). The structure of guppy life histories: the tradeoff between growth and reproduction. Ecology 64: 862-873.

Supplementary Figures:

Supplementary Figure 1: Rarefaction curves of observed species number. Analyses were performed using QIIME to characterize the species richness of bacterial communities from A) guppy guts from the 2010 survey, colored by stream and separated by ecotype background, and B) environmental samples from the Guanapo River. The difference in y-axis scales between the two graphs shows that environmental samples, particularly those from sediment, tend to harbor greater bacterial diversity than guppy guts.

Supplementary Figure 2: Principal Coordinates Analysis of guppy gut bacterial communities from the 2010 field survey based on A) unweighted UniFrac distances and B) Hellinger-transformed Bray-Curtis distances.
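Panel B of Supplementary Figure 2 uses Hellinger-transformed Bray-Curtis distances. The caption names the measure but not the software, so the sketch below only shows the standard definitions in plain Python, assuming each sample is an OTU count vector with OTUs in the same order across samples; the function names are hypothetical. The resulting pairwise distance matrix would then be the input to the Principal Coordinates Analysis, which is not shown.

```python
import math


def hellinger(counts):
    """Hellinger transform: square root of each OTU's relative abundance in a sample."""
    total = sum(counts)
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def hellinger_bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between Hellinger-transformed OTU count vectors."""
    return bray_curtis(hellinger(counts_a), hellinger(counts_b))
```

Because the transform works on relative abundances, two samples with proportional counts (e.g. [2, 6] and [1, 3]) have a distance of 0, and samples with no shared OTUs have a distance of 1.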
Supplementary Figure 3: Strain analysis of dominant OTUs, with their corresponding phyla in parentheses. A) OTU 4447, which shows differences between certain sampling locations and ecotypes; B) OTU 5760, which differs in rare strains between the two streams and ecotypes. HP and LP distinguish ecotypes collected from high-predation (HP) vs. low-predation (LP) habitats. The number of reads from each sequence library is listed above the bar for each OTU. Each vetted genotype is assigned a different color and labeled in each panel by a different letter (see Supplementary Table 4 for the genotype corresponding to each letter). Stacked bar graphs show the proportion of all reads from the given OTU made up by each genotype.

Supplementary Figure 4: Size-standardized gut length comparison of guppies across four streams. To visualize the results and account for allometry (Torres & Vanni 2007), size standardizations were made by calculating the size-corrected gut characteristic (i.e. length or weight) as

$$\text{size-corrected gut characteristic} = \frac{\text{gut characteristic}}{\text{SL}^{\,b}},$$

where SL is the standard length of the fish and $b$ is the slope of log gut characteristic versus log standard length across all individuals. Color indicates stream of origin, and asterisks indicate significant differences between HP and LP ecotypes.

Supplementary Figure 5: Network analysis of samples collected in the 2011 Guanapo field survey, in which gut bacteria were compared with environmental samples. N = 3 for gut samples from the three different environments, and N = 2 for all environmental samples except the HP water sample, for which N = 1. The gut samples clearly separate from the environmental samples.
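The size correction in the Supplementary Figure 4 caption divides each gut characteristic by standard length raised to the allometric slope b. A minimal Python sketch follows, assuming b is the ordinary least-squares slope of log gut characteristic on log standard length pooled across all individuals (the caption does not name the fitting method); the function names are hypothetical.

```python
import math


def allometric_slope(gut, standard_length):
    """Slope b of log(gut characteristic) vs log(standard length) across all fish.

    Assumption: an ordinary least-squares fit. The log base does not affect b
    because both axes use the same base, so the scaling factor cancels.
    """
    x = [math.log(v) for v in standard_length]
    y = [math.log(v) for v in gut]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def size_corrected(gut, standard_length):
    """Return (corrected values, b), where corrected = gut / standard_length**b."""
    b = allometric_slope(gut, standard_length)
    return [g / sl ** b for g, sl in zip(gut, standard_length)], b
```

If gut length scaled exactly as 2 × SL^1.5, the fitted b would be 1.5 and every corrected value would be 2, so any remaining differences between ecotypes reflect gut size beyond what body size predicts.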