Supplementary File S1: Sampling depth analysis

Material and Methods
To assess the impact of sequencing depth on our reaction number estimates, we down-sampled
(rarefied) the MetaHit samples to 5.5 × 10⁶ reads per sample using a custom Perl script. This
rarefaction threshold was chosen so that the least deeply sequenced sample (6.2 × 10⁶ reads) could be
included. From this rarefied read abundance matrix, the number of reactions was recalculated using the
previously described methodology; we refer to these values as “rarefied reactions”. Gene richness per sample was
calculated as the number of genes present in the rarefied abundance matrix. All statistical tests were
performed in R (http://www.r-project.org/), version 2.14.
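The down-sampling step can be illustrated with a short sketch. The custom Perl script used for the analysis is not reproduced here, so the following Python code is only an assumption about how such a rarefaction might work: reads are drawn without replacement from a per-gene count vector until the target depth is reached. The function names `rarefy` and `gene_richness` and the toy counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a gene -> read-count mapping.

    Illustrative sketch only; not the Perl script used in the study.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then subsample the reads.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    drawn = rng.sample(pool, depth)
    return dict(Counter(drawn))


def gene_richness(rarefied):
    """Number of genes with at least one read after rarefaction."""
    return sum(1 for n in rarefied.values() if n > 0)


# Toy example (made-up counts): rarefy a 100-read sample to 60 reads.
toy_sample = {"geneA": 50, "geneB": 30, "geneC": 0, "geneD": 20}
rarefied = rarefy(toy_sample, depth=60)
print(sum(rarefied.values()), gene_richness(rarefied))
```

Because every sample is reduced to the same number of reads, differences in gene richness or reaction counts computed from the rarefied matrix cannot be attributed to unequal sequencing effort.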
Reaction numbers remain stable on rarefied data
The number of detected reactions per sample remains stable, in relative terms, whether a
rarefied or a non-rarefied input matrix is used to calculate reaction numbers. The two
estimates are almost perfectly correlated (rho = 0.987, p-value <
2.2e-16); this correlation is shown in Fig. 1.
Figure 1 Relation between rarefied and non-rarefied reaction number estimates. Each dot is one sample; samples are colored
by cluster identity (red = low, green = mid, black = high). The correlation is significant and shows that the rarefied
data follow similar trends in reaction numbers as the non-rarefied data.
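The comparison of rarefied and non-rarefied estimates reported above relies on a correlation coefficient. The statistics were computed in R; the following pure-Python sketch only illustrates how a rank correlation (Spearman's rho, which we assume is what "rho" denotes) can be obtained as the Pearson correlation of average ranks. The helper names and the example vectors are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up reaction counts for five samples, before and after rarefaction.
non_rarefied = [812, 790, 845, 770, 830]
rarefied = [801, 785, 839, 765, 820]
print(spearman_rho(non_rarefied, rarefied))
```

Squaring the output of `pearson` gives the R² of a simple linear relation between two variables.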
Reaction numbers are not correlated with sequencing depth
To further verify that our estimated reaction numbers are not driven by sequencing depth, we
examined the relation between sample sequencing depth and the number of estimated reactions.
Sequencing depth shows no relation to the number of reactions, as would be expected given the
rarefaction (R² = 0.0195, p = 0.15, Pearson correlation test). This relation is shown in Fig. 2.
Figure 2 Relation between sample sequencing depth and the number of reactions per sample. Sample colors are as in Fig. 1.
Furthermore, no significant association between the three metabolic groups and sampling depth could
be observed (p = 0.29, Kruskal-Wallis test, Fig. 3).
Overall, we find no evidence that sample sequencing depth biases the number of
detected reactions per sample.
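The group comparison above uses the Kruskal-Wallis test. As a hedged illustration (the reported p-value was obtained in R, not with this code), the sketch below computes the tie-corrected H statistic for exactly three groups. With three groups the chi-square approximation has two degrees of freedom, for which the p-value reduces to exp(-H/2). The function name and the depth values are hypothetical.

```python
from collections import Counter
from math import exp


def kruskal_wallis_3groups(a, b, c):
    """Tie-corrected Kruskal-Wallis H and chi-square p-value for three groups."""
    groups = [a, b, c]
    n = sum(len(g) for g in groups)
    ties = Counter(v for g in groups for v in g)

    # Average 1-based rank of each distinct value in the pooled sample.
    rank_of = {}
    pos = 0
    for value in sorted(ties):
        t = ties[value]
        rank_of[value] = pos + (t + 1) / 2
        pos += t

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    h /= correction

    # Survival function of the chi-square distribution with df = 2.
    p = exp(-h / 2)
    return h, p


# Made-up sequencing depths (millions of reads) for the low, mid and high groups.
low = [6.2, 7.1, 8.4, 9.0]
mid = [6.8, 7.5, 8.1, 9.3]
high = [6.5, 7.9, 8.8, 9.6]
print(kruskal_wallis_3groups(low, mid, high))
```

The chi-square approximation is an assumption of this sketch and can be inaccurate for very small groups.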
Sample gene richness does not correlate with sample reaction number
We tested the correlation between gene richness and reaction number, but found no significant
correlation (R² = 0.013, p = 0.235, Spearman correlation test). The correlation is shown in Fig. 4.
Figure 3 Correlation between sample richness and rarefied reaction numbers is not significant. Sample colors are as in Fig. 1.
The three metabolic groups do not differ significantly from each other in sample richness
(p = 0.46, Kruskal-Wallis test). The apparent independence of sample gene richness and metabolic
reactions reflects the relatively specific subset of functions on which our study focuses. Furthermore, a
high-richness microbiota does not necessarily provide relevant functions to the host, an important insight in
this context.