Supplementary Information (doc 747K)

Supplementary Methods
Microbial quantification by qPCR. qPCR reactions were performed using a LightCycler 480 instrument (Roche) and the KAPA SYBR® FAST qPCR Kit (Kapa Biosystems). Amplification reactions were run on total DNA purified from a fecal sample using 0.2 µM of the universal bacterial primer E8F (5′-AGAGTTTGATCCTGGCTCAG-3′) and the broad-range bacterial primer 338R (5′-TGCTGCCTCCCGTAGGAGT-3′). Final assay volumes of 20 µl were dispensed in duplicate into 96-well plates. We used an average of 25 ng of genomic DNA per 20 µl reaction as template. Standard curves were prepared by serial dilution of the PCR product of the Enterococcus faecalis 16S gene obtained using the primers described above. The reaction conditions were 95°C for 10 min followed by 40 cycles of 95°C for 30 s, 52°C for 30 s, and 72°C for 1 min. The results were expressed as the number of 16S rRNA copies per ng of total DNA.
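As an illustration only, the conversion from threshold cycle (Ct) values to 16S rRNA copies per ng of DNA through a standard curve can be sketched as follows. The dilution series, the Ct values and the function names are hypothetical and are not taken from the study; the sketch assumes a simple least-squares fit of Ct against log10(copies) and averages the duplicate wells before interpolation.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_per_ng(ct, slope, intercept, template_ng):
    """Interpolate the copy number from a Ct value and normalise by template mass."""
    copies = 10 ** ((ct - intercept) / slope)
    return copies / template_ng


# Hypothetical 10-fold dilution series of the standard (log10 copies, Ct).
standards_log10 = [3.0, 4.0, 5.0, 6.0, 7.0]
standards_ct = [30.0, 26.7, 23.4, 20.1, 16.8]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
# Amplification efficiency implied by the slope (1.0 would mean perfect doubling).
efficiency = 10 ** (-1.0 / slope) - 1.0

# Hypothetical duplicate wells for one sample, 25 ng of template per reaction.
sample_ct = (21.9 + 22.1) / 2
result = copies_per_ng(sample_ct, slope, intercept, template_ng=25.0)
print(f"slope={slope:.2f}, efficiency={efficiency:.2%}, copies/ng={result:.0f}")
```

A slope close to -3.3 corresponds to an efficiency near 100%; curves that deviate substantially from this would normally be inspected before samples are quantified.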
16S RNA: Phylogenetic analysis, biodiversity and clustering. 16S rRNA gene reads with a low quality score (<20 out of 40 quality units assigned by the 454) and short read lengths (<170 nucleotides) were removed. Potential chimeras were also removed from the remaining sequences by applying the Chimera-Slayer script1 implemented by the identify_chimeric_seqs.py script in the QIIME v1.6 pipeline.2 Taxonomic information of the 16S rDNA sequences was obtained by comparison with the Ribosomal Database Project-II (RDP)3 using the pick_otus_through_otu_table.py pipeline available in QIIME v1.6.0 software. In studies based on the 16S rRNA gene, the operational taxonomic units (OTUs) represent the different clusters of species that share the same microbiome. The criterion for collapsing sequences into OTUs is the percentage of identity between them; a threshold of 97% similarity is standard practice for mapping 16S rRNA amplicon sequences to their corresponding species. OTUs were created using Uclust4 and by applying a cluster criterion of 97% similarity. The most representative sequence for each OTU was then compared against the QIIME cluster version of the Greengenes database5 (database 97_otus.fasta). The annotation was accepted when the bootstrap confidence estimation value was over 0.8, and the assignment was stopped at the last well-identified phylogenetic level. Representative sequences were aligned with PyNAST6 against the clustered version of the Greengenes database (database core_set_aligned.fasta.imputed) to be used as an input to reconstruct the phylogenetic tree with the FastTree software.7 The genus abundance table was summarized from the resulting otu_table.txt file by the script summarize_taxa_through_plots.py.
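The read-filtering step described above can be sketched in a few lines. This is a minimal illustration, not the pipeline actually used: it assumes that the quality criterion applies to the mean quality score of each read (the text does not state how the score was aggregated), and the read names and sequences are hypothetical.

```python
def passes_filters(sequence, qualities, min_mean_quality=20, min_length=170):
    """Return True when a read is long enough and its mean quality is acceptable.

    Assumption: the <20 threshold is applied to the mean per-base quality.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality arrays differ in length")
    if len(sequence) < min_length:
        return False
    mean_quality = sum(qualities) / len(qualities)
    return mean_quality >= min_mean_quality


# Hypothetical reads: (sequence, per-base quality scores).
reads = {
    "read_ok": ("ACGT" * 50, [30] * 200),      # 200 nt, mean quality 30
    "read_short": ("ACGT" * 30, [35] * 120),   # 120 nt, too short
    "read_lowq": ("ACGT" * 50, [15] * 200),    # 200 nt, mean quality 15
}

kept = [name for name, (seq, qual) in reads.items() if passes_filters(seq, qual)]
print(kept)
```

Only reads that pass both thresholds would then proceed to chimera removal and OTU picking.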
The Shannon index,8 the richness estimators Chao1 and ACE9 and the total number of taxa were calculated to assess the OTU and genus diversity within the community, using the alpha_diversity.py script from the QIIME v1.6 pipeline at the OTU level and the “diversity” function from the R package Vegan (version 2.0-9) at the genus level. The OTU rarefaction analyses were performed with the alpha_diversity.py script and the same diversity indexes by implementing 80 rarefactions per step.
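For reference, two of the indexes named above can be written out explicitly. The sketch below is illustrative and does not reproduce the QIIME or Vegan implementations: the logarithm base for the Shannon index is left as a parameter because, to our understanding, the two tools use different defaults (base 2 and base e, respectively), and Chao1 is given in its bias-corrected form, which is an assumption about the variant used. The count vectors are hypothetical.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


# Hypothetical OTU counts for two samples.
even_sample = [10, 10, 10, 10]
rare_sample = [5, 1, 1, 2, 1]

print(shannon_index(even_sample))          # natural log
print(shannon_index(even_sample, base=2))  # bits
print(chao1(rare_sample))
```

Because the base changes the scale of the Shannon index, values computed at the OTU level and at the genus level with different tools are not directly comparable unless the same base is used.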
The clustering analysis of the samples was performed with the total OTU table and the table summarized at the genus level using the statistical package R (version 3.0.1) as described by Arumugam et al.10 and Koren et al.11 The Partitioning Around Medoids (PAM) algorithm (library “cluster”, function “pam”) was used to identify the potential clusters in our dataset by testing 4 different distances: Bray-Curtis12 (library “Vegan”, function “vegdist”), Jensen-Shannon divergence13,14 (library “phyloseq”, function “distance”), Jensen-Shannon distance15,16 (calculated according to Arumugam et al.10) and weighted Unifrac17 (implemented by the beta_diversity.py script in the QIIME 1.6 pipeline). Weighted Unifrac was used only for the OTU cluster analysis. The optimal cluster configuration was defined as that obtained with the distance that maximized the silhouette index (library “cluster”, function “silhouette”) and the variance explained by the first component of the Principal Coordinates Analysis (PCoA) (library “ade4”, function “dudi.pco”), giving preference to distances based on ecological or phylogenetic principles. The samples were plotted as a scatter diagram (library “ade4”, function “s.class”), with the clusters created by the PAM algorithm for the selected distance set as a factor. Clusters were validated by applying the permutational multivariate analysis of variance using distance matrices (ADONIS test) based on the weighted Unifrac and Bray-Curtis distances and the default 999 permutations.
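To make the distance measures and the silhouette criterion concrete, the following sketch computes the Bray-Curtis dissimilarity, the Jensen-Shannon divergence and its square root (the Jensen-Shannon distance), and the average silhouette width for a given partition. It is a simplified illustration and not the R code used in the analysis: no pseudocount is added to zero abundances, the cluster labels are supplied by hand rather than obtained from PAM, and the abundance vectors are hypothetical.

```python
import math


def relative_abundance(counts):
    total = float(sum(counts))
    return [c / total for c in counts]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def kl_divergence(p, q):
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jensen_shannon_distance(p, q):
    """Square root of the divergence, which behaves as a metric."""
    return math.sqrt(jensen_shannon_divergence(p, q))


def average_silhouette(dist, labels):
    """Mean silhouette width; requires at least two clusters."""
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: width set to 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / n


# Hypothetical genus counts for four samples and a hand-assigned partition.
samples = [[50, 30, 20, 0], [45, 35, 20, 0], [5, 10, 25, 60], [0, 15, 25, 60]]
labels = [0, 0, 1, 1]
profiles = [relative_abundance(s) for s in samples]
dist = [[jensen_shannon_distance(p, q) for q in profiles] for p in profiles]
avg_width = average_silhouette(dist, labels)
print(round(avg_width, 3))
```

Repeating the last step for each candidate distance and number of clusters gives the kind of comparison summarized in Figure S1.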
Markers of adaptive immune activation
sj/β-TREC ratio quantification. The six DβJβ-TRECs from cluster one were amplified together in the same PCR reaction tube; the sj-TREC was amplified in a different PCR reaction tube. Twenty-one amplification rounds were performed to guarantee an accurate quantification at the real-time PCR step. All amplicons (DβJβ- and sj-TRECs) were then amplified together in a second PCR round using a LightCycler® 480 system (Roche, Mannheim, Germany). Six microliters of a 1:10 mixed dilution of the first round PCR were amplified in a 20 μL final volume. Specific Förster Resonance Energy Transfer (FRET) probes for the sj-17 and the DβJβ-TRECs18 were used.
Additional statistical methods. Between-group comparisons of continuous variables were analyzed using the Wilcoxon rank-sum test with a significance level of <0.05. For the microbiota analysis, differences in the Shannon diversity index and richness estimators (Chao1 and ACE) were analyzed using the same test. The values were expressed as mean ± standard deviation (SD). All the p-values were adjusted using the Benjamini-Hochberg correction (library “stats”, function “p.adjust”).
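The Benjamini-Hochberg adjustment can be reproduced by hand, which is a useful check on tabulated q-values. The sketch below is a plain re-implementation written for illustration, not the R function itself. The input is the set of rounded p-values listed in Supplementary Table 2; whether the tabulated q-values were computed from exactly this set is our assumption, and because the inputs are rounded the output can only approximate them.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order.

    Each p-value is multiplied by m / rank (rank by ascending p) and a
    running minimum is taken from the largest rank downwards so that the
    adjusted values are monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Rounded p-values as listed in Supplementary Table 2 (seven comparisons).
p_values = [0.01, 0.02, 0.01, 0.43, 0.62, 0.86, 0.03]
q_values = benjamini_hochberg(p_values)
print([f"{q:.3f}" for q in q_values])
```

With these inputs the adjusted values are approximately 0.035, 0.047, 0.035, 0.602, 0.723, 0.860 and 0.053.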
References
1. Haas, B. J. et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 21, 494–504 (2011).
2. Caporaso, J. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
3. Cole, J. R. et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37, 141–145 (2009).
4. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
5. DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).
6. Caporaso, J. G. et al. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26, 266–267 (2010).
7. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
8. Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J. 27, 379–423 (1948).
9. Chao, A., Hwang, W., Chen, Y. & Kuo, C. Estimating the number of shared species. Stat. Sin. 10, 227–246 (2000).
10. Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).
11. Koren, O. et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput. Biol. 9, e1002863 (2013).
12. Bray, J. R. & Curtis, J. T. An ordination of upland forest communities of southern Wisconsin. Ecol. Monogr. 27, 325–349 (1957).
13. Schütze, H. & Manning, C. Elements of Information Theory. 304 (The MIT Press, 1999).
14. Dagan, I., Lee, L. & Pereira, F. Similarity-based Methods for Word Sense Disambiguation. in Proc. Eighth Conf. Eur. Chapter Assoc. Comput. Linguist. 56–63 (Association for Computational Linguistics, 1997). doi:10.3115/979617.979625
15. Low, M. G. et al. A new metric for probability distributions. Inf. Theory, IEEE Trans. 49, 1858–1860 (2003).
16. Osterreicher, F. & Vajda, I. A new class of metric divergences on probability spaces and its applicability in statistics. Ann. Inst. Stat. Math. 55, 639–653 (2003).
17. Lozupone, C. & Knight, R. UniFrac: a New Phylogenetic Method for Comparing Microbial Communities. Appl. Environ. Microbiol. 71, 8228–8235 (2005).
18. Dion M.L. et al. HIV infection rapidly induces and maintains a substantial suppression of thymocyte proliferation. Immunity 21, 757–768 (2004).
Supplementary Figures
Figure S1 Average silhouette index. Average silhouette index for all possible numbers of clusters at the genus (a) and OTU (b) levels. The Bray-Curtis index (red squares), the Jensen-Shannon distance (green diamonds) and the Jensen-Shannon divergence (black triangles) were tested at both taxonomic levels in order to ascertain the distance that maximizes the average silhouette index. Since the weighted Unifrac distance (blue circles) could only be computed by estimating a phylogenetic tree and was incompatible for use at higher levels, it was analyzed only at the OTU level.
Figure S2 Microbiota comparison between HIV+ART and uninfected subjects. PCoA of the bacterial composition in controls (blue dots) and cases (red dots) at the genus level. The blue and red stars correspond to the medoids retrieved by the PAM algorithm for each cluster. The centroid is represented by a capital letter (C for controls and H for cases), whilst the blue and red ellipses represent 95% of the samples belonging to each condition. Each point has a halo proportional to its silhouette index value: the larger the halo, the more dissimilar the element is to its corresponding object.
Figure S3 Heat map of the samples at genus level. HIV+ subjects are marked in red and controls in blue. The top dendrogram is divided into two main subtrees highlighted in red or blue, according to the predominance of samples from HIV+ individuals or controls, respectively. In the heat map, the percentage range of sequences assigned to the main taxa (abundance >1% in at least one sample) is represented by a color gradient.
Figure S4 Total Bayesian network. The network represents the relationships between genus abundance (blue ellipses), pathway abundance (green ellipses) and markers of adaptive immunity, thymic function, and bacterial translocation (pink ellipses). Arrows indicate conditional dependencies between variables. The Spearman correlation coefficient is indicated next to the lines.
Supplementary Tables
Supplementary Table 1. Clinical variables of participants.

| Variable | Cases (N = 15) | Controls (N = 15) | p-value (a) | q-value (f) |
|---|---|---|---|---|
| Clinical characteristics | | | | |
| Age | 43 (34-48) | 48.5 (31-54) | 0.76 | 0.97 |
| Sex ratio (F/M) | 3/12 | 7/8 | 0.13 | 0.33 |
| Hypertensive (Y/N) | 1/14 | 2/11 | 0.99 | 0.99 |
| Smoker (Y/N) | 7/8 | 2/12 | 0.99 | 0.99 |
| Body mass index (kg/m2) | 24.5 (23.2-24.7) | 23.5 (21.2-28.3) | 0.63 | 0.92 |
| Framingham risk score (%) | 4.5 (1-7) | 2 (1-6) | 0.12 | 0.33 |
| Time from HIV diagnosis to initiation of ART (months) | 14 (3-25) | NA | - | - |
| Time on HIV suppression (months) | 74 (52-113) | NA | - | - |
| Nadir CD4+ T cell count (cells/µL) | 203 (127-284) | NA | - | - |
| CD4+ T cell count (cells/µL) | 584 (466-794) | 762 (645-927) | - | - |
| CD4/CD8 ratio | 1.2 (0.9-1.3) | 1.5 (1.2-1.9) | - | - |
| Nadir CD4+ T cell count (cells/µL) | 203 (127-284) | NA | - | - |
| Metabolic profile in plasma | | | | |
| Glucose (mg/dL) | 91 (81-96) | 89 (86-95) | 0.84 | 0.97 |
| Creatinine (mg/dL) | 1.0 (0.9-1.1) | 0.9 (0.7-1.0) | 0.1 | 0.31 |
| Total cholesterol (mg/dL) | 190 (169-214) | 201 (157-230) | 0.62 | 0.92 |
| LDL cholesterol (mg/dL) | 106 (97-124) | 114 (81-136) | 0.87 | 0.97 |
| HDL cholesterol (mg/dL) | 55 (50-63) | 56 (49-75) | 0.56 | 0.92 |
| Triglycerides (mg/dL) | 106 (78-155) | 75 (61-176) | 0.38 | 0.71 |
| 25-hydroxy-vitamin D (mg/dL) | 28.2 (21.7-36) | 28.4 (21.9-33.9) | 0.93 | 0.99 |
| Markers of innate immunity | | | | |
| Inflammation | | | | |
| hs-CRP (b) (mg/L) | 0.18 (0.06-0.47) | 0.08 (0.04-0.29) | 0.24 | 0.54 |
| IL6 (c) (pg/mL) | 2 (2-2) | 2 (1-2.6) | 0.51 | 0.89 |
| Thrombosis | | | | |
| D-dimers (ng/mL) | 199 (168-301) | 212 (122-304) | 0.87 | 0.97 |
| Bacterial translocation | | | | |
| sCD14 (d) (ng/mL) | 1663 (1483-1958) | 1439.5 (1263-1516) | 0.05 | 0.20 |
| BPI (e) (ng/mL) | 28.3 (3.0-113.6) | 10.0 (4.9-12.3) | 0.66 | 0.92 |
| Endothelial function | | | | |
| ADMA (µM/L) | 0.97 (0.89-1.12) | 1.10 (1.05-1.10) | 0.25 | 0.54 |
| Markers of adaptive immunity | | | | |
| T cell markers | | | | |
| CD4+ T cells | | | | |
| %HLADR+CD38+ | 2.2 (1.8-2.6) | 1.1 (0.6-1.2) | <0.001 | 0.01 |
| %CD38+ | 16.1 (14.3-22.8) | 11.6 (10.7-13.2) | <0.001 | 0.01 |
| %CD25+ | 4.3 (3.7-6.6) | 2.8 (1.9-4.1) | 0.01 | 0.06 |
| %CD57+ | 5.7 (4.0-9.5) | 2.5 (1.6-5.7) | 0.04 | 0.19 |
| CD8+ T cells | | | | |
| %HLADR+CD38+ | 3.6 (2.5-7.1) | 1.5 (1.1-1.7) | <0.001 | 0.01 |
| %CD38+ | 7.3 (6.1-12.9) | 5.4 (4.0-8.4) | 0.01 | 0.06 |
| %CD25+ | 0.4 (0.3-0.7) | 5.4 (4.0-8.4) | 0.29 | 0.58 |
| %CD57+ | 26.5 (17.5-41.8) | 23.1 (15.8-43.8) | 0.77 | 0.97 |
| Thymic function | | | | |
| sj/β-TREC ratio | 5.7 (0-13.6) | 18.5 (3.2-57.8) | 0.06 | 0.21 |

All values are expressed as median (P25-P75).
(a) Analysis was performed using a Wilcoxon rank-sum test. P is probability at α=0.05.
(b) High-sensitivity C reactive protein.
(c) Interleukin-6.
(d) Soluble CD14.
(e) Bactericidal-permeability increasing protein.
(f) p-value adjusted according to the Benjamini-Hochberg method.
Supplementary Table 2. Diversity parameters of microbiota

| Parameter | Patients on HAART (a) | Controls (a) | p-value (b) | q-value (c) |
|---|---|---|---|---|
| OTU level | | | | |
| Shannon index | 5.96 ± 1.03 | 7.00 ± 0.51 | 0.01 | 0.04 |
| Chao1 estimator | 567.69 ± 175.21 | 776.27 ± 166.63 | 0.02 | 0.05 |
| ACE estimator | 563.69 ± 176.44 | 794.90 ± 172.46 | 0.01 | 0.04 |
| Genus level | | | | |
| Shannon index | 1.82 ± 0.32 | 2.03 ± 0.23 | 0.43 | 0.60 |
| Chao1 estimator | 29.28 ± 7.22 | 27.39 ± 6.03 | 0.62 | 0.72 |
| ACE estimator | 30.57 ± 7.56 | 29 ± 5.68 | 0.86 | 0.86 |
| Bacterial density | | | | |
| Number of 16S rRNA gene copies/ng DNA | 1451434.41 ± 899075.31 | 762212.50 ± 317670.42 | 0.03 | 0.05 |

(a) Values are expressed as mean ± standard deviation (SD).
(b) Analysis was performed using a Wilcoxon rank-sum test. P is probability at α=0.05.
(c) p-value adjusted according to the Benjamini-Hochberg method.
Supplementary Table 3. LEfSe biomarker statistics for KEGG pathways

| Condition | Biomarker pathway | LogLDA | p-value (a) | %Control coverage (b) | %Case coverage (c) |
|---|---|---|---|---|---|
| Control | Starch and sucrose metabolism [PATH:ko00500] | 3.08 | 0.03 | 51.04 | 47.92 |
| Control | Glycolysis / Gluconeogenesis [PATH:ko00010] | 2.96 | 0.01 | 60.44 | 50.55 |
| Control | Valine, leucine, and isoleucine degradation [PATH:ko00280] | 2.94 | 0.00 | 44.26 | 29.51 |
| Control | Lysosome [PATH:ko04142] | 2.89 | 0.04 | 18.18 | 8.08 |
| Control | Pyruvate metabolism [PATH:ko00620] | 2.87 | 0.04 | 78.95 | 65.79 |
| Control | Glycine, serine, and threonine metabolism [PATH:ko00260] | 2.87 | 0.04 | 45.88 | 42.35 |
| Control | Fatty acid metabolism [PATH:ko00071] | 2.75 | 0.01 | 34.69 | 24.49 |
| Control | Histidine metabolism [PATH:ko00340] | 2.66 | 0.04 | 51.28 | 56.41 |
| Control | PPAR signaling pathway [PATH:ko03320] | 2.63 | 0.01 | 7.14 | 5.36 |
| Control | Ascorbate and aldarate metabolism [PATH:ko00053] | 2.59 | 0.01 | 32.43 | 24.32 |
| Control | Tryptophan metabolism [PATH:ko00380] | 2.55 | 0.00 | 16.92 | 15.38 |
| Control | Polycyclic aromatic hydrocarbon degradation [PATH:ko00624] | 2.55 | 0.01 | 9.68 | 6.45 |
| Control | Lysine degradation [PATH:ko00310] | 2.50 | 0.01 | 23.33 | 20.00 |
| Control | Caprolactam degradation [PATH:ko00930] | 2.49 | 0.01 | 28.57 | 14.29 |
| Control | Dioxin degradation [PATH:ko00621] | 2.47 | 0.01 | 24.00 | 8.00 |
| Control | Xylene degradation [PATH:ko00622] | 2.47 | 0.01 | 18.75 | 6.25 |
| Control | Benzoate degradation [PATH:ko00362] | 2.47 | 0.03 | 17.24 | 11.49 |
| Control | Steroid hormone biosynthesis [PATH:ko00140] | 2.44 | 0.00 | 10.53 | 10.53 |
| Control | MAPK signaling pathway - yeast [PATH:ko04011] | 2.40 | 0.02 | 2.44 | 2.44 |
| Control | Naphthalene degradation [PATH:ko00626] | 2.30 | 0.02 | 18.75 | 15.63 |
| Control | Phosphonate and phosphinate metabolism [PATH:ko00440] | 2.30 | 0.03 | 17.65 | 14.71 |
| Control | Proximal tubule bicarbonate reclamation [PATH:ko04964] | 2.29 | 0.03 | 25.00 | 25.00 |
| Control | Geraniol degradation [PATH:ko00281] | 2.21 | 0.04 | 37.50 | 37.50 |
| Case | Ribosome [PATH:ko03010] | 3.19 | 0.01 | 52.78 | 40.28 |
| Case | Lipopolysaccharide biosynthesis [PATH:ko00540] | 3.15 | 0.00 | 58.82 | 44.12 |
| Case | Phenylalanine, tyrosine, and tryptophan biosynthesis [PATH:ko00400] | 2.87 | 0.00 | 53.73 | 53.73 |
| Case | Vibrio cholerae pathogenic cycle [PATH:ko05111] | 2.87 | 0.01 | 13.95 | 20.93 |
| Case | Legionellosis [PATH:ko05134] | 2.87 | 0.00 | 7.35 | 7.35 |
| Case | Terpenoid backbone biosynthesis [PATH:ko00900] | 2.79 | 0.04 | 41.67 | 33.33 |
| Case | Fatty acid biosynthesis [PATH:ko00061] | 2.68 | 0.04 | 53.33 | 46.67 |
| Case | Nicotinate and nicotinamide metabolism [PATH:ko00760] | 2.68 | 0.03 | 53.66 | 53.66 |
| Case | Thiamine metabolism [PATH:ko00730] | 2.59 | 0.01 | 59.09 | 63.64 |
| Case | Ubiquinone and other terpenoid-quinone biosynthesis [PATH:ko00130] | 2.52 | 0.04 | 36.36 | 25.00 |
| Case | Zeatin biosynthesis [PATH:ko00908] | 2.47 | 0.01 | 12.50 | 12.50 |
| Case | Toluene degradation [PATH:ko00623] | 2.47 | 0.01 | 17.65 | 11.76 |

(a) Analysis was performed using a Wilcoxon rank-sum test. P is probability at α=0.05.
(b) The percentage of control coverage was calculated as the observed number of KOs per pathway divided by the total number of KOs for each condition.
(c) The percentage of case coverage was calculated as the observed number of KOs per pathway divided by the total number of KOs for each condition.