Supplementary information SI Materials and Methods

Subject’s selection
BV status was assessed for all subjects using the Amsel clinical criteria [1] and confirmed using Gram-stain criteria (Nugent scores) [2]. The inclusion and exclusion criteria for these patients were as previously described. Participants who met three or more of the following criteria were clinically diagnosed with BV: a homogeneous, milky vaginal discharge; > 20% clue cells on wet mount; a vaginal discharge with an elevated pH (≥ 4.5); and release of a fishy amine odor upon addition of 10% potassium hydroxide (KOH) solution to the vaginal fluid (the so-called “whiff” test). The Nugent scoring system involves performing a Gram stain on a vaginal smear and enumerating lactobacilli versus Gram-negative rods and other bacterial morphotypes. Only participants with a Gram stain score ≥ 7 were confirmed to have BV. Participants without these changes were defined as the healthy control group. Participants who met any of the following exclusion criteria were excluded: < 18 years of age; pregnancy; diabetes mellitus; use of probiotics, prebiotics, synbiotics, antibiotics, or other vaginal antimicrobials (orally or by topical application in the vulvar/vaginal area) in the previous month; menstruation; presence of an intrauterine device; vaginal intercourse within the previous 3 days; known active infection due to Chlamydia, yeast, Neisseria gonorrhoeae, or Trichomonas vaginalis; clinically apparent herpes simplex infection; or diagnosed HPV, HSV-2, or HIV-1 infection.
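The two-step diagnostic rule described above (three or more Amsel criteria, confirmed by a Nugent score ≥ 7) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the class and field names are hypothetical, and the "indeterminate" label for discordant Amsel and Nugent results is an assumption, since the text does not say how such cases were handled.

```python
from dataclasses import dataclass


@dataclass
class AmselFindings:
    """The four clinical signs listed in the text (field names are illustrative)."""
    homogeneous_discharge: bool
    clue_cell_fraction: float  # fraction of epithelial cells that are clue cells
    vaginal_ph: float
    whiff_test_positive: bool


def amsel_positive_count(findings: AmselFindings) -> int:
    """Count how many of the four Amsel criteria are met."""
    return sum([
        findings.homogeneous_discharge,
        findings.clue_cell_fraction > 0.20,  # > 20% clue cells on wet mount
        findings.vaginal_ph >= 4.5,          # threshold as stated in the text
        findings.whiff_test_positive,
    ])


def classify(findings: AmselFindings, nugent_score: int) -> str:
    """Return 'BV', 'healthy', or 'indeterminate'.

    'indeterminate' (Amsel and Nugent disagree) is an assumption of this
    sketch; the text does not describe discordant cases.
    """
    amsel_bv = amsel_positive_count(findings) >= 3
    nugent_bv = nugent_score >= 7
    if amsel_bv and nugent_bv:
        return "BV"
    if not amsel_bv and not nugent_bv:
        return "healthy"
    return "indeterminate"
```

For example, a participant with all four Amsel signs and a Nugent score of 8 would be classified as "BV", while the same clinical findings with a Nugent score of 5 would fall into the hypothetical "indeterminate" category.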
Sample preparation
When women underwent genital examinations before and after treatment, three swabs were taken from each woman near the mid-vagina using sterile swabs, packaged, and placed in ice packs. Two swabs were used to assess BV status with the Amsel clinical criteria and the Nugent scoring system. The third swab, used for bacterial genomic DNA extraction, was immediately transferred to the laboratory in an ice box and, after preparation within 15 min, stored at -80°C for further analysis.
Total bacterial genomic DNA extraction
The bacterial cells retrieved on swabs were submerged in 1 ml of sterile normal saline (prepared with RNase-free H2O [pH 7.0]) and vigorously agitated to dislodge the cells; unused vaginal swabs served as negative controls and were handled in the same way. The cells were pelleted by centrifugation (Thermo Electron Corporation, Boston, MA, USA) at full speed (≥ 10,000 g) for 10 min, washed by re-suspension in sterile normal saline, and centrifuged again at full speed for 5 min. Bacterial DNA was then extracted using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions, with the following minor modification: samples were agitated with 100 mg of zirconium beads (0.1 mm) in a Mini-beadbeater (FastPrep, Thermo Electron Corporation, USA) for 2 min and incubated at 56°C for 1 hour in lysis solution containing proteinase K [3, 4]. The concentration of the extracted DNA was determined using a NanoDrop ND-1000 spectrophotometer (Thermo Electron Corporation); its integrity and size were checked by electrophoresis on a 1.0% agarose gel containing 0.5 mg/ml ethidium bromide. All DNA was stored at -20°C before further analysis.
PCR Amplification and Sample Pooling for 454 pyrosequencing
PCR amplification of the hypervariable V3 region of the 16S rRNA gene was performed with universal bacterial primers corresponding to positions 341 to 534 in Escherichia coli. Amplicon pyrosequencing was performed with standard 454/Roche GS-FLX Titanium protocols. To pool and sort multiple samples in a single 454 GS-FLX run, we used a unique 8-nucleotide barcode to identify each sample (Table S5); the barcodes were designed according to Fierer et al. [5-7]. The main design criterion is that adjoining nucleotides must differ, because single-nucleotide repeats are the main source of errors in pyrosequencing. The resulting forward primer was a fusion of the 454 Life Sciences adaptor A, the barcode, and 341F (5’-GCCTCCCTCGCGCCATCAG-NNNNNNNN-ATTACCGCGGCTGCTGG-3’). The resulting reverse primer was a fusion of the 454 Life Sciences adaptor B, the same barcode as in the forward primer, and 534R (5’-GCCTTGCCAGCCCGCTCAG-NNNNNNNN-CCTACGGGAGGCAGCAG-3’). A PCR amplicon library was created for each individual DNA sample. The amplification mix contained 1.25 U of Hot Start Taq polymerase (Takara, Dalian, China), 1 × PCR buffer (2.5 mM MgCl2 included), 3 pmol of each primer, 200 mM each deoxynucleoside triphosphate (dNTP), and 1 µl of extracted bacterial DNA in a total volume of 50 µl. Samples were initially denatured at 94 °C for 5 min, then amplified using 30 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s. A final extension of 7 min at 72 °C was added at the end of the program to ensure complete amplification of the target region. Negative controls (both no-template and template from unused swabs) were included in all steps of the process to check for primer or sample DNA contamination. Before pooling these 100 samples, PCR products were purified by electrophoresis on a 1% agarose gel and eluted with the QIAquick Gel Extraction Kit (QIAGEN). The concentration of each PCR product was measured three times using a NanoDrop ND-1000 spectrophotometer (Thermo Electron Corporation) and then quantified by on-chip gel electrophoresis with an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA) and the DNA LabChip Kit 7500. Individual amplicon libraries were pooled in equimolar amounts, subjected to emulsion PCR, and sequenced unidirectionally in the reverse direction (B-adaptor) on the Genome Sequencer FLX (GS-FLX) system (Roche, Basel, Switzerland) at 454 Life Sciences. Because samples were pooled by equal mass, variation in the number of sequences recovered from each sample likely reflects slight biases in PCR efficiency among primer barcodes.
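The barcode rule (adjoining nucleotides must differ) and the adaptor-barcode-primer fusion described above can be sketched as follows. The adaptor and primer sequences are copied from the text; the function names and the example barcode are hypothetical, and the actual barcodes are those listed in Table S5.

```python
# Sequences as given in the text: adaptor, then an 8-nt barcode, then the
# template-specific primer.
ADAPTOR_A = "GCCTCCCTCGCGCCATCAG"
ADAPTOR_B = "GCCTTGCCAGCCCGCTCAG"
PRIMER_WITH_A = "ATTACCGCGGCTGCTGG"   # primer fused to adaptor A in the text
PRIMER_WITH_B = "CCTACGGGAGGCAGCAG"   # primer fused to adaptor B in the text


def has_no_adjacent_repeat(barcode: str) -> bool:
    """True if no two neighbouring nucleotides are identical (no homopolymer)."""
    return all(a != b for a, b in zip(barcode, barcode[1:]))


def is_valid_barcode(barcode: str, length: int = 8) -> bool:
    """Check length, alphabet, and the no-adjacent-repeat design criterion."""
    return (
        len(barcode) == length
        and set(barcode) <= set("ACGT")
        and has_no_adjacent_repeat(barcode)
    )


def fusion_primers(barcode: str) -> tuple:
    """Build the two fusion primers (5'->3') that share one sample barcode."""
    if not is_valid_barcode(barcode):
        raise ValueError(f"barcode {barcode!r} violates the design criteria")
    return (
        ADAPTOR_A + barcode + PRIMER_WITH_A,
        ADAPTOR_B + barcode + PRIMER_WITH_B,
    )
```

Calling `fusion_primers("ACGTACGT")` with this hypothetical barcode returns the two full-length oligonucleotides, whereas a barcode such as "AACGTACG" is rejected because its first two bases form a homopolymer.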
Sequence processing pipeline for 454 pyrosequencing reads
Initially, all pyrosequencing reads were screened and filtered for quality and length using customized Perl scripts. Raw sequences were processed and analyzed following the procedure described previously [7]. Sequences were included in the subsequent analysis only if they met all four of the following criteria: (1) the sequence carried the correct barcode and an exact match to the primer in at least one end; (2) the sequence carried the correct primer sequence in the other end, even if the barcode was absent; (3) the sequence was longer than 160 nucleotides (excluding barcode and primer A sequences) [8]; and (4) the sequence contained no ambiguous bases (Ns). Because all samples were pooled into a single sequencing reaction, the barcodes allowed reads from each individual sample to be identified, so that the sequences from each sample could be analyzed separately. Sequencing reads were derived directly from the FLX sequencing run output files. The resulting multi-FASTA file contained 498,814 total high-quality reads. Based on the sample barcode, the sequences specific to each sample were extracted from the multi-FASTA file into individual FASTA files and relabeled to denote the original sample. To facilitate the subsequent analysis pipeline, the script then trimmed off the forward primer and barcode sequences after alignment and oriented the remaining sequence so that all sequences began with the 5’ end according to standard sense-strand conventions. We included only sequences with the forward primer motif to ensure that the highly informative V3 region was available for taxonomic assignment.
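The four inclusion criteria and the barcode-based sorting described above can be sketched in Python as follows. The original work used customized Perl scripts that are not shown, so this is only an illustration under stated assumptions: reads are assumed to start with the barcode followed by the B-side primer, the distal primer is assumed to appear as its reverse complement anywhere downstream, and all function and variable names are hypothetical.

```python
PRIMER_NEAR = "CCTACGGGAGGCAGCAG"  # primer expected right after the barcode
PRIMER_FAR = "ATTACCGCGGCTGCTGG"   # primer expected (reverse-complemented) at the far end


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGTN."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def demultiplex_read(read, barcodes, near_primer=PRIMER_NEAR,
                     far_primer=PRIMER_FAR, barcode_len=8, min_len=160):
    """Return (sample_name, trimmed_sequence), or None if the read is rejected.

    `barcodes` maps each barcode string to a sample name.
    """
    if "N" in read:                                   # criterion 4: no ambiguous bases
        return None
    sample = barcodes.get(read[:barcode_len])         # criterion 1: known barcode ...
    if sample is None:
        return None
    rest = read[barcode_len:]
    if not rest.startswith(near_primer):              # ... and exact primer match
        return None
    trimmed = rest[len(near_primer):]                 # barcode and near primer removed
    if reverse_complement(far_primer) not in trimmed:  # criterion 2: distal primer present
        return None
    if len(trimmed) <= min_len:                       # criterion 3: longer than 160 nt
        return None
    return sample, trimmed
```

Applying this function to every read and grouping the results by the returned sample name would reproduce the per-sample FASTA split described in the text, under the assumptions listed above.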
Phylogenetic assignment, alignment and clustering of 16S rRNA gene fragments
The qualified 16S rRNA gene fragments were phylogenetically assigned according to their best BLASTn matches, using WND-BLAST [9] and a curated 16S database derived from high-quality 16S sequences obtained from the RDP-II database [10]. Phylogenetic assignments were also evaluated using the Nearest Alignment Space Termination (NAST, http://greengenes.lbl.gov/NAST) database [11]. Multiple sequence alignment was done using MOTHUR (version 1.20.0; http://www.mothur.org/) [12], a newly updated version of DOTUR designed as a platform for aligning 16S rRNA gene sequences, calculating pairwise distances, and analyzing the resulting distance matrices. Vertical gaps were removed from the alignment with the MOTHUR filter.seqs command. Based on the alignment, a distance matrix at a given % dissimilarity was constructed using the MOTHUR dist.seqs command. These pairwise distances served as input to MOTHUR for clustering the sequences into OTUs of defined sequence similarity that ranged from unique to 3% dissimilarity. The proportion of OTUs shared among the communities was determined using MOTHUR, which takes its .list output files as input and determines the fraction of OTUs shared by communities as a function of genetic distance. A distance matrix was then generated using the MOTHUR tree.shared command. These clusters served as OTUs for generating predictive rarefaction models and for calculating the richness (diversity) indexes ACE and Chao1 [13] in MOTHUR. These programs were run on a SUSE Linux Enterprise 10 machine, 24 quad core 48 processors at 3.0 GHz with 128 GB of RAM.
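The distance-then-cluster step described above can be illustrated with a small pure-Python sketch. It is not MOTHUR's implementation: the distance here is a simple mismatch fraction over aligned columns (MOTHUR's dist.seqs offers its own gap-handling options), and furthest-neighbor (complete-linkage) merging is an assumption of this sketch, since the paragraph does not name the clustering method. All function names are hypothetical.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Fraction of mismatched columns between two aligned sequences.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def cluster_furthest_neighbor(seqs, cutoff=0.03):
    """Group aligned sequences (name -> sequence) into OTUs.

    Two clusters are merged only while the largest distance between any pair
    of their members stays at or below `cutoff` (0.03 = 3% dissimilarity).
    """
    names = list(seqs)
    dist = {
        frozenset((a, b)): pairwise_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }
    clusters = [{name} for name in names]

    def linkage(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

This naive version recomputes linkages on every pass and is only practical for a handful of sequences; it is meant to show what "OTUs at 3% dissimilarity" means, not to replace a dedicated tool.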
Statistical data analysis of OTU richness: rarefaction, Chao 1 and ACE
To estimate species richness and diversity, taxonomy-independent methods were used. Clustering was done with a given % dissimilarity for inclusion into an OTU and was performed on alignments of sequences from individual participants. The matrices were used to define operational taxonomic units with 1% dissimilarity for determination of the coverage percentage by Good’s method. Species richness and the relative abundance of species (evenness) were estimated by further sampling-based (rarefaction) analyses of OTU data and from the calculated Shannon and Simpson diversity indices. These clusters served as OTUs for generating rarefaction curves and for calculating the richness and diversity indexes, the abundance-based coverage estimator (ACE) and the bias-corrected Chao1 richness estimator, in MOTHUR at each dissimilarity level. The Shannon index characterizes diversity based on the number of species present (species richness). The Shannon index of evenness was calculated with the formula E = H/ln(S), where H is the Shannon diversity index and S is the total number of sequences in that group. This metric is insensitive to taxa richness and ranges from 0 to 1, with 0 representing complete dominance and 1 representing an evenly structured community. Good’s coverage percentage was calculated as [1 - (n/N)] × 100, where n represents the number of single-member phylotypes and N represents the number of sequences. The resulting tables of OTU clusters versus dataset and primer were the source data for the Venn diagrams. We plotted our Venn diagrams using the Venn Diagram Plotter program written by Littlefield and Monroe at the Department of Energy, PNNL, Richland, WA. Taxonomy-based analyses were performed by assigning taxonomic status to each sequence using the Naïve Bayesian CLASSIFIER program of the Michigan State University Center for Microbial Ecology Ribosomal Database Project (RDP) database (http://rdp.cme.msu.edu/) [14] with a 50% bootstrap score. Sequences were aligned using the INFERNAL aligner, first for each individual participant and then as pooled sequences from all participants of a single group. Cluster analysis was performed using the complete linkage clustering algorithm available through the Pyrosequencing pipeline of the Ribosomal Database Project [15]. The neighbor-joining tree was constructed using the MEGA 4.0 program based on the Jukes-Cantor model and used for UniFrac analysis. Principal coordinates analysis (PCA) of the vaginal bacterial communities obtained by pyrosequencing from the 115 samples in 4 groups was performed using the R program. Statistical analyses of the Shannon and Simpson indices were performed using SPSS Data Analysis Program version 16.0 (SPSS Inc, Chicago, IL) with one-way ANOVA. All tests for significance were two-sided, and p values < 0.05 were considered statistically significant.
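The indices named above can be computed from a list of per-OTU sequence counts, as in the following sketch. It is not MOTHUR's code and the function names are hypothetical. The evenness function takes S as an explicit argument: the text defines S as the total number of sequences in the group, whereas Pielou's evenness is usually written with S as the number of observed OTUs, so the caller chooses which definition to apply. The Chao1 function uses the standard bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTU proportions p_i."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts, s):
    """E = H / ln(S); the meaning of S is chosen by the caller (see lead-in)."""
    return shannon_index(counts) / math.log(s)


def goods_coverage(counts):
    """Good's coverage percentage, [1 - (n/N)] * 100.

    n is the number of single-member OTUs and N the number of sequences.
    """
    singletons = sum(1 for c in counts if c == 1)
    return (1 - singletons / sum(counts)) * 100


def chao1_bias_corrected(counts):
    """Bias-corrected Chao1 richness estimate."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in counts if c == 2)  # doubleton OTUs
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))
```

As a worked case with hypothetical counts [5, 3, 1, 1], N is 10 and two OTUs are singletons, so Good's coverage is (1 - 2/10) × 100 = 80%, and the bias-corrected Chao1 estimate is 4 + (2 × 1) / (2 × 1) = 5.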
Real-time qPCR for vaginal microbiota
To estimate bacterial copy numbers in the vaginal samples accurately and to validate the genus-level relative abundances determined by 454 pyrosequencing, 16S rRNA gene-targeted quantitative PCR (qPCR) was performed with Power SYBR Green PCR Master Mix (Takara, Dalian, China) on an ABI 7900 Real-time PCR instrument (Applied Biosystems, Foster City, CA) according to the manufacturer’s instructions. Species-specific primer sets were chosen to quantify total bacteria, the Lactobacillus genus, L. iners, L. crispatus, L. jensenii, Gardnerella vaginalis, Atopobium vaginae, Eggerthella sp., Megasphaera type I sp., Leptotrichia/Sneathia sp., and Prevotella sp. (Table S6). For each primer set, a constructed plasmid was used to create a 10-log-fold standard curve for direct quantification of all samples. With the exception of total domain Bacteria and the Lactobacillus genus, all standard curve genes were amplified from the vaginal samples, cloned into plasmids, and sequenced, and the source organism was confirmed by BLAST against GenBank. For total domain Bacteria and the Lactobacillus genus, Escherichia coli ATCC 25922 and Lactobacillus casei ATCC 27139, respectively, were used to create the plasmid standards. For each, the product was cloned into the pMD18-T vector using the Simple TA Cloning Kit (Takara, Dalian, China) following the manufacturer’s procedure. Purified insert-containing plasmids were quantified using a NanoDrop ND-1000 spectrophotometer (Thermo Electron Corporation), and the number of target gene copies was calculated from the mass of DNA, taking into account the size of the product insert. Tenfold serial dilutions ranging from 1 × 10^9 to 1 gene copies were included on each 96-well plate. Each subject’s extracted DNA was subjected to a human β-globin PCR to ensure that amplifiable DNA was successfully extracted from the sample and to monitor for PCR inhibitors, using the same protocol listed for bacterial PCR [13]. Each qPCR contained 12.5 µL of 2 × Takara Perfect Real Time master mix, 10.9 µL of water, 0.3 µL of a 10 µM F/R primer mix, and 1 µL of extracted bacterial genomic DNA. Cycling conditions were 95 °C for 3 min, followed by 40 cycles of 94 °C for 30 s, 30 s of annealing at different temperatures, and 72 °C for 30 s. At each cycle, accumulation of PCR products was detected by monitoring the increase in fluorescence of the reporter dye, dsDNA-binding SYBR Green. Following amplification, melting temperature analysis of the PCR products was performed to determine the specificity of the PCR. Melting curves were obtained from 55 °C to 90 °C, with continuous fluorescence measurements taken at every 1 °C increase in temperature. Data analysis was conducted with Sequence Detection Software version 1.6.3, supplied by Applied Biosystems. All reactions were carried out in triplicate, and a no-template control was included in every analysis. In addition, the abundance of each group relative to the total domain Bacteria gene copy number was calculated for each replicate, and the mean, standard deviation, and statistical significance were determined. Comparisons between BV-M and BV-L women were made with unpaired t-tests (SPSS Data Analysis Program version 16.0, SPSS Inc, Chicago, IL) and were considered statistically significant if p < 0.05.
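The quantification steps described above (plasmid copy number from DNA mass, a standard curve over tenfold dilutions, and abundance relative to total Bacteria) can be sketched as follows. The analysis itself was done in the instrument software, so this is only an illustration. The average mass of 660 g/mol per base pair is a common approximation and an assumption here, and the function names, the plasmid length, and the Ct values in the usage note are hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
GRAMS_PER_MOL_PER_BP = 660.0  # assumed average mass of one double-stranded base pair


def plasmid_copies(mass_ng, plasmid_bp):
    """Copies of a plasmid (vector plus insert, `plasmid_bp` long) in `mass_ng` of DNA."""
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (plasmid_bp * GRAMS_PER_MOL_PER_BP)


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number in a sample."""
    return 10 ** ((ct - intercept) / slope)


def relative_abundance(target_copies, total_bacteria_copies):
    """Target gene copies as a fraction of total Bacteria gene copies."""
    return target_copies / total_bacteria_copies
```

For instance, fitting the curve to ten standards at log10 copy numbers 0 through 9 with made-up Ct values of 40 - 3.32 × log10(copies) returns a slope of about -3.32 and an intercept of about 40, and a sample Ct of 23.4 then maps back to roughly 10^5 copies.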
References
1. Amsel R, Totten PA, Spiegel CA, Chen KC, Eschenbach D, Holmes KK (1983) Nonspecific vaginitis. Diagnostic criteria and microbial and epidemiologic associations. Am J Med, 74:14-22.
2. Nugent RP, Krohn MA, Hillier SL (1991) Reliability of diagnosing bacterial vaginosis is improved by a standardized method of gram stain interpretation. J Clin Microbiol, 29:297-301.
3. Ling Z, Kong J, Liu F, Zhu H, Chen X, Wang Y, Li L, Nelson KE, Xia Y, Xiang C (2010) Molecular analysis of the diversity of vaginal microbiota associated with bacterial vaginosis. BMC Genomics, 11:488.
4. Ling Z, Kong J, Jia P, Wei C, Wang Y, Pan Z, Huang W, Li L, Chen H, Xiang C (2010) Analysis of oral microbiota in children with dental caries by PCR-DGGE and barcoded pyrosequencing. Microb Ecol, 60:677-690.
5. Fierer N, Hamady M, Lauber CL, Knight R (2008) The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci U S A, 105:17994-17999.
6. Parameswaran P, Jalili R, Tao L, Shokralla S, Gharizadeh B, Ronaghi M, Fire AZ (2007) A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Res, 35:e130.
7. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R (2008) Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods, 5:235-237.
8. Roh SW, Kim KH, Nam YD, Chang HW, Park EJ, Bae JW (2010) Investigation of archaeal and bacterial diversity in fermented seafood using barcoded pyrosequencing. ISME J, 4:1-16.
9. Dowd SE, Zaragoza J, Rodriguez JR, Oliver MJ, Payton PR (2005) Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST). BMC Bioinformatics, 6:93.
10. Cole JR, Chai B, Farris RJ, Wang Q, Kulam-Syed-Mohideen AS, McGarrell DM, Bandela AM, Cardenas E, Garrity GM, Tiedje JM (2007) The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res, 35:D169-172.
11. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol, 72:5069-5072.
12. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ et al (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 75:7537-7541.
13. Chao A, Bunge J (2002) Estimating the number of species in a stochastic abundance model. Biometrics, 58:531-539.
14. Wang Q, Garrity GM, Tiedje JM, Cole JR (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol, 73:5261-5267.
15. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM et al (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res, 37:D141-145.