SUPPLEMENTARY INFORMATION

Supplement to: Borgdorff H, Tsivtsivadze E, Verhelst R, Marzorati M, Jurriaans S, Ndayisaba GF, Schuren FH, van de Wijgert JHHM. A Lactobacillus-dominated cervicovaginal microbiome is associated with reduced HIV/STI prevalence and genital HIV viral load in African female sex workers.

Supplementary materials and methods

Specimen collection and diagnostic testing

Diagnostic tests for sexually transmitted infections (STIs), bacterial vaginosis (BV), candidiasis, pregnancy, and cervical cytology were conducted at regular intervals throughout the study, including the month 6 visit (M6; all) and the year 2 visit (Yr2; all except for bacterial STIs). The median time between the M6 and Yr2 visits was 17 months (range 14-20 months). Testing was conducted on-site at Rinda Ubuzima in Kigali, Rwanda, unless stated otherwise. Whole blood was tested for HIV using the First Response rapid test (Premier Medical Corporation, Nani Daman, India), followed by the Uni-Gold rapid test (Trinity Biotech Plc, Bray, Ireland) when the first test was positive, and the Capillus HIV-1/HIV-2 rapid test (Trinity Biotech Plc, Bray, Ireland) as tie-breaker if needed. CD4 count of HIV-positive samples was determined using CD4 cytometry (testing done at the National Reference Laboratory in Kigali). Plasma specimens were tested for HSV-2 (HerpeSelect 2 ELISA, Focus Diagnostics Inc, Cypress, USA; with index ≥3.5 defined as positive) and syphilis serology (Spinreact Rapid Plasma Reagin test with confirmation by Spinreact T. pallidum Haemagglutination test, Girona, Spain). Spatulas and cytobrushes were used to prepare a slide for conventional cytology and to rinse in Preservcyt medium (ThinPrep Pap Test; Cytyc Corporation, Boxborough, USA).
The Preservcyt specimens were stored at -80 °C until batched testing at the end of the study in specialized laboratories for HPV genotyping (Linear Array HPV Genotyping Test, Roche Molecular Systems, USA; testing done at the Institute of Tropical Medicine, Antwerp, Belgium) and phylogenetic microarray analysis (TNO, Zeist, the Netherlands). An endocervical swab was used for N. gonorrhoeae and C. trachomatis testing using the Amplicor CT/NG PCR test (Roche Diagnostic Corp, Indianapolis, USA; testing done at the Institute of Tropical Medicine, Antwerp, Belgium), and a vaginal swab was used for T. vaginalis InPouch (Biomed Diagnostics, White City, USA) testing. Vaginal swabs were also used to prepare a wet mount and Gram stain slide to determine the presence of >20% clue cells, and for Gram stain Nugent scoring. KOH was added to the wet mount slide to detect an amine smell and visualize yeasts. The vaginal pH was measured by pressing a pH paper strip against the vaginal wall (pH range 2-9 with 0.5 increments). A serum pregnancy test (Fortress Diagnostics hCG serum pregnancy test, Antrim, UK) was used to screen for pregnancy.

Microarray design

The first version of the microarray contained 461 DNA hybridization probes targeting microorganisms and 164 positive (16S conserved regions) and negative controls. The probes (approximately 20 base pairs targeting 16S or 18S rRNA gene sequences unique for certain species or higher taxonomic levels, or groEL sequences targeting Enterobacteriaceae) were designed based on available literature (Dols et al., 2011). We used various methods to determine the validity and specificity of the probes, including the ‘probe match’ function of the Ribosomal Database Project database (RDP, release 10, Cole et al., 2009), hybridization with well-characterized culture isolates, comparisons of duplicate microarray runs, and comparisons with quantitative PCR results (data not shown).
Of the 461 probes, 68 16S probes were aspecific and 210 probes never gave a signal/background (S/B) ratio >5. Of the remaining 251 probes, 66 16S probes were species-specific, 56 16S probes targeted multiple bacterial species within one genus, 36 16S probes were specific at family or order level, 69 targeted higher taxonomic levels, 5 were groEL probes, 16 were 18S probes, and 3 were viral probes. The 18S probes included eukaryotes in the kingdoms Fungi, Metazoa, and Alveolata, including Candida glabrata, but excluding Candida albicans and Trichomonas vaginalis. The latter were assessed by separate diagnostic tests (see above). A complete list of the 251 probes is provided in the supplementary data file.

The second version of the microarray contained 42 additional probes targeting micro-organisms that were selected after 454-sequencing of a highly diverse set of 100 cervicovaginal samples (from African, American and European women of different ages with and without HIV, other STIs, pelvic inflammatory disease and BV) using 16S, 18S, and internal transcribed spacer (ITS) primers. Most samples in this study were analyzed on the first version of the microarray, and the 42 additional probes did not provide important additional information: 11 of these probes were species- or genus-specific but only 2 returned S/B ratios with variation between samples (one targeting G. vaginalis, which was already sufficiently covered by other probes, and one targeting Actinomyces odontolyticus/ A. marimammalium/ A. canis, returning only low signals). We therefore only present data on the 461 probes that were present on both versions of the microarray.

Probe names

The probe names in the manuscript are based on the results of ‘probe-match’ searches in RDP, allowing a maximum of one mismatch. When possible, only bacteria identified at the species or genus level in RDP were used in our analyses and figures. A few probes were classified as ‘uncultured bacterium’ in RDP.
We also refer to these bacteria as ‘uncultured’. A probe targeting a bacterium classified by RDP as an uncultured bacterium in the Lachnospiraceae family matched perfectly with a bacterium recently named BV-associated bacterium 1 (BVAB1) in GenBank (GenBank entry AY724739.1, Fredricks et al., 2005). We refer to it as BVAB1 and included it in the 122 species/genus-specific probes. In addition, we made the following assumptions based on existing knowledge about bacterial presence in the vagina: 1) the 16S L. crispatus/ L. kefiranofaciens probe mostly identified L. crispatus; 2) the 16S M. mulieris and A. hongkongensis probe mostly identified M. mulieris; and 3) the 16S probe targeting Escherichia and Shigella spp. as well as Trabulsiella odontotermitis identified only Escherichia and Shigella spp. Probes that were specific for several species within one genus are referred to by one or two species, followed by ‘other’, to improve readability. The following table shows the additional species that are targeted by these probes:

Abbreviated probe names targeting multiple species in the Lactobacillus genus.

Probe name | Species targeted (as determined by RDP)
L. jensenii/L. salivarius/other | L. jensenii/ L. saerimneri/ L. fornicalis/ L. salivarius
L. gasseri/L. johnsonii/other | L. gasseri/ L. johnsonii/ L. taiwanensis/ L. collinoides/ L. acetotolerans/ L. paracollinoides/ L. similis/ L. odoratitofui
L. vaginalis/other | L. vaginalis/ L. collinoides/ L. paracollinoides/ L. similis/ L. odoratitofui/ L. panis/ L. pontis/ L. psittaci/ L. kimchicus
Mycoplasma spp. | Mycoplasma hominis/ pulmonis/ salivarium/ anseris/ subdolum/ cloacale/ arthritidis/ hyosynoviae/ auris/ buccale/ spumans/ gypis/ columborale/ crocodyli/ simbae/ feliminutum
Veillonella spp. | Veillonella dispar/ caviae/ montpellierensis/ rogosae/ atypica/ rodentium/ parvula/ denticariosi
Ureaplasma spp. | Ureaplasma urealyticum/ parvum/ canigenitalium

Two Gardnerella probes are included in the figures and microbiome descriptions: ‘G. vaginalis_1’, which targets a cultured strain with GenBank entry M58744; and ‘Gardnerella uncultured bacterium’, which targets uncultured strains with GenBank entries AY738666 and JX104017. These 16S sequences were discovered after 16S rDNA sequencing of vaginal samples (Fredricks et al., 2005).

Microarray sample preparation and labeling

Microarray sample preparation, labeling, amplification, and hybridization were described previously (Dols et al., 2011). For DNA extraction, we used the AGOWA mag Mini DNA isolation kit (AGOWA GmbH, Berlin, Germany), using 250 μl sample (suspended in Preservcyt medium), 250 μl 0.1 mm zirconium beads (BioSpec Products Inc, Bartlesville, USA), 250 μl phenol, and 100 μl lysis buffer. Cells were lysed by bead beating this suspension in a BeadBeater (BioSpec Products Inc, Bartlesville, USA). After centrifugation at 5000 rpm, the aqueous phase was separated from the phenolic phase using 400 μl binding buffer, 15 μl magnetic beads and a magnetic separator (Dynal MPC-9600, Invitrogen Dynal, Oslo, Norway). The isolated DNA was stored at -20 °C until further analysis.

Amplification and hybridization

The PCR labeling reaction contained 10 μl DNA solution, Qiagen 2X Multiplex PCR Master Mix (Qiagen Inc., Chatsworth, USA), and the primers 16S-F/ unibifi-phospho (25-2.5 pmol/μl), 16S338-F-phospho (25 pmol/μl), 16S-1061-R-Cy3 (25 pmol/μl), 18S-Diat-F-phospho (25 pmol/μl), 18S-Diat-R-Cy3 (25 pmol/μl), Entero(Hsp60)-F-phospho (25 pmol/μl), Entero(Hsp60)-R-Cy3 (25 pmol/μl), HPV-MY09-phospho (25 pmol/μl), HPV-MY11-Cy3 (25 pmol/μl), HSV-gpB-F-phospho (25 pmol/μl), and HSV-gpB-R-Cy3 (25 pmol/μl), in a total volume of 27 μl. The program was 15 minutes at 94 °C, 32 x (20 sec 94 °C / 90 sec 50 °C / 80 sec 72 °C), followed by 5 minutes at 72 °C.
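As a rough check on run length, the hold times of the cycling program above can be added up. The sketch below is not part of the original protocol: the constant and function names are illustrative, and ramp times between temperature steps are ignored, so an actual run would take somewhat longer.

```python
# Hold times taken from the cycling program described above (in seconds).
# Assumption of this sketch: ramp times between temperature steps are ignored.
INITIAL_DENATURATION_S = 15 * 60   # 15 minutes at 94 °C
CYCLES = 32
CYCLE_STEPS_S = (20, 90, 80)       # 94 °C / 50 °C / 72 °C within each cycle
FINAL_EXTENSION_S = 5 * 60         # 5 minutes at 72 °C


def total_hold_time_s() -> int:
    """Return the summed hold time of the whole program in seconds."""
    per_cycle = sum(CYCLE_STEPS_S)  # 190 s per cycle
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


if __name__ == "__main__":
    total = total_hold_time_s()
    print(f"{total} s = {total / 60:.1f} min")  # 7280 s = 121.3 min
```

Under these assumptions the program holds for about two hours, most of it in the 32 amplification cycles.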
The PCR products were purified using G50 Autoseq columns (GE Healthcare, Buckinghamshire, England). The samples were dried and 8.5 μl milli-Q water, 1 μl lambda exonuclease buffer and 0.5 μl lambda exonuclease (5 U/μl, New England Biolabs, Ipswich, USA) were added; the mixture was incubated for 25 minutes at 37 °C and the enzyme was inactivated for 10 minutes at 75 °C. Fifteen μl milli-Q water was added and the samples were purified using the columns and dried again. Forty-five μl Easy Hyb (Roche Applied Science, IN, USA) was added to the samples before they were placed onto a thermoblock for 2 minutes at 95 °C and transferred onto the microarray slides at 37 °C. Each slide contained 12 arrays, and each array contained 625 spots (containing the probes). The samples were hybridized for 4 hours at 37 °C while shaken at 130 rpm. After washing the slides with 4 washing buffers and drying them with N2, the slides were scanned using a ScanArray Express 4000 Scanner at 10 μm pixel size (Perkin-Elmer, Waltham, USA).

Supplementary results

Correlations between microarray probes

As described in the main manuscript, to assess positive and negative correlations between bacteria, we first determined Spearman correlation coefficients (with Bonferroni correction) between genera, followed by those between species within genera that were statistically significantly correlated plus species not yet classified at genus level, such as BVAB1. Most anaerobic bacteria were positively correlated with each other but negatively correlated with Lactobacillus species (Figure S1). P. bivia, other Prevotella species, BVAB1, and the uncultured bacterium in the Gardnerella genus correlated poorly with other anaerobic bacteria.

To correlate microarray findings with Nugent scores, the S/B ratios of 12 Lactobacillus probes (covering 70 species) were summed, the S/B ratios of one G. vaginalis probe and one Bacteroides fragilis probe were summed, and one M. mulieris probe was used.
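The summation and rank correlation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the probe names, S/B values, and Nugent scores are made-up placeholders, Spearman's rho is computed as the Pearson correlation of average ranks, and the Bonferroni step simply multiplies each p-value by the number of tests.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def summed_sb(sample, probes):
    """Sum the S/B ratios of the given probes for one sample."""
    return sum(sample[p] for p in probes)


# Hypothetical S/B ratios for four samples; probe names are placeholders.
SAMPLES = [
    {"Lacto_1": 40.0, "Lacto_2": 25.0, "Gvag": 1.0, "Bfrag": 1.0},
    {"Lacto_1": 30.0, "Lacto_2": 10.0, "Gvag": 2.0, "Bfrag": 1.0},
    {"Lacto_1": 5.0, "Lacto_2": 3.0, "Gvag": 15.0, "Bfrag": 2.0},
    {"Lacto_1": 1.0, "Lacto_2": 1.0, "Gvag": 30.0, "Bfrag": 4.0},
]
NUGENT = [0, 2, 7, 10]  # hypothetical Nugent scores for the same samples
LACTO_PROBES = ("Lacto_1", "Lacto_2")
GV_BF_PROBES = ("Gvag", "Bfrag")

if __name__ == "__main__":
    lacto = [summed_sb(s, LACTO_PROBES) for s in SAMPLES]
    gv_bf = [summed_sb(s, GV_BF_PROBES) for s in SAMPLES]
    print(spearman_rho(lacto, NUGENT))  # -1.0 for this toy data
    print(spearman_rho(gv_bf, NUGENT))  # 1.0 for this toy data
```

The toy data are perfectly monotone, so rho is exactly -1 or 1; real S/B ratios would give intermediate values, and a statistics package would normally supply the accompanying p-values.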
As expected, we found that the probes representing the morphotypes included in Nugent scoring (G. vaginalis and Bacteroides fragilis combined, M. mulieris, and 12 lactobacilli probes covering 70 species) correlated well with the Nugent scores (all p<0.001) (Figure S2). Probes for the anaerobic genera Prevotella, Atopobium, Megasphaera, Dialister, and Sneathia, as well as BVAB1, also correlated well with Nugent scores (all p<0.001).

Positive correlations were found between HIV-1 RNA concentrations and semi-quantitative levels of Prevotella (rho=0.40, p=0.002), Sneathia (rho=0.38, p=0.003), Dialister (rho=0.36, p=0.006), Porphyromonas (rho=0.33, p=0.01), Atopobium (rho=0.30, p=0.02), Arcanobacterium (rho=0.28, p=0.03), Moryella (rho=0.28, p=0.04), and Mobiluncus (rho=0.27, p=0.04) species. Lactobacillus was the only genus with a negative correlation with HIV-1 RNA levels (rho=-0.37, p=0.005). In logistic regression models adjusted for CD4 count, the association between these genera and detectable HIV-1 RNA remained statistically significant, except for the associations with Arcanobacterium, Moryella, and Mobiluncus species.

References

Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ et al. (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141-D145.

Dols JAM, Smit PW, Kort R, Reid G, Schuren FHJ, Tempelman H et al. (2011). Microarray-based identification of clinically relevant vaginal bacteria in relation to bacterial vaginosis. Am J Obstet Gynecol 204: 305.e1-7.

Fredricks DN, Fiedler TL, Marrazzo JM. (2005). Molecular identification of bacteria associated with bacterial vaginosis. N Engl J Med 353: 1899-1911.
Supplementary table and figures

Table S1: Determinants of cervicovaginal microbiome clusters
An expanded version of Table 3, showing sociodemographic, behavioral, and clinical characteristics of women in each of the 6 microbiome clusters as well as the group of women not assigned to a cluster.

Figure S1: Probe correlation matrix
The probes that are shown in the same order on the x- and y-axis correlated significantly with each other (after Bonferroni correction) with a Spearman’s rho of more than 0.5 or less than -0.5. The color scale corresponds to Spearman’s rho values. The (-) sign indicates correlations that were not significant after Bonferroni correction.
¹Abbreviated probe name; additional targeted species in the same genus are listed in the table above.

Figure S2: Correlation between Nugent scoring and probes
Correlation between Nugent scoring and probes representing the morphotypes included in Nugent scoring (G. vaginalis and Bacteroides fragilis combined, M. mulieris, and 12 lactobacilli probes covering 70 species). Graphs show the correlation between (A) the S/B ratio of the respective probes and the Nugent score; (B) the S/B ratio of the respective probes and the G-score, M-score, and L-score (the three components of the Nugent score), respectively. The line represents the median S/B ratio per score, and the error bars represent the interquartile range.

SUPPLEMENTARY DATA FILES

These could not be converted to pdf and are available upon request.
- Microarray dataset (MIAME 1): raw data for each hybridization.
- Microarray dataset (MIAME 2-5): final processed data, including sample annotation, sample data relationships, and probe annotation (targeted micro-organisms per probe; oligo sequences are protected by TNO, Zeist).
- Required information related to MIAME 6 is described in the methods section and supplementary information.