Supporting information - Materials and Methods

DNA extraction from isolated cultures
For bacteria, the enzyme-heat lysis method for preparation of cell-free DNA lysate (Jeyaram et al. 2010) was used, with an additional treatment of 20 U of mutanolysin along with lysozyme (Sigma-Aldrich). For yeasts, these two enzymes were replaced by 50 U of lyticase (Sigma-Aldrich). The DNA content was quantified spectrophotometrically using an ND-1000 spectrophotometer (NanoDrop Technologies, Rockland, USA). Cell-free DNA lysates with an absorbance ratio (A260/280) of 1.8 to 2.2 were used as the template for PCR. For LAB isolates that failed to give PCR amplification, the genomic DNA was extracted using the method developed in this study for metagenomic DNA extraction, as described in the subsequent sections, with the following modifications. Cells equivalent to 2 OD660 units of 24–48 h old cultures were used for the extraction after washing with 0.1 M PBS buffer (pH 6.4). Five KU of lysozyme (Sigma-Aldrich) and 20 U of mutanolysin (Sigma-Aldrich) were used for bacteria, and 50 U of lyticase (Sigma-Aldrich) for yeasts. The initial cell lysis was done at 55 °C for 2 h. DNA was stored at −20 °C until further use.

Identification of culturable microorganisms
The culturable microorganisms were identified by amplified ribosomal DNA restriction analysis (ARDRA)-based grouping followed by rRNA gene sequencing. PCR amplification was carried out in a 25 µL final reaction volume containing 30–50 ng of genomic DNA, as described in Supporting Information Table S1. A template-free control reaction was included in every set of PCR reactions.
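The quantitative checks in this extraction protocol reduce to simple arithmetic, sketched below. The sketch assumes that "cells equivalent to 2 OD660" means OD660 units (optical density multiplied by culture volume in mL) and uses a hypothetical 40 ng target within the stated 30–50 ng template range; the function names are illustrative and not part of any published tool.

```python
# Hypothetical helpers illustrating the arithmetic behind the DNA preparation
# checks; function names and the 40 ng default target are assumptions.

def passes_purity(a260, a280, low=1.8, high=2.2):
    """True if the A260/280 ratio lies within the accepted 1.8-2.2 window."""
    ratio = a260 / a280
    return low <= ratio <= high


def culture_volume_ml(od660, od_units=2.0):
    """Culture volume (mL) containing `od_units` OD660 units, assuming
    1 OD unit = 1 mL of culture at OD660 = 1.0."""
    return od_units / od660


def template_volume_ul(conc_ng_per_ul, target_ng=40.0):
    """Lysate volume (uL) delivering `target_ng` of DNA; 40 ng is an assumed
    midpoint of the 30-50 ng range used per 25 uL PCR."""
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    print(passes_purity(0.90, 0.47))   # ratio ~1.91 -> True
    print(culture_volume_ml(1.6))      # mL of a culture at OD660 = 1.6
    print(template_volume_ul(25.0))    # uL of a 25 ng/uL lysate
```

For example, a culture at OD660 = 1.6 would need 1.25 mL to supply 2 OD units under this interpretation.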
Four to five µL of the amplified bacterial SSU rRNA gene and yeast ITS1-5.8S-ITS2 amplicons were separately digested with HaeIII, HinfI and CfoI (for bacteria) and with HaeIII, DdeI and TaqI (for yeasts) (Promega, Madison, WI, USA) in a 10 µL reaction volume according to the manufacturer's instructions. The restriction patterns were analyzed by electrophoresis of the entire 10 µL reaction on a 2.0 % (w/v) agarose gel alongside the PCR 100 bp Low DNA ladder (Sigma-Aldrich) as a molecular size standard. Electrophoresis was run at 80 V for 2 h in 0.5× TBE buffer [45 mM Tris-borate, 1 mM EDTA (pH 8.0)]. After staining in 0.5 µg mL⁻¹ ethidium bromide solution and destaining (30 min each), the gel was documented using a ChemiDoc MP system (Bio-Rad, Hercules, USA). Fragment sizes were estimated by the linear regression method in Image Lab v4.0.1 software (Bio-Rad). The restriction fingerprints were analyzed using GelCompar II v6.5 (Applied Maths, Sint-Martens-Latem, Belgium). A composite data set of the digestion profiles obtained with the three restriction enzymes was generated with a 2–3 % position tolerance. A dendrogram was constructed by UPGMA clustering of Jaccard similarity coefficients to group isolates with similar ARDRA profiles into phylotype groups; the cluster cut-off was set visually at 50 % similarity between the composite restriction patterns. Both qualitative (the different phylotype groups) and quantitative (relative number of isolates per phylotype group) data were used to calculate biodiversity estimates and for statistical analyses. Representative isolates from each phylotype group were randomly selected for sequencing of the bacterial SSU rRNA gene and of the yeast LSU rRNA gene D1/D2 domain and ITS1-5.8S-ITS2 region, using the conditions described in Supporting Information Table S1.
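The ARDRA grouping logic (composite band profiles, Jaccard similarity with a position tolerance, UPGMA clustering cut at 50 % similarity) can be sketched in pure Python as below. This is not GelCompar II's algorithm: the tolerance is treated here as a relative difference in fragment size between bands from the same enzyme, bands are paired greedily, and the 2.5 % default is an assumed midpoint of the stated 2–3 %. Because UPGMA merges clusters at non-increasing similarity, stopping once the best average similarity falls below the cut-off gives the same groups as cutting the dendrogram at that level.

```python
from itertools import combinations

# A composite ARDRA profile is a list of (enzyme, fragment_size_bp) bands.


def bands_match(band_a, band_b, tol_pct):
    """Bands match if they come from the same enzyme and their sizes differ by
    at most tol_pct % of the larger fragment (a simplifying assumption)."""
    (enz_a, size_a), (enz_b, size_b) = band_a, band_b
    return enz_a == enz_b and abs(size_a - size_b) <= tol_pct / 100.0 * max(size_a, size_b)


def jaccard(profile_a, profile_b, tol_pct=2.5):
    """Jaccard coefficient = shared bands / (bands in A + bands in B - shared)."""
    unmatched = list(profile_b)
    shared = 0
    for band in sorted(profile_a):
        for i, other in enumerate(unmatched):
            if bands_match(band, other, tol_pct):
                shared += 1
                del unmatched[i]  # each band may be paired only once
                break
    union = len(profile_a) + len(profile_b) - shared
    return shared / union if union else 1.0


def upgma_groups(profiles, cutoff=0.5, tol_pct=2.5):
    """Average-linkage (UPGMA) agglomeration stopped at `cutoff` similarity."""
    names = list(profiles)
    sim = {
        frozenset((a, b)): jaccard(profiles[a], profiles[b], tol_pct)
        for a, b in combinations(names, 2)
    }

    def mean_similarity(c1, c2):
        # UPGMA: unweighted mean over all cross-cluster isolate pairs
        return sum(sim[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: mean_similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        if mean_similarity(clusters[i], clusters[j]) < cutoff:
            break  # every remaining merge falls below the phylotype cut-off
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    profiles = {  # hypothetical isolates and fragment sizes
        "iso1": [("HaeIII", 300), ("HaeIII", 200), ("HinfI", 500)],
        "iso2": [("HaeIII", 305), ("HaeIII", 200), ("HinfI", 500)],
        "iso3": [("HaeIII", 600), ("HinfI", 150), ("CfoI", 400)],
    }
    print(upgma_groups(profiles))  # [['iso1', 'iso2'], ['iso3']]
```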
The amplified products were purified using the NucleoSpin Extract II gel extraction kit (Macherey-Nagel, Düren, Germany) following the manufacturer's instructions. Sequencing reactions (Merck, Bangalore, India) were performed with multiple primers to cover the full length of the target regions. Full-length sequences were generated by assembling the partial sequences into contigs using DNA Baser v3 software (Heracle BioSoft SRL, Arges, Banat). Base calls were validated using Chromas LITE v2.01 software (www.technelysium.com.au). The assembled bacterial sequences were checked for chimeras using Pintail v1.0 and DECIPHER (http://decipher.cee.wisc.edu/FindChimeras.html) (Wright et al. 2012). To assign the taxonomic status of the isolates, the sequences were queried against NCBI's non-redundant reference RNA sequence database (refseq_rna) using the BLASTN algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi), with 98 % similarity as the identification threshold. Any ambiguous identification was confirmed using the Ribosomal Database Project (RDP) release 10 (http://rdp.cme.msu.edu/seqmatch/seqmatch_intro.jsp) and the CBS yeast nucleotide database (http://www.cbs.knaw.nl/Collections/). The identified sequences were aligned with the sequences of their nearest known taxa using the Clustal X algorithm implemented in MEGA5, and a neighbour-joining tree was constructed from evolutionary distances calculated with the Kimura 2-parameter substitution model (Tamura et al. 2011).

Determination of DGGE band identity
The DGGE bands of interest were excised from the polyacrylamide gel using a sterile scalpel, and the DNA was eluted in 50 µL sterile deionized water by overnight incubation at 4 °C. Two µL of the eluted DNA was re-amplified using the conditions described previously.
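For reference, the Kimura 2-parameter distance used for the neighbour-joining tree treats transitions (A↔G, C↔T) and transversions separately: with $P$ and $Q$ the proportions of transition and transversion differences, $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$. The sketch below computes it for two pre-aligned sequences, skipping gapped or ambiguous sites (one common convention; MEGA5 offers several gap-handling options), together with a simple percent-identity check against the 98 % identification threshold. The identity calculation is an illustrative assumption and does not reproduce BLASTN's scoring exactly.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, equal-length sequences.
    Columns with a gap or an ambiguous base in either sequence are skipped.
    Raises ValueError (math domain error) for saturated sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def percent_identity(seq1, seq2):
    """Identical columns / aligned columns x 100 (double-gap columns ignored)."""
    columns = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identification_threshold(seq1, seq2, threshold=98.0):
    """True if the aligned pair reaches the species-level identity threshold."""
    return percent_identity(seq1, seq2) >= threshold
```

One transition in ten sites gives $d = -\tfrac{1}{2}\ln 0.8 \approx 0.112$, slightly above the uncorrected p-distance of 0.1, as expected for a multiple-hit correction.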
The re-amplified products were checked for quality on an agarose gel, and for the presence and position of the selected band by running them in DGGE alongside the parent DGGE profile. The elution and re-amplification steps were repeated twice, or until a single pure band co-migrating with the parent DGGE band was obtained, after which the PCR products were purified and sequenced using the M13 primer as described in Supporting Information Table S1. The closest known taxonomic identities of the DGGE bands were determined by sequence similarity search as described above. DGGE bands that reproducibly yielded multiple-band profiles on successive elution steps were regarded as heteroduplexes.

Determination of microbial diversity estimates
To assess changes in bacterial diversity during fermentation, the cultivation-dependent and cultivation-independent (PCR-DGGE) data were used to calculate richness estimates and diversity indices with EstimateS v9.1.0 software (Colwell 2013). For the culture-dependent data, the number of different phylotype groups and the relative number of isolates per phylotype group were used. For the culture-independent data, the presence/absence and the densitometric intensities of the different DGGE bands were used. Evenness (Shannon evenness) was calculated as $E = e^{H'}/N$, where $H'$ is the Shannon diversity index and $N$ is the observed number of phylotypes or DGGE bands. Percentage coverage was calculated by Good's method as $C = [1 - (n/N)] \times 100$, where $n$ is the number of phylotypes represented by a single isolate (singletons) in each fermentation stage and $N$ is the total number of isolates in that fermentation stage.
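The evenness and coverage formulas above can be computed directly from isolate counts per phylotype or from DGGE band intensities. The sketch below assumes the natural logarithm for $H'$ and hypothetical function names; EstimateS itself reports many more estimators than shown here.

```python
import math


def shannon_index(abundances):
    """Shannon index H' (natural log) from isolate counts or band intensities."""
    total = sum(abundances)
    return -sum((x / total) * math.log(x / total) for x in abundances if x > 0)


def shannon_evenness(abundances):
    """E = e^H' / N, with N the number of phylotypes or bands observed."""
    n_observed = sum(1 for x in abundances if x > 0)
    return math.exp(shannon_index(abundances)) / n_observed


def goods_coverage(isolates_per_phylotype):
    """Good's coverage (%) = [1 - (n/N)] x 100, n = singleton phylotypes,
    N = total isolates in the fermentation stage."""
    singletons = sum(1 for c in isolates_per_phylotype if c == 1)
    total = sum(isolates_per_phylotype)
    return (1 - singletons / total) * 100


if __name__ == "__main__":
    stage = [5, 1, 1, 3]  # hypothetical isolates per phylotype in one stage
    print(shannon_evenness(stage))
    print(goods_coverage(stage))  # two singletons among 10 isolates
```

A perfectly even community gives $E = 1$, since $e^{H'}$ then equals the number of phylotypes; in the hypothetical stage above, two singletons among ten isolates give 80 % coverage.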
Illumina data analysis
The raw paired-end reads obtained as FASTQ files from the Illumina MiSeq platform were joined to reconstruct the V4-V5 targeted amplicons and demultiplexed to assign reads to samples using the web-based file processing tools of the MG-RAST metagenomic analysis server (Meyer et al. 2008). Sequences were quality-filtered on length, number of ambiguous bases and Phred quality score using the default pipeline options; sequences containing base calls with a Phred quality score Q < 15 were removed. Quality-filtered sequences were then passed through MG-RAST's rRNA pipeline under default parameters for secondary filtering to remove non-rRNA sequences: sequences with less than 70 % identity to rRNA sequences in the Greengenes, SILVA (SSU and LSU) and RDP databases were pre-screened as non-rRNA sequences. The rRNA sequences were clustered into operational taxonomic units (OTUs) at a 97 % identity threshold using the uclust algorithm (Edgar 2010), and the longest sequence in each OTU was chosen as the OTU representative sequence. Taxonomic annotation of the OTUs was carried out using the "best hit classification" method: the BLAT similarity search algorithm (Kent 2002), with a minimum alignment length cut-off of 100 bp and a maximum e-value cut-off of 1e-5, was used to assign species-level taxonomic annotations at a 97 % similarity threshold against the non-redundant multi-source M5RNA rRNA database implemented in MG-RAST. The MG-RAST-generated OTU table was used for the downstream analyses.
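The read-level and hit-level thresholds stated above can be illustrated as follows. This is a hypothetical sketch, not MG-RAST code: it assumes Phred+33 (Illumina 1.8+) quality encoding, the hit records with `identity`, `length`, `evalue` and `bitscore` fields and the taxon names are invented for illustration, and "best hit" is taken here to mean the highest-scoring hit that passes all cut-offs.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(quality, min_q=15, offset=33):
    """Reject a read if any base call has a Phred score Q < min_q."""
    return all(q >= min_q for q in phred_scores(quality, offset))


def best_hit(hits, min_identity=97.0, min_length=100, max_evalue=1e-5):
    """Return the highest-bitscore hit passing all cut-offs, or None."""
    passing = [
        h for h in hits
        if h["identity"] >= min_identity
        and h["length"] >= min_length
        and h["evalue"] <= max_evalue
    ]
    return max(passing, key=lambda h: h["bitscore"]) if passing else None


if __name__ == "__main__":
    print(passes_quality("IIII0"))   # 'I' = Q40, '0' = Q15 -> True
    print(passes_quality("IIII/"))   # '/' = Q14 -> False
    hits = [  # hypothetical BLAT-style hits for one OTU representative
        {"taxon": "taxon_A", "identity": 99.1, "length": 250, "evalue": 1e-80, "bitscore": 450.0},
        {"taxon": "taxon_B", "identity": 96.0, "length": 250, "evalue": 1e-70, "bitscore": 470.0},
        {"taxon": "taxon_C", "identity": 98.0, "length": 80, "evalue": 1e-20, "bitscore": 150.0},
    ]
    print(best_hit(hits)["taxon"])   # taxon_A (B fails identity, C fails length)
```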
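The OTU-table processing described in the following paragraph (pooling taxa below 1 % mean relative abundance, log10(x + 1) normalization, rarefaction and alpha diversity) can be approximated in pure Python as below. This is a sketch, not the QIIME v1.8.0 or PAST implementation: all function names are hypothetical, Chao1 uses the bias-corrected form $S_{obs} + F_1(F_1 - 1)/[2(F_2 + 1)]$, Shannon uses log base 2 (assumed here to match QIIME 1.x defaults), equitability is $H'/\log S_{obs}$, and Fisher's alpha is omitted. Good's coverage is returned as a fraction rather than a percentage.

```python
import math
import random


def collapse_rare_taxa(table, threshold_pct=1.0):
    """table maps taxon -> read counts (one per sample). Taxa whose mean
    relative abundance (%) across samples is below threshold_pct are summed
    into a single 'Others' row."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[i] for counts in table.values()) for i in range(n_samples)]
    kept, others = {}, [0] * n_samples
    for taxon, counts in table.items():
        mean_pct = sum(100.0 * c / t for c, t in zip(counts, totals)) / n_samples
        if mean_pct >= threshold_pct:
            kept[taxon] = list(counts)
        else:
            others = [o + c for o, c in zip(others, counts)]
    if any(others):
        kept["Others"] = others
    return kept


def log10_plus_one(values):
    """log10(x + 1) transform applied before Euclidean clustering."""
    return [math.log10(x + 1) for x in values]


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a vector of OTU counts."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for i in random.Random(seed).sample(pool, depth):
        out[i] += 1
    return out


def alpha_diversity(counts, base=2):
    """Alpha diversity metrics for one (rarefied) sample."""
    nz = [c for c in counts if c > 0]
    n, s_obs = sum(nz), len(nz)
    f1 = sum(1 for c in nz if c == 1)   # singleton OTUs
    f2 = sum(1 for c in nz if c == 2)   # doubleton OTUs
    props = [c / n for c in nz]
    shannon = -sum(p * math.log(p, base) for p in props)
    return {
        "observed": s_obs,
        "chao1": s_obs + f1 * (f1 - 1) / (2 * (f2 + 1)),
        "simpson_reciprocal": 1.0 / sum(p * p for p in props),
        "shannon": shannon,
        "equitability": shannon / math.log(s_obs, base) if s_obs > 1 else float("nan"),
        "goods_coverage": 1.0 - f1 / n,
    }
```

A rarefaction curve corresponds to calling `rarefy` repeatedly at increasing depths (here up to the 8670-read maximum set by the smallest sample) and averaging `alpha_diversity` over several random draws per depth.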
To ensure higher accuracy in subsequent analyses, the OTU table was filtered to remove eukaryote-specific (chloroplast- and mitochondria-derived) OTUs and taxonomically unassigned OTUs that did not match any reference sequence in the databases. Only taxa with an average relative abundance of 1 % or greater across the samples studied are shown individually; taxa with less than 1 % relative abundance are combined. The relative abundance of OTUs was analyzed at various taxonomic levels (family, genus and species) and examined by PCA using PAST v3.02 software. Data were normalized by $\log_{10}(x_i + 1)$ transformation, and hierarchical clustering of the OTUs and the stage-wise samples was performed using the complete linkage algorithm on a Euclidean distance matrix calculated from the normalized relative abundances. A heat map depicting the change in microbial community structure during fermentation was generated in the R environment (http://www.r-project.org/) using the "gplots" package. For alpha diversity analysis and generation of alpha rarefaction curves, the quality-filtered species-level OTU table was rarefied at depths ranging from 100 to 8670 reads per sample using the multiple_rarefactions.py script in the QIIME v1.8.0 bioinformatics pipeline (Caporaso et al. 2010). The maximum depth was selected based on the lowest number of quality-filtered reads assigned among the samples analyzed. In the alpha diversity analysis, two richness estimates (observed species and Chao1), three diversity indices (Fisher's alpha, Simpson's reciprocal and Shannon), evenness (Shannon's equitability) and estimated sample coverage (Good's coverage) were calculated.

References
Jeyaram K, Romi W, Singh TA, Devi AR, Devi SS (2010) Bacterial species associated with traditional starter cultures used for fermented bamboo shoot production in Manipur state of India. International Journal of Food Microbiology, 143, 1–8.
Caporaso JG, Kuczynski J, Stombaugh J et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336.
Kent WJ (2002) BLAT - the BLAST-like alignment tool. Genome Research, 12, 656–664.
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461.
Meyer F, Paarmann D, D'Souza M et al. (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics, 9, 386.
Colwell RK (2013) EstimateS: Statistical estimation of species richness and shared species from samples. Version 9 and earlier. User's guide and application. Available from: http://viceroy.eeb.uconn.edu/estimates/.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution, 28, 2731–2739.
Wright ES, Yilmaz LS, Noguera DR (2012) DECIPHER, a search-based approach to chimera identification for 16S rRNA sequences. Applied and Environmental Microbiology, 78, 717–725.