Multi-omics analysis of niche specificity provides new insights into ecological adaptation in bacteria

Bo Zhu1*, Muhammad Ibrahim1*, Zhouqi Cui1, Guanlin Xie1, Gulei Jin2, Michael Kube3, Bin Li1$, Xueping Zhou1$

1 State Key Laboratory of Rice Biology, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China
2 Hangzhou Guhe Info Co., Ltd, Hangzhou 310029, China
3 Albrecht Daniel Thaer-Institute of Agricultural and Horticultural Sciences, Humboldt-Universität zu Berlin, 14195 Berlin, Germany

Running title: Ecological adaptation in B. seminalis

*Authors contributed equally to the work
$Corresponding authors: Bin Li, Xueping Zhou

Mailing address: State Key Laboratory of Rice Biology, Institute of Biotechnology, Zhejiang University, 310058, Hangzhou, China.
Phone: 86-571-88982412. Fax: 86-571-88982412.
libin0571@zju.edu.cn; zzhou@zju.edu.cn

Conflict of Interest Statement
The authors declare no conflict of interest.

Materials and Methods

Strains used in this study
B. seminalis strains DSM 23518 (= LMG 24067T), 0901, S9 and R456 originated from a CF patient's sputum (Vanlaere et al 2008), diseased apricot (Fang et al 2009), West Lake water (Fang et al 2011), and rice rhizospheric soil (Li et al 2011), respectively. Unless otherwise specified, cultures of bacterial strains were maintained on nutrient agar (NA) or nutrient broth (NB) media at 30°C prior to use. Cultures were stored long term in 20% aqueous glycerol at -80°C.

Characterization of ecological roles
B. seminalis strains were tested for virulence in the alfalfa model (Bernier et al 2003), which was carried out as described by Ibrahim et al. (2012). Pathogenicity of B. seminalis to apricot was examined according to the method of Fang et al. (2009), except that premature fruits were inoculated with 10 μL of bacterial suspension at a concentration of 1 × 10^5 CFU/mL using sterilized tips.
Inhibition of the mycelial growth of R. solani by B. seminalis was determined according to the method of Li et al. (2011). The morphology of bacterial cells was observed using a JEOL JSM-6400 scanning electron microscope (Hitachi, Tokyo, Japan).

Growth in various niches
Adaptation of B. seminalis strains to various niches was investigated by incubating the four strains in CF, water and soil extract media, respectively; the plant condition was excluded because only strain 0901 was pathogenic to apricot. Water medium, which contains M9 minimal salts with 3% glycerol, was used to simulate the water environment (Schell et al 2011). CF medium was prepared to mimic the sputum of CF patients according to the method of Dinesh (2010). Soil extract medium was prepared to mimic soil conditions based on a recent paper (Yoder-Himes et al 2009), with the exception that soil was collected from the rice rhizosphere, which was the original niche of strain R456. In addition, three different niche conditions were tested for each of strains S9, DSM 23518, and R456. Bacterial numbers were estimated from the measured OD600 value (Ibrahim et al 2012).

Whole genome sequencing, assembly and annotation
Bacterial genomic DNA, isolated using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA), was used for whole-genome sequencing, which was performed using PacBio sequencing (Pacific Biosciences, Menlo Park, CA, USA), 454 sequencing (Roche, Branford, CT, USA) and Illumina sequencing (Illumina, San Diego, CA, USA). Sequence runs for four single-molecule real-time (SMRT) cells were performed on the PacBio RS II sequencer with a 120-minute movie time/SMRT cell.
SMRT Analysis Portal version 2.1 was used for read filtering and adapter trimming with default parameters, and post-filtered data of 350-580 Mb (around 40-60X coverage) per cell/strain, with an average read length of 7 kb, were considered for further assembly. All four genomes were first assembled de novo using the HGAP assembly protocol, which is available with the SMRT Analysis packages and accessed through the SMRT Analysis Portal version 2.1. After this first round, PBJelly V14.1.14 was used to fill and reduce as many captured gaps as possible to produce upgraded draft genomes (English et al 2012). As B. seminalis genomes are much larger than those of typical bacteria, around 50 scaffolds were generated after this step. Quality-filtered Illumina and 454 sequencing reads were then used to correct false SNPs and Indels caused by low coverage in some regions. These reads were also used for gap closure on the pre-assembled genomes with WGS-assembler and SSPACE (Boetzer et al 2011, Myers et al 2000). Finally, the consensus was obtained based on the above procedure. If the sequence was not complete, scaffolding and gap closure were repeated until nearly complete bacterial genome sequences were obtained.

Coding DNA Sequences (CDSs) were predicted using Prodigal version 2.6 with default parameters (Hyatt et al 2010). To refine the accuracy, RNA-Seq results were also used to improve gene prediction. Gene functions were automatically assigned by the RAST annotation engine (Aziz et al 2008). Predicted genes were compared via Blastn against the genomic sequences to verify the accuracy.
rRNA operons and tRNAs were predicted by RNAmmer and tRNAscan-SE (Lagesen et al 2007, Lowe and Eddy 1997), while additional analysis was carried out using NCBI's uniprot database (http://www.ncbi.nlm.nih.gov/), COG (Tatusov et al 2001), KEGG (Ogata et al 1999) and GO terms (Ashburner et al 2000).

Variant calling
Paired-end reads generated by Illumina sequencing were mapped onto the genome sequence using the Burrows–Wheeler Aligner (Li and Durbin 2009). Default settings were used except that the maximum edit distance was set to 0.02 (-n 0.02). The MarkDuplicates command in Picard (http://picard.sourceforge.net/) was used to remove reads that mapped to the same positions in the strain DSM 23518 genome (PCR duplicates). After processing with IndelRealigner and BaseRecalibrator, SNPs and Indels were called using GATK (Gac et al 2013, Tenaillon et al 2012). Default settings were used except that the maximum read depth in GATK was set to 500 (-dcov 500). The resulting SNPs and Indels were then filtered using custom Perl scripts to minimize false positive mutation calls. First, mutations with a total read depth below 20X were discarded. Second, SNPs and Indels with a Phred quality score below 30 were removed. Third, mutation calls were kept only when at least 80% of the reads supported the call. The lists of SNPs/Indels were then annotated with in-house Perl scripts. For mutations in coding regions, PROVEAN was used to predict whether a protein sequence variation is deleterious or neutral (Chieng et al 2012).

Phylogenetic and comparative genome analysis
The sequences of the four whole-genome-sequenced strains were aligned and visualized using Murasaki software (Popendorf et al 2010). For genome-based phylogeny, in addition to the four B.
seminalis genomes sequenced in this study, 28 complete Burkholderia genome sequences were obtained from the Burkholderia Genome Database (Winsor et al 2008). Furthermore, a well-resolved phylogenetic tree was also generated based on multi-locus sequence analysis (MLSA) of the atpD, gltB, gyrB, lepA, phaC, recA and trpB genes, which has been widely applied in the identification and discrimination of Burkholderia species (Spilker et al 2009). The identity of strains was confirmed by calculating whole-genome average nucleotide identity (ANI) based on the Blast and MUMmer algorithms using JSpecies (Richter and Rosselló-Móra 2009). Multiple sequence alignment was done using Muscle 3.8 (Edgar 2004) and the maximum-likelihood (ML) tree was generated with MEGA 6 (Tamura et al 2013). In addition, genomic islands (GIs) were detected with IslandViewer, which integrates the most widely used GI detection algorithms: IslandPick, SIGI-HMM and IslandPath-DIMOB (Langille and Brinkman 2009).

DNA methylation analysis pipeline
SMRT-generated data were analyzed with the RS_Modification and Motif Analysis pipeline in SMRT Analysis 2.2, which was provided by the Pacific Biosciences SMRT Portal, with default parameters. Under these default parameters, coverage and the inter-pulse duration (IPD) ratio were calculated; the IPD ratio compares the IPD for incorporation opposite a methylated base in the DNA template with that for incorporation opposite a canonical base (Lluch-Senar et al 2013). All the data sets contain kinetic values for each reference position and DNA strand, with the corresponding sequences generated by the assembly procedure. For statistical analysis, methylation site positions were divided into three parts (the 200 bp upstream of the coding region, the coding region, and the 200 bp downstream of the coding region). For every gene, the most heavily methylated strain was then selected for further analysis.

Growth conditions for RNA-Seq analysis
To simulate the original niche environments of the four B.
seminalis strains, 2 mL of overnight cultured bacteria were inoculated into 50 mL of the following four types of media. Water medium, which contains M9 minimal salts (0.6% Na2HPO4 + 0.3% KH2PO4 + 0.05% NaCl + 0.1% NH4Cl + 0.02% MgSO4 + 0.015% CaCl2) with 3% glycerol, was used to simulate the water environment (Schell et al 2011). CF medium was prepared according to the method of Dinesh (2010). Briefly, 5.0 g/L mucin from pig stomach mucosa (Sangon Biotech), 4.0 g/L low molecular-weight salmon sperm DNA (Fluka), 5.9 mg diethylenetriaminepentaacetic acid (DTPA) (Sigma), 5.0 g/L NaCl (Sigma), 2.2 g/L KCl (Sigma) and 1.8 g/L Tris base (Sigma) were mixed together and autoclaved; 5.0 mL/L egg yolk emulsion (Oxoid) and 5.0 g/L casamino acids (Sangon Biotech) were added once the temperature had fallen to 37°C after autoclaving. Soil extract medium was prepared to mimic soil conditions based on a recent paper (Yoder-Himes et al 2009), with the exception that soil was collected from the rice rhizosphere, which was the original niche of strain R456. The plant condition used to obtain in vivo bacteria was prepared according to the method of our recent paper (Li et al 2014).

Total RNA harvesting
Each bacterial strain was incubated under its respective condition to stationary phase. After centrifugation at 4500 g at 4°C, pellets were re-suspended in 3 mL of PBS. One milliliter of bacterial culture was subjected to RNA purification with the RNeasy Mini Kit (Qiagen) and eluted in 50 µL of RNase-free water. Samples were treated with DNase I to remove any residual DNA and purified by phenol-chloroform-isoamyl alcohol extraction and ethanol precipitation.

mRNA purification and cDNA synthesis
Ten micrograms of each total RNA sample was treated with the MICROBExpress Bacterial mRNA Enrichment Kit (Ambion) and the RiboMinus™ Transcriptome Isolation Kit (Bacteria) (Invitrogen) following the manufacturer's instructions.
Samples were resuspended in 15 μL of RNase-free water. Bacterial mRNAs were chemically fragmented to a size range of 200-250 bp using 1× fragmentation solution (Ambion) for 2.5 min at 94°C. cDNA was generated according to the instructions of the SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). Briefly, each mRNA sample was mixed with 100 pmol of random hexamers, incubated at 65°C for 5 min, chilled on ice, mixed with 4 μL of First-Strand Reaction Buffer (Invitrogen), 2 μL of 0.1 M DTT, 1 μL of 10 mM RNase-free dNTP mix and 1 μL of SuperScript III reverse transcriptase (Invitrogen), and incubated at 50°C for 1 h. To generate the second strand, the following Invitrogen reagents were added: 51.5 μL of RNase-free water, 20 μL of second-strand reaction buffer, 2.5 μL of 10 mM RNase-free dNTP mix, 50 U E. coli DNA Polymerase and 5 U E. coli RNase H; the mixture was incubated at 16°C for 2.5 h.

RNA Sequencing
The Illumina Paired End Sample Prep Kit was used for RNA-Seq library creation according to the manufacturer's instructions, as follows: fragmented cDNA was end-repaired, ligated to Illumina adaptors, and amplified by 18 cycles of PCR. Paired-end 100-bp reads were generated by high-throughput sequencing on the Illumina HiSeq 2000 instrument.

RNA-Seq data analysis
After removal of low-quality reads and adaptors, RNA-Seq reads were aligned to the corresponding B. seminalis genome using TopHat 2.0.7 (Trapnell et al 2009), allowing for a maximum of two mismatches. If a read mapped to more than one location, only the alignment with the highest score was kept. Reads mapping to rRNA and tRNA regions were removed from further analysis. After read counts had been obtained for every sample, edgeR with the TMM normalization method was used to determine the differentially expressed genes (DEGs).
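The read-filtering rules just described (keep only the best-scoring alignment of a multi-mapped read, then discard reads that fall in rRNA or tRNA regions) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Alignment` record, the function names and the coordinates are hypothetical, and real data would be parsed from the aligner's SAM/BAM output.

```python
from collections import namedtuple

# Hypothetical record type; a real pipeline would parse these fields from SAM/BAM.
Alignment = namedtuple("Alignment", ["read_id", "start", "end", "score"])


def overlaps(aln, regions):
    """Return True if the alignment overlaps any (start, end) interval."""
    return any(aln.start <= r_end and aln.end >= r_start
               for r_start, r_end in regions)


def filter_alignments(alignments, rna_regions):
    """Keep the highest-scoring alignment per read, then drop rRNA/tRNA hits.

    Ties are resolved in favour of the alignment seen first (an assumption;
    the text does not say how ties were handled).
    """
    best = {}
    for aln in alignments:
        kept = best.get(aln.read_id)
        if kept is None or aln.score > kept.score:
            best[aln.read_id] = aln
    return [a for a in best.values() if not overlaps(a, rna_regions)]


# Toy data, invented for illustration only.
alignments = [
    Alignment("r1", 100, 199, 50),
    Alignment("r1", 5000, 5099, 42),  # lower-scoring second location of r1
    Alignment("r2", 1000, 1099, 60),  # lies inside the masked region below
    Alignment("r3", 3000, 3099, 55),
]
rna_regions = [(900, 2500)]  # hypothetical rRNA/tRNA coordinates

kept = filter_alignments(alignments, rna_regions)
print(sorted(a.read_id for a in kept))
```

In this sketch a read whose best alignment lies in an rRNA or tRNA region is dropped entirely rather than rescued by a lower-scoring alignment elsewhere; that ordering is one reading of the text, not a confirmed detail of the original analysis.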
Significantly differentially expressed genes (FDR value < 0.05 and at least a two-fold change) were selected for further analysis. Cluster 3.0 and Treeview 1.1.6 were used to generate the heatmap cluster based on the RPKM values (de Hoon et al 2004, Saldanha 2004).

COG enrichment analysis
All the DEGs between different strains or conditions were classified by COG category (Tatusov et al 2001). Based on the whole-genome COG classification, the significance of the DEGs falling into a given COG category was tested using the hypergeometric distribution:

$$ p = \sum_{i=x}^{n} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}} $$

in which N is the number of genes in the genome, M is the number of genes assigned to one COG category in the whole genome, n is the number of DEGs, and x (the lower limit of the summation over i) is the number of DEGs that fall into that COG category. The results are shown in Table S4.

Validation of mix sample method
Each sample was derived from a pool of five biological replicates, a pooling approach that has been developed to increase efficiency and cost-effectiveness with equivalent statistical power (Greenwald et al 2012, Peng et al 2003). To validate the accuracy of the mix-sample method, a single biological RNA sample from soil extract (SE) medium of strain DSM 23518 was prepared. The correlation coefficient between samples was determined by statistical analysis.

Quantitative real-time PCR
Total RNA was extracted from exponentially growing cells using an RNeasy Mini spin column kit (Qiagen) and treated with one unit of RNase-free DNase I (Qiagen), and cDNA synthesis was performed with a Moloney murine leukemia virus reverse transcriptase first-strand cDNA synthesis kit (Qiagen). The cDNA was then used directly as the template for qRT-PCR using a SYBR Green master mix (Protech Technology Enterprise Co., Ltd.) on an ABI Prism 7000 sequence detection system (Applied Biosystems).
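As a side note to the COG enrichment analysis above, the hypergeometric test can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the script used in this study: it takes N, M and n as defined in the text, takes x to be the observed number of DEGs in the category, and assumes an upper-tail probability P(X >= x). The function name and the toy numbers are hypothetical.

```python
from math import comb


def cog_enrichment_p(N, M, n, x):
    """Upper-tail hypergeometric p-value P(X >= x).

    N: genes in the genome
    M: genes assigned to the COG category in the whole genome
    n: number of DEGs
    x: DEGs falling into the COG category (assumed meaning of the symbol)
    """
    total = comb(N, n)
    upper = min(n, M)  # the category cannot contribute more than M genes
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(x, upper + 1))
    return tail / total


# Toy numbers chosen so the result can be checked by hand:
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3
p = cog_enrichment_p(N=10, M=4, n=3, x=2)
print(round(p, 4))
```

Exact integer binomials from `math.comb` avoid floating-point underflow for genome-sized N; a statistics library could be substituted if many categories need to be tested with multiple-testing correction.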
Primers for quantitative real-time PCR (qRT-PCR) of the selected genes were designed using Primer 3 based on the genome sequences (Untergasser et al 2012). All these primers are listed in Table S3, and an annealing temperature of 58°C was used for all primers. Short-chain dehydrogenase (BCAL2694), which has been shown to be stably expressed in Bcc, was used as the internal control (Van Acker et al 2013). Fold changes were calculated according to the delta-delta CT method, and the values are also shown in Table S3. The correlation between RNA-Seq results and qRT-PCR results was tested by Pearson's correlation method.

Supplementary references

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al (2000). Gene Ontology: tool for the unification of biology. Nat Genet 25: 25-29.
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA et al (2008). The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9: 75.
Bernier SP, Silo-Suh L, Woods DE, Ohman DE, Sokol PA (2003). Comparative analysis of plant and animal models for characterization of Burkholderia cepacia virulence. Infect Immun 71: 5306-5313.
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W (2011). Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27: 578-579.
Chieng S, Carreto L, Nathan S (2012). Burkholderia pseudomallei transcriptional adaptation in macrophages. BMC Genomics 13: 328.
de Hoon MJL, Imoto S, Nolan J, Miyano S (2004). Open source clustering software. Bioinformatics 20: 1453-1454.
Dinesh SD (2010). Artificial Sputum Medium. Protocol Exchange doi:10.1038/protex.2010.212.
Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
English AC, Richards S, Han Y, Wang M, Vee V, Qu J et al (2012).
Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7: e47768.
Fang Y, Li B, Wang F, Liu B, Wu Z, Su T et al (2009). Bacterial fruit rot of apricot caused by Burkholderia cepacia in China. Plant Pathol J 25: 429-432.
Fang Y, Xie G, Lou M, Li B, Muhammad I (2011). Diversity analysis of Burkholderia cepacia complex in the water bodies of West Lake, Hangzhou, China. J Microbiol 49: 309-314.
Gac M, Cooper TF, Cruveiller S, Médigue C, Schneider D (2013). Evolutionary history and genetic parallelism affect correlated responses to evolution. Mol Ecol 22: 3292-3303.
Greenwald JW, Greenwald CJ, Philmus BJ, Begley TP, Gross DC (2012). RNA-seq analysis reveals that an ECF σ Factor, AcsS, regulates achromobactin biosynthesis in Pseudomonas syringae pv. syringae B728a. PLoS One 7: e34804.
Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11.
Ibrahim M, Tang Q, Shi Y, Almoneafy A, Fang Y, Xu L et al (2012). Diversity of potential pathogenicity and biofilm formation among Burkholderia cepacia complex water, clinical, and agricultural isolates in China. World J Microb Biot 28: 2113-2123.
Lagesen K, Hallin P, Rødland EA, Stærfeldt HH, Rognes T, Ussery DW (2007). RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35: 3100-3108.
Langille MGI, Brinkman FSL (2009). IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 25: 664-665.
Li B, Liu BP, Yu RR, Lou MM, Wang YL, Xie GL et al (2011). Phenotypic and molecular characterization of rhizobacterium Burkholderia sp. strain R456 antagonistic to Rhizoctonia solani, sheath blight of rice. World J Microb Biot 27: 2305-2313.
Li B, Ibrahim M, Ge M, Cui Z, Sun G, Xu F et al (2014). Transcriptome analysis of Acidovorax avenae subsp. avenae cultivated in vivo and co-culture with Burkholderia seminalis. Sci Rep 4.
Li H, Durbin R (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754-1760.
Lluch-Senar M, Luong K, Lloréns-Rico V, Delgado J, Fang G, Spittle K et al (2013). Comprehensive methylome characterization of Mycoplasma genitalium and Mycoplasma pneumoniae at single-base resolution. PLoS Genet 9: e1003191.
Lowe TM, Eddy SR (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955-964.
Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ et al (2000). A whole-genome assembly of Drosophila. Science 287: 2196-2204.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27: 29-34.
Peng X, Wood CL, Blalock EM, Chen KC, Landfield PW, Stromberg AJ (2003). Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics 4: 26.
Popendorf K, Tsuyoshi H, Osana Y, Sakakibara Y (2010). Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes. PLoS One 5: e12651.
Richter M, Rosselló-Móra R (2009). Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A 106: 19126-19131.
Saldanha AJ (2004). Java Treeview—extensible visualization of microarray data. Bioinformatics 20: 3246-3248.
Schell MA, Zhao P, Wells L (2011). Outer membrane proteome of Burkholderia pseudomallei and Burkholderia mallei from diverse growth conditions. J Proteome Res 10: 2417-2424.
Spilker T, Baldwin A, Bumford A, Dowson CG, Mahenthiralingam E, LiPuma JJ (2009).
Expanded multilocus sequence typing for Burkholderia species. J Clin Microbiol 47: 2607-2610.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013). MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30: 2725-2729.
Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS et al (2001). The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29: 22-28.
Tenaillon O, Rodríguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD et al (2012). The molecular diversity of adaptive convergence. Science 335: 457-461.
Trapnell C, Pachter L, Salzberg SL (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105-1111.
Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M et al (2012). Primer3—new capabilities and interfaces. Nucleic Acids Res 40: e115.
Van Acker H, Sass A, Bazzini S, De Roy K, Udine C, Messiaen T et al (2013). Biofilm-grown Burkholderia cepacia complex cells survive antibiotic treatment by avoiding production of reactive oxygen species. PLoS One 8: e58943.
Vanlaere E, LiPuma JJ, Baldwin A, Henry D, De Brandt E, Mahenthiralingam E et al (2008). Burkholderia latens sp. nov., Burkholderia diffusa sp. nov., Burkholderia arboris sp. nov., Burkholderia seminalis sp. nov. and Burkholderia metallica sp. nov., novel species within the Burkholderia cepacia complex. Int J Syst Evol Microbiol 58: 1580-1590.
Winsor GL, Khaira B, Van Rossum T, Lo R, Whiteside MD, Brinkman FSL (2008). The Burkholderia Genome Database: facilitating flexible queries and comparative analyses. Bioinformatics 24: 2803-2804.
Yoder-Himes D, Chain P, Zhu Y, Wurtzel O, Rubin E, Tiedje JM et al (2009). Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A 106: 3976-3981.

Supplementary Figure and Table Legends

Figure S1: Distribution of differentially expressed genes along the chromosome. Grey thick circles, sorted by strain 0901 from inner to outer, represent the chromosomes of strains 0901, DSM 23518, R456 and S9, respectively. The red, green, blue and black peaks outside the chromosome represent the log2 RPKM values of genes under CF, apricot, soil and water conditions. Outside the black peak (water RPKM value) is a heatmap of gene density per 10 kb along the chromosome, from blue to red.

Figure S2: Full genome alignment among the four strains 0901, DSM 23518, R456 and S9 of Burkholderia seminalis.

Figure S3: Expression pattern cluster based on the normalized RPKM values. The cluster of RNA-Seq samples is based on the log2 RPKM values.

Figure S4: Histogram of the number of DNA methylation sites in Burkholderia seminalis strains 0901, S9, R456 and DSM 23518.

Figure S5: Phylogenetic relationship of four Burkholderia seminalis strains to other species of Burkholderia. (a) Maximum-likelihood tree constructed using MLSA from the four B. seminalis strains sequenced in this study and 28 other Burkholderia strains. Among these strains, B. seminalis DSM 23518 (= LMG 24067), B. lata 383, B. thailandensis E264, B. mallei ATCC 23344, B. phymatum STM815, B. phytofirmans PsJN and B. xenovorans LB400 are type strains. (b) Maximum-likelihood tree constructed based on whole genome sequences. Among these strains, the type strains are the same as those in (a).

Figure S6: Correlation coefficient between the SE-single sample and the SE-mix sample of strain DSM 23518 based on the log2 RPKM values.

Figure S7: Correlation coefficient between the SE-mix sample and the W-mix sample of strain DSM 23518 based on the log2 RPKM values.

Table S1: Physiological characteristics of Burkholderia seminalis strains 0901, S9, R456 and DSM 23518.

Table S2: Comparison of general genomic features between Burkholderia seminalis strains 0901, DSM 23518, R456 and S9.

Table S3: Summary of RNA-Seq results (Illumina HiSeq 2000).

Table S4: Integrated information of Burkholderia seminalis strains 0901, S9, R456 and DSM 23518.

Table S5: Average Nucleotide Identity (ANI) among the Burkholderia seminalis genomes and the selected Burkholderia cenocepacia genomes.

Table S6: COG enrichment results from DEGs. a), strain 0901; b), strain DSM 23518; c), strain S9; d), strain R456.

Table S7: Gene clusters involved in niche adaptation.

Table S8: (a): Primers of qRT-PCR used in this study. (b): Internal primer used in qRT-PCR and its RPKM values in different strains and conditions.