Supplementary Material (doc 68K)

SUPPLEMENTARY METHODS

Cultures, DNA extraction and genome size determination. S. ruber M8 (from the authors' private collection) was grown as previously described (Peña et al., 2005). Cells were harvested at the end of the exponential phase. High-molecular-weight DNA was extracted, and its size was determined by pulsed-field gel electrophoresis as described by López-García et al. (1996).

S. ruber genome sequences. The genome of S. ruber strain M8 was sequenced by the random shotgun method using three different libraries. For two of them, genomic DNA was fragmented by mechanical shearing, and 3-kb (A) and 10-kb (B) inserts were cloned into the pNAV (pcDNA2.1-derived) and pCNS (pSU18-derived) plasmid vectors, respectively. In addition, a large-insert (25 kb, C) BAC library was constructed from a Sau3AI partial digest cloned into pBeloBAC11. Vector DNAs were purified and end-sequenced (43,912, 14,527 and 5,900 sequences for libraries A, B and C, respectively) using dye-terminator chemistry on ABI3730 sequencers. To reduce assembly problems caused by repeated sequences, the assembly was performed as described by Vallenet et al. (2008) with the Phred/Phrap/Consed software package (www.phrap.com). An additional 849 sequences were needed for the finishing phase. As a result, a fully closed circular chromosome of 3,619,447 bp (66.1% GC) and four plasmids of 84,340 bp, 61,373 bp, 56,529 bp and 11,229 bp were obtained. The S. ruber DSM 13855 chromosome (CP000159) and pSR35 plasmid (CP000160) sequences were obtained from the NCBI GenBank database.

Gene prediction and annotation of the M8 genome sequence.
Potential protein-coding open reading frames (ORFs) were identified with mORFind v2 (Waldmann and Teeling, unpublished), a meta-gene-finder that combines three gene prediction programs (Critica (Badger and Olsen, 1999), Glimmer (Delcher et al., 1999) and Zcurve (Guo et al., 2003)) with additional evidence for coding, such as predicted signal peptides, transmembrane helices and ribosomal binding sites. The genome sequence was annotated in the multi-user, web-based annotation system GenDB v2.2 (Meyer et al., 2003), which comprises a variety of bioinformatic tools for similarity searches at the DNA and amino acid levels against sequence databases (NCBI nt, NCBI nr, SwissProt, KEGG) and protein family databases (Pfam, Prosite, InterPro, COG). tRNA genes were identified with tRNAscan-SE (Lowe and Eddy, 1997). Transmembrane regions and signal peptides were identified with TMHMM (Krogh et al., 2001) and SignalP 3.0 (Bendtsen et al., 2004), respectively. Cumulative GC skew, (G - C)/(G + C), and GC content were computed with a custom Perl script using a sliding window of 1,500 bp. Chromosome-wide fluctuations of DNA curvature, bending and stacking energy were computed with banana and btwisted from the EMBOSS software suite (Rice et al., 2000) and visualized with GeneWiz (Hallin et al., 2004). Gene positions were visualized with GenomePlot (Gibson and Smith, 2003). Intrachromosomal fluctuations in tetranucleotide composition were computed and visualized for 5-kb intervals with TETRA (tetranucleotide usage patterns in DNA sequences; Teeling et al., 2004). Codon usage analysis was conducted with CIAJava (Carbone et al., 2003), using the full set of predicted genes and 15 iterations, and with CodonW (Peden, 1999), which was first trained on a high-quality gene set that excluded genes shorter than 300 bp and those coding for hypothetical proteins, phage proteins, transposases and integrases.
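A curated reference gene set of this kind is what the codon adaptation index is measured against. As an illustration only, the Python sketch below computes the relative adaptiveness w of each codon (its count in the reference genes divided by the count of the most used synonymous codon) and scores a gene as the geometric mean of w over its codons. It is a minimal sketch under stated assumptions, not the CodonW or CIAJava code: the helper names are hypothetical, the standard genetic code is used (bacterial translation table 11 has the same codon assignments), Met, Trp and stop codons are ignored, and codons absent from the reference set are given a count of 0.5.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code in TCAG order; table 11 (bacteria) shares these assignments.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Group synonymous codons by the amino acid (or stop, "*") they encode.
SYNONYMOUS = defaultdict(list)
for codon, aa in GENETIC_CODE.items():
    SYNONYMOUS[aa].append(codon)


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes, zero_count=0.5):
    """Hypothetical helper: w for every informative codon, from a reference gene set.

    Stop codons and single-codon amino acids (Met, Trp) are skipped because they
    carry no synonymous-choice information. Assumption: codons never seen in the
    reference set get `zero_count` instead of 0, so log(w) stays finite.
    """
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    weights = {}
    for aa, family in SYNONYMOUS.items():
        if aa == "*" or len(family) == 1:
            continue
        adjusted = {c: counts[c] if counts[c] > 0 else zero_count for c in family}
        most_used = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / most_used
    return weights


def cai(gene, weights):
    """Hypothetical helper: geometric mean of w over the informative codons of `gene`."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no informative codons")
    return math.exp(sum(logs) / len(logs))


# Toy usage: CTG and AAA are the preferred Leu and Lys codons in this reference.
w = relative_adaptiveness(["CTGCTGCTGAAA"])
print(cai("CTGAAA", w))            # 1.0
print(round(cai("CTGAAG", w), 4))  # 0.7071
```

Using the less preferred AAG lysine codon halves one weight, so the geometric mean falls to the square root of 0.5; a gene built only from preferred codons scores 1.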
These results were then used to calculate the codon adaptation index (CAI), frequency of optimal codons (Fop), effective number of codons (Nc) and codon bias index (CBI) for the complete set of chromosomal genes. Trends in codon usage were also analyzed with CoBias (McHardy et al., in preparation).

Whole-genome comparisons. The S. ruber M8 genome sequence was divided into consecutive 1,020-nt fragments (Goris et al., 2007), which were used as queries in BlastN searches (Altschul et al., 1997) against the complete genome sequence of S. ruber M31. The percentage of DNA conservation between the two strains was computed by summing the lengths of the regions that aligned with more than 90% sequence identity and dividing this sum by the total length of the query genome. The average nucleotide identity (ANI) between the two strains was computed as the mean identity of all BlastN matches with more than 30% sequence identity over a region of at least 70% of the query length (740 nt). The S. ruber M31 and M8 genomes were aligned with MUMmer (Kurtz et al., 2004), using both the amino acid sequences of the predicted proteins and the nucleotide sequence of the whole genome, and the alignments were plotted with M-GCAT (Treangen and Messeguer, 2006). M8 was always used as the query.

Phylome reconstruction. We reconstructed the complete set of phylogenetic trees (the phylome) for the two S. ruber genomes. Prior to the analysis, a database (SaliniDB) was created containing the proteins encoded in 426 fully sequenced bacterial and archaeal genomes obtained from the EBI (www.ebi.ac.uk/integr8/) as of January 2007. In addition, the proteomes of H. walsbyi (NC_008212, NCBI), Zobellia galactanivorans (Bielefeld), Flavobacteria bacterium BBFL7, Flavobacteriales bacterium HTCC2170, Flavobacterium sp. and Photobacterium sp. (obtained from the Gordon and Betty Moore Foundation), S. ruber strain M31 (Mongodin et al., 2005) and strain M8 were stored in SaliniDB.
For each protein encoded in either S. ruber genome, a Smith-Waterman search (Smith and Waterman, 1981) was performed against all other sequences in SaliniDB (E-value cutoff 10^-3). Sequences that aligned over a continuous region covering more than 50% of the query length were selected and aligned using MUSCLE 3.6 (Edgar, 2004) with default parameters. Alignment columns with more than 10% gaps were removed, unless this procedure removed more than one third of the alignment positions; in such cases, the percentage of sequences allowed to have a gap was increased automatically until at least two thirds of the initial positions were retained. Phylogenetic trees were derived from the resulting alignments using several methods, including Neighbor Joining (NJ) trees derived from Scoredist distances as implemented in BioNJ (Gascuel, 1997), and Maximum Likelihood (ML) trees as implemented in PhyML v2.4.4 (Guindon and Gascuel, 2003), assuming a discrete gamma distribution with four rate categories plus invariant sites, with the gamma shape parameter and the fraction of invariant sites estimated from the data. In the ML analyses, three evolutionary models were tested (JTT, Blosum62 and RtREV), and the model that best fitted the data, as determined by direct comparison of the likelihoods, was used in further analyses. All trees and alignments have been deposited in PhylomeDB (Huerta-Cepas et al., 2008; www.phylomedb.org).

Orthology determination. Orthologous and paralogous relationships among genes encoded in the two Salinibacter genomes were determined with a phylogenetic approach. Orthology predictions were generated for each sequence by mapping duplication and speciation events onto its phylogenetic tree, as determined by a species-overlap algorithm (Huerta-Cepas et al., 2007).
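The species-overlap rule can be sketched in a few lines: at each internal node of a rooted gene tree, the node is labelled a duplication if its two daughter partitions share at least one species, and a speciation otherwise; genes that split from the seed gene at a speciation node are then called orthologs, and those that split at a duplication node paralogs. The Python sketch below is an illustration of that rule under stated assumptions, not the PhylomeDB implementation: the tree is taken as already rooted and fully bifurcating, it is encoded as nested tuples, and the leaf labels of the form `species|gene` (and all function names) are hypothetical.

```python
from typing import List, Optional, Tuple, Union

# A tree is either a leaf label ("species|gene", hypothetical format)
# or a tuple of two subtrees (rooted, fully bifurcating tree assumed).
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    return leaf.split("|", 1)[0]


def leaves(node: Tree) -> List[str]:
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def path_to(node: Tree, seed: str) -> Optional[List[Tuple[tuple, int]]]:
    """Internal nodes from the root down to `seed`, each with the index of the child containing it."""
    if isinstance(node, str):
        return [] if node == seed else None
    for index, child in enumerate(node):
        below = path_to(child, seed)
        if below is not None:
            return [(node, index)] + below
    return None


def orthologs_and_paralogs(tree: Tree, seed: str) -> Tuple[List[str], List[str]]:
    path = path_to(tree, seed)
    if path is None:
        raise ValueError(f"{seed!r} is not a leaf of the tree")
    orthologs: List[str] = []
    paralogs: List[str] = []
    for node, index in path:
        left = {species_of(leaf) for leaf in leaves(node[0])}
        right = {species_of(leaf) for leaf in leaves(node[1])}
        # Species overlap between the two daughter partitions => duplication node.
        is_duplication = bool(left & right)
        # Genes on the other side of this node diverged from the seed here.
        others = leaves(node[1 - index])
        (paralogs if is_duplication else orthologs).extend(others)
    return orthologs, paralogs


# Toy tree: an old duplication shared by M8 and M31, rooted by a haloarchaeal gene.
tree = ((("M8|a1", "M31|b1"), ("M8|a2", "M31|b2")), "Hwal|h1")
orth, para = orthologs_and_paralogs(tree, "M8|a1")
print(orth)  # ['Hwal|h1', 'M31|b1']
print(para)  # ['M8|a2', 'M31|b2']
```

In the toy tree, M31|b1 is recovered as the M8|a1 ortholog, while the copies produced by the duplication shared by both strains (M8|a2, M31|b2) are classified as paralogs; this is why rooting matters, since the overlap test is applied to rooted partitions.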
To identify the proportion of highly divergent orthologs shared by the two strains, each pair of orthologous sequences was aligned with Needle from the EMBOSS package (Rice et al., 2000). An orthologous gene pair was considered 'divergent' when its amino acid sequence identity was below 90%.

Rates of synonymous (dS) and non-synonymous (dN) substitutions. All identified orthologs were aligned with MUSCLE (v3.6, default parameters; Edgar, 2004), and dN and dS were then calculated from the alignments using the Yang and Nielsen method (Yang and Nielsen, 2000), as implemented in the program yn00 of the PAML package (v3.15; Yang, 1997).

Identification of strain- and species-specific genes. All Salinibacter sequences in SaliniDB were compared with all other sequences in the database using BlastP (Altschul et al., 1997), with an E-value cutoff of 10^-20 and the additional criterion of a minimum match length of 85% of the query sequence. Genes present in the M8 and M31 genomes but absent from all other genomes were considered S. ruber species-specific. Genes present in only one of the two S. ruber strains and absent from all other genomes were considered strain-specific. All M8-specific genes were rechecked against the M31 genome; this revealed a set of genes that had been overlooked during the M31 genome annotation, and these were reclassified as S. ruber species-specific genes. All S. ruber species-specific genes were validated by BlastX searches against the Environmental Samples database and the nucleotide collection (nr/nt) at NCBI (as of November 2008).

Identification of genes putatively involved in Bacteria-Archaea interdomain LGT. We adopted a combined strategy to identify genes putatively involved in interdomain lateral gene transfer (LGT) events.

- Phylogenetic analysis.
Genes whose best BLAST hit was an archaeal gene with an E-value below 10^-20 and a minimum overlap of 85% of the query sequence, and whose phylogenetic tree in the phylome showed monophyletic grouping with archaeal genes, were considered interdomain LGT candidates. This procedure revealed 40 candidate genes.

- Analysis of oligonucleotide frequencies of interdomain LGT candidate genes. Three types of oligonucleotide analysis were performed on the interdomain LGT candidate genes selected above. (I) Their relative di-, tri-, tetra- and pentanucleotide frequencies were computed with TETRA (Teeling et al., 2004), using a TETRA database of 572 bacterial and archaeal chromosome and plasmid DNA sequences with precomputed di-, tri-, tetra- and pentanucleotide frequencies. Individual all-against-all correlation matrices were computed for the di-, tri- and tetranucleotide frequencies of all sequences in the database, clustered with the program neighbor from the PHYLIP package (Felsenstein, 2004) and visualized as phylogenomic trees with the program ATV (Zmasek and Eddy, 2001). Genes were considered LGT candidates when they clustered with the sequences of halophilic Archaea rather than with the Salinibacter chromosomal and plasmid sequences (data not shown). (II) For each candidate gene, the 30 chromosomes/plasmids with the best-correlating oligonucleotide patterns were extracted with TETRA, separately for di-, tri- and tetranucleotides. Genes were considered LGT candidates when good correlations were observed mainly with halophilic Archaea. (III) Self-organizing maps (SOMs) were constructed with the program SOMA (Weber and Teeling, unpublished), which implements Kohonen's SOM algorithm (Kohonen, 1994) for oligonucleotide frequencies as described by Abe et al. (2003). SOMs were constructed for the normalized di-, tri- and tetranucleotide frequencies of the chromosomes of S. ruber DSM 13855 (whose oligonucleotide characteristics are hardly distinguishable from those of S. ruber M8) and of four halophilic Archaea: Halobacterium sp. NRC-1, Haloarcula marismortui ATCC 43049, Natronomonas pharaonis DSM 2160 and H. walsbyi. So that the three SOMs would reflect intragenomic oligonucleotide fluctuation, each of the five chromosomes was split into non-overlapping 30-kb fragments (484 fragments in total). After the SOMs had been computed, the fragments were mapped onto them to identify the regions representing each species. The 40 interdomain LGT candidates were then mapped onto the SOMs, and genes were considered LGT candidates when at least two of the three SOMs mapped them to archaeal nodes.

Functional class analyses. COG functional classes significantly associated with different ranges of CAI and dN/dS values were detected with the program FatiScan (Al-Shahrour et al., 2007), which is part of the Babelomics suite (Al-Shahrour et al., 2006). In brief, FatiScan performs a segmentation test that checks for asymmetrical distributions of biological labels associated with genes ranked in a list. Here, Salinibacter genes were ranked in decreasing order of their CAI or dN/dS values. The test was performed using COG classes as functional labels and a two-tailed Fisher's exact test with a partition size of 30. The significance threshold was set to p < 0.05.

RT-PCR of putative laterally transferred genes (LTG). RNA from cultures of S. ruber strains M8 and M31 was extracted along the growth curve. Cells were collected by centrifugation of 10 ml of culture at 16,000 × g for 15 min at 4 °C. Pellets were washed with SW25%, and total nucleic acids were obtained by lysis with sodium dodecyl sulfate (SDS) and proteinase K and treatment with cetyltrimethylammonium bromide (CTAB), as described by Wilson (1987), followed by three successive extractions with phenol-chloroform-isoamyl alcohol (25:24:1).
Finally, nucleic acids were precipitated with isopropanol, washed with 70% ethanol and resuspended in 100 µl of sterile deionized water. DNA was digested using DNA-free DNase (Ambion). RNA was reverse transcribed using the SuperScript III First-Strand Synthesis System (Invitrogen) following the manufacturer's recommendations, and the cDNA was used for PCR amplification with specific primers for the putative LTG (Supplementary Table 2). All PCR reactions included controls for the absence of DNA in the RNA prior to reverse transcription.

Transposase analysis. To assess whether annotated transposases were likely to be part of insertion sequences or transposons (http://www-is.biotoul.fr/; Chandler et al., 2008), we searched for terminal inverted repeats by BLASTing the S. ruber M8 intergenic nucleotide sequences against themselves. Regions were considered putative transposable elements if they harbored at least one transposase, were flanked by inverted repeats of at least 11 bp and did not exceed 20 kb. Direct repeats, which are typically generated by transposition events, were searched for manually at the ends adjoining the inverted repeats.

Metabolomic analysis. High-resolution mass spectra were acquired in negative ion mode on a Bruker APEX Qe Fourier transform ion cyclotron resonance mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with a 12-tesla superconducting magnet and an Apollo II ESI source. Mass spectra were acquired in broadband mode with a time-domain size of 1 MWord over a mass range of ca. 150-2,000 m/z. Single sine apodization was performed before Fourier transformation of the time-domain transient, with a processing size of 2 MWord. Cytoplasmic and extracellular fractions and pellet extracts were desalted by solid-phase extraction on C18 columns with methanol elution, as described in Rosselló-Mora et al. (2008).
Spectra were calibrated externally on arginine clusters (10 mg/l in methanol) and internally on fatty acids, giving calibration errors below 100 ppb in the relevant mass range (<550 m/z). This accuracy is needed to convert the measured masses into CHONS elemental compositions (Rosselló-Mora et al., 2008) and to annotate them, based on their exact masses, in the MassTRIX metabolic database (www.masstrix.org; Suhre and Schmitt-Kopplin, 2008) and the Japanese metabolome database (www.metabolome.jp).

Phage susceptibility. Two natural samples from the Bras del Port saltern (Santa Pola, Alicante, Spain), SP-COCB (23.2% salinity) and SP-CR (34.2% salinity), and a further sample, SCMCO (24% salinity), were used as sources of virus for infection. The virus suspensions were prepared by centrifuging 1 ml of natural sample at 13,000 rpm for 30 min at room temperature in a Heraeus Biofuge pico and filtering the supernatant through 0.22-µm pore-size Millipore filters. Cultures of S. ruber strains M8 and M31 were inoculated simultaneously in standard growth medium (SW25%: 25% w/v total salts and 0.2% w/v yeast extract; Peña et al., 2005) and grown until the OD600 reached 0.3-0.5. For each culture, the relationship between OD and CFU/ml was obtained from a calibration curve. A total of 10^7 CFU from each culture was mixed with 0.1 ml of each virus suspension and incubated at room temperature for 1 h. Then, 4 ml of top agar (SW25% plus 0.7% agar), kept at 50 °C in a water bath, was added to the mixture, which was immediately poured onto pre-warmed plates (15 ml of SW25% plus 2% agar) and incubated at 37 °C for 10-12 days.

Competition experiments between M8 and M31. Two experiments were carried out to assess competition between strains M8 and M31. First, each strain was grown separately in standard growth medium (25% w/v total salts and 0.2% w/v yeast extract) and in NaCl-saturated medium with 0.2% w/v yeast extract, at 37 °C with shaking. Then, aliquots containing equal numbers of cells of each strain (as determined by DAPI counts) were mixed.
The OD600 was monitored throughout co-culture. In parallel, cells were fixed with formaldehyde (7% v/v final concentration) for DAPI counts, and cells were collected by centrifugation (10 ml of culture, 16,000 × g, 15 min, 4 °C) for DNA extraction as described above and stored at -70 °C until use. Quantitative PCR assays were performed in 96-well plates on an ABI PRISM 7000 Sequence Detection System (Applied Biosystems, Foster City, CA, USA) in a total reaction volume of 20 µl per well. Each reaction mix contained 10 µl of SYBR Green master mix (Applied Biosystems), 0.2 mM of each primer (M8-F: 5' ACA TGA GTG ACC TCC AAG AC 3'; M8-R: 5' CGT GTT GAC GTG GTT ATT C 3'; M31-F: 5' ACG ACA GGA ATG ATG AGA AC 3'; M31-R: 5' GTC ACG TTG ACA AGG AGA TT 3'), several dilutions of the DNAs and sterile Milli-Q water (Millipore, Billerica, MA, USA). The thermal cycling conditions comprised a hot-start denaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 15 s, annealing at 58 °C for 1 min and extension at 72 °C for 15 s. All experiments were performed in triplicate. For quantification, cell-based calibration curves for M8 and M31 (constructed from 10-fold serial dilutions of each DNA) were included in each run. Cells in every culture were counted by DAPI staining as described previously (Antón et al., 1999). The error was calculated from the mean of two duplicate experiments.

References:
1. Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T. (2003). Informatics for unveiling hidden genome signatures. Genome Res. 13:693-702.
2. Al-Shahrour F, Arbiza L, Dopazo H, Huerta-Cepas J, Mínguez P, Montaner D, et al. (2007). From genes to functional classes in the study of biological systems. BMC Bioinformatics. 8:114.
3. Al-Shahrour F, Mínguez P, Vaquerizas JM, Conde L, Dopazo J. (2006). BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucl Acids Res. 34:W472-W476.
4. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res. 25:3389-3402.
5. Antón J, Llobet-Brossa E, Rodríguez-Valera F, Amann R. (1999). Fluorescence in situ hybridization analysis of the prokaryotic community inhabiting crystallizer ponds. Environ Microbiol. 1:517-523.
6. Badger J, Olsen G. (1999). CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol. 16:512-524.
7. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. (2004). Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 340:783-795.
8. Carbone A, Zinovyev A, Képès F. (2003). Codon adaptation index as a measure of dominating codon bias. Bioinformatics. 19:2005-2015.
9. Chandler M, Siguier P, Mahillon J. (2008). Nomenclature for insertion elements and other forms. Microbe. 3:445.
10. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. (1999). Improved microbial gene identification with GLIMMER. Nucl Acids Res. 27:4636-4641.
11. Edgar R. (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 5:113.
12. Felsenstein J. (2004). PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author, Department of Genome Sciences, University of Washington, Seattle.
13. Gascuel O. (1997). BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol. 14:685-695.
14. Gibson R, Smith DR. (2003). Genome visualization made fast and simple. Bioinformatics. 19:1449-1450.
15. Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 57:81-91.
16. Guindon S, Gascuel O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 52:696-704.
17. Guo FB, Ou HY, Zhang CT. (2003). ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucl Acids Res. 31:1780-1789.
18. Hallin PF, Binnewies TT, Ussery DW. (2004). Genome update: chromosome atlases. Microbiology. 150:3091-3093.
19. Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T. (2008). PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucl Acids Res. 36:D491-D496.
20. Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T. (2007). The human phylome. Genome Biology. 8:R109.
21. Kohonen T. (1994). Self-Organizing Maps. Berlin: Springer, 3rd edition.
22. Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 305:567-580.
23. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. (2004). Versatile and open software for comparing large genomes. Genome Biol. 5:R12.
24. López-García P, Antón J, Amils R. (1996). Sizing chromosomes and megaplasmids in haloarchaea. Microbiology. 142:1423-1428.
25. Lowe T, Eddy S. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucl Acids Res. 25:955-964.
26. Meyer F, Goesmann A, McHardy AC, Bartels D, Bekel T, Clausen J, et al. (2003). GenDB: an open source genome annotation system for prokaryote genomes. Nucl Acids Res. 31:2187-2195.
27. Mongodin EF, Nelson KE, Daugherty S, DeBoy RT, Wister J, Khouri H, et al. (2005). The genome of Salinibacter ruber: convergence and gene exchange among hyperhalophilic bacteria and archaea. Proc Natl Acad Sci USA. 102:18147-18152.
28. Peden J. (1999). Analysis of codon usage. PhD thesis, Department of Genetics, University of Nottingham.
29. Peña A, Valens-Vadell M, Santos F, Buczolits S, Antón J, Kämpfer P, et al. (2005). Intraspecific comparative analysis of the species Salinibacter ruber. Extremophiles. 9:151-161.
30. Rice P, Longden I, Bleasby A. (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends in Genetics. 16:276-277.
31. Rosselló-Mora R, Lucio M, Peña A, Brito-Echeverría J, López-López A, Valens-Vadell M, et al. (2008). Metabolic evidence for biogeographic isolation of the extremophilic bacterium Salinibacter ruber. ISME Journal. 2:242-253.
32. Smith TF, Waterman MS. (1981). Identification of common molecular subsequences. J Mol Biol. 147:195-197.
33. Suhre K, Schmitt-Kopplin P. (2008). MassTRIX: mass translator into pathways. Nucl Acids Res. 36:W481-W484.
34. Teeling H, Waldmann J, Lombardot T, Bauer M, Glöckner FO. (2004). TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics. 5:163.
35. Treangen T, Messeguer X. (2006). M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics. 7:433.
36. Vallenet D, Nordmann P, Barbe V, Poirel L, Mangenot S, Bataille E, et al. (2008). Comparative analysis of Acinetobacters: three genomes for three lifestyles. PLoS ONE. 3:e1805.
37. Wilson K. (1987). Preparation of genomic DNA from bacteria. In: Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, Struhl K (eds). Current Protocols in Molecular Biology. Wiley-Interscience, New York, 2.
38. Yang Z, Nielsen R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 17:32-43.
39. Yang Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13:555-556.
40. Zmasek CM, Eddy SR. (2001). ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics. 17:383-384.
