SUPPLEMENTARY INFORMATION

“Other environmental fosmids with similarity to H. walsbyi DSM 16790 genome outside of the GIs”

Fosmids with rearrangements

Fosmids eHwalsbyi 063, 421, 503, 539 and 562 contain mainly H. walsbyi genes, with large rearrangements (Figures 1 and S1). Fosmid eHwalsbyi 063 (11651 bp) combines two distinct syntenic regions of the H. walsbyi DSM 16790 genome, nucleotides 1947109-1952814 and 2496506-2501815. The recombination event probably occurred adjacent to genes purM and trpD (eHQ2665 and eHQ3169), which both encode phosphoribosyl-related proteins with 48% sequence similarity to each other. Also noteworthy is that the two regions share a secreted CBS (Cystathionine Beta Synthase) domain protein (eHQ2664 and eHQ3172), with 46% sequence similarity between the two copies. Fosmid eHwalsbyi 421 (15429 bp) contained a cluster of ABC-type nitrate-sulfonate-bicarbonate transport genes (tauABC). Compared to the DSM 16790 genome, there was a rearrangement in the region adjacent to two phosphoribosyl-related proteins, albeit different from those of eHwalsbyi 063 (NCAIR mutase, eHQ1978, and nicotinate phosphoribosyl transferase, eHQ1979). However, two IS1341-type transposases located two ORFs away might be the most obvious reason for the rearrangement. Fosmid eHwalsbyi 503 (11076 bp) contained a cluster of ABC-type antimicrobial peptide transport system genes (salXY); the synteny to the corresponding region in the DSM 16790 genome appears broken in the middle, where four genes and two pseudogenes were missing in the fosmid. Fosmid eHwalsbyi 562 (25950 bp) is largely syntenic, and the shared genes had ca. 98% sequence similarity with DSM 16790. The DSM 16790 genome contains four different dipeptide/oligopeptide nickel ABC transporter dpp systems, two of which are adjacent in the region syntenic to this fosmid (Figure S2). However, in the fosmid one of the two dpp clusters was absent (Figure S2).
On the other hand, the fosmid had two IS elements not present in the strain genome. Different dpp genes are probably involved in the transport of different oligopeptide substrates. The lineage represented by eHwalsbyi 562 could therefore have a different oligopeptide transport (Monnet, 2003) or environmental sensing specificity (Abouhamad et al., 1991).

Fosmids containing genes not found in H. walsbyi DSM 16790

Many fosmids combined syntenic regions with high sequence similarity (>98%) to sections of the DSM 16790 genome, which therefore indisputably originated from H. walsbyi lineages, with other ORFs that had no significant similarity to any sequence in this genome (Figure S1, Table S1). The variable sections were typically enriched in transposable elements. Several ORFs were found with %GC values that differed considerably from those expected for H. walsbyi; some were affiliated with other known species, and others had no affiliation at all. Some examples of these fosmids are described below and in Figure S1 and Table S1. Fosmid eHwalsbyi 012 (11918 bp) contained phage-related genes, including five transposases or transposase-related genes, and one DNA-repair helicase (rad3), which was 99% identical to HQ1981. Fosmid eHwalsbyi 021 (37918 bp) had twenty-six ORFs, but only three of them were assigned to H. walsbyi (eHQ1890, 1889 and 1888): two hypothetical proteins and one Kef-type potassium transport system protein, with identities higher than 95.47%. Fosmid eHwalsbyi 023 (37445 bp) had largely conserved synteny with the DSM 16790 genome, but genes with significant similarity to other haloarchaea (rather than to H. walsbyi DSM 16790) formed a few discontinuities. However, the nucleotide composition of these genes is within the typical values for H. walsbyi. The non-syntenic section contained two genes that might be phage related (Table S1). Fosmid eHwalsbyi 035 (25956 bp) was again mostly syntenic, with high similarity to DSM 16790.
The non-syntenic part contained a methylase-modification system not present in the DSM 16790 genome and a phage integrase (Table S1). In fosmid eHwalsbyi 332 (24026 bp), the large region syntenic to DSM 16790 had sequence similarities of 72.5-100%. However, seven genes at one end are not found in this genome, and three of them had significant sequence similarity to a gene cluster involved in sugar transport found in H. marismortui. Again, they all had the H. walsbyi low %GC signature. Fosmid eHwalsbyi 461 (18681 bp) has no significant similarity to the DSM 16790 genome except for a copy of the ubiquitous ISH4 element (eHQ3552) and a hypothetical protein (eHQ3213). However, several ORFs had the H. walsbyi %GC signature, including a cluster of eight genes involved in sugar metabolism, which could be involved in cell envelope component synthesis or sugar decoration. Fosmid eHwalsbyi 464 (12763 bp) (see also section on GI 3 in Results) contains a liv gene cluster (bound to a functional IS1341) interrupted between the livJ and livF subunits by the insertion of three genes for consecutive reactions in nucleotide sugar metabolism (Figure 5). Two genes code for glucose-1-phosphate thymidylyltransferases (graD), key enzymes in prokaryotes (their substrates are the precursors of a large number of modified sugars), which transform Glucose-1P or dTTP-Glucose to dTDP-D-Glucose. The other gene inserted in this fosmid is galE2, which encodes a UDP-glucose 4'-epimerase that transforms dTDP-D-Glucose to dTDP-D-Galactose. The GalE2 protein of fosmid eHwalsbyi 465 has 79% amino acid identity to one UDP-Glc 4-epimerase in Halobacterium and 64% to another GalE in the DSM 16790 genome (HQ2683). Two other GalE proteins (HQ3509 and HQ3510), together with another GraD (HQ3507), are also found within GI 4. As previously described, H. walsbyi contains several galE homologs that are important in sulfolipid biosynthesis (Bolhuis et al., 2006).
Sulfolipids have been found to replace phospholipids in response to phosphate limitation in a number of photosynthetic bacteria and plants. Therefore, the cluster could be involved in the synthesis of different glyco-sulfonolipids or different sugar residues that might be patchily distributed among H. walsbyi lineages. Fosmid eHwalsbyi 558 (38210 bp) had no synteny with DSM 16790, and its assignment to H. walsbyi is based only on its GC content (average of 50.29%). However, sixteen of the ORFs had their best, albeit low, similarity to this genome (the average identity is 47.4%). The other best hits were to other haloarchaea, mainly H. marismortui and Natronomonas pharaonis, and to eubacteria. The most relevant functional features in this fosmid are the presence of three transcriptional regulators of H. walsbyi (HQ2434, and two copies of HQ1946) and an incomplete spermidine/putrescine ABC transport cluster most similar to that of H. marismortui (Table S1). This fosmid contained three adjacent IS1341-type transposases that were similar to some found in N. pharaonis (Table S1).

REFERENCES FOR SUPPLEMENTARY MATERIAL:

1. Abouhamad, W. N., Manson, M., Gibson, M. M. and Higgins, C. F. (1991). Peptide transport and chemotaxis in Escherichia coli and Salmonella typhimurium: characterization of the dipeptide permease (Dpp) and the dipeptide-binding protein. Mol Microbiol 5, 1035-47.
2. Bolhuis, H., Palm, P., Wende, A., Falb, M., Rampp, M., Rodriguez-Valera, F., Pfeiffer, F. and Oesterhelt, D. (2006). The genome of the square archaeon Haloquadratum walsbyi: life at the limits of water activity. BMC Genomics 7, 169.
3. Monnet, V. (2003). Bacterial oligopeptide-binding proteins. Cell Mol Life Sci 60, 2100-14.