Supplementary text S1: EvoMining of the Streptomyces sviceus draft genome reveals an enolase enzyme family member recruited into a new phosphonate BGC

Enolase is a glycolytic enzyme that catalyzes the dehydration of 2-phosphoglycerate (2-PGA) to phosphoenolpyruvate (PEP) in a Mg2+-dependent reaction. The enolase phylogeny (Tree S2) has two main clades. The major clade includes orthologs associated with central metabolism from representatives of most species in the genome database (red branches, Figure S1A); as expected, its general topology reflects that of the guide species tree (Tree S1). A divergent clade (cyan, blue and green branches, Figure S1A) includes a homolog from Streptomyces viridochromogenes that has previously been found in the BGC for the phosphinothricin tripeptide (PTT) (1). This clade also includes a homolog from Streptomyces sviceus (GI 297146550; eno2-SSV) that has not been linked to NP biosynthesis.

The S. viridochromogenes PTT enolase, or carboxyphosphoenolpyruvate synthase (CPS; GI 302549806), shares 33% sequence identity with its glycolytic counterpart (GI 302551949). A detailed sequence analysis showed only a few changes in the active-site residues (Figure S2A). To locate these changes in three dimensions, a structural model of eno2-SSV was built and compared with the crystal structure of yeast enolase (PDB: 2ONE), which has been thoroughly characterized (2). This sequence and structural analysis revealed that the mutation E211S (yeast enolase numbering) affects the active site of CPS. To analyze the effect of this mutation, the CPS substrate, 2-phosphonoformylglycerate, was modeled in the active site of both structures. This analysis showed that the ancestral glutamate residue at position 211 would not allow the substrate to be accommodated (Figure S2B). Therefore, this particular mutation seems key to substrate specificity in CPS.
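Labels such as E211S and E168Q use yeast enolase numbering, so each column of an alignment has to be mapped back to a residue number in the yeast sequence before a change can be named. The sketch below illustrates that bookkeeping together with a simple percent-identity calculation. The function names and toy sequences are hypothetical, and identity is computed here over columns where neither sequence is gapped, which is only one convention; BlastP, for example, divides by the full alignment length including gap columns, so values reported in this text need not match this definition.

```python
# Hypothetical helpers for illustration; not the pipeline used in this study.

def reference_numbering(aligned_ref, start=1):
    """Return, for each alignment column, the residue number in the reference
    sequence, or None where the reference has a gap ('-')."""
    numbers = []
    pos = start - 1
    for residue in aligned_ref:
        if residue == "-":
            numbers.append(None)
        else:
            pos += 1
            numbers.append(pos)
    return numbers


def substitutions(aligned_ref, aligned_query, start=1):
    """List substitutions such as 'E211S' in reference numbering.
    Columns where either sequence has a gap are skipped."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    numbering = reference_numbering(aligned_ref, start)
    found = []
    for number, ref_aa, query_aa in zip(numbering, aligned_ref, aligned_query):
        if number is not None and query_aa != "-" and ref_aa != query_aa:
            found.append(f"{ref_aa}{number}{query_aa}")
    return found


def percent_identity(aligned_a, aligned_b):
    """Identical residues divided by columns where neither sequence is gapped, x100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical); the reference fragment starts at residue 209.
    yeast_like = "AKE-GD"
    cps_like = "AKSLGN"
    print(substitutions(yeast_like, cps_like, start=209))  # ['E211S', 'D213N']
    print(percent_identity(yeast_like, cps_like))          # 60.0
```

The insertion column ("-" in the reference) receives no yeast number, which is why residues inserted in a recruited homolog cannot be named in yeast numbering and are skipped.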
Taken together, this analysis suggests that other members of the divergent clade are associated with a new enzyme function, likely involved in NP biosynthesis.

The draft genome sequence of S. sviceus has been deposited as a single scaffold with 551 gaps (GenBank accession CM000951.1; BioProject PRJNA59513). Six gaps were located in the region of interest, including one at the 5' end of the enolase homolog, which left its sequence incomplete. Remarkably, neither PKS nor NRPS genes could be found in the neighborhood of the CPS gene, although an incomplete CDS for a mutase resembling those involved in phosphonate biosynthesis (1) could be detected. On the basis of the phylogenetic analysis, we expected the divergent clade to include enolases that are part of a BGC. To confirm this, the six gaps in the region were closed by iterative PCR amplification (Supplementary text 1 associated table 1) and sequencing, followed by manual annotation of the region. The annotation and functional predictions confirmed the presence of a BGC putatively encoding a pathway that shares common steps with PTT biosynthesis, including those related to the formation of phosphinopyruvate from phosphoenolpyruvate (1) (Figure S2C). Moreover, the complete sequence allowed the identification of the mutase-decarboxylase enzyme pair present in most Streptomyces phosphonate biosynthetic systems (Figure S1B). Overall, this functional annotation suggests that the product of this BGC is a previously uncharacterized phosphonate natural product.

Supplementary text 1 associated table 1. Annotation of a new phosphonate BGC in S. sviceus

Gene | Locus tag | Predicted function | Length (aa) | Closest homolog | Identity (%)*
1 | SSEG_06268 | LysR family transcriptional regulator | 306 | LysR family transcriptional regulator, Nostoc punctiforme PCC 73102 | 36
2 | SSEG_09941 | ABC transporter ATPase subunit | 326 | ABC transporter, Catenulispora acidiphila DSM 44928 | 63
3 | SSEG_09940 | ABC-type multidrug transporter permease | 261 | ABC transporter, Catenulispora acidiphila DSM 44928 | 63
4 | SSEG_06265 | Phosphonate dehydrogenase | 372 | D-isomer specific 2-hydroxyacid dehydrogenase, Cyanothece sp. ATCC 51142 | 45
5 | SSEG_09939 | Phosphopantetheinyl transferase | 223 | 4'-phosphopantetheinyl transferase, Nocardia asteroides | 39
6 | SSEG_09938 | Phosphopantetheine-binding protein | 106 | Hypothetical protein, Actinokineospora enzanensis | 46
7 | SSEG_06262 | Phosphonate-acyltransferase | 556 | Hypothetical protein, Salinispora pacifica | 50
8 | SSEG_06261 | Manganese transporter MntH | 439 | Mn2+/Fe2+ transporter, NRAMP family, Micromonospora sp. L5 (YP_004081943.1) | 52
9 | SSEG_09937 | Phosphoenolpyruvate phosphomutase | 309 | Phosphoenolpyruvate phosphomutase, Saccharopolyspora spinosa | 61
10 | SSEG_09936 | Rieske (2Fe-2S) iron-sulfur domain protein | 126 | Rieske (2Fe-2S) iron-sulfur domain-containing protein, Pseudonocardia dioxanivorans CB1190 | 46
11 | SSEG_09935 | Metallo-dependent amidohydrolase | 361 | Hypothetical protein, Paenibacillus daejeonensis | 50
12 | SSEG_09934 | Short-chain dehydrogenase/reductase family | 284 | Alcohol dehydrogenase, Nocardiopsis halotolerans | 44
13 | SSEG_09933 | Glutamate-1-semialdehyde aminotransferase | 468 | Glutamate-1-semialdehyde aminotransferase, Pseudomonas mendocina NK-01 | 36
14 | SSEG_09932 | Aminolevulinate-coenzyme A ligase | 412 | 8-amino-7-oxononanoate synthase, Pontibacter sp. BAB1700 | 58
15 | Within a gap | Putative alcohol dehydrogenase | 382 | Alcohol dehydrogenase, Streptomyces rimosus | 41
16 | SSEG_10418 | Hydroxyethylphosphonate dioxygenase | 439 | 2-hydroxyethylphosphonate dioxygenase PhpD, Streptomyces viridochromogenes | 46
17 | SSEG_10417 | 3-phosphoglycerate dehydrogenase | 338 | D-3-phosphoglycerate dehydrogenase, Frankia alni ACN14a | 62
18 | Within a gap | Phosphonopyruvate decarboxylase | 383 | Phosphonopyruvate decarboxylase, Nocardia brasiliensis ATCC 700358 | 63
19 | Within a gap | Nicotinamide mononucleotide adenylyltransferase | 184 | Nicotinamide mononucleotide adenylyltransferase PhpF, Streptomyces viridochromogenes | 74
20 | SSEG_10416 | 2,3-bisphosphoglycerate-independent phosphoglycerate mutase | 427 | PhpG, Streptomyces viridochromogenes | 61
21 | SSEG_10415 | Enolase | 421 | Carboxyphosphoenolpyruvate synthase, Streptomyces viridochromogenes | 50
22 | SSEG_10414 | Carboxyphosphonoenolpyruvate mutase | 286 | Carboxyphosphonoenolpyruvate mutase, Streptomyces hygroscopicus | 80
23 | SSEG_10413 | Aldehyde dehydrogenase | 462 | Hypothetical protein, Amycolatopsis nigrescens | 48
24 | SSEG_08119 | Beta-lactamase domain-containing protein | 255 | Aldehyde dehydrogenase PhpJ, Streptomyces viridochromogenes | 69

*Percentage of amino acid sequence identity based on the BlastP alignment.

Supplementary text 1 associated methods.

Streptomyces sviceus BGC gap closure.
The gaps and misassemblies found in the region between 8 kbp downstream and 24 kbp upstream of the PTT enolase homolog (ZP_06914376.1) in the S. sviceus draft genome sequence, obtained from GenBank (accession NZ_ABJJ00000000), were closed by PCR amplification and sequencing of the products (Supplementary text 1 associated table 2). Gap 3 was too long for a single PCR, so three iterative rounds of sequencing and primer synthesis were required to close it.

Molecular modeling of the recruited enolases (PhpH).
The molecular model of PhpH was constructed with Modeller (3), using as template the crystal structure of the dimeric yeast enolase in complex with magnesium, 2-phosphoglycerate (2-PGA) and phosphoenolpyruvate (PEP), obtained from the Protein Data Bank (PDB: 2ONE) (2). This enolase shares 33% identity with the carboxyphosphoenolpyruvate synthase (PhpH, or PTT enolase) from S. viridochromogenes (1). A model of the product of the PTT enolase, carboxyphosphoenolpyruvate, was built with VegaZZ (4) and placed in a position analogous to that of PEP in the active site of the PTT enolase by superimposing the model and the template in PyMOL (The PyMOL Molecular Graphics System, Version 0.99, Schrödinger, LLC; http://www.pymol.org/).

Supplementary text 1 associated table 2. Primers used for gap closing

Fragment | Total sequenced bases | Forward primer(s) | Reverse primer(s)
1 | 148 | F-TGCCGCCCAGTTCGAGCAGA | R-ATCCGAACGCACACCGCTG
2 | 566 | F-CCAGCGTTCTGGCCAGGGCT | R-CACGATCGCGACCGACGACT
3 | 2726 | FA-AAGGCGCCCTGCTTGATGAA; FFB-GAAGTTGATGCGGAACGCCA; FC-GCTGATGGGTTTGTCGTCGC | RA-CAAACTCCAGGCCTTCTACG; RB-GCCGAGAACATCCTGCACGTG; RC-GGTGGCGTGATGGTCACAGC; RD-CGTGTGCACCACCGGCAAGTC
4 | 538 | F-ATTCCGGTTGTTGGCGTGCC | R-TAGTTGTTGATGCTCCACAC
5 | 484 | F-GTCGTCGAAGTCATGGGCGT | R-CATGGTCTTCGACACCCTGG
6 | 535 | F-GAGTGGTCGGCATGGGCCGG | R-GTGACCTCGTGATCCGGGAC

Supplementary text 1 associated references:
1. Blodgett JA et al. (2007) Unusual transformations in the biosynthesis of the antibiotic phosphinothricin tripeptide. Nat Chem Biol 3:480–5.
2. Zhang E, Brewer JM, Minor W, Carreira LA, Lebioda L (1997) Mechanism of enolase: the crystal structure of asymmetric dimer enolase-2-phospho-D-glycerate/enolase-phosphoenolpyruvate at 2.0 Å resolution. Biochemistry 36:12526–34.
3. Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234:779–815.
4. Pedretti A, Villa L, Vistoli G (2004) VEGA--an open platform to develop chemo-bio-informatics applications, using plug-in architecture and script programming. J Comput Aided Mol Des 18:167–73.

Figure S1. A. Phylogenetic reconstruction of the actinobacterial enolases (Tree S2). Black branches include homologs associated with glycolysis, while green branches are linked to NP BGCs; a homolog from S. sviceus, highlighted in red, implicates the locus shown in B in phosphonate biosynthesis. B. The gene cluster (top) that encodes a novel biosynthetic pathway for a cryptic phosphonate NP, identified using EvoMining on the genome of S. sviceus. Its organization is compared with that of the PTT gene cluster of S. viridochromogenes. At the bottom, the biosynthetic steps common to the PTT and PSV pathways are shown.

Figure S2. Structure-function analysis of enolases and carboxyphosphoenolpyruvate synthases (CPS). A. Sequence alignment of enolases from various organisms and CPS; amino acid numbers are relative to the yeast enolase. The catalytic residues are indicated at the top; central homologs are shown on a white background and recruited homologs in green, as in the phylogenetic reconstruction in Figure S1A. B. Comparison of the yeast enolase crystal structure bound to its product phosphoenolpyruvate (PEP) with a structural model of the CPS from S. viridochromogenes bound to its product carboxyphosphoenolpyruvate (CPEP). K345, the conserved catalytic base, and the mutations at the catalytic acid (E211S) and at the residue holding the catalytic water molecule (E168Q) are indicated and shown as sticks. C. Reactions catalysed by the glycolytic enolases and the CPSs; the colour code is the same as in A and Figure S1A.

Figure S3. Distribution of EvoMining hits by BGC class as annotated by antiSMASH.
The most abundant known classes of BGCs are NRPSs (23%) and PKSs (PKS1, PKS2, PKS3 and TransPKS; 18% in total). EvoMining predictions and EvoMining hits detected by ClusterFinder are, taken together, the most abundant class (30%) and may represent several classes of unprecedented BGCs.

Figure S4. A. HPLC analysis (Vydac C-18 column) of extracts of a leupA-deficient mutant of S. roseus ATCC 31245 in comparison with wild-type S. roseus and an authentic leupeptin standard. Leupeptin can be detected in wild-type S. roseus (see Figure S5 for MS analyses), whereas the leupA mutant can no longer produce it. B. HPLC analysis (Restek C-18 column) of extracts from E. coli DH10B carrying the 8_10B and 9_18N clones with the leup locus, in comparison with an authentic leupeptin standard. Both strains produced leupeptin (see Figure S5 for MS analyses).

Figure S5. A. MS analysis of peaks with retention times equivalent to that of the leupeptin standard (see Figure S4), confirming heterologous production of leupeptin using genomic clones containing the leup genes. B. MS2 analysis of extracts from strains carrying genomic clones with the leup genes; the fragmentation patterns of the m/z 427.3 ion from the extracts are identical to those of the m/z 427.3 ion from the leupeptin standard. Similar patterns were obtained from extracts of wild-type S. roseus ATCC 31245.

Figure S6. Postulated pathway for arseno-organic NP biosynthesis in S. coelicolor and S. lividans. The reactions proposed for SLI_1096, SLI_1097 and SLI_1091 would form the As-C bond at the early stages of the biosynthetic pathway. The biosynthetic logic proposed for SLI_1088-9 relates to the synthesis of an acyl chain that is proposed to be linked to the As-C-containing intermediate by other enzymes in the BGC.
On the left, structural predictions of potential products of the pathway, based on high-resolution MS data, are shown. This pathway and further studies on the water-soluble As species present in the samples (data not shown) suggest a non-methylated As moiety, as shown in the last structure, which has not yet been described in the literature.

Figure S7. A selected EvoMining prediction. This BGC, which was not identified by ClusterFinder or antiSMASH, was predicted after identification of a recruited AroA homolog. Detailed annotation is available as Table S7.