Supplementary material for

antiSMASH 2.0 – a versatile platform for genome mining of secondary metabolite producers

Kai Blin*, Marnix H. Medema*, Daniyal Kazempour, Michael A. Fischbach, Rainer Breitling, Eriko Takano, and Tilmann Weber

Content

Table S1: Signature HMMs for detection of secondary metabolite biosynthesis gene clusters
Table S2: Rules for detection of secondary metabolite biosynthesis gene clusters
Table S3: Validation of the detection of new gene cluster classes
Table S4: antiSMASH benchmark
Supplementary references

Table S1: Signature HMMs for detection of secondary metabolite biosynthesis gene clusters (extended according to Medema et al. 2011)

| Compound class | Description | HMM name | Source |
|---|---|---|---|
| NRPS | Condensation domain | Condensation | PFAM PF00668.13 |
| NRPS | Adenylation domain | AMP-binding | PFAM PF00501.21 |
| NRPS | Adenylation domain with integrated oxidase | A-OX | Medema et al. (2011) |
| NRPS/PKS | Thiolation domain | PP-binding | PFAM PF00550.18 |
| PKS | Ketosynthase domain | PKS_KS | SMART |
| PKS | Acyltransferase domain | PKS_AT | SMART |
| PKS | Trans-acyltransferase docking domain | ATd | Medema et al. (2011) |
| PKS (neg.) | Bacterial type I fatty acid synthase | bt1fas | Medema et al. (2011) |
| PKS (neg.) | Fungal type I fatty acid synthase | ft1fas | Medema et al. (2011) |
| PKS (neg.) | Type II fatty acid synthase | t2fas | Medema et al. (2011) |
| PKS (neg.) | FabH fatty acid synthase | fabH | Medema et al. (2011) |
| PKS | Enediyne ketosynthase | ene_KS | Yadav et al. (2009) |
| PKS | Modular ketosynthase | mod_KS | Yadav et al. (2009) |
| PKS | Hybrid ketosynthase | hyb_KS | Yadav et al. (2009) |
| PKS | Iterative ketosynthase | itr_KS | Yadav et al. (2009) |
| PKS | Trans-AT ketosynthase | tra_KS | Yadav et al. (2009) |
| PKS | Unusual PKS HglD-like | hglD | Medema et al. (2011) |
| PKS | Unusual PKS HglE-like | hglE | Medema et al. (2011) |
| PKS | Type II PKS ketosynthase | t2ks | Medema et al. (2011) |
| PKS | Type II PKS ketosynthase, model 2 | t2ks2 | Medema et al. (2011) |
| PKS | Type II PKS chain length factor | t2clf | Medema et al. (2011) |
| PKS | Type III PKS N-terminal | Chal_sti_synt_N | PFAM PF00195.12 |
| PKS | Type III PKS C-terminal | Chal_sti_synt_C | PFAM PF00195.12 |
| Terpene | Terpene synthase C terminal | Terpene_synth_C | PFAM PF03936.9 |
| Terpene | Terpene synthase | Terpene_synth | PFAM PF01397.14 |
| Terpene | Phytoene synthase | phytoene_synth | Medema et al. (2011) |
| Terpene | Lycopene cyclase | Lycopene_cycl | PFAM PF05834.5 |
| Terpene | Terpene cyclase | terpene_cyclase | Medema et al. (2011) |
| Terpene | NapT7-like protein | NapT7 | Medema et al. (2011) |
| Terpene | Fungal geranylgeranyl pyrophosphate synthase | fung_ggpp | Medema et al. (2011) |
| Terpene | Fungal geranylgeranyl pyrophosphate synthase, model 2 | fung_ggpp2 | Medema et al. (2011) |
| Terpene | Dimethylallyl tryptophan synthase | dmat | Medema et al. (2011) |
| Terpene | Trichodiene synthase | trichodiene_synt | Medema et al. (2011) |
| Lantipeptides | LanC-like lantibiotics biosynthesis protein | LANC_like | PFAM PF05147.6 |
| Lantipeptides | Lantibiotic dehydratase, N-terminus | Lant_dehyd_N | PFAM PF04737.6 |
| Lantipeptides | Lantibiotic dehydratase, C-terminus | Lant_dehyd_C | PFAM PF04738.6 |
| Lantipeptides | Lantibiotic antimicrobial peptide 18 | Antimicrobial18 | PFAM PF08130.4 |
| Lantipeptides | Gallidermin | Gallidermin | PFAM PF02052.8 |
| Lantipeptides | Lantibiotic, type A | L_biotic_typeA | PFAM PF04604.6 |
| Lantipeptides | Lantibiotic, gallidermin/nisin family | TIGR03731 | TIGR03731 |
| Lantipeptides | Lantibiotic leader lacticin 481 group | LE-LAC481 | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic leader mersacidin cinnamycin group | LE-MER+2PEP | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic leader LanBC modified | LE-LanBC | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide lacticin 481 group | MA-LAC481 | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide nisin epidermin group | MA-NIS+EPI | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide nisin group | MA-NIS | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide epidermin group | MA-EPI | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide two component alpha | MA-2PEPA | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide two component beta | MA-2PEPB | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic peptide lacticin 481 group (Dufour et al.) | MA-DUF | De Jong et al. (2010) |
| Lantipeptides | Lantibiotic leader lacticin 481 group (Dufour et al.) | LE-DUF | De Jong et al. (2010) |
| Lantipeptides | FxLD family lantipeptide | TIGR04363 | TIGR04363 |
| Lantipeptides | Streptomyces PEQAXS motif lantipeptide | Strep_PEQAXS | This study |
| Bacteriocin | Putative Streptomyces bacteriocin | strepbact | Medema et al. (2011) |
| Bacteriocin | Antimicrobial peptide 14 | Antimicrobial14 | PFAM PF08109.4 |
| Bacteriocin | Bacteriocin_IIc domain | Bacteriocin_IIc | PFAM PF10439.2 |
| Bacteriocin | Bacteriocin_IId domain | Bacteriocin_IId | PFAM PF09221.3 |
| Bacteriocin | Bacteriocin_IIdc_cy domain | BacteriocIIc_cy | PFAM PF12173.1 |
| Bacteriocin | Bacteriocin_II domain | Bacteriocin_II | PFAM PF01721.11 |
| Bacteriocin | Bacteriocin_IIi domain | Bacteriocin_IIi | PFAM PF11758.1 |
| Bacteriocin | Lactococcin | Lactococcin | PFAM PF04369.6 |
| Bacteriocin | Antimicrobial peptide 17 | Antimicrobial17 | PFAM PF08129.4 |
| Bacteriocin | Lactococcin 972 | Lactococcin_972 | PFAM PF09683.3 |
| Bacteriocin | Lactococcin G-beta | LcnG-beta | PFAM PF11632.1 |
| Bacteriocin | Subtilosin A | Subtilosin_A | PFAM PF11420.1 |
| Bacteriocin | Cloacin | Cloacin | PFAM PF03515.7 |
| Bacteriocin | Linocin M18 | Linocin_M18 | PFAM PF04454.5 |
| Bacteriocin | Bacteriocin biosynthesis cyclodehydratase | TIGR03603 | TIGR03603 |
| Bacteriocin | Bacteriocin biosynthesis docking scaffold | TIGR03604 | TIGR03604 |
| Bacteriocin | SagB-type dehydrogenase | TIGR03605 | TIGR03605 |
| Bacteriocin | Bacteriocin, circularin A/uberolysin family | TIGR03651 | TIGR03651 |
| Bacteriocin | Bacteriocin, microcyclamide/patellamide family | TIGR03678 | TIGR03678 |
| Bacteriocin | Thiazole-containing bacteriocin maturation protein | TIGR03693 | TIGR03693 |
| Bacteriocin | Bacteriocin propeptide | TIGR03798 | TIGR03798 |
| Bacteriocin | Bacteriocin biosynthesis cyclodehydratase | TIGR03882 | TIGR03882 |
| Bacteriocin | Bacteriocin, BA_2677 family | TIGR03601 | TIGR03601 |
| Bacteriocin | Bacteriocin protoxin, streptolysin S family | TIGR03602 | TIGR03602 |
| Bacteriocin | Two-chain TOMM family | TIGR03795 | TIGR03795 |
| Bacteriocin | NHLP leader peptide domain | TIGR03793 | TIGR03793 |
| Bacteriocin | Bacteriocin maturation radical SAM protein 1 | TIGR03975 | TIGR03975 |
| Bacteriocin | Microviridin A | mvnA | Medema et al. (2011) |
| Bacteriocin | Thiostrepton-like thiopeptides | thiostrepton | Medema et al. (2011) |
| Bacteriocin | Putative subtilosin biosynthesis enzyme ywiA | subtilosin | This study |
| Bacteriocin | Cypermycin prepeptide | cypermycin | This study |
| Bacteriocin | Marinostatin/microviridin prepeptide | mvd | This study |
| Bacteriocin | Lasso peptide modification enzyme | lasso | This study |
| Bacteriocin | Small prepeptide associated domain | DUF692 | PFAM DUF692 |
| Bacteriocin | Microcin J25 prepeptide | micJ25 | This study |
| Bacteriocin | Microcin J25 processing protein McjC | mcjC | This study |
| Bacteriocin | Glycocin prepeptide | glycocin | This study |
| Bacteriocin | Bottromycin biosynthesis enzyme | botH | This study |
| Bacteriocin | SkfC biosynthesis enzyme | skfc | This study |
| Bacteriocin | Thuricin prepeptide | thuricin | This study |
| Bacteriocin | Sublancin prepeptide | sublancin | This study |
| Beta-lactams | Beta-lactam synthase | BLS | Medema et al. (2011) |
| Beta-lactams | Clavulanic acid synthase-like | CAS | Medema et al. (2011) |
| Beta-lactams | Tabtoxin synthase-like | Tabtoxin | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | 2-deoxy-scyllo-inosose synthase | DOIS | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | NeoL-like deacetylase | neoL_like | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | SpcD-/SpcK-like thymidylyltransferase | spcDK_like_glyc | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | SpcF-/SpcG-like glycosyltransferase | spcFG_like | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | StrH-like glycosyltransferase | strH_like | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | StrK-like phosphatase | strK_like1 | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | StrK-like phosphatase | strK_like2 | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | ValA-like 2-epi-5-epi-valiolone synthase | valA_like | Medema et al. (2011) |
| Aminoglycosides / aminocyclitols | 2-epi-5-epi-valiolone synthase, SalQ-like | salQ | Medema et al. (2011) |
| Aminocoumarins | NovK-like reductase | novK | Medema et al. (2011) |
| Aminocoumarins | NovJ-like reductase | novJ | Medema et al. (2011) |
| Aminocoumarins | NovI-like cytochrome P450 | novI | Medema et al. (2011) |
| Aminocoumarins | NovH-like protein | novH | Medema et al. (2011) |
| Aminocoumarins | SpcD/SpcK-like thymidylyltransferase, aminocoumarins group | spcDK_like_cou | Medema et al. (2011) |
| Siderophores | Siderophore synthase | IucA_IucC | PFAM PF04183.5 |
| Ectoines | Ectoine synthase | ectoine_synt | Medema et al. (2011) |
| Butyrolactones | AfsA-like butyrolactone synthase | AfsA | PFAM PF03756.6 |
| Indoles | StaD-like chromopyrrolic acid synthase domain | indsynth | Medema et al. (2011) |
| Nucleosides | LipM-like nucleotidyltransferase | LipM | Medema et al. (2011) |
| Nucleosides | LipU-like protein | LipU | Medema et al. (2011) |
| Nucleosides | LipV-like dehydrogenase | LipV | Medema et al. (2011) |
| Nucleosides | ToyB-like synthase | ToyB | Medema et al. (2011) |
| Nucleosides | TunD-like putative N-acetylglucosamine transferase | TunD | Medema et al. (2011) |
| Nucleosides | Pur6-like synthetase | pur6 | Medema et al. (2011) |
| Nucleosides | Pur10-like oxidoreductase | pur10 | Medema et al. (2011) |
| Nucleosides | NikJ-like protein | nikJ | Medema et al. (2011) |
| Nucleosides | NikO-like enolpyruvyl transferase | nikO | Medema et al. (2011) |
| Phosphoglycolipids | MoeO5-like prenyl-3-phosphoglycerate synthase | MoeO5 | Medema et al. (2011) |
| Phosphoglycolipids | Phosphoglycolipid glycosyltransferase | moeGT | This study |
| Melanins | MelC-like melanin synthase | melC | Medema et al. (2011) |
| Oligosaccharide | Secondary metabolite-related glycosyltransferase | Glycos_transf_1 | PFAM PF00534.14 |
| Oligosaccharide | Secondary metabolite-related glycosyltransferase | Glycos_transf_2 | PFAM PF00535.20 |
| Oligosaccharide | Secondary metabolite-related glycosyltransferase | Glyco_transf_28 | PFAM PF03033.14 |
| Oligosaccharide | Secondary metabolite-related glycosyltransferase | DUF1205 | PFAM DUF1205 |
| Oligosaccharide | Secondary metabolite-related glycosyltransferase | MGT | This study |
| Oligosaccharide | Secondary metabolite-related glycosyltransferase | MGT2 | This study |
| Furan | MmyO-like protein | mmyO | This study |
| Homoserine lactone | Autoinducer synthetase | Autoind_synth | PFAM PF00765.12 |
| Thiopeptide | YcaO-like | YcaO | PFAM PF02624.11 |
| Phenazine | Phenazine biosynthesis | phzB | This study |
| Phosphonate | Phosphonate biosynthesis gene | phosphonates | This study |
| Others | NAD-binding domain 4 | NAD_binding_4 | PFAM PF07993.5 |
| Others | LmbU-like protein | LmbU | Medema et al. (2011) |
| Others | Goadsporin-like protein | goadsporin_like | Medema et al. (2011) |
| Others | Neocarzinostatin-like protein | Neocarzinostat | Medema et al. (2011) |
| Others | Cyanobactin protease | cyanobactin_synth | Medema et al. (2011) |
| Others | Cyclodipeptide synthase | cycdipepsynth | Medema et al. (2011) |
| Others | Fom1-like phosphomutase | fom1 | Medema et al. (2011) |
| Others | BcpB-like phosphomutase | bcpB | Medema et al. (2011) |
| Others | FrbD-like phosphomutase | frbD | Medema et al. (2011) |
| Others | MitE-like CoA-ligase | mitE | Medema et al. (2011) |
| Others | Valanimycin biosynthesis VlmB domain | vlmB | Medema et al. (2011) |
| Others | Pyrrolnitrin biosynthesis PrnB domain | prnB | Medema et al. (2011) |
| Others | Nitrosynthase domain | CaiA | This study |
| Others | Bacilysin-related ligase | bacilysin | This study |
| Others | Cypermycin biosynthesis cypI enzyme | cypI | This study |

Table S2: Rules for detection of secondary metabolite biosynthesis gene clusters (extended according to Medema et al., 2011)

| Biosynthetic class | Rules |
|---|---|
| Type I PKS | - KS & AT HMM scores > 50 within one protein |
| | - KS score > bactTypeIFAS / fungTypeIFAS / HglD&E / FabH scores |
| Trans-AT type I PKS | - trans-AT docking domain HMM score > 65 |
| | - KS score > 50 |
| | - no match to rules for normal type I PKSs as above |
| Type II PKS | - type II KS or CLF HMM score > 50 |
| | - KS/CLF score > enediyneKS / modularKS / hybridKS / iterativeKS / transATKS / bactTypeIFAS / fungTypeIFAS / TypeIIFAS / HglD&E / FabH HMM scores |
| | - no match to rules for normal/trans type I PKSs as above |
| Type III PKS | - Chal_sti_synt_C or Chal_sti_synt_N HMM scores > 35 |
| | - no match to rules for normal type I&II PKSs as above |
| Type IV PKS | - HglE or HglD HMM scores > 50 |
| | - HglD/E HMM score > bactTypeIFAS / fungTypeIFAS / TypeIIFAS / FabH HMM scores |
| | - no match to rules for normal type I&II&III PKSs as above |
| Non-ribosomal peptide synthetase | - C & A / A-OX domain HMM scores > 20 within one protein OR single domain C & A proteins scores > 20 within 20 kb distance |
| Terpene | - Terpene_Synth HMM score > 23 OR Terpene_Synth_C HMM score > 23 OR phytoene_synt HMM score > 20 OR Lycopene_cycl HMM score > 80 OR terpene_cyclase HMM score > 50 OR NapT7 HMM score > 250 OR fung_ggpps HMM score > 420 OR fung_ggpps2 HMM score > 312 OR dmat HMM score > 200 OR trichodiene_synth HMM score > 150 |
| Lantipeptides | - LANC_like HMM score > 80 OR (Lant_dehyd_N and Lant_dehyd_C HMM scores > 20 within one protein) OR one of a range of lantibiotic prepeptide HMM scores > 20 OR TIGR03731 HMM score > 18 |
| Bacteriocins | - Strepbact HMM score > 50 OR Antimicrobial14 HMM score > 90 OR Bacteriocin_IId HMM score > 23 OR BacteriocIIc_cy HMM score > 92 OR Bacteriocin_II HMM score > 40 OR Lactococcin HMM score > 24 OR Antimicrobial17 HMM score > 31 OR Lactococcin_972 HMM score > 25 OR Bacteriocin_IIc HMM score > 27 OR LcnG-beta HMM score > 78 OR Bacteriocin_IIi HMM score > 56 OR Subtilosin_A HMM score > 98 OR Cloacin HMM score > 27 OR Linocin_M18 HMM score > 25 OR TIGR03603 HMM score > 150 OR TIGR03604 HMM score > 440 OR TIGR03605 HMM score > 200 OR TIGR03651 HMM score > 18 OR TIGR03678 HMM score > 35 OR TIGR03693 HMM score > 400 OR TIGR03798 HMM score > 16 OR TIGR03882 HMM score > 150 OR TIGR03601 HMM score > 50 OR TIGR03602 HMM score > 50 OR TIGR03795 HMM score > 41 OR TIGR03793 HMM score > 51 OR TIGR03975 HMM score > 282 OR mvnA HMM score > 20 OR thiostrepton HMM score > 20 OR subtilosin HMM score > 140 OR cypermycin HMM score > 10 OR mvd HMM score > 20 OR lasso HMM score > 400 OR DUF692 HMM score > 40 OR micJ25 HMM score > 21 OR mcjC HMM score > 60 OR glycocin HMM score > 30 OR botH HMM score > 65 OR skfc HMM score > 70 OR thuricin HMM score > 30 OR sublancin HMM score > 30 |
| Beta-lactams | - Beta-lactam synthase HMM score > 250 OR clavulanic acid synthase HMM score > 250 OR tabtoxin synthase score > 500 |
| Aminoglycosides / aminocyclitols | - strH HMM score > 50 OR strK1 HMM score > 800 OR strK2 HMM score > 650 OR NeoL HMM score > 50 OR DOIS HMM score > 500 OR ValA HMM score > 600 OR SpcFG HMM score > 200 OR SpcDK_glyc HMM score > 600 OR salQ HMM score > 480 |
| Aminocoumarins | - novK HMM score > 200 OR novJ HMM score > 350 OR novI HMM score > 600 OR novH HMM score > 750 OR spcDK_like_cou HMM score > 600 |
| Siderophores | - IucA_IucC HMM score > 30 |
| Ectoines | - Ectoine_synt HMM score > 35 |
| Butyrolactones | - AfsA HMM score > 25 |
| Indoles | - ind_synth HMM score > 100 |
| Nucleosides | - LipM HMM score > 50 OR LipU HMM score > 30 OR LipV HMM score > 375 OR ToyB HMM score > 175 OR TunD HMM score > 200 OR pur6 HMM score > 200 OR pur10 HMM score > 600 OR nikJ HMM score > 200 OR nikO HMM score > 400 |
| Phosphoglycolipids | - MoeO5 HMM score > 65 OR moeGT HMM score > 40 |
| Melanins | - melC HMM score > 40 |
| Oligosaccharide | - at least three out of (Glycos_transf_1 HMM score > 20, Glycos_transf_2 HMM score > 20, Glyco_transf_28 HMM score > 26, MGT HMM score > 100, MGT2 HMM score > 150, DUF1205 HMM score > 20) |
| Furan | - mmyO HMM score > 500 |
| Homoserine lactone | - Autoind_synth HMM score > 20 |
| Thiopeptide | - (Lant_dehyd_N HMM score > 20 OR Lant_dehyd_C HMM score > 20) and YcaO HMM score > 25 exist within 10 kilobases of each other |
| Phenazine | - phzB HMM score > 80 |
| Phosphonate | - phosphonates HMM score > 275 |
| Others | - PP-binding & AMP-binding HMM scores > 20 within one protein OR (PP-binding HMM score > 20 and A-OX HMM score > 50 within one protein) OR (NAD_binding_4 HMM score > 40 and A-OX HMM score > 50 within one protein) OR (NAD_binding_4 HMM score > 40 and AMP-binding HMM score > 20 within one protein) OR LmbU HMM score > 50 OR goadsporin_like HMM score > 500 OR Neocarzinostat HMM score > 28 OR cyanobactin_synth HMM score > 80 OR cycdipepsynth HMM score > 110 OR fom1 HMM score > 750 OR bcpB HMM score > 400 OR frbD HMM score > 350 OR mitE HMM score > 400 OR vlmB HMM score > 250 OR prnB HMM score > 200 OR CaiA HMM score > 200 OR bacilysin HMM score > 160 OR cypI HMM score > 15 |

Table S3: Validation of the detection of new gene cluster classes

| Compound | Compound class | GenBank Accession nr | Detected successfully? |
|---|---|---|---|
| everninomicin | oligosaccharide | GY241466 | yes |
| avilamycin | oligosaccharide | AF333038 | yes |
| thiocillin | thiopeptide | NC_004722 | yes |
| cyclothiazomycin | thiopeptide | FJ472825 | yes |
| thiostrepton | thiopeptide | FJ652572 | yes |
| thiomuracin | thiopeptide | FJ461360 | yes |
| siomycin | thiopeptide | FJ436355 | yes |
| nosiheptide | thiopeptide | FJ438820 | yes |
| nocathiacin | thiopeptide | GU564398 | yes |
| GE37468 | thiopeptide | GE37468 | yes |
| GE2270 | thiopeptide | GE2270 | yes |
| TP-1161 | thiopeptide | TP-1161 | yes |
| phenazine | phenazine | JX843821 | yes |
| phenazine | phenazine | JQ659263 | yes |
| phenazine | phenazine | FN178498 | yes |
| phenazine | phenazine | AM384985 | yes |
| phenazine | phenazine | AY927995 | yes |
| phenazine | phenazine | HM594285 | yes |
| methylenomycin | furan | AJ276673 | yes |
| homoserine lactone | homoserine lactone | ECU17224 | yes |
| homoserine lactone | homoserine lactone | ASU65741 | yes |
| homoserine lactone | homoserine lactone | L48616 | yes |
| homoserine lactone | homoserine lactone | AF079136 | yes |
| homoserine lactone | homoserine lactone | AF057718 | yes |
| fosfomycin | phosphonate | EU924263 | yes |
| dehydrophos | phosphonate | GU199252 | yes |
| FR900098 | phosphonate | DQ267750 | yes |

Table S4: antiSMASH benchmark

antiSMASH 1.0

| Genome | Total annotated gene clusters in genome publication | Detected annotated gene clusters | Newly detected gene clusters, not annotated in genome publication | False positives | False negatives | Misannotated |
|---|---|---|---|---|---|---|
| Pseudomonas fluorescens Pf-5 | 10 | 9 | 0 | 0 | 1 | 0 |
| Streptomyces griseus IFO 13350 | 34 | 33 | 8 | 0 | 1 | 0 |
| Kitasatospora setae NBRC 14216T | 24 | 24 | 12 | 0 | 0 | 0 |
| Salinispora tropica CNB-440 | 17 | 16 | 5 | 0 | 1 | 0 |
| Aspergillus fumigatus Af293 | 26 | 26 | 10 | 0 | 0 | 2 |
| Total | 111 | 108 | 35 | 0 | 3 | 2 |

Percentage detected: 97.3
Percentage false positives: 0.0
Percentage false negatives: 2.7
Percentage found new: 31.5

antiSMASH 2.0

| Genome | Total annotated gene clusters in genome publication | Detected annotated gene clusters | Newly detected gene clusters, not annotated in genome publication | False positives | False negatives | Misannotated |
|---|---|---|---|---|---|---|
| Pseudomonas fluorescens Pf-5 | 10 | 9 | 2 | 0 | 1 | 0 |
| Streptomyces griseus IFO 13350 | 34 | 33 | 9 | 0 | 1 | 0 |
| Kitasatospora setae NBRC 14216T | 24 | 24 | 13 | 0 | 0 | 0 |
| Salinispora tropica CNB-440 | 17 | 16 | 5 | 0 | 1 | 0 |
| Aspergillus fumigatus Af293 | 26 | 26 | 10 | 0 | 0 | 2 |
| Total | 111 | 108 | 39 | 0 | 3 | 2 |

Percentage detected: 97.3
Percentage false positives: 0.0
Percentage false negatives: 2.7
Percentage found new: 35.1

Supplementary references

1. Yadav, G., Gokhale, R.S. and Mohanty, D. (2009) Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput. Biol., 5, e1000351.
2. de Jong, A., van Heel, A.J., Kok, J. and Kuipers, O.P. (2010) BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res., 38, W647-651.
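The rules in Table S2 can be read as Boolean expressions over the best HMM score each protein obtains for the models in Table S1. The sketch below illustrates that reading for two rules: the Type I PKS rule, where the ketosynthase must also outscore the negative fatty acid synthase models, and the phosphoglycolipid rule, a plain OR over cutoffs. It is not taken from the antiSMASH code base. The function names, the dictionary layout, and the mapping of rule labels to HMM names (KS to PKS_KS, AT to PKS_AT, bactTypeIFAS to bt1fas, fungTypeIFAS to ft1fas) are assumptions made for illustration, and the example scores are invented.

```python
# Illustrative sketch only; this is not the antiSMASH source code.
# Cutoffs come from Table S2. The mapping of rule labels to Table S1 HMM names
# and all function and variable names are assumptions.

# Negative models that a genuine type I PKS ketosynthase must outscore.
TYPE_I_PKS_NEGATIVE_MODELS = ("bt1fas", "ft1fas", "hglD", "hglE", "fabH")

# Simple OR rule: any single HMM above its cutoff triggers the class.
PHOSPHOGLYCOLIPID_CUTOFFS = {"MoeO5": 65.0, "moeGT": 40.0}


def is_type_i_pks(protein_scores):
    """Apply both 'Type I PKS' bullets to the HMM scores of one protein.

    protein_scores maps HMM name -> best score; HMMs without a hit count as 0.
    """
    ks = protein_scores.get("PKS_KS", 0.0)
    at = protein_scores.get("PKS_AT", 0.0)
    best_negative = max(
        protein_scores.get(name, 0.0) for name in TYPE_I_PKS_NEGATIVE_MODELS
    )
    return ks > 50 and at > 50 and ks > best_negative


def matches_any(protein_scores, cutoffs):
    """Return True if at least one HMM score strictly exceeds its cutoff."""
    return any(
        protein_scores.get(name, 0.0) > cutoff for name, cutoff in cutoffs.items()
    )


if __name__ == "__main__":
    # Invented scores for two hypothetical proteins.
    pks_like = {"PKS_KS": 310.2, "PKS_AT": 145.7, "bt1fas": 88.0}
    fas_like = {"PKS_KS": 310.2, "PKS_AT": 145.7, "bt1fas": 512.4}
    print(is_type_i_pks(pks_like))  # True: KS and AT pass, KS beats bt1fas
    print(is_type_i_pks(fas_like))  # False: the bacterial FAS model scores higher
    print(matches_any({"moeGT": 41.0}, PHOSPHOGLYCOLIPID_CUTOFFS))  # True
```

Rules that combine hits across neighbouring proteins, such as the 20 kb window for single-domain NRPS proteins or the 10 kilobase window in the thiopeptide rule, would additionally need gene coordinates and are left out of this sketch.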
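The percentages reported in Table S4 are consistent with dividing each total count by the 111 gene clusters annotated across the five genome publications. The source does not state the denominator explicitly, so that choice is an inference from the reported values. The function and variable names below are hypothetical. A short check of the arithmetic:

```python
# Worked arithmetic for Table S4. Assumption: every percentage uses the total
# number of annotated gene clusters (111) as its denominator.

def benchmark_percentages(total_annotated, detected, newly_detected,
                          false_positives, false_negatives):
    """Express each count as a percentage of the annotated clusters (1 decimal)."""
    def pct(count):
        return round(100.0 * count / total_annotated, 1)

    return {
        "detected": pct(detected),
        "false_positives": pct(false_positives),
        "false_negatives": pct(false_negatives),
        "found_new": pct(newly_detected),
    }


# Totals from Table S4 for each antiSMASH version.
V1_TOTALS = dict(total_annotated=111, detected=108, newly_detected=35,
                 false_positives=0, false_negatives=3)
V2_TOTALS = dict(total_annotated=111, detected=108, newly_detected=39,
                 false_positives=0, false_negatives=3)

if __name__ == "__main__":
    print(benchmark_percentages(**V1_TOTALS))
    print(benchmark_percentages(**V2_TOTALS))
```

Under this assumption the two versions differ only in the share of newly detected clusters (35 versus 39 out of 111); detection, false positive, and false negative rates are unchanged.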