Lecture 1: Methods for in silico analysis of cryptic natural product biosynthetic gene clusters
Microbial Genomics and Secondary Metabolites Summer School,
MedILS, Split, Croatia, 25-29 June 2007
Greg Challis
Department of Chemistry
Overview
• Introduction: cryptic (orphan) gene clusters in microbial genomes
• Clusters encoding nonribosomal peptide synthetases (NRPSs): domains, modules, substrate specificity, predicting products
• Clusters encoding modular polyketide synthases (PKSs): domains, modules, substrate specificity, predicting products
• Clusters encoding other biosynthetic systems: terpene synthases, iterative PKSs
‘Cryptic’ (orphan) biosynthetic gene clusters
• Present in many of the 300 or so sequenced microbial genomes, e.g. Streptomyces avermitilis, Streptomyces coelicolor, Bacillus subtilis, Pseudomonas fluorescens, Pseudomonas syringae, Nostoc punctiforme, Aspergillus nidulans
• Polyketide synthases
• Nonribosomal peptide synthetases
• Terpene synthases
• May prove a valuable new source of bioactive metabolites
Genome sequence of the model antibiotic producer Streptomyces coelicolor M145
[Figure: chemical structures of the prodiginines, actinorhodin, methylenomycin A and the calcium-dependent antibiotic]
Gene clusters directing complex metabolite biosynthesis in the S. coelicolor genome
Biosynthetic System | Metabolite | Size (kb) | Location
Type II iterative polyketide synthase | Actinorhodin | 22 | 5071-5092
Type II iterative polyketide synthase | Grey spore pigment | 8 | 5314-5320
Fatty acid synthase; Butenolide synthase | Methylenomycin | 20 | SCP1
NRPS; Type I Modular PKS; FAS | Prodiginines | 33 | 5877-5898
Nonribosomal peptide synthetase | Calcium-dependent antibiotics | 80 | 3210-3249
Nonribosomal peptide synthetase | Siderophore? | 20 | 0489-0499
Nonribosomal peptide synthetase | Siderophore? | 26 | 7681-7691
Nonribosomal peptide synthetase | Unknown | 14 | 6429-6438
Type I Modular polyketide synthase | Polyene? | 70 | 6273-6288
Type I Modular polyketide synthase | Unknown | 10 | 6826-6827
Type I Iterative polyketide synthase | Polyunsaturated fatty acid? | 19 | 0124-0129
Type III iterative polyketide synthase | Tetrahydroxynaphthalene? | 1 | 1206-1208
Type III polyketide synthase | Unknown | 3.5 | 7669-7671
Type III polyketide synthase | Unknown | 1 | 7221
Sesquiterpene synthase | Geosmin? | 2 | 6073
Sesquiterpene synthase | Geosmin? | 2.5 | 5222-5223
Squalene-Hopene cyclase | Hopanoids? | 15 | 6759-6771
Phytoene synthase | Isorenieratene? | 15 | 0185-0191
NRPS-independent siderophore | Desferrioxamines? | 7 | 2780-2785
NRPS-independent siderophore | Siderophore? | 4 | 5799-5801
Type II FAS | Unknown | 10 | 1265-1273
Butanolide synthase | SCB1? | 1 | 6266-6267
Bentley et al., Nature (2002) 417, 141-147
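To see how much of this inventory is still cryptic, the metabolite and size columns can be tallied with a short script. This is a minimal sketch: the rule that a trailing "?" or the entry "Unknown" marks an unconfirmed product is an assumption made here for illustration, and the names CLUSTERS and is_cryptic are hypothetical.

```python
# (metabolite, size in kb) pairs transcribed row by row from the table.
CLUSTERS = [
    ("Actinorhodin", 22), ("Grey spore pigment", 8), ("Methylenomycin", 20),
    ("Prodiginines", 33), ("Calcium-dependent antibiotics", 80),
    ("Siderophore?", 20), ("Siderophore?", 26), ("Unknown", 14),
    ("Polyene?", 70), ("Unknown", 10), ("Polyunsaturated fatty acid?", 19),
    ("Tetrahydroxynaphthalene?", 1), ("Unknown", 3.5), ("Unknown", 1),
    ("Geosmin?", 2), ("Geosmin?", 2.5), ("Hopanoids?", 15),
    ("Isorenieratene?", 15), ("Desferrioxamines?", 7), ("Siderophore?", 4),
    ("Unknown", 10), ("SCB1?", 1),
]


def is_cryptic(metabolite):
    """Assumed rule: 'Unknown' or a trailing '?' means the product is unconfirmed."""
    return metabolite == "Unknown" or metabolite.endswith("?")


cryptic = [name for name, _ in CLUSTERS if is_cryptic(name)]
total_kb = sum(size for _, size in CLUSTERS)
cryptic_kb = sum(size for name, size in CLUSTERS if is_cryptic(name))

print(f"{len(cryptic)} of {len(CLUSTERS)} clusters have no confirmed product")
print(f"{cryptic_kb} of {total_kb} kb of cluster DNA is cryptic")
```

Under that rule, only the five clusters for actinorhodin, the grey spore pigment, methylenomycin, the prodiginines and the calcium-dependent antibiotics count as having a confirmed product.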
Recap of NRPS organisation and function: the gramicidin S synthetase as an example
[Figure: the grsT, grsA and grsB genes and their products. Synthetase 1 carries module 1 (A and E domains); synthetase 2 carries modules 2-5 (a C and an A domain each) and ends in a TE domain.]
A = Adenylation
PCP = peptidyl carrier protein
C = Condensation
E = Epimerisation
TE = Thioesterase
[Figure: chemical structures accompanying the synthetase diagram]
Recap of NRPS organisation and function: the gramicidin S synthetase as an example
[Figure: chemical structures of peptide chains at the TE domain]
For further information see Lars Robbel’s poster
Nonribosomal peptide synthetases encoded by the S. coelicolor genome
A new S. coelicolor NRPS gene cluster
[Figure: gene cluster map with genes labelled cchI, cchJ, cchH, cchB and cchA]
Non-ribosomal peptide synthetase (cchH)
MbtH-like protein (cchK)
Flavin-dependent monooxygenase (cchB)
Formyl-tetrahydrofolate-dependent formyl transferase (cchA)
Esterase (cchJ)
Export functions
Ferric-siderophore import
Challis and Ravel, FEMS Microbiol. Lett. (2000) 187, 111-114
Prediction of domain and module structure
Conserved Domain (CD) search
(http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)
Deduced domain and module organization
Module 1 Module 2 Module 3
A E C A E C A
SH SH SH
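Going from an ordered list of detected domains to the module organisation shown above can be sketched in a few lines. The grouping rule used here (every C domain opens a new module, and the first module simply starts with the first domain found) is an assumption for illustration, and split_into_modules is a hypothetical helper, not part of the CD search tool.

```python
def split_into_modules(domains):
    """Group an ordered list of NRPS domain labels into modules.

    Assumed rule: each C domain starts a new module; the initiation
    module begins with whichever domain comes first.
    """
    modules = []
    for domain in domains:
        if domain == "C" or not modules:
            modules.append([])
        modules[-1].append(domain)
    return modules


# Domain order as listed on the slide (carrier domains are drawn as SH there).
domain_order = "A E C A E C A".split()
modules = split_into_modules(domain_order)
for number, module in enumerate(modules, start=1):
    print(f"Module {number}: {'-'.join(module)}")
```

For the domain order on the slide this gives three modules, A-E, C-A-E and C-A.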
Prediction of A-domain selectivity pocket residues
GrsA DA SV W EMFMALLTGASLYIILKDTINDFVKFEQYINQKEITVI T LPPTYVVHL-----DPERILSIQTL I T A GSATSPSLVNKWKEK--VTYIN A YGPTETT I
Ncs1-M1 DI AV W ELLAAFVGGARLVIAEHRLRGVVPHLPELMTDHRVTVA H FVPSVLEELLGWMADGGRVG-LRLV V C G GEAVPPSQRDRLLALSGARMVH A YGPTETT I
GrsA D A W T I A A I
Ncs1-M1 D I W H V G A I
Stachelhaus, Mootz and Marahiel Chem. Biol. (1999) 6, 493-505
Challis, Ravel and Townsend Chem. Biol. (2000) 7, 211-224
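The step from the alignment to the eight-residue pocket strings can be sketched as below. The two sequence rows are copied from the alignment above, with the spaces left by the extraction removed; the column indices in POCKET_COLUMNS were obtained here by counting along those rows and are an assumption of this sketch, not coordinates given in the cited papers.

```python
# Alignment rows as they appear on the slide; spaces are extraction
# residue around the highlighted residues and are stripped before indexing.
GRSA = (
    "DA SV W EMFMALLTGASLYIILKDTINDFVKFEQYINQKEITVI T LPPTYVVHL-----"
    "DPERILSIQTL I T A GSATSPSLVNKWKEK--VTYIN A YGPTETT I"
)
NCS1_M1 = (
    "DI AV W ELLAAFVGGARLVIAEHRLRGVVPHLPELMTDHRVTVA H FVPSVLEELLGWMADGGRVG-"
    "LRLV V C G GEAVPPSQRDRLLALSGARMVH A YGPTETT I"
)

# 0-based alignment columns of the eight pocket residues (assumed, see above).
POCKET_COLUMNS = [0, 1, 4, 43, 69, 71, 94, 102]


def pocket_residues(aligned_row, columns=POCKET_COLUMNS):
    """Return the residues found at the given alignment columns."""
    row = aligned_row.replace(" ", "")
    return "".join(row[c] for c in columns)


print("GrsA   ", pocket_residues(GRSA))
print("Ncs1-M1", pocket_residues(NCS1_M1))
```

Because the columns refer to the alignment rather than to either sequence, gap characters must stay in the rows; dropping them would shift every downstream position.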
Empirical correlation between specificity pocket residues and substrate
[Figure: clustering of specificity pocket residues by substrate; cluster labels listed below]
Ser
Orn hTyr
Cys (ACV)
HPG
Glu, Gln
Leu, Ile, Val
Glu (Fengycin)
Leu (Eucarya)
Threonine
Asp, Asn, Gln
Valine
Ala, Dab
Cysteine
Trp, Phe
Tyr
Val, Ala (Eucarya)
Proline
Challis, Ravel and Townsend Chem. Biol. (2000) 7, 211-224
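An empirical correlation of this kind can be applied as a nearest-match lookup: score a query pocket against pockets of known specificity and accept the best hit only above some cut-off. The sketch below assumes a reference table supplied by the user; the single entry shown pairs the GrsA pocket from the alignment with Phe, its known substrate, and the 0.75 cut-off is an arbitrary illustrative value.

```python
def identity(sig_a, sig_b):
    """Fraction of positions at which two equal-length pocket strings agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("pocket strings must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def predict_substrate(query, reference, min_identity=0.75):
    """Return (substrate, score) for the closest reference pocket.

    The substrate is None when the best score falls below min_identity
    (a hypothetical cut-off, not a published threshold).
    """
    best = max(reference, key=lambda sig: identity(query, sig))
    score = identity(query, best)
    return (reference[best] if score >= min_identity else None), score


# Illustrative one-entry table; a real one would be compiled from the literature.
REFERENCE = {"DAWTIAAI": "Phe"}

print(predict_substrate("DAWTIAAI", REFERENCE))
print(predict_substrate("DIWHVGAI", REFERENCE))
```

Returning None below the cut-off keeps a weak match from being reported as a prediction, which matters for pockets that fall outside every known cluster.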
Prediction of substrates and possible products for the S. coelicolor cryptic NRPS
Module 1 Module 2 Module 3
A E C A E C A
[Figure: predicted substrates and possible product structures]
Challis and Ravel, FEMS Microbiol. Lett. (2000) 187, 111-114
Recap of modular PKS organisation and function: the erythromycin synthase as an example
• Three large modular enzymes (DEBS 1-3), encoded by eryAI, eryAII and eryAIII, assemble 6-DEB
• Each module performs one chain extension
[Figure: the TE cyclizes the completed chain to give 6-deoxyerythronolide B (6-DEB)]
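Because each module performs one chain extension, the oxidation state left at each β-carbon can be read off from the reductive domains a module contains. The sketch below applies the usual rule (no KR: keto; KR only: hydroxyl; KR and DH: alkene; KR, DH and ER: methylene). The DEBS domain lists are taken from the general literature rather than from the slide text, with the inactive KR of module 3 left out, so treat them as an assumption.

```python
def beta_carbon_state(domains):
    """Predict the beta-carbon state left by one extension module."""
    present = set(domains)
    if "KR" not in present:
        return "keto"
    if "DH" not in present:
        return "hydroxyl"
    if "ER" not in present:
        return "alkene"
    return "methylene"


# Assumed active-domain complement of the six DEBS extension modules.
DEBS_MODULES = [
    ["KS", "AT", "KR", "ACP"],              # module 1
    ["KS", "AT", "KR", "ACP"],              # module 2
    ["KS", "AT", "ACP"],                    # module 3 (KR inactive, omitted)
    ["KS", "AT", "DH", "ER", "KR", "ACP"],  # module 4
    ["KS", "AT", "KR", "ACP"],              # module 5
    ["KS", "AT", "KR", "ACP", "TE"],        # module 6
]

states = [beta_carbon_state(m) for m in DEBS_MODULES]
for number, state in enumerate(states, start=1):
    print(f"Module {number}: {state}")
```

The predicted pattern of hydroxyl, keto and methylene positions is what one would then compare against the functional groups of 6-deoxyerythronolide B.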
Recap of modular PKS organisation and function: the erythromycin synthase as an example
[Figure: AT and ACP domains with their bound acyl and carboxylated substrates]
A new S. coelicolor modular PKS cluster
Genes encoding a modular PKS
Prediction of domains and modules in CpkA
Conserved Domain (CD) search
(http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)
Prediction of domains and modules in CpkB
Prediction of domains and modules in CpkC
Prediction of domains and modules in CpkABC
Pawlik, Kotowska, Chater, Kuczek and Takano, Arch. Microbiol. (2007) 187, 87-99
Prediction of AT domain substrate selectivity
Haydock et al. FEBS Lett. (1995) 374, 246-248
Banskota et al. J. Antibiot. (2006) 59, 168-176
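One common way to apply sequence-based AT predictions is a simple motif scan. The sketch below assumes the widely quoted rule of thumb that malonyl-CoA-specific AT domains carry an HAFH motif and methylmalonyl-CoA-specific ones a YASH motif; that rule comes from the general PKS literature and is not stated on the slide. The two test sequences are made-up placeholders, not real AT domains.

```python
# Assumed diagnostic motifs (rule of thumb from the PKS literature).
AT_MOTIFS = {
    "HAFH": "malonyl-CoA",
    "YASH": "methylmalonyl-CoA",
}


def predict_at_substrate(at_sequence):
    """Return the predicted extender unit, or 'unknown' if no motif is found."""
    for motif, substrate in AT_MOTIFS.items():
        if motif in at_sequence:
            return substrate
    return "unknown"


# Hypothetical sequence fragments, for illustration only.
toy_malonyl_at = "GQTAVGHAFHSPLMEPA"
toy_methylmalonyl_at = "GQTAVGYASHSPLMEPA"

print(predict_at_substrate(toy_malonyl_at))
print(predict_at_substrate(toy_methylmalonyl_at))
```

A scan like this ignores domains that use other extender units or carry variant motifs, so an "unknown" result calls for an alignment against characterised AT domains rather than a guess.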
Prediction of KR domain stereoselectivity
Caffrey ChemBioChem (2003) 4, 654-657
Reid et al. Biochemistry (2003) 42, 72-79
Prediction of substrates and possible products for the S. coelicolor cryptic PKS
[Figure: predicted product structures, drawn as two alternatives]
Non-linear enzymatic logic can complicate things!
[Figure: chemical structures illustrating non-linear enzymatic logic]
Haynes and Challis, Curr. Op. Drug Discov. Develop. (2007) 10, 203-218
Non-linear enzymatic logic can complicate things!
[Figure: PKS assembly line with enzyme-bound intermediates. Module labels as drawn: Load, Module 1, Module 2, Module 3, Module 4, Module 5 + 6 + 7, Module 7. Domain order: AT ACP | KS AT KR ACP | KS AT DH KR ACP | KS AT DH KR ACP | KS AT KR ACP | KS AT DH ER KR ACP | KS AT KR ACP TE. Later steps are labelled BorI, BorJ.]
Haynes and Challis, Curr. Op. Drug Discov. Develop. (2007) 10, 203-218
Terpene synthases
[Figure: diphosphate (OPP) substrates of each terpene synthase class]
• Monoterpene synthase: C10
• Sesquiterpene synthase: C15
• Diterpene synthase: C20
• Triterpene synthase: C30
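The carbon counts above map directly onto synthase classes, which a small lookup makes explicit. The division by five to get the number of isoprene (C5) units is standard terpene chemistry rather than something shown on the slide, and the function names are hypothetical.

```python
# Carbon count of the product skeleton -> synthase class (from the slide).
TERPENE_SYNTHASE_CLASSES = {
    10: "monoterpene synthase",
    15: "sesquiterpene synthase",
    20: "diterpene synthase",
    30: "triterpene synthase",
}


def synthase_class(n_carbons):
    """Return the synthase class for a terpene skeleton of n_carbons."""
    try:
        return TERPENE_SYNTHASE_CLASSES[n_carbons]
    except KeyError:
        raise ValueError(f"no terpene synthase class listed for C{n_carbons}")


def isoprene_units(n_carbons):
    """Number of C5 isoprene units in the skeleton (assumes a multiple of 5)."""
    if n_carbons % 5:
        raise ValueError("terpene skeletons are built from C5 units")
    return n_carbons // 5


for carbons in sorted(TERPENE_SYNTHASE_CLASSES):
    print(f"C{carbons}: {synthase_class(carbons)}, {isoprene_units(carbons)} units")
```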
Iterative polyketide synthases – type III PKSs
[Figure: Amycolatopsis orientalis DpgA (4 x substrate units) and Streptomyces griseus RppA (5 x substrate units, with melanin as the labelled end product); chemical structures omitted]
Conclusions
• Reasonably confident in silico predictions of domain / module organisation and substrate specificity of modular PKS / NRPS can be made
• Non-linear enzymatic logic can complicate the reliable prediction of product structure(s)
• For other types of biosynthetic system, reasonably confident predictions of substrate specificity can sometimes be made
• Prediction of chain length and substrate specificity in some iterative PKS systems, especially type III and fungal type I, remains difficult