Ligand search and data mining of Structural Genomics structures

Abhinav Kumar, Herbert Axelrod, Ashley Deacon
Structure Determination Core, Joint Center for Structural Genomics (JCSG), Stanford Synchrotron Radiation Laboratory, Menlo Park, CA, USA

The Joint Center for Structural Genomics (JCSG)

The JCSG (www.jcsg.org) is one of the four large-scale structural genomics centers funded by NIGMS as part of the production phase of the Protein Structure Initiative (PSI). As of 2007, the PSI centers have deposited more than 2600 structures into the PDB, of which the JCSG has contributed over 500.

The JCSG Target Pipeline

Each project moves from target selection through publication along the Target Pipeline.

The Role of the Structure Determination Core in the JCSG

1. Screen Crystals and Collect Data
2. Automatically Process Data: Autoindex, Integrate, Scale, Solve, Trace
3. Refine and Evaluate Structures
4. Disseminate Information*
   Publish
   Web-based Tools:
   TOPSAN (www.topsan.org)
   Ligand Search (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl)
* in collaboration with BIC

JCSG Ligand Search

Summary of Ligands (1606 structures)

Ligands (269 structures; 140 different ligands): UNL(70), UNX(22), LLP(6), SIN(6), NDP(6), MA7(6), NAG(5), PLM(4), UNK(4), GUN(3), APC(3), SUC(3), BAL(3), GLC(3), PAF(3), APR(2), GAL(2), NCN(2), CSD(2), SAI(2), CEI(2), BIO(2), HMH(2), SAP(2), GNP(2), 144(2), NCA(2), G4P(2), MPO(2), SRT(2), ANP(2), PCP(2), BGC(2), PAJ(2), NIG(1), PRP(1), NIO(1), ABF(1), IPR(1), MTA(1), CP(1), MLT(1), DI6(1), MED(1), MLZ(1), 5GP(1), CSO(1), CDP(1), I3A(1), 2PL(1), HED(1), G1P(1), NBZ(1), CSY(1), FRU(1), PLG(1), THF(1), B1M(1), ACP(1), DU(1), MMZ(1), OHA(1), 16A(1), THT(1), M7P(1), 3GC(1), CF5(1), PEO(1), CTZ(1), ADE(1), FT6(1), KEG(1), LUM(1), XLS(1), BAM(1), ADN(1), PMP(1), ADQ(1), B33(1), DGI(1), G3H(1), OXG(1), NDS(1), SAL(1), 3SL(1), SIB(1), STH(1), FEO(1), G3P(1), OXN(1), FES(1), TYD(1), DGT(1), 8PP(1), CO2(1), MP5(1), NTM(1), PNS(1), AES(1), APK(1), UVW(1), TRE(1), PYR(1), NAI(1), TCL(1), NMN(1), MAN(1), BFD(1), HHP(1), RIP(1), RBF(1), ORO(1), SNN(1), DTP(1), ZID(1), DEP(1), UPG(1), HXA(1), AAT(1), DTY(1), DON(1), NPO(1), C2E(1), AGC(1), BDF(1), PHT(1), OSB(1), NVA(1), CRO(1), BDN(1), TNE(1), SOG(1), AGS(1), TLP(1), 1PS(1), DUT(1), CXS(1), GEQ(1), MRD(1), G6P(1)

Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)

Metal Ions (647 structures; 30 different metal ions): MG(177), ZN(174), NA(102), CA(83), NI(40), MN(31), FE(26), K(16), FE2(9), CD(8), PT(8), HG(7), CO(5), SM(2), WO4(2), PR(2), AU(2), BA(1), CS(1), MW2(1), SE(1), ARS(1), ZN3(1), O4M(1), YT3(1), LI(1), MO2(1), MO3(1), VO4(1), MO6(1)

Non-metal Ions (692 structures; 22 different non-metal ions): SO4(324), CL(243), PO4(118), NO3(11), IOD(10), BR(10), SCN(8), CO3(4), CAC(4), POP(3), AZI(3), SUL(2), BCT(2), ALF(2), OXL(2), PER(1), SO3(1), MLI(1), PO3(1), THJ(1),
1AL(1), NH4(1)

Organics (90 structures; 26 different organics): IPA(14), EOH(13), BME(9), BEZ(5), TLA(5), SEO(5), AKG(5), ETX(4), TAR(4), PGO(4), DTT(4), OAA(2), ACE(2), DMS(2), MLA(1), DOX(1), XYL(1), MOH(1), 3OH(1), AZ1(1), PPI(1), IOH(1), FOR(1), MYR(1), GTT(1), LMT(1)

Buffers (240 structures; 15 different buffers): ACT(86), ACY(47), FMT(37), CIT(27), TRS(16), EPE(15), MES(12), IMD(8), TMN(2), 10A(2), BTB(2), ICT(1), CPS(1), FLC(1), NHE(1)

Precipitants (98 structures; 13 different precipitants): PEG(38), PG4(28), PGE(16), 1PE(8), P6G(7), 2PE(3), PE4(3), P33(3), PE5(2), PEF(1), BU3(1), 1PG(1), PE8(1)

Salts (3 structures; 3 different salts): DPO(1), AF3(1), PPC(1)

Detergents (2 structures; 1 different detergent): BOG(2)

Cryos (502 structures; 5 different cryos): GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)

Search Results (35 hits)

N   Target    PDB   PFAM     Accession    PSI   Ligands
1   FB10607B  2r6v  PF01613  NP_142786.1  JCSG  EDO FMN NCA
    Description: Crystal Structure of FMN-binding Protein (NP_142786.1) from Pyrococcus Horikoshii at 1.35 Å resolution
    Organism: Pyrococcus Horikoshii Ot3
2   FH7614A   2ig6  PF01243  NP_349178.1  JCSG  EDO FMN SO4 UNL
    Description: Crystal Structure of NIMC/NIMA Family Protein (NP_349178.1) from Clostridium Acetobutylicum at 1.80 Å resolution
    Organism: Clostridium Acetobutylicum
3   FJ9446A   2ou5  PF01243  YP_508196.1  JCSG  FMN GOL SO4
    Description: Crystal Structure of Pyridoxamine 5'phosphate Oxidase-Related FMN-binding (YP_508196.1) from Jannaschia Sp. Ccs1 at 1.60 Å resolution
    Organism: Jannaschia Sp. Ccs1
…
34  SGT98480  1q45  PF00724  NP_178662.1  CESG  FMN
    Description: 12-Oxo-Phytodienoate Reductase Isoform 3
    Organism: Arabidopsis Thaliana
35  TB0885A   1vp8  PF08981  NP_068944.1  JCSG  FMN UNL
    Description: Crystal Structure of Hypothetical Protein (NP_068944.1) from Archaeoglobus Fulgidus at 1.30 Å resolution
    Organism: Archaeoglobus Fulgidus Dsm 4304

Unique PSI Ligands

Ligand  PSI      PDB   Ligand Name
CF5     CESG     2A3L  Coformycin 5'-Phosphate
I3A     JCSG     2OU3  1H-Indole-3-Carbaldehyde
3SL     JCSG     1VR0  (2R)-3-Sulfolactic Acid
OHA     JCSG     2OD6  10-Oxohexadecanoic Acid
M7P     MCSG     1X92  D-Glycero-D-Mannopyranose-7-Phosphate
ABF     MCSG     1O8B  Beta-D-Arabinofuranose-5'-Phosphate
DON     MCSG     2OSU  6-Diazenyl-5-Oxo-L-Norleucine
3OH     MCSG     1M33  3-Hydroxy-Propanoic Acid
MP5     NESG     1RTW  (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate
FT6     NESG     2NW9  6-Fluoro-L-Tryptophan
STH     NESG     1XKL  2-Amino-4H-1,3-Benzoxathiin-4-Ol
TLP     NYSGXRC  1LW4  3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid
B33     NYSGXRC  2B4B  N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine
AZ1     NYSGXRC  1TUF  Azelaic Acid
NIG     NYSGXRC  2PUZ  N-(Iminomethyl)-L-Glutamic Acid
DI6     NYSGXRC  2Q09  3-[(4S)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid
MMZ     NYSGXRC  2GVC  1-Methyl-1,3-Dihydro-2H-Imidazole-2-Thione
8PP     NYSGXRC  1Y0G  2-[(2E,6E,10E,14E,18E,22E,26E)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol
1AL     NYSGXRC  1Z2L  Allantoate Ion
B1M     SECSG    1Y80  Co-5-Methoxybenzimidazolylcobamide
10A     TBSGC    1KPH  Didecyl-Dimethyl-Ammonium
10A     TBSGC    1KPI  Didecyl-Dimethyl-Ammonium
PAJ     TBSGC    1N2H  Pantoyl Adenylate
PAJ     TBSGC    1N2I  Pantoyl Adenylate
THT     TBSGC    1BVR  Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester
PPC     TBSGC    1QPR  5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate
GEQ     TBSGC    1P44  5-{[4-(9H-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1H-Indole

Unique Ligands

Indole-3-Carboxaldehyde (I3A) bound to the structure of tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc Punctiforme PCC 73102 (2OU3)

10-Oxohexadecanoic acid (OHA) bound to the structure of Ferredoxin-like Protein (JCVI_PEP_1096682647733) from an environmental metagenome (Unidentified Marine Microbe) (2OD6)

(R)-2-Hydroxy-3-Sulfopropanoic acid (3SL) bound to the structure of putative 2-phosphosulfolactate phosphatase from Clostridium Acetobutylicum (1VR0)

Unknown Ligands (UNL)

FB8805A (2Q9K) Unknown protein
FK9436A (2OH1) Acetyltransferase Gnat family

Ligands bound to JCSG new folds

Target   PDB   Ligand  Organism                    Description
CL6107A  2ICH  NHE     Nitrosomonas Europaea       Putative ATTH (NP_841447.1) at 2.00 Å
TB0797A  1VR0  3SL     Clostridium Acetobutylicum  Putative 2-phosphosulfolactate Phosphatase at 2.6 Å
TM0160   1VJL  UNL     Thermotoga Maritima         Predicted Protein related to Wound Inducive Proteins in Plants at 1.90 Å
TM0449   1KQ4  FAD     Thermotoga Maritima         Thy1-complementing Protein at 2.25 Å
TM0574   1VKY  UNL     Thermotoga Maritima         S-adenosylmethionine tRNA Ribosyltransferase at 2.00 Å
TM1394   1VQ0  UNL     Thermotoga Maritima         33 kDa Chaperonin (Heat Shock Protein 33 Homolog) at 2.20 Å
TM1464   1VKM  UNL     Thermotoga Maritima Msb8    Conserved Hypothetical Protein Possibly Involved in Carbohydrate Metabolism at 1.90 Å
TM1506   1VK9  UNL     Thermotoga Maritima         Hypothetical Protein at 2.70 Å
TM1553   1VRM  UNL     Thermotoga Maritima Msb8    Hypothetical Protein at 1.58 Å

[Figure panels: 2ICH, 1VR0, 1VJL, 1KQ4, 1VKY, 1VQ0, 1VKM, 1VK9, 1VRM]

Binding Modes of Ligands

There are over 340 structures in the PDB with the co-factor Flavin Mononucleotide (FMN) bound to the protein. The binding poses of FMN vary considerably because of the torsional flexibility of the molecule. However, unique binding poses can be observed in proteins belonging to specific PFAM families.

[Figure panel: PF01243 (Pyridoxamine 5'phosphate oxidase)]

Ligand Visualization Links

1y30
HIC-Up: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4
Ligand Depot: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4

GNF & TSRI (Crystallomics Core)
Scott Lesley, Thomas Clayton, Marc Deller, Polat Abdubek, Julie Feuerhelm, Hope Johnson, Sebastian Sudek, Glen Spraggon, Charlene Cho, Jessica Canseco, Mark Knuth, Dennis Carlton, Kevin D. Murphy, Christina Trout, Daniel McMullan, Heath Klock, Claire Acosta, Linda M. Columbus, Joanna C. Hale, Thamara Janaratne, Linda Okach, Edward Nigoghossian, Aprilfawn White, Bernhard Geierstanger, Ylva Elias, Sanjay Agarwalla, Bi-Ying Yeh, Anna Grzechnik, Mimmi Brown

UCSD & Burnham (Bioinformatics Core)
John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Dana Weekes, Lian Duan, Sri Krishna Subramanian, Natasha Sefcovic, Piotr Kozbial, Andrew Morse, Prasad Burra, Tamara Astakhova, Josie Alaoen, Cindy Cook

TSRI (NMR Core)
Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek

Stanford / SSRL (Structure Determination Core)
Keith Hodgson, Mitchell Miller, Hsiu-Ju (Jessica) Chiu, Christopher Rife, Silvya Oommachen, Henry van den Bedem, Christine Trame, Ashley Deacon, Debanu Das, Kevin Jin, Qingping Xu, Scott Talafuse, Ronald Reyes

Scientific Advisory Board
Sir Tom Blundell, Univ. Cambridge
Homme Hellinga, Duke University Medical Center
James Naismith, The Scottish Structural Proteomics Facility, Univ. St. Andrews
Soichi Wakatsuki, Photon Factory, KEK, Japan
James Wells, UC San Francisco
Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics

The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM074898 from NIGMS (www.nigms.nih.gov).
Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.

[Figure panels: PF01613 (Flavin reductase like domain); PF04289 (Unknown Function DUF447)]

Number of Structures

PFAM     PSI  Non-PSI  Total
PF01243    7       14     21
PF01613    2        8     10
PF04289    3        0      3
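The "Number of Structures" counts for the three PFAM families can be turned into a PSI share per family with a few lines of Python. This is only an illustrative sketch: the counts are copied from the poster, while the names `PfamCount`, `COUNTS`, `psi_fraction` and `totals_are_consistent` are hypothetical helpers introduced here and are not part of the JCSG Ligand Search tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PfamCount:
    """One row of the poster's 'Number of Structures' table."""
    pfam: str
    psi: int
    non_psi: int
    total: int


# Counts copied from the poster; the container itself is a hypothetical helper.
COUNTS = [
    PfamCount("PF01243", psi=7, non_psi=14, total=21),
    PfamCount("PF01613", psi=2, non_psi=8, total=10),
    PfamCount("PF04289", psi=3, non_psi=0, total=3),
]


def psi_fraction(row: PfamCount) -> float:
    """Fraction of the family's structures that came from PSI centers."""
    return row.psi / row.total


def totals_are_consistent(rows: list[PfamCount]) -> bool:
    """Check that PSI + Non-PSI equals Total in every row."""
    return all(r.psi + r.non_psi == r.total for r in rows)


if __name__ == "__main__":
    assert totals_are_consistent(COUNTS)
    for row in COUNTS:
        share = psi_fraction(row)
        print(f"{row.pfam}: {row.psi} of {row.total} structures from PSI ({share:.0%})")
```

Read this way, the table shows that all three PF04289 (DUF447) structures came from PSI centers, whereas PF01243 and PF01613 are mostly populated by non-PSI depositions.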