UK e-Science AHM 2005

Drug Discovery Grid -- A real grid application

Zhang Wenju, Shen Jianhua
Shanghai Institute of Materia Medica, CAS
Shanghai Jiaotong University
Jiangnan Institute of Computing
The University of Hong Kong

UK e-Science AHM’05, Nottingham, Sept. 19-22, 2005


Agenda

1. DDGrid Introduction
2. DDGrid Architecture
3. DDGrid Resources
4. DDGrid Demo


Background

Large-scale high-throughput virtual screening

   in silico   The computational analysis of chemical databases to identify compounds appropriate for a given biological receptor.
   in vitro    The progressive optimization of these leads to yield a compound with improved potency and physicochemical properties in vitro.
   in vivo     Eventually, improved efficacy, pharmacokinetics, and toxicological profiles in vivo.


Process of Drug Discovery and Design

   Random Screening (10,000 ~ 20,000 compounds)   2-3 years
   Leads and Opt.                                 2-3 years
   Drug Candidate
   Pre-clinic                                     2-3 years
   Clinic (phase I, II, III)                      3-4 years
   Market

   Computer-Aided Drug Design

   Time: 10-12 years
   Money: several billion dollars


DDGrid overview

◆ The Drug Discovery Grid project aims to build a collaboration platform for drug discovery using state-of-the-art P2P and Grid computing technology.
◆ The project targets large-scale, computation- and data-intensive scientific applications in medicinal chemistry and molecular biology, using grid middleware developed by our team.
◆ A database of over one million compounds, with 3-D structures and physicochemical properties, is provided to identify potential drug candidates. Users can also build and maintain their own customized ligand databases and share them on this grid platform.


DDGrid Architecture

   Application Users          Web Portal
   Application Developers     Command Line Tools, C++/Java API
   Administrators             Web Services

   Resource Management (users, data, applications, job, workflow, sites)
   Data (meta-data, data sources, data integration)
   Services (authentication, authorization, data movement, job scheduling, resource monitoring, result collection)

   AMD64 Cluster    Dawning 4000A    Alpha Cluster    Sunway Cluster

[Figures: two further DDGrid Architecture diagrams]


DDGrid Workflow

   Job submit (in); ID and result return (out)

   Global Server (monitoring, work pool, resource management, result assimilation)
      down: job dispatch
      up:   return of result, new job request
   Slave Server (local resource management, monitoring, local work pool, result assimilation)
      down: job dispatch (xml)
      up:   return of result, new job request
   Computational Client (docking)


DDGrid security

1. PKI-based security
2. Every participating site must hold a certificate issued by our CA
3. All deployed databases and all results are encrypted
4. All message passing is SSL/TLS-enabled


DDGrid message passing

<scheduler_request>
   <authenticator>3333</authenticator>
   <hostid>102</hostid>
   <rpc_seqno>2401</rpc_seqno>
   <platform_name>i686-pc-linux-gnu</platform_name>
   <core_client_major_version>2</core_client_major_version>
   <core_client_minor_version>19</core_client_minor_version>
   <idle_ncpu>16</idle_ncpu>
   <project_disk_usage>5315768.000000</project_disk_usage>
   <total_disk_usage>68417940.000000</total_disk_usage>
   <code_sign_key> … </code_sign_key>
   <projects>
      <project>
         <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
         <resource_share>100.000000</resource_share>
      </project>
   </projects>
   <result> … </result>
   …
   <host_info> … </host_info>
</scheduler_request>


DDGrid message passing

<scheduler_reply>
   <message priority="low">No work available</message>
   <project_name>Ddg</project_name>
   <user_name>sss</user_name>
   <code_sign_key> … </code_sign_key>
   …
   <workunit> … </workunit>
   <preferences>
      <low_water_days>1.2</low_water_days>
      <high_water_days>2.5</high_water_days>
      <disk_max_used_gb>0.4</disk_max_used_gb>
      <disk_max_used_pct>50</disk_max_used_pct>
      <disk_min_free_gb>0.4</disk_min_free_gb>
      …
   </preferences>
   …
</scheduler_reply>


DDGrid message passing

<workunit>
   <file_info> <number>0</number> </file_info>
   <file_info> <number>1</number> </file_info>
   <file_info> <number>2</number> </file_info>
   …
   <file_ref>
      <file_number>0</file_number>
      <open_name>tabfile</open_name>
   </file_ref>
   <file_ref>
      <file_number>1</file_number>
      <open_name>infile</open_name>
   </file_ref>
   <file_ref>
      <file_number>2</file_number>
      <open_name>sphfile</open_name>
   </file_ref>
   <command_line>-business</command_line>
</workunit>


DDGrid message passing

…
<project>
   <scheduler_url>http://www.ddgrid.ac.cn/ddg_cgi/cgi</scheduler_url>
   <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
   <project_name>Ddg</project_name>
</project>
<app>
   <name>gridapp</name>
</app>
<file_info>
   <name>gridapp/gridapp_2.19_i686-pc-linux-gnu</name>
   <nbytes>260754.000000</nbytes>
   <max_nbytes>0.000000</max_nbytes>
   <executable/>
   <signature_required/>
   <file_signature> … </file_signature>
   <url>http://www.ddgrid.ac.cn/ddg/download/gridapp_2.19_i686-pc-linux-gnu</url>
</file_info>
<file_info> … </file_info>
…


DDGrid Resources

Computational and data resources integration

Resources aggregated (8 sites, 5 cities)

   Site                          Cluster
   SIMM                          Sunway 32A Cluster
   Beijing Molecule Inc.         Sunway 256P Cluster
   HKU                           Gideon 300 Cluster
   SSC                           Dawning 4000A
   LeSC                          Mars Cluster (test only)
   Shanghai Jiaotong Univ.       IBM e1350 cluster
   Singapore Poly-tech Univ.     Rock cluster
   Dalian Univ. of Technology    Dawning 4000A

Heterogeneous resources

   OS:  IRIX, Digital Unix, Linux (IA32, x86_64)
   CPU: R12000, Alpha, Pentium, AMD


DDGrid Resources

DDGrid Apps.

[Flowchart labels: start, Preprocess, Dock, Drug-like Analysis, Experiment, end; Fixed CDB, New CDB, CDB Gen., CDB Para., Input File]

1. Docking pre-process software: Combimark
2. Docking software:
   1) Dock (UCSF)
   2) gsDock (SIMM)
3. CDB build and maintenance software: Combilib
4. AutoDock
5. AutoGrid
6. Visualisation & structure search
7. Security-related tools


DDGrid Resources

Chemical Databases (CDB)

Each ligand record in a chemical database holds the 3D structural information of one compound. A CDB can contain on the order of tens of thousands of compounds, and its size can range from tens of megabytes to gigabytes or even terabytes.

1. Static databases purchased from commercial chemical companies:
   Available Chemicals Directory (ACD)
   Chinese natural product database (CNPD)
   SPECS database
   chemical ADME/T database, etc.
2. Dynamic databases built by users themselves and deployed automatically.


Deployed commercial CDB (approx. 700,000)

   Name of Database   Description
   Specs              Provides about 230,000 compounds
   CMC-3D             Provides 3D models and important biochemical properties (including drug class, logP, and pKa values) for over 8,400 pharmaceutical compounds
   ACD-3D             Provides 200,000 commercially available compounds in 3D
   NCI-3D             213,000 compounds with 2D information from the National Cancer Institute
   CNPD               12,000 Chinese natural products with chemical structures
   TCMD               9,127 compounds and 3,922 herbs


Approx. 3,300,000 compounds

   Vendor            Num. of Mol.    Vendor                       Num. of Mol.
   ACB-Eurochem      98603           Maybridge                    53042
   Ambinter          533866          Nanosyn                      68317
   Asinex            293385          National Cancer Institute    223536
   ChemBridge        562624          Otava                        181195
   ChemDiv           361859          Peakdale                     9632
   ComGenex          38590           Pharmeks                     116355
   Enamine           533111          PubChem                      164031
   IBScreen          452728          Ryan Scientific              64205
   InterChim         288882          Sigma-Aldrich                49022
   KeyOrganics       22294           Specs                        307550
   Life Chemicals    44762           TimTec                       127173


CDB example: CNPD (China Natural Products Database)

[Figure: CNPD screenshot]

CNPD is the first comprehensive source of chemical, structural and bibliographic data on almost all known natural products in China.

CNPD serves as an information source for chemical, physical and biological properties and for literature, which makes it useful to scientists in the pharmaceutical industry.

CNPD can be searched in flexible ways: by structure, sub-structure, name, molecular formula, molecular weight, CAS registry number, category, etc.

Traditional Chinese Medicine (TCM) applications are pre-indexed in CNPD to provide hints for lead compound discovery.

[Figure: CNPD screenshot]


CDB example: TCMD (Traditional Chinese Medicine Database)

TCMD is a bibliographical database of approximately 20,000 records with abstracts of TCM articles. Relevant articles are selected from 150-200 journals from Mainland China, Taiwan, and Hong Kong (most of them in Chinese). English abstracts are written for the selected articles, and other pertinent information is translated into English.

[Figure: TCMD screenshot]


DDGrid applications in reality

SIMM carried out anti-SARS and anti-diabetes drug research using the DDGrid.

1. Anti-SARS drug research
2. Anti-diabetes drug research


Research on Anti-SARS medicine

Virtual screening of the Comprehensive Medicinal Chemistry 3D (CMC-3D) database, which contains 7,900 compounds, found that cinanserin has a distinct anti-SARS effect.

   Department of Virology, Bernhard-Nocht-Institute for Tropical Medicine, Germany
   Research Department, Cantonal Hospital St Gallen, Switzerland

   “Basically your inhibitor turned out to be the best compound we have tested so far!”

Applied for domestic patent 03129071.x and PCT patent pi034248.


Research on anti-diabetes medicine

Found an anti-diabetes lead better than Rosiglitazone by targeting PPAR, through virtual screening, optimization design, synthesis, and biological and pharmacological testing.

[Figure: CADD process; compound counts on a log scale from 10 to 10M; values shown: 2400000, 800,000, 200,000, 10,000, 500, 300, 14, 138, 142, 76]


New anti-diabetes drug

Current progress

1. Applied for patent 200410016460.X and a PCT patent
2. Safety testing and pre-clinical research


What does the DDGrid provide?

1. Drug design collaboration platform
   Large-scale virtual screening platform sharing large CDBs
2. Computational resource sharing
   SIMM/SSC/HKU/Mol. Ltd/SJTU/DUT
3. Data resource sharing
   pre-deployed commercial CDBs (ACD/CNPD …)
   shared self-made CDBs
4. Medicinal chemistry text and structure search
5. Customization and extension


Collaboration

Selected Users of DDGrid

[Figure: selected users]


Demo

DDGrid Demo
http://www.ddgrid.ac.cn


DDGrid Web Portal

[Figure: web portal screenshot]


Test Case 1

Virtual screening of 20,000 compounds

Involved sites:

   Shanghai Inst. of M. M. (SIMM)    Alpha Cluster (32 CPU)
   Beijing Mol. Ltd.                 Sunway Cluster (224 CPU)
   The Univ. of Hong Kong            Gideon Cluster (16 CPU)
   Shanghai SuperComp. Centre        Dawning 4000A
   Dalian Univ. of Tech.             Dawning 4000A
   London e-Science Centre           Mars Cluster

Time consumed: 5946 sec (approx. 99 min)
Data set (CDB): Specs


[Screenshots: Job scheduling; Visualisation of Docking Result; CDB Structure Search; Demo]


Acknowledgements

This work has been supported by the National High-Tech Research and Development Project of China (863 program), under contract No. 2004AA104270.

Many thanks to the generous resource providers:
   SIMM, HKU, SJTU, Molecule Ltd., SSC, DLUT

Involved persons:
   Shen Jianhua, Ma Fanyuan, Zhang Jun, Zhang Wenju, Chang Yan, Chen Shudong, Du Xuefeng, Li Zhuhua, Liu Fei, Wan Ju, Jiang Maojun …


Q&A

Thank you!
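
Supplementary sketch: reading a scheduler_request

The scheduler_request message on the "DDGrid message passing" slides is plain XML, so a server-side component could pull out the fields it needs with a standard XML parser. The Python below is a minimal sketch under that assumption, not the DDGrid implementation: the function name summarize_request is hypothetical, and the sample message is a trimmed copy of the slide with the elided parts ("…") left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the scheduler_request shown on the slides.
# Elements the slides elide with "…" are simply omitted here.
REQUEST = """<scheduler_request>
  <authenticator>3333</authenticator>
  <hostid>102</hostid>
  <rpc_seqno>2401</rpc_seqno>
  <platform_name>i686-pc-linux-gnu</platform_name>
  <core_client_major_version>2</core_client_major_version>
  <core_client_minor_version>19</core_client_minor_version>
  <idle_ncpu>16</idle_ncpu>
  <projects>
    <project>
      <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
      <resource_share>100.000000</resource_share>
    </project>
  </projects>
</scheduler_request>"""


def summarize_request(xml_text):
    """Hypothetical helper: extract the fields a scheduler would likely need."""
    root = ET.fromstring(xml_text)
    major = root.findtext("core_client_major_version")
    minor = root.findtext("core_client_minor_version")
    return {
        "hostid": int(root.findtext("hostid")),
        "platform": root.findtext("platform_name"),
        "client_version": "{}.{}".format(major, minor),
        "idle_ncpu": int(root.findtext("idle_ncpu")),
        # One entry per <project> element under <projects>.
        "master_urls": [p.findtext("master_url") for p in root.iter("project")],
    }


summary = summarize_request(REQUEST)
```

The platform name and client version extracted here are the same two values that appear in the executable name on the slides (gridapp_2.19_i686-pc-linux-gnu), which suggests the server uses them to choose the application binary for a host; the slides do not state this explicitly.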
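
Supplementary sketch: two-level work pool

The "DDGrid Workflow" slide describes a global server with a work pool, slave servers with local work pools, and computational clients that return a result and request a new job. The Python below is a small simulation of that pull-based pattern, written as an illustration only. The class WorkPool, the function run, the site names, and the stand-in dock function are all assumptions, and batching by idle CPU count is a guess prompted by the idle_ncpu field in the scheduler_request, not a documented DDGrid policy.

```python
from collections import deque


class WorkPool:
    """A queue of pending work units plus the results assimilated so far."""

    def __init__(self, units):
        self._pending = deque(units)
        self.results = {}

    def request(self, n):
        """Hand out up to n pending units (fewer if the pool is running dry)."""
        batch = []
        while self._pending and len(batch) < n:
            batch.append(self._pending.popleft())
        return batch

    def assimilate(self, results):
        """Merge returned results into this pool's result set."""
        self.results.update(results)


def run(global_pool, sites, dock):
    """Simulate global server -> slave server -> client dispatch.

    sites maps a (hypothetical) site name to its idle CPU count;
    dock is a stand-in for the docking computation on one unit.
    """
    progress = True
    while progress:
        progress = False
        for site, idle_ncpu in sites.items():
            # Slave server asks the global server for new jobs.
            batch = global_pool.request(idle_ncpu)
            if not batch:
                continue
            progress = True
            local_pool = WorkPool(batch)  # the slave's local work pool
            while True:
                unit = local_pool.request(1)  # a client asks for one job
                if not unit:
                    break
                local_pool.assimilate({unit[0]: dock(unit[0])})
            # Slave server returns its results to the global server.
            global_pool.assimilate(local_pool.results)
    return global_pool.results


pool = WorkPool(range(10))
scores = run(pool, {"site_a": 3, "site_b": 2}, lambda ligand_id: ligand_id * 2)
```

Because sites pull work only when they have capacity, a faster or larger site naturally ends up processing more units, which is one reason a pull model suits heterogeneous clusters such as those listed under DDGrid Resources.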