UK e-Science AHM 2005
Drug Discovery Grid
-- A real grid application
Zhang Wenju, Shen Jianhua
Shanghai Institute of Materia Medica, CAS
Shanghai Jiaotong University
Jiangnan Institute of Computing
The University of Hong Kong
Agenda
1. DDGrid Introduction
2. DDGrid Architecture
3. DDGrid Resources
4. DDGrid Demo
UK e-Science AHM’05, Nottingham, Sept. 19-22, 2005
Background
Large-scale High-throughput Virtual Screening
◆ In silico: the computational analysis of chemical databases to identify compounds appropriate for a given biological receptor.
◆ In vitro: the progressive optimization of these leads to yield a compound with improved potency and physicochemical properties.
◆ In vivo: eventually, improved efficacy, pharmacokinetics, and toxicological profiles.
Process of Drug Discovery and Design
1. Random Screening (10,000 ~ 20,000 compounds): 2-3 years
2. Leads and Optimization: 2-3 years
3. Drug Candidate
4. Pre-clinic: 2-3 years
5. Clinic (phase I, II, III): 3-4 years
6. Market
Computer-Aided Drug Design
Time: 10-12 years
Money: several billion dollars
DDGrid overview
◆ The Drug Discovery Grid project aims to build a collaboration platform for drug
discovery using state-of-the-art P2P and Grid computing technology.
◆ The project targets large-scale, computation- and data-intensive scientific
applications in medicinal chemistry and molecular biology, using grid
middleware developed by our team.
◆ A database of over one million compounds with 3-D structures and
physicochemical properties is also provided to identify potential drug
candidates. Users can also build and maintain their own customized ligand
databases to share on this grid platform.
DDGrid Architecture
Users: Application Users, Application Developers, Administrators
Interfaces: Web Portal, Command Line Tools, C++/Java API
Web Services
Resources Management (users, data, applications, job, workflow, sites)
Data (meta-data, data sources, data integration)
Services (authentication, authorization, data movement, job scheduling, resource monitoring, result collection)
Compute resources: AMD64 Cluster, Dawning 4000A, Alpha Cluster, Sunway Cluster
DDGrid Workflow
Job Submit (down); ID and Result Return (up)
Global Server (Monitoring, Work Pool, Resource Management, Assimilation of Results)
Job Dispatch (down); Return of Result, New job request (up)
Slave Server (Local Resource Management, Monitoring, Local Work Pool, Assimilation of Results)
Job Dispatch as xml (down); Return of Result, New job request (up)
Computational Client (Docking)
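The two-level dispatch in this workflow can be sketched in Python. This is a minimal illustration under stated assumptions, not the DDGrid implementation: the names `WorkPool`, `SlaveServer`, `dock` and `run`, the batch size, and the stand-in scoring function are all hypothetical.

```python
from collections import deque


class WorkPool:
    """FIFO pool of pending jobs, used at both the global and the slave level."""

    def __init__(self, jobs=()):
        self._jobs = deque(jobs)

    def request(self, n):
        """Hand out up to n jobs (fewer if the pool is running dry)."""
        batch = []
        while self._jobs and len(batch) < n:
            batch.append(self._jobs.popleft())
        return batch

    def add(self, jobs):
        self._jobs.extend(jobs)

    def __len__(self):
        return len(self._jobs)


def dock(ligand_id):
    """Stand-in for a docking run on a computational client (made-up score)."""
    return {"ligand": ligand_id, "score": -float(ligand_id % 7)}


class SlaveServer:
    """Keeps a local work pool and refills it from the global pool on demand."""

    def __init__(self, global_pool, batch_size=4):
        self.global_pool = global_pool
        self.local_pool = WorkPool()
        self.batch_size = batch_size
        self.results = []

    def run_client_once(self):
        """One client cycle: request a job, dock it, return the result."""
        if len(self.local_pool) == 0:
            # "New job request" sent upward to the global server.
            self.local_pool.add(self.global_pool.request(self.batch_size))
        jobs = self.local_pool.request(1)
        if not jobs:
            return False  # no work left anywhere
        self.results.append(dock(jobs[0]))
        return True


def run(num_ligands=10, num_slaves=2):
    global_pool = WorkPool(range(num_ligands))
    slaves = [SlaveServer(global_pool) for _ in range(num_slaves)]
    busy = True
    while busy:
        # Build a list first so that every slave is polled in each round.
        busy = any([s.run_client_once() for s in slaves])
    # The global server assimilates the results returned by every slave.
    merged = [r for s in slaves for r in s.results]
    return sorted(merged, key=lambda r: r["ligand"])
```

Calling `run(10, 2)` drains the global pool through two slave servers and returns one result per ligand. Pulling work in batches keeps each slave's local pool busy without the global server tracking individual clients.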
DDGrid security
1. PKI-based security
2. All sites involved must hold a certificate issued by our CA
3. All deployed databases and results are encrypted
4. All message passing is SSL/TLS-enabled
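Points 2 and 4 together amount to mutually authenticated TLS. The sketch below shows one way a site could configure this with Python's standard `ssl` module. It is an illustration only: the function name and the file-path parameters are hypothetical, and the TLS 1.2 minimum is a present-day choice, not something the slides specify.

```python
import ssl


def make_site_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that only accepts peers presenting
    a certificate signed by the grid CA (paths are placeholders)."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any peer that does not present a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # Trust anchors: the certificate of the grid CA.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file is not None:
        # This site's own certificate and private key, issued by the same CA.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A site would pass the resulting context to its listening socket or HTTP server, so that connections from hosts without a CA-issued certificate fail during the handshake.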
DDGrid message passing: scheduler request
<scheduler_request>
  <authenticator>3333</authenticator>
  <hostid>102</hostid>
  <rpc_seqno>2401</rpc_seqno>
  <platform_name>i686-pc-linux-gnu</platform_name>
  <core_client_major_version>2</core_client_major_version>
  <core_client_minor_version>19</core_client_minor_version>
  <idle_ncpu>16</idle_ncpu>
  <project_disk_usage>5315768.000000</project_disk_usage>
  <total_disk_usage>68417940.000000</total_disk_usage>
  <code_sign_key> … </code_sign_key>
  <projects>
    <project>
      <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
      <resource_share>100.000000</resource_share>
    </project>
  </projects>
  <result> … </result>
  …
  <host_info> … </host_info>
</scheduler_request>
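As an illustration of how a server might read such a request, the following Python sketch parses the fields shown on the slide with the standard `xml.etree.ElementTree` module. The function name and the shape of the returned dictionary are assumptions, and the elided elements (`code_sign_key`, `result`, `host_info`) are left out of the sample.

```python
import xml.etree.ElementTree as ET

REQUEST = """
<scheduler_request>
  <authenticator>3333</authenticator>
  <hostid>102</hostid>
  <rpc_seqno>2401</rpc_seqno>
  <platform_name>i686-pc-linux-gnu</platform_name>
  <core_client_major_version>2</core_client_major_version>
  <core_client_minor_version>19</core_client_minor_version>
  <idle_ncpu>16</idle_ncpu>
  <projects>
    <project>
      <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
      <resource_share>100.000000</resource_share>
    </project>
  </projects>
</scheduler_request>
"""


def parse_scheduler_request(text):
    """Extract the host identity, platform and idle CPU count from a request."""
    root = ET.fromstring(text)
    if root.tag != "scheduler_request":
        raise ValueError("unexpected root element: " + root.tag)
    major = int(root.findtext("core_client_major_version"))
    minor = int(root.findtext("core_client_minor_version"))
    return {
        "hostid": int(root.findtext("hostid")),
        "rpc_seqno": int(root.findtext("rpc_seqno")),
        "platform": root.findtext("platform_name"),
        "client_version": (major, minor),
        "idle_ncpu": int(root.findtext("idle_ncpu")),
        "master_urls": [p.findtext("master_url") for p in root.iter("project")],
    }


request = parse_scheduler_request(REQUEST)
print(request["platform"], request["idle_ncpu"])
```

A scheduler could use `platform` and `client_version` to pick a matching application binary, and `idle_ncpu` to decide how many work units to send back.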
DDGrid message passing: scheduler reply
<scheduler_reply>
  <message priority="low">No work available</message>
  <project_name>Ddg</project_name>
  <user_name>sss</user_name>
  <code_sign_key> … </code_sign_key>
  …
  <workunit>
    …
  </workunit>
  <preferences>
    <low_water_days>1.2</low_water_days>
    <high_water_days>2.5</high_water_days>
    <disk_max_used_gb>0.4</disk_max_used_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <disk_min_free_gb>0.4</disk_min_free_gb>
    …
  </preferences>
  …
</scheduler_reply>
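A client-side reading of this reply might look like the Python sketch below. The parsing uses only the elements visible on the slide. The helper `allowed_disk_gb` encodes an assumed interpretation of the three disk preferences (the tightest limit wins); the slides do not define their semantics, so treat it as hypothetical.

```python
import xml.etree.ElementTree as ET

REPLY = """
<scheduler_reply>
  <message priority="low">No work available</message>
  <project_name>Ddg</project_name>
  <preferences>
    <low_water_days>1.2</low_water_days>
    <high_water_days>2.5</high_water_days>
    <disk_max_used_gb>0.4</disk_max_used_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <disk_min_free_gb>0.4</disk_min_free_gb>
  </preferences>
</scheduler_reply>
"""


def parse_scheduler_reply(text):
    """Collect the server message, the preferences and the work unit count."""
    root = ET.fromstring(text)
    message = root.find("message")
    prefs = root.find("preferences")
    return {
        "project": root.findtext("project_name"),
        "message": None if message is None
        else (message.get("priority"), message.text),
        "preferences": {} if prefs is None
        else {child.tag: float(child.text) for child in prefs},
        "workunits": len(root.findall("workunit")),
    }


def allowed_disk_gb(prefs, total_gb, free_gb):
    """Assumed reading of the disk limits: apply whichever is tightest."""
    return max(0.0, min(
        prefs["disk_max_used_gb"],
        total_gb * prefs["disk_max_used_pct"] / 100.0,
        free_gb - prefs["disk_min_free_gb"],
    ))


reply = parse_scheduler_reply(REPLY)
print(reply["message"], reply["workunits"])
```

With these sample values the reply carries no work unit, which agrees with its "No work available" message.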
DDGrid message passing: work unit
<workunit>
  <file_info> <number>0</number> </file_info>
  <file_info> <number>1</number> </file_info>
  <file_info> <number>2</number> </file_info>
  …
  <file_ref>
    <file_number>0</file_number>
    <open_name>tabfile</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>infile</open_name>
  </file_ref>
  <file_ref>
    <file_number>2</file_number>
    <open_name>sphfile</open_name>
  </file_ref>
  <command_line>-business</command_line>
</workunit>
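The work unit ties each `file_ref` to a `file_info` entry by number and gives it the name under which the application opens it. A minimal Python sketch of resolving that mapping follows; the function name and the consistency check are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

WORKUNIT = """
<workunit>
  <file_info> <number>0</number> </file_info>
  <file_info> <number>1</number> </file_info>
  <file_info> <number>2</number> </file_info>
  <file_ref>
    <file_number>0</file_number>
    <open_name>tabfile</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>infile</open_name>
  </file_ref>
  <file_ref>
    <file_number>2</file_number>
    <open_name>sphfile</open_name>
  </file_ref>
  <command_line>-business</command_line>
</workunit>
"""


def resolve_open_names(text):
    """Map each open_name to its file number and return the command line."""
    root = ET.fromstring(text)
    known = {int(fi.findtext("number")) for fi in root.findall("file_info")}
    mapping = {}
    for ref in root.findall("file_ref"):
        number = int(ref.findtext("file_number"))
        if number not in known:
            # A reference to a file that the work unit never declared.
            raise ValueError("file_ref points at unknown file %d" % number)
        mapping[ref.findtext("open_name")] = number
    return mapping, root.findtext("command_line")


names, command_line = resolve_open_names(WORKUNIT)
print(names, command_line)
```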
DDGrid message passing: project, application and file information
…
<project>
  <scheduler_url>http://www.ddgrid.ac.cn/ddg_cgi/cgi</scheduler_url>
  <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
  <project_name>Ddg</project_name>
</project>
<app>
  <name>gridapp</name>
</app>
<file_info>
  <name>gridapp/gridapp_2.19_i686-pc-linux-gnu</name>
  <nbytes>260754.000000</nbytes>
  <max_nbytes>0.000000</max_nbytes>
  <executable/>
  <signature_required/>
  <file_signature> … </file_signature>
  <url>http://www.ddgrid.ac.cn/ddg/download/gridapp_2.19_i686-pc-linux-gnu</url>
</file_info>
<file_info> … </file_info>
…
DDGrid Resources
Computational and Data Resources Integration
Resources aggregated (8 sites, 5 cities):
◆ SIMM: Sunway 32A Cluster
◆ Beijing Molecule Inc.: Sunway 256P Cluster
◆ HKU: Gideon 300 Cluster
◆ SSC: Dawning 4000A
◆ LeSC: Mars Cluster (test only)
◆ Shanghai Jiaotong Univ.: IBM e1350 cluster
◆ Singapore Poly-tech Univ.: Rock cluster
◆ Dalian Univ. of Technology: Dawning 4000A
Heterogeneous resources:
◆ OS: IRIX, Digital Unix, Linux (IA32, x86_64)
◆ CPU: R12000, Alpha, Pentium, AMD
DDGrid Resources
DDGrid Apps.
1. Docking pre-process software: Combimark
2. Docking software
   1) Dock (UCSF)
   2) gsDock (SIMM)
3. CDB build and maintenance software: Combilib
4. AutoDock
5. AutoGrid
6. Visualisation & structure search
7. Security-related tools
[Figure: application workflow. Labels: start, Input File, Fixed CDB, New CDB, CDB Gen., CDB Para., Preprocess, Dock, Drug-like Analysis, Experiment, end]
DDGrid Resources
Chemical Databases (CDB)
Each ligand record in a chemical database represents the 3D structural information of
a compound. The number of compounds in each CDB can be on the order of tens of
thousands, and the database size can range from tens of megabytes to gigabytes or
even terabytes.
1. Static databases
   Purchased from commercial chemical companies.
   Available Chemicals Directory (ACD)
   Chinese Natural Product Database (CNPD)
   SPECS database
   Chemical ADME/T database, etc.
2. Dynamic databases
   Built by users themselves and deployed automatically.
Deployed commercial CDB (approx. 700,000 compounds)
Name of Database    Description
Specs               Provides about 230,000 compounds
CMC-3D              Provides 3D models and important biochemical properties (including drug class, logP, and pKa values) for over 8,400 pharmaceutical compounds
ACD-3D              Provides 200,000 commercially available 3D compounds
NCI-3D              213,000 compounds with 2D information from the National Cancer Institute
CNPD                Collection of 12,000 Chinese natural products with chemical structures
TCMD                9,127 compounds and 3,922 herbs
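As a quick consistency check on the "approx. 700,000" headline, the per-database counts from the table can be summed. The short Python sketch below takes the figures at face value, although several are themselves approximate ("about", "over"); the dictionary name is illustrative.

```python
# Compound counts as stated in the table (several are rounded on the slide).
DEPLOYED_CDB = {
    "Specs": 230_000,
    "CMC-3D": 8_400,
    "ACD-3D": 200_000,
    "NCI-3D": 213_000,
    "CNPD": 12_000,
    "TCMD": 9_127,
}

total = sum(DEPLOYED_CDB.values())
print(f"{total:,} compounds")  # prints "672,527 compounds"
```

The sum, 672,527, is in line with the rounded headline figure.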
Approx. 3,300,000 compounds
Vendor            Num. of Mol.    Vendor                       Num. of Mol.
ACB-Eurochem      98603           Maybridge                    53042
Ambinter          533866          Nanosyn                      68317
Asinex            293385          National Cancer Institute    223536
ChemBridge        562624          Otava                        181195
ChemDiv           361859          Peakdale                     9632
ComGenex          38590           Pharmeks                     116355
Enamine           533111          PubChem                      164031
IBScreen          452728          Ryan Scientific              64205
InterChim         288882          Sigma-Aldrich                49022
KeyOrganics       22294           Specs                        307550
Life Chemicals    44762           TimTec                       127173
CDB example: CNPD - China Natural Products Database
CNPD is the first comprehensive source of chemical, structural and
bibliographic data on almost all known natural products in China.
CNPD serves as an information source for chemical, physical and
biological properties and literature, useful to scientists within the
pharmaceutical industry.
CNPD can be searched in flexible ways: structure, sub-structure, name,
molecular formula, molecular weight, CAS registry number, category, etc.
Traditional Chinese Medicine (TCM) applications are pre-indexed in CNPD to provide hints for lead compound discovery.
CDB example: TCMD
TCMD - Traditional Chinese Medicine Database
TCMD is a bibliographical database of approximately 20,000
records with abstracts of TCM articles. Relevant articles are
selected from among 150-200 journals from Mainland China,
Taiwan, and Hong Kong (most of them in Chinese); English
abstracts are written for the selected articles, and other pertinent
information is translated into English.
DDGrid applications in reality
SIMM carried out anti-SARS and anti-diabetes drug
research using the DDGrid
1. Anti-SARS drug research
2. Anti-diabetes drug research
Research on Anti-SARS medicine
Virtual screening of the Comprehensive Medicinal Chemistry 3D (CMC-3D) database, which contains 7,900 compounds,
found that cinanserin has a distinct anti-SARS effect.
Department of Virology, Bernhard-Nocht-Institute for Tropical Medicine, Germany
Research Department, Cantonal Hospital St Gallen, Switzerland
“Basically your inhibitor turned out to be the best compound we have
tested so far!”
Applied for domestic patent 03129071.x and PCT patent pi034248
Research on anti-diabetes medicine
Found an anti-diabetes lead better than Rosiglitazone by targeting PPAR, through
virtual screening, optimization design and synthesis, and biology and
pharmacology testing.
[Figure: CADD process. Compound counts on a logarithmic axis from 10 to 10M; values shown: 2,400,000; 800,000; 200,000; 10,000; 500; 300; 14; 138; 142; 76]
New anti-diabetes drug
Current Progress
1. Applied for patent 200410016460.X and a PCT patent
2. Safety testing and pre-clinical research
What does the DDGrid provide?
1. Drug Design Collaboration Platform
   Large-scale Virtual Screening platform
   Sharing of large CDBs
2. Computational Resources Sharing
   SIMM/SSC/HKU/Mol. Ltd/SJTU/DUT
3. Data Resources Sharing
   Pre-deployed commercial CDBs (ACD/CNPD …)
   Shared self-made CDBs
4. Medicinal chemistry text and structure search
5. Customization and Extension
Collaboration
Selected Users of DDGrid
Demo
DDGrid Demo
http://www.ddgrid.ac.cn
DDGrid Web Portal
Test Case 1
Virtual screening of 20,000 compounds
Involved Sites:
◆ Shanghai Inst. of M. M. (SIMM): Alpha Cluster (32 CPU)
◆ Beijing Mol. Ltd.: Sunway Cluster (224 CPU)
◆ The Univ. of Hong Kong: Gideon Cluster (16 CPU)
◆ Shanghai SuperComp. Centre: Dawning 4000A
◆ Dalian Univ. of Tech.: Dawning 4000A
◆ London e-Science Centre: Mars Cluster
Time consumed: 5946 sec (approx. 99 min)
Data Sets (CDB): Specs
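The figures from this test case give a rough aggregate throughput, worked out in the short Python sketch below. The variable names are illustrative. The result is wall-clock throughput across all participating sites, not a per-CPU figure, because the slide does not list CPU counts for every site.

```python
compounds = 20_000   # compounds screened in Test Case 1
elapsed_s = 5946     # wall-clock time reported on the slide

minutes = elapsed_s / 60                 # about 99 minutes, as stated
throughput = compounds / elapsed_s       # compounds per second, all sites
seconds_per_compound = elapsed_s / compounds

print(f"{minutes:.0f} min, {throughput:.2f} compounds/s, "
      f"{seconds_per_compound:.3f} s per compound")
```

This works out to a little over three compounds per second across the grid.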
Job scheduling
Visualisation of Docking Result
CDB Structure Search
Acknowledgements
This work has been supported by the National High-Tech Research and
Development Project of China (863 program), under contract No.
2004AA104270.
Many thanks to the generous resource providers:
SIMM
HKU
SJTU
Molecule Ltd.
SSC
DLUT
Involved Persons:
Shen Jianhua
Ma Fanyuan
Zhang Jun
Zhang Wenju
Chang Yan
Chen Shudong
Du Xuefeng
Li Zhuhua
Liu Fei
Wan Ju
Jiang Maojun
…
Q&A
Thank you!