“Grid Platform for Drug Discovery” Project
Mitsuhisa Sato
Center for Computational Physics, University of Tsukuba, Japan
UK Japan N+N 2003/10/3

Our Grid Project
• JST-ACT program: “Grid platform for drug discovery”, funded by JST (Japan Science and Technology Corporation), 1.3 M$ over 3 years, started in 2001
  – Tokushima University, Toyohashi Inst. of Tech., University of Tsukuba, Fuji Res. Inst. Corp.
• ILDG: International Lattice QCD Data Grid
  – CCP, U. of Tsukuba, EPCC UK, SciDAC US.
  – Design of QCDML
  – QCD meta database by web services, QCD data sharing by SRM and Globus replica …

High Throughput Computing for drug discovery
• Exhaustive parallel conformation search and docking over the Grid
• Accumulation of computing results into a large-scale database for reuse
• High-performance ab initio MO calculation for large molecules on clusters
“Combinatorial Computing” Using Grid

Grid applications of our drug discovery
• Conformation search: find possible conformations
• Docking search: compute the energy of combinations of molecules
• Quantitative Structure-Activity Relationships (QSAR) analysis: find rules for drug design
[Diagram labels: drug libraries; target; conformations; computation results]
• Conformation Search: CONFLEX-G, a Grid-enabled conformation search application
• Docking Search (using ab initio MO calculation): MO in clusters; job submission for MO; coarse-grain MO for Grid (REMD, FMO)
• QSAR analysis: design of XML for results; web service interface

CONFLEX
• Algorithm: tree search
  – Local conformation changes
  – Initial conformation selection
• We are implementing it with OmniRPC
  – Tree search action is dynamic!
[Figure: conformation search tree. Labels: Corner Flap; Edge Flip; Stepwise Rotation (Gauche, E = 0.9 kcal/mol; Gauche +, E = 0.9 kcal/mol; Anti, E = 0.0 kcal/mol)]

Grid Platform for drug discovery
[Diagram: Univ. of Tsukuba, AIST, Toyohashi Inst. of Tech., and Tokushima Univ. connected by a wide-area network, with arrows labeled “request”. Box labels:]
• Control & monitoring
  – Scheduling and monitoring of computations
  – Distributed database request management
  – Design of Grid middleware
• Development of large-scale ab initio MO program; database of MO calculation results
• Cluster for CONFLEX; development of conformation search program (CONFLEX); 3D structure database for drug design; database for CONFLEX results

What can Grid do? Parallel applications, programming, and our view of the grid
Our View
• “Typical” Grid applications
  – Parametric execution: execute the same program with different parameters using a large amount of computing resources
  – Master-workers type of parallel program
• “Typical” Grid resources
  – A cluster of clusters: several PC clusters are available
  – Dynamic resources: load and status change from time to time
[Diagram: Grid environment made up of three PC clusters]

Parallel programming in Grid
– Using Globus shell (GSH)
  • Submit batch job scripts to remote nodes
  • Staging and workflow
– Grid MPI (MPICH-G, PACX MPI, …)
  • General-purpose, but difficult and error-prone
  • No support for dynamic resources or fault tolerance
  • No support for firewalls or clusters on private networks
– Grid RPC
  • A good and intuitive programming interface
  • Ninf, NetSolve, … OmniRPC

Overview of OmniRPC
A Grid RPC system for parallel computing
• Provides a seamless parallel programming environment from clusters to grid.
  – It uses “rsh” for a cluster, “GRAM” for a grid managed by Globus, and “ssh” for conventional remote nodes.
  – Program development and testing on PC clusters
  – Production runs on the Grid to exploit huge computing resources
  – Users can switch configuration with a “host file” without any modification to the program
• Makes use of remote PC/SMP clusters as Grid computing resources
  – Support for clusters behind firewalls and with private addresses
Host file
    <?xml version="1.0"?>
    <OmniRpcConfig>
      <Host name="dennis.omni.hpcc.jp">
        <Agent invoker="globus" mxio="on"/>
        <JobScheduler type="rr" maxjob="20"/>
      </Host>
    </OmniRpcConfig>
[Diagram: a client PC connected to four PC clusters in the Grid environment]

Overview of OmniRPC (cont.)
• Easy-to-use parallel programming interface
  – A Grid RPC based on Ninf Grid RPC
  – Parallel programming using an asynchronous call API
  – The thread-safe RPC design allows the use of OpenMP in client programs
• Supports master-workers parallel programs for parametric search grid applications
  – Persistent data support in remote workers for applications which require large data
• Monitor and performance tools
    int main(int argc, char **argv)
    {
        int i, A[100][100], B[100][100][100], C[100][100][1…
        OmniRpcRequest reqs[100];
        OmniRpcInit(&argc, &argv);
        for (i = 0; i < 100; i++)
            reqs[i] = OmniRpcCallAsync("mul", 100, B[i], A…
        OmniRpcWaitAll(100, reqs);
        ...
        OmniRpcFinalize();
        return 0;
    }

OmniRPC features
• Need Globus?
  – No, you can use “ssh” as well as “globus”.
  – This is very useful for application people.
  – “ssh” can solve the “firewall” problem.
• Data persistence model?
  – Parameter-search type applications need to share the initial data.
  – OmniRPC supports it.
• Can it use many (remote) clusters?
  – Yes, OmniRPC supports “cluster of clusters”.
• How to use it on different machines and environments?
  – You can switch the configuration with a “config file” without modifying the source program.
• Why not the “Grid RPC” standard?
  – OmniRPC provides a high-level interface that hides “scheduling” and “fault tolerance” from users.
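
The host file above sets `<JobScheduler type="rr" maxjob="20"/>`. The slides do not define these attributes. Assuming `rr` means round-robin and `maxjob` caps the number of concurrent jobs per host, the placement logic might look like the sketch below. `host_t` and `rr_pick` are hypothetical names for illustration only; they are not part of the OmniRPC API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host record for illustration; not an OmniRPC structure. */
typedef struct {
    const char *name;
    int maxjob;   /* assumed meaning of maxjob="20": cap on concurrent jobs */
    int running;  /* jobs currently assigned to this host */
} host_t;

/* Pick the next host, in round-robin order, that still has a free slot.
 * *cursor remembers where the next search should start.
 * Returns the host index, or -1 when every host is at its cap. */
int rr_pick(host_t *hosts, int nhosts, int *cursor)
{
    assert(hosts != NULL && cursor != NULL);
    for (int k = 0; k < nhosts; k++) {
        int i = (*cursor + k) % nhosts;
        if (hosts[i].running < hosts[i].maxjob) {
            hosts[i].running++;            /* reserve one slot */
            *cursor = (i + 1) % nhosts;    /* continue after this host */
            return i;
        }
    }
    return -1;
}
```

A master would call `rr_pick` before issuing each asynchronous call and decrement `running` when the corresponding request completes. A result of -1 means the master has to wait for a completion before submitting more work.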

OmniRPC Home Page
http://www.omni.hpcc.jp/omnirpc/

CONFLEX from Cluster to Grid
• For large biomolecules, the number of combinatorial trial structures will be huge!
• Geometry optimization of large molecular structures requires more time to compute!
• The geometry optimization phase takes more than 90% of the total execution time
• So far, it has been executed on a PC cluster using MPI
The Grid allows us to use huge computing resources to overcome these problems!

Our Grid Platform
[Diagram: clusters at four sites; network labels: Tsukuba WAN, SINET]
• Univ. of Tsukuba: Dennis Cluster (Dual P4 Xeon 2.4GHz, 10 nodes); Alice Cluster (Dual Athlon 1800+, 14 nodes)
• Toyohashi Univ. of Tech.: Toyo Cluster (Dual Athlon 2000+, 8 nodes)
• Tokushima Univ.: Toku Cluster (P3 1.0GHz, 8 nodes)
• AIST: UME Cluster (Dual P3 1.4GHz, 32 nodes)

Summary of Our Grid Environment
Cluster   Machine overview      # of Nodes   RTT* (ms)#   Throughput (MB/s)#
Dennis    Dual P4 Xeon 2.4GHz   10           -            -
Alice     Dual Athlon 1800+     14           0.18         11.22
Toyo      Dual Athlon 1800+     8            13.00        0.55
Toku      P3 1GHz               8            24.40        0.69
UME       Dual P3 1.4GHz        32           2.73         2.12
* Round-trip time
# All measurements are between the Dennis cluster and each cluster

CONFLEX-G: Grid-enabled CONFLEX
• Parallelizes the molecular geometry optimization phase using the master/worker model.
• The OmniRPC persistent data model (automatic initializable remote module facility) allows workers to be reused for each call.
  – Eliminates initializing the worker program at every RPC.
[Diagram labels: Selection of Initial Structure; Local Perturbation; Geometry Optimization; Comparison & Store; PC Cluster A; PC Cluster B; PC Cluster C; Conformation Database]

Experiment Setting
• CONFLEX version: 402q
• Test data: two molecular samples
  – C17 (51 atoms)
  – AlaX16a (181 atoms)
• Authentication method: SSH
• The CONFLEX-G client program was executed on the server node of the Dennis cluster
• We used all nodes in the clusters of our grid

Sample Molecules
Data                  Trial structures per    Avg. exec. time to      # of opt. trial   Estimated total exec. time for
                      opt. phase (degree      opt. one trial          structures        all trial structures on a
                      of parallelism)         structure (s)                             single Dennis CPU (s)
C17 (51 atoms)        48                      1.6                     522               835
AlaX16a (181 atoms)   160                     300                     320               96000 (= 26.7 h)

Comparison between OmniRPC and MPI in Dennis Cluster
C17 (51 atoms, degree of parallelism 48)
[Figure: total execution time (s) versus number of workers (1, 2, 4, 8, 16, 20) for Sequential, MPI, and OmniRPC. Annotations: “10 times speedup using OmniRPC”; “Overhead of on-demand initialization of worker program in OmniRPC”]

Execution time of AlaX16a (181 atoms, degree of parallelism 160)
[Figure: total execution time (s) for each configuration: Dennis MPI (20w), Dennis (20w), Alice (28w), UME (64w), Dennis+Alice (48w), Dennis+UME (84w), Alice+UME (92w), Dennis+Alice+UME (112w). Annotation: “64 times speedup”]

Discussion
• The performance of CONFLEX-G was observed to be almost equal to that of CONFLEX with MPI
  – Overhead for initializing workers was found; this needs to be improved.
• We achieved a performance improvement using multiple clusters
  – A speedup of 64 on 112 workers for AlaX16a (181 atoms)
  – However, in our experiment:
    • Each worker takes only one or two trial structures, too few!
    • Load imbalance occurs because the exec. time of each opt. varies.
    • We expect more speedup for larger molecules.

Discussion (cont’d)
• Possible improvements:
  – Exploit more parallelism
    • Parallelize the outer loop to increase the number of structure optimizations at a time
  – Efficient job scheduling
    • Heavy jobs -> fast machines
    • Light jobs -> slow machines
  – Can we estimate execution time?
  – Parallelize the worker program by SMP (OpenMP)
    • Increase the performance of each worker
    • Reduce the number of workers

Summary and Future work
• CONFLEX-G: Grid-enabled molecular conformation search.
  – We used OmniRPC to make it grid-enabled.
  – We are now doing production runs.
• For MO simulation (docking), we are working on coarse-grain MO, as well as job submission
  – REMD (replica exchange program using NAMD)
  – FMO (Fragment MO)
• For QSAR
  – Design of ML to describe computation results
  – Web service interface to access the database
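
The Discussion slides report a speedup of 64 on 112 workers for AlaX16a and note that each worker receives only one or two trial structures. The arithmetic below is a rough illustration of why that limits the speedup, not a result from the slides. It assumes, as an idealization, that every optimization takes the same time and all workers are equally fast, whereas the slides say exec. times vary and the clusters differ. Here S is the observed speedup, p the number of workers, n the degree of parallelism (trial structures per optimization phase), and r the number of rounds a phase needs.

```latex
\begin{align*}
E &= \frac{S}{p} = \frac{64}{112} \approx 0.57
  && \text{(parallel efficiency)} \\
r &= \left\lceil \frac{n}{p} \right\rceil
   = \left\lceil \frac{160}{112} \right\rceil = 2
  && \text{(48 workers take two structures, 64 take one)} \\
S_{\max} &= \frac{n}{r} = \frac{160}{2} = 80
  && \text{(best speedup per phase under the assumption)} \\
\frac{S}{S_{\max}} &= \frac{64}{80} = 0.80
\end{align*}
```

Under these assumptions the observed speedup is about 80% of what the available parallelism permits, which is consistent with the slides' suggestion to increase the number of structure optimizations performed at a time.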