Protein Folding Landscapes in a Distributed Environment
All Hands Meeting, 2001
University of Virginia: Andrew Grimshaw, Anand Natrajan
Scripps (TSRI): Charles L. Brooks III, Michael Crowley
SDSC: Nancy Wilkins-Diehr
National Partnership for Advanced Computational Infrastructure (NPACI)

Outline
• CHARMM
  – Issues
• Legion
• The Run
  – Results
  – Lessons
• AmberGrid
• Summary

CHARMM
• Routine exploration of folding landscapes helps in the search for a solution to the protein folding problem
• Understanding folding is critical to structural genomics, biophysics, drug design, etc.
• Key to understanding cell malfunctions in Alzheimer's disease, cystic fibrosis, etc.
• CHARMM and Amber benefit the majority (>80%) of bio-molecular scientists
• Structural genomics & protein structure prediction

Folding Free Energy Landscape
• Molecular dynamics simulations
• 100-200 structures needed to sample the (r, Rgyr) space; Rgyr is the radius of gyration (an illustrative sketch of this order parameter appears after the "Scientists Using Legion" slide)
[Figure: folding free energy landscape plotted over the r and Rgyr axes]

Application Characteristics
• Parameter-space study
  – Parameters correspond to structures along & near the folding path
• The path is unknown; there could be many paths, or a broad one
  – Many places along the path are sampled to determine local low free energy states
  – The path is the valley of lowest free energy states leading from the high free energy state of the unfolded protein to the lowest free energy state, the folded native protein

Folding of Protein L
• Immunoglobulin-binding protein
  – 62 residues (small), 585 atoms
  – 6500 water molecules, 20085 atoms total
  – Each parameter point requires O(10^6) dynamics steps
  – Typical folding surfaces require 100-200 sampling runs
• CHARMM using the most accurate physics available for classical molecular dynamics simulation
  – PME, 9 Å cutoff, heuristic list update, SHAKE
• Multiple 16-way parallel runs for maximum efficiency

Application Characteristics
• Many independent runs
  – 200 sets of data to be simulated, each in two sequential runs
    • Equilibration (4-8 hours)
    • Production/sampling (8-16 hours)
• Each point has a task name, e.g., pl_1_2_1_e

Scientists Using Legion
• The scientists provide:
  – Binaries for each architecture type
  – A script for dispatching jobs (a minimal sketch follows this slide)
  – A script for keeping track of results
  – A script for running the binary at a site (an optional feature in Legion)
• Legion provides:
  – An abstract interface to resources (queues, accounting, firewalls, etc.)
  – Binary transfer (with caching)
  – Input file transfer
  – Job submission
  – Status reporting
  – Output file transfer
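The dispatch and bookkeeping scripts mentioned above were thin drivers around Legion's job-submission facilities. The sketch below is a hypothetical Python reconstruction, not the scripts actually used in the run: submit() merely stands in for the Legion dispatch command, the 20 x 10 grid of task indices is invented so the loop yields about 200 names, and the "_p" suffix for production runs is an assumption (only the "_e" equilibration suffix appears on the slides).

```python
# Hypothetical dispatch driver for the ~200 parameter points.
# Only the task-name convention (e.g. pl_1_2_1_e) and the two sequential
# phases (equilibration, then production/sampling) come from the slides.
import itertools


def submit(task_name):
    """Placeholder for the Legion dispatch call (each job ran 16-way parallel)."""
    print("dispatching", task_name)


# One task per sampled parameter point; the grid dimensions are illustrative.
points = [f"pl_1_{i}_{j}" for i, j in itertools.product(range(1, 21), range(1, 11))]

for point in points:
    submit(point + "_e")   # equilibration phase, 4-8 hours
    # In the real run, production was dispatched only after the matching
    # equilibration run had finished; the "_p" suffix is assumed here.
    submit(point + "_p")   # production/sampling phase, 8-16 hours
```

In practice the dispatch step would hand each task to Legion, which then handles binary and file transfer, queue submission, and status reporting, as listed on the slide above.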
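For reference, the Rgyr axis on the landscape slide is the protein's radius of gyration. The snippet below is a generic, textbook computation of that order parameter for a set of coordinates; it is illustrative only, uses placeholder random data, and is not CHARMM's implementation (the r coordinate and the sampling restraints are not shown).

```python
# Generic radius-of-gyration calculation: mass-weighted RMS distance of the
# atoms from their centre of mass. Illustrative, not CHARMM code.
import numpy as np


def radius_of_gyration(coords, masses):
    """coords: (N, 3) positions in angstroms; masses: (N,) atomic masses."""
    com = np.average(coords, axis=0, weights=masses)      # centre of mass
    sq_dist = np.sum((coords - com) ** 2, axis=1)         # squared distances
    return np.sqrt(np.average(sq_dist, weights=masses))   # mass-weighted RMS


# Placeholder data: 585 protein atoms (as in protein L) at random positions.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 20.0, size=(585, 3))
masses = np.ones(585)          # uniform masses, for illustration only
print(f"Rgyr = {radius_of_gyration(coords, masses):.2f} angstroms")
```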
Legion
Complete, Integrated Infrastructure for Secure Distributed Resource Sharing

Grid OS Requirements
• Wide-area
• High Performance
• Complexity Management
• Extensibility
• Security
• Site Autonomy
• Input / Output
• Heterogeneity
• Fault-tolerance
• Scalability
• Simplicity
• Single Namespace
• Resource Management
• Platform Independence
• Multi-language
• Legacy Support

Transparent System

npacinet

The Run

Computational Issues
• Provide improved response time
• Access a large set of resources transparently
  – geographically distributed
  – heterogeneous
  – in different organisations
• Scale of the run: 5 organisations, 7 systems, 9 queues, 5 architectures, ~1000 processors

Resources Available
  System             Site      Processor           Processors
  HP SuperDome       CalTech   440 MHz PA-8700     128/128
  IBM SP3            UMich     375 MHz Power3      24/24
  DEC Alpha          UVa       533 MHz EV56        32/128
  IBM Blue Horizon   SDSC      375 MHz Power3      512/1184
  Sun HPC 10000      SDSC      400 MHz SMP         32/64
  IBM Azure          UTexas    160 MHz Power2      32/64

Mechanics of Runs
[Workflow diagram: register binaries with Legion; create task directories & the equilibration/production specifications; dispatch equilibration; dispatch equilibration & production]

Distribution of CHARMM Work
[Pie chart: share of the CHARMM work by resource - SDSC IBM, CalTech HP, UTexas IBM, UVa DEC, SDSC Cray, SDSC Sun, UMich IBM; two resources carried 71% and 24% of the work, the remainder 0-2% each]

Problems Encountered
• Network slowdowns
  – Slowdown in the middle of the run
  – 100% loss for packets of size ~8500 bytes
• Site failures
  – LoadLeveler restarts
  – NFS/AFS failures
• Legion
  – No run-time failures
  – Archival support lacking
  – Must address binary differences

Successes
• Science accomplished faster
  – 1 month on 128 SGI Origins at Scripps
  – 1.5 days on the national grid with Legion
  – Roughly a 20-fold reduction in wall-clock time (a rough consistency check of these figures follows the Summary slide)
• Transparent access to resources
  – User did not need to log on to different machines
  – Minimal direct interaction with resources
• Problems identified
• Legion remained stable
  – Other Legion users were unaware of the large runs
• A large grid application run on powerful resources by one person from a local resource
• Collaboration between natural and computer scientists

AmberGrid
Easy Interface to the Grid

Legion GUIs
• Simple point-and-click interface to grids
  – Familiar access to a distributed file system
  – Enables & encourages sharing
• Application portal model for HPC
  – AmberGrid
  – RenderGrid
  – Accounting
• Transparent access to remote resources
• Intended audience is scientists
Logging in to npacinet

View of Contexts (Distributed File System)

Control Panel

Running Amber

Run Status (Legion); Graphical View (Chime)

Summary
• CHARMM run
  – Succeeded in starting big runs
  – Encountered problems
  – Learnt lessons for the future
  – Let's do it again!
    • more processors, systems, organisations
• AmberGrid
  – Showed proof-of-concept (grid portal)
  – Need to resolve licence issues
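As a closing note, the timings quoted on the Successes slide can be sanity-checked with back-of-the-envelope arithmetic using only figures from the slides: 200 parameter points, 4-8 hours of equilibration plus 8-16 hours of production per point, 16-way parallel runs, ~1000 grid processors versus the dedicated machines at Scripps ("128 SGI Origins" is read here as 128 processors). The sketch assumes each point was run once and ignores queue waits and parallel inefficiency, so it is a rough consistency check rather than a measurement.

```python
# Rough consistency check of "1 month on 128 processors" versus
# "1.5 days on ~1000 grid processors", using only figures from the slides.
points = 200                              # parameter points (sampling runs)
cpus_per_run = 16                         # each CHARMM job ran 16-way parallel
wall_hours_per_point = (4 + 8, 8 + 16)    # (min, max): equilibration + production

cpu_hours = [points * cpus_per_run * h for h in wall_hours_per_point]
print(f"total work: {cpu_hours[0]:,} - {cpu_hours[1]:,} CPU-hours")

for procs, label in [(128, "128 dedicated processors"),
                     (1000, "~1000 grid processors")]:
    days = [c / procs / 24 for c in cpu_hours]
    print(f"{label}: about {days[0]:.1f} - {days[1]:.1f} days")

# Roughly 12-25 days on 128 processors (about a month at the high end) versus
# roughly 1.6-3.2 days on ~1000 processors, broadly consistent with the quoted
# "1 month" versus "1.5 days" once queue waits and overlap are considered.
```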