HPDC-Aug2001 - Legion


Studying Protein Folding on the Grid: Experiences Using CHARMM on NPACI Resources under Legion

University of Virginia: Anand Natrajan, Marty A. Humphrey, Anthony D. Fox, Andrew S. Grimshaw

Scripps (TSRI): Michael Crowley, Charles L. Brooks III

SDSC: Nancy Wilkins-Diehr

http://legion.virginia.edu

anand@virginia.edu

Outline

• CHARMM
– Issues
• Legion
• The Run
– Results
– Lessons
• Portals
• Summary

CHARMM

• Routine exploration of folding landscapes aids the search for a solution to the protein folding problem

• Understanding folding is critical to structural genomics, biophysics, drug design, etc.

• Key to understanding cell malfunctions in Alzheimer’s, cystic fibrosis, etc.

• CHARMM and Amber benefit the majority (>80%) of bio-molecular scientists

• Structural genomics & protein structure prediction

Folding Free Energy Landscape

[Figure: folding free energy landscape from molecular dynamics simulations; 100-200 structures sampled over the (r, R_gyr) space.]

Folding of Protein L

• Immunoglobulin-binding protein

– 62 residues (small), 585 atoms

– 6500 water molecules, total 20085 atoms

– Each parameter point requires O(10^6) dynamics steps

– Typical folding surfaces require 100-200 sampling runs

• CHARMM uses the most accurate physics available for classical molecular dynamics simulation

• Multiple 16-way parallel runs for maximum efficiency

Application Characteristics

• Parameter-space study

– Parameters correspond to structures along & near folding path

• Path unknown; there could be many paths, or a broad one

– Many places along path sampled for determining local low free energy states

– The path is the valley of lowest free energy states leading from the high free energy state of the unfolded protein to the lowest free energy state (the folded native protein)
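The slides stop at the qualitative picture; the standard relation behind such a surface (an assumption here, not stated in the talk) is that the free energy at a point of the (r, R_gyr) space follows from how often the sampling runs visit it:

    F(r, R_gyr) = -k_B T ln P(r, R_gyr) + C    (assumed standard relation, not from the slides)

where P(r, R_gyr) is the probability of observing a structure at that point during sampling, k_B is the Boltzmann constant, T the temperature, and C an arbitrary offset.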

Application Characteristics

• Many independent runs

– 200 sets of data to be simulated in two sequential runs

• Equilibration (4-8 hours)

• Production/sampling (8-16 hours)

• Each point has a task name, e.g., pl_1_2_1_e
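As an illustration (not part of the talk), the bookkeeping for such a study is small: enumerate the parameter points, give each point its two sequential tasks, and bound the per-point wall-clock time from the phase estimates above. Only the 200-point count, the two phases, and the hour ranges come from the slides; the 10 x 20 grid shape and every name below are hypothetical.

    # Hypothetical bookkeeping sketch; only the counts and hour ranges are
    # from the slides, the grid shape and names are made up for illustration.
    points = [(i, j) for i in range(1, 11) for j in range(1, 21)]   # 200 points
    phases = {"e": (4, 8), "p": (8, 16)}    # phase suffix -> (min, max) hours

    # Task names in the talk look like pl_1_2_1_e; the meaning of the middle
    # fields is not given, so this naming is only illustrative.
    tasks = [f"pl_{i}_{j}_1_{s}" for (i, j) in points for s in phases]

    per_point = (sum(lo for lo, _ in phases.values()),
                 sum(hi for _, hi in phases.values()))
    print(f"{len(points)} points, {len(tasks)} tasks, "
          f"{per_point[0]}-{per_point[1]} hours of wall-clock per point")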

Legion

Complete, Integrated Infrastructure for Secure Distributed Resource Sharing

Grid OS Requirements

• Wide-area
• High Performance
• Complexity Management
• Extensibility
• Security
• Site Autonomy
• Input / Output
• Heterogeneity
• Fault-tolerance
• Scalability
• Simplicity
• Single Namespace
• Resource Management
• Platform Independence
• Multi-language
• Legacy Support

Transparent System

npacinet

The Run

Computational Issues

• Provide improved response time

• Access large set of resources transparently

– geographically distributed

– heterogeneous

– different organisations

6 organisations

6 queue types

10 queues

6 architectures

~1000 processors

Resources Available

Machine            Site     Processor         Processors
HP SuperDome       CalTech  440 MHz PA-8700   128/128
IBM SP3            UMich    375 MHz Power3    24/24
DEC Alpha          UVa      533 MHz EV56      32/128
IBM Blue Horizon   SDSC     375 MHz Power3    512/1184
Sun HPC 10000      SDSC     400 MHz SMP       32/64
IBM Azure          UTexas   160 MHz Power2    32/64

Scientists Using Legion

• Binaries for each type

• Script for dispatching jobs

• Script for keeping track of results

• Script for running binary at site

– optional feature in Legion

• Abstract interface to resources

– queues, accounting, firewalls, etc.

• Binary transfer (with caching)

• Input file transfer

• Job submission

• Status reporting

• Output file transfer

Mechanics of Runs

[Diagram: register binaries, then dispatch specification; the workflow is sketched below.]
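A minimal sketch (not from the talk) of how a dispatch script can drive that workflow: register one binary per architecture, submit each task's dispatch specification, poll status, and pull output back to the local resource. The legion_* helpers below are hypothetical placeholders for whatever Legion commands or wrappers the real scripts called; only the ordering of the steps follows the slides.

    # Hypothetical dispatch sketch; the legion_* helpers stand in for the real
    # Legion tools, which the slides do not name. Paths and names are made up.
    import time

    def legion_register_binary(arch, path):
        """Register a CHARMM binary for one architecture (hypothetical)."""
        return {"arch": arch, "path": path}

    def legion_submit(task, input_files):
        """Submit one task with its input files to some queue (hypothetical)."""
        return {"task": task, "inputs": input_files}

    def legion_status(job):
        """Return 'running', 'done' or 'failed' (hypothetical; stubbed as done)."""
        return "done"

    def legion_fetch_output(job, dest):
        """Copy the task's output files back to the local resource (hypothetical)."""
        pass

    # One binary per architecture type
    for arch, path in {"Power3": "charmm.power3", "EV56": "charmm.ev56"}.items():
        legion_register_binary(arch, path)

    # Dispatch every task, then keep track of results until all have finished
    jobs = {t: legion_submit(t, [f"{t}.inp"]) for t in ["pl_1_2_1_e", "pl_1_2_1_p"]}
    while jobs:
        for task, job in list(jobs.items()):
            state = legion_status(job)
            if state == "done":
                legion_fetch_output(job, dest=f"results/{task}")
                del jobs[task]
            elif state == "failed":
                jobs[task] = legion_submit(task, [f"{task}.inp"])   # resubmit
        if jobs:
            time.sleep(60)   # poll once a minute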

Distribution of CHARMM Work

[Pie chart: share of CHARMM work by site (SDSC IBM, CalTech HP, UTexas IBM, UVa DEC, SDSC Cray, SDSC Sun, UMich IBM); the two largest shares were 71% and 24%, the rest 2% or less each.]

Problems Encountered

• Network slowdowns

– Slowdown in the middle of the run

– 100% loss for packets of size ~8500 bytes (probe sketch below)

• Site failures

– LoadLeveler restarts

– NFS/AFS failures

• Legion

– No run-time failures

– Archival support lacking

– Must address binary differences
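Not from the talk, but a finding like the ~8500-byte loss above can be reproduced with a simple payload sweep. A minimal sketch, assuming a Linux-style ping that accepts -c (count) and -s (payload size); the host name is a placeholder.

    # Hypothetical probe for size-dependent packet loss (the run saw 100% loss
    # around ~8500-byte packets). Assumes Linux iputils ping flags -c and -s.
    import subprocess

    HOST = "remote.site.example"    # placeholder target

    for size in (64, 1400, 4000, 8000, 8500, 9000):
        result = subprocess.run(
            ["ping", "-c", "5", "-s", str(size), HOST],
            capture_output=True, text=True,
        )
        # ping prints a summary line containing "packet loss"
        summary = [l for l in result.stdout.splitlines() if "packet loss" in l]
        print(size, summary[0].strip() if summary else "no summary")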

Successes

• Science accomplished faster

– 1 month on 128 SGI Origins at Scripps

– 1.5 days on national grid with Legion

• Transparent access to resources

– User didn’t need to log on to different machines

– Minimal direct interaction with resources

• Problems identified

• Legion remained stable

– Other Legion users unaware of large runs

• Large grid application run on powerful remote resources by one person from a local resource

• Collaboration between natural and computer scientists

Portal Interface

Easy Interface to Grid

Legion GUIs

• Simple point-and-click interface to Grids

– Familiar access to distributed file system

– Enables & encourages sharing

• Application portal model for HPC

– AmberGrid

– RenderGrid

– Accounting

Transparent Access to Remote Resources

Intended Audience is Scientists

Logging in to npacinet

View of contexts (Distributed File System)

Control Panel

Running Amber

Run Status (Legion)

Graphical View (Chime)

Summary

• CHARMM Run

– Succeeded in starting big runs

– Encountered problems

– Learnt lessons for the future

• AmberGrid

– Showed proof-of-concept for a grid portal

– Need to resolve licence issues

Download