High Throughput High Performance
Molecular Dynamics Simulations
PetaShare All Hands Meeting
LSU, March 3, 2008
Thomas C. Bishop
Center for Computational Science
Tulane University
New Orleans, LA
Cell Biology 001:
Cell Organization and Chromatin
[Figure: cell organization from the 3000 nm scale down to the ribosome; images from Wikipedia and "Molecular Biology of the Cell", Alberts B., Bray D., Lewis J., Raff M., Roberts K., and Watson J.D.]
Folding DNA into Nucleosomes
[Figure: free DNA and folded DNA]
• any sequence will fold
• stability depends on the ability of the sequence to fold
Chromatin Structure:
Prediction & Analysis
Fine Grain and Coarse Grain
• fine grain: atomic model, molecular dynamics, 1 to 1000 nucleosomes
• coarse grain: elastic deformation, entire genomes
[Figure: free DNA and folded DNA at atomic and coarse-grain resolution]
Molecular Dynamics in a Nutshell
• Non-bonded interactions take nearly all of the CPU time
http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
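To see why the non-bonded term dominates, consider a naive pairwise loop (an illustrative Python sketch, not from the talk, with made-up argon-like parameters): every unique atom pair contributes, so the cost grows as O(N^2) until cutoffs, neighbor lists, and PME are applied.

import numpy as np

def lj_energy(coords, epsilon=0.238, sigma=3.4):
    # Naive O(N^2) Lennard-Jones energy over all atom pairs.
    # coords: (N, 3) array of positions; epsilon/sigma are illustrative only.
    n = len(coords)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):        # N(N-1)/2 unique pairs
            r = np.linalg.norm(coords[i] - coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

# Doubling N quadruples the pair count, which is why production MD codes
# spend most of their effort (cutoffs, neighbor lists, PME) on this term.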
MD is a Mature Method
• algorithms defined and robust
• efficiently parallelized (1000 CPUs / 3M atoms)
• force fields carefully evaluated
• data formats decided
• tools for visualization well developed
MD Practical Issues
• Pre-Simulation (1 day)
• Simulation (1 month), see the driver sketch below
  1) Minimization
  2) Equilibration
  3) Dynamics
• Post-Simulation Analysis (1 week)
[image from NAMD tutorial files, www.ks.uiuc.edu]
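The three simulation stages are typically chained so that each restarts from the previous stage's output. A minimal driver sketch, assuming namd2 is on the PATH and that min.conf, equil.conf, and dyn.conf are hypothetical NAMD config files prepared in advance:

import subprocess

# Hypothetical configs for the three stages; each .conf sets its own
# inputs/outputs so a stage restarts from the previous stage's files.
stages = ["min.conf", "equil.conf", "dyn.conf"]

for conf in stages:
    log = conf.replace(".conf", ".log")
    with open(log, "w") as out:
        subprocess.run(["namd2", conf], stdout=out, check=True)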
Specific Data Notes

                    High Throughput                   High Performance
Systems             1300 systems, 24,000 atoms each   1 system, 155,000 atoms
Data                ~40 GB                            ~200 GB
Initialize system   ~1 min/system                     1 day
Simulation          2000-step minimization            20 ns simulation
Compute time        2 min/system (~3 days total)      1 day/run (30 days on 64 CPUs)
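A quick sanity check of the high-throughput totals (my arithmetic, not from the slides):

n_systems = 1300
init_min = 1.0        # ~1 min to initialize each system
minimize_min = 2.0    # ~2 min per 2000-step minimization

total_hours = n_systems * (init_min + minimize_min) / 60.0
print(f"serial compute time: {total_hours:.0f} h (~{total_hours / 24:.1f} days)")
# -> 65 h, roughly the ~3 days of compute time quoted above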
The NAMD-G Solution
[Figure: NAMD-G workflow between a Local Workstation, a Remote HPC Machine, and a Mass Storage System (MSS):
1. Submit job
2. Store input files on MSS
3. Retrieve input and restart files
4. Execute NAMD
5. Store output and restart files
6. Retrieve output and restart files]
images from the NAMD-G presentation: Theoretical and Computational Biophysics Group at UIUC
• Images of the remote HPC machine (Mercury) and the mass storage system (UniTree) courtesy of the National Center for Supercomputing Applications (NCSA) and the Board of Trustees of the University of Illinois
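The same stage-run-store loop can be sketched schematically. NAMD-G itself is built on Globus/GridFTP grid services, so the scp/ssh commands and host names below are stand-in assumptions illustrating the pattern, not its actual mechanism:

import subprocess

MSS = "mss.example.org"   # hypothetical mass-storage host
HPC = "hpc.example.org"   # hypothetical remote HPC machine

def run(cmd):
    subprocess.run(cmd, shell=True, check=True)

# 2. Store input files on mass storage.
run(f"scp input.tar {MSS}:/archive/job001/")

# 3-5. On the HPC machine: retrieve inputs, execute NAMD, store outputs.
remote = (
    f"scp {MSS}:/archive/job001/input.tar . && tar xf input.tar && "
    "namd2 dyn.conf > dyn.log && tar cf output.tar dyn.log *.coor *.vel && "
    f"scp output.tar {MSS}:/archive/job001/"
)
run(f'ssh {HPC} "{remote}"')

# 6. Retrieve output and restart files back to the local workstation.
run(f"scp {MSS}:/archive/job001/output.tar .")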
LONI Data Challenges
• Data Location
• Software Availability/Performance
• Authentication & Computing Environments
Data Location
• DNA, CCS, LONI
• Simulation Initiation
  − build on the fly vs. build locally then transfer
  − 1 simulation: build locally and transfer to the compute server
  − 1000 simulations: build on the fly at the compute server (~30 GB)
• Queuing
  − data discovery (queue load, inputs, executables)
  − once queued, am I committed to a single resource?
• Analysis
  − where did I put that simulation? (see the catalog sketch below)
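One lightweight answer to "where did I put that simulation?" is a catalog maintained alongside the runs. A sketch using a JSON file (my illustration; the file name and fields are made up, not a PetaShare or LONI tool):

import json, time
from pathlib import Path

CATALOG = Path("simulations.json")   # hypothetical catalog file

def register(name, host, path, note=""):
    # Record where a simulation's files live so they can be found later.
    entries = json.loads(CATALOG.read_text()) if CATALOG.exists() else {}
    entries[name] = {"host": host, "path": path, "note": note,
                     "registered": time.strftime("%Y-%m-%d")}
    CATALOG.write_text(json.dumps(entries, indent=2))

register("nuc0042", "mss.example.org", "/archive/nucleosomes/nuc0042",
         note="2000-step minimization, 24,000 atoms")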
Software Availability/Performance
Benchmarks based on one of the sample simulations distributed with NAMD demonstrate that careful optimization for LONI's Dell-based InfiniBand machines provides a significant improvement over the default download. This required compiling the MPI libraries as well as NAMD itself.
Authentication & Environments
• Each LONI machine is independent
  − common: uid/gid/password
  − different: /home, /work, .login, .cshrc, software load
• DNA, CCS, and LONI also differ
Meta-queuing solves my problems
• submit from the desktop, run anywhere
  − "anywhere" includes: DNA, CCS, LONI, TeraGrid
  − based on availability and load ... automatically!
• simulation protocols (workflows), sketched below
  − build system
  − run simulation
  − analyze
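A minimal sketch of such a protocol as a three-stage workflow; the stage commands are hypothetical, and a real meta-scheduler would dispatch each stage to whichever resource is available:

import subprocess

# Hypothetical commands for one nucleosome simulation; psfgen ships with NAMD.
workflow = [
    ("build",   ["psfgen", "build.pgn"]),   # build system
    ("run",     ["namd2", "dyn.conf"]),     # run simulation
    ("analyze", ["python", "analyze.py"]),  # analyze trajectory
]

for stage, cmd in workflow:
    print(f"[{stage}] {' '.join(cmd)}")
    subprocess.run(cmd, check=True)   # a meta-queue would submit this remotely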
Questions
• Data visualization?
  − how do I do this remotely?
  − how do I let others see my data?
• Data mining?
  − comparing data from different simulations
• Re-analyzing?
  − analysis protocols
THE END