GPU-Accelerated Analysis of Petascale
Molecular Dynamics Simulations
John Stone
Theoretical and Computational Biophysics Group
Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
http://www.ks.uiuc.edu/Research/vmd/
Scalable Software for Scientific Computing
University of Notre Dame, June 11, 2012
VMD – “Visual Molecular Dynamics”
• Visualization and analysis of:
– molecular dynamics simulations
– quantum chemistry calculations
– particle systems and whole cells
– sequence data
• User extensible w/ scripting and plugins
• http://www.ks.uiuc.edu/Research/vmd/
[Images: Poliovirus; Ribosome; Sequences; Electrons in Vibrating Buckyball; Cellular Tomography, Cryo-electron Microscopy; Whole Cell Simulations]
Goal: A Computational Microscope
• Study the molecular machines in living cells
[Images: Ribosome (synthesizes proteins from genetic information; a target for antibiotics); Silicon nanopore (a bionanodevice for sequencing DNA efficiently)]
Meeting the Diverse Needs of the
Molecular Modeling Community
• Over 212,000 registered users
– 18% (39,000) are NIH-funded
– Over 49,000 have downloaded multiple VMD releases
• Over 6,600 citations
• VMD user support and service efforts:
– 20,000 emails, 2007-2011
– Develop and maintain VMD tutorials and topical mini-tutorials; 11 in total
– Periodic user surveys
• User community runs VMD on:
– MacOS X, Unix, Windows operating systems
– Laptops, desktop workstations
– Clusters, supercomputers
VMD Interoperability –
Linked to Today’s Key Research Areas
• Unique in its interoperability with a broad
range of modeling tools: AMBER,
CHARMM, CPMD, DL_POLY, GAMESS,
GROMACS, HOOMD, LAMMPS, NAMD,
and many more …
• Supports key data types, file formats, and databases, e.g. electron microscopy, quantum chemistry, MD trajectories, sequence alignments, super-resolution light microscopy
• Incorporates tools for simulation preparation,
visualization, and analysis
Molecular Visualization and Analysis
Challenges for Petascale Simulations
• Very large structures (10M to over 100M atoms)
– 12 bytes per atom per trajectory frame
– One 100M-atom trajectory frame: 1200 MB! (see the sketch below)
• Long-timescale simulations produce huge trajectories
– MD integration timesteps are on the femtosecond timescale (10^-15 sec), but many important biological processes occur on microsecond to millisecond timescales
– Even when trajectory frames are stored infrequently, the resulting trajectories often contain millions of frames
• Terabytes to petabytes of data, often too large to move
• Viz and analysis must be done primarily on the supercomputer where the data already resides
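
A quick back-of-the-envelope check of these sizes (a minimal sketch, assuming the 12 bytes/atom come from three single-precision coordinates per atom):

```
// Per-frame and per-trajectory memory cost for a 100M-atom structure,
// assuming 3 single-precision floats (x, y, z) per atom = 12 bytes/atom.
#include <cstdio>

int main() {
  const long long natoms = 100000000LL;                 // 100M atoms
  const long long bytes_per_atom = 3 * sizeof(float);   // 12 bytes
  const long long frame_bytes = natoms * bytes_per_atom;
  printf("One frame: %lld MB\n", frame_bytes / 1000000LL);   // 1200 MB
  printf("1M frames: %.1f PB\n", frame_bytes * 1e6 / 1e15);  // ~1.2 PB
  return 0;
}
```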
Approaches for Visualization and Analysis of
Petascale Molecular Simulations with VMD
• Abandon conventional approaches, e.g. bulk download of trajectory data to remote viz/analysis machines:
– In-place processing of trajectories on the machine running the simulations
– Use remote visualization techniques: split-mode VMD with a remote front-end instance and a back-end viz/analysis engine running in parallel on the supercomputer
• Large-scale parallel analysis and visualization via distributed
memory MPI version of VMD
• Exploit GPUs and other accelerators to increase per-node
analytical capabilities, e.g. NCSA Blue Waters Cray XK6
• In-situ on-the-fly viz/analysis and event detection through direct
communication with running MD simulation
Parallel VMD Analysis w/ MPI
• Analyze trajectory frames, structures, or sequences on parallel supercomputers:
– Parallelize user-written analysis scripts with minimal difficulty
– Parallel analysis of independent trajectory frames (see the sketch below)
– Parallel structural analysis using custom parallel reductions
– Parallel rendering, movie making
• Dynamic load balancing:
– Recently tested with up to 15,360 CPU cores
• Supports GPU-accelerated clusters and supercomputers
[Diagram: sequence/structure data and trajectory frames are scattered across many VMD instances for data-parallel analysis; results are gathered back]
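
A minimal sketch of the frame-decomposition pattern the diagram shows (illustrative only, not VMD's actual MPI code; analyze_frame is a hypothetical stand-in for a per-frame property such as radius of gyration):

```
// Data-parallel trajectory analysis: each rank analyzes its share of
// independent frames; a reduction gathers the results on rank 0.
#include <mpi.h>
#include <cstdio>

// Hypothetical stand-in for a per-frame analysis routine.
static double analyze_frame(int frame) { return 1.0 / (frame + 1); }

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nranks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  const int nframes = 1000000;
  double local = 0.0;
  // Static round-robin decomposition of independent frames; VMD layers
  // dynamic load balancing on top of this basic idea.
  for (int f = rank; f < nframes; f += nranks)
    local += analyze_frame(f);

  double global = 0.0;
  MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("Mean over %d frames: %g\n", nframes, global / nframes);
  MPI_Finalize();
  return 0;
}
```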
GPU Accelerated Trajectory Analysis
and Visualization in VMD
GPU-Accelerated Feature           Speedup vs. single CPU core
Molecular orbital display         120x
Radial distribution function      92x
Electrostatic field calculation   44x
Molecular surface display         40x
Ion placement                     26x
MDFF density map synthesis        26x
Implicit ligand sampling          25x
Root mean squared fluctuation     25x
Radius of gyration                21x
Close contact determination       20x
Dipole moment calculation         15x
Quantifying GPU Performance and
Energy Efficiency in HPC Clusters
• NCSA “AC” Cluster
• Power monitoring
hardware on one node
and its attached Tesla
S1070 (4 GPUs)
• Power monitoring logs
recorded separately for
host node and attached
GPUs
• Logs associated with
batch job IDs
• 32 HP XW9400 nodes
• 128 cores, 128 Tesla C1060 GPUs
• QDR Infiniband
Tweet-a-Watt
• Kill-a-Watt power meter
• XBee wireless transmitter
• Power, voltage, shunt sensing tapped from op amp
• Lower transmit rate to smooth power through large capacitor
• Readout software uploads samples to a local database
• We built 3 transmitter units and one XBee receiver
• Currently integrated into the AC cluster as a power monitor
Time-Averaged Electrostatics Analysis
on Energy-Efficient GPU Cluster
• 1.5 hour job (CPUs) reduced to
3 min (CPUs+GPU)
• Electrostatics of thousands of
trajectory frames averaged
• Per-node power consumption on
NCSA “AC” GPU cluster:
– CPUs-only: 299 watts
– CPUs+GPUs: 742 watts
• GPU Speedup: 25.5x
• Power efficiency gain: 10.5x
Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters. J. Enos, C. Steffen, J. Fullop, M. Showerman, G. Shi, K. Esler, V. Kindratenko, J. Stone, J. Phillips. The Work in Progress in Green Computing, pp. 317-324, 2010.
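
The power-efficiency gain follows from the ratio of energy-to-solution (power times runtime); a worked check, where the small gap to the quoted 10.5x figure is attributable to rounding of the measured inputs:

\[
\text{gain} \;=\; \frac{E_{\mathrm{CPU}}}{E_{\mathrm{CPU+GPU}}}
\;=\; \frac{P_{\mathrm{CPU}}\, t_{\mathrm{CPU}}}{P_{\mathrm{CPU+GPU}}\, t_{\mathrm{GPU}}}
\;=\; \frac{299\ \mathrm{W} \times 25.5}{742\ \mathrm{W}} \;\approx\; 10.3
\]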
AC Cluster GPU Performance and
Power Efficiency Results
Application   GPU speedup   Host watts   Host+GPU watts   Perf/watt gain
NAMD          6             316          681              2.8
VMD           25            299          742              10.5
MILC          20            225          555              8.1
QMCPACK       61            314          853              22.6
Quantifying the Impact of GPUs on Performance and Energy Efficiency
in HPC Clusters. J. Enos, C. Steffen, J. Fullop, M. Showerman, G. Shi, K.
Esler, V. Kindratenko, J. Stone, J. Phillips. The Work in Progress in Green
Computing, 2010. In press.
Power Profiling: Example Log
• Mouse-over displays values
• Totals under the curve are displayed
• If there is user interest, we may support calls to add custom tags from the application
NCSA Blue Waters Early Science System
Cray XK6 nodes w/ NVIDIA Tesla X2090 GPUs
Time-Averaged Electrostatics Analysis on
NCSA Blue Waters Early Science System
NCSA Blue Waters Node Type                                                       Seconds per trajectory frame (one compute node)
Cray XE6 compute node: 32 CPU cores (2x AMD 6200 CPUs)                           9.33
Cray XK6 GPU-accelerated compute node: 16 CPU cores + NVIDIA X2090 (Fermi) GPU   2.25

Speedup, XK6 GPU nodes vs. XE6 CPU nodes: GPU nodes are 4.15x faster overall

Preliminary performance for VMD time-averaged electrostatics w/ the Multilevel Summation Method on the NCSA Blue Waters Early Science System
Early Experiences with Kepler
Preliminary Observations
• Arithmetic is cheap, memory references are costly
(trend is certain to continue & intensify…)
• Different performance ratios for registers, shared mem,
and various floating point operations vs. Fermi
• Kepler GK104 (e.g. GeForce 680) brings improved
performance for some special functions vs. Fermi:
CUDA Kernel                         Dominant Arithmetic Operations    Kepler (GeForce 680) Speedup vs. Fermi (Quadro 7000)
Direct Coulomb summation            rsqrtf()                          2.4x
Molecular orbital grid evaluation   expf(), exp2f(), multiply-add     1.7x
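
A minimal sketch of the rsqrtf()-dominated inner loop of a direct Coulomb summation kernel (simplified for illustration; production kernels stage atom data through constant or shared memory and unroll this loop):

```
// Each thread computes the electrostatic potential at one point of a
// 2D map slice; atoms[i].w holds the charge q_i. rsqrtf() dominates
// the arithmetic, which is why Kepler's faster special-function
// hardware yields the 2.4x speedup above. Assumes no atom sits
// exactly on a grid point (r2 > 0).
__global__ void direct_coulomb(const float4 *atoms, int natoms,
                               float *potential, int width, int height,
                               float gridspacing) {
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x >= width || y >= height) return;
  float px = x * gridspacing, py = y * gridspacing;
  float energy = 0.0f;
  for (int i = 0; i < natoms; i++) {
    float dx = atoms[i].x - px;
    float dy = atoms[i].y - py;
    float dz = atoms[i].z;                   // distance to the map plane
    energy += atoms[i].w * rsqrtf(dx*dx + dy*dy + dz*dz);  // q_i / r_i
  }
  potential[y * width + x] = energy;
}
```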
Timeline Plugin:
Analyze MD Trajectories for Events
[Image: MDFF quality-of-fit for cyanovirin-N]
VMD Timeline plugin: live 2D plot linked to 3D structure
• A single picture shows changing properties across the entire structure and trajectory
• Explore time vs. per-selection attribute, linked to the molecular structure
• Many analysis methods available; user-extendable
Recent progress:
• Faster analysis with new VMD SSD trajectory formats and GPU acceleration
• Per-secondary-structure native contact and density correlation graphing
New Interactive Display & Analysis of Terabytes of Data:
Out-of-Core Trajectory I/O w/ Solid State Disks
Commodity SSD, SSD RAID: 450 MB/sec to 4 GB/sec (a DVD movie per second!)
• Timesteps loaded on-the-fly (out-of-core)
– Eliminates memory capacity limitations, even for multi-terabyte trajectory files
– High performance achieved by new trajectory file formats, optimized data structures, and efficient I/O
• Analyze long trajectories significantly faster
• New SSD trajectory file format is 2x faster vs. existing formats
Immersive out-of-core visualization of large-size and long-timescale molecular dynamics trajectories. J. Stone, K. Vandivort, and K. Schulten. Lecture Notes in Computer Science, 6939:1-12, 2011.
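
A minimal sketch of the out-of-core access pattern (the flat file layout here is hypothetical, not VMD's actual trajectory format): only one frame is resident at a time, so trajectory size is bounded by disk capacity rather than RAM.

```
// Out-of-core frame reader: fixed-size header followed by contiguous
// frames of natoms * 3 floats; pread() seeks and reads in one call.
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <fcntl.h>

struct FrameReader {
  int fd;
  long long natoms, header_bytes, frame_bytes;
  float *buf;                                  // reused for every frame
};

static bool read_frame(FrameReader &r, long long frame) {
  off_t offset = (off_t)(r.header_bytes + frame * r.frame_bytes);
  return pread(r.fd, r.buf, r.frame_bytes, offset) == (ssize_t) r.frame_bytes;
}

int main(int argc, char **argv) {
  if (argc < 2) return 1;
  FrameReader r;
  r.fd = open(argv[1], O_RDONLY);
  r.natoms = 3000000LL;                        // e.g., solvated ribosome
  r.header_bytes = 4096;                       // hypothetical header size
  r.frame_bytes = r.natoms * 3 * sizeof(float);
  r.buf = (float *) malloc(r.frame_bytes);
  if (r.fd >= 0 && r.buf && read_frame(r, 0))
    printf("frame 0 loaded (%lld bytes)\n", r.frame_bytes);
  free(r.buf);
  if (r.fd >= 0) close(r.fd);
  return 0;
}
```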
Challenges for Immersive Visualization of Dynamics
of Large Structures
• Graphical representations re-generated for each
animated simulation trajectory frame:
– Dependent on user-defined atom selections
• Although visualizations often focus on interesting
regions of substructure, fast display updates
require rapid traversal of molecular data structures
• Optimized atom selection traversal (see the sketch after the test-case figure):
– Increased performance of per-frame updates by ~10x for the 116M-atom BAR case with 200,000 selected atoms
• New GLSL point sprite sphere shader:
– Reduces host-GPU bandwidth for displayed geometry
– Over 20x faster than the old GLSL spheres drawn using display lists; drawing time is now inconsequential
• Optimized all graphical representation generation routines for large atom counts and sparse selections
[116M-atom BAR domain test case: 200,000 selected atoms; stereo trajectory animation 70 FPS; static scene in stereo 116 FPS]
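
A minimal sketch of the selection-traversal optimization (illustrative, not VMD's actual data structures): with 200,000 atoms selected out of 116M, iterating a compact index list does a tiny fraction of the memory traffic of scanning a per-atom flag array each frame.

```
// Sparse-selection traversal: gather coordinates of selected atoms.
#include <vector>

// Slow path: test every atom's selection flag every frame (O(natoms)).
void gather_flags(const std::vector<bool> &sel, const float *xyz,
                  std::vector<float> &out) {
  for (size_t i = 0; i < sel.size(); i++)
    if (sel[i])
      out.insert(out.end(), &xyz[3*i], &xyz[3*i] + 3);
}

// Fast path: build the index list once per selection change, then
// traverse only selected atoms every frame (O(nselected)).
void gather_indices(const std::vector<int> &selidx, const float *xyz,
                    std::vector<float> &out) {
  for (int i : selidx)
    out.insert(out.end(), &xyz[3*i], &xyz[3*i] + 3);
}
```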
[Diagram: VMD architecture. Molecular structure data and global VMD state (DrawMolecule, graphical representations, non-molecular geometry) feed a scene graph rendered by the display subsystem (OpenGLRenderer: windowed OpenGL, CAVE, FreeVR). The user interface subsystem spans Tcl/Python scripting, mouse + windows, smartphone, interactive MD, and VR "tools" taking 6DOF input with force feedback (Spaceball, haptic device, CAVE wand) via VRPN.]
VMD Out-of-Core Trajectory I/O Performance:
SSD-Optimized Trajectory Format, 8-SSD RAID
                      Ribosome w/ solvent   Membrane patch w/ solvent
Atoms                 3M                    20M
Frame rate w/ HD      3 frames/sec          0.4 frames/sec
Frame rate w/ SSDs    60 frames/sec         8 frames/sec

New SSD trajectory file format is 2x faster vs. existing formats
VMD I/O rate: ~2.1 GB/sec w/ 8 SSDs
Challenges for High Throughput
Trajectory Visualization and Analysis
• It is not currently possible to exploit the full I/O bandwidth when streaming data from SSD arrays (>4 GB/sec) to GPU global memory
• Need to eliminate copies from disk controllers to host memory: bypass the host entirely and perform zero-copy DMA operations straight from disk controllers to GPU global memory
• Goal: GPUs directly pull in pages from storage systems, bypassing host memory entirely (a stopgap sketch follows)
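
Until such direct storage-to-GPU paths exist, a common stopgap is overlapping disk reads with asynchronous host-to-GPU copies through pinned buffers; a minimal double-buffered sketch under that assumption (d_frame is assumed to be a cudaMalloc'd device buffer):

```
// Overlap pread() of frame f+1 with the async host-to-GPU copy of
// frame f. This hides some latency but still pays the disk-to-host
// copy that zero-copy DMA to GPU memory would eliminate.
#include <cuda_runtime.h>
#include <unistd.h>

void stream_frames(int fd, size_t frame_bytes, int nframes, float *d_frame) {
  float *h_buf[2];
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cudaMallocHost((void **)&h_buf[0], frame_bytes);  // pinned: async-capable
  cudaMallocHost((void **)&h_buf[1], frame_bytes);

  pread(fd, h_buf[0], frame_bytes, 0);              // prime the pipeline
  for (int f = 0; f < nframes; f++) {
    int cur = f & 1;
    cudaMemcpyAsync(d_frame, h_buf[cur], frame_bytes,
                    cudaMemcpyHostToDevice, stream);
    if (f + 1 < nframes)                            // read ahead while copying
      pread(fd, h_buf[1 - cur], frame_bytes, (off_t)(f + 1) * frame_bytes);
    cudaStreamSynchronize(stream);                  // frame f now on the GPU
    // ... launch analysis kernels on d_frame here ...
  }
  cudaFreeHost(h_buf[0]);
  cudaFreeHost(h_buf[1]);
  cudaStreamDestroy(stream);
}
```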
Improved Support for Large Datasets in VMD
• New structure building tools, file formats, and data structures enable VMD
to operate efficiently up to 150M atoms
– Up to 30% more memory efficient
– Analysis routines optimized for large structures, up to 20x faster for
calculations on 100M atom complexes where molecular structure traversal can
represent a significant amount of runtime
– New and revised graphical representations support smooth trajectory
animation for multi-million atom complexes; VMD remains interactive even
when displaying surface reps for 20M atom membrane patch
• Uses multi-core CPUs and GPUs for the most demanding computations
[Image: 20M atoms, membrane patch and solvent]
VMD “QuickSurf” Representation
• Large biomolecular complexes are difficult to interpret with atomic-detail graphical representations
• Even secondary structure representations become cluttered
• Surface representations are easier to use when greater abstraction is desired, but are computationally costly
• Existing surface display methods are incapable of animating the dynamics of large structures
[Image: Poliovirus]
VMD “QuickSurf” Representation
• Displays continuum of structural detail:
– All-atom models
– Coarse-grained models
– Cellular scale models
– Multi-scale models: All-atom + CG, Brownian + Whole Cell
– Smoothly variable between full-detail and reduced-resolution representations of very large complexes

Fast Visualization of Gaussian Density Surfaces for Molecular Dynamics and Particle System Trajectories. M. Krone, J. Stone, T. Ertl, K. Schulten. EuroVis 2012. (In press)
VMD “QuickSurf” Representation
• Uses multi-core CPUs and GPU acceleration to enable smooth real-time animation of MD trajectories
• Linear-time algorithm that scales to millions of particles, limited only by memory capacity (see the density-kernel sketch below)
[Images: Satellite Tobacco Mosaic Virus; Lattice Cell Simulations]
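
The core of QuickSurf is summing per-particle Gaussians onto a 3D density grid that is then contoured into a surface; a minimal sketch of that density pass (a naive all-particles loop for clarity; the linear-time behavior comes from spatially binning particles so each voxel visits only nearby ones):

```
// Gaussian density map: each thread accumulates the density at one
// voxel. atoms[i].w holds a per-particle weight. The resulting grid
// is contoured (e.g., marching cubes) to extract the surface.
__global__ void density_map(const float4 *atoms, int natoms,
                            float *density, int3 dim,
                            float gridspacing, float inv2sigma2) {
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  int z = blockIdx.z * blockDim.z + threadIdx.z;
  if (x >= dim.x || y >= dim.y || z >= dim.z) return;
  float px = x * gridspacing, py = y * gridspacing, pz = z * gridspacing;
  float d = 0.0f;
  for (int i = 0; i < natoms; i++) {          // production code bins atoms
    float dx = atoms[i].x - px;
    float dy = atoms[i].y - py;
    float dz = atoms[i].z - pz;
    float r2 = dx*dx + dy*dy + dz*dz;
    d += atoms[i].w * expf(-r2 * inv2sigma2); // Gaussian contribution
  }
  density[(z * dim.y + y) * dim.x + x] = d;
}
```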
QuickSurf Representation of
Lattice Cell Models
[Images: continuous particle-based model, often 70 to 300 million particles; discretized lattice models derived from the continuous model, shown in a surface representation]
Acknowledgements
• Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign
• NCSA Blue Waters Team
• NCSA Innovative Systems Lab
• NVIDIA CUDA Center of Excellence,
University of Illinois at Urbana-Champaign
• The CUDA team at NVIDIA
• NIH support: P41-RR005969
GPU Computing Publications
http://www.ks.uiuc.edu/Research/gpu/
• Fast Visualization of Gaussian Density Surfaces for Molecular Dynamics and Particle System Trajectories. M. Krone, J. Stone, T. Ertl, and K. Schulten. In Proceedings of EuroVis 2012, 2012. (In press)
• Immersive Out-of-Core Visualization of Large-Size and Long-Timescale Molecular Dynamics Trajectories. J. Stone, K. Vandivort, and K. Schulten. G. Bebis et al. (Eds.): 7th International Symposium on Visual Computing (ISVC 2011), LNCS 6939, pp. 1-12, 2011.
• Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units – Radial Distribution Functions. B. Levine, J. Stone, and A. Kohlmeyer. J. Comp. Physics, 230(9):3556-3569, 2011.
GPU Computing Publications
http://www.ks.uiuc.edu/Research/gpu/
• Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters. J. Enos, C. Steffen, J. Fullop, M. Showerman, G. Shi, K. Esler, V. Kindratenko, J. Stone, J. Phillips. International Conference on Green Computing, pp. 317-324, 2010.
• GPU-accelerated molecular modeling coming of age. J. Stone, D. Hardy, I. Ufimtsev, K. Schulten. J. Molecular Graphics and Modeling, 29:116-125, 2010.
• OpenCL: A Parallel Programming Standard for Heterogeneous Computing. J. Stone, D. Gohara, G. Shi. Computing in Science and Engineering, 12(3):66-73, 2010.
• An Asymmetric Distributed Shared Memory Model for Heterogeneous Computing Systems. I. Gelado, J. Stone, J. Cabezas, S. Patel, N. Navarro, W. Hwu. ASPLOS ’10: Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 347-358, 2010.
GPU Computing Publications
http://www.ks.uiuc.edu/Research/gpu/
• GPU Clusters for High Performance Computing. V. Kindratenko, J. Enos, G. Shi, M. Showerman, G. Arnold, J. Stone, J. Phillips, W. Hwu. Workshop on Parallel Programming on Accelerator Clusters (PPAC), in Proceedings of IEEE Cluster 2009, pp. 1-8, Aug. 2009.
• Long time-scale simulations of in vivo diffusion using GPU hardware. E. Roberts, J. Stone, L. Sepulveda, W. Hwu, Z. Luthey-Schulten. In IPDPS'09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Computing, pp. 1-8, 2009.
• High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs. J. Stone, J. Saam, D. Hardy, K. Vandivort, W. Hwu, K. Schulten. 2nd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-2), ACM International Conference Proceeding Series, volume 383, pp. 9-18, 2009.
• Probing Biomolecular Machines with Graphics Processors. J. Phillips, J. Stone. Communications of the ACM, 52(10):34-41, 2009.
• Multilevel summation of electrostatic potentials using graphics processing units. D. Hardy, J. Stone, K. Schulten. J. Parallel Computing, 35:164-177, 2009.
GPU Computing Publications
http://www.ks.uiuc.edu/Research/gpu/
• Adapting a message-driven parallel application to GPU-accelerated clusters. J. Phillips, J. Stone, K. Schulten. Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, IEEE Press, 2008.
• GPU acceleration of cutoff pair potentials for molecular modeling applications. C. Rodrigues, D. Hardy, J. Stone, K. Schulten, and W. Hwu. Proceedings of the 2008 Conference On Computing Frontiers, pp. 273-282, 2008.
• GPU computing. J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, J. Phillips. Proceedings of the IEEE, 96:879-899, 2008.
• Accelerating molecular modeling applications with graphics processors. J. Stone, J. Phillips, P. Freddolino, D. Hardy, L. Trabuco, K. Schulten. J. Comp. Chem., 28:2618-2640, 2007.
• Continuous fluorescence microphotolysis and correlation spectroscopy. A. Arkhipov, J. Hüve, M. Kahms, R. Peters, K. Schulten. Biophysical Journal, 93:4006-4017, 2007.
Download