Simulating Life at the Atomic Scale

James Phillips
Beckman Institute, University of Illinois
http://www.ks.uiuc.edu/Research/namd/
NIH Resource for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/
Beckman Institute, UIUC
Theoretical and Computational Biophysics Group
Beckman Institute, University of Illinois at Urbana-Champaign
NAMD: Scalable Molecular Dynamics
2002 Gordon Bell Award
ATP synthase
PSC Lemieux
Blue Waters Target Application
Illinois Petascale Computing Facility
37,000 Users, 1700 Citations
Computational Biophysics Summer School
GPU Acceleration
NVIDIA Tesla
NCSA Lincoln
Computational Microscopy
Ribosome: synthesizes proteins from genetic information, target for antibiotics
Silicon nanopore: bionanodevice for sequencing DNA efficiently
Molecular Mechanics Force Field
Classical Molecular Dynamics
Energy function:
used to determine the force on each atom:
Newton’s equation represents a set of N second-order differential equations, which are solved numerically via the Verlet integrator at discrete time steps to determine the trajectory of each atom.
Small terms are added to control temperature and pressure.
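A minimal sketch of the integration step described above, using the velocity form of the Verlet method. The force on each atom is the negative gradient of the energy function, F = -dU/dx. The toy energy U(x) = 0.5 * k * x**2, the single particle in one dimension, and all function names are assumptions made for illustration; they are not the force field or the code used in NAMD.

```python
import math

def harmonic_force(x, k=1.0):
    """Force F = -dU/dx for the toy energy U(x) = 0.5 * k * x**2 (assumed)."""
    return -k * x

def total_energy(x, v, m=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * m * v * v + 0.5 * k * x * x

def velocity_verlet(x, v, dt, n_steps, m=1.0, k=1.0):
    """Advance one particle by n_steps steps of size dt; return final (x, v)."""
    f = harmonic_force(x, k)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / m   # half-step velocity update
        x = x + dt * v_half             # full-step position update
        f = harmonic_force(x, k)        # force at the new position
        v = v_half + 0.5 * dt * f / m   # complete the velocity update
    return x, v

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, dt=0.01, n_steps=1000)
e0 = total_energy(x0, v0)
e1 = total_energy(x1, v1)
x_exact = math.cos(0.01 * 1000)         # analytic solution for m = k = 1
```

With a step that is small compared with the oscillation period, the energy drifts very little and the position stays close to the analytic solution, which is why this family of integrators is popular for long trajectories.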
Biomolecular Time Scales
Motion                            Time Scale (sec)
Bond stretching                   10^-14 to 10^-13
Elastic vibrations                10^-12 to 10^-11
Rotations of surface sidechains   10^-11 to 10^-10
Hinge bending                     10^-11 to 10^-7
Rotation of buried side chains    10^-4 to 1
Allosteric transitions            10^-5 to 1
Local denaturations               10^-5 to 10
Max Timestep: 1 fs
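A short worked calculation, assuming the 1 fs maximum timestep is used throughout: the number of integration steps needed to reach the upper end of a few of the time scales listed above. The dictionary layout and variable names are illustrative only.

```python
# Exponent of the 1 fs timestep in seconds (10**-15 s), from the slide.
TIMESTEP_EXP = -15

# Upper ends of selected time-scale ranges, as powers of ten in seconds.
upper_bound_exp = {
    "Bond stretching": -13,
    "Hinge bending": -7,
    "Allosteric transitions": 0,
    "Local denaturations": 1,
}

# steps = time / timestep, done with integer exponents to stay exact.
steps = {name: 10 ** (exp - TIMESTEP_EXP) for name, exp in upper_bound_exp.items()}

for name, n in steps.items():
    print(f"{name}: {n:.0e} steps")
```

Reaching even the slow end of hinge bending takes on the order of 10^8 steps, and the slowest motions are many orders of magnitude beyond that.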
Sizes of Simulations Over Time
BPTI: 3K atoms
Estrogen Receptor: 36K atoms (1996)
ATP Synthase: 327K atoms (2001)
Our Solution: Parallel Computing
HP 735 cluster: 14 processors (1994)
SGI Origin 2000: 128 processors (1997)
PSC Lemieux AlphaServer SC: 3000 processors (2002)
NAMD Parallel Scaling Snapshot
[Figure: NAMD parallel scaling in ns/day versus number of cores, for ApoA1 (92K atoms) and STMV (1M atoms)]
Parallel Programming Lab
University of Illinois at Urbana-Champaign
Siebel Center for Computer Science
Develop abstractions in context of full-scale applications
Core technology: Parallel Objects, Adaptive Runtime System, Libraries and Tools
Applications:
• Protein Folding
• NAMD: Molecular Dynamics
• STM virus simulation
• Quantum Chemistry (QM/MM)
• Computational Cosmology
• Rocket Simulation
• Dendritic Growth
• Crack Propagation
• Space-time meshes
The enabling CS technology of parallel objects and intelligent
runtime systems has led to several collaborative applications in CSE
TCBG Experimental Collaborations
• Nearly every collaboration relies on NAMD.
• High-end simulations push scaling efforts.
– Try to anticipate needs: Million-atom virus just worked.
• Innovative simulations generate feature requests:
– What is science goal?
– Existing features usable?
– Find a scalable method.
– Make it general purpose.
Adaptability Through Scripting
• Tcl customizations are portable
• Top-level protocols:
– Minimize, heat, equilibrate
– Simulated annealing
– Replica exchange (two modifications)
• Long-range forces on selected atoms
– Torques and other steering forces
– Adaptive bias free energy perturbation
– Coupling to external coarse-grain model
• Special boundary forces
– Applies potentially to every atom
– Several design iterations for efficiency
– Shrinking phantom pore for DNA
NAMD: Practical Supercomputing
• 37,000 users can’t all be computer experts.
– 8800 have downloaded more than one version.
– 1700 citations of NAMD reference papers.
• One program for all platforms.
– Desktops and laptops – setup and testing
– Linux clusters – affordable local workhorses
– Supercomputers – free allocations on TeraGrid
– Blue Waters – sustained petaflop/s performance
• User knowledge is preserved.
– No change in input or output files.
– Run any simulation on any number of cores.
• Available free of charge to all.
Phillips et al., J. Comp. Chem. 26:1781-1802, 2005.
Our Goal: Practical Acceleration
• Broadly applicable to scientific computing
– Programmable by domain scientists
– Scalable from small to large machines
• Broadly available to researchers
– Price driven by commodity market
– Low burden on system administration
• Sustainable performance advantage
– Performance driven by Moore’s law
– Stable market and supply chain
Acceleration Options for NAMD
• Outlook in 2005-2006:
– FPGA reconfigurable computing (with NCSA)
• Difficult to program, slow floating point, expensive
– Cell processor (NCSA hardware)
• Relatively easy to program, expensive
– ClearSpeed (direct contact with company)
• Limited memory and memory bandwidth, expensive
– MDGRAPE
• Inflexible and expensive
– Graphics processor (GPU)
• Program must be expressed as graphics operations
GPU vs CPU: Raw Performance
– Calculation: 450 GFLOPS vs 32 GFLOPS
– Memory Bandwidth: 80 GB/s vs 8.4 GB/s
[Figure: GFLOPS by GPU generation]
G80 = GeForce 8800 GTX
G71 = GeForce 7900 GTX
G70 = GeForce 7800 GTX
NV40 = GeForce 6800 Ultra
NV35 = GeForce FX 5950 Ultra
NV30 = GeForce FX 5800
CUDA: Practical Performance
November 2006: NVIDIA announces CUDA for G80 GPU.
• CUDA makes GPU acceleration usable:
– Developed and supported by NVIDIA.
– No masquerading as graphics rendering.
– New shared memory and synchronization.
– No OpenGL or display device hassles.
– Multiple processes per card (or vice versa).
Fun to program (and drive)
• TCBG and collaborators make it useful:
– Experience from VMD development
– David Kirk (Chief Scientist, NVIDIA)
– Wen-mei Hwu (ECE Professor, UIUC)
Stone et al., J. Comp. Chem. 28:2618-2640, 2007.
Typical CPU Architecture
L2 Cache
L1 I
L3 Cache
L1 D
Dispatch/Retire
FPU FPU ALU
Memory Controller
Minimize the Processor
No large caches or multiple execution units
L1 I
L1 D
Dispatch/Retire
FPU
Do integer arithmetic on FPU
Memory Controller
Maximize Floating Point
8 FP pipelines per SIMD unit
Shared data cache
Single instruction stream
One thread per FPU allows branches and gather/scatter.
[Diagram: L1 I, L1 D, Dispatch/Retire, eight FPUs, Memory Controller]
Add More Threads
Pipeline 4 threads per FPU to hide 4-cycle instruction latency.
All 32 threads in a “warp” execute the same instruction.
Divergent branches allowed through predication.
Add Even More Threads
Multiple warps in a “block” hide main memory latency and can synchronize to share data.
Add More Threads Again
Multiple blocks on a single multiprocessor hide both memory and synchronization latency.
All blocks execute a “kernel” function independently without synchronization or memory coherency.
Add Cores to Suit Customer
Kernel is invoked on a “grid” of uniform blocks.
Blocks are dynamically assigned to available multiprocessors and run to completion.
Synchronization occurs when all blocks complete.
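The grid, block, and thread hierarchy above can be illustrated with a serial emulation. This is a sketch only: it runs every thread one after another in plain Python rather than on a GPU, and the names launch and scale_kernel are hypothetical, not part of CUDA or NAMD.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch by running every thread of every block.

    Blocks never communicate here, mirroring the rule that blocks execute
    independently; returning from this function plays the role of the
    synchronization that occurs when all blocks complete.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    """Each emulated thread handles one array element."""
    i = block_idx * block_dim + thread_idx   # global index of this thread
    if i < len(data):                        # last block may have idle threads
        out[i] = factor * data[i]

data = [float(i) for i in range(10)]
out = [0.0] * len(data)
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim   # enough uniform blocks to cover the data
launch(scale_kernel, grid_dim, block_dim, data, 2.0, out)
```

Because the blocks are uniform, the grid is rounded up to cover the data and the bounds check keeps the surplus threads of the last block from writing out of range.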
Support Fine-Grained Parallelism
• Threads are cheap but desperately needed.
– How many can you give?
– 512 threads will keep all 128 FPUs busy.
– 1024 threads will hide some memory latency.
– 12,288 threads can run simultaneously.
– Up to 2×10^12 threads per kernel invocation.
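The thread counts above follow from simple arithmetic. The sketch below reproduces them from the figures on the preceding slides plus three hardware limits that the slides do not state; those limits (768 resident threads per multiprocessor, 512 threads per block, and a 65535 × 65535 grid) are assumptions typical of G80-class GPUs.

```python
# Figures taken from the slides.
FPUS_PER_MULTIPROCESSOR = 8      # "8 FP pipelines per SIMD unit"
THREADS_PER_FPU = 4              # "Pipeline 4 threads per FPU"
TOTAL_FPUS = 128                 # "all 128 FPUs"

# Assumed G80-class limits (not stated in the slides).
MAX_RESIDENT_THREADS_PER_MULTIPROCESSOR = 768
MAX_THREADS_PER_BLOCK = 512
MAX_GRID_DIM = 65535             # per dimension of a 2-D grid

warp_size = FPUS_PER_MULTIPROCESSOR * THREADS_PER_FPU
multiprocessors = TOTAL_FPUS // FPUS_PER_MULTIPROCESSOR
threads_to_fill_fpus = TOTAL_FPUS * THREADS_PER_FPU
resident_threads = multiprocessors * MAX_RESIDENT_THREADS_PER_MULTIPROCESSOR
max_threads_per_launch = MAX_GRID_DIM * MAX_GRID_DIM * MAX_THREADS_PER_BLOCK

print(warp_size, multiprocessors, threads_to_fill_fpus, resident_threads)
print(f"{max_threads_per_launch:.1e}")
```

Under these assumptions the numbers line up with the slide: 32 threads per warp, 512 threads to fill the pipelines, 12,288 resident threads, and roughly 2×10^12 threads per launch.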
VMD – “Visual Molecular Dynamics”
• Visualization and analysis of molecular dynamics simulations, sequence data, volumetric data, quantum chemistry simulations, particle systems, …
• User extensible with scripting and plugins
• http://www.ks.uiuc.edu/Research/vmd/
Molecular Modeling: Ion Placement
• Model structures are initially constructed in vacuum
• Solvent (water) and ions are added as necessary to reproduce the required biological conditions
• Computational requirements scale with the size of the simulated structure
Electrostatic Potential Maps
• Electrostatic potentials evaluated on a 3-D lattice
• Applications include:
– Ion placement for structure building
– Time-averaged potentials for simulation
– Visualization and analysis
[Figure: Isoleucine tRNA synthetase]
Direct Summation Algorithm
• Each lattice point accumulates electrostatic potential contribution from all atoms:
potential[j] += atom[i].charge / rij
[Figure: lattice point j being evaluated; rij is the distance from lattice[j] to atom[i]]
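A minimal serial sketch of the direct summation above. The tuple layout for atoms and lattice points and the function name direct_sum are assumptions for illustration. The sketch omits the Coulomb constant, as the slide's expression does, and assumes no lattice point coincides with an atom, so that rij is never zero.

```python
import math

def direct_sum(atoms, lattice):
    """Return the potential at each lattice point by direct summation.

    atoms:   list of (x, y, z, charge) tuples
    lattice: list of (x, y, z) tuples
    """
    potential = [0.0] * len(lattice)
    for j, (px, py, pz) in enumerate(lattice):
        for (ax, ay, az, charge) in atoms:
            # rij: distance from lattice point j to atom i
            rij = math.sqrt((px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2)
            potential[j] += charge / rij
    return potential

# Two opposite unit charges on the x axis, two lattice points.
atoms = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, -1.0)]
lattice = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
potential = direct_sum(atoms, lattice)
expected_second = 1.0 - 1.0 / math.sqrt(5.0)
```

The cost is the number of lattice points times the number of atoms. Each lattice point is computed independently of the others, so the outer loop is the natural candidate for distribution across many threads.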
Ion Placement via Direct Sum
• 110 CPU-hours on Altix
• 1.35 hours on GPU
• 27 minutes on three GPUs
Satellite Tobacco Mosaic Virus (STMV)
Ion Placement
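The timings above imply the following ratios. This is a rough worked calculation that assumes the CPU figure of 110 CPU-hours can be compared directly with the GPU wall-clock times; the variable names are illustrative.

```python
cpu_hours = 110.0               # "110 CPU-hours on Altix"
one_gpu_hours = 1.35            # "1.35 hours on GPU"
three_gpu_hours = 27.0 / 60.0   # "27 minutes on three GPUs"

speedup_one_gpu = cpu_hours / one_gpu_hours
speedup_three_gpus = cpu_hours / three_gpu_hours
# Fraction of ideal 3x scaling achieved when going from one to three GPUs.
scaling_efficiency = one_gpu_hours / (3 * three_gpu_hours)

print(f"1 GPU:  {speedup_one_gpu:.0f}x")
print(f"3 GPUs: {speedup_three_gpus:.0f}x")
print(f"efficiency: {scaling_efficiency:.2f}")
```

On these figures one GPU is roughly 80 times faster than the CPU run, and three GPUs finish in one third of the single-GPU time.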
CUDA Acceleration in VMD
• Electrostatic field calculation, ion placement: 20x to 44x faster
• Molecular orbital calculation and display: 100x to 120x faster
• Imaging of gas migration pathways in proteins with implicit ligand sampling: 20x to 30x faster
NAMD Lincoln Cluster Performance
(8 Intel cores and 2 NVIDIA Tesla GPUs per node)
[Figure: STMV (1M atoms), s/step; series labels: CPU cores, 2 GPUs (= 24 cores), 4 GPUs, 8 GPUs, 16 GPUs; annotation: ~2.8]
NAMD Petascale Preparations
Blue Waters Architecture






• IBM Power 7
• Peak Perf ~10 PF
• Sustained ~1 PF
• 300,000+ cores
• 1.2+ PB Memory
• 18+ PB Disk
• 8 cores/chip
• 4 chips/MCM
• 8 MCMs/Drawer
• 4 Drawers/SuperNode
• 1024 cores/SuperNode
• Linux OS
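The packaging figures above can be cross-checked with a few lines of arithmetic. The sketch treats the "300,000+" and "1.2+ PB" figures as exact lower bounds and takes 1 PB as 10^15 bytes; both are assumptions, so the derived values are rough estimates only.

```python
CORES_PER_CHIP = 8
CHIPS_PER_MCM = 4
MCMS_PER_DRAWER = 8
DRAWERS_PER_SUPERNODE = 4

TOTAL_CORES = 300_000                 # lower bound from "300,000+ cores"
TOTAL_MEMORY_BYTES = 1200 * 10**12    # lower bound from "1.2+ PB", 1 PB = 10**15 bytes

cores_per_supernode = (CORES_PER_CHIP * CHIPS_PER_MCM
                       * MCMS_PER_DRAWER * DRAWERS_PER_SUPERNODE)
# Round up: a partially used SuperNode still counts as one.
supernodes = -(-TOTAL_CORES // cores_per_supernode)
bytes_per_core = TOTAL_MEMORY_BYTES // TOTAL_CORES

print(cores_per_supernode, supernodes, bytes_per_core)
```

The product of the four packaging levels matches the 1024 cores per SuperNode listed above, and the lower-bound totals correspond to about 293 SuperNodes and about 4 GB of memory per core.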
Challenges and Opportunities




• Support systems >= 100 Million atoms
• Performance requirements for 100 Million atom systems
• Scale to over 300,000 cores
• Power 7 Hardware
− PPC architecture
− Wide node: at least 32 cores with 128 HT threads
• Blue Waters Torrent interconnect
• Doing research under NDA
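A rough estimate of the work available per core at the target scale, assuming exactly 100 million atoms spread evenly over exactly 300,000 cores (both figures are the round numbers from the list above, and the even split is an idealization):

```python
TARGET_ATOMS = 100_000_000
TARGET_CORES = 300_000

atoms_per_core = TARGET_ATOMS // TARGET_CORES
print(atoms_per_core)
```

At a few hundred atoms per core, each core has little computation to offset communication cost, which is one way to read the scaling challenge listed above.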
Planned Petascale Simulations
Thanks to NIH, NSF, DOE, and 15 years of
NAMD and Charm++ developers and users.
James Phillips
Beckman Institute, University of Illinois
http://www.ks.uiuc.edu/Research/namd/