Anton, a Special-Purpose
Machine for Molecular Dynamics
Simulation
By David E. Shaw et al.
Presented by Bob Koutsoyannis
The Anton Legacy
• Anton van Leeuwenhoek “Father of Microscopy”
• First to see bacteria and other microorganisms
• Objective: Improve
the tools available to
scientists to further
our understanding of
organisms & diseases
Anton the Machine
• Specialized massively parallel machine
being built to improve molecular dynamics
simulations.
• In development, to be completed by 2009
• The simulated biological system is spatially
distributed among many nodes in a 3D torus.
• MD specific hardware
• Novel parallel algorithms
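To make the 3D torus layout concrete, here is a minimal sketch of how a node's six nearest neighbors could be found with wraparound. The 8 x 8 x 8 shape is an assumption chosen only because it gives 512 nodes; the function name and coordinate scheme are hypothetical and not taken from the Anton design.

```python
def torus_neighbors(node, dims=(8, 8, 8)):
    """Return the six nearest neighbors of `node` in a 3D torus.

    `dims` is an assumed machine shape (8 * 8 * 8 = 512 nodes).
    Coordinates wrap around in every dimension, so edge nodes
    have the same number of neighbors as interior nodes.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z),
        ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z),
        (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz),
        (x, y, (z - 1) % nz),
    ]


corner = torus_neighbors((0, 0, 0))
```

Because of the wraparound, the corner node (0, 0, 0) is directly linked to (7, 0, 0), (0, 7, 0), and (0, 0, 7) as well as to its three "forward" neighbors.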
Molecular Dynamics Simulation
• Models the motions and interactions of
molecular systems
– Proteins
– Cell Membranes
– DNA
– (atomic level
simulations)
Motivation
• Life Saving…
• Used to visualize biochemical phenomena
that cannot be seen in lab experiments.
– Protein Folding
– Protein-protein interactions
– Protein-drug interactions
• Key for Developing Drugs
What makes one MD simulator
better than the Next?
• Time Scale
– Being able to simulate the interaction between
molecules for more than a nanosecond.
• Problem Size
– Why is a millisecond of simulation out of the
scope of our current technology?
– Consider 200,000 molecules
• 10^12 time steps to simulate a millisecond
– Each time step requires intense arithmetic computation
on all 200,000 molecules
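The step count above can be checked with a few lines of arithmetic. This sketch assumes a time step of 1 femtosecond, which is what 10^12 steps per millisecond implies; the variable names are illustrative only.

```python
FEMTOSECONDS_PER_MILLISECOND = 10**12

time_step_fs = 1        # assumed step length: 1 femtosecond
particles = 200_000     # system size used on the slide

# Number of time steps needed to cover one millisecond.
steps = FEMTOSECONDS_PER_MILLISECOND // time_step_fs

# Lower bound on per-particle updates: every particle is touched
# at least once per step, before counting pairwise interactions.
particle_updates = steps * particles

print(f"{steps:.0e} steps, {particle_updates:.0e} particle updates")
```

Even this lower bound is 2 x 10^17 particle updates, and each update involves interactions with many neighboring particles.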
What makes one MD simulator
better than the Next?
• Other Projects Addressing MD Sims
– Folding@Home
• Network of 200,000 PCs
• Large sample for independent molecular sims
• But no millisecond simulations
– FASTRUN, MDGRAPE, MD Engine
• Good with larger molecular system sims
• Have strong arithmetic units
• Still limited by communication bottlenecks
MD Simulator Requirements
• Force Calculation
(getting an idea of the level of computation needed)
• Molecular mechanics force fields are used to
model the total potential energy (PE) of a system.
• Input: atomic positions (X, Y, Z)
• Output: force quantities
[Figure: M1, M2]
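As an illustration of "positions in, forces out", the sketch below computes pairwise forces from a Lennard-Jones potential. The Lennard-Jones term is an assumption used as a stand-in for one non-bonded term; the slides do not say which force field terms are used, and `lj_forces`, `epsilon`, and `sigma` are hypothetical names.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Return the force on each particle from pairwise Lennard-Jones terms.

    Potential per pair: U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6).
    The force is the negative gradient of U with respect to position.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Displacement vector pointing from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            s6 = (sigma * sigma / r2) ** 3          # (sigma/r)**6
            coef = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
            for k in range(3):
                forces[i][k] += coef * d[k]         # force on i
                forces[j][k] -= coef * d[k]         # equal and opposite on j
    return forces


pair_forces = lj_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

The double loop visits every pair, which is O(N^2). That cost is why the later slides restrict interactions to a cutoff radius and spread the work across nodes.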
MD Simulator Requirements
• Force Calculation
(getting an idea of the level of computation needed)
• For every time step, the force
fields must be updated.
• FFT, convolution, inverse FFT
(computationally expensive
operations)
• For 200,000 molecules/step…
• 1) Need a huge number of
arithmetic processing elements
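The FFT, convolution, inverse FFT sequence can be sketched in one dimension as follows. This is a toy illustration, not the machine's algorithm: it uses a naive O(n^2) discrete Fourier transform in place of a real FFT, and the function names are hypothetical. In MD codes this pattern is typically applied to grid-based long-range interactions, but the slides do not say what is being convolved.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolve(a, b):
    """Convolve two equal-length sequences via the convolution theorem."""
    fa, fb = dft(a), dft(b)                      # forward transform
    product = [x * y for x, y in zip(fa, fb)]    # pointwise multiply
    return [v.real for v in dft(product, inverse=True)]  # inverse transform


shifted = circular_convolve([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 0.0])
```

Convolving with a unit impulse at index 1 rotates the input by one position, which gives a quick sanity check on the three-stage pipeline.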
MD Simulator Requirements
• Integration
(getting an idea of the level of computation needed)
• For every time step, updates of
atomic positions and velocities
must be made.
• Global actions and constraints
must be enforced on the entire
system (temperature, pressure,
optimizations).
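A per-step position and velocity update can be sketched with the velocity Verlet scheme. This integrator is an assumption chosen because it is widely used in MD; the slides do not name the integration algorithm, and the function and parameter names here are hypothetical. Constraints and temperature or pressure control are left out.

```python
def velocity_verlet_step(pos, vel, force_fn, mass, dt):
    """Advance flat coordinate lists `pos` and `vel` by one time step.

    `force_fn(pos)` must return one force component per coordinate.
    """
    f_old = force_fn(pos)
    # Position update uses the current velocity and acceleration.
    new_pos = [
        p + v * dt + 0.5 * (f / mass) * dt * dt
        for p, v, f in zip(pos, vel, f_old)
    ]
    f_new = force_fn(new_pos)
    # Velocity update averages the old and new accelerations.
    new_vel = [
        v + 0.5 * (fo + fn) / mass * dt
        for v, fo, fn in zip(vel, f_old, f_new)
    ]
    return new_pos, new_vel


# Usage: a unit-mass particle on a spring (force = -x), 100 steps of 0.01.
pos, vel = [1.0], [0.0]
for _ in range(100):
    pos, vel = velocity_verlet_step(pos, vel, lambda p: [-x for x in p], 1.0, 0.01)
```

After a total time of 1.0 the position should be close to cos(1), which is the exact solution for this toy system.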
MD Simulator Requirements
• Parallelization
(getting an idea of the level of computation needed)
• For every time step, every atom
must interact with every other
atom within its cutoff radius.
• 2) A lot of inter-processor
communication that can be
scaled well is needed.
MD Simulator Requirements
• Parallelization
(getting an idea of the level of computation needed)
• Whole system is broken down
into boxes (processing nodes)
• Each node handles the bonded
interactions within its box
• NT method for non-bonded
interactions (much more
common).
• NT method for atom migration
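The idea of breaking the system into boxes and only pairing atoms within a cutoff radius can be sketched with a simple cell list. This is not the NT method: it is a generic spatial decomposition shown only to illustrate why box assignment limits the number of pairs that must be examined. The function names are hypothetical and the sketch ignores periodic boundaries.

```python
from collections import defaultdict
from itertools import product


def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, closer than `cutoff`.

    Atoms are binned into cubic boxes of side `cutoff`, so any
    interacting partner must be in the same or an adjacent box.
    """
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(int(c // cutoff) for c in p)].append(idx)

    pairs = set()
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in cells.get(neighbor, ()):
                    if i < j and dist2(positions[i], positions[j]) <= cutoff ** 2:
                        pairs.add((i, j))
    return pairs


atoms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.9, 0.0, 0.0), (1.1, 0.0, 0.0), (3.0, 3.0, 3.0)]
close_pairs = pairs_within_cutoff(atoms, cutoff=1.0)
```

In a parallel machine each box would map to a node, and the neighbor-box lookups become inter-node communication, which is the cost the NT method is designed to reduce.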
Why Specialized Hardware?
• 1) Need a huge number of
arithmetic processing
elements
• 2) A lot of inter-processor
communication that can be
scaled well is needed.
• 3) Memory is not an issue
– With 25,000 atoms (64 bytes
each), total = 1.6 MB; over 512
nodes that is about 3.1 KB/node,
which is less than most L1 caches
[Figure: Needs: Memory, Computation, Communication]
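The memory estimate can be reproduced directly. This is plain arithmetic on the figures from the slide, using decimal units (1 KB = 1000 bytes); the variable names are illustrative.

```python
atoms = 25_000
bytes_per_atom = 64
nodes = 512

total_bytes = atoms * bytes_per_atom      # whole-system particle state
bytes_per_node = total_bytes / nodes      # even split across all nodes

print(f"total: {total_bytes / 1e6:.1f} MB")
print(f"per node: {bytes_per_node / 1e3:.2f} KB")
```

The per-node share works out to 3125 bytes, so particle state is tiny compared with the computation and communication demands.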
Why Specialized Hardware?
• Consider Moore’s Law, at roughly 10X improvement in 5 years,
vs. Anton’s 1000X in 1 year.
• Can great discoveries wait?
• Can use custom pipelines with more precision and increased
datapath logic speed, over less silicon area.
• Have tailored ISAs for geometric calculations
• Programmability for accommodating various force fields and
integration algorithms
• Dedicated memory for each particle to accumulate forces
[Figure: Needs: Memory, Communication, Computation]
Communication Latency
• Low-latency, high-bandwidth
network within and between
ASICs.
• Push-based communication
with counters (reduces waiting).
• Set of Autonomous Direct
Memory Access (DMA) Engines
allowing for greater overlap of
communication and computation.
• Admission Control Features
[Figure: updating the force field; annotation: "This node may update for them"]
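Push-based communication with counters can be pictured as follows: senders write data straight into the receiver, and the receiver only checks a counter to learn when everything it expects has arrived, rather than sending requests and waiting for replies. The class below is a single-threaded toy model under that assumption; `PushCounter` and its methods are hypothetical and do not describe the actual hardware interface.

```python
class PushCounter:
    """Toy receive buffer that counts pushed packets."""

    def __init__(self, expected):
        self.expected = expected   # packets the receiver knows it needs
        self.received = 0          # incremented on every push
        self.buffer = []

    def push(self, packet):
        """Called by a sender: deliver data and bump the counter."""
        self.buffer.append(packet)
        self.received += 1

    def ready(self):
        """Receiver-side check: has all expected data arrived?"""
        return self.received >= self.expected


inbox = PushCounter(expected=3)
inbox.push("forces from node A")
inbox.push("forces from node B")
ready_after_two = inbox.ready()
inbox.push("forces from node C")
ready_after_three = inbox.ready()
```

The receiver never issues a request, so there is no round trip: it proceeds as soon as the counter reaches the expected value.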
Subsystems of Anton
1. High-Throughput Interaction Subsystem (HTIS)
2. Flexible Subsystem
3. Communication Subsystem
4. Memory Subsystem
High-Throughput Interaction Subsystem
– Executes non-bonded MD interaction calculations (charge
spreading & force interpolation)
– Accumulates forces on each particle as data streams through.
– The ICB controls the flow of data through the HTIS, provides
programmable ISA extensions, and acts as a buffering, prefetching,
synchronization, and write-back controller
Flexible Subsystem
• Initiates force computation phase
• Calculates bonded force terms
• Force correction terms
• All integration tasks
– Constraint calculations (temperature & pressure)
– Position and velocity updates
– Atom migration
• All maintenance activities (boot, diagnostics, self-test,
loading simulations, switching contexts, logging,
checkpointing, error reporting).
Flexible Subsystem
• General Purpose Core w/ Caches
• Remote Access Unit
– Autonomous data transfers
• Geometry Cores
– Bonded MD calculations
• Correction Pipeline
– Computes force correction terms
• Racetrack
– Local internal interconnect for flexible subsystem components
• Ring Interface Unit
– Lets the flexible subsystem transfer packets to/from the communication subsystem.
Communications Subsystem
• Routing: 48-bit address space
– 16-bit node identifier
– 32 bits of address per node
• Flow Control
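The 48-bit routing address splits into a 16-bit node identifier and a 32-bit per-node address. The sketch below packs and unpacks such an address, assuming the node identifier occupies the high bits; that bit layout and the function names are assumptions, since the slides give only the field widths.

```python
NODE_BITS = 16
LOCAL_BITS = 32


def pack_address(node_id, local_addr):
    """Combine a node identifier and a per-node address into 48 bits."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id does not fit in 16 bits")
    if not 0 <= local_addr < (1 << LOCAL_BITS):
        raise ValueError("local_addr does not fit in 32 bits")
    return (node_id << LOCAL_BITS) | local_addr


def unpack_address(addr):
    """Split a 48-bit address back into (node_id, local_addr)."""
    return addr >> LOCAL_BITS, addr & ((1 << LOCAL_BITS) - 1)


addr = pack_address(0x01FF, 0xDEADBEEF)
node_id, local_addr = unpack_address(addr)
```

With these widths the network can name up to 65,536 nodes, each with 4 GB of addressable space.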
Memory Subsystem
• Provides access to ASIC DRAM
• Supports accumulation and synchronization
Simulation Evaluations
• 500X NAMD
• 80-100X Desmond
• 100X Blue Matter
Efficiency
• Increasing the simulated
system size leads to
an increase in
efficiency.
Accuracy
• Force error is measured as relative RMS force error
• Energy Drift
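One common way to compute a relative RMS force error is to divide the RMS of the force differences by the RMS of the reference forces. The sketch below uses that definition as an assumption, since the slides do not spell out the formula; the function name and the choice of reference forces are hypothetical.

```python
import math


def relative_rms_force_error(forces, reference):
    """RMS of (forces - reference) divided by RMS of reference.

    Both arguments are sequences of 3-component force vectors,
    one per particle, in the same order.
    """
    err2 = sum(
        (f - r) ** 2
        for fv, rv in zip(forces, reference)
        for f, r in zip(fv, rv)
    )
    ref2 = sum(r * r for rv in reference for r in rv)
    # The particle count cancels between numerator and denominator.
    return math.sqrt(err2 / ref2)


reference = [(3.0, 4.0, 0.0)]
computed = [(3.3, 4.4, 0.0)]
error = relative_rms_force_error(computed, reference)
```

Here every component is 10% too large, so the relative error comes out at about 0.1.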