Anton, a Special-Purpose Machine for Molecular Dynamics Simulation
By David E. Shaw et al.
Presented by Bob Koutsoyannis

The Anton Legacy
• Anton van Leeuwenhoek, the "Father of Microscopy"
• First to see bacteria and other microorganisms
• Objective: improve the tools available to scientists to further our understanding of organisms and diseases

Anton the Machine
• A specialized, massively parallel machine being built to improve molecular dynamics simulations
• In the works; to be completed by 2009
• Biological processes are spatially distributed among many nodes in a 3D torus
• MD-specific hardware
• Novel parallel algorithms

Molecular Dynamics Simulation
• Models the motions and interactions of molecular systems at the atomic level
  – Proteins
  – Cell membranes
  – DNA

Motivation
• Life saving…
• Used to visualize biochemical phenomena that cannot be seen in lab experiments
  – Protein folding
  – Protein–protein interactions
  – Protein–drug interactions
• Key for developing drugs

What Makes One MD Simulator Better Than the Next?
• Time scale
  – Being able to simulate the interactions between molecules for more than a nanosecond
• Problem size
  – Why is a millisecond of simulation out of the scope of our current technology?
  – Consider 200,000 molecules
    • 10^12 time steps to simulate a millisecond
    • Each time step requires intense arithmetic computation on all 200,000 molecules

What Makes One MD Simulator Better Than the Next?
• Other projects addressing MD simulation
  – Folding@Home
    • Network of 200,000 PCs
    • Large sample of independent molecular simulations
    • But no millisecond simulations
  – FASTRUN, MDGRAPE, MD Engine
    • Good with larger molecular system simulations
    • Have strong arithmetic units
    • Still limited by communication bottlenecks

MD Simulator Requirements: Force Calculation
(getting an idea of the level of computation needed)
• Molecular mechanics force fields are used to model the total potential energy (PE) of the system
• Input: x, y, z positions of the atoms; output: force quantities for each atom

MD Simulator Requirements: Force Calculation
(getting an idea of the level of computation needed)
• For every time step, the force fields must be updated
• FFT, convolution, inverse FFT (computationally expensive operations)
• For 200,000 molecules per step…
• 1) Need a huge number of arithmetic processing elements

MD Simulator Requirements: Integration
(getting an idea of the level of computation needed)
• For every time step, atomic positions and velocities must be updated
• Global actions and constraints must be enforced on the entire system (temperature, pressure, optimizations)

MD Simulator Requirements: Parallelization
(getting an idea of the level of computation needed)
• For every time step, every atom must interact with every other atom within its cutoff radius
• 2) A lot of inter-processor communication that scales well is needed

MD Simulator Requirements: Parallelization
(getting an idea of the level of computation needed)
• The whole system is broken down into boxes (processing nodes)
• Each node handles the bonded interactions within its box
• NT method for non-bonded interactions (much more common)
• NT method for atom migration

Why Specialized Hardware?
• 1) Need a huge number of arithmetic processing elements
• 2) A lot of inter-processor communication that scales well is needed
• 3) Memory is not an issue
  – With 25,000 atoms at 64 bytes each, the total is 1.6 MB; over 512 nodes that is roughly 3.2 KB per node, which is smaller than most L1 caches (see the back-of-the-envelope sketch below)
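The 10^12-step and KB-per-node figures quoted above follow from simple arithmetic. The sketch below is a back-of-the-envelope check, not code from the paper; it assumes a 1 fs integration time step (which is what the 10^12-steps-per-millisecond figure implies), 64 bytes of state per atom, and a 512-node machine, all taken from the slide's numbers.

```python
# Back-of-the-envelope check of the figures quoted on the slides.
# Assumptions (illustrative, not from the paper): a 1 fs time step,
# 64 bytes of state per atom, and a 512-node machine.

TIME_STEP_FS = 1.0   # femtoseconds of simulated time per MD step (assumed)
TARGET_MS = 1.0      # we want a millisecond of simulated time

steps = TARGET_MS * 1e-3 / (TIME_STEP_FS * 1e-15)
print(f"time steps for {TARGET_MS} ms: {steps:.0e}")   # ~1e12 steps

ATOMS = 25_000       # atoms in a typical Anton-scale system
BYTES_PER_ATOM = 64  # position, velocity, charge, type, ...
NODES = 512

total_mb = ATOMS * BYTES_PER_ATOM / 1e6
per_node_kb = ATOMS * BYTES_PER_ATOM / NODES / 1e3
print(f"total state: {total_mb:.1f} MB, per node: {per_node_kb:.1f} KB")
# ~1.6 MB total and ~3 KB per node -- comfortably inside an L1 cache
```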
Why Specialized Hardware? (continued)
• Needs fall into three categories: computation, communication, and memory
• Consider Moore's Law: roughly a 10X improvement in 5 years vs. Anton's 1000X in 1 year
• Can great discoveries wait?

Computation
• Can use custom pipelines with more precision and faster datapath logic in less silicon area
• Tailored ISAs for geometric calculations
• Programmability to accommodate various force fields and integration algorithms
• Dedicated memory for each particle to accumulate forces

Communication Latency
• Low-latency, high-bandwidth network within and between ASICs
• Push-based communication with counters (reduces waiting)
• A set of autonomous Direct Memory Access (DMA) engines allows greater overlap of communication and computation
• Admission control features
(Diagram: a node updating the force field may perform the update on behalf of neighboring nodes)

Subsystems of Anton
1. High-Throughput Interaction Subsystem (HTIS)
2. Flexible Subsystem
3. Communication Subsystem
4. Memory Subsystem

High-Throughput Interaction Subsystem
– Executes the non-bonded MD interaction calculations (charge spreading & force interpolation)
– Accumulates forces on each particle as data streams through
– The interaction control block (ICB) controls the flow of data through the HTIS, provides programmable ISA extensions, and acts as a buffering, prefetching, synchronization, and write-back controller

Flexible Subsystem
• Initiates the force computation phase
• Calculates bonded force terms
• Force correction terms
• All integration tasks
  – Constraint calculations (temperature & pressure)
  – Position and velocity updates
  – Atom migration
• All maintenance activities (boot, diagnostics, self-test, loading simulations, switching contexts, logging, checkpointing, error reporting)

Flexible Subsystem
• General-purpose core w/ caches
• Remote Access Unit – autonomous data transfers
• Geometry Cores – bonded MD calculations
• Correction Pipeline – computes force correction terms
• Racetrack – local internal interconnect for the Flexible Subsystem components
• Ring Interface Unit – lets the Flexible Subsystem transfer packets to/from the Communication Subsystem

Communication Subsystem
• Routing: 48-bit address space
  – 16-bit node identifier + 32 bits of address per node
• Flow control

Memory Subsystem
• Provides access to ASIC DRAM
• Supports accumulation and synchronization

Simulation Evaluations
• Speedup: 500X vs. NAMD, 80–100X vs. Desmond, 100X vs. Blue Matter
• Efficiency: increasing the simulated system size leads to increased efficiency
• Accuracy: force error measured as relative RMS force error (sketched below); energy drift
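To make the accuracy metric concrete, the sketch below computes a relative RMS force error for a set of test forces against reference forces. This is a minimal illustration using a common convention for the metric (the RMS of the per-atom force-error vectors divided by the RMS of the reference forces); the function name and toy data are illustrative, not code or data from the paper.

```python
import numpy as np

def relative_rms_force_error(f_test: np.ndarray, f_ref: np.ndarray) -> float:
    """Relative RMS force error of f_test against reference forces f_ref.

    Both arrays have shape (n_atoms, 3). Computed here (a common convention,
    assumed rather than taken from the paper) as
        sqrt( sum_i |F_i - F_i^ref|^2 / sum_i |F_i^ref|^2 ).
    """
    err = np.sum((f_test - f_ref) ** 2)   # squared norms of error vectors
    ref = np.sum(f_ref ** 2)              # squared norms of reference forces
    return float(np.sqrt(err / ref))

# Toy usage: reference forces plus a small perturbation (~1e-3 relative error).
rng = np.random.default_rng(0)
f_ref = rng.normal(size=(1000, 3))
f_test = f_ref + 1e-3 * rng.normal(size=(1000, 3))
print(f"relative RMS force error: {relative_rms_force_error(f_test, f_ref):.2e}")
```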