NCSA is the Leading Edge Site for the National Computational Science Alliance
www.ncsa.uiuc.edu

Scientific Applications Continue to Require Exponential Growth in Capacity
• Recent Computations by NSF Grand Challenge Research Teams
• Next Step Projections by NSF Grand Challenge Research Teams
• Long Range Projections from Recent Applications Workshop
[Chart: memory requirement in bytes (10^8 to 10^14) vs. machine requirement in FLOPS (10^8 to 10^20), marking the 1995 NSF Capability, the 2000 NSF Leading Edge, NSF in 2004 (Projected), and ASCI in 2004; applications range from Atomic/Diatomic Interaction and Molecular Dynamics for Biological Molecules up through QCD, Computational Cosmology, a 100-year climate model computed in hours, and Turbulent Convection in Stars. From Bob Voigt, NSF]

The Promise of the Teraflop: From Thunderstorm to National-Scale Simulation
Simulation by Wilhelmson et al.; figure from Supercomputing and the Transformation of Science, Kaufmann and Smarr, Freeman, 1993

Accelerated Strategic Computing Initiative Is Coupling DOE Defense Labs to Universities
• Access to ASCI Leading Edge Supercomputers
• Academic Strategic Alliances Program
• Data and Visualization Corridors
http://www.llnl.gov/asci-alliances/centers.html

Comparison of the DoE ASCI and the NSF PACI Origin Array Scale Through FY99
• Los Alamos Origin System, FY99: 5,000-6,000 processors
• NCSA Proposed System, FY99: 6x128 and 4x64 = 1,024 processors
www.lanl.gov/projects/asci/bluemtn/Hardware/schedule.html

NCSA Combines Shared Memory Programming with Massive Parallelism
[Photos: the CM-2 and CM-5 at NCSA; future upgrade under negotiation with NSF]

The Exponential Growth of NCSA's SGI Shared Memory Supercomputers
[Chart: SGI processor count on a log scale (1 to 10,000), Jan-94 through Jan-01, rising from the Challenge through the Power Challenge and Origin toward the SN1]
Doubling Every Nine Months!
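The "Doubling Every Nine Months!" claim on the chart above can be checked with a little arithmetic. A minimal sketch (the Jan-94 to Jan-01 span is read off the chart's time axis; the exact endpoints are an assumption):

```python
# Implied growth from "doubling every nine months", as claimed for
# NCSA's SGI processor counts. The Jan-94 through Jan-01 span is an
# assumption taken from the chart's time axis.
doubling_months = 9
span_months = (2001 - 1994) * 12  # Jan-94 to Jan-01: 84 months

annual_factor = 2 ** (12 / doubling_months)          # growth per year
total_factor = 2 ** (span_months / doubling_months)  # growth over the chart

print(f"per year: {annual_factor:.2f}x")       # about 2.52x per year
print(f"over the chart: {total_factor:.0f}x")  # about 645x
```

A factor of several hundred over seven years is why the chart needs a log scale on its processor-count axis.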
TOP500 Systems by Vendor
[Chart: number of TOP500 systems per vendor (CRI, SGI, IBM, HP, Convex, Sun, TMC, Intel, DEC, other Japanese, and other), Jun-93 through Jun-98]
TOP500 Reports: http://www.netlib.org/benchmark/top500.html

Why NCSA Switched From Vector to RISC Processors
[Chart: histogram of average performance across the NCSA 1992 supercomputing community (users with more than 0.5 CPU-hour) on the Cray Y-MP4/64, March 1992 - February 1993; the average user sustained about 70 MFLOPS, far below the Y-MP1 peak and comparable to the peak of the MIPS R8000 microprocessor]

Replacement of Shared Memory Vector Supercomputers by Microprocessor SMPs
[Chart: TOP500 installed supercomputers by class (MPP, SMP/DSM, PVP), Jun-93 through Jun-98]
TOP500 Reports: http://www.netlib.org/benchmark/top500.html

Top500 Shared Memory Systems
[Chart: two panels, Jun-93 through Jun-98, split by region (Europe, Japan, USA): PVP systems built from vector processors decline while SMP + DSM systems built from microprocessors grow]
TOP500 Reports: http://www.netlib.org/benchmark/top500.html

Simulation of the Evolution of the Universe on a Massively Parallel Supercomputer
Virgo Project - Evolving a Billion Pieces of Cold Dark Matter in a Hubble Volume
[Images: simulation views spanning 12 billion and 4 billion light years]
688-processor CRAY T3E at Garching Computing Centre of the Max-Planck-Society
http://www.mpg.de/universe.htm

Limitations of Uniform Grids for Complex Scientific and Engineering Problems
Gravitation Causes Continuous Increase in Density Until There Is a Large Mass in a
Single Grid Zone
512x512x512 Run on a 512-node CM-5
Source: Greg Bryan, Mike Norman, NCSA

Use of Shared Memory Adaptive Grids to Achieve Dynamic Load Balancing
64x64x64 Run with Seven Levels of Adaptation on the SGI Power Challenge, Locally Equivalent to 8192x8192x8192 Resolution
Source: Greg Bryan, Mike Norman, John Shalf, NCSA

Extreme and Large PIs Dominate Usage of the NCSA Origin
[Chart: CPU-hours burned vs. PI rank (1 to 181) on a log scale (1 to 1,000,000), January through April 1998, with usage bands from 1-10 up to 100k-1M CPU-hours]

Disciplines Using the NCSA Origin 2000
[Pie chart of CPU-hours in March 1998: CFD, Engineering, Chemistry, Astronomy, Physics, Particle Physics, Molecular Biology, Materials Sciences, Industry, and Other]

Solving 2D Navier-Stokes Kernel: Performance of Scalable Systems
Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Preconditioner (2D 1024x1024)
[Chart: gigaflops vs. processors (up to 60) for Origin-DSM, Origin-MPI, NT-MPI, SP2-MPI, T3E-MPI, and SPP2000-DSM, reaching roughly 7 gigaflops]
Source: Danesh Tafti, NCSA

A Variety of Discipline Codes: Single Processor Performance, Origin vs.
T3E
[Chart: single processor MFLOPS (0 to 160) on the Origin and the T3E for QMC, RIEMANN, Laplace, QCD, PPM, PIMC, and ZEUS]

Alliance PACS Origin2000 Repository
• Kadin Tseng, BU; Gary Jensen, NCSA; Chuck Swanson, SGI
• John Connolly, U Kentucky, Developing a Repository for the HP Exemplar
http://scv.bu.edu/SCV/Origin2000/

High-End Architecture 2000: Scalable Clusters of Shared Memory Modules
Each is 4 Teraflops Peak:
• NEC SX-5
  – 32 x 16 vector processor SMP
  – 512 Processors
  – 8 Gigaflop Peak Processor
• IBM SP
  – 256 x 16 RISC Processor SMP
  – 4096 Processors
  – 1 Gigaflop Peak Processor
• SGI Origin Follow-on
  – 32 x 128 RISC Processor DSM
  – 4096 Processors
  – 1 Gigaflop Peak Processor

Emerging Portable Computing Standards
• HPF
• MPI
• OpenMP
• Hybrids of MPI and OpenMP

Basket of Applications Average Performance as Percentage of Linpack Performance
Applications Codes: CFD, Biomolecular, Chemistry, Materials, QCD
[Chart: Linpack vs. applications-average MFLOPS for the T90, C90, SPP2000, SP2-160, Origin-195, and PCA; the applications average runs from 14% to 33% of Linpack, depending on the system]

Harnessing Distributed UNIX Workstations
University of Wisconsin Condor Pool
[Chart: Condor cycles delivered; CondorView courtesy of Miron Livny and Todd Tannenbaum, UWisc]

NT Workstation Shipments Rapidly Surpassing UNIX
[Chart: workstations shipped in millions, 1995-1997, UNIX vs. NT]
Source: IDC, Wall Street Journal, 3/6/98

First Scaling Testing of ZEUS-MP on CRAY T3E and Origin vs.
NT Supercluster
"Supercomputer performance at mail-order prices" -- Jim Gray, Microsoft
access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
• Alliance Cosmology Team
• Andrew Chien, UIUC
• Rob Pennington, NCSA
ZEUS-MP Hydro Code Running Under MPI
[Charts: GFLOPS vs. processors (up to 200) for the T3E, Origin, and NT/Intel systems, alongside single processor speed on ZEUS-MP in MFLOPS]

NCSA NT Supercluster Solving Navier-Stokes Kernel
Single Processor Performance: MIPS R10k 117 MFLOPS, Intel Pentium II 80 MFLOPS
Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Preconditioner (2D 1024x1024)
[Charts: gigaflops and speedup vs. processors for NT MPI, Origin MPI, and Origin SM, measured against perfect scaling]
Danesh Tafti, Rob Pennington, Andrew Chien, NCSA

Near Perfect Scaling of Cactus: 3D Dynamic Solver for the Einstein GR Equations
[Chart: scaling vs. processors (up to 120) for the Origin and the NT SC; ratio of GFLOPS: Origin = 2.5x NT SC]
Cactus Was Developed by Paul Walker, MPI-Potsdam, UIUC, NCSA
Danesh Tafti, Rob Pennington, Andrew Chien, NCSA

NCSA Symbio - A Distributed Object Framework Bringing Scalable Computing to NT Desktops
• Parallel Computing on NT Clusters
  – Briand Sanderson, NCSA
  – Microsoft Co-Funds Development
• Features
  – Based on Microsoft DCOM
  – Batch or Interactive Modes
  – Application Development Wizards
• Current Status & Future Plans
  – Symbio Developer Preview 2 Released
  – Princeton University Testbed
http://access.ncsa.uiuc.edu/Features/Symbio/Symbio.html

The Road to Merced
http://developer.intel.com/solutions/archive/issue5/focus.htm#FOUR
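The Navier-Stokes kernel slides above benchmark a preconditioned conjugate gradient (PCG) solver with a multi-level additive Schwarz Richardson preconditioner. A minimal sketch of the PCG iteration itself, substituting a simple diagonal (Jacobi) preconditioner and a tiny 1D Poisson system for the benchmark's 1024x1024 grid; `matvec` and `pcg` are illustrative names, not from the benchmark code:

```python
# Sketch of preconditioned conjugate gradient (PCG). The benchmarked
# kernel used a multi-level additive Schwarz Richardson preconditioner;
# here a diagonal (Jacobi) preconditioner stands in for brevity.

def matvec(x):
    """y = A x for the 1D Poisson matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def pcg(b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                    # residual r = b - A*x, with x = 0
    z = [ri / 2.0 for ri in r]  # preconditioner solve: M = diag(A) = 2I
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [ri / 2.0 for ri in r]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

b = [1.0] * 8
x = pcg(b)
residual = max(abs(bi - yi) for bi, yi in zip(b, matvec(x)))
print(residual)  # tiny: the solver has converged
```

In the benchmark, the matrix-vector product and preconditioner solve are the parallel hot spots, which is why the slides compare MPI message passing against Origin shared-memory versions of the same iteration.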