Science Impact of TeraGrid
Ralph Roskies, Scientific Director, Pittsburgh Supercomputing Center
roskies@psc.edu
April 6, 2009

High Performance Computing is Transforming Physics Research
(Slides 2-7 are hopefully all covered in John's overview; I intend to delete these slides. Ralph)
• TeraGrid ties together, using high-performance networks, the high-end computational resources (supercomputing, storage, visualization, data collections, science gateways) provided by NSF for the nation's researchers.
• It is supported by computing and technology experts, many of whom have science PhDs and speak the users' language.
• World-class facilities, on a much larger scale than ever before, present major new opportunities for physics researchers to carry out computations that would have been infeasible just a few years ago.

TeraGrid Map

Hardware must be heterogeneous
• Different capabilities
• Different vendors
• Potential for a great burden on people trying to use more than one system

Integrated View for Users
• Single sign-on
• Single application form for access (it's free; more later)
• Single ticket system (especially useful for problems between systems)
• Coordinated user support (find experts at any site)
• Simplified data movement (e.g., compute in one place, analyze in another)
• Makes data sharing easy

Diversity of Resources (not exhaustive)
• Very powerful, tightly coupled distributed memory
  - Track 2a, Texas (TACC): Ranger (62,976 cores, 579 teraflops, 123 TB memory)
  - Track 2b, Tennessee (NICS): Kraken (Cray XT5, 66,048 cores, 608 teraflops, over 1 petaflop later in 2009)
• Shared memory
  - NCSA: Cobalt (Altix), 8 teraflops, 3 TB shared memory
  - PSC: Pople (Altix), 5 teraflops, 1.5 TB shared memory
• Clusters with InfiniBand
  - NCSA: Abe, 90 teraflops
  - TACC: Lonestar, 61 teraflops
  - LONI: Queen Bee, 51 teraflops
• Condor pool (loosely coupled)
  - Purdue: up to 22,000 CPUs
• Visualization resources
  - Purdue: TeraDRE, 48 NVIDIA GPU nodes
  - TACC: Spur, 32 NVIDIA GPUs
• Various storage resources

Resources to come
• Recognize that science is becoming increasingly data-driven (LHC, LSST, climate, weather, medical, ...)
• PSC: a large shared-memory system
• Track 2d being competed:
  - a data-intensive HPC system
  - an experimental HPC system
  - a pool of loosely coupled grid computing resources
  - an experimental, high-performance grid test-bed
• Track 1 system at NCSA: 10 petaflops peak, 1 petaflop sustained on serious applications, in 2011

Impacts of TeraGrid on Scientific Fields
• HPC makes some fields possible as we know them, e.g., cosmology, QCD.
• HPC adds essential realism to fields like biology, fluid dynamics, materials science, earthquake and atmospheric science.
• HPC is beginning to impact fields like social science and machine learning.
• In each case, it is not only powerful hardware:
  - TeraGrid support enables users to use the hardware effectively.
  - Development of new algorithms also fuels the progress.

Cosmology and Astrophysics
• Predictions of the age of the universe, the fraction of dark matter, etc. to three-significant-figure accuracy would have seemed ridiculous just a few years ago.
• Small (1 part in 10^5) spatial inhomogeneities 380,000 years after the Big Bang, as revealed by COBE and later WMAP satellite data, are transformed by gravitation into the pattern of severe inhomogeneities (galaxies, stars, voids, etc.) that we see today.
• Must use HPC to evolve the universe from that starting point to today, to compare with observation (see the toy sketch below).
• Is the distribution of galaxies and voids appropriate?
• Does lensing agree with observations?
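To make the "evolve the universe under gravity" step concrete, here is a minimal, illustrative N-body sketch in Python: direct-summation gravity with a kick-drift-kick leapfrog integrator. It is a toy only; production cosmology codes such as ENZO or GADGET use AMR, tree, or particle-mesh gravity solvers at vastly larger particle counts, and every parameter below (particle count, softening, time step) is invented for illustration.

```python
import numpy as np

def accelerations(pos, mass, softening=0.05):
    """Direct-summation gravitational accelerations (G = 1, toy units)."""
    dx = pos[None, :, :] - pos[:, None, :]           # pairwise separation vectors r_j - r_i
    r2 = (dx ** 2).sum(axis=2) + softening ** 2      # softened squared distances
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                    # no self-force
    return (dx * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

def evolve(pos, vel, mass, dt=1e-3, steps=200):
    """Kick-drift-kick leapfrog integration of the N-body system."""
    acc = accelerations(pos, mass)
    for _ in range(steps):
        vel += 0.5 * dt * acc                        # half kick
        pos += dt * vel                              # drift
        acc = accelerations(pos, mass)
        vel += 0.5 * dt * acc                        # half kick
    return pos, vel

# Toy initial condition: random particle positions with tiny extra perturbations,
# a crude stand-in for the ~1e-5 initial inhomogeneities that gravity amplifies.
rng = np.random.default_rng(0)
n = 512
pos = rng.uniform(0.0, 1.0, size=(n, 3))
pos += 1e-5 * rng.standard_normal((n, 3))
vel = np.zeros((n, 3))
mass = np.full(n, 1.0 / n)
pos, vel = evolve(pos, vel, mass)
```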
Chart: number of particles/year, courtesy Tiziana di Matteo, CMU

Kritsuk et al. (UCSD): Turbulence in Molecular Clouds
• Reported last year on Mike Norman's work, which requires adaptive mesh refinement (AMR) to zoom in on dense regions to capture the key physical processes: gravitation, shock heating, and radiative cooling of gas.
• AMR is particularly tricky with magnetic fields. Kritsuk et al. developed a new algorithm (PPML) for the MHD aspects and compared it to their older code ZEUS, as well as to FLASH (Chicago) and RAMSES (Saclay).
• Found that the turbulence obeys Kolmogorov scaling even at Mach 6 (a sketch of this kind of spectral check follows this slide).
• Need large shared-memory capabilities for generating initial conditions (AMR is very hard to load-balance on distributed-memory machines), then the largest distributed-memory machines for the simulation, then visualization.
• Need long-term archival storage for configurations: the biggest run (2048^3) produced 35 TB of data (at PSC), with much data movement between sites (17 TB to SDSC).
• TeraGrid helped make major improvements in the scaling and efficiency of the code (ENZO), and in the visualization tools, which are being stressed at these data volumes.
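A Kolmogorov-scaling claim like the one above is usually checked by computing a shell-averaged kinetic-energy spectrum of the velocity field and fitting the inertial-range slope against k^(-5/3). The Python/NumPy sketch below shows that diagnostic for a periodic 3-D velocity field; it is not the PPML/ENZO analysis pipeline, and the velocity array `vel`, box size, and inertial-range limits `k_lo`/`k_hi` are assumed for illustration.

```python
import numpy as np

def shell_averaged_spectrum(vel, box_size=1.0):
    """Shell-averaged kinetic-energy spectrum E(k) of a periodic 3-D velocity field.

    `vel` has shape (3, N, N, N): three velocity components on a uniform grid.
    """
    n = vel.shape[1]
    vk = np.fft.fftn(vel, axes=(1, 2, 3)) / n ** 3        # FFT of each component
    ek = 0.5 * (np.abs(vk) ** 2).sum(axis=0)              # kinetic energy per Fourier mode

    k1d = np.fft.fftfreq(n, d=box_size / n) * 2.0 * np.pi # wavenumbers along one axis
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)

    # Sum mode energies into integer-k shells.
    shell = np.rint(kmag * box_size / (2.0 * np.pi)).astype(int).ravel()
    spectrum = np.bincount(shell, weights=ek.ravel())
    k = np.arange(len(spectrum)) * (2.0 * np.pi / box_size)
    return k[1:n // 2], spectrum[1:n // 2]

# Usage (velocity field `vel` assumed to come from a simulation snapshot):
# k, E = shell_averaged_spectrum(vel)
# inertial = (k > k_lo) & (k < k_hi)                      # user-chosen inertial range
# slope = np.polyfit(np.log(k[inertial]), np.log(E[inertial]), 1)[0]
# print(f"measured spectral slope {slope:.2f} vs Kolmogorov -5/3")
```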
Further astrophysics insights
• FLASH group (Lamb, Chicago)
  - Used ANL visualization to understand the implications of how turbulence wrinkles a combustive flame front (important for understanding the explosion of Type Ia supernovae). Found that turbulence behind the flame front is inhomogeneous and nonsteady, in contrast to the assumptions made by many theoretical models of turbulent burning.
• Erik Schnetter (LSU)
  - Black hole mergers lead to potential gravitational wave signals for LIGO (NSF's most expensive facility?).
  - Enabled by recent algorithmic advances.
• Mark Krumholz (UCSC) et al.
  - Appear to have solved a long-standing puzzle about the formation of massive stars. Stars form as mass is accreted from infalling gas. With spherical geometry, for stars above 20 solar masses, outward pressure from photons should halt this infall. Including 2-D effects with rotation raises the limit to about 40 solar masses, but stars with masses as high as 120 solar masses have been observed. Krumholz shows that 3-D models allow instabilities, which then allow more massive stars to form. The 3-D simulations are much more demanding than 2-D and were only feasible with major compute resources such as DataStar at SDSC.

Wide outreach
• Benjamin Brown (U. Colorado and JILA) is using a visualization tool (VAPOR), developed at NCAR in collaboration with UC Davis and Ohio State, together with TACC's Ranger, to help the Hayden Planetarium produce a movie about stars.
• The movie, which will reach an estimated audience of one million people each year, is slated to be released in 2009.
• The sequences will include simulated "flybys" through the interior of the Sun, revealing the dynamos and convection that churn below the surface.
• An advantage of working at TACC is that the visualization can be done in the same place as the simulation; this obviated moving 5 TB to Colorado.

Lattice QCD: MILC collaboration
• Improved precision on the "standard model" is required to uncover new physics.
• Need larger lattices and lighter quarks.
• Large allocations.
• Frequent algorithmic improvements.
• Use TeraGrid resources at NICS, PSC, NCSA, and TACC; DOE resources at Argonne and NERSC; the specialized QCD machine at Brookhaven; and a cluster at Fermilab.
• Will soon store results with the International Lattice Data Grid (ILDG), an international organization that provides standards, services, methods, and tools to facilitate the sharing and interchange of lattice QCD gauge configurations among scientific collaborations (US, UK, Japan, Germany, Italy, France, and Australia). http://www.usqcd.org/ildg/

Gateways: Nanoscale Electronic Structure (nanoHUB, Klimeck, Purdue)
• The challenge of designing microprocessors and other devices with nanoscale components.
• The group is creating new content for simulation tools, tutorials, and additional educational material. The Gateway enables on-line simulation through a web browser without the installation of any software.
• In 2008, nanoHUB.org hosted more than 90 tools, had more than 6,200 users, ran more than 300,000 simulations, and supported 44 classes. (Not all of this was on TeraGrid; they use their part of the Purdue Condor cluster. The TG numbers are 83 users and 1,100 jobs.)
• The nanowire tool allows exploration of nanowires in circuits, e.g., the impact of fluctuations on the robustness of a circuit.
• The largest codes operate at the petascale (NEMO3D, OMEN), using 32,768 cores of Ranger and 65,536 cores of Kraken with excellent scaling.
• Communities develop the Gateways; TG helps interface them to TG resources.
• TG contributions:
  - improved security for Gateways;
  - helped improve reliability of the Condor-G code;
  - will benefit from improved metascheduling capabilities;
  - uses resources at NCSA, PSC, IU, ORNL, and Purdue.

Geosciences (SCEC)
• The goal is to understand earthquakes and to mitigate risks of loss of life and property damage.
• Spans the gamut from a huge number of small jobs, to mid-size jobs, to the largest simulations. (Last year I talked about the largest simulations.)
• For the largest runs, where they examine high-frequency modes (short wavelength, so higher resolution) of particular interest to civil engineers, they often need a preprocessing shared-memory phase, followed by distributed-memory runs using the Track 2 machines at TACC and NICS: 2,000-60,000 cores of Ranger and Kraken.
• To improve the velocity model that goes into the large simulations, they need mid-range core-count jobs doing full 3-D tomography (Tera3D) on DTF and other clusters (e.g., Abe), and they need large data available on disk (100 TB).
• An excellent example of coordinated ASTA support: Cui (SDSC) and Urbanic (PSC) interface with consultants at NICS, TACC, and NCSA to smooth migration of the code; performance improved 4x.
• Output is large data sets stored at NCSA or in SDSC's GPFS and iRODS. They are moving to the DOE machine at Argonne; TG provided help with the essential data transfer.

SCEC-PSHA
• Using the large-scale simulation data, estimate probabilistic seismic hazard (PSHA) curves for sites in southern California: the probability that ground motion will exceed some threshold over a given time period (a minimal sketch of such a calculation follows this slide).
• Used by hospitals, power plants, etc. as part of their risk assessment.
• For each location, they need roughly 840,000 parallel short jobs (420,000 rupture forecasts, 420,000 extractions of peak ground motion).
• Managing these requires effective grid workflow tools for job submission, data management, and error recovery, using Pegasus (ISI) and DAGMan (U. of Wisconsin Condor group).
• Targeting 20 locations this year.
Map: Southern California hazard map, probability of ground motion exceeding 0.1 g in the next 50 years
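The exceedance probability quoted on the hazard map is conventionally assembled from per-rupture occurrence rates and simulated peak ground motions under a Poisson assumption, P(exceed in T years) = 1 - exp(-lambda_exceed * T). The sketch below shows that arithmetic in Python; the rates, motions, and thresholds are invented for illustration, and this is not SCEC's actual production workflow.

```python
import numpy as np

def hazard_curve(annual_rates, peak_motions, thresholds, years=50.0):
    """Probability that ground motion exceeds each threshold within `years`.

    annual_rates[i]  -- mean annual occurrence rate of rupture scenario i
    peak_motions[i]  -- simulated peak ground motion (in g) at the site for scenario i
    Assumes independent, Poissonian scenario occurrences.
    """
    annual_rates = np.asarray(annual_rates)
    peak_motions = np.asarray(peak_motions)
    probs = []
    for x in thresholds:
        # Total annual rate of scenarios whose simulated motion exceeds threshold x.
        rate_exceed = annual_rates[peak_motions > x].sum()
        probs.append(1.0 - np.exp(-rate_exceed * years))
    return np.array(probs)

# Illustrative inputs: three hypothetical rupture scenarios affecting one site.
rates = [0.01, 0.002, 0.0005]       # events per year
motions = [0.05, 0.12, 0.35]        # simulated peak ground motion in g
thresholds = [0.1, 0.2, 0.3]
print(hazard_curve(rates, motions, thresholds))
# The first value is P(ground motion > 0.1 g within 50 years) for this toy input.
```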
CFD and Medicine: Arterial Flow (George Karniadakis, Brown)
• There is a strong relationship between blood flow patterns and the formation of arterial disease such as atherosclerotic plaques.
• Disease develops preferentially in separated and re-circulating flow regions such as vessel bifurcations.
• 1-D results feed 3-D simulations, providing flow rate and pressure for boundary conditions.
• A very clever multiscale approach.
• Couples resources weakly in real time, but requires co-scheduling.
• MPICH-G2 is used for intra-site and inter-site communications.
Diagram: one 1-D model feeding several 3-D simulations

Medical Impact
• Today, being used for validation and quantification of some of the pathologies.
• With realistic geometries, part of the promise of patient-specific treatment.
• See also Oden's Dynamic Data-Driven System for Laser Treatment of Cancer: automated laser surgery.

Biological Science
• Huge impact of TeraGrid.
• Primarily large-scale molecular dynamics (MD) simulations (classical Newtonian mechanics) that elucidate how structure leads to function.
• Major effort in scaling codes (e.g., AMBER, CHARMM, NAMD) to large distributed-memory computers; a very fruitful interaction between application scientists and computer scientists (e.g., Schulten and Kale).
• When breaking chemical bonds, quantum mechanical methods (QM/MM) are needed, often best done on large shared-memory systems.
• These simulations generate very large datasets, so data analysis is now becoming a serious concern. There has been considerable discussion of developing a repository of MD biological simulations, but no agreement yet on formats.

Aquaporins (Schulten group, UIUC)
• Aquaporins are proteins that conduct large volumes of water through cell membranes while filtering out charged particles like hydrogen ions (protons).
• Start with the known crystal structure and simulate over 100,000 atoms, using NAMD.
• Water moves through aquaporin channels in single file. Oxygen leads the way in. At the most constricted point of the channel, the water molecule flips; protons can't do this.
• The Aquaporin Mechanism animation was pointed to by the 2003 Nobel Prize in Chemistry announcement for the structure of aquaporins (Peter Agre).
• The simulation helped explain how the structure leads to the function.

Actin-Arp2/3 Branch Junction (Greg Voth, Utah)
• The actin-Arp2/3 branch junction is a key piece of the cellular cytoskeleton, helping to confer shape and structure on most types of cells.
• It cannot be crystallized to obtain high-resolution structures.
• Working with leading experimental groups, MD simulations are helping to refine the structure of the branch junction.
• 3M atoms, with linear scaling to 4,000 processors on Kraken.
• The all-atom molecular dynamics simulations form the basis for developing new coarse-grained models of the branch junction, to model larger-scale systems.

HIV-1 Protease Inhibition (Gail Fanucci, U. Florida, and Carlos Simmerling, Stony Brook)
• HIV-1 protease is an essential enzyme in the life cycle of HIV-1 and a popular drug target for antiviral therapy.
• It has been hypothesized that mutations outside the active site affect the mobility of two gate-keeper molecular flaps near the active site, and that this affects inhibitor binding.
• Tagged two sites on the flaps and used electron paramagnetic resonance (EPR) measurements to measure the distance between them; excellent agreement with MD simulations (a minimal distance-distribution sketch follows this slide).
• Provides a molecular view of how the mutations affect the conformation.
Plot: wild type (black) and two mutants
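A minimal sketch of the kind of MD-versus-EPR comparison described above: build a distance distribution between the two labeled flap sites from trajectory coordinates and compare it with the EPR-derived distribution. The arrays (`wt_site_a`, `wt_site_b`, `p_epr`, etc.) are hypothetical placeholders for coordinates and data extracted elsewhere; this is not the Fanucci/Simmerling analysis code.

```python
import numpy as np

def distance_distribution(site_a, site_b, bins=50, dist_range=(20.0, 50.0)):
    """Histogram of inter-site distances over an MD trajectory.

    site_a, site_b -- arrays of shape (n_frames, 3): coordinates (in Angstroms)
                      of the two spin-label sites in each frame.
    Returns (bin_centers, normalized_probability_density).
    """
    dists = np.linalg.norm(site_a - site_b, axis=1)          # per-frame distance
    counts, edges = np.histogram(dists, bins=bins, range=dist_range, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts

# Usage: compare wild-type and mutant flap-distance distributions with the
# EPR-derived distribution (all input arrays assumed to be available).
# r, p_wt  = distance_distribution(wt_site_a,  wt_site_b)
# _, p_mut = distance_distribution(mut_site_a, mut_site_b)
# overlap = np.trapz(np.minimum(p_wt, p_epr), r)   # crude similarity measure
```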
Similar slides for
• Schulten, UIUC: The Molecular Basis of Clotting
• McCammon, UCSD: Virtual Screening Led to Real Progress for African Sleeping Sickness
• Baik, Indiana U.: Investigating Alzheimer's

Mechanical Engineering: Numerical Studies of Primary Breakup of Liquid Jets in Stationary and Moving Gases (Madhusudan Pai and Heinz Pitsch, Stanford; Olivier Desjardins, Colorado)
• Liquid jet breakup in automobile internal combustion engines and aircraft gas turbine combustors controls fuel consumption and the formation of engine pollutants.
• Immense economic and environmental significance.
• Predicting the drop size distribution that results from liquid jet breakup is an important unsolved problem.
• Current simulations (liquid Weber number ~3000, Reynolds number ~5000) require upwards of 260M cells and typically about 2,048 processors for detailed simulations.
Figure: DNS of a diesel jet (left) and a liquid jet in crossflow (right)

• Physically more realistic simulations will require liquid Weber and Reynolds numbers 10x higher. (How does the computational complexity grow?)
• For validation, they compare with the scaling seen in experiments, e.g., a quantity scaling as Weber^(1/2).
• Used Queen Bee (LONI) for code development and scaling, and Ranger for production.
• Highly accurate direct numerical simulation (DNS) is used to develop parameters that will feed larger-scale studies (LES) of engines.

Materials Science: Spider Silk's Strength (Markus Buehler, MIT)
• The specific configuration of structural proteins and the hydrogen bonds that bind them together in spider silk makes the lightweight material as strong as steel, even though the "glue" of hydrogen bonds that holds spider silk together at the molecular level is 100 to 1,000 times weaker than the powerful glue of steel's metallic bonds.
• Used SDSC's IBM Blue Gene (6,144 processors) to simulate how spider silk compounds react at the atomic level to structural stresses.
• Discovered what governs the rupture strength of H-bond assemblies, confirmed by direct large-scale full-atomistic MD simulation studies of beta-sheet structures in explicit solvent.
• This could help engineers create new materials that mimic spider silk's lightweight robustness. It could also impact research on muscle tissue and on the amyloid fibers found in the brain, which have similar beta-sheet structures composed of hierarchical assemblies of H-bonds.
• Ross Walker (SDSC):
  - implemented a needed parallel version of some serial restraint code in NAMD and LAMMPS, for efficient use of the Blue Gene;
  - advised on appropriate numbers of Blue Gene processors to use;
  - helped with visualization.

SIDGrid: Social Science Gateway (Rick Stevens et al., Argonne)
• SIDGrid provides access to "multimodal" data: streaming data that change over time. For example, as a human subject views a video, heart rate, eye movement, and a video of the subject's facial expressions are captured. Data are collected many times per second, sometimes at different timescales, and synchronized for analysis, resulting in large datasets (a minimal synchronization sketch follows this slide).
• The Gateway provides sophisticated analysis tools to study these datasets.
• SIDGrid uses TeraGrid resources for computationally intensive tasks including media transcoding (decoding and encoding between compression formats), pitch analysis of audio tracks, and functional magnetic resonance imaging (fMRI) image analysis.
• A new application framework has been developed to enable users to easily deploy new social science applications in the SIDGrid portal. SIDGrid launches thousands of jobs in a week.
• Opening possibilities to the community.
• The Gateway is cited in publications on the analysis of neuroimaging data and in computational linguistics.
• TeraGrid staff will incorporate metascheduling capabilities, improve security models for community accounts, incorporate data-sharing capabilities, and upgrade workflow tools.
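A minimal sketch of the synchronization step described on the SIDGrid slide: resample two channels recorded at different rates onto a common clock so they can be analyzed together. The channel names, sampling rates, and choice of linear interpolation are illustrative assumptions, not SIDGrid's actual tooling.

```python
import numpy as np

def synchronize(t_a, x_a, t_b, x_b, rate_hz=100.0):
    """Resample two differently sampled channels onto one shared clock.

    t_a, x_a -- timestamps (s) and samples for channel A (e.g., heart rate)
    t_b, x_b -- timestamps (s) and samples for channel B (e.g., gaze position)
    Returns (t_common, a_resampled, b_resampled).
    """
    t0 = max(t_a[0], t_b[0])                    # overlap window of the two recordings
    t1 = min(t_a[-1], t_b[-1])
    t_common = np.arange(t0, t1, 1.0 / rate_hz)
    a = np.interp(t_common, t_a, x_a)           # linear interpolation onto the shared clock
    b = np.interp(t_common, t_b, x_b)
    return t_common, a, b

# Illustrative use: heart rate sampled at 4 Hz, eye tracker at 60 Hz.
t_hr = np.arange(0.0, 10.0, 0.25)
hr = 70 + 5 * np.sin(0.5 * t_hr)
t_eye = np.arange(0.0, 10.0, 1.0 / 60.0)
gaze = np.cos(2.0 * t_eye)
t, hr_s, gaze_s = synchronize(t_hr, hr, t_eye, gaze)
corr = np.corrcoef(hr_s, gaze_s)[0, 1]          # example joint analysis on the common clock
```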
New Communities: Machine Learning
• Tuomas Sandholm, CMU: Poker
  - Poker is a game with imperfect information.
  - Developing what appears to be the best computer poker capability.
  - Needs large shared memory.
• Rob Farber and Harold Trease, PNNL: Facial Recognition
  - Import, interpret, and database millions of images per second.
  - Far-faster-than-real-time facial recognition.
  - Near-linear scaling across 60,000 cores (Ranger).

New Communities: Virtual Pharmacy Clean Room Environment (Steve Abel and Steve Dunlop, Purdue University)
• Created a realistic, immersive 3-D virtual pharmacy clean room for training pharmacy students, pharmacists, and pharmacy technicians.
• Enables evaluation of clean room design and workflow by engineering researchers.
• The 3-D model can be employed in multi-walled virtual environments, with eventual incorporation of force-feedback and haptic (touch and feel) technologies.
• 160 students used the room in 2008; almost unanimously, they report that the experience has given them a better understanding of, and made them more comfortable with, the clean room environment and procedures.
• TG staff at Purdue helped the team use TG's distributed rendering service, TeraDRE, to render elements of the virtual clean room, including a flythrough movie, in less than 48 hours (this would take five months on a single computer).

Data-Driven Hurricane Prediction (Fuqing Zhang, Penn State, with NOAA and Texas A&M collaborators)
• Tracked hurricanes Ike and Gustav in real time.
• Used ensemble forecasting and 40,000 cores of Ranger to update predictions.
• First-time use of data streamed directly from NOAA planes inside the storm.

Impacts of TeraGrid on Scientific Fields
• HPC makes some fields possible as we know them, e.g., cosmology, QCD.
• HPC adds essential realism to fields like biology, fluid dynamics, materials science, earthquake and atmospheric science.
• HPC is beginning to impact fields like social science and machine learning.
• In each case, it is not only powerful hardware:
  - TeraGrid support enables users to use the hardware effectively.
  - Development of new algorithms also fuels the progress.

Transforming How We Do Science
• TeraGrid coordination among sites, making the necessarily heterogeneous resources into one system, leads to much higher researcher productivity.
• Faster turnaround leads to greater researcher productivity and changes the questions we ask in all disciplines.
• Visualization aids understanding.
• Gateways open the field to many more researchers.
• High-speed networks allow a much greater span of collaborative activities, and better use of distributed heterogeneous resources.