Who needs a supercomputer?
Professor Allan Snavely
University of California, San Diego, and San Diego Supercomputer Center (SDSC)

Aren't computers fast enough already?
This talk argues that computers are not fast enough already, nor do supercomputers just naturally get faster as a result of Moore's Law. We explore the implications of:
- Moore's Law
- Amdahl's Law
- Einstein's Law
Supercomputers are of strategic importance, enabling a "Third Way" of doing science: science by simulation.
- Example: the TeraShake earthquake simulation
- A viable national cyberinfrastructure requires centralized supercomputers
- Supercomputing in Japan, Europe, India, and China
- Why SETI@home plus Moore's Law does not solve all our problems

The basic components of a computer
Your laptop has these: [diagram of a computer's basic components]

Supercomputers (citius, altius, fortius)
Supercomputers are just "faster, higher, stronger" than your laptop: more and faster processors, etc., capable of solving large scientific calculations.

An army of ants approach
In supercomputers such as Blue Gene and DataStar, thousands of CPUs cooperate to solve scientific calculations.

Computers live a billion seconds to our every one!
Definitions:
- Latency is distance measured in time.
- Bandwidth is volume per unit of time.
Thus, in their own sense of time, the latencies and bandwidths across the machine room span 11 orders of magnitude, from nanoseconds to minutes. To a supercomputer, getting data from disk is like sending a rocketship to Saturn!

Moore's Law
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months. Moore's Law has had a decidedly mixed impact, creating new opportunities to tap into exponentially increasing computing power while raising fundamental challenges as to how to harness it effectively.
Things Moore never said:
- "Computers double in speed every 18 months."
- "The cost of computing is halved every 18 months."
- "CPU utilization is halved every 18 months."

[Chart: transistors per processor chip versus year, 1970-2005, on a log scale from 1,000 to 100,000,000; plotted chips include the i4004, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, and R10000.]
Moore's Law: the number of transistors per processor chip doubles every 18 months.

Snavely's Top500 Laptop?
Among the startling implications of Moore's Law is the fact that the peak performance of a typical laptop would have placed it among the 500 fastest computers in the world as recently as 1995. Shouldn't I just go find another job now? No, because Moore's Law has several more subtle implications, and these have raised a series of challenges to utilizing the apparently ever-increasing availability of compute power. These implications must be understood to see where we stand today in High Performance Computing (HPC).
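To make the compounding concrete, here is a minimal sketch (in Python; not part of the talk) of the arithmetic behind an 18-month doubling period. The comparisons in the comments are illustrative assumptions, not figures from the slides:

```python
# A minimal sketch of the compounding behind Moore's Law. The 18-month
# doubling period is the figure cited in the talk; everything else here
# is illustrative arithmetic, not data from the slides.

DOUBLING_PERIOD_YEARS = 1.5  # one doubling every 18 months

def moores_law_factor(years: float) -> float:
    """Growth factor after `years` at one doubling per 18 months."""
    return 2.0 ** (years / DOUBLING_PERIOD_YEARS)

print(f"10 years -> {moores_law_factor(10):,.0f}x")  # ~102x
print(f"15 years -> {moores_law_factor(15):,.0f}x")  # ~1,024x

# Illustrative reading of the Top500-laptop point: hardware that rides this
# curve for a decade or so gains roughly three orders of magnitude in peak
# performance, which is how a mid-1990s supercomputer's peak can land in
# the range of a later commodity laptop.
```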
The von Neumann bottleneck
Scientific calculations involve operations on large amounts of data, and it is in moving data around within the computer that the trouble begins. As a very simple pedagogical example, consider the expression A + B = C. The computer has to load A and B, "+" them together, and store C. The "+" is fast by Moore's Law; the load and store are slow by Einstein's Law.

Supercomputer "Red Shift"
While the absolute speeds of all computer subcomponents have been increasing rapidly, they have not all been increasing at the same rate. CPUs get faster, but they spend more and more of their time sitting around waiting for data.

Amdahl's Law
The law of diminishing returns: when a task has multiple parts, once you speed up one part a lot, the other parts come to dominate the total time. An example from cycling: on a hilly closed-loop course you can never average more than 2x your uphill speed, even if you go downhill at the speed of light! (The climb alone takes as much time as the whole loop would at twice your climbing speed, so the average can never exceed that bound.) For supercomputers this means that even as processors get faster, the overall time to solution is limited by memory and interconnect speeds, that is, by moving the data around. The sketch below puts numbers on both examples.
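As referenced above, here is a minimal sketch of Amdahl's Law using the standard formula; the fractions and speedup factors are illustrative assumptions, not figures from the talk:

```python
# A minimal sketch of Amdahl's Law (standard formula; this code is not from
# the talk). If a fraction p of the original time can be sped up by a factor
# s, the overall speedup is 1 / ((1 - p) + p / s).

def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when fraction p of the time is accelerated by s."""
    return 1.0 / ((1.0 - p) + p / s)

# Cycling: riding at your uphill pace, half the loop's time is the descent
# (p = 0.5). Even descending at the speed of light (s -> infinity) cannot
# double your average:
for s in (2, 10, 1_000_000):
    print(f"descend {s:>9,}x faster -> loop speedup {amdahl_speedup(0.5, s):.4f}x")

# Supercomputing: if 90% of a run is data movement and only 10% is compute
# (p = 0.1), an effectively infinitely faster CPU helps by at most 1/0.9:
print(f"infinitely fast CPU, 90% data movement: {amdahl_speedup(0.1, 1e12):.2f}x")
```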
Red Shift and the Red Queen
"It takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
Corollary: Allan's laptop is not a balanced system! Is system utilization cut in half every 18 months? What is needed is fundamental R&D in latency hiding, high-bandwidth networks, and computer architecture.

Three ways of doing science
- Experiment
- Theory
- Simulation

How dangerous is the southern San Andreas Fault?
The SCEC TeraShake simulation is the result of an immense effort by the geoscience community spanning more than 10 years. The focus is on understanding big earthquakes and how they will impact sediment-filled basins. The simulation combines massive amounts of data, high-resolution models, and large-scale supercomputer runs.
Major earthquakes on the San Andreas Fault, 1680 to present: 1680 (M 7.7), 1857 (M 7.8), 1906 (M 7.8), and the next?
TeraShake results provide new information enabling better:
- estimation of seismic risk
- emergency preparation, response, and planning
- design of the next generation of earthquake-resistant structures
Such simulations offer potentially immense benefits, saving both many lives and billions of dollars in economic losses.

TeraShake Animation
[Animation shown during the talk.]

SDSC and Data-Intensive Computing
[Chart: applications plotted by compute needs (more FLOPS) versus data needs (more BYTES). TeraShake and brain mapping sit in the data-oriented science and engineering environment; the traditional HPC environment emphasizes FLOPS; home, lab, campus, and desktop computing sit near the origin.]

The Japanese Earth Simulator
- Took the U.S. HPC community by surprise in 2002: "Computenik"
- For two years it had more flops capacity than the top five U.S. systems
- Its approach was based on a specialized HPC design
- It still has more data-moving capacity
- It sparked a "space race" in HPC; Blue Gene surpassed it in flops in 2005

Summary
- "Red Shift" means the promise implied by Moore's Law is largely unrealized for scientific simulations that by necessity operate on large data. Consider "The Butterfly Effect."
- Supercomputer architecture is a hot field, with challenges from Japan, Europe, India, and China.
- Large, centralized, specialized compute engines are a vital national strategic resource.
- Grids, utility computing, SETI@home, and the like do not meet all the needs of large-scale scientific simulation, for reasons that should now be obvious. Consider a galactic scale; the sketch below puts numbers on the analogy.
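To close the loop on the "computers live a billion seconds to our every one" slide, here is a minimal sketch that slows machine time down by a factor of a billion, so one nanosecond reads as one human second. The specific latencies are assumed round numbers for illustration, not measurements from the talk:

```python
# Sketch of the "billion seconds" analogy. The latencies below are assumed
# representative round numbers, not measurements from the talk.

SCALE = 1e9  # slow time by a billion: 1 ns of machine time = 1 human second

latencies_s = {                         # assumed values, in real seconds
    "CPU cycle (~1 ns)":       1e-9,
    "memory access (~100 ns)": 1e-7,
    "disk access (~5 ms)":     5e-3,
    "a one-minute transfer":   60.0,
}

for name, seconds in latencies_s.items():
    human = seconds * SCALE  # the same delay, felt at human scale
    if human < 3_600:
        label = f"{human:,.0f} seconds"
    elif human < 2 * 86_400:
        label = f"{human / 3_600:,.1f} hours"
    elif human < 2 * 31_557_600:
        label = f"{human / 86_400:,.0f} days"
    else:
        label = f"{human / 31_557_600:,.0f} years"
    print(f"{name:>25}: {label} at human scale")
```

At this scale a memory access takes minutes, a single disk access stretches to roughly two months, and a minutes-long transfer to millennia, which is the sense in which the machine room spans 11 orders of magnitude and fetching data from disk is "like sending a rocketship to Saturn."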