Introduction to Research 2011
Ashok Srinivasan
Florida State University
www.cs.fsu.edu/~asriniva
[Images from ORNL, IBM, and NVIDIA: part of the machine room at ORNL; the Cell processor that powers Roadrunner at LANL; the NVIDIA GPUs that power Tianhe-1A in China]

Outline
- Research
- High Performance Computing: applications and software, multicore processors, massively parallel processors
- Computational nanotechnology
- Simulation-based policy making
- Potential research topics

Research Areas
- High performance computing, applications in the computational sciences, scalable algorithms, mathematical software
- Current topics: computational nanotechnology, HPC on multicore processors, massively parallel applications
- New topics: simulation-based policy analysis
- Old topics: computational finance, parallel random number generation, Monte Carlo linear algebra, computational fluid dynamics, image compression

Importance of Supercomputing
- Fundamental scientific understanding
- Solution of bigger problems, such as automobile crash tests
- Solutions with time constraints, such as climate modeling
- More accurate solutions, such as nano-materials and drug design
- Disaster mitigation
- Study of complex interactions for policy decisions, such as urban planning

Some Applications
- Increasing relevance to industry: in 1993, fewer than 30% of the top 500 supercomputers were commercial; now 57% are
- A variety of application areas
- Commercial: finance and insurance, medicine, aerospace and automobiles, telecom, oil exploration, shoes (Nike), potato chips, toys
- Scientific: weather prediction, earthquake modeling, epidemic modeling, materials, energy, computational biology, astrophysics

Supercomputing Power
- The amount of parallelism is also increasing, with the high end having over 200,000 cores

Geographic Distribution
- North America has over half of the top 500 systems
- However, Europe and East Asia also have a significant share
- China is determined to be a supercomputing superpower; two of its national supercomputing centers have top-five supercomputers
- Japan has the top machine and two of the top five, and is planning a $1.3 billion exascale supercomputer for 2020

Asian Supercomputing Trends

Challenges in Supercomputing
- Hardware can be obtained with enough money, but obtaining good performance on large systems is difficult
- Some DOE applications ran at 1% efficiency on 10,000 cores
- Applications will soon have to deal with a million threads, and with a billion at the exascale
- Don't think of supercomputing as a means of solving current problems faster, but as a means of solving problems we previously thought we could not solve
- Development of software tools to make using these machines easier

Architectural Trends
- Massive parallelism: systems with 10K processors will be commonplace, and the large end already has over 500K processors
- Single-chip multiprocessing: all processors will be multicore
- Heterogeneous multicore processors: the Cell used in the PS3, GPGPUs, the 80-core processor from Intel; processors with hundreds of cores are already commercially available
- Distributed environments, such as the Grid
- But it is hard to get good performance on these systems

Accelerating Applications with GPUs
- Over a hundred cores per GPU
- Hide memory latency with thousands of threads
- Can accelerate a traditional computer to a teraflop
- GPU cluster at FSU
- Quantum Monte Carlo applications
- Algorithms: linear algebra, FFT, compression, etc.

Small Discrete Fourier Transforms (DFTs) on GPUs
- GPUs are effective for large DFTs, but not for small DFTs
- However, they can be effective for a large number of small DFTs, which is useful for AFQMC
- We use the asymptotically slow matrix-multiplication-based DFT for very small sizes (see the sketch after this list)
- We combine it with a mixed-radix decomposition for larger sizes
- We use asynchronous memory transfers to deal with the host-device data transfer overhead
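The following is a minimal sketch of the matrix-multiplication-based DFT idea for very small sizes: one N x N DFT matrix is applied to an entire batch of length-N signals with a single dense product, which is the kind of regular work that suits GPUs. This is an illustration only, not the actual GPU code; the function names and the NumPy setting are assumptions, and the real implementation runs on the device and is combined with mixed-radix steps for larger sizes.

```python
import numpy as np

def dft_matrix(n):
    # DFT matrix F with F[j, k] = exp(-2*pi*i*j*k / n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def batched_small_dft(signals):
    # Apply an n-point DFT to every row of `signals` (shape: batch x n)
    # via one dense matrix product.  For very small n this O(n^2)-per-DFT
    # method is competitive because the whole batch shares one multiply.
    n = signals.shape[1]
    return signals @ dft_matrix(n).T

# Illustrative check against a library FFT for 512 simultaneous 8-point DFTs
x = np.random.rand(512, 8) + 1j * np.random.rand(512, 8)
assert np.allclose(batched_small_dft(x), np.fft.fft(x, axis=1))
```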
Comparison of DFT Performance
Comparison of 512 simultaneous DFTs, without host-device data transfer (times in microseconds per DFT).

2-D DFTs, time in µs/DFT:

N                       4       8       12      16      20      24
Mixed radix             0.043   0.214   0.550   1.14    1.96    3.19
Matrix multiplication   0.038   0.206   0.716   1.95    3.09    6.71
Cooley-Tukey            0.115   0.353   -       1.96    -       -
CUFFT                   18.3    23.5    45.8    35.4    47.6    46.4
FFTW                    2.87    3.41    4.78    6.81    11.2    17.1

3-D DFTs, time in µs/DFT:

N                       4       8       12      16      20      24
Mixed radix             0.621   4.04    12.4    34.8    71.9    138
Matrix multiplication   0.578   3.43    12.7    42.9    77.5    172
Cooley-Tukey            1.06    6.01    -       58.2    -       -
CUFFT                   50.1    84.7    327     836     566     678
FFTW                    3.57    12.2    38.3    92.6    230     513

Petascale Quantum Monte Carlo
- Originally a DOE-funded project involving collaboration between ORNL, UIUC, Cornell, UTK, CWM, and NCSU; now funded by ORAU/ORNL
- Scale Quantum Monte Carlo applications to petascale (one million gigaflops) machines
- Load balancing, fault tolerance, and other optimizations

Load Balancing
- In current implementations, such as QWalk and QMCPack, cores send their excess walkers to cores with fewer walkers
- In the new algorithm, based on the alias method, cores may send more than their excess and may receive walkers even if they originally had an excess (a minimal sketch of such a transfer plan appears at the end of this section)
- Load can be balanced with each core receiving from at most one other core
- The algorithm is also optimal in the maximum number of walkers received by any core
- The total number of walkers sent may be at most twice the optimal

Performance Comparison
- Comparisons with QWalk
- [Plots: mean number of walkers migrated; maximum number of receives]

Process-Node Affinity
- Node allocation is not necessarily ideal for minimizing communication
- Process-node affinity can, therefore, be important
- [Figure: allocated nodes for a 12,000-core run on Jaguar]

Load Balancing with Affinity
- Renumbering the nodes improves both load balancing and AllGather time
- [Figures: basic load balancing; load balancing after renumbering; results on Jaguar]

Potential Research Topics
- High performance computing on multicore processors: algorithms, applications, and libraries on GPUs
- Applications on massively parallel processors: Quantum Monte Carlo applications; load balancing and communication optimizations
- Simulation-based policy decisions: combine scientific computing with models of social interactions to help make policy decisions
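As a rough illustration of the load-balancing property described earlier (each core receives walkers from at most one other core, while a sender may give away more than its excess), the following is a minimal sketch of computing such a transfer plan. It is not the QWalk, QMCPack, or production algorithm; the function name, the assumption that the total walker count divides evenly among the cores, and the pure-Python setting are all illustrative assumptions.

```python
from collections import deque

def balance_plan(walkers):
    # walkers[i] = number of walkers currently on core i.
    # Assumes the total divides evenly by the number of cores (illustrative).
    # Returns a list of (sender, receiver, count) transfers such that every
    # core receives from at most one other core; a sender may give away more
    # than its excess and later receive from one core itself.
    n = len(walkers)
    target = sum(walkers) // n
    load = list(walkers)
    surplus = deque(i for i in range(n) if load[i] > target)
    deficit = deque(i for i in range(n) if load[i] < target)
    plan = []
    while deficit:
        r = deficit.popleft()      # core still below the target
        s = surplus[0]             # current donor core
        need = target - load[r]
        plan.append((s, r, need))  # r receives from exactly one core
        load[r] = target
        load[s] -= need
        if load[s] <= target:      # donor exhausted (possibly overdrawn)
            surplus.popleft()
            if load[s] < target:
                deficit.append(s)  # overdrawn donor will receive once later
    return plan

# Illustrative use: core 0 has a surplus of 4 walkers, core 1 a deficit of 4.
print(balance_plan([10, 2, 6, 6]))   # [(0, 1, 4)]
```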