7th Korea-Japan Workshop on Computational Mechanics, COSEIK & JSCES, April 10, 2015

Development of Large-Scale Scientific & Engineering Applications on Post-Peta/Exascale Systems

Kengo Nakajima 1) and Takahiro Katagiri 2)

1) Ph.D., Professor, Information Technology Center, The University of Tokyo (2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan, E-mail: nakajima@cc.u-tokyo.ac.jp)
2) Ph.D., Associate Professor, Information Technology Center, The University of Tokyo (2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan, E-mail: katagiri@cc.u-tokyo.ac.jp)

ppOpen-HPC is an open-source infrastructure for the development and execution of large-scale scientific applications on post-peta-scale (pp) supercomputers with automatic tuning (AT). ppOpen-HPC focuses on parallel computers based on many-core architectures and consists of various types of libraries that cover general procedures for scientific computation. Source code developed on a PC with a single processor is linked with these libraries, and the generated parallel code is optimized for post-peta-scale systems. In this presentation, recent achievements and progress of the "ppOpen-HPC" project are presented.

Key Words: ppOpen-HPC, Post-Peta-Scale Systems, Automatic Tuning

1. Overview of ppOpen-HPC

ppOpen-HPC [1,2] is an open-source infrastructure for the development and execution of optimized and reliable simulation code on post-peta-scale (pp) parallel computers based on many-core architectures. It consists of various types of libraries that cover general procedures for scientific computation. Source code developed on a PC with a single processor is linked with these libraries, and the generated parallel code is optimized for post-peta-scale systems. The target post-peta-scale system is the Post T2K System of the University of Tokyo, which is based on many-core architectures such as the Intel MIC/Xeon Phi; it will be installed in FY.2016, and its peak performance is expected to be 30 PFLOPS. ppOpen-HPC supports approximately 2,000 users of the supercomputer system of the University of Tokyo, enabling them to switch from homogeneous multicore clusters to a post-peta-scale system based on many-core architectures.

ppOpen-HPC is a five-year project (FY.2011-2015) and a part of the "Development of System Software Technologies for Post-Peta-Scale High Performance Computing" program funded by JST/CREST (Post-Peta CREST) [3]. ppOpen-HPC is developed by the University of Tokyo, Kyoto University, Hokkaido University, and JAMSTEC, and the expertise of the members covers a wide range of disciplines related to scientific computing.

ppOpen-HPC includes the following four components (Fig.1): ppOpen-APPL, ppOpen-MATH, ppOpen-AT, and ppOpen-SYS. Libraries in ppOpen-APPL, ppOpen-MATH, and ppOpen-SYS are called from users' programs written in Fortran and C/C++ with MPI and OpenMP. ppOpen-HPC focuses on five types of discretization methods for scientific computing: the finite element method (FEM), finite difference method (FDM), finite volume method (FVM), boundary element method (BEM), and discrete element method (DEM) (Fig.2). ppOpen-APPL is a set of libraries covering various types of procedures for these five methods. ppOpen-MATH is a set of libraries for multigrid solvers, visualization, loose coupling, etc., while ppOpen-SYS includes system software libraries related to node-to-node communication and fault tolerance. Automatic tuning (AT) through ppOpen-AT enables a smooth and easy shift to further development on future architectures.

Fig.1 Overview of ppOpen-HPC: a user's program calls ppOpen-APPL (FEM/FDM/FVM/BEM/DEM), ppOpen-MATH (MG/GRAPH/VIS/MP), ppOpen-AT (STATIC/DYNAMIC), and ppOpen-SYS (COMM/FT) to produce an optimized application with optimized ppOpen-APPL and ppOpen-MATH.

Fig.2 Target applications of ppOpen-HPC: FEM (finite element method), FDM (finite difference method), FVM (finite volume method), BEM (boundary element method), and DEM (discrete element method).

2. ppOpen-AT

ppOpen-AT automatically and adaptively generates optimum implementations with efficient memory access for procedures of scientific computing in each component of ppOpen-APPL (Fig.3). A directive-based special AT language is being developed. ppOpen-AT utilizes well-known loop transformation techniques, and the AT framework is carefully designed to minimize the software stack in order to satisfy the requirements of many-core architectures. Figure 4 shows an example of the application of ppOpen-AT to a 3D FDM code for seismic simulations developed on ppOpen-APPL/FDM for the Intel Xeon Phi. The evaluations conducted with ppOpen-AT indicate that maximum speedup factors greater than 550% are obtained when it is applied on eight nodes of the Intel Xeon Phi (Fig.4).

Fig.3 Procedures of ppOpen-AT: the library developer inserts AT directives into ppOpen-APPL, candidate implementations (Candidate 1, 2, 3, ..., n) are generated automatically, and the auto-tuner selects a kernel for the target computers using release-time knowledge and runtime execution time; the library user then calls the auto-tuned kernel.

Fig.4 Effect of AT on ppOpen-APPL/FDM
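The candidate-generation and selection workflow of Fig.3 can be illustrated with a small self-contained sketch. The following C/OpenMP program is not ppOpen-AT code and does not use its directive language; the stencil kernel, the problem size N, the function names stencil_v1/stencil_v2, and the timing-based selection are purely illustrative assumptions. It only shows the general idea: an auto-tuner prepares several loop-transformation variants of the same computation (here a plain triple loop versus a variant with the outer loops collapsed to expose more parallelism to a many-core processor), measures each once, and uses the fastest variant in subsequent runs.

```c
/* Conceptual sketch of auto-tuning by candidate selection.
 * NOT ppOpen-AT code: kernel, sizes, and names are illustrative only.
 * Compile with OpenMP enabled, e.g.  cc -O2 -fopenmp at_sketch.c       */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 128                                  /* grid points per direction */
#define IDX(i,j,k) ((size_t)(i)*N*N + (size_t)(j)*N + (size_t)(k))

/* Candidate 1: straightforward triple loop, outermost loop parallelized. */
static void stencil_v1(const double *u, double *v)
{
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            for (int k = 1; k < N - 1; k++)
                v[IDX(i,j,k)] = (u[IDX(i-1,j,k)] + u[IDX(i+1,j,k)] +
                                 u[IDX(i,j-1,k)] + u[IDX(i,j+1,k)] +
                                 u[IDX(i,j,k-1)] + u[IDX(i,j,k+1)]) / 6.0;
}

/* Candidate 2: i and j loops collapsed, giving more iterations to
 * distribute over the many hardware threads of a many-core processor. */
static void stencil_v2(const double *u, double *v)
{
    #pragma omp parallel for collapse(2)
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            for (int k = 1; k < N - 1; k++)
                v[IDX(i,j,k)] = (u[IDX(i-1,j,k)] + u[IDX(i+1,j,k)] +
                                 u[IDX(i,j-1,k)] + u[IDX(i,j+1,k)] +
                                 u[IDX(i,j,k-1)] + u[IDX(i,j,k+1)]) / 6.0;
}

int main(void)
{
    double *u = malloc((size_t)N * N * N * sizeof *u);
    double *v = malloc((size_t)N * N * N * sizeof *v);
    if (!u || !v) { fprintf(stderr, "allocation failed\n"); return 1; }
    for (size_t n = 0; n < (size_t)N * N * N; n++) u[n] = (double)n;

    void (*cand[2])(const double *, double *) = { stencil_v1, stencil_v2 };
    int best = 0;
    double best_t = 1.0e30;

    /* "Auto-tuning" step: run every candidate once and keep the fastest. */
    for (int c = 0; c < 2; c++) {
        double t0 = omp_get_wtime();
        cand[c](u, v);
        double t = omp_get_wtime() - t0;
        printf("candidate %d: %.6f s\n", c + 1, t);
        if (t < best_t) { best_t = t; best = c; }
    }
    printf("selected candidate %d for this machine\n", best + 1);

    cand[best](u, v);        /* subsequent runs use the selected kernel */
    free(u);
    free(v);
    return 0;
}
```

In ppOpen-AT itself, the candidate implementations are generated automatically from directives inserted by the library developer, and the selected kernel is then invoked by library users on the target computers, as suggested by the workflow in Fig.3.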
3. ppOpen-MATH/MP and Coupling

ppOpen-MATH/MP is coupling software applicable to models employing various discretization methods. To demonstrate the applicability of ppOpen-MATH/MP, we utilized it for the coupling of NICAM (an atmospheric model, semi-unstructured FVM) and COCO (an ocean model, structured FDM). To realize wide applicability, ppOpen-MATH/MP is designed so that users can implement their own interpolation code. In addition to the NICAM-COCO coupling, we have implemented coupling between NICAM and an IO component: we developed an IO program that converts the icosahedral grid to the latitude-longitude grid and is executed in parallel with NICAM. Fig.5 is a schematic of the coupling system.

Fig.5 Overview of ppOpen-MATH/MP: NICAM (icosahedral grid), the IO component (latitude-longitude grid), and COCO (tri-polar grid) coupled through ppOpen-MATH/MP.

4. Public Release of ppOpen-HPC

The libraries developed for ppOpen-HPC are open for public use and can be downloaded at http://ppopenhpc.cc.u-tokyo.ac.jp/. ppOpen-HPC will be installed on various types of supercomputers and will be utilized for research and development that requires large-scale supercomputer systems. We are now focusing on the development of ppOpen-HPC for the Intel Xeon Phi architecture.

REFERENCES

[1] Nakajima, K., ppOpen-HPC: Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications on Post-Peta-Scale Supercomputers with Automatic Tuning (AT), ATIP '12 Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?, ACM Digital Library (ISBN: 978-1-4503-1644-6), 2012.
[2] ppOpen-HPC: http://ppopenhpc.cc.u-tokyo.ac.jp/
[3] Post-Peta CREST: http://postpeta.jst.go.jp/en/
[4] Katagiri, T. et al., Towards Auto-tuning for the Finite Difference Method in Era of 200+ Thread Parallelisms, Annual Meeting on Advanced Computing System and Infrastructure (ACSI), Tsukuba, Japan, 2015.