WTEC Panel on High End Computing in Japan (briefing of 7/14/2004)
Site visits: March 29 – April 3, 2004
Study commissioned by: National Coordination Office, Department of Energy, National Science Foundation, National Aeronautics and Space Administration

WTEC Overview
• Provides assessments of research and development
• This was one of 55 international technology assessments done by WTEC
• WTEC process:
– Write proposals for NSF "umbrella" grants
– Put together a coalition of sponsors
– Recruit a panel of experts
– Conduct the study with on-site visits
– Publish a report
• Full-text reports at wtec.org

Purpose & Scope of this Study
• Gather information on the current status and future trends in Japanese high end computing
– Government agencies, research communities, vendors
– Focus on long-term HEC research in Japan
• Compare Japanese and U.S. HEC R&D
• Provide a review of the Earth Simulator development process and operational experience
– Include user experience and its impact on the computer science and computational science communities
– Report on follow-on projects
• Determine HEC areas amenable to Japan–U.S. cooperation to accelerate future advances

WTEC HEC Panel Members
• Al Trivelpiece (Panel Chair): Former Director, Oak Ridge National Laboratory
• Peter Paul: Deputy Director, S&T, Brookhaven National Laboratory
• Rupak Biswas: Group Lead, NAS Division, NASA Ames Research Center
• Kathy Yelick: Computer Science Professor, University of California, Berkeley
• Jack Dongarra: Director, Innovative Computing Lab, University of Tennessee & Oak Ridge National Laboratory
• Horst Simon (Advisor): Director, NERSC, Lawrence Berkeley National Laboratory
• Dan Reed (Advisor): Computer Science Professor, University of North Carolina, Chapel Hill
• Praveen Chaudhari (Advisor): Director, Brookhaven National Laboratory

Sites Visited (1)
1. Earth Simulator Center
2. Frontier Research System for Global Change
3. National Institute for Fusion Science (NIFS)
4. Japan Aerospace Exploration Agency (JAXA)
5. University of Tokyo
6. Tokyo Institute of Technology
7. National Institute of Advanced Industrial S&T (AIST)
8. High Energy Accelerator Research Org. (KEK)
9. Tsukuba University
10. Inst. of Physical and Chemical Research (RIKEN)
11. National Research Grid Initiative (NAREGI)
12. Research Org. for Information Sci. & Tech. (RIST)
13. Japan Atomic Energy Research Institute (JAERI)

Sites Visited (2)
14. Council for Science and Technology Policy (CSTP)
15. Ministry of Education, Culture, Sports, Science, and Technology (MEXT)
16. Ministry of Economy, Trade, and Industry (METI)
17. Fujitsu
18. Hitachi
19. IBM-Japan
20. Sony Computer Entertainment Inc. (SCEI)
21. NEC

HEC Business and Government Environment in Japan

Government Agencies
• Council for Science & Technology Policy (CSTP)
– Cabinet Office; the Prime Minister presides over monthly meetings
– Sets strategic directions for S&T
– Rates proposals submitted to MEXT, METI, and others
• Ministry of Education, Culture, Sports, Science, and Technology (MEXT)
– Funds most S&T R&D activities in Japan
– Funded the Earth Simulator
• Ministry of Economy, Trade, & Industry (METI)
– Administers industrial policy
– Funds R&D projects with ties to industry
– Not interested in HEC, except for grids

Business and Government
• New Independent Administrative Institution (IAI) model
– Some research institutes had already converted; universities were being converted during our visit
– The government funds each institution as a whole; institutions control their own budgets
– Funding is also being cut annually
• The commercial viability of vector supercomputers is problematic; only NEC is still committed to this architectural model
• Commodity PC clusters are increasingly prevalent; all three Japanese vendors have cluster products

Business Partnerships
• Each of the Japanese vendors is partnered with a US vendor
– NEC and Cray (?)
– Fujitsu and Sun Microsystems
– Hitachi and IBM

HEC Hardware in Japan

Architecture/Systems Continuum (loosely coupled to tightly coupled)
• Commodity processor with commodity interconnect: clusters
– Pentium, Itanium, Opteron, Alpha, PowerPC
– GigE, InfiniBand, Myrinet, Quadrics, SCI
– NEC TX7, Fujitsu IA-Cluster
• Commodity processor with custom interconnect
– SGI Altix (Intel Itanium 2)
– Cray Red Storm (AMD Opteron)
– Fujitsu PRIMEPOWER (SPARC-based)
• Custom processor with custom interconnect
– Cray X1, NEC SX-7, Hitachi SR11000

Fujitsu PRIMEPOWER HPC2500
• 1.3 GHz SPARC-based architecture: 5.2 Gflop/s per processor
• System board: 8 CPUs, 41.6 Gflop/s, with channel adapters out to the PCI box / I/O devices
• SMP node: up to 16 system boards (8–128 CPUs); a full 128-CPU node peaks at 666 Gflop/s, with a crossbar network providing uniform memory access within the node
– 8.36 GB/s per system board, 133 GB/s total crossbar bandwidth
• Nodes attach through Data Transfer Unit (DTU) boards to a high-speed optical interconnect (4 GB/s x 4) linking up to 128 nodes
• Peak for a 128-node system: 85 Tflop/s
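The peak figures quoted above are mutually consistent; the per-CPU rate corresponds to four floating-point operations per 1.3 GHz clock:

    5.2 Gflop/s per CPU x 8 CPUs ≈ 41.6 Gflop/s per system board
    41.6 Gflop/s x 16 boards ≈ 666 Gflop/s per 128-CPU node
    666 Gflop/s x 128 nodes ≈ 85 Tflop/s for the full configuration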
Fujitsu IA-Cluster: System Configuration
• Control node: FUJITSU PRIMERGY (1U)
• Compute nodes
– PRIMERGY BX300: max. 20 blades in a 3U chassis
– PRIMERGY RXI600: Itanium Processor Family (1.5 GHz), 2–4 CPUs
• Control network: Gigabit Ethernet switch
• Compute network: InfiniBand or Myrinet switch

Latest Installations of Fujitsu HPC Systems
• Japan Aerospace Exploration Agency (JAXA): PRIMEPOWER 128 CPU x 14 cabinets (9.3 Tflop/s)
• Japan Atomic Energy Research Institute (ITBL computer system): PRIMEPOWER 128 CPU x 4 + 64 CPU (3 Tflop/s)
• Kyoto University: PRIMEPOWER 128 CPU (1.5 GHz) x 11 + 64 CPU (8.8 Tflop/s)
• Kyoto University (Radio Science Center for Space and Atmosphere): PRIMEPOWER 128 CPU + 32 CPU
• Kyoto University (grid system): PRIMEPOWER 96 CPU
• Nagoya University (grid system): PRIMEPOWER 32 CPU x 2
• National Astronomical Observatory of Japan (SUBARU telescope system): PRIMEPOWER 128 CPU x 2
• Japan Nuclear Cycle Development Institute: PRIMEPOWER 128 CPU x 3
• Institute of Physical and Chemical Research (RIKEN): IA-Cluster (Xeon 2048 CPU) with InfiniBand & Myrinet (8.7 Tflop/s)
• National Institute of Informatics (NAREGI system): IA-Cluster (Xeon 256 CPU) with InfiniBand, plus PRIMEPOWER 64 CPU
• Tokyo University (Institute of Medical Science): IA-Cluster (Xeon 64 CPU) with Myrinet, plus PRIMEPOWER 26 CPU x 2
• Osaka University (Institute of Protein Research): IA-Cluster (Xeon 160 CPU) with InfiniBand

Hitachi's HPC Systems
[Chart: peak performance (GFLOPS) of Hitachi HPC systems, 1977–2005, spanning the M-series integrated array processors, the S-810 (labeled the first Japanese vector supercomputer), S-820 (3 Gflop/s single-CPU peak), S-3600, and S-3800 (8 Gflop/s single-CPU peak, fastest in the world at the time), the SR2201 (first commercially available distributed-memory parallel processor), the SR8000 (first HPC machine combining vector-style and scalar processing), and the SR11000 (POWER4+, AIX 5L, vector-scalar combined type). The accompanying software line runs from automatic vectorization (VOS3/HAP, HI-OSF/1-MJ) to automatic parallelization and automatic pseudo vectorization (HI-UX/MPP).]

SR8000 Pseudo Vector Processing (PVP)
• Problem with conventional RISC processors: performance drops for large-scale simulations because of cache overflow; sustained performance is under 10% of peak
• PVP gives the cache-based microprocessor the memory behavior of a vector pipeline (arithmetic unit, vector registers, pipelining) by combining:
– Prefetch (software-controlled): read data from main memory into cache before the calculation; accelerates sequential data access
– Preload (software-controlled): read data from main memory directly into the floating-point registers before the calculation; accelerates strided and indirectly addressed memory access
– Pipelining of the arithmetic units
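As a rough illustration of where the two mechanisms apply (a sketch only; the Hitachi compilers apply PVP automatically, and no vendor directives are shown), consider two common access patterns:

    ! Access patterns targeted by SR8000 pseudo vector processing.
    subroutine pvp_patterns(n, a, b, c, x, idx)
      implicit none
      integer, intent(in)  :: n, idx(n)
      real(8), intent(in)  :: b(n), c(n), x(*)
      real(8), intent(out) :: a(n)
      integer :: i
      ! Stride-1 access: software-controlled prefetch streams b and c
      ! from main memory into cache ahead of the computation.
      do i = 1, n
         a(i) = b(i) + c(i)
      end do
      ! Indirect (gather) access: caching is ineffective, so software-
      ! controlled preload moves x(idx(i)) directly into floating-point
      ! registers ahead of use.
      do i = 1, n
         a(i) = a(i) + x(idx(i))
      end do
    end subroutine pvp_patterns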
Hitachi SR11000
• Based on the IBM POWER4+
• SMP with 16 processors per node (IBM uses 32 in its own machine)
• 109 Gflop/s per node (6.8 Gflop/s per processor)
• IBM Federation switch: Hitachi uses 6 planes for its 16-processor nodes; IBM uses 8 planes for 32-processor nodes
• Pseudo vector processing features with minimal hardware enhancements
– Fast synchronization
– No preload, unlike the SR8000
• Hitachi's compiler effort is separate from IBM's: automatic vectorization, no plans for HPF
• Three customers for the SR11000:
– National Institute for Materials Science, Tsukuba: 64 nodes (7 Tflop/s)
– Institute for Molecular Science, Okazaki: 50 nodes (5.5 Tflop/s)
– Institute of Statistical Mathematics: 4 nodes

SR11000 Pseudo Vector Processing (PVP)
• Same motivation as on the SR8000: conventional RISC performance collapses on large-scale simulations because of cache overflow, with sustained performance under 10% of peak
• On the SR11000, PVP relies on prefetch, both software- and hardware-controlled, to move data from main memory into cache ahead of the calculation (accelerating sequential access); there is no preload into the floating-point registers

SR11000 Next Model
• Continuing IBM partnership
• POWER5 processor
• Greatly enhanced memory bandwidth, with flat memory interleaving
• Hardware barrier synchronization register

NEC HPC Products
• High-end capability computing: SX-6/7 series parallel vector processors
• Middle and small-size capacity computing: IA-64 server TX7 series, Express5800/1160Xa
• Parallel PC clusters: Express5800 parallel Linux cluster
• IA-32 workstations: Express5800/50 series

TX7 Itanium 2 Server
• Up to 32 Itanium 2 processors
• Up to 128 GB of RAM
• Linux operating system with NEC enhancements
• More than 100 Gflop/s on Linpack
• File server functionality for the SX line
• cc-NUMA architecture; employs a chipset and crossbar switch developed in-house by NEC, achieving near-uniform high-speed memory access

SX-Series Evolution ("the latest technology always in the SX series")
• 1983 SX Series: the first computer in the world surpassing 1 Gflop/s
• 1989 SX-3 Series: shared memory, multi-function processor, UNIX OS
• 1994 SX-4 Series: innovative CMOS technology, entirely air-cooled
• 1998 SX-5 Series: high sustained performance, large-capacity shared memory
• 2001 SX-6 Series: single-chip vector processor, greater scalability
• Next-generation SX to follow

NEC SX-7/160M5
• Total memory: 1280 GB
• Peak performance: 1412 Gflop/s
• Number of nodes: 5
• PEs per node: 32
• Memory per node: 256 GB
• Peak performance per PE: 8.83 Gflop/s
• Vector pipes per PE: 4
• Data transport rate between nodes: 8 GB/s
• For comparison, SX-6: 8 processors per node, 8 Gflop/s per processor, 16 GB per node; SX-7: 32 processors per node, 8.825 Gflop/s per processor, 256 GB per node
• Rumors of an SX-8: 8 CPUs per node, 26 Gflop/s per processor

Special Purpose: GRAPE-6
• The 6th generation of the GRAPE (Gravity Pipe) project
• Gravity (N-body) calculation for many particles, at 31 Gflop/s per chip
• 32 chips per board: 0.99 Tflop/s per board
• The 64-board full system installed at the University of Tokyo: 63 Tflop/s
• On each board, all particle data are held in SRAM; each target particle is injected into the pipeline and its acceleration is computed directly, with no software involved
• Gordon Bell Prize at SC for a number of years (Prof. Makino, U. Tokyo)
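The GRAPE-6 figures multiply out directly:

    31 Gflop/s per chip x 32 chips ≈ 0.99 Tflop/s per board
    0.99 Tflop/s per board x 64 boards ≈ 63 Tflop/s for the full system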
Sony PlayStation2
• Emotion Engine: 6 Gflop/s peak
– Superscalar MIPS core at 300 MHz + vector coprocessors + graphics/DRAM
– Potential of 20 floating-point operations per cycle: FPU with FMAC+FDIV, VPU1 with 4 FMAC+FDIV, VPU2 with 4 FMAC+FDIV, EFU with FMAC+FDIV
• About $200; roughly 70 million sold (the PS1 sold about 100 million)
• 8 KB data cache; 32 MB of memory, not expandable (the OS lives there as well)
• 32-bit floating point, not IEEE-compliant
• 2.4 GB/s to memory (0.38 bytes per flop)
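As a rough consistency check on the figures above:

    20 flops/cycle x ~300 MHz ≈ 6 Gflop/s peak
    2.4 GB/s ÷ ~6 Gflop/s ≈ 0.4 bytes per flop (the slide's 0.38 B/flop)

far less memory bandwidth per flop than traditional vector supercomputers provide.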
High-Performance Chips: Embedded Applications
• The driving market is gaming (PCs and game consoles)
– Motivation for almost all the technology developments
– Demonstrates that arithmetic is quite cheap
• Today there are three big problems with these apparently nonstandard "off-the-shelf" chips:
– Most have very limited memory bandwidth and little if any support for inter-node communication
– Integer or only 32-bit floating-point arithmetic
– No software support to map scientific applications onto these processors, minimal general-purpose programming tools, and poor memory capacity for program storage
• It is not clear that they do much for scientific computing: developing "custom" software is much more expensive than developing custom hardware

TOP500 Data
[Charts: percentage of total TOP500 performance held by the US and Japan over time, and the sum of Rmax from June 1993 through November 2003 for all systems, the USA, Japan, and others.]

Top 20 Computers: Where They Are Located
[Table: for every TOP500 list from June 1993 through June 2004, the vendor of each of the top 20 systems, color-coded by country of installation (USA, Japan, Germany, UK, Canada, France, China). The rank-1 row reads: TMC, Fujitsu, Intel, Fujitsu, Fujitsu, Fujitsu, Hitachi, Hitachi/T, then Intel from June 1997 through June 2000, IBM from November 2000 through November 2001, and NEC (the Earth Simulator) from June 2002 through June 2004. Lower ranks show Cray/SGI and TMC dominance in the mid-1990s giving way to IBM, HP, Dell, and commodity Linux clusters after 2000.]

Efficiency Is Declining Over Time
• Analysis of the top 100 machines in 1994 and 2004
• Shows the number of machines in the top 100 that achieve a given efficiency on the Linpack benchmark
• In 1994, 40 machines had efficiency above 90%
• In 2004, 50 machines have efficiency below 50%
[Chart: "Efficiency of Machines in Top 100", number of machines in the top 100 versus Linpack efficiency (%), for 1994 and 2004.]
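Here efficiency is the usual TOP500 ratio of achieved Linpack performance to theoretical peak:

    efficiency = Rmax / Rpeak

so, for example, a system with a 10 Tflop/s peak that sustains 4.5 Tflop/s on Linpack is running at 45% efficiency.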
ESS Impact on Climate Modeling
• NERSC IBM SP3: 1 simulated year per compute day on 112 processors
• ORNL/NCAR IBM SP4: ~2 simulated years per compute day on 96 processors
• ORNL/NCAR IBM SP4: 3 simulated years per compute day on 192 processors
• ESS: 40 simulated years per compute day on an unknown number of processors (probably ~128)
• Cray X1 rumor: 14 simulated years per compute day on 128 processors
(Source: Michael Wehner)

Technology Transfer from Research
• Government projects encouraged new architectures, and the new technologies were commercialized:
– Numerical Wind Tunnel → Fujitsu VPP500
– CP-PACS → Hitachi SR2201
– Earth Simulator → NEC SX-6
– GRAPE, MDM, eHPC, … → ? (MD-engine)

Hardware Summary
• The commercial viability of "traditional" supercomputing architectures with vector processors and high-bandwidth memory subsystems is problematic; NEC is the only vendor remaining in Japan
• Clusters are replacing traditional high-bandwidth systems

HEC Software in Japan

Software Overview
• Emphasis on vendor software (Fujitsu, Hitachi, NEC), including the Earth Simulator software
• Languages and compilers: a persistent effort in High Performance Fortran, including the HPF/JA extensions
• Use of common libraries
• Little academic work for supercomputers: vendors supply the tools
• Support for clusters

Achievements: HPF on the Earth Simulator
• PFES
– Oceanic general circulation model based on the Princeton Ocean Model
– Achieved 9.85 Tflop/s with 376 nodes: 41% of peak performance
• Impact3D
– Plasma fluid code using a Total Variation Diminishing (TVD) scheme
– Achieved 14.9 Tflop/s with 512 nodes: 45% of peak performance

HPF/JA Extensions
• HPF research in languages and compilers
• HPF 2.0 extends HPF 1.0 for irregular applications
• HPF/JA further extends HPF for performance:
– REFLECT: explicit placement of near-neighbor (shadow-region) communication
– LOCAL: asserts that communication is not needed within a scope
– Extended ON HOME: partial computation replication
• With these assertions the compiler does not need full interprocedural communication and availability analyses
• HPF/JA was a consortium effort by the vendors NEC, Hitachi, and Fujitsu
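A minimal sketch of how such a code is expressed, assuming a one-dimensional block distribution and a simple near-neighbor update. The !HPF$ lines use standard HPF / HPF 2.0 directives; the REFLECT line illustrates the HPF/JA extension schematically (the !HPFJ$ sentinel and exact spelling are assumptions here; the HPF/JA specification defines the precise syntax):

    ! Sketch only: HPF data distribution with an HPF/JA-style REFLECT
    ! assertion on a near-neighbor stencil.
    program hpfja_sketch
      implicit none
      integer, parameter :: n = 1024
      real(8) :: u(n), unew(n)
      integer :: i, it
    !HPF$ PROCESSORS p(8)
    !HPF$ DISTRIBUTE (BLOCK) ONTO p :: u, unew
    !HPF$ SHADOW u(1:1)        ! one-element halo for u(i-1), u(i+1)
      u = 0.0d0
      u(1) = 1.0d0
      do it = 1, 100
    ! Schematic HPF/JA directive: update the shadow (halo) region here,
    ! so the compiler need not infer the communication itself.
    !HPFJ$ REFLECT u
    !HPF$ INDEPENDENT
        do i = 2, n - 1
          unew(i) = 0.5d0 * (u(i-1) + u(i+1))
        end do
        u(2:n-1) = unew(2:n-1)
      end do
    end program hpfja_sketch
    ! (HPF/JA also provides LOCAL, asserting that a scope needs no
    ! communication, and an extended ON HOME for partial replication.)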
Vectorization and Parallelization on the Earth Simulator (NEC)
• Inter-node parallelization across the interconnection network: HPF or MPI
• Intra-node parallelization over the shared main memory of each node: OpenMP or automatic parallelization
• Vectorization within each arithmetic processor (AP)

Hitachi: Automatic Vectorization = COMPAS + PVP
• Inter-node parallelization with parallel libraries (HPF, MPI, PVM, etc.)
• Intra-node element-wise parallel processing across the instruction processors (IPs) of a node: COMPAS, automatic parallelization (CO-operative Micro-Processors in a single Address Space)
• Vector processing of the inner DO loop with PVP (automatic pseudo vectorization), e.g. for a nest DO i=1,l / DO j=1,m / DO k=1,n

Conclusions
• A longer sustained effort on HPF than in the US
– Part of the Earth Simulator vision
– Successful on two of the large codes, including a Gordon Bell prize winner
– Language extensions were also needed
• MPI is the dominant model for inter-node communication
– Although larger vector/parallel nodes mean a smaller degree of MPI parallelism
– Combined with automatic vectorization within nodes
• Other familiar tools were developed outside Japan: numerical libraries, debuggers, etc.

Grid Computing in Japan
Kathy Yelick, U.C. Berkeley and Lawrence Berkeley National Laboratory

Outline
• Motivation for grid computing in Japan: e-business, e-government, science
• Summary of grid efforts: labs, universities
• Grid research contributions: hardware, middleware, applications
• Funding summary

Grid Motivation
• e-Japan: create a "knowledge-emergent society," where everyone can utilize IT
• In 2001, Japanese internet usage was at the lowest level among the major industrial nations
• Four strategies to address this:
– Ultra high-speed network infrastructure
– Facilitate electronic commerce
– Realize electronic government (the key is information sharing across agencies and society)
– Nurture high-quality human resources (training, support of researchers, etc.)

Overview of Grid Projects in Japan
• Super-SINET (NII)
• National Research Grid Initiative (NAREGI)
• Campus Grid (Titech)
• Grid Technology Research Center (AIST)
• Information Technology Based Laboratory (ITBL)
• Applications: VizGrid (JAIST), BioGrid (Osaka-U), Japan Virtual Observatory (JVO)

SuperSINET: All-Optical Production Research Network
• Operational since January 2002
• 10 Gbps photonic backbone
• GbEther bridges for peer connection
• 6,000+ km of dark fiber
• 100+ end-to-end lambdas and 300+ Gb/s

NAREGI: National Research Grid Initiative
• Funded by MEXT, the Ministry of Education, Culture, Sports, Science and Technology
• 5-year project (FY2003–FY2007)
• 2 billion yen (~US$17M) budget in FY2003
• Collaboration of national labs, universities, and industry in the R&D activities
• Applications in IT and nano-science
• Acquisition of computer resources underway

NAREGI Goals
1. Develop a grid software system: R&D in grid middleware and the upper layer; a prototype for the future grid infrastructure for scientific research in Japan
2. Provide a testbed: 100+ Tflop/s expected by 2007; demonstrate that a high-end grid computing environment can be applied to nano-science simulations over SuperSINET
3. Participate in international collaboration: U.S., Europe, Asia-Pacific
4. Contribute to standards activities, e.g., the GGF

NAREGI Phase 1 Testbed (~3000 CPUs, ~17 Tflop/s)
• Connected over SuperSINET (10 Gbps):
– Center for Grid R&D (NII): ~5 Tflop/s
– Computational Nano-science Center (IMS): ~10 Tflop/s
– Titech Campus Grid; Osaka Univ. BioGrid; AIST SuperCluster
– Small test application clusters at Tohoku Univ., AIST, KEK, Kyoto Univ., Kyushu Univ., and ISSP

AIST Super Cluster for Grid R&D
• P32: IBM eServer 325, 2-way Opteron 2.0 GHz with 6 GB per node, 1074 nodes, Myrinet 2000: 8.59 Tflop/s peak
• M64: Intel Tiger4, 4-way Itanium 2 (Madison) 1.3 GHz with 16 GB per node, 131 nodes, Myrinet 2000: 2.72 Tflop/s peak
• F32: Linux Networx, 2-way Xeon 3.06 GHz with 2 GB per node, 256+ nodes, Gigabit Ethernet: 3.13 Tflop/s peak
• Total: 14.5 Tflop/s peak, 3188 CPUs
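The partition peaks follow from the usual flops-per-clock figures for these processors (an assumption here, not stated on the slide: 2 flops/cycle for Opteron and Xeon, 4 flops/cycle for Itanium 2):

    P32: 1074 nodes x 2 CPUs x 2.0 GHz x 2 ≈ 8.59 Tflop/s
    M64:  131 nodes x 4 CPUs x 1.3 GHz x 4 ≈ 2.72 Tflop/s
    F32:  256 nodes x 2 CPUs x 3.06 GHz x 2 ≈ 3.13 Tflop/s

which sum to roughly 14.5 Tflop/s over about 3,200 CPUs.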
NAREGI Grid Software Stack
• WP6: grid-enabled applications
• WP3: grid visualization, grid PSE, grid workflow
• WP4: packaging
• WP2: grid programming (GridRPC, GridMPI)
• WP1: super-scheduler; grid monitoring & accounting; Grid VM (built over Globus, Condor, UNICORE → OGSA)
• WP5: high-performance and secure grid networking
(Note: WP = "Work Package")

R&D in Grid Software and Networking Area (Work Packages)
• WP-1: Lower and middle-tier software for resource management: Matsuoka (Titech), Kohno (ECU), Aida (Titech)
• WP-2: Grid programming middleware: Sekiguchi (AIST), Ishikawa (AIST)
• WP-3: User-level grid tools & PSE: Miura (NII), Sato (Tsukuba-u), Kawata (Utsunomiya-u)
• WP-4: Packaging and configuration management: Miura (NII)
• WP-5: Networking, security & user management: Shimojo (Osaka-u), Oie (Kyushu Tech.), Imase (Osaka-u)
• WP-6: Grid-enabling tools for nano-science applications: Aoyagi (Kyushu-u)

WP-1: Lower and Middle-Tier Software for Resource Management
• UNICORE-Condor-Globus interoperability; adoption of the ClassAds framework
• Meta-scheduler: scheduling schema, workflow engine, broker function
• Grid information service: attaches to multiple monitoring frameworks; user and job auditing and accounting
• Self-configurable management & monitoring
• GridVM (lightweight grid virtual machine): support for co-scheduling and resource virtualization; interfacing with OGSA (Open Grid Services Architecture)

WP-2: Grid Programming, GridRPC/Ninf-G2
• GridRPC: programming with remote procedure calls (RPC) on the grid
• GridRPC API standardization by the GGF
• Ninf-G is a reference implementation of GridRPC
• Implemented on the Globus Toolkit (C and Java APIs)
• Used by groups outside Japan
• [Diagram: on the server side, the IDL compiler generates a remote executable from the numerical library's IDL file and registers its interface information (an LDIF file published via MDS); the client (1) requests and (2) receives the interface information, (3) invokes the executable through GRAM, which forks it, and (4) the remote executable connects back to the client.]

WP-2: Grid Programming, GridMPI
• GridMPI: programming with MPI on the grid; an environment to run MPI applications efficiently in the grid
• Flexible and heterogeneous process invocation on each compute node (SSH, RSH, GRAM)
• GridADI and a latency-aware communication topology: optimizes communication over non-uniform latency and hides the differences between lower-level communication libraries (vendor MPI, IMPI, TCP/IP, PMv2, others)
• Extremely efficient implementation based on MPI on SCore (not MPICH-PM)
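The point of GridMPI is that an ordinary MPI program, such as the minimal Fortran sketch below (plain standard MPI, nothing GridMPI-specific), should run unchanged across clusters at different sites, with process startup and latency-aware communication handled underneath the standard API:

    ! Minimal standard MPI program; GridMPI targets running such codes
    ! unchanged across geographically separated clusters.
    program gridmpi_sketch
      implicit none
      include 'mpif.h'
      integer :: ierr, rank, nprocs
      real(8) :: local, total
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      ! Each process contributes one value; the reduction may cross the
      ! wide-area link, where latency-aware collectives matter most.
      local = dble(rank)
      call MPI_Reduce(local, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, 'sum over', nprocs, 'ranks =', total
      call MPI_Finalize(ierr)
    end program gridmpi_sketch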
WP-3: User-Level Grid Tools & PSEs
• Grid workflow: workflow language definition; GUI (task-flow representation)
• Visualization tools: real-time volume visualization on the grid (simulation or storage data are turned into 3D objects, rendered on servers, and delivered as images to the client UI)
• PSE/portals: PSE toolkit, application pool, information service, and application server behind a portal, tied to the workflow system and super-scheduler
• Collaboration with the nano-science applications group (multiphysics/coupled simulation)

WP-4: Packaging and Configuration Management
• Collaboration with WP1 management
• Activities: selection of the packagers to use; interface with autonomous configuration management (WP1); test procedure and harness; testing infrastructure
• Cf. NSF NMI packaging and testing

WP-5: Network Measurement, Management & Control
• Traffic measurement on SuperSINET
• Optimal QoS routing based on user policies and network measurements
• Robust TCP/IP control for grids
• Grid CA / user grid account management and deployment
• [Diagram: grid applications and the super-scheduler consult user-policy and network-information databases; a grid network management server drives network control entities for dynamic bandwidth control and QoS routing, fed by multi-point real-time measurement of the high-speed managed networks.]

ITBL Grid Applications: Plan to Use a Mixture of Computational Technologies
• Environmental circulation simulation for pollutant materials: a wind field calculation on a VPP300 (vector parallel computer) feeds atmospheric, marine, and terrestrial environment simulations, coupled via Stampi, with real-time visualization on a multi-vision display; several hundred atmospheric dispersion simulations spanning the possible release parameters are run quickly on parallel computers; a radioactive source estimation system combines Japan Meteorological Agency numerical weather prediction data, predictions at monitoring points, and observation data through statistical analysis
• Fluid-particle hybrid simulation for tokamak plasmas: the electron fluid / electromagnetic field part runs on a vector machine and the ion-particle part on a scalar machine, coupled via Stampi
• Large-scale Hartree-Fock calculation (SPring-8): diagonalization and orthonormalization on a vector machine; integral handling and partial accumulation (Fij <- Fij + Dkl*qijkl) on scalar machines, with a pool-of-tasks distribution

Grid for the Belle Detector
• SuperSINET is the backbone of the Belle network; the Belle detector at KEK studies e+e- -> B0 B0-bar events
• The KEK computing center is the hub, sharing data over NFS and a 10 Gbps backbone link
• Participating sites include Tohoku U. (~1 TB/day planned), Nagoya U., U. Tokyo, Osaka U., and the Tokyo Institute of Technology, with daily transfers of roughly 170 GB to 1 TB per site (about 45–100 Mbps), plus links to the USA, Korea, Taiwan, etc.
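As a quick conversion for the volumes quoted above:

    1 TB/day ≈ 8 x 10^12 bits / 86,400 s ≈ 93 Mbps

which is in line with the ~100 Mbps sustained rate shown on the slide for a 1 TB/day site.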
Grid Applications: Fusion Grid
• Connects the real experiment with the numerical experiment (simulation on a supercomputer) through ITBL
• VR visualization of the combined results

Adaptation of Nano-science Applications to the Grid Environment
• Analysis of nano-science applications: parallel structure, granularity, resource requirements, latency tolerance
• Coupled simulation: RISM (Reference Interaction Site Model) and FMO (Fragment Molecular Orbital method) exchange the solvent distribution and the solute structure through a mediator (with in-sphere correlation); RISM runs on an SMP supercomputer and FMO on a cluster (grid)

ITBL / RIKEN Super Combined Cluster
[Diagram: the RIKEN Super Combined Cluster serves as an ITBL computer resource pool; users submit jobs (e.g., MD runs) through a web portal and a Globus-based front-end server, input files are staged to the cluster, and output files are returned to the users.]

Grid Summary
• More emphasis on grids than expected: more government support, more application involvement, higher-level tools
• Computational, data, and business grids are all included
• Research contributions from Japan on cluster computing and grid middleware
• Heavily involved in international collaborations