Perspective on Extreme Scale Computing in China
Depei Qian
Sino-German Joint Software Institute (JSI), Beihang University
Co-design 2013, Guilin, Oct. 29, 2013

Outline
• Related R&D programs in China
• HPC system development
• Application service environment
• Applications

Related R&D programs in China

HPC-related R&D under NSFC
• NSFC key initiative "Basic algorithms for high performance scientific computing and computable modeling"
  - 2011-2018, 180 million RMB
  - Basic algorithms and their efficient implementation
  - Computable modeling
  - Verification by solving domain problems

HPC-related R&D under the 863 program
• 3 key projects in the last 12 years
  - High performance computer and core software (2002-2005)
  - High productivity computer and Grid service environment (2006-2010)
  - High productivity computer and application environment (2011-2016)
• 3 major projects
  - Multicore/many-core programming support (2012-2015)
  - High performance parallel algorithms and parallel coupler development for earth systems study (2010-2013)
  - HPC software support for earth system modeling (2010-2013)

HPC-related R&D under the 973 program
• High performance scientific computing
• Large scale scientific computing
• Aggregation and coordination mechanisms in virtual computing environments
• Highly efficient and trustworthy virtual computing environments
• There is no national long-term R&D program on extreme scale computing; coordination between the different programs is needed

Shift of 863 program emphasis
• 1987: intelligent computers, following the 5th generation computer program in Japan
• 1990: from intelligent computers to high performance parallel computers
• 1999: from individual HPC systems to the national HPC environment
• 2006: from high performance computers to high productivity computers

History of HPC development under the 863 program
• 1990: parallel computers identified as a priority topic of the 863 program; National Intelligent Computer R&D Center established
• 1993: Dawning 1, 640 MIPS, SMP
• 1995: Dawning 1000, 2.5 GFlops, MPP; Dawning company established in 1995
• 1996: Dawning 1000A, cluster system; first product-oriented system of Dawning
• 1998: Dawning 2000, 100 GFlops, cluster
• 2000: Dawning 3000, 400 GFlops, cluster
• 2002: Lenovo DeepComp 1800, 1 TFlops, cluster; first commercialized system; Lenovo entered the HPC market
• 2003: Lenovo DeepComp 6800, 5.3 TFlops, cluster
• 2004: Dawning 4000A, 11.2 TFlops
• 2008: Lenovo DeepComp 7000, 150 TFlops, heterogeneous cluster; Dawning 5000A, 230 TFlops, cluster
• 2010: Dawning 6000, 3 PFlops, heterogeneous system, CPU+GPU; TH-1A, 4.7 PFlops, heterogeneous, CPU+GPU
• 2011: Sunway BlueLight, ~1 PFlops, based on domestic processors
• 2013: TH-2, heterogeneous system with CPU+MIC

863 key projects on HPC and Grid: 2002-2010
• "High performance computer and core software"
  - 4-year project, May 2002 to Dec. 2005
  - 100 million Yuan funding from MOST
  - More than 2× associated funding from local governments, application organizations, and industry
  - Major outcome: China National Grid (CNGrid)
• "High productivity computer and Grid service environment"
  - Period: 2006-2010 (extended to now)
  - 940 million Yuan from MOST, and more than 1B Yuan matching funds from other sources

Current 863 key project
• "High productivity computer and application environment", 2011-2015 (2016)
• 1.3B Yuan investment secured
• Develop leading-level high performance computers
• Transform CNGrid into an application service environment
• Develop parallel applications in selected areas

Projects launched
• The first round of projects was launched in 2011
  - High productivity computer (1): 100PF by the end of 2015
  - HPC applications (6): fusion simulation, simulation for aircraft design, drug discovery, digital media, structural mechanics for large machinery, simulation of the electromagnetic environment
  - Parallel programming framework (1)
• The application service environment will be supported in the second round
  - Emphasis on application service support
  - Technologies for new modes of operation

HPC system development

Major challenges
• Power consumption
• Performance obtained by the applications
• Programmability
• Resilience
• Major obstacles: memory wall, power wall, I/O wall, …

Power consumption
• The limiting factor in implementing extreme scale computers
• Impossible to increase performance by expanding system scale alone
• Cooling the system is difficult and affects system reliability
• Energy cost is a heavy burden and prevents acceptance of extreme scale computers by end users

Performance obtained by applications
• Systems are installed at general purpose computing centers, serving a large population of users and supporting a wide range of applications
• Linpack is not everything
• Need to be efficient for both general-purpose and special-purpose computing
• Need to support both compute-intensive and data-intensive applications

Programmability
• Must handle concurrency/locality, heterogeneity of the system, and porting of legacy programs
• Lower the skill requirements for application developers

Resilience
• Very short MTBF for extreme scale systems, versus the need for long-time continuous operation
• The system must self-heal/recover from hardware faults and failures
• The system must detect and tolerate errors in software

Constrained design principle
• We must set strong constraints on extreme scale system implementation
  - System scale: <100,000 processors, <200 cabinets
  - Power efficiency: 5 GFlops/W in 2015, 50 GFlops/W or better before 2020
  - Cost: <300 million dollars (or <2B Yuan)
• We can only design and implement extreme scale systems within those constraints
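As a back-of-the-envelope illustration of these constraints (my arithmetic, not from the talk): the efficiency milestones are what keep a machine of the stated scale inside a fixed power envelope, since power is simply performance divided by efficiency.

```latex
% Power implied by a performance goal R and an efficiency goal E:  P = R / E
P = \frac{R}{E}, \qquad
\underbrace{\frac{10^{17}\,\mathrm{Flops}}{5 \times 10^{9}\,\mathrm{Flops/W}}}_{100\,\mathrm{PFlops},\ 2015\ \mathrm{target}} = 20\,\mathrm{MW},
\qquad
\underbrace{\frac{10^{18}\,\mathrm{Flops}}{5 \times 10^{10}\,\mathrm{Flops/W}}}_{1\,\mathrm{EFlops},\ 2020\ \mathrm{target}} = 20\,\mathrm{MW}
```

Ten times the performance at ten times the efficiency holds the envelope at roughly 20 MW, which is why the 5 GFlops/W and 50 GFlops/W milestones move in lock-step with the performance roadmap.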
How to address the challenges?
• Architectural support
• Technology innovation
• Hardware and software coordination

Architectural support
• Use the most appropriate architecture to achieve the goal, making trade-offs between performance, power consumption, programmability, resilience, and cost
• Hybrid architecture (TH-1A & TH-2)
  - General purpose + high density computing (GPU or MIC)
• HPP architecture (Dawning 6000/Loongson)
  - Enables different processors to co-exist
  - Supports a global address space
  - Multiple levels of parallelism
• Multi-conformation, multi-scale adaptive architecture (SW/BL)
  - A cluster implemented with Intel processors to support commercial software
  - A homogeneous system implemented with domestic multicore processors for compute-intensive applications
  - Supports parallelism at different levels

Classification of current major architectures
• Architectures can be classified by "homogeneity/heterogeneity" and "CPU only / CPU + accelerator"; homo-/hetero- refers to the ISA

                   CPU only                            CPU + Acc
  Homogeneous      Sequoia, K computer, Sunway/BL      Stampede, TH-2
  Heterogeneous    Dawning 6000/HPP (AMD+Loongson)     TH-1A; Dawning 6000/Nebulae; Tsubame 2.0

Comparison of different architectures

                    Power        Performance/Productivity   Programmability   Resilience
  Homo, CPU only    poor/fair    good/excellent             good/good         vary
  Heter, CPU only   poor         —                          —                 vary
  Homo, CPU+Acc     fair/good    good/excellent             fair/fair         good/poor?, vary
  Heter, CPU+Acc    good         good/excellent             fair/poor?        vary

TH-1A architecture
• Hybrid system architecture
• Sub-systems: computing, service, communication, storage, and monitoring/diagnosis
(Figure: system diagram — CPU+GPU compute nodes and operation nodes attached to the communication sub-system, with the storage sub-system built from one MDS and multiple OSS nodes.)

Dawning/Loongson HPP (Hyper Parallel Processing) architecture
• Hypernodes composed of AMD and Loongson processors
• Separation of OS processors and application processors
• Multiple interconnects
• Hardware global synchronization via HPP controllers

Sunway BlueLight architecture — main features
• SW1600 CPU: 16 cores, 975~1100 MHz, 124.8~140.8 GFlops
• Fat-tree based interconnect; QDR 4×10 Gbps high-speed serial transmission between nodes; MPI message latency of 2 μs
• SWCC (C/C++/Fortran), UPC, MPI, and scientific libraries
• Storage: 2 PB; theoretical I/O bandwidth 200 GB/s; IOR ~60 GB/s
(Figure: deployment diagram — remote users and the National Grid reach the system through Internet/intranet firewalls and VPN; login and job management nodes, cloud services, security and system services, and a console front the BlueLight computer; I/O nodes connect through the global I/O network and subnetwork/storage managers to online, nearline, and offline storage in the data center.)
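Figures such as the 2 μs MPI latency quoted above are conventionally measured with a two-rank ping-pong microbenchmark; the sketch below is a generic MPI example of that method, not code from the Sunway software stack.

```c
/* Generic MPI ping-pong microbenchmark: rank 0 and rank 1 bounce a small
 * message back and forth; one-way latency = round-trip time / 2.
 * Build: mpicc pingpong.c -o pingpong ; run: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;
    char buf[8] = {0};          /* small message, latency-dominated */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)  /* round-trip averaged over iters, halved */
        printf("one-way latency: %.2f us\n",
               (t1 - t0) / iters / 2.0 * 1e6);
    MPI_Finalize();
    return 0;
}
```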
Technology innovations
• Innovation at the device, component, and system levels
• New processor architectures: heterogeneous many-core, accelerators, reconfigurable logic
• New memory devices and new cache architectures: 3D stacking, to address the memory wall
• High performance interconnect: all-optical networks, silicon photonics
• High density system design
• Low power design

SW1600 processor features

  CPU                  SW1600
  Release time         Aug. 2010
  Processor cores      16
  Peak performance     140.8 GFlops @ 1.1 GHz
  Clock frequency      0.975~1.1 GHz
  Process generation   65 nm
  Power                35~70 W

• A general-purpose multi-core processor; power efficient, achieving 2.0 GFlops/W
• The next-generation processor is under development

FT-1500 CPU
• SPARC V9, 16 cores, 4-wide SIMD
• 40 nm, 1.8 GHz
• Performance: 144 GFlops
• Typical power: ~65 W

Heterogeneous compute node (TH-2)
• Similar ISA, different ALUs: 2 Intel Ivy Bridge CPUs + 3 Intel Xeon Phi coprocessors
• 16 registered ECC DDR3 DIMMs, 64 GB
• 3 PCI-E 3.0 slots with 16 lanes each
• PDP and communication ports; dual Gigabit LAN
• Peak performance: 3.432 TFlops
(Figure: node block diagram — two CPUs linked by QPI, PCH attached via DMI, three MICs with GDDR5 memory on 16X PCI-E, plus CPLD and IPMB for management.)

Interconnection network (TH-2)
• Fat-tree topology using 13 576-port top-level switches
• Optical-electronic hybrid transport technology
• Proprietary network protocol
• High-radix router ASIC (NRC): 90 nm feature size; die size 17.16 mm × 17.16 mm; FC-PBGA package with 2,577 pins; throughput of a single NRC: 2.56 Tbps
• Network interface ASIC (NIC): same feature size and package; die size 10.76 mm × 10.76 mm; 675 pins; PCI-E G2 16X

High density system design (SW/BL)
• Computing node: basic element, one multi/many-core processor + memory
• Node complex: high density assembly, 2 computing nodes + network interface
• Supernode: 256 nodes (processors) with a tightly coupled interconnect
• Cabinet: 1,024 computing nodes (4 supernodes)
• System: multiple cabinets

Low power design
• Low power design at different levels: low power management, low power processors, low power interconnect, high-efficiency cooling, high-efficiency power supply
• Fine-grain real-time power consumption monitoring, system status sensing, multi-layer power consumption control
• Low power programming
  - Power tools as default system tools, like debugging and tuning?
  - Code power consumption modeling
  - Sampling code power consumption in the same way as code performance
  - Feedback to programming

Power supply (SW/BL)
• Two 10 kV AC inputs with dual-path switching and high-voltage phase-shifting transformers; controlled rectification to a DC 300 V distribution with DC UPS; board-level DC/DC conversion to 12 V ("N+1" redundant), then 12-phase conversion down to 0.9 V (300 W) at the many-core processors; "N+1" hot backup for core components
• Conversion efficiency: 77%
• Highly reliable, with power monitoring built in
(Figure: power-chain schematic from the 10 kV substation through 500 kVA UPS units, dual-source transfer switches, busbars, and TDK-Lambda DC/DC modules to the cabinets and peripherals.)
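To make the "sample code power consumption the way code performance is sampled" idea concrete, here is a minimal sketch that attributes energy to one code region. It reads Linux's RAPL powercap counter as a stand-in sensor — an assumption for illustration only, since the proprietary monitoring interfaces of these machines are not public.

```c
/* Sketch: attribute energy to a code region by reading an energy counter
 * before and after it -- the power analogue of timing a region.
 * The sysfs path below is Linux's Intel RAPL interface, used here only as
 * a stand-in for a machine's own fine-grain power monitor (assumption).
 * Reading it may require root; counter wrap-around is ignored for brevity. */
#include <stdio.h>

static long long read_energy_uj(void)
{
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    long long uj = -1;
    if (f) {
        if (fscanf(f, "%lld", &uj) != 1) uj = -1;
        fclose(f);
    }
    return uj;   /* package energy in microjoules, or -1 on failure */
}

static void kernel_under_test(void)
{
    volatile double x = 0.0;                   /* stand-in workload */
    for (long i = 0; i < 100000000L; i++) x += 1e-9;
}

int main(void)
{
    long long e0 = read_energy_uj();
    kernel_under_test();
    long long e1 = read_energy_uj();
    if (e0 >= 0 && e1 >= e0)
        printf("region consumed %.3f J\n", (e1 - e0) / 1e6);
    else
        printf("energy counter unavailable\n");
    return 0;
}
```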
Efficient cooling (TH-2)
• Close-coupled chilled water cooling with a customized liquid cooling unit (LCU)
• High cooling capacity: 80 kW
• Uses the city cooling system to supply cooling water to the LCUs

Efficient cooling (SW/BL)
• Water cooling to the board (node complex)
• Energy-saving and environment-friendly; allows a high room temperature; low noise

HW/SW coordination
• Use a combination of hardware and software technologies to address the technical issues, achieving performance while maintaining flexibility
• Compilation support, parallel programming framework, performance tools
• HW/SW coordinated reliability measures: user-level checkpointing, redundancy-based reliability measures

Software stack of TH-2: compiler for many-core
• Supports C, C++, Fortran, and SIMD extensions
• Features: libc for the computing kernels; storage hierarchy support; a programming model for many-core acceleration
• Optimizations: collaborative cache data prefetch, instruction prefetch, and combined static/dynamic instruction scheduling
(Figure: heterogeneous unified base compiler — C/C++/Fortran front ends; conventional, interprocedural, loop-nest, and global optimizations; many-core thread scheduling optimization; memory access, SIMD, multi-level register allocation, data-access instruction proxy, and hot-function reordering/padding optimizations; lightweight dynamic local-store allocation; intermediate-representation conversion and code generation with separate machine descriptions and assembly generators for control cores and compute cores; assembler, linker, disassembler; accelerated-thread libraries covering thread creation/recycling, scheduling/control, interrupt/exception management, asynchrony/mask support, data transfer, task execution, and a heterogeneous program loader, in pure-control-core, heterogeneous-hybrid, and pure-compute-core modes.)

Basic math library for many-core
• A basic math library matched to the many-core structure
• Technical features: standard function-call interface; customized optimization; accuracy analysis support; IEEE 754 compliance
• Components: basic function library, SIMD-extended function library, Fortran function library
(Figure: library structure — trigonometric, hyperbolic, exponential, logarithmic, numeric, classification, Bessel, and error functions; floating-point exception control; basic algorithms; accuracy control; performance optimization; SIMD algorithms; an ISO C99 math-function interface specification.)

Parallel OS
• Unified architecture for heterogeneous many-cores
• Low overhead virtualization
• Highly efficient resource management

Parallel application development platform
• Covers program development, testing, tuning, parallelization, and code translation
• Collaborative tuning framework
• Tools for parallelism analysis and parallelization
• Integrated translation tools for multiple source codes
(Figure: collaborative development and tuning framework — project/file/template/configuration/compilation/execution/environment management, editor and help system; parallel debugging for multiple programming models in static and dynamic modes; integrated tuning across algorithm, parallel-language, and base-language levels with automatic SIMD vectorization and performance monitoring; parallelism recognition and auto-parallelization; binary translation; application service middleware with compile/execute, debugging, and tuning services, job execution management, SWGDB, policy/iterative/joint/parameterized optimization, data collection, and a microarchitecture-level command environment.)

Parallel programming framework
• Hides the complexity of programming millions of cores
• Integrates efficient implementations of fast parallel algorithms
• Provides efficient data structures and solver libraries
• Supports software-engineering concepts for code extensibility
(Figure: layered stack — applications such as materials, climate, and nuclear energy on top; HPC application infrastructure middleware in the middle, bridging the "program wall" (think parallel, write sequential); peta-scale to 100P-flops supercomputers at the bottom.)

Infrastructure: four types of computing
• Structured mesh: JASMIN (J Adaptive Structured Meshes applications INfrastructure), a parallel adaptive structured-mesh support framework
• Finite element: PHG (Parallel Hierarchical Grid), a parallel adaptive finite-element computing platform
• Unstructured mesh: JAUMIN (J Adaptive Unstructured Meshes applications INfrastructure), a parallel adaptive unstructured-mesh support framework
• Combinatory geometry: JCOGIN (J mesh-free COmbinatory Geometry INfrastructure), a parallel 3D mesh-free combinatorial-geometry support framework
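To show the flavor of such a framework — serial per-patch numerics kept apart from the parallel plumbing — here is a small sketch against a hypothetical interface. All names are invented for illustration; this is not the actual JASMIN (or PHG/JAUMIN/JCOGIN) API.

```c
/* Hypothetical structured-mesh framework interface (names invented).
 * The application writer supplies only a serial per-patch kernel; the
 * framework owns domain decomposition, halo exchange, and scheduling. */
#include <stdlib.h>

typedef struct {
    int     nx, ny;     /* interior cells of this patch               */
    int     halo;       /* ghost-cell width the framework keeps fresh */
    double *u, *u_new;  /* fields of size (nx+2*halo)*(ny+2*halo)     */
} Patch;

/* Serial numerics: one 5-point Jacobi sweep over a single patch.
 * No MPI and no threads here -- that is the point of the framework. */
static void jacobi_sweep(Patch *p)
{
    int s = p->nx + 2 * p->halo;                     /* row stride */
    for (int j = p->halo; j < p->ny + p->halo; j++)
        for (int i = p->halo; i < p->nx + p->halo; i++) {
            int c = j * s + i;
            p->u_new[c] = 0.25 * (p->u[c - 1] + p->u[c + 1] +
                                  p->u[c - s] + p->u[c + s]);
        }
}

/* Each time step the framework itself would run something like:
 *   fw_exchange_halos(level);               -- MPI hidden inside
 *   fw_foreach_patch(level, jacobi_sweep);  -- threads/accelerators
 *   fw_swap_fields(level, "u", "u_new");
 * (hypothetical entry points).  One local patch stands in below.   */
int main(void)
{
    Patch p = { .nx = 64, .ny = 64, .halo = 1 };
    size_t n = (size_t)(p.nx + 2 * p.halo) * (p.ny + 2 * p.halo);
    p.u     = calloc(n, sizeof *p.u);
    p.u_new = calloc(n, sizeof *p.u_new);
    jacobi_sweep(&p);
    free(p.u);
    free(p.u_new);
    return 0;
}
```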
Reliability design
• High-quality components and strict screening tests
• Water cooling to prolong component lifetime
• High density assembly: shorter wires improve data transfer reliability
• Multiple error correction codes to deal with transient errors
• Redundant design for memory, computing nodes, networks, I/O, power supply, and water cooling

Hardware monitoring (SW/BL)
• The basis for the reliability, availability, and maintainability of the system
• Monitors the major components
• Supports maintenance and diagnosis
• Dedicated management network
(Figure: monitoring architecture — maintenance service cards with BMC, ARM, and FPGA controllers in each compute supernode and interconnect plug-in, Ethernet switch modules, a monitoring switch and 10-Gigabit main switch, system consoles, and emergency and environment monitoring.)

High availability (SW/BL)
• SW/HW coordinated multi-level fault-tolerant architecture
• Local fault suppression, fault isolation, replacement of faulty components, fault recovery
(Figure: fault-tolerance stack — application-layer controlled fault tolerance with passive measures (checkpoint/restore, job rollback, degraded restart) and active measures (service repair, dual-machine takeover, active migration, RAC); a control layer with fault-tolerance master control, a plug-in environment, and a system information base holding fault data, early-warning information, and fault-tolerance policies; base support with system maintenance, heartbeat detection, and RAS; the hardware layer of nodes and networks.)
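The user-level checkpointing and job-rollback measures above amount to the application periodically saving its own state and resuming from the last save after a failure. A minimal sketch with MPI follows; the file layout, interval, and stand-in computation are invented for illustration.

```c
/* Minimal application-level checkpoint/restart sketch (illustrative).
 * Each rank writes its state every CKPT_EVERY steps; on startup it
 * resumes from its last checkpoint file if one exists.
 * Build: mpicc ckpt.c -o ckpt ; run: mpirun -np 4 ./ckpt */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N          (1 << 20)
#define CKPT_EVERY 100

static void checkpoint(int rank, int next_step, const double *state)
{
    char path[64];
    snprintf(path, sizeof path, "ckpt_rank%04d.bin", rank);
    FILE *f = fopen(path, "wb");
    if (!f) return;                       /* sketch: no error recovery */
    fwrite(&next_step, sizeof next_step, 1, f);
    fwrite(state, sizeof(double), N, f);
    fclose(f);
}

static int try_restart(int rank, double *state)
{
    char path[64];
    int next_step = 0;
    snprintf(path, sizeof path, "ckpt_rank%04d.bin", rank);
    FILE *f = fopen(path, "rb");
    if (!f) return 0;                     /* no checkpoint: cold start */
    if (fread(&next_step, sizeof next_step, 1, f) != 1 ||
        fread(state, sizeof(double), N, f) != (size_t)N)
        next_step = 0;
    fclose(f);
    return next_step;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *state = calloc(N, sizeof *state);
    int step = try_restart(rank, state);  /* resume point, or 0 */

    for (; step < 1000; step++) {
        for (long i = 0; i < N; i++)      /* stand-in computation */
            state[i] += 1.0;
        if ((step + 1) % CKPT_EVERY == 0) {
            MPI_Barrier(MPI_COMM_WORLD);  /* consistent cut across ranks */
            checkpoint(rank, step + 1, state);
        }
    }

    free(state);
    MPI_Finalize();
    return 0;
}
```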
Delivered system: TH-1A
• Tianhe: "Galaxy" in Chinese
• Hybrid architecture: CPU & GPU
• Peak performance 4.7 PFlops; Linpack 2.57 PFlops; power consumption 4.04 MW

  Item          Configuration
  Processors    14,336 Xeon CPUs + 7,168 NVIDIA GPUs + 2,048 FT CPUs
  Memory        262 TB in total
  Interconnect  Proprietary high-speed interconnection network
  Storage       Global shared parallel storage system, 2 PB
  Racks         120 compute racks + 14 storage racks + 6 communication racks

Delivered system: Dawning 6000
• Hybrid system
• Service unit (Nebulae): 3 PFlops peak performance; 1.27 PFlops Linpack performance; 2.6 MW
• Computing unit: an experiment in using the Loongson processor

Delivered system: Sunway BlueLight
• Installed in September 2011 at the National Supercomputing Center in Jinan
• Implemented entirely with the domestic 16-core ShenWei processor SW1600; 8,704 ShenWei processors in total
• Peak performance: 1.07 PFlops (with 8,196 processors)
• Linpack performance: 796 TFlops (with 8,196 processors)
• Power consumption: 1,074 kW (during Linpack execution)

Delivered system: TH-2
• Hybrid architecture: Xeon CPU & Xeon Phi

  Item          Configuration
  Processors    32,000 Intel Xeon CPUs + 48,000 Xeon Phis + 4,096 FT CPUs
  Performance   Peak 54.9 PFlops; HPL 33.86 PFlops
  Interconnect  Proprietary high-speed interconnection network TH Express-2
  Memory        1.4 PB in total
  Storage       Global shared parallel storage system, 12.4 PB
  Cabinets      125 compute + 13 communication + 24 storage = 162 cabinets
  Power         17.8 MW (1,902 MFlops/W)
  Cooling       Closed air cooling system

Application service environment

China National Grid (CNGrid)
• 14 sites: SCCAS (Beijing, major site), SSC (Shanghai, major site), NSC-TJ (Tianjin), NSC-SZ (Shenzhen), NSC-JN (Jinan), Tsinghua University (Beijing), IAPCM (Beijing), USTC (Hefei), XJTU (Xi'an), SIAT (Shenzhen), HKU (Hong Kong), SDU (Jinan), HUST (Wuhan), GSCC (Lanzhou)
• The CNGrid Operation Center (based at SCCAS)

CNGrid site resources (CPU/GPU, storage)

  Site     CPU/GPU           Storage
  SCCAS    157 TF / 300 TF   1.4 PB
  SSC      200 TF            600 TB
  NSC-TJ   1 PF / 3.7 PF     2 PB
  NSC-SZ   716 TF / 1.3 PF   9.2 PB
  NSC-JN   1.1 PF            2 PB
  THU      104 TF / 64 TF    1 PB
  IAPCM    40 TF             80 TB
  USTC     10 TF             50 TB
  XJTU     5 TF              50 TB
  SIAT     30 TF / 200 TF    1 PB
  HKU      23 TF / 7.7 TF    130 TB
  SDU      10 TF             50 TB
  HUST     3 TF              22 TB
  GSCC     13 TF / 28 TF     40 TB

CNGrid GOS architecture
• Front ends: Grid portal, Gsh and command-line tools, GSML workshop/composer/browser, IDE, debugger, compiler, HPCG application and management portals, VegaSSH, system management portal, and other domain-specific applications
• Core, system, and application level services: GOS library (batch, message, file, etc.); GOS system calls (resource, Agora, user, and grip management); HPCG backend; Axis handlers for message-level security; CA service; metainfo, file, batch-job, and account management; meta-scheduling; message and dynamic deploy services; grip, DataGrid, grid workflow, and DB services with a workflow engine
• Agora: security, resource space, resource access control and sharing, user/Agora/resource management
• Grip: instance management, naming, grip runtime, service controllers
• Hosting environment: Tomcat (5.0.28) + Axis (1.2 rc2) on J2SE (1.4.2_07, 1.5.0_07), plus GT4, gLite, OMII, and other third-party software and tools, on Linux/Unix/Windows grid servers

CNGrid GOS deployment
• CNGrid GOS is deployed at 11 sites and on some application grids
• Supports heterogeneous HPCs: Galaxy, Dawning, DeepComp
• Supports multiple platforms: Unix, Linux, Windows
• Uses public network connections, enabling only the HTTP port
• Flexible clients: web browser, special client, GSML client

CNGrid: resources
• 14 sites; >3 PF aggregated computing power; >15 PB storage

CNGrid: services and users
• >450 services; >2,800 users
• China Commercial Aircraft Corp., Bao Steel, automobile industry, institutes of CAS, universities, …

CNGrid: applications
• Supporting >700 projects: 973, 863, NSFC, CAS Innovative, and engineering projects

Application villages
• Support domain applications: industrial product design optimization, new drug discovery, digital media
• Introduce cloud computing concepts: CNGrid as IaaS and partially PaaS; application villages as SaaS and partially PaaS
• Build up business models for HPC applications

Applications

CNGrid applications: grid applications
• Drug discovery
• Weather forecasting
• Scientific Data Grid and its application in research
• Water resource information system
• Grid-enabled railway freight information system
• Grid for Chinese medicine database applications
• HPC and Grid for the aerospace industry (AviGrid)
• National forestry project planning, monitoring and evaluation

HPC applications
• Computational chemistry
• Computational astronomy
• Parallel programs for large fluid machinery design
• Fusion ignition simulation
• Parallel algorithms for bio- and pharmacy applications
• Parallel algorithms for weather forecasting based on GRAPES
• 10,000+ core scale simulation for aircraft design
• Seismic imaging for oil exploration
• Parallel algorithm libraries for PetaFlops systems

China's status in the related fields
• Significant progress in developing HPC systems and the HPC service environment
• Lack of long-term strategic study and planning; still far behind in many aspects
• Lack of kernel technologies: processors, memory, interconnect, system software, algorithms, …
• Especially weak in applications: multi-disciplinary research is needed, but there is a shortage of cross-disciplinary talent
• Sustainable development is crucial: there is no regular budget for e-Infrastructure, which must always compete for funding with other disciplines

Pursuing international cooperation
• We wish to cooperate with the international HPC communities
• Joint work on grand challenge problems: climate change, new energy, environment protection, disaster mitigation
• Jointly address the challenges of extreme scale systems: low power system design and implementation, performance obtained by applications, heterogeneous system programming, resilience of large systems

Thank you!