Networked exascale Supercomputing
ASREN and the HPC community can make it happen
Yves Poppe, A*STAR Computational Resource Centre, Singapore
ASREN, December 10-11, 2014, Muscat, Oman

Please, maximize my effective throughput
Connect HPC resources at Fusionopolis with the storage and genomics pipeline in the Biopolis Matrix Building
Pg 2

Pg 3

Tests started with the Mellanox MetroX early 2013 and were followed up with trials using the Obsidian Strategics Longbow C400. Today the sites are connected with 2x40gbps connections running native InfiniBand and reaching approximately 98.4% of the maximum theoretical throughput.
Pg 4

The big picture: NSCC, Singapore's National Supercomputing Centre
• Joint A*STAR, NUS, NTU, SUTD and NRF project; RFS Q3 2015
• NSCC
  – Calls for a new 1-2+ PetaFLOP supercomputer
  – Recurrent investment every 3 to 5 years
  – Pooling of high-tier compute resources at A*STAR and the IHLs
  – Co-investment from primary stakeholders
• Science, Technology and Research Network (STAR-N)
  – High bandwidth network to connect distributed compute resources
  – Provides high speed access to users, both public and private, anywhere
  – Supports transfer of large data sets both locally and internationally
Pg 5

The quest to maximize effective throughput
TCP/IP's curse: CPU overhead
Source: IBTA, the InfiniBand Trade Association
Pg 6

InfiniBand's magic potion: RDMA
Source: IBTA, the InfiniBand Trade Association
Pg 7

The undeniable virtues of RDMA
47% system CPU overhead and idle time in a TCP/IP environment versus 12% in an RDMA environment.
In other words, 88% CPU efficiency in user space with RDMA versus 53% with TCP/IP.
Source: Mellanox
Pg 8

HPC's road to InfiniBand
• 1999: Intel, IBM, Sun, HP, Microsoft, Compaq and Dell agree on the original InfiniBand standard to solve the looming bottleneck of PCI (Peripheral Component Interconnect).
• 2003: Virginia Tech builds an InfiniBand cluster ranked number three on the Top500 at the time.
• IB becomes increasingly popular for cluster interconnects as it beats Ethernet on both price and latency.
• November 2014: 225 of the Top500 systems use InfiniBand, up 8.7% year on year.
• The Ethernet camp tries to counter with RoCE (RDMA over Converged Ethernet) and now RoCEv2 for the data centre space.
Pg 9

HPC's choice: InfiniBand link layer
Pg 10

The Ultimate InfiniBand Jailbreak
• HPC and InfiniBand were suffocating within the data centre walls.
• Range extenders like the Mellanox MetroX gave InfiniBand, and consequently HPC and the data centres themselves, more breathing room and ways to expand at metro level.
• Obsidian Strategics took the next step: it took the data centre walls away completely. InfiniBand connections can cross continents and circle the globe.
• The ultimate step: BGFC makes InfiniBand routeable and opens the possibility to permeate the globe, giving rise to an Infininet.
Internet gave us classrooms without walls. Infininet will give us supercomputing without walls.
Pg 11

…proved itself in a spectacular way
Pg 12

Pg 13

Galaxy14 Network Topology, SC14
100gbps linking A*STAR in Singapore to the A*STAR booth at SC14 in New Orleans via SingAREN, the Tata Communications transpacific cables TGN-IA and TGN-P to Seattle, CenturyLink to New Orleans and SCinet on the SC14 conference grounds.
Pg 14
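A quick aside on why a path like this is hard to fill end to end: the sketch below is my own illustration, not material from the slides, and the ~200 ms Singapore to New Orleans round-trip time is an assumed figure. It estimates the bandwidth-delay product of a 100gbps link, i.e. how much data any transport must keep in flight to sustain line rate; deep buffering of that order is what range extenders like the Longbow supply so that InfiniBand's credit-based flow control keeps working over such distances.

```python
# Bandwidth-delay product of a long-haul 100gbps path (illustrative figures).
# The ~200 ms round-trip time is an assumed value for Singapore <-> New Orleans,
# not a measurement from the SC14 demo.

link_bps = 100e9        # line rate of the demo link, bits per second
rtt_s = 0.200           # assumed round-trip time over the trans-Pacific route

bdp_bits = link_bps * rtt_s          # data that must be in flight to fill the pipe
bdp_gbytes = bdp_bits / 8 / 1e9

print(f"bandwidth-delay product: {bdp_bits:.2e} bits (~{bdp_gbytes:.1f} GB in flight)")

# A default-tuned TCP stream keeps only a few megabytes in flight, so its
# effective throughput collapses over this distance; InfiniBand flow control
# needs comparable buffering, which is what long-haul range extenders provide.
```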
A*STAR's vision: InfiniCortex, a Supercomputer of Supercomputers
Professor Tan Tin Wee and Dr. Marek Michalewicz proposed to demonstrate something totally new, never done before: very high speed transcontinental transmission of native long-distance InfiniBand between High Performance Computing (HPC) centres continents apart, having them operate as one, tackling the biggest computational challenges and opening a possible avenue to exascale supercomputing, where the most vexing problem is power and heat generation.
This is not cloud computing, this is not Grid.
Pg 15

The four elements that made this possible
• Very high speed transmission as made possible by ACA-100
  – Asia Connects America at 100gbps, a challenge issued by Yves Poppe, then at Tata Communications, at APAN 37 in Bandung, Indonesia.
• InfiniBand over trans-Pacific distances
  – Made possible with the Obsidian Strategics InfiniBand range extenders.
• Galaxy of Supercomputers
  – Supercomputer interconnect topology and graph theory work by Y. Deng, M. Michalewicz and L. Orlowski.
  – InfiniBand subnetting using the BGFC protocol and the new Obsidian Crossbow InfiniBand router.
• Application layer
  – File transfer optimization, from the development of Dsync+ for simple file transfers all the way to complex workflows with ADIOS (the Adaptable I/O System), developed by Dr. Scott Klasky and his team at Oak Ridge National Laboratory.
Pg 16

InfiniBand range extender and router: the Longbow and Crossbow devices
Developed by Obsidian Strategics, based in Edmonton, Canada.
Crossbow plus Longbows give rise to the Galaxy and open the door to an Infininet.
Pg 17

Pg 18

Galaxy of Supercomputers
• Supercomputers located at different geolocations are connected into a Super-Graph, or 'Nodes of a Super-Network'.
• The supercomputers may have arbitrary interconnect topologies.
• The Galaxy is based on a topology with a small diameter and the lowest possible number of links.
• In terms of graph representation, it is an embedding of the graphs representing the supercomputers' topologies into a graph representing the Galaxy topology (see the sketch below).
Pg 19

Pg 20
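A minimal sketch of the graph picture above, assuming networkx is available; the site names, gateway choices and internal topologies are invented for illustration and are not taken from the InfiniCortex work. Each supercomputer keeps its own interconnect graph, and a small-diameter super-graph of long-distance links joins one gateway node per site.

```python
# Illustrative "Galaxy of Supercomputers" graph, built with networkx.
# Site names, gateways and internal topologies are invented for the example;
# the point is only that a small-diameter super-graph of long-distance links
# joins otherwise arbitrary machine topologies through gateway nodes.
import networkx as nx

def make_site(name, topology):
    """Prefix every node of a machine's interconnect graph with its site name."""
    return nx.relabel_nodes(topology, {n: f"{name}:{n}" for n in topology.nodes})

sites = {
    "SG":   make_site("SG",   nx.hypercube_graph(3)),    # 3-D hypercube
    "ORNL": make_site("ORNL", nx.cycle_graph(6)),         # ring
    "NCI":  make_site("NCI",  nx.complete_graph(4)),      # fully connected cluster
}
galaxy = nx.compose_all(sites.values())

# Super-graph: one gateway node per site, with the long-distance links forming
# a complete graph (diameter 1) over the three sites.
gateways = {"SG": "SG:(0, 0, 0)", "ORNL": "ORNL:0", "NCI": "NCI:0"}
for a, b in [("SG", "ORNL"), ("SG", "NCI"), ("ORNL", "NCI")]:
    galaxy.add_edge(gateways[a], gateways[b], long_distance=True)

print("nodes:", galaxy.number_of_nodes(), "edges:", galaxy.number_of_edges())
print("galaxy diameter in hops:", nx.diameter(galaxy))
```

Swapping in real machine topologies and then minimising the diameter and the number of long-distance links of the composed graph is essentially the optimisation problem the Galaxy topology work addresses.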
Making the vision a reality
• Testing within Singapore completed using dark fibre between two A*CRC sites and also with the National University of Singapore using SingAREN's new SLIX over 80km. Convincing results led us to deploy two 40gbps InfiniBand connections between our Biopolis and Fusionopolis sites.
• InfiniBand over Ethernet testing with the Tokyo Institute of Technology's TSUBAME-KFC successfully completed using SingAREN, APAN and JGN-X.
• InfiniBand over IP testing completed with NCI (the National Computational Infrastructure) at the Australian National University in Canberra using existing SingAREN, APAN and AARNet infrastructure.
• A 10gbps dedicated link between Singapore and the USA for layer 2 'native' InfiniBand testing with ORNL and others, starting end October.
• Rather spectacular results of the 100gbps trial and demos between Singapore and New Orleans at SC14.
Pg 21

Proving the point
Pg 22

Extract from the presentation prepared by Jakub Chrzeszczyk and Andrew Howard, NCI, Australia
Pg 23

Pg 24

Pg 25

Pg 26

Long Distance InfiniBand: a potential R&E networking game changer
So far, the global HPC needs have been presented at TERENA, APAN and GLIF and resonate with the visions of the global R&E networking community. Adoption of native InfiniBand as a commonly used layer 2 transmission protocol would give NRENs a rare opportunity to regain the lead in innovation and clearly differentiate themselves from commercial networks.
The HPC community is faced with continuing exponential growth of the data it generates, and current NREN internetworking capacity is already insufficient considering only the needs of genomics data interchange. To reach exascale computing, a distributed approach is probably required, if only to cope with power requirements and disaster recovery.
Pg 27

The HPC community's call to ASREN
Let us lead the world in building an Infininet! Let us lead the world towards exascale computing.
HPCs need NRENs and NRENs need HPCs:
  – The majority of global R&E traffic originates from the HPC community.
  – Supercomputing is essential to economic development in all advanced industrial sectors as well as to academic research and education.
  – The HPC community constitutes by far the most demanding constituency globally, as it continues to push relentlessly the bandwidth and switching capacity envelopes on all scales, for the simple reason that the incredible 'big data' growth, with its associated hunger for computing power, storage, electrical power and cooling, will continue unabated.
  – Reaching the exaflop scale in supercomputing will very likely require a distributed approach to be sustainable, and will include the InfiniCortex concept.
Pg 28

Circle the globe at 100gbps with ACE-100?
• Prof. Tan Tin Wee, Chairman of the A*STAR Computational Resource Centre, pointed out that with ANA-100 now a reality and ACA-100 coming, the only missing piece to circle the globe would be ACE-100: Asia Connects Europe.
• I had a vision of bits racing around the world, 100,000,000,000 of them every second, 100gbps, as fast as light can travel through fibre, transmitting a continuous stream of copies of Jules Verne's 'Around the World in Eighty Days'.
SC15: the Phileas Fogg challenge
Pg 29

We hope to see you in Singapore at Supercomputing Frontiers 2015, organised by the A*STAR Computational Resource Centre (A*CRC): an international conference on supercomputing, exascale and beyond in Singapore and Asia.
March 17-20, 2015, Singapore
http://supercomputingfrontiers2015.com/?page_id=23033
Pg 30

Thank You
"Creativity requires the courage to let go of certainties." – Erich Fromm
Pg 31
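Backup note on the Phileas Fogg challenge (Pg 29): a back-of-the-envelope calculation of my own, in which the round-the-world fibre path, the speed of light in fibre and the size of the novel are all assumed round figures rather than numbers taken from the deck.

```python
# Back-of-the-envelope numbers for the Phileas Fogg challenge.
# All inputs are assumed round figures, not values taken from the slides.

fibre_path_km = 40_000          # assumed round-the-world fibre path length
light_in_fibre_km_s = 200_000   # roughly 2/3 of c in silica fibre
link_bps = 100e9                # an ACE-100 / ANA-100 / ACA-100 class link

lap_time_s = fibre_path_km / light_in_fibre_km_s      # one lap of the globe
bits_in_flight = link_bps * lap_time_s                # data on the wire per lap

book_bits = 400_000 * 8         # plain-text novel, assumed to be ~400 KB
print(f"one lap of the globe in fibre: ~{lap_time_s * 1000:.0f} ms")
print(f"data in flight around the globe at 100gbps: ~{bits_in_flight / 8e9:.1f} GB")
print(f"copies of the novel streamed per second: ~{link_bps / book_bits:,.0f}")
```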