CSCS cracks extreme software challenges with Allinea DDT on Europe’s #1 Supercomputer

CASE STUDY

CSCS partnership with Allinea Software helps Switzerland’s scientists exploit massive supercomputer

CLIENT
Swiss National Supercomputing Centre (CSCS)

MISSION
To achieve world-leading results for science and society from Europe’s most powerful supercomputer

CHALLENGE
Developing and extending software to exploit leadership system capabilities

RESULTS
Debugging complex software faster is saving time and enabling larger and more accurate scientific simulations to be run

SUMMARY QUOTE
“Saving time on resolving software problems is key. We want to help people focus on their core activity – whether their science, or improvements on their existing codes. Allinea DDT helps us to do that,” says Jean-Guillaume Piccinali, Scientific Computing Support Specialist.

Founded in 1991, CSCS is the pride of Switzerland’s supercomputing community. The Centre, based in Lugano, is widely used by Swiss and international researchers and industry. Its flagship supercomputer, Piz Daint, is a 5272-node Cray XC30 with Intel Sandy Bridge processors and NVIDIA K20X GPUs. Piz Daint has a theoretical peak performance of 7.8 petaflops and has been Europe’s most powerful system for almost 18 months.

This extreme-scale computing resource runs software from a wide variety of disciplines, including climate science, geoscience, chemistry, physics, biology and materials research. Many of the software codes are customized or written by the scientists who use Piz Daint, to take advantage of the GPU accelerators or the scale of the system. CSCS is determined to offer users the right environment to accelerate research and deliver first-class science.

CSCS relies on a variety of software stacks: OpenACC, CUDA and the Cray Programming Environment, which contains various compilers, tools and libraries.
Regardless of the environment chosen by the user, the application must remain consistent in terms of scientific results.

TRAINING COMPUTATIONAL SCIENTISTS FOR THE FUTURE

The team at CSCS recognized that reliable, bulletproof tools were needed to solve potential migration problems, such as bugs found or created when porting to GPUs, and to ensure that the applications are reliable.

To tackle software bugs, they chose Allinea DDT, the debugger from Allinea Software, whose reputation and capability matched their needs. Extremely scalable, Allinea DDT lets users debug applications that use MPI and OpenMP, as well as the two key programming models for the NVIDIA GPUs in Piz Daint: CUDA and OpenACC.

CSCS has been promoting the tool to users, to ensure that it is used, and used well. “Tools achieve great things if we expose their value to the community, so that they are actively used when the need arises,” says Jean-Guillaume Piccinali, Scientific Computing Support Specialist.

DELIVERING REAL SCIENCE - ROBUSTLY

Piccinali gives an example in which what could have become a long, painful process was resolved quickly with Allinea DDT.

RAMSES is an open-source code used to model astrophysical systems, featuring self-gravitating, magnetized, compressible, radiative fluid flows. It is based on the Adaptive Mesh Refinement (AMR) technique and makes intensive use of the GPUs to accelerate the computation.

“When we ran RAMSES on Piz Daint, the execution would abort over and over again. We were stumped. First, we tried to use diagnostic runtime settings to get more information about accelerator activity. This simply gave more perplexing messages.
We could see every accelerator action on every node, but that large volume of output was difficult to interpret and did not help us to find the root cause of the bug.”

“Our user relinked the RAMSES code with Allinea DDT’s memory debugging library … and found a problem inside the solver routine when reaching an accelerated region of the code – and we simply needed to replace one line!”

“It turned out that a diagnostic routine that compares GPU and CPU data was using an incorrect parameter for an array boundary in a loop. This resulted in memory on the GPU being overwritten, which led to kernel launch errors for subsequent data transfers between the CPU and the GPU. As there is no solution to execute runtime checks for array bounds on the GPU, Allinea DDT was the only easy way to find the problem.”

PARTNERING TO SOLVE CHALLENGES IN A CHANGING WORLD FASTER

Working with Allinea Software allows CSCS to provide a first-class system for users, with a wide range of tested and validated environments that help users work productively while focusing on their science.

Allinea Software also provides CSCS with early access to pre-release versions of Allinea DDT, which CSCS uses to prepare its team and validate the tools against the latest system revisions before they reach users.

“Another case that showed the real value of the close technical relationship involved a user who was having trouble examining a crucial variable that was incorrectly set in a code,” he says. “Allinea DDT confirmed the value was wrong and allowed us to pin down the routine where the variable was modified, but then we could find no further explanation as to why this was the case.”

“We needed to see more information, so we asked Allinea Software’s support team. They identified that the compiler was generating incorrect debug information, and that was preventing us from going further.”

“We could then contact the compiler team with the crucial information about the debug data.
They released a new version of their compiler, which fixed the problem, and then we could progress with the code. This was a good example of the need for our close working relationship with Allinea Software and our system vendor,” he says.

Email: sales@allinea.com | Web: www.allinea.com
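
ILLUSTRATION: AN ARRAY-BOUNDARY BUG AND A GUARD CHECK

The RAMSES problem described above, a loop that used the wrong parameter as an array boundary and overwrote neighbouring GPU memory, can be sketched in a few lines. This is a simplified illustration only, not RAMSES code and not how Allinea DDT is implemented. The flat list stands in for device memory, and every name here (GUARD, allocate, fill_scratch, run_diagnostic) is hypothetical. Guard cells placed after an array are one common memory-debugging technique for catching this class of overrun.

```python
GUARD = 0xDEADBEEF   # hypothetical sentinel value written into guard cells
GUARD_CELLS = 2      # number of guard cells placed after each array


def allocate(pool, offset, n):
    """Reserve n cells starting at pool[offset], followed by guard cells."""
    for i in range(n):
        pool[offset + i] = 0.0
    for i in range(GUARD_CELLS):
        pool[offset + n + i] = GUARD


def guards_intact(pool, offset, n):
    """Return True if no write has run past the end of the n-cell array."""
    return all(pool[offset + n + i] == GUARD for i in range(GUARD_CELLS))


def fill_scratch(pool, offset, host_data, bound):
    """Diagnostic step: copy host values into the scratch array.

    `bound` is the loop limit. Passing a limit larger than the
    array's real size is the bug being illustrated.
    """
    for i in range(bound):
        pool[offset + i] = host_data[i]


def run_diagnostic(bound):
    """Run the copy with the given loop bound and report guard status."""
    n = 4                                  # real size of the scratch array
    pool = [0.0] * 16                      # stand-in for a block of device memory
    host = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # host array is longer than the scratch array
    allocate(pool, 0, n)
    fill_scratch(pool, 0, host, bound)
    return guards_intact(pool, 0, n)


print("correct bound, guards intact:", run_diagnostic(4))
print("wrong bound, guards intact:  ", run_diagnostic(6))
```

With the correct bound, the guard cells survive. With the larger bound, the loop silently writes past the array and the check reports the damage at its source, rather than leaving a later, unrelated operation to fail.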