CSCS cracks extreme software challenges
with Allinea DDT on Europe’s #1 Supercomputer
CASE STUDY
CSCS partnership with Allinea Software helps
Switzerland’s scientists exploit massive supercomputer
CLIENT
Swiss National Supercomputing Centre
(CSCS)
MISSION
To achieve world-leading results for
science and society from Europe’s most
powerful supercomputer
CHALLENGE
Developing and extending software to
exploit leadership system capabilities
RESULTS
Debugging complex software faster is
saving time and enabling larger and more
accurate scientific simulations to be run
SUMMARY QUOTE
“Saving time on resolving software
problems is key. We want to help people
focus on their core activity – whether their
science, or improvements on their existing
codes. Allinea DDT helps us to do that,”
says Jean-Guillaume Piccinali, Scientific
Computing Support Specialist.
Founded in 1991, CSCS is the pride
of Switzerland’s supercomputing
community. The Centre, based in
Lugano, is widely used by Swiss and
international researchers and industry.
Its flagship supercomputer, Piz Daint, is
a 5272-node Cray XC30 supercomputer
with Intel Sandy Bridge processors and
NVIDIA K20X GPUs. Piz Daint has a
theoretical peak performance of 7.8
petaflops – and has been Europe’s
most powerful system for almost 18
months.
This extreme-scale computing resource
runs software from a wide variety of
disciplines, including climate science,
geoscience, chemistry, physics, biology
and materials research.
Many of the software codes are
customized or written by the scientists
who use Piz Daint, to take advantage of
the GPU accelerators or the scale of
the system – and CSCS is determined
to ensure that it offers users the right
environment to accelerate research and
deliver first-class science.
CSCS relies on a variety of software
stacks: OpenACC, CUDA and the Cray
Programming Environment, containing
various compilers, tools and libraries.
Regardless of the environment chosen
by the user, the application must remain
consistent in terms of scientific results.
TRAINING COMPUTATIONAL SCIENTISTS FOR THE FUTURE
The team at CSCS recognized that reliable
and bulletproof tools were needed to
solve potential migration problems,
such as bugs found or introduced when
porting to GPUs, and to ensure that
the applications behave correctly.
To tackle software bugs, they chose
Allinea DDT, the debugger from
Allinea Software with a reputation and
capability that matched their needs.
Extremely scalable, Allinea DDT lets
users debug applications that use
MPI and OpenMP, as well as the two
key programming models for the
NVIDIA GPUs in Piz Daint: CUDA and
OpenACC.
CSCS has been promoting the tool to
users to ensure that it is used, and
used well. “Tools achieve great things if
we expose their value to the community,
so that they are actively used when
the need arises,” says Piccinali.
DELIVERING REAL SCIENCE – ROBUSTLY
Piccinali gives an example in which
Allinea DDT quickly resolved what
could have become a long, painful
debugging process.
RAMSES is an open source code
used to model astrophysical systems,
featuring self-gravitating, magnetized,
compressible, radiative fluid flows.
It is based on the Adaptive Mesh
Refinement (AMR) technique and
makes intensive use of the GPUs to
accelerate the computation.
“When we ran RAMSES on Piz Daint
the execution would abort over and
over again. We were stumped. First,
we tried to use diagnostic runtime
settings to get more information about
accelerator activity. This simply gave
more perplexing messages. We could
see every accelerator action on every
node, but that large volume of output
was difficult to interpret and did not help
us to find the root cause of the bug.”
“Our user relinked the RAMSES code
with Allinea DDT’s memory debugging
library … and found a problem inside
the solver routine when reaching an
accelerated region of the code – and we
simply needed to replace one line!”
“It turned out that a diagnostic routine
that compares GPU and CPU data
was using an incorrect parameter
for an array boundary in a loop. This
resulted in memory on the GPU being
overwritten, which led to kernel launch
errors for subsequent data transfers
between the CPU and the GPU. As
there is no way to perform runtime
array-bounds checks on the GPU,
Allinea DDT was the only easy way to
find the problem.”
PARTNERING TO SOLVE CHALLENGES IN A CHANGING WORLD FASTER
Working with Allinea Software allows
CSCS to provide a first-class system
for users, with a wide range of tested
and validated environments to help
users work productively, while focusing
on their science. Allinea Software also
provides CSCS with early access to
pre-release versions of Allinea DDT,
which CSCS uses to prepare its team
and validate the tools against the latest
system revisions before they reach their
users.
“Another case that showed the real
value of the close technical relationship
involved a user who was having trouble
examining a crucial variable that was
being set incorrectly in a code,” he says.
“Allinea DDT confirmed the value was
wrong and allowed us to pin down the
routine where the variable was modified,
but then we could find no further
explanation as to why this was the case.”
“We needed to see more information,
so we asked Allinea Software’s support
team. They identified that the compiler
was generating incorrect debug
information, and that was preventing us
from going further.”
“We could then contact the compiler
team with the crucial information about
the debug data. They released a new
version of their compiler that fixed the
problem and then we could progress
with the code. This was a good example
of the need for our close working
relationship with Allinea Software and
our system vendor,” he says.
Web: www.allinea.com