TTG Apptimizer, a CPU+GPU autotuning toolkit

A product of ttgLabs
Customer Problem
Hybrid systems, in which GPUs are used intensively for parallel computations, are becoming more and more popular.
However, to use the real computational power of GPUs, i.e. to get hundreds of GFLOPS on a single GPU and a
30- to 100-fold performance gain compared to a CPU, one has to optimize the application efficiently and tailor it to the
architecture of the GPU it runs on and to the structure of the processed data. Unfortunately, the architectures of
graphics accelerators from different vendors, and even of different device generations from the same vendor, may differ
dramatically, thus requiring specific and mutually incompatible optimization methods.
Technically, developing and further optimizing applications for hybrid platforms entails several difficulties.
One of them is choosing the right processor type for each part of the algorithm, which requires detailed a priori
knowledge of the platform architecture. Another arises when one tries to fit the so-called ‘magic constants’, such as
block size or thread topology, to the architecture of the graphics accelerator. While these constants can have a
noticeable effect on application performance, there is no ‘rule of thumb’ for choosing their values.
Traditional Approach
Within the current programming paradigm and development tools, taking into account the particular architectural features
of computational cores is a tough task. Developing applications that efficiently use the full potential of hybrid
platforms remains unacceptably time-consuming. Developers have to study an entirely new programming discipline,
which significantly increases time to market. While there are several code analyzers on the market that make it much
easier to reveal application bottlenecks, in practice the developer usually has to rely on a ‘trial-and-error’
approach or guess the proper values of the aforementioned ‘magic constants’ to arrive at efficiently optimized
code. Even then, to keep application performance at a comparable level after the hardware or the data structure
changes, in most cases this work has to be redone from the beginning.
That’s why it is much easier to write separate code for each type of processing unit than to build highly optimized
‘one-size-fits-all’ software. As a result, developers have to choose between creating a single universal version of an
application that runs two or three times slower than it could, or spending additional resources to
develop and support several versions of the application for various GPUs.
Our Solution - TTG Apptimizer
In contrast to traditional solutions, we offer an entirely new, dynamic approach to the software optimization problem. Its
key idea is software autotuning, or dynamic optimization: an application dynamically tailors itself to
the particular hardware platform and data structure directly at runtime.
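The autotuning idea described above can be illustrated with a minimal sketch. It is not TTG Apptimizer's API or its
actual algorithm: all names here (pick_lowest_cost, seconds_for, blocked_sum, tune_block_size) are hypothetical, and
the strategy assumed is the simplest possible one, namely timing every candidate value on the real data and keeping
the fastest.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not a TTG Apptimizer API: returns the candidate
// with the lowest cost. Ties keep the earliest candidate.
template <typename Param, typename Cost>
Param pick_lowest_cost(const std::vector<Param>& candidates, Cost cost) {
    assert(!candidates.empty());
    Param best = candidates.front();
    double best_cost = std::numeric_limits<double>::infinity();
    for (const Param& p : candidates) {
        const double c = cost(p);
        if (c < best_cost) {
            best_cost = c;
            best = p;
        }
    }
    return best;
}

// Wall-clock time of one call to `work`, in seconds.
template <typename Work>
double seconds_for(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Stand-in for a compute kernel with one tunable 'magic constant':
// sums the data in chunks of `block` elements.
double blocked_sum(const std::vector<double>& data, std::size_t block) {
    assert(block > 0);
    double total = 0.0;
    for (std::size_t begin = 0; begin < data.size(); begin += block) {
        const std::size_t end = std::min(begin + block, data.size());
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            partial += data[i];
        }
        total += partial;
    }
    return total;
}

// Runs the kernel once per candidate block size on the actual data
// and returns the block size that took the least time.
std::size_t tune_block_size(const std::vector<double>& data,
                            const std::vector<std::size_t>& blocks) {
    return pick_lowest_cost(blocks, [&](std::size_t b) {
        return seconds_for([&] {
            volatile double sink = blocked_sum(data, b);  // keep the call alive
            (void)sink;
        });
    });
}
```

Calling tune_block_size(data, {64, 256, 1024}) at startup would return whichever of the three block sizes ran fastest
on this machine and this data. A production tuner would presumably repeat measurements and search the parameter space
more cleverly; this sketch only shows the principle.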
This approach has been implemented in the TTG Apptimizer toolkit, which contains a library of C++ templates and
mechanisms for application autotuning. Its key components are 'smart' optimization algorithms that take into
account various behavioral models of hybrid software and optimize several dynamic parameters transparently to the
application. TTG Apptimizer makes it possible to use all available processing units of a hybrid system simultaneously
and provides load balancing between them. The toolkit efficiently solves the most tedious problems of ‘hybrid coding’
that developers usually encounter.
TTG Apptimizer directs all of the computer’s processing power to computational tasks by efficiently distributing them
between CPUs and GPUs and by balancing the load between these two types of processing units. The software
performs several dynamic optimization procedures, allowing one to develop new applications for hybrid platforms and to
port existing software to them, sometimes without significant recoding. TTG Apptimizer can be considered an
extension of widely used parallel programming tools, so the cost of integrating it into the software development
process is significantly reduced. Essentially, the software runs on top of existing industrial solutions, making them
easier to use and significantly lowering the skill requirements for the customer’s developers.
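To make the load-balancing idea concrete, here is a minimal sketch of one common scheme: dividing work items between
processing units in proportion to their measured throughput. This is an assumption made for illustration only; the
flyer does not say how TTG Apptimizer balances load, and split_by_throughput is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not a TTG Apptimizer API: splits `total_items`
// among processing units in proportion to `throughput` (e.g. items per
// second measured on a previous run). Items left over after rounding
// down go to the fastest unit.
std::vector<std::size_t> split_by_throughput(
        std::size_t total_items, const std::vector<double>& throughput) {
    assert(!throughput.empty());
    const double sum =
        std::accumulate(throughput.begin(), throughput.end(), 0.0);
    assert(sum > 0.0);

    std::vector<std::size_t> share(throughput.size(), 0);
    std::size_t assigned = 0;
    std::size_t fastest = 0;
    for (std::size_t i = 0; i < throughput.size(); ++i) {
        assert(throughput[i] >= 0.0);
        const std::size_t want = static_cast<std::size_t>(
            static_cast<double>(total_items) * (throughput[i] / sum));
        // Never hand out more items than remain, even if rounding misbehaves.
        share[i] = std::min(want, total_items - assigned);
        assigned += share[i];
        if (throughput[i] > throughput[fastest]) {
            fastest = i;
        }
    }
    share[fastest] += total_items - assigned;
    return share;
}
```

For example, with a CPU measured at 1.0 and a GPU at 3.0 (in the same arbitrary units), split_by_throughput(100, {1.0, 3.0})
assigns 25 items to the CPU and 75 to the GPU. Re-measuring throughput and re-splitting between batches would let the
shares follow changes in the workload.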
How It Works
The developer needs to make only minimal modifications to the source code, simply embedding TTG Apptimizer primitives
into the computing kernels. And that’s it. Even for very complicated code, this usually takes no more than a couple of days.
At runtime, the optimizer module gathers information about the available GPUs and the data being processed. After that,
each kernel is automatically tuned for the current usage scenario and the computations are efficiently distributed between
all GPUs, providing performance close to the maximum for this particular system and data structure.
Competitive Advantages
Autotuning (the application optimizes itself at runtime)
Universal solution (no ‘platform lock-in’; supports various GPUs, OSs, APIs and compilers)
Shorter time to market for new customer applications
Lower demands on the skills of the customer’s developers
Potential Customers
TTG Apptimizer can be used in a broad range of HPC areas to solve computational problems that can be efficiently
parallelized on hybrid platforms. Its application areas include various disciplines of physics, chemistry and
computational biology, drug design, geological prospecting and meteorology, ecology and natural disaster forecasting,
automobile and aircraft design, semantic analysis and business intelligence. Potential customers are enterprises that
actively use HPC applications, including universities and other research organizations, design departments in different
industries, oil and gas enterprises, pharmaceutical companies, data centers of meteorological agencies and of
organizations involved in seismological data analysis and simulation of global processes, and any other
companies that work with computationally intensive applications.
Available Editions and Prices
Currently, the TTG Apptimizer toolkit is available in three editions: Lite (Trial), Workstation and Mini-Cluster. The Lite
edition can be downloaded from ttgLabs.com for free. The Workstation edition supports one to ten GPUs, with
prices starting from 500 USD for use on 1 or 2 GPUs. The Mini-Cluster edition (under development) is intended for
systems with at least three GPUs, with prices starting from 2290 USD. A detailed TTG Apptimizer price list is available
upon request.
Support
Basic customer support is to be provided by the reseller. ‘Second-line’ support is provided by the vendor during working
hours by e-mail, Skype or phone. Special support plans can also be discussed.
Contact
Pavel Ivanov, PhD
Co-founder and Deputy CEO, Business Development
p_ivanov@ttgLabs.com
+7 903 121 1420
ttgLabs.com