Redpaper

System Level Accelerators: A Hybrid Computing Solution

Matthew Drahzal
Kirk E. Jordan
Eric Kronstadt
Lance E. Miller
James C. Sexton
Alexander Zekulin
This IBM® Redpaper publication presents the System Level Accelerator strategy and taxonomy and provides a roadmap for employing System Level Accelerators today. The paper also discusses examples of System Level Acceleration and includes strategies for forward progress.
© Copyright IBM Corp. 2008. All rights reserved.
ibm.com/redbooks
Executive summary
IBM Blue Gene® is the fastest computer architecture in the world, able to simulate nuclear
collisions, the activity of the human mind, and the way a virus replicates. The IBM Cell
Broadband Engine™ architecture provides supercomputer performance in a set-top gaming
machine, as well as providing horsepower for the super computer system under development
for Los Alamos National Labs. Yet, current mainstream system and software development
paradigms have not embraced the power of these solutions for business- and
technical-related computing. These specialized systems and processors, however, point the
way into the future and in a direction that cannot be ignored.
A review of compute technology shows that performance improvements have cycled through
changes that have alternated between general purpose single central processor unit (CPU)
speed increases and specialization. Historically, when performance improvements in High
Performance Computing (HPC) were needed, specialization technologies such as
vectorization and parallelization were embraced and then abandoned quickly as processor
technology evolved. We depended upon Moore’s Law to improve single core performance, yet
we always knew the speed of light would eventually limit that growth.1
Moore’s Law has held, but we have been cheating. Instead of adding more density to a single
core, multiple core solutions on a single die have been created. Thus, the individual
performance of each general purpose core has not improved. For single threaded applications, performance has hit the wall. The gating factor is just that: gate leakage at the most fundamental level of each microprocessor. In recent years gate leakage has been an increasing contributor to the passive power consumption in microprocessors.
This paper proposes another paradigm: System Level Accelerators, the deployment of diverse
compute resources to solve the problem of a single work stream, all with the experience of a
single server to the user.
Supported by middleware, ISV applications, and development tooling, System Level Acceleration allows you to take advantage of state-of-the-art specialized processors to accelerate applications without a total code rewrite, and to speed applications up even beyond what execution on a multi-core parallel system can deliver.
Introduction
This section discusses the current state of HPC and current industry trends.
Current state of HPC
This section discusses the current state of HPC.
The end of classical CPU scaling
IBM foresaw the end of CPU speed scaling many years ago and began to lay the framework
for more innovative solutions by introducing the first multi-core POWER™ processors in 2001.
Consequently, IBM has the most pervasive use of multi-core solutions in the market today,
with experience in multithreaded and multi-core CPUs, servers, and application tooling.
Although multi-core technology has extended Moore's Law for IBM, the performance increase for a single application can, at best, scale linearly with the number of processors that are applied to the problem. Yet, as the industry approaches the 4 gigahertz CPU speed that is the current speed limit for general purpose processors, multi-core is all that is available, with many-core processors being the way of the future. As circuit sizes are reduced toward the size of individual atoms, it is going to get rather difficult to get any smaller.

1. Moore's Law is the "empirical observation made in 1965 that the number of transistors on an integrated circuit for minimum component cost doubles every 24 months." Source: Wikipedia
There are two types of power that are used in calculating the power consumption of a
processor:
򐂰 Active power, the power used to do work
򐂰 Passive power, the cost of being there
Think of passive power as the overhead or bureaucracy of a processor. Figure 1 shows current trends of power consumption versus gate length (in microns). Notice that the passive power is approaching the active power consumption. So, as the gate length (the size of the etchings in the lithography of a processor) shrinks, the bureaucracy increases, thus preventing processors from running at much higher speeds. When passive power equals active power, it is difficult to get any work done on a processor.
Figure 1 Two relevant power consumption types in a microprocessor (log-log plot, 1994 to 2005: power versus gate length in microns, with passive power rising to approach active power as gate length shrinks)
The energy crisis
In addition to difficulty in obtaining greater speeds, we are now dealing with power
consumption issues. IBM is beginning to see Requests for Proposals with floating point
operations per megawatt as part of the information that is required. Electricity expense is
becoming a large life-cycle issue for high performance computing users. As energy costs
increase, the energy that is consumed by CPUs is becoming more important. The amount of
energy that is required by a CPU increases polynomially (as the cube) with its speed. Larger
clock speeds require more cooling, more airflow, and more electricity to drive the cooling process. If extra heat in a CPU is bad, the problem is exacerbated by the equally wasteful energy that is required to move the waste heat away. It is no coincidence that the fastest supercomputer in the world, Blue Gene, uses efficient chips running at rather low clock speeds. Blue Gene is, in fact, very green.
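The cubic relationship between clock speed and power noted above can be illustrated with a short calculation. This is a simplified rule-of-thumb model only; real processors also have voltage and leakage terms:

```python
def relative_power(speed_ratio):
    """Dynamic power grows roughly as the cube of clock speed
    (a simplified rule of thumb; leakage is not modeled here)."""
    return speed_ratio ** 3

# Doubling the clock multiplies power draw roughly eightfold,
# while two cores at the original clock only double it.
print(relative_power(2.0))      # 8.0
print(2 * relative_power(1.0))  # 2.0
```

This arithmetic is why many slower cores, as in Blue Gene, can deliver far more computation per watt than a few very fast ones.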
Scaling solutions in a cluster configuration makes the energy consumption issue even worse.
When we need to add processors to increase performance, we also add memory, power supplies, disk drives, fans, network adapters, and more. All of this extra equipment draws power even when the CPU is idle, a large additional per-unit fixed cost for scaling through clustering. Some of the highest energy-consuming compute installations in the world are based upon clustering configurations.
What is System Level Acceleration?
Systems Level Acceleration is a computing approach that allows multiple, disparate computer systems to interact to complete an entire computational and interactive workflow. Although this approach can be applied in its simplest form to a cluster farm with disparate processors, all connected homogeneously, the real power of the Systems Level Accelerator approach comes from disparate systems, each with a specific function such as computing, visualization, or data storage. The reason for this approach is the vast improvement in the efficiency of the overall completion of a workflow. There are several different approaches to Systems Level Acceleration.
Faster engineering of specialized processors
IBM has proven its ability to create new custom processor designs quickly based upon
specific needs for particular applications. Examples of this ability today are the Cell
Broadband Engine (Cell/B.E.™) CPU for the SONY PS3, the processors running the XBOX
360 and the Nintendo Wii, and many other custom engineered solutions for various
customers.
As an example of how quickly IBM can design a specialized processor, IBM signed a contract with Microsoft® in August 2003 for a three-core gaming processor. IBM had early access test chips available a year later, the full chip taped out in October 2004, and the tape-out was complete in January 2005. IBM had successfully designed a specialized gaming CPU in less time than it takes many companies to do a minor software revision. For additional information about this design effort, see the article by Dean Takahashi, Learning from failure: The inside story on how IBM out-foxed Intel with the Xbox 360, Electronic Business, 01 May 2006, at:
http://www.reed-electronics.com/CA6328378.html
Today, IBM can create a processor for a specific purpose. These processors offer super-linear speedup over general purpose processors, use less electricity, and require less cooling. This capability is new, and IBM is leading the way.
The flood of data streams
The amount of data that is flowing and must be analyzed continually is also growing. IBM has
announced recently efforts in stream computing that are focused on receiving flows of
information and making decisions and actions based upon the content of that incoming data
stream, whether it be financial data, company news, or current events. To learn more about
the effort from IBM in stream computing, see the article by Darryl K. Taft, IBM Previews
Stream Computing System, eWeek.com, 19 June 2007, at:
http://www.eweek.com/c/a/Infrastructure/IBM-Previews-Stream-Computing-System/
Additionally, IBM is on the cusp of a change in technical sensor data collection. The more data that is collected, the more processing must occur on each data stream. The Astron Low Frequency Array (LOFAR) software telescope project is an example of sensor data arriving and needing to be processed in real time. To learn more about IBM's involvement with the Astron LOFAR project, go to:
http://domino.watson.ibm.com/comm/pr.nsf/pages/news.20040223_bluegene.html
On the security front, there is an increasing use of video cameras, a technology that needs
gigaflops of processing to digitize, analyze, and store incoming data streams.
The need for speed
The grand irony is that the failure of classical scaling is occurring at a time when a need for
computing performance is growing. Areas such as computational biology, drug discovery,
petroleum exploration, and simulation-based engineering sciences are driving computing
requirements to a higher level than ever. For more information about simulation-based
engineering, see Simulation - Based Engineering Science, from the National Science
Foundation; Document number: sbes0506; May 26, 2006, at:
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=sbes0506
High performance computing Independent Software Vendors (ISVs) are trying to find ways to
scale the performance of their applications cost effectively.2 However, the marketplace cannot
afford to rewrite entire HPC applications, and the need for speed is greater than ever.
Current industry trends
There are multiple changes occurring in the marketplace which make adoption of the System
Level Acceleration (SLA) paradigm important. These changes provide the underlying
infrastructure to allow SLA growth, as well as some changes in development models that
allow us to think more abstractly.
Processor diversity
As we look at innovation in computing, there is substantial innovation in specialty processors. For example, there is movement to employ capabilities as diverse as video graphics processors and Field Programmable Gate Array (FPGA) technologies as accelerators. Traditionally, for a repeatable algorithm (for example, signal processing), the industry created Application Specific Integrated Circuits (ASICs) to execute that algorithm. With the cost of FPGA chips lower than ever, we can code these algorithms and write them to FPGAs much later in the design cycle, giving us flexibility for later change. Some server companies have even made this reconfigurable computing part of their server offerings.
In addition, we see diversity even on a single chip. The Cell Broadband Engine (Cell/B.E.) processor is single-die heterogeneous computing, containing one Power Processing Element (PPE) and up to eight specialized Synergistic Processing Elements (SPEs) which perform floating point mathematics at lightning speed.
Even Blue Gene itself is an example of diversity in design. With its four-processor shared memory design and very high speed interconnects, applications that map well to that architecture run extremely well. However, this architecture cannot be all things to all people.
This concept of specialization extends outside the boundaries of high performance
computing. Even .Net and Java™ architectures are being optimized by using specialized
processors with products such as Azul Systems Compute Appliances for Network Attached Processing. These processors are non-general-purpose processors in a custom architecture with a single purpose: accelerating virtual machines.

2. To learn more about the ISV HPC market, see Earl Joseph, Ph.D., Addison Snell, Christopher G. Willard, Ph.D., Suzy Tichenor, Dolores Shaffer, and Steve Conway, Study of ISVs Serving the High Performance Computing Market: The Need For Better Application Software, July 2005.
Virtualization
Virtualization has become a leading trend in the marketplace today (although it has been part of the IBM portfolio for decades). Server access is virtualized, with Logical Partitions on IBM System p™ permitting access to even portions of a processor, virtual machines running Java™ and .Net, and even virtual hardware all becoming commonplace. Developers have become accustomed to thinking about virtual abstractions of servers; every time a line of Java code is written, they care very little about where it runs. They only care about what it does.
As we extend this to acceleration, we can begin to implement applications that can benefit
from acceleration, yet we might not know what form the acceleration takes. In many respects,
System Level Acceleration is the ultimate form of virtualization.
Service-oriented architecture
IBM has been making a large investment in enabling service-oriented architecture (SOA). In this architecture paradigm, the concept of definable, implementable, and deployable services
becomes the core of the systems development strategy. Business processes are defined as a
series of inter-operating services, and modern business applications are implemented as
such. These composite applications are developed using open standards and middleware
such as WebSphere® and DB2® as the underlying fabric.
What is important about SOA is that the server that implements the service is virtualized. A
user of an SOA service is happily unaware of how that service is implemented or even the
type of server that implements that service (a System z™, a System x™, or perhaps, even a
Blue Gene server). The service consumer simply wants the service to complete as quickly as
possible with the correct answers. Extending this paradigm to HPC is a natural evolution of
technology.
The opportunity today
It has been said that technological revolutions happen as a collection of events that all point in
the same direction and lead to a similar outcome. If true, we could say that the System Level
Acceleration revolution has already begun.
Visualization
In several instances, we are seeing IBM customers and others integrating, or at least connecting, systems that each perform a specific task in a certain workflow. Perhaps the most common instance of SLA is connected with supercomputing and the subsequent visualization of the supercomputing results. IBM Deep Computing Visualization is an example of this integration. The rendering cluster ingests data from some parallel system that does the compute, and the rendering cluster produces the pixels that can be passed on to another system for actual display.
High performance computing
Even Blue Gene is being used in occasional instances now as a System Level Accelerator. The LOFAR project at ASTRON in the Netherlands is using a combination of clusters to ingest data from the array sensors, Blue Gene for correlation, and other systems to handle visualization and imaging. The Blue Gene supercomputer is one part of a heterogeneous solution and serves as a correlation accelerator.
XML Acceleration
Returning to commercial network processing, IBM is now employing accelerators in the commercial world. The DataPower® family of XML Accelerators has been added recently to the IBM product portfolio. The DataPower accelerator appliances are purpose-built, easy-to-deploy network devices that simplify, help secure, and accelerate XML and Web services deployments, while extending SOA infrastructures (see Figure 2).
The DataPower solutions allow customers to off-load XML and security processing from their existing server infrastructure, thus freeing the CPU and memory that is needed to handle these tasks. In many cases this XML processing workload consumes a high percentage of a server's capacity, and using a purpose-built accelerator to offload this work is a cost-effective method of optimizing server resources.
Figure 2 DataPower Accelerator (an XA35 XML Accelerator in XML proxy mode, mediating XML, XML fragment, XSL, HTML, and WML traffic between the Internet, a wireless network, Web and application servers, and an XML database)
Opportunity tomorrow
IBM is in a unique position to move this vision forward because the market forces are demanding a solution (tighter IT budgets; human resource, power, and cooling constraints). What started as autonomic computing has evolved into a truly virtual environment, where programs are designed to an open architecture and an enterprise optimizer can decide where best to run portions of a code base, based on constraints imposed by the system administrators.
Because of its diverse systems and software heritage, IBM is in a unique position to see this evolution as it happens and to drive it toward a standard approach throughout the industry.
The System Level Acceleration solution
This section discusses the System Level Acceleration solution.
The goal
From the preceding analysis, we foresee a future in which, for technology, cost, and capability reasons, large scale computer installations in business, research, and government will consist of multiple heterogeneous systems (that is, Hybrid Computing). Each different system will manage a piece of the work load which the installation is delivering. The competitive driver for the adoption and deployment of such systems will continue to be the ease with which applications can be developed. The end goal, we propose, is to provide a heterogeneous system that contains a number of different compute platforms and can provide computational service for many work loads, each consisting of a number of separate components.
The System Level Acceleration Approach that we propose here seeks to achieve the following
goals:
- For the user

The heterogeneous system needs to look and feel like a single integrated computer that can be driven from the user's workstation or mobile computer through a GUI, spreadsheet, or other simple interface. To the user, the effort to manage the work load on the system must be simple and intuitive, yet the system must also provide the user with the full cost benefits possible from deploying the correct systems architecture to solve each piece of the work load.

- For the applications developer

The System Level Acceleration approach must provide portable interfaces that abstract hardware interfaces from the application, allowing the developer to be somewhat unaware of the nature of the System Level Accelerator. This approach implies a late binding of the accelerator to the application and a level of abstraction that makes System Level Acceleration approachable for a wide range of developers.

- For the systems manager

Again, this heterogeneous system must look and feel like a single integrated computer. Tooling that allows single system images and management of the entire solution is an important part of any System Level Acceleration solution.
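The late-binding idea for the applications developer can be sketched as a run-time dispatch: the application calls a portable interface, and a middleware layer binds the call to whatever accelerator happens to be present. The registry and backend names below are purely illustrative stand-ins, not a real IBM API:

```python
# Hypothetical sketch of late binding an accelerator to an application.
accelerators = {}

def register(name, fn):
    """Middleware registers whichever backends exist on this system."""
    accelerators[name] = fn

def solve(data, preferred=None):
    """Portable interface: bind to an accelerator at run time,
    falling back to the host CPU when none is available."""
    backend = accelerators.get(preferred) or accelerators.get("cpu")
    return backend(data)

register("cpu", lambda data: [x * 2 for x in data])
register("fpga", lambda data: [x * 2 for x in data])  # stand-in for real offload

print(solve([1, 2, 3], preferred="fpga"))  # [2, 4, 6]
```

The application code calls only `solve`; which system actually executes the work is decided when the call is made, not when the code is written.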
Approaches
At first glance, Systems Level Acceleration is not a new idea. Heterogeneous systems
installations have been around since the dawn of computers. What has changed significantly
is the understanding that work loads are requiring significantly more cycles than can be
delivered by a single processor core. Given the end of classical scaling, this problem requires multiprocessor systems for its solution. You then face an optimization issue in determining the correct multiprocessor system for a given application or for a range of work
streams. Increasingly, the competitive systems advantage goes to those systems that
integrate well with each other in heterogeneous environments, and the competitive
applications advantage goes to those applications that can be deployed easily on
heterogeneous systems.
From a user perspective, we identify two classes of application or work flow where System
Level Acceleration is advantageous:
- Loosely coupled acceleration (LCA)
- Tightly coupled acceleration (TCA)
Loosely coupled acceleration
Loosely coupled acceleration (LCA) is a technique that is somewhat familiar to the traditional
HPC users. In LCA, each step in a work stream is a separate application. In general, the
output result from one application is stored on shared disk, and then, at some future time,
those results are used as an input to another application running on, perhaps, a different
server.
The execution of these applications is coordinated by scripts and utilities which schedule and
control the job submissions. It is not much of a leap to make an association between LCA and
traditional batch processing (see Figure 3).
Because LCA is a familiar instance of acceleration, many solutions have already been
designed to use this technique. In addition, scheduling tools and applications are readily
available to support execution using this model.
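A minimal sketch of LCA-style coordination, assuming each step is a separate program that hands its results to the next step through a shared file system (the step functions here are stand-ins for real batch applications, not actual HPC codes):

```python
import json
import tempfile
from pathlib import Path

def compute_step(outfile: Path):
    # Stand-in for a job running on the accelerator system:
    # results land on shared disk for a later step to pick up.
    outfile.write_text(json.dumps({"energies": [1.0, 2.5, 4.0]}))

def visualize_step(infile: Path):
    # Stand-in for a later job on the host system that
    # consumes the stored results as its input.
    data = json.loads(infile.read_text())
    return max(data["energies"])

shared = Path(tempfile.mkdtemp()) / "run001.json"
compute_step(shared)           # "batch job" 1
print(visualize_step(shared))  # "batch job" 2 -> 4.0
```

In a real LCA deployment, a scheduler submits each step to its own system and the shared file acts as the only coupling between them.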
Figure 3 Loosely Coupled Acceleration block diagram (a scheduler coordinates a host system and an accelerator system, each with its own file system, connected through schedulers and parallel file systems)
IBM has developed demonstrations of this type of coupling for supercomputing visualization
using Blue Gene as the accelerator. The accelerator applications are computational
chemistry modeling codes, Amber and NAMD. Both of these codes compute models of
molecules and produce text and visual output to disk. The output can then be read from disk and visualized with the industry-established open source visualizer, Visual Molecular Dynamics (VMD). The demonstration of loosely coupling these two systems was achieved using a bash shell wrapper that employed secure shell (ssh) forwarding to create a seamless interface for the user. With one command, the user could have the Blue Gene application executed remotely and its output visualized with customized parameters on the user's mobile computer. This system was then enhanced by using Deep Computing Visualization (DCV). This approach was tested on a video wall consisting of four off-the-shelf high resolution monitors, each driven by a dedicated IntelliStation. The demonstration was then configured to send the output using Scalable Visual Network (SVN) to this scaled wall remotely. See Figure 4 for the workflow of this demonstration.
Figure 4 IBM Blue Gene as member of an LCA system (a client runs scripts to authenticate the user with ssh forwarding, initiate the run, and present data using VMD; the Blue Gene front end calls Blue Gene to execute the application and gathers the resulting data on disk)
Tightly coupled acceleration
The tightly coupled acceleration (TCA) model is just that—more tightly coupled than LCA. In
TCA architecture, we think less about separate applications each running start to finish and
more about a single application which is distributed across a host, which houses the master,
and one or more accelerators, which house the worker portion of the application.
In this model, an application begins, and then like a good manager, the application uses
accelerator-workers, if available, to take sections of the computation and execute them. The
manager can then wait for the return of the work package, or it can go on to other tasks while
the accelerator-worker completes its section of the computation.
After the accelerator-worker has completed its task for the manager (and returned the
results), the worker is now available to be tasked again (by the same manager or perhaps
even a different instance of the manager on another server).
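The manager/worker pattern described above can be sketched with a thread pool standing in for the accelerator-workers. This is illustrative only; a real TCA deployment would dispatch work packages to separate accelerator hardware rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # Stand-in for a section of the computation executed
    # on an accelerator-worker.
    return sum(x * x for x in chunk)

data = list(range(8))
chunks = [data[:4], data[4:]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # The manager hands out work packages and is free to do
    # other tasks until it collects the results.
    futures = [pool.submit(worker, c) for c in chunks]
    total = sum(f.result() for f in futures)

print(total)  # 140
```

The key property is that the manager does not block while a worker computes; it collects each result only when it needs it, and a finished worker can immediately be tasked again.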
We see this tightly coupled acceleration encoded in applications in three forms:
- Compiler-based and library acceleration

In this model, the compiler (or an explicit library call) is used to create subroutine calls in an application which invoke the accelerator-workers. This is very similar to the vectorizing compilers made popular in the 1980s, which examined code and made all the requisite calls to send work to the vector processor. This technique has worked very well where the latencies for handing a task to an accelerator-worker are very small.
- Message passing through Parallel Virtual Machine (PVM) and heterogeneous MPI

These message-passing techniques require more architectural control over an application and have traditionally been used over homogeneous networks (although PVM had some success in heterogeneous use in the 1990s).

This kind of TCA is the most common today, with hundreds of applications having been developed with MPI.
- Remote Procedure Call (RPC)

In this model, data is transferred and work is instantiated on an accelerator-server through calls to a middleware API. The middleware handles the actual data transmission, process and thread creation, and overall control.
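The RPC form can be sketched with Python's standard XML-RPC modules, the client standing in for the host application and the server for the accelerator. This is a toy stand-in for the middleware layer described above, not the actual software used in the demonstrations that follow:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def accelerate(data):
    # Stand-in for work executed on the accelerator-server.
    return [x * x for x in data]

# "Accelerator" side: expose the function over RPC on an ephemeral port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(accelerate)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Host application" side: the call looks local, but the middleware
# moves the data, runs the work remotely, and returns the result.
proxy = ServerProxy("http://127.0.0.1:%d" % port)
result = proxy.accelerate([1, 2, 3])
server.shutdown()
print(result)  # [1, 4, 9]
```

The host code never specifies where `accelerate` runs; only the middleware endpoint changes when the work moves to different hardware.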
An example of this type of coupling was completed by a collaboration of academic
researchers and IBM scientists using innovative software by a research team at Rutgers
University and the University of Texas at Austin (see Figure 5).
Figure 5 IPARS/DISCOVER Tightly Coupled Acceleration model using Blue Gene as an SLA (an optimization service generates guesses through SPSA, VFSA, or exhaustive-search clients and sends them via DISCOVER; if a guess is already in the MySQL database, the response is returned to the clients and a new guess requested from the optimizer, otherwise a parallel IPARS instance is started with the guess as a parameter; each instance connects back to DISCOVER, which notifies the clients so they can interact with IPARS)
The Rutgers team developed an application framework called DISCOVER to monitor and steer other applications during runtime. A computational reservoir simulator, the Integrated Parallel Accurate Reservoir Simulator (IPARS) from the Center for Subsurface Modeling at the University of Texas at Austin, has long been a standard test case for high performance computing. IBM developed a System Level Acceleration proof of concept using IPARS as the accelerator code running on Blue Gene. The IPARS codebase was updated to connect directly to the DISCOVER software during runtime, which allowed the user to initiate the execution without knowledge of where the execution was taking place. The results were fast and seamless to the user. In this scenario, Blue Gene runs all of the IPARS instances and the optimization service that explores the parameter space of the reservoir model.
Comparison with Grid Computing
System Level Acceleration is not Grid Computing, although Grid Computing and System
Level Acceleration share some characteristics. Grid Computing is a way to manage and use
multiple systems, but in this paradigm, each system keeps its separate identity. Grid
Computing is a system and hardware focused approach to management of multiple disparate
resources. Traditionally, a major issue for Grid Computing is the network interconnect. Often the bottleneck is the slowest network link connecting the systems, which might be 100 megabits per second or less, with high latencies. The network bottleneck dictates that parallel tasks minimize communication between them and operate extremely independently.
System Level Acceleration is a work stream approach. The idea is that a work stream is broken up into component pieces. Each component piece uses the best possible system for that particular component and communicates the necessary data to other components. For example, a computational fluid dynamics work stream might consist of a grid generation component, a partitioning component, a parallel solver component, and a visualization component. The grid generation and partitioning components might be best suited to a symmetric multiprocessing (SMP) system with large memory. After the partitioning is complete, the partition pieces might best be solved using a parallel solver. The results of the parallel solution, because of the need for significant memory, may require parallel rendering before finally being displayed at the researcher's desk.
Through System Level Acceleration, the entire work stream would be developed with the algorithms that are best suited to the systems performing each specific component part, while also being aware of the interaction or data transfer between component parts. Some of the System Level Acceleration algorithmic design overlaps and hides data transfer to further reduce the work stream's elapsed time. If done with an appropriate abstraction, when new technology arrives, only the affected part of the work stream needs to be revised to take advantage of the newer technology.
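The work stream idea, components running where they fit best with data transfer overlapped between them, can be sketched with a queue connecting pipeline stages. The stage functions are illustrative stand-ins for grid generation, solving, and rendering:

```python
import queue
import threading

def solver(inbox, outbox):
    # Stand-in for the parallel solver component: it starts
    # consuming partitions while the producer is still running,
    # so data transfer overlaps with computation.
    while True:
        part = inbox.get()
        if part is None:       # end-of-stream sentinel
            outbox.put(None)
            return
        outbox.put(sum(part))  # "solve" the partition

partitions = queue.Queue()
results = queue.Queue()
threading.Thread(target=solver, args=(partitions, results)).start()

# Stand-in for the grid generation/partitioning component.
for part in ([1, 2], [3, 4], [5, 6]):
    partitions.put(part)
partitions.put(None)  # signal end of stream

solved = []
while (r := results.get()) is not None:
    solved.append(r)
print(solved)  # [3, 7, 11]
```

Because each stage only sees a queue, a stage can later be moved to whatever system suits it best without revising the rest of the work stream, which is the abstraction argument made above.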
Conclusion
At a hardware level, Systems Level Acceleration integrates all the diverse systems
architectures currently available. It also provides a paradigm that allows development and
deployment of new architectures as they are acquired. It can also simplify the deployment of
new architectures in the future because it allows development to focus on the core attributes of the new architecture and provides a ready-made environment into which that architecture can be dropped when complete.
Systems Level Acceleration simplifies customers' uptake and integration effort for new architectures and provides a new framework for users to exploit new application areas more easily and robustly. In addition to decreasing time to solution, this framework allows fewer code changes in the future and allows more systems to play key roles in application development and usage. One key example of this flexibility is to extend the TCA example that we presented in this paper, where IPARS was the accelerator running on Blue Gene using the DISCOVER application as a tightly coupled runtime front end to the application. This example can be extended to include a visualization component that can be tightly or loosely coupled. Visualization of output is an essential component of any system, and the System Level Acceleration paradigm can be employed to use the right hardware (dedicated SMPs and such) to do the job. These results can be seamlessly joined and broadcast in numerous ways through the strategic use of existing IBM applications such as DCV.
IBM has always been the world leader in mainstream and high performance computing. While
there are limits to general purpose single core performance gains, the hardware innovation
toward processor specialization and integration to expand our computing capabilities has
already been led by innovative work from IBM. IBM will continue this role by developing
software and middleware and by supporting architectures to allow applications and
developers to take full advantage of these innovations, while making high performance
computing even easier for developers and companies to use.
The team that wrote this paper
This paper was produced by a team of specialists working with the International Technical
Support Organization (ITSO), Rochester Center.
Matthew Drahzal is a Product Line Strategist for the IBM Deep Computing CTO organization.
He has 20 years of experience developing, designing, managing, and mentoring real-time,
commercial, and HPC solutions. He has a BSCS and MBA from Syracuse University and has
spoken worldwide on modern software engineering practices.
Kirk E. Jordan is the Emerging Solutions Executive in IBM Deep Computing. At IBM, he
oversees development of applications for advanced computing architectures, investigates and
develops concepts for new areas of growth especially in the life sciences involving HPC, and
provides leadership in high-end computing and simulation in such areas as systems biology
and high-end visualization. He has a Ph.D. in Applied Mathematics from the University of
Delaware and more than 25 years of experience in high performance and parallel computing. He held
computational science positions at Exxon R&E, Argonne National Lab, Thinking Machines,
and Kendall Square Research before joining IBM in 1994. At IBM, he has held several
positions promoting HPC and high performance visualization, including managing IBM
University Relations Shared University Research (SUR) Program and leading IBM Healthcare
and Life Sciences Strategic Relationships and Institutes of Innovation Programs.
Eric Kronstadt is Director of Exploratory Server Systems and the Director of the Deep
Computing Institute, with responsibility for advanced operating systems research, HPC
architectures including Blue Gene, and emerging high performance applications, including
computational biology. Prior to this role, Eric was Director of VLSI Systems, where his
responsibilities included development of the PowerPC® architecture, research in
microprocessor implementation and micro-architecture, as well as CAD development. He has
been with IBM for over 20 years, starting as a software developer, moving into management, and
being involved in the design and specification of a number of high performance experimental
RISC microprocessors, as well as the development of a standard cell design system. Eric
graduated from Brown University and received his Ph.D. in mathematics from Harvard
University.
Lance E. Miller is currently a doctoral candidate at the University of Connecticut, pursuing
two degrees, one in computer science and the other in mathematics. Before coming to
Connecticut, he performed research as a student in the department of computer science at
New Mexico State University and at Physical Science Laboratories working with Army
Research Laboratory. He is a two-time recipient of the competitive IBM Ph.D. fellowship. In
support of these fellowships, he spent time with IBM working on HPC demonstrations for IBM
Deep Computing.
James C. Sexton is a Research Staff Member at IBM T. J. Watson Research Center where
he works on the IBM Blue Gene project. He received his Ph.D. in Theoretical Physics from
Columbia University in New York. He has held research positions at Fermilab in Illinois, at the
Institute for Advanced Study in Princeton, and at the Hitachi Central Research Laboratory in
Tokyo. Prior to joining IBM in 2005, he was a professor in the School of Mathematics in Trinity
College Dublin and Director of the Trinity College Center for High Performance Computing.
His interests are in theoretical and computational physics and applications and systems for
HPC.
Alexander Zekulin is a Business Development Executive with IBM System Blue Gene
Solutions group. His focus is to enable customers and partners to develop their solutions on
the Blue Gene platforms. Alex received his Ph.D. from the University of Kentucky in
Geophysics. Alexander has applied his knowledge at IBM, working with petroleum and
environmental companies in high performance computing, advanced data management, and
non-linear optimization. In his previous role at IBM, he was in the healthcare and life sciences
group, focused on solving IBM healthcare customers' IT and business problems. Before that,
Alex was responsible for the IBM services relationships with numerous analytical solution
providers, including SAS Institute, Intel®, Business Objects, MicroStrategy, and ESRI. These
partnerships resulted in numerous joint solution offerings for IBM clients across many
industries.
Thanks to the following for contributing to this project:
LindaMay Patterson, ITSO, IBM Rochester, U.S.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
© Copyright International Business Machines Corporation 2008. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by
GSA ADP Schedule Contract with IBM Corp.
This document REDP-4409-00 was created or updated on April 24, 2008.
Send us your comments in one of the following ways:
- Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
- Send your comments in an e-mail to:
redbooks@us.ibm.com
- Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400 U.S.A.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Redbooks (logo)®
Blue Gene®
DataPower®
DB2®
IBM®
PowerPC®
POWER™
System p™
System x™
System z™
WebSphere®
The following terms are trademarks of other companies:
Cell Broadband Engine and Cell/B.E. are trademarks of Sony Computer Entertainment, Inc., in the United
States, other countries, or both, and are used under license therefrom.
Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.
Microsoft, Xbox 360, Xbox, and the Windows logo are trademarks of Microsoft Corporation in the United
States, other countries, or both.
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.