IBM Deep Computing
FOREWORD BY MARK HARRIS
(IBM COUNTRY GENERAL MANAGER SOUTH AFRICA)
After ten years of research and development, IBM delivered to the US government a
supercomputer capable of 100 Teraflops. This project, in which hundreds of IBMers have
been involved, has had a tremendous effect upon the computer industry and, in many ways,
our society as a whole. It has helped usher in new technologies that both spurred new
markets and forever changed old ones. One of the key advances was the development of
processor interconnect technology. As processor speed increases alone could not meet the
computing challenge, engineers developed this technology to allow thousands of processors
to work on the same problem at once. Related software was developed to allow those
thousands of processors to write data to a single file.
These innovations have allowed fundamental changes in science and business. The
automotive industry now uses sophisticated simulations to build and design safer cars –
virtually. Oil companies use modelling of the earth's geologic layers to pinpoint more likely
sources of oil to reduce unneeded drilling. Weather forecasting has been improved – believe
it or not. Boeing has even designed an entire virtual aircraft on a computer and brought it to
market without ever building a prototype. Great strides have also been made in genomic
research, drug discovery, and cancer research. The Square Kilometre Array (SKA) project
holds immense opportunities for astronomers the world over, yet it would not have been
possible without the availability of high performance and grid computing.
Of course, with advances like these come new challenges, but at the same time they also bring
new opportunities from a research and computational application perspective. South Africa
has an opportunity to capitalise on this situation within the context of the Department of Science
and Technology's ICT Roadmap and in particular its Technology Roadmap for High
Performance Computing (HPC). In compiling this white paper, IBM South Africa attempts to
illustrate the potential journeys that may be undertaken by means of the Department’s
roadmap. It is hoped that the Roadmap will draw together business, government, academic
and research experts to help define and shape innovative ways in which high performance
computers and advanced algorithms may provide valuable information and answers for the
numerous intractable problems confronting the South African research community. I certainly
hope that this white paper will facilitate such an endeavour.
____________________________
Mark Harris
Table of Contents
1  Deep Computing Overview
   IBM Deep Computing Systems
   Paradigm shift in computing systems
   Holistic design and system scale optimisation
   Impact on Deep Computing
   Blue Gene – a new approach to supercomputing
   Novel architectures
   Long-Term, High Productivity Computing Systems research
2  Deep Computing Products
   Scale-up and scale-out
   Capability or Capacity
   Server options
   Parallel scaling efficiency
   IBM Power Architecture servers
   IBM Intel and Opteron based servers
   IBM Deep Computing clusters
   IBM and Linux
   Blue Gene: A supercomputer architecture for high performance computing applications
   US ASC Purple procurement
   Integration of heterogeneous clusters
   IBM's Deep Computing Capacity on Demand
   Storage
   IBM Deep Computing Visualization
3  GRID
   GRID for academia and scientific research and development
4  IBM in the Deep Computing Marketplace
   HPCx – the UK National Academic Supercomputer
5  IBM Research Laboratories
   Work with clients
   On Demand Innovation Services
   First-of-a-Kind (FOAK) Projects
   Industry Solutions Laboratories
   Work with universities
   IBM Deep Computing Institute
   Systems Journal and the Journal of Research and Development
6  HPC Network Facilitation – fostering relationships in the HPC community
   Executive Briefings
   Conferences and seminars
   IBM System Scientific Computing User Group, SCICOMP
   SP-XXL
   UK HPC Users Group
7  Collaboration Projects
   University of Swansea, UK - Institute of Life Sciences
   University of Edinburgh, UK - Blue Gene
   IBM Academic Initiative - Scholars Program
   IBM's Shared University Research Program
8  IBM Linux Centres of Competency
9  Technology considerations for a new HPC centre
   Grid enablement of central and remote systems
   Distributed European Infrastructure for Supercomputing Applications
   ScotGrid
10 Skills development, training, and building human capacity
   Programme Management
   Support Services
   With the CHPC of South Africa in mind...
11 Assisting with the Promotion and Exploitation of the CHPC
Trademarks and Acknowledgements
Addendum
1 Deep Computing Overview
Deep Computing (DC) or High Performance Computing (HPC) is a term describing the use of
the most powerful computers to tackle the most challenging scientific and business problems.
It is most frequently applied today to systems with large numbers of processors clustered
together to work in parallel on a single problem.
Deep computing delivers powerful solutions to customers' most challenging and complex
problems, enabling businesses, academics and researchers to get results faster and realize a
competitive advantage. It provides extreme computational power which enables increasingly
complex scientific and commercial applications to be run where trillions of calculations can be
performed every second, and models of extreme complexity can be analysed to provide
insights never before possible. It is a world of large numbers, and the prefixes Tera (10^12, or 1
million million), Peta (10^15, or 1 thousand million million) and Exa (10^18, or 1 million million million)
are frequently used in terms such as '3 TFLOPS', which means 3 million million Floating Point
Operations (i.e. calculations) per Second, or 2 PetaBytes, which means 2 thousand million
million Bytes of storage.
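Written out in scientific notation, the two examples above read:
\[
3\ \mathrm{TFLOPS} = 3 \times 10^{12}\ \text{floating point operations per second},
\qquad
2\ \mathrm{PB} = 2 \times 10^{15}\ \text{bytes}.
\]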
While some of the most ambitious HPC work takes place in government laboratories and
academia, much is performed in newer commercial marketplaces such as drug discovery,
product design, simulation and animation, financial and weather modelling, and this sector is
growing rapidly.
IBM is the worldwide leader in Deep Computing and is able to provide the technology, the
research capabilities, the support and the industry expertise for Deep Computing through a
comprehensive range of products and services designed to help businesses and
organizations reap the full benefits of deep computing. IBM's technical know-how and
extensive industry knowledge includes technical experts capable of providing industry-specific
high performance computing solutions. IBM has expertise in critical areas such as scaling
and application modelling and demonstrable leadership in the latest industry trends, including
clustering, chip technology and Linux. IBM's world class research capabilities provide
strategic business and technical insight. Indeed, IBM's research activities led to the award of
3,248 US patents in 2004, the twelfth year in succession that IBM was awarded more patents
than any other company.
In the past, HPC was dominated by vector architecture machines, and indeed IBM provided
vector processors on its mainframe computer line, but the advent of powerful scalar
processors and the ability to interconnect them in parallel to execute on a single job has seen
vector systems eclipsed by Massively Parallel Machines, MPPs or Clusters. There are still
some applications for which vector machines provide an excellent, though sometimes
expensive, solution, particularly in engineering, but a glance at the Top500 Supercomputers
List shows the overwhelming domination of clusters of scalar architecture machines. Clusters
are the workhorse of general purpose, multi-disciplinary, scientific and research HPC.
IBM’s Deep Computing strategy can be articulated simply – IBM is dedicated to solving larger
problems more quickly at lower cost. To achieve this we will aggressively evolve the
POWER-based Deep Computing products; continue to develop advanced systems based on
loosely coupled clusters; deliver supercomputing capability with new access models and
financial flexibility; undertake research to overcome obstacles to parallelism and bring
revolutionary approaches to supercomputing.
IBM Deep Computing Systems
IBM provides cluster solutions based on IBM POWER architecture, on Intel architecture, and
on AMD Opteron architecture, together with support for a broad range of applications across
these multiple platforms. We offer full flexibility, with support for heterogeneous platforms,
architectures and operating systems. Our hardware systems are supplied with sophisticated
software designed for user productivity in demanding HPC environments. IBM strongly
supports Open Standards and our entire server product line is enabled to run Linux. IBM
cluster solutions provide cost effective solutions for both capability computing, running the
most demanding simulations and models, and capacity computing where throughput is of
crucial importance.
In what can be termed the “high range” segment, the most powerful high performance Power
clusters, equipped with high bandwidth, low latency switch interconnections provide versatile
systems able to run the widest range of HPC applications effectively and efficiently in
production environments from weather forecasting to scientific research laboratories and
academia.
At a slightly reduced performance, the “mid range” segment utilises mid-range Power servers
in clusters which can be configured to run either AIX or Linux, using either a high performance
interconnect or a lower performance industry standard interconnect.
IBM Power based supercomputer products use mainstream technologies and programming
techniques, leading to cost-effective systems designs that can sustain high performance on a
wide variety of workloads. This approach benefits from the sustained, incremental industry
improvements in hardware and software technologies that drive both performance and
reliability. IBM research continues to explore extensions to the POWER architecture; one
such extension under investigation would allow these systems to take on some applications
traditionally handled with vector processing. All Power systems share full binary compatibility across the range.
In the “density segment”, where applications and communications are less demanding, IBM
provides clusters comprising blades, with Intel or AMD Opteron processors. Whether
customers choose a single cluster, or a combination of clusters, IBM provides interoperability
between systems, with common compilers and libraries, shared file systems, shared
schedulers and grid compatibility. These clusters, based on high volume, low cost processors
and industry standard interconnects are aimed at applications which perform especially well
on these architectures and thus enjoy an excellent price performance. IBM blade server
clusters provide a particularly space and power efficient way to construct such systems.
The results speak for themselves as IBM leads the world in supercomputing, supplying 58 of
the world's 100 most powerful supercomputers according to the November 2004 Top500
Supercomputers List. IBM machines in the list include the most powerful, the IBM Blue Gene
at Lawrence Livermore National Laboratory (since doubled in size, and soon to be doubled
yet again); the largest supercomputer in Europe, the Mare Nostrum Linux Cluster at the
Barcelona University Supercomputer Centre and HPCx, the UK National Academic
Supercomputer Facility. IBM supplied 161 of the 294 Linux Clusters on the list.
Deep Computing stands at the forefront of computing technology, and indeed IBM's Deep
Computing business model is predicated on developing systems to meet the most aggressive
requirements of HPC customers, certain in the knowledge that commercial customers will
need these capabilities in the near future.
Paradigm shift in computing systems
The largest challenge facing the entire computing industry today is the paradigm shift about to
take place, caused by chip designers having reached the end of classical scaling. Forty years
ago, Gordon Moore made his famous observation, dubbed Moore's Law, that the number of
transistors on a chip would double every 18-24 months. Since that time, Moore's Law has
held true and circuit densities have increased so that it is not uncommon for a single IBM chip
the size of a thumbnail to have 200 million transistors. Scaling technology is far more complex
than a simple application of Moore's Law, but we can see how performance improvements
have arisen as a result:
– The relentless increase in the number of circuits has enabled designers to implement more
functions in hardware, and less in software, dramatically speeding systems. A good
example is to compare a current IBM compute server, which can calculate four floating
point operations in a single clock cycle, with older machines, where a single calculation
would require many instructions in microcode and would take many clock cycles to perform
(a worked peak-performance figure is sketched just after this list).
– The reduction in the size of the electronic components on the chip has allowed clock
frequencies to rise. One only has to look at 1970s computers, which ran at 1 MHz, and
compare them with today's servers running at 2,000 MHz, some 2,000x faster.
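To see how these two factors combine into a headline figure, peak performance per processor is simply the clock frequency multiplied by the number of floating point operations completed per cycle. As an illustration only, using the four operations per cycle quoted above and the 1.9 GHz POWER5 clock rate cited later in this paper (and ignoring everything that limits sustained performance in practice):
\[
R_{\mathrm{peak}} = f_{\mathrm{clock}} \times N_{\mathrm{ops/cycle}} = 1.9\ \mathrm{GHz} \times 4 = 7.6\ \mathrm{GFLOPS\ per\ processor},
\]
so a cluster of one thousand such processors would have a theoretical peak of 7.6 TFLOPS.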
Classical scaling is the coordinated reduction, year on year, of a fixed set of device
dimensions governing the performance of silicon technology. The next generation of silicon
fabrication can implement the identical circuit components smaller – in 2001 Power4 used
180 nm technology (chip features are 180 thousand-millionths of a metre across); in 2005 Power5
uses 130 nm technology; and in 2007 Power6 will use 65 nm technology. A side effect of
increased circuit density, increased numbers of circuits and increased frequency, is that power
densities in chips increase – processor chips typically run at power densities up to 10
Watts/sq cm, about double that of the sole plate of a domestic iron.
Consider the gate oxide, which provides the insulation between the input signal and the
output in a CMOS transistor. As implemented in the smallest dimensions today, this gate
oxide is only about 6 atoms thick. Fabrication technologies are not perfect and if we assume
a defect just one atom thick on each side of the layer, our insulation is 1/3 thinner than
required. The problem facing all chip designers is that single atom defects can cause local
leakage currents to be ten to one hundred times higher than average. Oxides scaled below
about 10 Angstroms (one thousand-millionth of a metre) are leaky and likely to be
unreliable.
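The "one third" figure above follows directly from the geometry of the layer: losing one atom from each face of a six-atom-thick oxide leaves
\[
\frac{6 - 2}{6} = \frac{2}{3}
\]
of the intended insulation, i.e. an oxide one third thinner than required.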
Chip designers are grappling with these problems, where the characteristics of the chip are
based on probability, not certainty; and while leakage current has been cited as one example,
there are now many such "non-statistical behaviours" appearing in the technology.
Classical scaling is now failing because the power generated by the leakage currents is
increasing at a much faster rate than that generated by the “useful” currents and will soon
exceed it.
Holistic design and system scale optimisation
HPC, being at the forefront of computing technology, has seen the effects of the paradigm shift
first. The key question is: what will drive performance improvements in the future if classical
scaling no longer does?
IBM's answer to this challenge is that holistic design and system scale optimisation will be the
major drivers of performance improvements in the future. Technological advances at the chip
level will still occur, and IBM will continue to research novel transistor architectures and
improvements in material properties (IBM has developed strained silicon for 90nm technology
which has conductor mobility 350x higher than only eight years ago, leading to better current
flows and reduced power dissipation).
By holistic design, IBM means innovation from atoms to software, requiring the simultaneous
optimization of materials, devices, circuits, cores, chips, system architectures, system assets
and system software to provide the most effective means of optimising computer
performance. It will not be possible to assemble systems from disparate components as is
often done today – each and every part of the system will need to be integrated in a coherent
whole.
Innovation will overtake scaling as the driver of semiconductor technology performance gains.
Processor design and metrics have already changed irrevocably as designers grapple with
the increased power dissipations, the power cliff, which some designs have already fallen
victim to. System level solutions, optimized via holistic design will ultimately dominate
progress in information technology, and IBM is perhaps the only vendor with the in-house
skills and capabilities to drive this revolution.
IBM's Power Architecture is evolving along these lines, with scalable multi-core chips having
200 million circuits, some of which control power and heat dissipation, and with technological
advances such as asset virtualisation, fine-grained clock gating, dynamically optimized
multi-threading capability and an open (accessible) architecture for system optimization and compatibility,
all to enhance performance.
Impact on Deep Computing
Many Deep Computing customers’ needs for the immediate future will be met by conventional
clusters which will continue to be developed to produce systems of 100s of TFLOPS. IBM is
for example developing Power6 based processors, and subsequent systems, which will
provide clusters of these sizes with acceptable power consumption and floor area.
Customers with extreme requirements are unlikely to be able to continue as today since their
computing demands will outstrip their environments and it is likely that their choice of future
HPC systems will be dictated as much by floor area and power consumption as by the
compute power they need. If one examines the US Advanced Simulation and Computing
(ASC, formerly ASCI) Program, we see that single machines will have increased from 1
TFLOP in 1997 to 360 TFLOPS in 2005, an increase of 360x in only eight years. Clearly, the
next eight years will not see conventional systems increasing in power at this rate.
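For perspective, an increase of 360x over eight years corresponds to machine performance slightly more than doubling every year, since the equivalent annual growth factor is
\[
360^{1/8} \approx 2.1 .
\]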
While IBM will continue to develop and supply traditional clusters of 100s of TFLOPs, it is
unlikely that these customers could afford the space and power requirements of conventional
systems 100x (10 PFLOPS) or 1000x (100 PFLOPS) as large.
Blue Gene – a new approach to supercomputing
As a result of these constraints, IBM began a project five years ago to develop a novel
architecture supercomputer with up to one million processors, but greatly reduced power
dissipation and floor area. The Blue Gene supercomputer project is dedicated to building a
new family of supercomputers optimized for bandwidth, scalability and the ability to handle
large amounts of data while consuming a fraction of the power and floor space required by
today’s fastest systems and at a much reduced cost.
The first of these systems is the 64 rack, 360 TFLOP Blue Gene/L to be delivered to
Lawrence Livermore National Laboratory as part of the ASC Purple procurement. At only one
quarter of its final contracted size, the first delivery of Blue Gene/L leads the Top500
Supercomputers List, and offers over 100 times the floor space density, 25 times the
performance per kilowatt of power, and nearly 40 times more memory per square metre than
the previous Top500 leader, the Earth Simulator. BlueGene/L dramatically improves
scalability and cost performance for many compute intensive applications, such as biology
and life sciences, earth sciences, materials science, physics, gravity, and plasma physics.
All HPC customers, from large to small, will benefit from the Blue Gene project for they will be
able to afford much larger systems than previously, and be able to accommodate them in
existing facilities without re-building. Numerous smaller Blue Gene/L systems are now
installed in research and academic laboratories worldwide.
Novel architectures
IBM will continue to investigate other novel architectures for powerful computing systems.
IBM's collaborative work with Sony and Toshiba to develop the Cell Processor for gaming and
High Definition TV has led to a processor with the power of past industrial supercomputers,
which may prove to be an excellent building block for future HPC systems.
Cell processor
The first-generation Cell processor is a multi-core chip with a 64-bit Power processor and
eight "synergistic" processors, capable of massive floating point processing and optimized for
compute-intensive workloads and broadband rich media applications. A high-speed memory
controller and high-bandwidth bus interface are integrated on-chip. The Cell's breakthrough
multi-core architecture and ultra high-speed communications capabilities will deliver vastly
improved, real-time response. The Cell processor supports multiple operating systems
simultaneously. Applications will range from a next generation of game systems with
dramatically enhanced realism, to systems that form the hub for digital media and streaming
content in the home, to systems used to develop and distribute digital content, to systems to
accelerate visualization and to supercomputing applications.
The first generation of Cell processors will be implemented with about 230 million transistors
on a chip of about 220 mm². The Cell processor will run at about 4 GHz and will provide a
performance of 0.25 TFLOPS (single precision) or 0.026 TFLOPS (double precision) per chip.
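The single precision figure can be reconstructed from the clock rate and the number of synergistic processors. Assuming each of the eight synergistic processors completes eight single precision operations per cycle (a four-wide fused multiply-add; this per-cycle figure is an assumption, not stated above), the arithmetic is approximately
\[
8\ \text{processors} \times 8\ \tfrac{\text{operations}}{\text{cycle}} \times 4 \times 10^{9}\ \tfrac{\text{cycles}}{\text{second}} \approx 2.56 \times 10^{11}\ \mathrm{FLOPS} \approx 0.25\ \mathrm{TFLOPS}.
\]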
Cell processor block diagram: a 64-bit Power processor core, memory controller, multiple synergistic processors and a flexible I/O interface
Long-Term, High Productivity Computing Systems
research
IBM has been awarded funding from the Defense Advanced Research Projects Agency
(DARPA) for the second phase of DARPA's High Productivity Computing Systems (HPCS)
initiative. IBM's proposal, named PERCS (Productive, Easy-to-use, Reliable Computing
System), is for ground-breaking research over the next three years in areas that include
revolutionary chip technology, new computer architectures, operating systems, compiler and
programming environments.
The research program will allow IBM and its partners, a consortium of 12 leading universities
and the Los Alamos National Laboratory, to pursue a vision of highly adaptable systems that
configure their hardware and software components to match application demands.
Adaptability enhances the technical efficiency of the system, its ease of use, and its
commercial viability by accommodating a large set of commercial and high performance
computing workloads. Ultimately, IBM's goal is to produce systems that automatically analyze
the workload and dynamically respond to changes in application demands by configuring
their components to match application needs.
PERCS is based on an integrated software-hardware co-design that will enable multi-Petaflop
sustained performance by 2010. It will leverage IBM's Power architecture and will enable
customers to preserve their existing solution and application investments. PERCS also aims
to reduce the time to solution, from inception to actual result, and to this end
PERCS will include innovative middleware, compiler and programming environments that will
be supported by hardware features to automate many phases of the program development
process.
The IBM project will be managed in IBM's Research Laboratory in Austin and will include
members from IBM Research, Systems Group, Software Group and the Microelectronics
Division. A fundamental goal of the research is commercial viability and developments arising
from the project will be incorporated in IBM Deep Computing systems by the end of the
decade.
References
IBM Deep Computing: http://www.ibm.com/servers/deepcomputing/
IBM Power Architecture: http://www.ibm.com/technology/power/
IBM Research: http://www.research.ibm.com/
Blue Gene: http://www.research.ibm.com/bluegene/
Cell Processor: http://www.research.ibm.com/cell/
PERCS: http://domino.research.ibm.com/comm/pr.nsf/pages/news.20030710_darpa.html
Accelerated Strategic Computing Program: http://www.llnl.gov/asci/overview/asci_mission.html
DARPA: http://www.darpa.mil/
Top500 Supercomputers List: http://www.top500.org/
2 Deep Computing Products
High Performance Computing falls into a number of overlapping areas depending on the type
of applications to be run and the architecture best matched to that computing.
Scale-up and scale-out
The largest HPC machines needed to run capability jobs are scale-up and scale-out
machines. They operate at the tens or hundreds of TFLOPS range, with thousands
or tens of thousands of processors. Servers are often (but not always) wide SMPs
with many processors, giving rise to the term 'scale-up', while many such servers
are needed to achieve the performance required, giving rise to the term 'scale-out'.
The wider the machine scales, the more important the performance of the switching
interconnect becomes as a single capability job is capable of utilising all or most of
the servers in the system. Jobs can take advantage of using a mixture of OpenMP
(in the servers) and message passing (between the servers) or message passing
across the entire machine. These systems provide the most general purpose HPC
systems available as they can run many different types of applications effectively and
efficiently. Typical applications include weather forecasting and scientific research,
while engineering applications, which cannot usually scale widely, can make use of
the wide SMPs. A typical example of such a general purpose capability machine is
HPCx, the UK National Academic Supercomputer, which provides about 11 TFLOPS
(Peak) with 50 x 32 way servers (1620 CPU in total) interconnected by the High
Performance Switch.
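As a concrete, if simplified, illustration of the hybrid programming model described above, the sketch below combines OpenMP threads within a server with MPI message passing between servers. It is a minimal example written for this paper rather than code from any IBM product or application; the workload (a partial sum) and all names are illustrative only.

/* Minimal hybrid MPI + OpenMP sketch: MPI ranks map onto servers (or nodes),
 * while OpenMP threads exploit the processors inside each SMP server.
 * Build with an MPI wrapper compiler with OpenMP enabled, for example:
 *   mpicc -fopenmp hybrid.c -o hybrid   (exact flags vary by compiler). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank computes a partial sum, spread over its OpenMP threads. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = rank; i < 1000000; i += nranks)
        local += 1.0 / (double)(i + 1);

    /* Message passing combines the per-server results across the cluster. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (computed by %d MPI ranks)\n", global, nranks);

    MPI_Finalize();
    return 0;
}

In practice the MPI library, the OpenMP run time and the job scheduler decide how ranks and threads are placed on servers; the principle, however, is exactly the mixture of shared-memory and message passing parallelism described above.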
Blue Gene offers a similar approach to providing systems in the hundreds and thousands of
TFLOPS range by interconnecting up to one million individual processors. The novel nature
of Blue Gene lies in the techniques used to minimise the resulting power dissipations and
floor area, and the topology and performance of the interconnection fabric needed to maintain
a fully balanced system.
At the other end of the scale Departmental machines are much smaller, and typically
comprise either a single SMP or small clusters of either single CPU or small SMP servers.
Departmental clusters most often run Linux and may use Gigabit Ethernet or Myrinet as
interconnects.
In between, Enterprise or Divisional machines range from larger Departmental to smaller
capability systems, depending on the applications to be run. Jobs do not need to scale as
widely as capability systems, so lower performance industry standard interconnects can
sometimes be used, but some applications will not run as effectively on these systems. A
number of these systems run Linux, although it is currently unusual to find sites doing so where
full 24 x 7 service availability is essential (weather forecasters) or where many different
users are running many different types of applications and the system needs strong system
management to maintain availability (e.g. science research or national academic systems).
Capability or Capacity
HPC systems can also be characterised as being either Capability or Capacity machines.
Capability systems, by definition, are able to run a single job using all, or most of, the system.
As the job is parallelised across many processors, intercommunication between the
processors is high, and efficient running can only be achieved if the message passing (MPI)
efficiency is high. Capacity machines, on the other hand, are used to run many smaller jobs,
none of which extend to run across the whole machine, and can therefore be implemented
with lower performance interconnects while maintaining high performance.
Server options
IBM's highest performance, most general purpose Deep Computing products for the capability
sector utilise IBM's standard range of Power servers (Unix servers running either AIX, IBM's
version of Unix, or Linux), assembled into clusters connected with IBM's High Performance Switch
(HPS) interconnect. Servers can be configured with up to 64 processors in a single SMP
image, offering great flexibility for applications to exploit either OpenMP or message passing
constructs, or a mix of both. A suite of special purpose HPC software provides parallel
computing capabilities such as message passing, parallel file systems and job
scheduling.
IBM's medium performance range of Deep Computing products utilise IBM's mid-range Power
servers or IBM's standard range of Intel and AMD Opteron products, running Linux, and
assembled into clusters connected by industry standard interconnects such as Myrinet or
Gigabit Ethernet.
IBM's density segment Deep Computing products utilise IBM's BladeCenter populated with
2-way Power or 4-way Intel processors.
Parallel scaling efficiency
Similarly, systems can be characterised by their parallel scaling efficiency. On such a comparison
Blue Gene appears as a highly parallelised system with an extremely efficient interconnect, as
the system is well balanced between processor compute power and interconnect scaling.
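The standard way to quantify this behaviour (these are textbook definitions rather than figures taken from this paper) is through parallel speed-up and efficiency. If T(p) is the time to solution on p processors, then
\[
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p},
\]
and Amdahl's law bounds the speed-up when a fraction s of the work is inherently serial:
\[
S(p) \le \frac{1}{s + (1-s)/p}.
\]
A well balanced system keeps E(p) close to 1 out to very large processor counts for applications that suit its architecture.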
IBM Power Architecture servers
IBM's pSeries eServers utilise industry leading Power architecture processors. IBM first
introduced Power architecture in 1990 and it has evolved through Power2, Power3 and
Power4 to today's Power5.
pSeries servers deliver reliable, cost-effective solutions for commercial and technical
computing applications in the entry, midrange and high-end segments and are used in IBM
Deep Computing systems.
IBM changed the UNIX computing landscape with the introduction of the current Power5
pSeries p5 servers, an advanced line of UNIX and Linux servers using Power5™
microprocessors to achieve unprecedented computing performance and reduced costs for a
wide range of business and scientific applications. The eServer p5 systems are the result of a
large-scale, three-year intensive research and development effort, and they extend beyond
traditional UNIX servers by implementing mainframe-inspired features to provide higher
utilization, massive performance, greater flexibility, and lower IT management costs.
IBM p5 servers give clients choices for implementing different solutions, ranging from 2-way
to 64-way servers, all leveraging the industry standard Power Architecture™ and all designed
to deliver the most potent performance and scalability ever available in the entry, midrange
and large scale UNIX platforms. All pSeries servers are binary-compatible, from single
workstations to the largest supercomputers.
IBM Intel and Opteron based servers
IBM xSeries servers utilise Intel and Opteron processors, and incorporate mainframe-derived
technologies and intelligent management tools for reliability and ease of management.
IBM Deep Computing clusters
Cluster 1600 – IBM pSeries
Deep Computing requires systems which greatly exceed the computing power available from
a single processor or server, and while in earlier days, it was possible to provide enough
power in a single server, this is no longer the case. Deep Computing systems are now
invariably constructed as clusters of interconnected servers, where a cluster can be defined
as a collection of interconnected whole computers, connected in parallel and used as a
single, unified computing resource.
IBM first introduced clustered Unix systems in the 1990s, running AIX (IBM's Unix), and they
quickly established themselves as the cluster of choice for High Performance parallel
computing. Today's Cluster 1600 systems are configured with high-performing pSeries p5 servers and
powerful switching interconnects, to provide systems capable of efficiently and effectively
running the most demanding applications.
The IBM eServer p575 and p595 cluster nodes have been designed for "extreme"
performance computing applications. Specially designed to satisfy high-performance, high
bandwidth application requirements, the p575 includes eight, and the p595 sixty four, of the
most powerful 1.9 GHz IBM POWER5 microprocessors for the ultimate in high-bandwidth
computing. Multiple cluster nodes can be interconnected by a low latency, high bandwidth
switching fabric, the High Performance Switch, to build the largest supercomputers capable of
true deep computing.
HPCx Cluster 1600 at Daresbury Laboratory, UK
For less demanding applications, communication between nodes can be achieved by using
industry-standard interconnects such as 10/100Mbps or Gigabit Ethernet while users running
Linux, whose applications need higher bandwidth and lower latency than that which can be
provided by Gigabit Ethernet, can opt for the Myricom Myrinet-2000 switch.
The Cluster 1600 provides a highly scalable platform for large-scale computational modelling.
Cluster 1300 – IBM Intel and AMD Opteron
While the most demanding applications need the capability of Power servers and high
performance switching interconnects, a broad spectrum of applications (including, for
example, many applications in the life sciences, particle physics and seismic fields) are well
suited to lower performance systems. IBM provides Linux clusters, using Intel and AMD
Opteron processors, to serve these customers, allowing affordable supercomputing for
departmental HPC users.
Clusters have also proven to be a cost-effective method for managing many HPC workloads.
It is common today for clusters to be used by large corporations, universities and government
labs to solve problems in life sciences, petroleum exploration, structural design, high-energy
physics, finance and securities, and more.
Now, with the recent introduction of densely packaged rack-optimized servers and blades
along with advances in software technology that make it easier to manage and exploit
resources, it is possible to extend the benefits of clustered computing to individuals and small
departments.
With the IBM Departmental Supercomputing Solutions, the barriers to deploying clustered
servers (high price, complexity, and extensive floor-space and power requirements) have been
overcome.
now leverage the same supercomputing technology used by large organizations and
prestigious laboratories.
IBM Departmental Supercomputing Solutions are offered in a variety of packaged, pre-tested
clustered configurations where clients have the flexibility to choose between configurations
with 1U servers or blades in reduced-sized racks, and servers with Intel Xeon or AMD
Opteron processors, running either Microsoft Windows or, more commonly, Linux operating
systems.
Mare Nostrum at University of Barcelona Supercomputing Centre, Spain
IBM and Linux
IBM is strongly committed to Linux, and our Linux support extends across server hardware,
software and through into our services groups. IBM is focusing on:
• Creating a pervasive application development and deployment environment built on Linux.
• Producing an industry-leading product line capable of running both AIX (IBM Unix) and Linux, together with the services needed to develop and deploy these applications.
• Creating bundled offerings including hardware, software and services built on Linux to allow out-of-the-box capability in handling workloads best suited to Linux.
• Fully participating in the evolution of Linux through Open Source submissions of IBM-developed technologies and by partnering with the Open Source community to make enhancements to Linux.
IBM is fully committed to the Open Source movement and believes that Linux will emerge as
a key platform for computing in the 21st century. IBM will continue to work with the Open
Source community, bringing relevant technologies and experience to enhance Linux, to help
define the standards and to extend Linux to support enterprise wide systems.
Blue Gene: A supercomputer architecture for high
performance computing applications
Blue Gene/L in its 360 TFLOP configuration
IBM Blue Gene is a supercomputing project dedicated to building a new family of
supercomputers optimized for bandwidth, scalability and the ability to handle large amounts of
data while consuming a fraction of the power and floor space required by today’s fastest
systems.
The first of the Blue Gene family of supercomputers, Blue Gene/L, has been delivered to the
U.S. Department of Energy’s National Nuclear Security Administration (NNSA) program for
Advanced Simulation and Computing (ASC) while others have been delivered to Edinburgh
University's Parallel Computing Centre and to ASTRON, a leading astronomy organization in
the Netherlands. IBM and its customers are exploring a growing list of high performance
computing applications that can be optimized on Blue Gene/L with projects in the life
sciences, hydrodynamics, quantum chemistry, molecular dynamics and climate modelling.
Blue Gene/L’s original mission was to enable life science researchers to design effective
drugs to combat diseases and to identify potential cures but its versatility, advanced
capabilities, compact size and power efficiency make it attractive for applications in many
fields including the environmental sciences, physics, astronomy, space research and
aerodynamics.
US ASC Purple procurement
The US Department of Energy's ASC Purple procurement comprises machines from each of
the above three architectures on IBM's Technology Roadmap. The smallest machine is a 9.2
TFLOPS Linux Cluster. This is complemented by two Power clusters, the first a 12 TFLOP
Power4 / High Performance Switch system, and the second, just delivered, a 100 TFLOP
Power5 / High Performance Switch cluster which are the Lawrence Livermore National
Laboratory's main workhorse machines. The third system is a 360 TFLOP Blue Gene/L which
will be used for extreme computations on selected applications.
ASC Purple systems
ASC Purple 100 TFLOP machine architecture
Integration of heterogeneous clusters
Many customers do as ASC Purple did and procure more than one architecture and it then
becomes critical for those systems to be integrated into a seamless operating unit. IBM's
HPC software suite enables this integration by allowing clusters to be managed from a common
management tool, by allowing multiple systems to access shared file systems under the General
Parallel File System, by allowing jobs to be scheduled across the entire system and by
enabling grid access to the system. Applications further benefit from common compilers and
run-time libraries, reducing the porting and support effort required.
IBM's Deep Computing Capacity on Demand
IBM Deep Computing Capacity on Demand centres deliver supercomputing power to
customers over the Internet, freeing them from the fixed costs and management responsibility
of owning a supercomputer or providing increased capacity to cope with peaks of work.
Customers are provided with highly secure remote Virtual Private Network (VPN) access over
the Internet to supercomputing power owned and hosted by IBM, enabling them rapidly and
temporarily to flex High Performance Computing capacity up or down in line with business
demands, and so to respond to predictable or unpredictable peak workloads or
project schedule challenges. Clients can implement an optimal combination of in-house fixed
capacity/fixed cost and hosted variable capacity/variable cost HPC infrastructure.
Capacity on Demand Centres are equipped with the full range of IBM Deep Computing
systems including:
• IBM eServer Cluster 1350 with xSeries Intel 32-bit technology to run Linux or Microsoft Windows workloads
• IBM eServer Cluster 1350 with eServer AMD Opteron 32-bit/64-bit technology to run Linux or Microsoft Windows workloads
• IBM pSeries with IBM POWER 64-bit technology to run Linux or IBM AIX 5L workloads
• IBM Blue Gene with PowerPC technology to run Linux-based workloads
Applications which are well suited to Capacity on Demand include most scientific fields, and
especially:
• Petroleum – seismic processing, reservoir simulation
• Automotive and Aerospace Computer-Aided Engineering – crash and clash analysis, computational fluid dynamics, structural analysis, noise/vibration/harshness analysis, design optimization
• Life Sciences – drug discovery, genomics, proteomics, quantum chemistry, structure-based design, molecular dynamics
• Electronics – design verification and simulation, automatic test pattern generation, design rule checking, mask generation, optical proximity correction
• Financial Services – Monte Carlo simulations, portfolio and wealth management, risk management, compliance
• Digital Media – rendering, on-line game scalability testing
Storage
High Performance Computing systems require high performance disk storage systems to
provide local storage during the computational phase, and large near-line or off-line systems,
usually tape, for longer term or permanent storage of user data.
General Parallel File System, GPFS
The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system
which provides fast, reliable data access from all nodes in one or more homogeneous or
heterogeneous clusters of IBM UNIX servers running either AIX or Linux. GPFS allows
parallel applications simultaneous access to a single file or set of files from all nodes within
the clusters and allows data to be shared across separate clusters. While most UNIX file
systems are designed for a single-server environment, and adding more file servers does not
improve the file access performance, GPFS delivers much higher performance, scalability
and failure recovery by accessing multiple file system nodes directly and in parallel. GPFS's
high performance I/O is achieved by "striping" data from individual files across multiple disks
on multiple storage devices and reading and writing the data in parallel; performance can be
increased as required by adding more storage servers and disks.
GPFS is used on all large IBM clusters doing parallel computing, which includes many of the
world's largest supercomputers, supporting hundreds of Terabytes of storage and over 1000
disks in a single file system. The shared-disk file system provides every cluster node with
concurrent read and write access to a single file and high reliability and availability is assured
through redundant pathing and automatic recovery from node and disk failures. GPFS
scalability and performance are designed to meet the needs of data-intensive applications
such as engineering design, digital media, data mining, financial analysis, seismic data
processing and scientific research.
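To illustrate the access pattern a parallel file system such as GPFS is designed to serve, the sketch below has every MPI process write its own block of one shared file through the standard MPI-IO interface. The example is generic MPI-IO rather than anything GPFS-specific (GPFS simply makes this kind of concurrent single-file access efficient), and the file name and block size are arbitrary choices for this paper.

/* Sketch: every MPI rank writes its own block of a single shared file in
 * parallel. On a parallel file system such as GPFS the writes can be
 * striped across many disks and servers and proceed concurrently. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK (1 << 20)   /* doubles written per rank (8 MiB per rank) */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank fills a buffer with its own data. */
    double *buf = malloc(BLOCK * sizeof(double));
    for (int i = 0; i < BLOCK; i++)
        buf[i] = (double)rank;

    /* All ranks open the same file and write at non-overlapping offsets. */
    MPI_File_open(MPI_COMM_WORLD, "shared_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}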
Tivoli Storage Manager, TSM
Tivoli Storage Manager, TSM, is IBM's backup, archive and Hierarchical Storage Manager
(HSM) software solution for desktop, server and HPC systems. It provides support for IBM
and many non-IBM storage devices and can manage hundreds of millions of files and TBytes
to PBytes of data. TSM can exploit SAN technology in a SAN environment.
TSM is used on many large IBM High Performance Computing systems to transfer data
between the on-line GPFS and the near-line or off-line tape libraries.
High Performance Storage System, HPSS
HPSS is a very high performance storage system aimed at the very largest HPC users who
require greater performance than traditional storage managers, like Tivoli, can provide. HPSS
software manages hundreds of Terabytes to Petabytes of data on disk cache and in robotic
tape libraries providing a highly flexible and fully scalable hierarchical storage management
system keeping recently used data on disk, and less recently used data on tape. HPSS uses
cluster and SAN technology to aggregate the capacity and performance of many computers,
disks, and tape drives into a single virtual file system of exceptional size and versatility. This
approach enables HPSS easily to meet otherwise unachievable demands of total storage
capacity, file sizes, data rates, and number of objects stored.
HPSS Architecture
HPSS is the result of over a decade of collaboration between IBM and five US Department of
Energy laboratories, with significant contributions by universities and other laboratories
worldwide. The approach of jointly developing open software with end users having the most
demanding requirements and decades of high-end computing and storage system experience
has proven a sound recipe for continued innovation, industry leadership and leading edge
technology. The founding collaboration partners are IBM, Lawrence Livermore National
Laboratory, Los Alamos National Laboratory, the National Energy Research Supercomputer
Center (NERSC) at Lawrence Berkeley National Laboratory, Oak Ridge National Laboratory
and Sandia National Laboratories.
HPSS was developed to provide:
Scalable Capacity – As architects continue to exploit hierarchical storage systems to scale
critical data stores beyond a Petabyte (1024 Terabytes) towards an Exabyte (1024
Petabytes), there is an equally critical need to deploy high performance, reliable and scalable
HSM systems.
Scalable I/O Performance – As processing capacity grows from trillions of operations per
second towards a quadrillion operations per second and data ingest rates grow from tens of
Terabytes per day to hundreds of Terabytes per day, HPSS provides a scalable data store
with reliability and performance to sustain 24 x 7 operations in demanding high availability
environments.
Incremental Growth – As data stores with 100s of Terabytes and Petabytes of data become
increasingly common, HPSS provides a reliable, seamlessly scalable and flexible solution using
heterogeneous storage devices, robotic tape libraries and processors connected via LAN,
WAN and SAN.
Reliability – As data stores increase from tens to hundreds of millions of files and collective
storage I/O rates grow to hundreds of Terabytes per day, HPSS provides a scalable metadata
engine ensuring highly reliable and recoverable transactions down to the I/O block level.
HPSS sites which have accumulated one Petabyte or more of data, in a single HPSS file
system include:
• Brookhaven National Laboratory, US
• Commissariat à l'Energie Atomique, France
• European Centre for Medium-Range Weather Forecasts, UK
• Lawrence Livermore National Laboratory, US
• Los Alamos National Laboratory, US
• National Energy Research Scientific Computing Center, US
• San Diego Supercomputer Center, US
• Stanford Linear Accelerator Center, US
The data stored in these systems ranges from digital library material to science, engineering
and defence application data, and includes data from nanotechnology, genomics, chemistry,
biochemistry, radiology, functional magnetic resonance imaging, fusion energy, energy
efficiency, astrophysics, nuclear physics, accelerator physics, geospatial data, digital audio,
digital video, weather, climate and computational fluid dynamics.
IBM Deep Computing Visualization
High Performance Computing systems generate vast quantities of data and high performance
visualisation systems are essential if the information buried within the increasingly large,
sometimes distributed, data sets is to be found. Visualization is the key to unlocking data
secrets.
Effective visualization systems require massive computing power and intensive graphics
rendering capabilities, which has traditionally led to extremely specialized and monolithic
infrastructures. IBM's Deep Computing Visualization (DCV) solution offers an alternative
approach by using IBM IntelliStation workstations and innovative middleware to leverage the
capabilities of the latest generations of commodity graphics adapters to create an
extraordinarily flexible and powerful visualization solution.
Deep Computing Visualization is a complete solution for scientific researchers, who need
increased screen resolutions and/or screen sizes while maintaining performance. DCV
enables the display of applications on high-resolution monitors, or on large, multi-projector
display walls or caves with no, or minimal, impact on performance. It also makes graphics
applications much easier to manage by keeping the application in one central location and
securely transferring visual data to remote collaborators anywhere on the network.
The Deep Computing Visualisation Concept
Deep Computing Visualisation is a visualisation infrastructure that combines commodity
components with a scalable architecture and remote visualization features. Its goal is to
leverage the price/performance advantage of commodity graphics components and
commodity interconnect adapters to provide visualisation systems of great power and
capability, from workstations to high end systems.
At the heart of IBM’s DCV infrastructure is middleware that links multiple workstations into a
scalable configuration, effectively combining the output of multiple graphics adapters. DCV is
based on Linux and OpenGL and addresses two major market demands for scientific
visualization:
• For users of traditional high performance proprietary graphics systems, DCV aims to provide equivalent or better levels of performance at a lower cost on many applications.
• For users of personal workstations and PCs, DCV aims to provide a more scalable solution that is still based on commodity technologies.
DCV has two built-in functional modes which previously were available only on high-end proprietary graphics systems:
• Scalable Visual Networking is a "one-to-few" mode that displays applications, without modification, on multiple projectors and/or monitors, creating immersive or stereo visualization environments.
• Remote Visual Networking is a "one-to-many" mode that transports rendered frames to multiple remote locations, allowing geographically dispersed users to simultaneously participate in collaborative sessions.
Both Scalable Visual Networking and Remote Visual Networking can be used simultaneously
or separately without additional setup, allowing both local and remote users to participate in
immersive environments.
Deep Computing Visualisation improves the ability to gain insight by enhancing 2D and 3D
graphics capabilities, portraying large quantities of data in ways that are easy to understand
and analyse and which support better decision making.
References
IBM Power Architecture: http://www.ibm.com/technology/power/
IBM Deep Computing products
– Overview: http://www.ibm.com/servers/deepcomputing/offer.html
– Departmental Supercomputers: http://www.ibm.com/servers/eserver/clusters/hardware/dss.html
– Clusters: http://www.ibm.com/servers/eserver/clusters/
– pSeries Power servers: http://www.ibm.com/servers/eserver/pseries/
– Intel servers: http://www.ibm.com/servers/eserver/xseries/
– AMD Opteron servers: http://www.ibm.com/servers/eserver/opteron/
– Blades: http://www.ibm.com/servers/eserver/bladecenter/index.html
– Storage systems: http://www.ibm.com/servers/storage/index.html
GPFS: http://www-1.ibm.com/servers/eserver/clusters/software/gpfs.html
TSM: http://www-306.ibm.com/software/tivoli/products/storage-mgr/
HPSS: http://www.hpss-collaboration.org/hpss/index.jsp
Deep Computing Capacity on Demand: http://www.ibm.com/servers/deepcomputing/cod.html
Deep Computing Visualisation: http://www.ibm.com/servers/deepcomputing/visualization/
3 GRID
GRID for academia and scientific research and
development
Research requires the analysis of enormous quantities of data. Researchers need to share
not only data and processors but also instrumentation and test equipment, becoming virtual
organizations (VOs). Grids allow these organizations to make database and file-based data
available across departments and organizations along with securing data access and
optimizing storage. Grid technologies enable the creation and management of VOs, giving faster, more seamless research collaboration by:
– Reducing research and development costs and increasing the efficiency of co-development
– Reducing the time to market by executing tasks faster and more accurately
– Improving hit-rates through better simulation of real-world characteristics
– Enabling seamless sharing of raw data
IBM grid and deep computing solutions provide faster and more seamless research
collaboration by:
– Helping researchers analyse data through simplified data access and integration
– Enabling a flexible and extensible infrastructure that adapts to changing research requirements
– Helping procure compute and data resources on demand
– Providing on demand storage capacity for data collected from sensors, etc.
– Facilitating fusion engines that assimilate, aggregate and correlate raw information
Grid solutions start with the IBM Grid Toolbox and the Globus Toolkit and can incorporate IBM eServer xSeries, pSeries, zSeries, IBM TotalStorage, LoadLeveler, DB2 database, DataJoiner, General Parallel File System, NFS, Content Manager, SAN FS, ITIO, ITPM, the Tivoli Suite, CSM, WebSphere Application Server, and software from IBM Business Partners such as Platform Computing (LSF and Symphony), DataSynapse (GridServer), United Devices and Avaki. IBM Business Consulting Services and ITS can provide the skills and resources to implement full grid solutions.
Unlike many vendors, IBM offers clients a full end-to-end solution, from initial design concept,
through detailed design, supply of hardware and software and implementation services to roll
out the complete solution.
The key differentiators between IBM and other potential grid suppliers can be summarised as
our breakthrough initiatives and thought leadership, vast product and patent portfolio, deep
industry experience, heavy research investment (people, facilities, projects), extensive
intellectual capital and our partnerships with open standards groups, application providers and
customers. IBM has delivered hundreds of grid implementations worldwide, including for academic customers such as Marist College (US), the University of Florida, the University of Oregon and CINECA (Italy). Other grid implementations include:
AIST (National Institute of Advanced Industrial Science and Technology) – Japan's largest national research organization provides an on-demand computing infrastructure which dynamically adapts to support various research requirements.
Butterfly.net – which made the strategic decision to build an architecture based on the grid computing model, using standard protocols and open-source technologies.
EADS (European Aeronautic Defence and Space Company) – which cut analysis and simulation time while improving the quality of the output.
IN2P3 (Institut National de Physique Nucleaire et de Physique des Particules) – this French research institute improves the performance of research projects and enhances collaboration across the European technical community.
Marist College – this U.S. college needed a more stable, resilient and powerful platform for internal IT operations and computer science student laboratories.
Royal Dutch Shell – which uses grid applications for seismic studies in the petroleum industry.
Tiger – the government of Taiwan's technological research grid, which integrates in-country academic computing resources for life sciences and nanotechnology.
University of Pennsylvania, NDMA – which uses a powerful computing grid to bring advanced methods of breast cancer screening and diagnosis to patients across the nation, helping to reduce costs at the same time.
Some of the many reasons which have led clients like these to choose IBM to implement their grids include:
Unrivalled expertise - IBM has implemented grid computing for over one hundred
organizations worldwide and IBM uses grids internally to support hundreds of thousands of
employees.
Range of solutions from small to large - Clients can start small and grow with one of IBM's 21 industry-focused grid offerings.
Grid Computing leadership - Analysts and media continue to cite IBM's grid activities as
industry leading.
IBM partnerships with leading grid middleware and application ISVs – including SAS, Dassault, Cadence, Accelrys, Cognos, DataSynapse, Platform Computing, Avaki, United Devices, etc.
Full business partner support - IBM’s Solution Grid for Business Partners allows business
partners to use the IBM grid infrastructure to remotely access IBM tools, servers, and storage
so partners can grid enable and validate their applications for a distributed, on demand
environment and bring their applications to market faster.
Breadth of IBM solution - IBM offers a grid computing platform consisting of industry leading
middleware from IBM and our business partners, allowing clients to selectively choose the
components of their grid solution.
– Tivoli Provisioning Manager and Orchestrator provide resource provisioning and allocation within a grid solution.
– e-Workload Manager offers best-of-breed distributed workload management based on years of experience in managing mainframe workloads.
– WebSphere XD (Business Grid) offers a compelling value proposition based upon integrating J2EE transactional and grid workloads.
IBM's products in the data space allow customers to virtualise their enterprise information
assets without requiring significant changes to their information architecture.
– DB2 Information Integrator
– High performing file systems (General Parallel File System, SAN File System), and
– Industry leading storage virtualisation via SAN Volume Controller
IBM’s deep commitment to open standards - grid computing is more powerful when
implemented with open standards such as WS-RF, Linux, OGSA. IBM is a key collaborator in
the Globus Project, the multi-institutional research-and-development effort for grid which is
developing Open Grid Services Architecture (OGSA), a set of standards and specifications
that integrate Web services with grid computing. IBM sponsors the Global Grid Forum,
whose mission is the development of industry standards for grid computing, and IBM led the
development of OGSA into Web Services standards, the Web Services Resource Framework
(WS-RF).
IBM Business Consultants and world-class, worldwide support – to help clients leverage grid technology. BCS has hundreds of consultants with industry expertise trained in applying grid technologies, while IBM Design Centers in the US, EMEA and Japan support advanced client engagements.
References
IBM Grid: http://www-1.ibm.com/grid/index.shtml
IBM Grid Toolbox: http://www-1.ibm.com/grid/solutions/grid_toolbox.shtml
Grid Solution for higher education: http://www.ibm.com/industries/education/doc/content/solution/438704310.html?g_type=rhc
IBM Grid Solutions for aerospace, agricultural chemicals, automotive electronics, financial services, government, higher education, life sciences and petroleum: http://www-1.ibm.com/grid/solutions/index.shtml
Customer implementations of grids:
– Life Sciences: http://www-1.ibm.com/grid/gridlines/January2004/industry/lifesciences.shtml
– Research at Argonne National Laboratory: http://www.ibm.com/grid/gridlines/January2004/feature/laying.shtml
– Government: http://www-1.ibm.com/grid/gridlines/January2004/industry/govt.shtml
– Academia: http://www-1.ibm.com/grid/gridlines/January2004/industry/education.shtml
– Teamwork: http://www-1.ibm.com/grid/gridlines/January2004/feature/teamwork.shtml
4 IBM in the Deep Computing Marketplace
IBM is the world's leading supplier of High Performance Computing systems.
The Top500 Supercomputers List tabulates the 500 largest supercomputers worldwide and
IBM has 216 entries in the list, more than any other vendor, and over 49% of the aggregate
throughput. IBM has the most systems in the Top10 (four), the Top20 (eight) and the Top100 (58). There are 294 Linux clusters on the List, of which IBM supplied 161, demonstrating our strength in both traditional and Linux clusters and in systems from the very largest to more modest sizes.
The leader of the Top500 List is IBM's Blue Gene/L, supplied to the US Department of Energy at the Lawrence Livermore National Laboratory as part of the ASC Purple contract and rated at over 70 TFLOPS (Linpack Rmax) when the list was published in November 2004. This system was then only one quarter of its final contracted size; it has since been doubled in size to about 140 TFLOPS and will be doubled again this year. Lawrence
Livermore National Laboratory (LLNL) is a US Department of Energy national laboratory
operated by the University of California. Initially founded to promote innovation in the design
of the US nuclear stockpile through creative science and engineering, LLNL has now become
one of the world's premier scientific centres, where cutting-edge science and engineering in
the interest of national security is used to break new ground in all areas of national
importance, including energy, biomedicine and environmental science.
IBM has since installed a further component of ASC Purple, namely a 100 TFLOPS POWER5, 12,000 CPU pSeries cluster.
The largest Linux cluster in the list is Mare Nostrum, the largest supercomputer in Europe and number four on the Top500 List. It is a 20 TFLOPS (Linpack Rmax) system with over 2,200 eServer blades, installed at the Barcelona Supercomputing Centre (Ministerio de Ciencia y Tecnologia) and providing high performance computing to academic and scientific researchers in Spain running applications as diverse as life sciences (proteomics, bioinformatics, computational chemistry), weather and climate research and materials sciences.
Other IBM systems in the Top 100 include:
– Weather forecasting – the European Centre for Medium-Range Weather Forecasts (ECMWF) provides 3 to 10 day forecasts for member states throughout Europe.
– Oceanography – Naval Oceanographic Office (NAVOCEANO), US.
– Scientific research – the US National Energy Research Scientific Computing Center (NERSC), the flagship scientific computing facility for the Office of Science in the U.S. Department of Energy. NERSC provides computational resources and expertise for basic scientific research and is a world leader in accelerating scientific discovery through computation.
– Scientific research and Grid – the US National Center for Supercomputing Applications (NCSA) is a leader in defining the future's high-performance cyberinfrastructure for scientists, engineers, and society. The Center is one of the four original partners in the TeraGrid project, a National Science Foundation initiative that now spans nine sites. When completed, the TeraGrid will be the most comprehensive cyberinfrastructure ever deployed for open scientific research, including high-resolution visualization environments, and computing software and toolkits connected over the world's fastest network.
– Academic research – the San Diego Supercomputer Center at the University of California, San Diego, US.
– Academic research – HPCx is the UK national academic supercomputer, run by a consortium led by the University of Edinburgh, with the Council for the Central Laboratory for the Research Councils (CCLRC) and IBM, and funded by the Engineering and Physical Sciences Research Council.
– Grid – the Grid Technology Research Centre (GTRC) in Tsukuba, Japan, was founded in January 2002 with a mission to lead collaboration between the industrial, academic and government sectors and serve as a world leading grid technology research and development centre.
– Scientific research – Forschungszentrum Juelich (FZJ), the Jülich Research Centre, is a major German multi-disciplinary research centre conducting research in high energy physics, information technology, energy, life sciences and environmental science.
– Weather forecasting – the US National Centers for Environmental Prediction is the US weather forecasting agency.
– Atmospheric research – the US National Center for Atmospheric Research is operated by the University Corporation for Atmospheric Research, a non-profit corporation of 61 North American universities with graduate programs in atmospheric sciences. NCAR conducts research in the atmospheric sciences, collaborates in large multi-institution research programs and develops and provides facilities to support research programs in UCAR universities and at the Center itself.
– Petroleum – Saudi Aramco is a leading oil exploration and production organisation.
– Seismic processing – Geoscience UK is a leading seismic exploration company, while other seismic installations include ConocoPhillips and PGS in the US.
– Grid – WestGrid provides high performance computing, networking and collaboration tools to universities in western Canada, concentrating on grid enabled projects.
– Academic research – CINECA, the Inter-university Consortium for North Eastern Italy, provides high performance computing to thirteen universities conducting public and private research.
– Academic research – the Institute of Scientific Computing at Nankai University, China, hosts the largest supercomputer in China and provides computing for basic and applied research in one of China's top universities.
– Scientific research – the Korea Institute of Science and Technology, South Korea, is a multi-disciplinary research facility.
IBM systems in this part of the list are also installed at commercial banks (UK and Germany), Credit Suisse (US and Switzerland), semiconductor companies (Israel and US), a petroleum company (Brazil), a manufacturing company (US), the University of Southern California (US), the University of Hong Kong, Environment Canada (the Canadian weather forecaster), the University of Alaska (US), electronics companies (US), digital media (UK), the Oak Ridge and Sandia National Laboratories (both US), the Max Planck Institute (Germany), military research establishments (US Army, UK atomic weapons), Sony Pictures (US) and Deutscher Wetterdienst (Germany's weather forecaster). Thirty-three of the thirty-five entries in the list from 100 to 134 are IBM Linux clusters.
IBM has installed a number of Blue Gene/L systems since the November list was published.
The University of Edinburgh Parallel Computing Centre (UK) installed the first Blue Gene/L in
Europe for a joint project with IBM to tackle some of the most challenging puzzles in science,
such as understanding the behaviour of large biological molecules and modelling complex
fluids.
ASTRON, a leading astronomy organization in the Netherlands, has installed a Blue Gene/L for a joint research project with IBM to develop a new type of radio telescope capable of looking back billions of years in time, providing astronomers with unique insight into the creation of the earliest stars and galaxies. The ASTRON Blue Gene/L, while occupying only six racks, would rank fourth on the current Top500 list.
HPCx – the UK National Academic Supercomputer
HPCx is one of the world's most advanced high performance computing centres and provides
facilities for the UK science and engineering research communities enabling them to solve
previously inaccessible problems. HPCx is funded by the Engineering and Physical Sciences Research Council, the Natural Environment Research Council and the Biotechnology and Biological Sciences Research Council, and is operated by the CCLRC Daresbury Laboratory and Edinburgh University.
The HPCx Consortium is led by the University of Edinburgh, with the Council for the Central
Laboratory for the Research Councils (CCLRC) and IBM providing the computing hardware
and software. The University of Edinburgh Parallel Computing Centre, EPCC, has long experience of national HPC services and continues to be a major centre for HPC support and training. The Computational Science and Engineering Department at CCLRC Daresbury
Laboratory has for many years worked closely with research groups throughout the UK
through many projects, including EPSRC's Collaborative Computational Projects, and has
unmatched expertise in porting and optimising large scientific codes.
HPCx at Daresbury Laboratory, UK
The HPCx cluster is based on IBM POWER4 pSeries servers and currently comprises 52 32-way 1.7 GHz p690+ servers, each configured with 32 GB of memory. The system has 36
TBytes of storage accessible under General Parallel File System. The first phase, delivered
in 2002, provided a peak capability of 6.7 TFLOPS and this was upgraded in 2004, with
virtually no disruption to the operational service, to over 11 TFLOPS, while a further upgrade
to 22 TFLOPS in 2006 is planned. At the end of March 2005, IBM's HPCx system achieved a
remarkable milestone in that there had been no break of service attributable to IBM since
September 2004 – six months of faultless running for one of the largest computing systems in
Europe.
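As a rough cross-check of these figures, the short Python calculation below reproduces the quoted peak performance. It assumes four floating-point operations per clock per POWER4+ processor (two fused multiply-add units), which is the usual convention for peak figures; the assumption is ours rather than a statement from the HPCx service.

# Back-of-envelope check of the quoted HPCx peak after the 2004 upgrade.
servers = 52           # p690+ servers
cores_per_server = 32  # 32-way SMP nodes
clock_hz = 1.7e9       # 1.7 GHz
flops_per_clock = 4    # assumed: two fused multiply-add units per processor

peak_flops = servers * cores_per_server * clock_hz * flops_per_clock
print(f"Theoretical peak: {peak_flops / 1e12:.1f} TFLOPS")  # ~11.3 TFLOPS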
HPCx is used to address important and complex problems in a wide range of sciences from
the very small, such as the nature of matter, to the very large, such as simulations of whole
systems from cells and organs to global simulations of the Earth. It has enabled new
advances to be made in the human genome project, helped engineers to design new and
safer structures and aircraft and assisted in opening up completely new fields of research,
such as bio-molecular electronics.
HPCx is used for a wide variety of peer-reviewed academic, scientific and engineering
research projects including atomic and molecular physics, biochemistry, computational
chemistry, computational engineering, environmental sciences, computational material
science and numerical algorithms. Some of the new application areas which HPCx will
enable include:
Drug design - tomorrow's drugs will be highly specific and finely targeted using Terascale technology. It is already known how individual molecules interact with proteins, but HPCx will enable more molecules to be screened more quickly, so that more chemical compounds can be assessed for their potential in treating disease.
Flight simulation - at present only the airflow around the wing of an aircraft can be simulated, but HPC can, potentially, enable the analysis of the entire flow around an aircraft. Understanding how turbulent the air is behind an aeroplane on take-off could allow greater use of air space and ease the control of traffic in the air.
Structure of the Earth - the Earth's core has a major impact on our lives, for example it
shapes the magnetic field which acts as a protection from the harmful effects of charged
particles from the Sun. HPC techniques can be used to investigate the structure and
behaviour of the core in a way that is impossible by direct observation and experiment.
References
Top500 Supercomputers List: http://www.top500.org
Lawrence Livermore National Laboratory: http://www.llnl.gov/
ASC Purple Project: http://www.llnl.gov/asci/platforms/purple/
Mare Nostrum: http://www.bsc.es/
ECMWF: http://www.ecmwf.int
NERSC: http://www.nersc.gov/
NCSA: http://www.ncsa.uiuc.edu/
NCSA TeraGrid Project: http://www.ncsa.uiuc.edu/Projects/TeraGrid/
San Diego Supercomputer Center: http://www.sdsc.edu/
National Centers for Environmental Prediction: http://www.ncep.noaa.gov/
Grid Technology Research Centre, Japan: http://www.gtrc.aist.go.jp/en/
Forschungszentrum Juelich: http://www.fz-juelich.de/portal/home
HPCx: http://www.hpcx.ac.uk/
5 IBM Research Laboratories
IBM Research activities extend over a broad area: technical disciplines include chemistry, computer science, electrical engineering, materials science, mathematical sciences and physics, while cross-disciplinary activities include communications technology, deep computing, display technology, e-commerce, personal systems, semiconductor technology, storage, and server and embedded systems.
IBM researchers in eight laboratories around the world work with each other and with clients,
universities and other partners on projects varying from optimizing business processes to
inquiring into the Big Bang and the origins of the universe. IBM Research's focus is to
continue to be a critical part of IBM's success by balancing projects that have an immediate
impact with those that are long-term investments.
Work with clients
IBM and its Research division realize the importance of delivering innovation and competitive
advantage to our clients and to aid them in achieving their specific goals, IBM Research has
created On Demand Innovation Services, the First-of-a-Kind program and the Industry
Solutions Lab.
On Demand Innovation Services
On Demand Innovation Services provide research consultants to partner with consultants
from IBM Global Services on client engagements that explore cutting-edge ways to increase
clients' flexibility and provide them with unique market advantages in line with IBM's on
demand strategy.
First-of-a-Kind (FOAK) Projects
First-of-a-Kind projects are partnerships between IBM and clients aimed at turning promising
research into market-ready products. Matching researchers with target companies to explore
new and innovative technologies for emerging opportunities gives clients a research team to
solve problems that don't have ready solutions. Researchers get immediate client feedback to
further enhance their projects.
Industry Solutions Laboratories
The Industry Solutions Labs, located in New York and Switzerland, give IBM clients the
chance to discover how leading-edge technologies and innovative solutions can help solve
business problems. Each visit is tailored to meet specific needs, focused enough to target an
immediate objective or broad enough to cover an array of emerging technologies. The
engagement generally lasts for one day and consists of presentations by IBM scientists and
industry experts, collaborative discussions on specific business issues and demonstrations of
key strategic solutions.
Organisations which have undertaken research projects with IBM include Bank
SinoPac, El-Al Israel Airlines, US Federal Wildland Fire Management Agencies, Finnair,
GILFAM, the Government of Alberta, Steelcase and Swiss Post.
Work with universities
IBM research extends beyond the boundaries of our labs to colleagues in university labs and
researchers regularly publish joint papers with them in areas of mutual interest. In special
cases, IBM fosters collaborative relationships through Fellowships, grants and shared
research programs. Amongst others, IBM has collaborated with Carnegie Mellon University, Concordia University (Montreal, Canada), Florida State University, Massachusetts Institute of Technology, Stanford University, Technion – Israel Institute of Technology, Tel-Aviv University, University of Coimbra (Portugal), University of Edinburgh and Virginia Polytechnic Institute and State University.
IBM Deep Computing Institute
IBM Deep Computing Institute is an organization within IBM Research that coordinates,
promotes and advances Deep Computing activities. The institute has three objectives:
– Develop solutions to previously intractable business and scientific problems by exploiting IBM's strengths in high-end computing, data storage and management, algorithms, modelling and simulation, visualization and graphics.
– Realize the potential of the emerging very large scale computational, data and communications capabilities in solving critical problems of business and science.
– Lead IBM's participation within the scientific community, and in the business world, in this important new domain of computing.
Systems Journal and the Journal of Research and
Development
IBM regularly publishes the Systems Journal and the Journal of Research and Development,
both of which can be accessed from the web. The Journal of Research and Development,
Vol. 45, No. 3/4, 2001 was titled Deep Computing for the Life Sciences while Vol. 48, No. 2,
2004 was entitled Deep Computing.
References
IBM Research: http://www.research.ibm.com
IBM Research work with clients: http://www.research.ibm.com/resources/work_clients.shtml
IBM Research work with universities: http://www.research.ibm.com/resources/work_universities.shtml
IBM Journals: http://www.research.ibm.com/journal/
6 HPC Network Facilitation – fostering
relationships in the HPC community
IBM considers it very important for its high performance computing customers to meet with IBM developers and with other HPC customers to exchange information and to make suggestions for improvements or future requirements. There are now a number of fora in which this information exchange can take place.
Executive Briefings
IBM hosts regular one-on-one briefings for HPC customers, usually in the development
laboratories, where customers can meet with system developers to discuss future
developments and requirements.
Conferences and seminars
IBM holds regular conferences and seminars on Deep Computing including, for example, the
recent Deep Computing Executive Symposium in Zurich in 2004, details of which can be
found at https://www-926.ibm.com/events/04deep.nsf
IBM System Scientific Computing User Group,
SCICOMP
IBM fully supports the IBM System Scientific Computing User Group, SCICOMP, an international organization of scientific and technical users of IBM systems. The purpose of
SCICOMP is to share information on software tools and techniques for developing scientific
applications that achieve maximum performance and scalability on systems, and to gather
and provide feedback to IBM to influence the evolution of their systems. To further these
goals, SCICOMP will hold periodic meetings which will include technical presentations by
both IBM staff and users, focusing on recent results and advanced techniques. Discussions of
problems with scientific systems will be held and aimed at providing advisory notifications to
IBM staff detailing the problems and potential solutions. Mailing lists will be maintained to
further open discussions and provide the sharing of information, expertise, and experience
that scientific and technical applications developers need but may not easily find anywhere
else.
SCICOMP meetings are held once a year, normally in the spring and alternate meetings are
held outside the USA at sites chosen by members at least a year in advance. Additional
informal meetings, such as Birds-of-a-Feather sessions at other conferences, may also be
scheduled. Information on past and proposed meetings, including presentation materials, can
be found on the SCICOMP web site at http://www.spscicomp.org/
SCICOMP 10 was hosted by the Texas Advanced Computing Centre in August 2004, while
SCICOMP 11 will be hosted by Edinburgh University, Scotland in May 2005 and SCICOMP 12
will be hosted by the National Centre for Atmospheric Research in Colorado, USA, in early
2006.
SP-XXL
IBM fully supports SP-XXL, a self-organized and self-supporting group of customers with the
largest IBM systems. Members and affiliates actively participate in SP-XXL meetings which
are held around the world at approximately six monthly intervals. Member institutions are
required to have an appropriate non-disclosure agreement in place with IBM as IBM often
discloses and discusses confidential information with SP-XXL.
The focus of the SP-XXL is on large-scale scientific and technical computing on IBM systems.
SP-XXL addresses topics across a wide range of issues important to achieving Terascale
scientific and technical computing on scalable parallel machines including applications, code
development tools, communications, networking, parallel I/O, resource management, system
administration and training. SP-XXL believes that by working together, customers are able to resolve issues with better solutions, provide better guidance to IBM, improve their own capabilities through sharing knowledge and software, and collaboratively develop capabilities which they would not typically be able to develop as individual institutions.
Details on SP-XXL can be found at http://www.spxxl.org/
UK HPC Users Group
IBM fully supports the UK HPC Customer User Group, a self-organized and self-supporting
group of UK based customers with the largest IBM HPC systems. The group meets approximately every six months and IBM is usually invited to attend for discussions.
References
IBM Executive Briefing Centers: http://www-1.ibm.com/servers/eserver/briefingcenter/
7 Collaboration Projects
IBM collaborates extensively with customers worldwide and some examples of collaborations
include:
University of Swansea, UK - Institute of Life Sciences
The University of Swansea is setting up an Institute of Life Sciences which aims to provide an
innovative and imaginative problem-solving environment through a unique collaboration
between regional Government, a business leader and a world-class university. The Institute
is expected to become one of the world's premier scientific and computing facilities and will
host a new European Deep Computing Visualisation Centre for Medical Applications. The
Institute and IBM have a multi-year collaboration agreement which includes a new IBM
supercomputer, one of the fastest computers in the world dedicated to life science research,
designed to accelerate ILS programmes. The Deep Computing Visualisation Centre for
Medical Applications will research new solutions for healthcare treatment, personalised
medicine and disease control. IBM will provide technical expertise and guidance, as well as
specialist life science solutions that will enable a joint development programme.
The ILS is a key step in delivering the recommendations set out in Knowledge Economy
Nexus - the final report of the Welsh Assembly Government's Higher Education and Economic
Development Task and Finish Group. IBM's contribution will comprise hardware infrastructure
(the 'Blue C' supercomputer and visualisation system), software and implementation
services. Importantly, IBM will also provide extensive industry and Life Sciences knowledge
and expertise, in order to accelerate and drive research programmes in collaboration with the
University. IBM is one of the leading providers of technology and services to the life sciences
sector. It has more than 1,000 employees around the world dedicated to the field, including
bioinformaticians, biologists, chemists, and computer scientists. The agreement with Swansea University is part of IBM's continued commitment to the healthcare and life sciences
sectors and its strategy of partnering with some of the world's leading research organisations.
Life Sciences is recognised as one of the most fertile sources of technology transfer having
the potential to create massive economic wealth from developments in the knowledge
economy, through research, intellectual property licensing, spin out companies and inward
investment.
University of Edinburgh , UK - Blue Gene
The University of Edinburgh has installed a commercial version of IBM’s Blue Gene/L
supercomputer. The university's system, which was the first Blue Gene system to run in Europe, is a smaller version of IBM's much faster prototype, yet will still be among the five most powerful systems in the UK. University researchers hope that the computer will eventually provide insight into various key scientific fields, including the development of diseases such as Alzheimer's, cystic fibrosis and CJD. Edinburgh University, in collaboration with IBM and the Council for the Central Laboratory for the Research Councils in the UK, already manages the largest supercomputer service for academic use in Europe, through the High Performance Computing (HPCx) initiative.
IBM Academic Initiative - Scholars Program
The IBM Academic Initiative - Scholars Program offers faculty members and researchers
software, hardware and educational materials designed to help them use and implement the
latest technology into curriculum and research. The IBM Scholars Program provides
accredited and approved academic members with access to a wide range of IBM products for
instructional, learning and non-commercial research. Offerings range from no-charge licenses
for IBM software (including WebSphere, DB2, Lotus and cluster software), to academic
discounts for IBM eServers, to ready-to-use curriculum.
Faculty and researchers at higher education institutions and secondary/high schools can
apply for the IBM Scholars Program and have access to:
– Most comprehensive set of e-business software available
– Discounts on servers
– Access to Linux and zSeries hubs
– Training and educational materials
– Curriculum and courseware
– Certification resources and special offers
– Technical support
– Newsletters and newsgroups
Full details can be found at http://www.developer.ibm.com/university/scholars/
IBM's Shared University Research Program
IBM's highly-selective Shared University Research (SUR) program awards computing
equipment (servers, storage systems, personal computing products, etc.) to institutions of
higher education around the world to facilitate research projects in areas of mutual interest
including: the architecture of business and processes, privacy and security, supply chain
management, information based medicine, deep computing, Grid Computing, Autonomic
Computing and storage solutions. The SUR awards also support the advancement of
university projects by connecting top researchers in academia with IBM researchers, along
with representatives from product development and solution provider communities. IBM
supports over 50 SUR awards per year worldwide.
In 2004, IBM announced the latest series of Shared University Research (SUR) awards,
bringing the company's contributions to foster collaborative research to more than $70 million
over the last three years. With this latest set of awards, IBM sustains one of its most important
commitments to universities by enabling the collaboration between academia and industry to
explore research in areas essential to fuelling innovation.
The new SUR awards will support 20 research projects with 27 universities worldwide.
Research projects range from a multiple university exploration of on demand supply chains to
an effort to find biomarkers for organ transplants. The research reflects the nature of
innovation in the 21st century – at the intersection of business value and computing
infrastructure. Universities receiving these new awards include: Brown University, Cambridge University (UK), Columbia University, Daresbury University (UK), Fudan University (China), North Carolina Agricultural & Technical State University, Politecnico di Milano (Italy), SUNY Albany, University of Arizona, University of British Columbia (Canada), University of California – Berkeley, University of Maryland – Baltimore County, College Park, Uppsala University (Sweden) and Technion – Israel Institute of Technology.
"Universities play a vital role in driving innovation that could have a business or societal
impact," said Margaret Ashida, director of corporate university relations at IBM. "The research
collaborations enabled by IBM's Shared University Research award program exemplify the
deep partnership between academia and industry needed to foster innovation that matters."
Examples of SUR projects already under way include:
– IBM is working with Oxford University to find better and faster access to more reliable and accurate mammogram images, thereby potentially increasing early cancer detection and the number of lives saved.
– IBM is collaborating with Penn State University, Arizona State University, Michigan State University and University College Dublin to create supply chain research labs to conduct research on advanced supply chain practices that can be used to help businesses respond on demand to changing market conditions.
– Columbia University and IBM researchers worked on a project to develop core technologies needed for using computers to simulate protein folding, predict protein structure, screen potential drugs and create an accurate computer aided drug design program.
8 IBM Linux Centres of Competency
IBM has established a number of Linux Centres of Competency worldwide to offer deep,
specialized, on-site skills to customers and Business Partners looking to understand how to
leverage Linux-enabled solutions to solve real business challenges. The IBM Linux Centres
provide in-depth technology briefings, product and solution demonstrations, workshops and
events, while also serving as showcases for Independent Software Vendors' applications on
Linux, and providing facilities for customer proofs of concept. The Centres of Competency are
closely affiliated with the IBM Executive Briefing Centres. For more information, visit:
http://www-1.ibm.com/servers/eserver/briefingcenter/
The first Linux Centres of Competency were established in Austin, Texas, USA; Bangalore,
India; Beijing, China; Boeblingen, Germany; Moscow, Russia and Tokyo, Japan.
IBM South Africa has recently launched the IBM South Africa Linux Centre of Competence, a
facility sponsored by IBM and Business Partners which will assist customers to integrate
Linux solutions in their businesses, give access to experienced technical personnel for Linux
education, training and support, and provide information on Linux-related solutions, offerings and future directions. The Centre will provide access to IBM OpenPower systems, POWER5-based Linux-only servers which make the Linux value proposition unbeatable.
Customers are rapidly embracing Linux for its flexibility, cost efficiencies and choice – and
now, in increasing numbers, for its security. Not only is Linux the fastest growing operating
system, it is the open source foundation for IBM's On Demand customer solutions. Linux has
moved well beyond file-serving, print-serving and server consolidation and is now deployed in
mission-critical and industry-specific applications. All industries are enjoying improved
services delivery by implementing Linux-based business solutions.
Linux and IBM are changing the IT landscape, challenging the very nature of application
development and the economics of application deployment and ensuring customers get the
best, most cost-effective solutions for their business with appropriate and cost effective
hardware and superior support, training and infrastructure. IBM is fully committed to open
standards in general, to Linux in particular and to being the leader in providing Linux solutions
that work. All IBM hardware servers are able to run Linux and some Power5 servers only run
Linux. IBM has over 300 Linux compatible software products and IBM continues to support
the ongoing development of Linux solutions and applications.
IBM South Africa has targeted seven focus areas for Linux:
– The South Africa Linux Centre of Competence will continue to offer deep, specialized, on-site skills to customers and Business Partners
– Linux for infrastructure solutions such as e-mail and file-, print- and Web serving
– Linux on IBM Clusters and Blades for affordable scalability that can easily be deployed and managed
– Linux for workload consolidation to allow customers to eliminate proliferating server farms, reduce costs, use resources efficiently and simplify IT management.
– Continue to build industry applications – like branch automation or point-of-sale solutions – which deliver specific business solutions with the flexibility, adaptability and cost-effectiveness of Linux. The rapid adoption of Linux by governments worldwide exemplifies our success in this focus area.
– Support ValueNet, a network of IBM partners creating repeatable solutions, from problem identification to solution implementation and support.
– IBM Global Services will lead our Linux client focus, concentrating on delivering choice in desktop and browser-based Linux solutions.
References
IBM Linux: http://www-1.ibm.com/linux/
Linux Centres of Competency: http://www-1.ibm.com/linux/va_4057.shtml
Linux on Power servers: http://www-1.ibm.com/servers/eserver/linux/power/?P_Site=Linux_ibm
PartnerLens for Business Partners: http://www-1.ibm.com/linux/va_12.shtml
Linux for developers: http://www-1.ibm.com/linux/va_4068.shtml
IBM Executive Briefing Centres: http://www-1.ibm.com/servers/eserver/briefingcenter/
9 Technology considerations for a new HPC
centre
The choice of which technologies are most suitable for CHPC will depend crucially on:
– the type of applications to be run
– the level of scaling required – are grand challenge problems to be run?
– whether capability or capacity is more important
– the number of different applications to be run
– the number and distribution of the users
– the level and skill of system management support
– the level and skill of user support
– the required integration of the central facility with remote systems in universities
Traditional clusters such as pSeries systems use servers with processors designed for HPC
and computationally intensive tasks, with high memory bandwidths. The servers are
interconnected with high performance switches providing high bandwidth and low latency
communications across the entire system, maintaining the performance as the number of
processors is increased.
Linux clusters built with commodity components minimise the purchase cost of the system but
the performance of the cluster is influenced to a greater or lesser extent by each component
within the system, from hardware to software. Commodity clusters usually use PCI based
interconnects and bottlenecks here often place the major limitation on the cluster
performance.
pSeries servers coupled with a mature operating system such as AIX today tend to provide
greater levels of reliability, availability and serviceability than Linux clusters, and the better
management tools integrated in the system make it easier to provide a continuous and
flexible service for multiple users and applications. While Linux capability is continually
improving, achieving high availability remains a significant challenge with the largest
systems.
The key question is “what applications are to be run?” Many applications such as scientific
codes, engineering analyses, weather forecasting etc require both powerful processors, with
adequate memory bandwidth to ensure the processor is fed with data, and high bandwidth,
low latency interconnects to provide efficient exchange of the large quantity of data moved
during the computation. For these applications, purpose built computational servers, with
their high memory bandwidth interconnected with high performance switches will provide the
best price/performing solution, especially when the problem size is large.
For other applications, such as life sciences, Monte Carlo analysis and high energy physics applications, Linux clusters can provide excellent performance as fewer demands are placed
on the system. There is a range of embarrassingly parallel applications which place little or
no demand on interprocessor communication and for which Linux clusters are ideally suited.
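As an illustration of why such workloads suit commodity clusters, the minimal sketch below (assuming Python and the mpi4py package are available on the cluster) estimates pi by Monte Carlo sampling: each rank works entirely independently and only a single small reduction crosses the interconnect.

# Embarrassingly parallel Monte Carlo estimate of pi using MPI.
# Run with, for example:  mpirun -np 64 python pi_mc.py
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

samples_per_rank = 1_000_000
rng = random.Random(rank)          # independent random stream per rank
hits = sum(1 for _ in range(samples_per_rank)
           if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

# The only interconnect traffic: one integer reduced to rank 0.
total_hits = comm.reduce(hits, op=MPI.SUM, root=0)
if rank == 0:
    print("pi ~", 4.0 * total_hits / (samples_per_rank * size))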
While there is little doubt that in future years, HPC systems like Blue Gene will be needed to
reach very high performances, the most appropriate use for Blue Gene today is likely to be for
preparing for that change rather than for providing a general purpose national HPC production
service.
Customers who need to run a variety of applications therefore tend to have three alternatives, although the boundaries between them are not rigid:
– a pSeries cluster running AIX and equipped with a high performance switch. This general purpose machine will be able to run the widest range of applications although it will be "over-configured" for the less demanding codes. Its better management tools and its integral design for reliability, availability and serviceability will make it easier to operate and provide a heavily used production facility. Examples of national research HPC systems running pSeries clusters include HPCx, the UK National Academic Supercomputer, and NERSC, the flagship scientific computing facility for the Office of Science in the U.S. Department of Energy and a world leader in accelerating scientific discovery through computation.
– a Linux cluster of commodity processors and industry standard interconnect. If the bulk of the applications are suited to a Linux cluster, this system will provide excellent price performance for those applications but is not likely to perform well on the more demanding applications, especially capability or grand challenge problems, and is likely to be more demanding on the support staff to operate and manage as a production facility. An example of a national research facility running a Linux cluster is Mare Nostrum at the Barcelona National Supercomputing Centre, Spain.
– a heterogeneous system of both pSeries and Linux clusters. This option can often be the optimum solution as regards running the applications, for the codes can be run on the most suitable platforms. Both pSeries and Linux clusters run the same IBM software, and systems can be installed as autonomous, or with common file systems and job scheduling, or fully grid enabled. There are some restrictions on its use, and the largest problem which can be run effectively will be limited by the largest single machine. There will be increased support effort as two systems have to be managed and operated, and users may feel it necessary to port their codes to both systems and then support them on both. An example of a national research facility running both pSeries and Linux clusters is the Lawrence Livermore National Laboratory in the USA.
It is only by studying the application characteristics and balancing them against the
constraints, particularly the available support skills, that an effective choice can be made.
IBM has Deep Computing on Demand sites which can be used for running applications prior
to deciding, and has domain experts in all areas of HPC and site operations able to assist in
this analysis. IBM can also facilitate study tours, visits and exchanges for CHPC to suitable
HPC sites.
Grid enablement of central and remote systems
Whatever system CHPC selects for its central HPC facility, significant benefits will flow from grid enabling it and other systems in universities and industry, so that data can be accessed and shared and jobs can be submitted to any resource within the grid. Each site will decide which data is to be made visible and how much compute resource is to be made available, and authorised users will be able to see all the data and submit jobs to be run on the available machines. A user in Johannesburg could, for example, use an input dataset on a system in Durban, run the job on a machine in Cape Town and allow the results to be viewed and graphically analysed in Pretoria, with everything taking place transparently, as sketched below.
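The hedged sketch below illustrates what such a workflow might look like from the user's side, using Globus Toolkit command-line tools of the kind deployed in grids of this era. The host names and file paths are hypothetical, and the exact commands and options available would depend on the grid middleware actually installed at each site.

# A sketch, not a production workflow: stage data between grid-enabled sites
# and run a job remotely by driving Globus Toolkit command-line tools.
# All host names and paths are hypothetical examples.
import subprocess

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Obtain a short-lived proxy credential from the user's grid certificate.
run(["grid-proxy-init"])

# 2. Stage the input dataset from a Durban data server to the Cape Town
#    compute site using GridFTP.
run(["globus-url-copy",
     "gsiftp://data.durban.example.ac.za/projects/seismic/input.dat",
     "gsiftp://hpc.capetown.example.ac.za/scratch/user/input.dat"])

# 3. Run the solver on the Cape Town machine through its GRAM gatekeeper.
run(["globus-job-run", "hpc.capetown.example.ac.za",
     "/opt/apps/solver/bin/solve", "/scratch/user/input.dat"])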
IBM has worked extensively with customers to implement grid enabled HPC systems across
national and international boundaries. Two such examples are Distributed European
Infrastructure for Supercomputing Applications, DEISA and ScotGrid.
Distributed European Infrastructure for Supercomputing
Applications
Phase 1 of the DEISA AIX super-cluster has grid enabled four IBM POWER4 platforms: FZJ-Jülich (Germany, 1,312 CPUs, 8.9 TFLOPS); IDRIS-CNRS (France, 1,024 CPUs, 6.7 TFLOPS); RZG-Garching (Germany, 896 processors, 4.6 TFLOPS) and CINECA (Italy, 512 processors, 2.6 TFLOPS). This super-cluster is currently operating in pre-production mode, with full production expected in mid 2005, when CSC (Finland, 512 CPUs, 2.2 TFLOPS) is also expected to join.
DEISA's fundamental integration concept is transparent access to remote data files via a
global distributed file system under IBM's GPFS. Each of the national supercomputers above
is a cluster of autonomous computing nodes linked by a high performance network. Data files
are not replicated on each computing node but are unique and shared by all. A data file in the
global file system is “symmetric” with respect to all computing nodes and can be accessed,
with equal performance, from all of them. A user does not need to know (and in practice does
not know) on which set of nodes his application is executed.
The IBM systems above run IBM's GPFS (General Parallel File System) as a cluster file system. The wide area network functionality of GPFS has enabled the deployment of a distributed global file system for the AIX super-cluster.
Applications running on one site can access data files previously “exported” from other sites
as if they were local files. It does not matter in which site the application is executed and
applications can be moved across sites transparently to the user.
GPFS provides the high performance remote access needed for the global file system.
Applications running on the pre-production infrastructure operate at the full 1 Gb/s bandwidth of the underlying network when accessing remote data via GPFS. This
performance can be increased by increasing the network bandwidth as is demonstrated by
the 30 Gb/s TeraGrid network in the USA, where GPFS runs at 27 Gb/s in remote file
accesses.
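Because GPFS presents an ordinary POSIX file system to applications, no grid-specific programming interface is needed to read a file exported by another site. The minimal sketch below, with a hypothetical mount point and file name, is all an application would do.

# Minimal sketch: a remote DEISA file under the shared GPFS mount is read
# with ordinary file I/O, exactly as if it were local.
remote_file = "/deisa/cineca/projects/qcd/lattice_0042.dat"   # hypothetical path

with open(remote_file, "rb") as f:   # standard POSIX open/read
    header = f.read(4096)            # first 4 KB, wherever the data physically lives

print(f"Read {len(header)} bytes from {remote_file}")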
DEISA's objective is to federate supercomputing environments in a supercomputing grid that
will include, in addition to the supercomputing platforms mentioned above, a number of data
management facilities and auxiliary servers of all kinds. The DEISA Supercomputing Grid can
be seen as a global architecture for a virtual European supercomputing centre.
ScotGrid
IBM UK has worked with the universities of Glasgow, Edinburgh and Durham to implement
ScotGrid, a three-site Tier-2 centre consisting of an IBM 200 CPU Monte Carlo production
facility run by the Glasgow Particle Physics Experimental (PPE) group, an IBM 24 TB
datastore and associated high-performance server run by Edinburgh Parallel Computing
Centre and the Edinburgh PPE group and a 100 CPU farm at Durham University Institute for
Particle Physics Phenomenology. ScotGrid was funded for the analysis of data primarily from
the ATLAS and LHCb experiments at the Large Hadron Collider and from other experiments
and is now providing IBM solutions for Grid Computing in Particle Physics and Bioinformatics, as well as Grid Data Management, medical imaging and device modelling simulations.
References
HPCx: http://www.hpcx.ac.uk
Barcelona Supercomputing Centre: http://www.bsc.es/
LLNL: http://www.llnl.gov
ScotGrid: http://www.scotgrid.ac.uk/
DEISA: http://www.deisa.org
http://www.deisa.org/grid/architecture.php
10 Skills development, training, and building
human capacity
Critical to the establishment of a new HPC facility, particularly a national one being
established for the first time, is a comprehensive programme to enable skilling and skills
transfer, as well as an ongoing process for generating a widespread skills base for supporting,
using and exploiting HPC technologies. The key aspects to IBM’s approach to this are as
follows:
Programme Management
This role coordinates and manages the various ‘streams’ of IBM activity: implementation support; technical training; assistance with access to skills for application tuning and optimisation, visualisation, file management, scheduling, grid enablement, etc.; issue management; and facilitating networking in the HPC world. Part of the Programme
Management role is to ensure that Skills Transfer is an actively managed aspect of the
programme. The IBM approach is to appoint a suitable IBM Programme Manager familiar with
other HPC environments, ahead of taking on a new HPC Programme for the first time.
Support Services
The range of support services that IBM delivers is wide, as per the needs of HPC. These
include:
– IBM hardware and software maintenance and technical support. These are the more traditional services associated with IBM’s products, and IBM has significant depth of resources in most geographies for this.
– Specialist Technical Support. This is for the HPC specific components of both hardware and software, and includes File System (GPFS) tuning and support, Visualisation Tools (e.g. DCV) support, Job Scheduling, Application Tuning assistance, etc. IBM trains local IBM technical engineers to act as first level support in most of the critical support areas.
– On-Site Services. Depending on the skills maturity level and other considerations of the HPC site, IBM can place on-site personnel to support the HPC facility, as well as deliver an on-site skills transfer function.
– Education and Training. Formal education is run on site (depending on the customer numbers and needs) or at IBM sites. An education plan would be kept up to date as part of the overall programme.
– Hands-On Internships and Exchanges. One aspect of IBM’s HPC networking programme is to help facilitate cross skilling between organisations, especially where there is value in reciprocity. Technical and management personnel in a new HPC centre often gain rapid skill upgrades through short term internships.
– HPC Networking. As covered in Section 6, IBM is active in facilitating networking across HPC organisations through conferences, seminars, user groups and one-to-one meetings. In effect, it is a strategically important ecosystem which helps all user communities to keep up with the latest thinking, innovation, best of breed approaches, and other learning opportunities. IBM is widely recognised as the leader in such networking facilitation across many different customers.
– IBM HPC Study Tours. These are tours organised for one organisation or indeed a number of organisations. Typically they cover a variety of exposures to ‘HPC in Action’
in IBM’s own Research Facilities, in leading HPC sites, in IBM Technical Specialist
Areas, and can include visits to IBM partners such as Intel and AMD.
With the CHPC of South Africa in mind...
A comprehensive Skills Development and Capacity Building Plan would be workshopped and developed with the CHPC and its stakeholders. IBM has the building blocks with which to create a meaningful and effective plan. Many of the building blocks are mentioned above and need to be tailored to the unique needs of the CHPC.
There may also be areas where the CHPC in South Africa requires relatively greater emphasis. For example, these could include a tailored education programme aimed at enabling computational scientists to increase their HPC skills. Another could be that, aside from the skills within the CHPC itself, there is a geographically dispersed set of users, very much part of the CHPC ecosystem, that also needs a skilling programme.
The key is that IBM has the breadth and depth of people and skills to be best positioned to
develop a comprehensive programme in conjunction with the CHPC. We recommend that this
area of the overall CHPC program receives special emphasis, as building a sustainable
human capacity for HPC exploitation is a foundation for a highly successful HPC facility.
11 Assisting with the Promotion and Exploitation
of the CHPC
Whilst the fundamentals of establishing a sound HPC operation are paramount and indeed the first priority (including operational effectiveness, good governance, technical support and many other factors), once the facility is well established, optimising the usage of the CHPC will become a key aspect.
The user audience is likely to include university and academic researchers and users, but also other users from both public and private sector ‘customers’.
IBM can assist in a number of ways to help promote the CHPC in these communities. These
include:
 Market Awareness through Media communications.
 Marketing Events
Target Account Planning - IBM has wide Client Management team coverage
throughout South and Central Africa. Certain industries have very specific HPC needs,
and these can be jointly targeted, particularly through linkage with the IBM Client
teams.
Networking Contacts - linkages and exploitation.
IBM has strength in two relevant areas, namely (a) partnering in various ways and (b)
marketing skills and methods. Once again, we recommend that these initiatives be made a
formal part of the overall Program Management.
Trademarks and Acknowledgements
The following terms are registered trademarks of International Business Machines
Corporation in the United States and/or other countries: AIX, AIX/L, AIX/L(logo), Blue Gene,
BladeCenter, IBM, IBM(logo), ibm.com, LoadLeveler, POWER, POWER4, POWER4+,
POWER5, POWER5+, POWER6, Power Architecture, POWERparallel, PowerPC,
PowerPC(logo), pSeries, Scalable POWERparallel Systems, Tivoli, Tivoli(logo).
A full list of U.S. trademarks owned by IBM may be found at:
http://www.ibm.com/legal/copytrade.shtml.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
Intel, Itanium and Pentium are registered trademarks, and Intel Xeon and MMX are
trademarks, of Intel Corporation in the United States and/or other countries.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
Microsoft, Windows, Windows NT and the Windows logo are registered trademarks of
Microsoft Corporation in the United States and/or other countries.
Other company, product and service names may be trademarks or service marks of others.
Addendum
IBM UK has been registered to the requirements of ISO 9001 and to the DTI-approved TickIT
guidelines since August 1991. This registration is now part of the single certificate issued by
BVQI for the totality of IBM Sales and Services in Europe, Middle East and Africa (EMEA).
The Certificate number is 82346.9.2 and the Scope covers all activities culminating in the
provision of IT solutions, including design, marketing, sales, support services, installation and
servicing of computer systems and infrastructure (hardware, software and networks),
management of computer centres and outsourcing services, help desk for computer users,
and the provision of IT & management consultancy and services.
Additionally, IBM manufacturing and hardware development locations worldwide are
registered to the requirements of ISO 14001 and are part of a single certificate issued by
BVQI. The Certificate number is 43820 and the scope covers development and manufacture
of information technology products, including computer systems, software, storage devices,
microelectronics technology, networking and related services worldwide.
IBM is a registered trademark of International Business Machines Corporation. All other
trademarks are acknowledged.
All the information, representations, statements, opinions and proposals in this document are
correct and accurate to the best of our present knowledge but are not intended (and should
not be taken) to be contractually binding unless and until they become the subject of
separate, specific agreement between the parties. This proposal should not be construed as
an offer capable of acceptance.
The information contained herein has been prepared on the basis that the agreement entered
into between the parties as a result of further negotiations will be based on the terms of the
IBM Customer Agreement.
This proposal is valid for a period of 30 days.
If not otherwise expressly governed by the terms of a written confidentiality agreement
executed by the parties, this proposal contains information which is confidential to IBM and is
submitted to CHPC on the basis that it must not be used in any way nor disclosed to any
other party, either in whole or in part. The only exception to this is that the information may be
disclosed to employees or professional advisors of CHPC where such disclosure is on a need
to know basis, and is for the purpose of considering this proposal. Otherwise disclosures may
not take place without the prior written consent of IBM.
These Services do not address the capability of your systems to handle monetary data in the
Euro denomination. You acknowledge that it is your responsibility to assess your current
systems and take appropriate action to migrate to Euro ready systems. You may refer to IBM
Product Specifications or IBM's Internet venue at: http://www.ibm.com/euro to determine
whether IBM products are Euro ready.