Practical Considerations in Building Beowulf Clusters


Lessons from Experience and Future Directions
Arch Davis (GS*69)
Davis Systems Engineering

Poor light-socket coordination

Parallel Computing Architectures

• 1. (not parallel) Fastest possible serial
– a. Make it complex
– b. Limits
• 2. Old superscalar, vector Crays, etc.
• 3. Silicon Graphics shared memory (<64 CPUs)
• 4. Intel shared memory: 2-32 processor servers
• 5. Distributed memory: “Beowulf” clusters
• 6. Biggest distributed memory: NEC SX-6 “Earth Simulator”

Building a Beowulf cluster

+ glue → Cluster ?

Some Design Considerations

• 1. Processor type and speed
• 2. Single or dual
• 3. Type of memory
• 4. Disk topology
• 5. Interconnection technology
• 6. Physical packaging
• 7. Reliability

“Just a bunch of ordinary PCs”

• But to be reliable, more must be watched.

– Power supplies
– Fans
– Motherboard components
– Packaging layout
– Heat dissipation
– Power quality
• To be cost effective, configure carefully.
– Easy to overspecify and cost >2x what is necessary
– Don’t overdo the connections; they cost a lot.

– The old woman swallowed a fly. Be careful your budget doesn’t die.

1. Processor type & speed

• A. Pentium 4: inexpensive if not leading-edge speed
• B. Xeon: dual-processor P4; both processors share a motherboard.
• C. AMD Opteron: 64-bit, needed for >2GB of memory.
• D. (future) Intel 64-bit: will be AMD compatible!
• E. IBM 970 (G5): true 64-bit design that Apple is using
• F. Intel Itanium (“Ititanic”): 64-bit, long instruction word
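The “>2GB” remark under the Opteron comes down to pointer-width arithmetic. A rough sketch in Python: the 2 GiB and 3 GiB per-process figures are the common 32-bit operating system splits, assumed here rather than taken from the slides, and the function name is purely illustrative.

```python
GIB = 2**30  # bytes in one gibibyte

def address_space_gib(pointer_bits: int) -> float:
    """Size of a flat address space for a given pointer width, in GiB."""
    return 2**pointer_bits / GIB

# A 32-bit pointer can name 4 GiB of addresses in total.
total_32 = address_space_gib(32)

# Assumption: the OS keeps part of that range for the kernel, leaving
# roughly 2 GiB (typical 32-bit Windows default) or 3 GiB (typical
# 32-bit Linux) for one user process.
user_32_small = total_32 - 2
user_32_large = total_32 - 1

# A 64-bit pointer removes the limit for any practical memory size.
total_64 = address_space_gib(64)

print(f"32-bit address space: {total_32:.0f} GiB total")
print(f"32-bit per process:   {user_32_small:.0f} to {user_32_large:.0f} GiB (assumed splits)")
print(f"64-bit address space: {total_64:.3e} GiB total")
```

A model whose arrays need more than a few GiB on one node therefore cannot run as a single 32-bit process, whatever amount of RAM is installed.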

Disk Topology

• 1. Disk per board
• 2. Diskless + RAID

Interconnect Options

• Always a desire for way more speed than possible
• Latency is ultimately an issue of light speed
• Existing options:
• 1. Ethernet, including Gigabit switched
– Very robust, by Dave Boggs EECS’72
– Affordable, even at Gigabit
• 2. InfiniBand switched
• 3. Proprietary: Myrinet, Quadrics, Dolphin
– Various topologies, including 2-D and 3-D meshes
– Remote DMA may be the transfer method
– Assumes noise-free channel, may have CRC
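To put numbers on the light-speed remark, here is a minimal sketch of the propagation delay over a cable, together with the usual “latency plus size over bandwidth” estimate for one message. The velocity factor, the 10 m cable, and the 50 µs latency are illustrative assumptions, not measurements from any of the interconnects listed.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def propagation_delay_ns(cable_m: float, velocity_factor: float = 0.66) -> float:
    """One-way signal delay over a cable, in nanoseconds.

    velocity_factor is the fraction of c at which the signal travels;
    0.66 is an assumed, typical value for copper or fiber.
    """
    return cable_m / (C_VACUUM * velocity_factor) * 1e9

def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbit: float) -> float:
    """Latency + size/bandwidth model of a single message, in microseconds."""
    bits = message_bytes * 8
    return latency_us + bits / (bandwidth_gbit * 1e9) * 1e6

# Hypothetical figures: 10 m of cable, 50 us end-to-end latency, 1 Gbit/s link.
print(f"10 m cable: {propagation_delay_ns(10):.1f} ns one way")
for size in (64, 1024, 1_000_000):
    t = transfer_time_us(size, latency_us=50.0, bandwidth_gbit=1.0)
    print(f"{size:>9} bytes: {t:10.1f} us")
```

Under these assumptions small messages are dominated by latency and large ones by bandwidth, which is why the two are usually quoted separately for an interconnect.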

Physical Packaging

• It’s not “rocket science,” but it takes care.

• A few equations now and then never hurt when you are doing heat transfer design.

• How convenient is it to service?

• How compact is the cluster?

• What about the little things? Lights & buttons?

• “Take care of yourself, you never know how long you will live.”
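As an example of the kind of equation meant here, the airflow needed to carry away a node's heat follows from an energy balance, Q = P / (ρ · c_p · ΔT). The sketch below assumes sea-level air properties; the 200 W node and 10 K temperature rise are hypothetical numbers, not figures from the slides.

```python
RHO_AIR = 1.2      # air density, kg/m^3 (assumed: roughly sea level, 20 C)
CP_AIR = 1005.0    # specific heat of air, J/(kg K)
M3S_TO_CFM = 2118.88  # cubic metres per second to cubic feet per minute

def airflow_m3_per_s(power_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to remove power_w with an air temperature rise of delta_t_k."""
    return power_w / (RHO_AIR * CP_AIR * delta_t_k)

def airflow_cfm(power_w: float, delta_t_k: float) -> float:
    """Same airflow expressed in CFM, the unit fan datasheets usually quote."""
    return airflow_m3_per_s(power_w, delta_t_k) * M3S_TO_CFM

# Hypothetical node: 200 W dissipated, exhaust air allowed to be 10 K warmer than intake.
node_w, rise_k = 200.0, 10.0
print(f"per node: {airflow_cfm(node_w, rise_k):.1f} CFM")
print(f"32 nodes: {32 * airflow_cfm(node_w, rise_k):.0f} CFM")
```

Halving the allowed temperature rise doubles the required airflow, so the packaging layout and the fan choice have to be considered together.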


Reliability

• Quality is designed-in, not an accident.

• Many factors affect reliability.

• Truism: “All PCs are the same. Buy the cheapest and save.”
• Mil-spec spirit can be followed without gold plate.

• Many components and procedures affect the result.

• Early philosophy: triage of failing modules
• Later philosophy: entire cluster uptime
• Consequence of long uptime: user confidence, greatly accelerated research
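One way to see why entire-cluster uptime is a demanding target: if a job needs every node and nodes fail independently, the chance that the whole cluster is up is the per-node availability raised to the number of nodes. This is a simplified sketch; independence and the availability figures are assumptions for illustration only.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that all nodes are up at once, assuming independent failures."""
    return node_availability ** nodes

# Hypothetical per-node availabilities for a 32-node cluster.
for a in (0.99, 0.999, 0.9999):
    print(f"node {a:.4f} -> 32-node cluster {cluster_availability(a, 32):.3f}")
```

With these made-up numbers, nodes that are each up 99% of the time give a 32-node cluster that is fully up only about 72% of the time, so small per-node improvements matter a great deal.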

Benchmarks

● Not a synthetic benchmark
● 100 timesteps of Terra code (John R. Baumgardner, LANL)
● Computational fluid dynamics application
● Navier-Stokes equation with ∞ Prandtl number
● 3D spherical shell multi-grid solver
● Global elliptic problem with 174,000 elements
● Inverting and solving at each timestep

Results are with the Portland Group pgf90 Fortran compiler using the -fastsse option, and with Intel release 8 Fortran:

Machine      Processor     Intel    Portland
baseline     P4 2.0        319 s    342 s
lowpower     P4M 1.6       362 s    358 s
Router2      Xeon 2.4      264 s    264 s
epiphany     Xeon 2.2      172 s    160 s
pntium28     P4 2.8 /800   305 s    312 s
opteron146   AMD 2.0       209 s    164 s

Cray design (NEC SX-6): ~50 s


• Usually Linux, with MPI for communication.
• Could be Windows, but not many clusters run it.
• Compilers optimize.
• Management and monitoring software
• Scheduling software




PGI compilers and tools (1 to 4 CPU systems)
• pgf77: FORTRAN 77
• pgf90: Fortran 90
• pghpf: High Performance Fortran
• pgcc: ANSI and K&R C
• pgCC: ANSI C++ with cfront compatibility features
• pgdbg: source code debugger
• pgprof: source code performance profiler
• Platforms: Linux and Windows; 32-bit/64-bit; Pentium 4, Xeon, Athlon, Opteron

Workstation Clusters = PGI Compilers + Open Source Clustering Software

A turn-key package for configuration of an HPC cluster from a group of networked Linux workstations or dedicated blades

What about the future?

• Always go Beowulf if you can.

• Work on source code to minimize communication.

• Compilers may never be smart enough to automatically parallelize or second-guess the programmer or the investigator.

• Components will get faster, but interconnects will always lag processors.
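A generic illustration of why communication deserves the attention: for a cubic subdomain with a one-cell-deep halo, work grows with the volume while data exchanged grows with the surface. The cubic decomposition and the 128-cell grid below are assumptions chosen for simplicity; this is not a description of how Terra or any particular code partitions its grid.

```python
def comm_to_compute_ratio(cells_per_side: int) -> float:
    """Halo cells exchanged per interior cell computed, for a cubic subdomain."""
    compute = cells_per_side ** 3      # work scales with volume
    halo = 6 * cells_per_side ** 2     # six faces, one cell deep
    return halo / compute

# Hypothetical global grid of 128^3 cells split into p^3 equal cubes.
GLOBAL_SIDE = 128
for p in (2, 4, 8):
    side = GLOBAL_SIDE // p
    print(f"{p**3:>4} processors: {side:>3}^3 cells each, "
          f"ratio {comm_to_compute_ratio(side):.3f}")
```

Each doubling of the processor count per dimension doubles the ratio, so a fixed-size problem leans harder on the interconnect as the cluster grows.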

Future Hardware

• No existing boards are made for clustering.

• Better management firmware is needed.

• Blade designs may be proprietary.

• They may require common components to operate at all.

• Hard disks need more affordable reliability.

• Large, affordable Ethernet switches are needed.

General advice?

• Think of clusters as “personal supercomputers.”
• They are simplest if used as a departmental or small-group resource.
• Clusters too large may cost too much:
– Overconfigured
– Massive interconnect switches
– Users can only exploit so many processors at once
– Multiple runs may beat one massively parallel run.
– Think “lean and mean.”
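The point about multiple runs can be sketched with Amdahl's law, a standard textbook model brought in here for illustration rather than something the slides use. The serial fraction stands in for everything that does not speed up with more processors, including communication; the 5% value and the 64-processor cluster are hypothetical.

```python
def amdahl_speedup(processors: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup of one run on the given number of processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

def runs_per_hour(total_processors: int, processors_per_run: int,
                  serial_fraction: float, single_cpu_hours: float = 1.0) -> float:
    """Throughput when the cluster is split into equal, independent runs."""
    concurrent_runs = total_processors // processors_per_run
    hours_per_run = single_cpu_hours / amdahl_speedup(processors_per_run, serial_fraction)
    return concurrent_runs / hours_per_run

# Hypothetical: 64 processors, 5% of each run does not parallelize.
for per_run in (64, 16, 8):
    rate = runs_per_hour(64, per_run, serial_fraction=0.05)
    print(f"{64 // per_run:>2} runs x {per_run:>2} processors: {rate:5.1f} runs/hour")
```

Under these assumptions eight 8-processor runs finish about three times as many cases per hour as a single 64-processor run, at the price of each individual run taking longer.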


• 1. Test these machines with your code.

• 2. Get a consultation on configuration

More are Coming

• Peter Bunge sends his greetings, in anticipation of a Deutsche Geowulf: 256 processors…
• And many more clusters here and there.


Happy Computing!

But, NOT The End
