CENG 546
Dr. Esma Yıldırım


A computing cluster consists of a collection of interconnected stand-alone/complete computers that can work cooperatively together as a single, integrated computing resource. A cluster exploits parallelism at the job level and supports distributed computing with higher availability.

A typical cluster:
• Merges multiple system images into an SSI (single-system image) at certain functional levels
• Applies low-latency communication protocols
• Is more loosely coupled than an SMP with an SSI


A commodity cluster:
• Is a distributed/parallel computing system
• Is constructed entirely from commodity subsystems
  – All subcomponents can be acquired commercially and separately
  – Computing elements (nodes) are employed as fully operational, standalone mainstream systems
• Has two major subsystems:
  – Compute nodes
  – System area network (SAN)
• Employs industry-standard interfaces for integration
• Uses industry-standard software for the majority of services
• Incorporates additional middleware for interoperability among elements
• Uses software for coordinated programming of the elements in parallel
Cluster: a network of computers supported by middleware and interacting by message passing (a minimal MPI sketch follows this list).

Common cluster types:
• PC cluster (most Linux clusters)
• Workstation cluster (NOW, COW)
• Server cluster or server farm
• Cluster of SMPs or ccNUMA systems
• Cluster-structured massively parallel processors (MPP) – about 85% of the Top 500 systems
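To make the message-passing interaction above concrete, here is a minimal MPI sketch (MPI is the runtime system named later in these slides); the value, tag, and program name are illustrative, not from the slides.

  /* Minimal MPI message passing: rank 0 sends an integer to rank 1.
     Build with an MPI compiler wrapper (e.g., mpicc) and run with two
     processes (e.g., mpirun -np 2 ./ping).                             */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                      /* sending process   */
          value = 42;
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {               /* receiving process */
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          printf("rank 1 received %d from rank 0\n", value);
      }

      MPI_Finalize();
      return 0;
  }

Each MPI process runs in its own address space, typically on its own cluster node, so all interaction happens through such explicit messages.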





System availability (HA): a cluster offers inherently high system availability due to the redundancy of hardware, operating systems, and applications.
Hardware fault tolerance: a cluster has some degree of redundancy in most system components, including both hardware and software modules.
OS and application reliability: multiple copies of the OS and applications are run, and reliability is achieved through this redundancy.
Scalability: servers can be added to a cluster, or more clusters added to a network, as application needs arise.
High performance: running cluster-enabled programs yields higher throughput.


Scaling: the ability to deliver proportionally greater sustained performance through increased system resources.

Strong scaling
• Fixed-size application problem
• Application size remains constant as system size increases

Weak scaling
• Variable-size application problem
• Application size scales proportionally with system size
(a worked sketch of both cases follows this list)

Capability computing
• In its purest form: strong scaling
• Marketing claims tend toward this class

Capacity computing
• Throughput computing
• Includes job-stream workloads
• In its simplest form: weak scaling

Cooperative computing
• Interacting and coordinating concurrent processes
• Not a widely used term; also called "coordinated computing"

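To make the strong/weak scaling distinction concrete, here is a small sketch that evaluates Amdahl's law (fixed problem size, strong scaling) and Gustafson's law (problem size grows with the machine, weak scaling); the serial fraction of 5% is an illustrative assumption, not a value from the slides.

  /* Speedup under strong scaling (Amdahl) vs. weak scaling (Gustafson).
     f = serial (non-parallelizable) fraction of the work, p = processors. */
  #include <stdio.h>

  double amdahl(double f, int p)    { return 1.0 / (f + (1.0 - f) / p); }
  double gustafson(double f, int p) { return f + (1.0 - f) * p; }

  int main(void)
  {
      double f = 0.05;                     /* illustrative serial fraction */
      for (int p = 1; p <= 1024; p *= 4)
          printf("p=%5d  strong: %8.1f   weak: %8.1f\n",
                 p, amdahl(f, p), gustafson(f, p));
      return 0;
  }

With a 5% serial fraction, the strong-scaling speedup saturates near 20 no matter how many processors are added, while the weak-scaling figure keeps growing with the machine; this is one reason capacity (throughput) workloads scale more easily than a single fixed-size capability job.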



Basic performance metrics (a short example relating them follows this list):
• Peak floating-point operations per second (flops)
• Peak instructions per second (ips)
• Sustained throughput
  – Average performance over a period of time
  – flops, Mflops, Gflops, Tflops, Pflops (megaflops, gigaflops, teraflops, petaflops)
  – ips, Mips, ops, Mops, …
• Cycles per instruction (cpi)
  – Alternatively: instructions per cycle (ipc)
• Memory access latency
  – Measured in processor cycles or seconds
• Memory access bandwidth
  – Bytes per second (Bps) or bits per second (bps)
  – Or gigabytes per second (GBps, GB/s)
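As a quick illustration of how these metrics fit together, here is a small sketch; all the hardware numbers (clock rate, cores, flops per cycle, measured cycles and bytes) are made-up illustrative values, not figures from the slides.

  /* Relating peak flops, cpi/ipc, and memory bandwidth. */
  #include <stdio.h>

  int main(void)
  {
      double clock_hz        = 2.5e9;   /* cycles per second           */
      double flops_per_cycle = 8.0;     /* per core (e.g., SIMD + FMA) */
      double cores           = 16.0;
      printf("peak: %.1f Gflops\n",
             clock_hz * flops_per_cycle * cores / 1e9);

      double instructions = 4.0e9, cycles = 5.0e9;   /* from a profiled run    */
      double cpi = cycles / instructions;            /* cycles per instruction */
      printf("cpi = %.2f, ipc = %.2f\n", cpi, 1.0 / cpi);

      double bytes_moved = 64.0e9, seconds = 2.0;
      printf("sustained bandwidth = %.1f GB/s\n",
             bytes_moved / seconds / 1e9);
      return 0;
  }

Peak numbers come from multiplying hardware parameters, while sustained throughput, cpi/ipc, latency, and bandwidth must be measured over a real run, which is why sustained performance is usually well below peak.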







Main elements of a microprocessor:
• I/O interface
• Memory interface
• Cache hierarchy
• Register sets
• Control
• Execution pipeline
• Arithmetic logic units




Multiprocessor:
• A general class of system
• Integrates multiple processors into an interconnected ensemble
• MIMD: Multiple Instruction stream, Multiple Data stream
• Different memory models (the distributed-memory case is illustrated in the sketch after this list):
  – Distributed memory: nodes support separate address spaces
  – Shared memory: symmetric multiprocessor; UMA – uniform memory access; cache coherent
  – Distributed shared memory: NUMA – non-uniform memory access; cache coherent
  – PGAS: partitioned global address space; NUMA; not cache coherent
  – Hybrid: an ensemble of distributed shared memory nodes, i.e., a massively parallel processor (MPP)
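To illustrate the distributed-memory model, here is a hedged MPI sketch in which each process (rank) owns a private slice of a conceptually global array; the slice size is an arbitrary illustrative value.

  /* Distributed memory: each rank owns a private slice of a "global" array.
     No rank can address another rank's memory directly; data is combined
     only through explicit communication (here, a reduction).             */
  #include <mpi.h>
  #include <stdio.h>

  #define LOCAL_N 1000          /* elements owned by each rank (illustrative) */

  int main(int argc, char **argv)
  {
      int rank, size;
      double local[LOCAL_N], local_sum = 0.0, global_sum = 0.0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      for (int i = 0; i < LOCAL_N; i++) {   /* fill and sum the local slice */
          local[i] = (double)(rank * LOCAL_N + i);
          local_sum += local[i];
      }

      /* Combine the per-rank partial sums with explicit communication. */
      MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                 MPI_COMM_WORLD);

      if (rank == 0)
          printf("global sum over %d ranks = %.0f\n", size, global_sum);

      MPI_Finalize();
      return 0;
  }

In a shared-memory or PGAS system the same data could instead sit in one (possibly partitioned) address space that every processor can load and store directly.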



MPP (massively parallel processor):
• General class of large-scale multiprocessor
• Represents the largest systems, e.g., IBM BG/L, Cray XT3
• Distinguished by memory strategy:
  – Distributed memory
  – Distributed shared memory (cache coherent)
  – Partitioned global address space
• Custom interconnect network
• Potentially heterogeneous
  – May incorporate accelerators to boost peak performance
[Figure: system built jointly by IBM and LLNL teams and funded by the US DoE ASCI research program.]


SMP node (a minimal shared-memory sketch follows):
• Building block for large MPPs
• Multiple processors
  – 2 to 32 processors, now multicore
• Uniform memory access (UMA) shared memory
  – Every processor has equal access, in equal time, to all banks of the main memory
• Cache coherent
  – Multiple copies of a variable are kept consistent by hardware
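To show what the UMA shared-memory model above means for software, here is a minimal sketch using POSIX threads (the threading API is my choice; the slides name no specific library): all threads address the same array, and the hardware keeps their cached copies coherent.

  /* Shared memory on an SMP node: every thread addresses the same array;
     cache coherence is maintained by hardware. Array and thread counts
     are illustrative.                                                    */
  #include <pthread.h>
  #include <stdio.h>

  #define N_THREADS 4
  #define N         (1 << 20)

  static double data[N];                /* one array, visible to all threads */

  static void *fill(void *arg)
  {
      long id = (long)arg;
      long chunk = N / N_THREADS;
      for (long i = id * chunk; i < (id + 1) * chunk; i++)
          data[i] = 2.0 * i;            /* each thread writes its own range */
      return NULL;
  }

  int main(void)
  {
      pthread_t t[N_THREADS];
      for (long id = 0; id < N_THREADS; id++)
          pthread_create(&t[id], NULL, fill, (void *)id);
      for (int id = 0; id < N_THREADS; id++)
          pthread_join(t[id], NULL);

      /* After the joins, main sees every thread's writes directly. */
      printf("data[N-1] = %.1f\n", data[N - 1]);
      return 0;
  }

On a distributed-memory cluster the same computation would require each node to hold its own portion of the array and exchange messages, as in the MPI sketches earlier.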
[Figure: block diagram of an SMP node – microprocessors (MP), each with private L1/L2 caches, shared L3 caches, memory banks M1..Mn-1 behind a controller, storage (S), network interface cards (NIC), and PCI-e, JTAG, Ethernet, USB, and other peripheral interfaces. Legend: MP – microprocessor; L1, L2, L3 – caches; M1.. – memory banks; S – storage; NIC – network interface card.]
Distributed Shared Memory – Non-Uniform Memory Access (NUMA)
• An ensemble of N nodes, each comprising p computing elements
• The p elements are tightly coupled via shared memory (e.g., SMP, DSM)
• The N nodes are loosely coupled, i.e., distributed memory
• In a constellation, p is greater than N
• The distinction is which layer gives us the most power through parallelism

[Figure: a 64-processor commodity cluster built from sixteen 4-way (4X) nodes on a system area network, contrasted with a 64-processor constellation built from four 16-way (16X) nodes on a system area network.]
The HPC system stack, from science problems down to device technology:
Science Problems: Environmental Modeling, Physics, Computational Chemistry, etc.
Applications: Coastal Modeling, Black Hole Simulations, etc.
Algorithms: PDE, Gaussian Elimination, 12 Dwarves, etc.
Model of Computation
Program Source Code
Programming Languages: Fortran, C, C++, UPC, Fortress, X10, etc.
Compilers: Intel C/C++/Fortran Compilers, PGI C/C++/Fortran, IBM XLC, XLC++, XLF, etc.
Runtime Systems: Java Runtime, MPI, etc.
Operating Systems: Linux, Unix, AIX, etc.
Systems Architecture: Vector, SIMD array, MPP, Commodity Cluster
Firmware: Motherboard chipset, BIOS, NIC drivers
Microarchitectures: Intel/AMD x86, Sun SPARC, IBM Power 5/6
Logic Design: RTL
Circuit Design: ASIC, FPGA, Custom VLSI
Device Technology: NMOS, CMOS, TTL, Optical