
Fyzikální ústav AV ČR, v. v. i. (Institute of Physics of the Academy of Sciences of the Czech Republic)
Na Slovance 2
182 21 Praha 8
eli-cz@fzu.cz
www.eli-beams.eu
HPC - Computing cluster for the ELI project
1. General specifications
Specification 1: HPC cluster

Parameters and minimum requirements:
Goal: Compute nodes for High Performance Computing cluster
Rack: Rack 19” with rack mount kits
Number of nodes: min. 56
Architecture: x86-64
Number of cores per node: At least 16 physical CPU cores (hyperthreading not taken into account)
RAM: 8 GB per core, total cluster RAM min. 7.2 TB, ECC DDR3 1866 MHz (or faster)
HDD: At least 128 GB 2.5” SSD SATA III or SAS for OS and swap
CPU: Minimum SPECfp2006 rate baseline for one node: 530
GPU: Without GPU
MIC: Without MIC
Inf. Connection: 1 x InfiniBand QDR or FDR port per node
LAN Connection: 1 x 1Gbit RJ45 port per node (with PXE booting support)
OS: Open source Linux compatible with CentOS, Scientific Linux or Debian
Specification 2: Login and master nodes for HPC cluster

Parameters and minimum requirements:
Goal: Login and master nodes for High Performance Computing
Rack: Rack 19” with rack mount kits
Number of nodes: At least 3 (physical servers)
Architecture: x86-64
CPU: Same as compute nodes
RAM: Same as compute nodes
Inf. Connection: Same as compute nodes
LAN Connection: 1 x 10Gbit RJ45 port per node and 1 x 1Gbit RJ45 port per node (PXE booting support)
HDD: Each node at least 2 local drives with capacity 500 GB, 15krpm, RAID1
Redundancy: Redundant, hot-swap version (power supply, RAID, etc.)
OS: Open source Linux compatible with CentOS, Scientific Linux or Debian
Specification 3: Storage systems for HPC cluster

Parameters and minimum requirements:
Goal: Storage for High Performance Computing
Rack: Rack 19” with rack mount kits
Storage system: The storage system consists of HOME storage for user data and SCRATCH storage for temporary data and intermediate results
HOME capacity: At least 192 TB net usable capacity – actually usable by a user
HOME speed: Actually achievable sustainable aggregate speed of sequential operations for a 256KB block: 800 MB/s for reading and 500 MB/s for writing
HOME system: NFSv4 with Kerberos support
SCRATCH capacity: At least 192 TB net usable capacity – actually usable by a user
SCRATCH speed: Actually achievable sustainable aggregate speed of sequential operations for a 256KB block: 1400 MB/s for reading and 800 MB/s for writing
SCRATCH system: Parallel file system (e.g. Lustre, GPFS or similar)

Specification 4: Front-end for the storage system

Parameters and minimum requirements:
Goal: Front-end servers for the HOME and SCRATCH storage systems
Rack: Rack 19” with rack mount kits
Number of nodes: At least 3 (physical servers), 1 active front-end for each storage system (HOME, SCRATCH) and at least one passive fail-over
Architecture: x86-64
CPU: Minimum SPECint2006 rate baseline for one node: 420
RAM: 128 GB RAM ECC for each node
Inf. Connection to the HPC cluster: min. 2 x InfiniBand QDR or FDR links (same as compute nodes)
LAN Connection: 1 x 10Gbit RJ45 port per node and 1 x 1Gbit RJ45 port per node (PXE booting support)
HDD: Each node at least 2 local drives with capacity 300 GB, 10krpm, RAID1
Redundancy: Redundant, hot-swap version (power supply, RAID, etc.)
OS: Open source Linux compatible with CentOS, Scientific Linux or Debian
Specification 5: Infrastructure for HPC cluster

Parameters and minimum requirements:
Goal: Infrastructure for High Performance Computing
Rack: The whole system must fit within 2 racks, which must be included in the offer together with rack mount kits.
Dimension: 42-48U, 600 or 800 x 1200 mm, with cooling backdoor compatible with the FzÚ (IoP) water cooling system
Connections – InfiniBand:
- InfiniBand switches for connecting all nodes (compute, admin, login and storage system front-end servers), InfiniBand connection between core switches
- QDR or FDR InfiniBand technology
- Cables must be included
Connections – LAN network:
- LAN switch for connecting all nodes and data storage (management)
- Internal network connection 1Gbit (metallic or fiber)
- Outside connection through login and admin and storage front-end (HOME, SCRATCH) nodes: min. 4 x 1Gbit RJ-45 (metallic)
- Outside connection through login and admin and storage front-end (HOME, SCRATCH) nodes: min. 4 x 10Gbit SFP+ (fiber)
- Full FzÚ (IoP) network compatibility (LAN management and scripting)
- Cables for internal connection must be included
Power supply: Maximum power supply of all HPC cluster parts at full operation (including compute nodes, the whole storage system with front-end servers, switches, login and admin nodes, fans and all other electrical components) must be less than 40 kW. The maximum power supply must be explicitly stated, including its calculation, which should be done as follows: add up the nameplate power (or the maximum power consumption provided by the manufacturer) of all anticipated components. If the manufacturer of a component does not state the wattage, it can be determined by multiplying the current (in Amps) by the voltage (in Volts) of the device to get the VA, which approximates the number of watts the device will consume. (A worked example of this calculation follows this table.)
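
As an illustration of the power calculation required in the Power supply row above, the sketch below sums hypothetical nameplate values; every wattage, quantity and the 2 A / 230 V device are assumptions chosen only to demonstrate the method, not figures taken from this specification.

    # Hypothetical nameplate summation (illustrative values only)
    compute=$((56 * 450))     # 56 compute nodes at an assumed 450 W nameplate each = 25200 W
    service=$((6 * 400))      # 3 login/admin nodes + 3 storage front-ends at an assumed 400 W each = 2400 W
    storage=$((4 * 800))      # assumed 4 storage enclosures at 800 W each = 3200 W
    network=2000              # assumed switches, fans and other components, 2000 W in total
    misc=$((2 * 230))         # a device rated 2 A at 230 V contributes approx. 460 VA, taken as 460 W
    total=$((compute + service + storage + network + misc))
    echo "Total nameplate power: ${total} W"   # 33260 W, below the 40 kW limit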
Additional parameters of the HPC cluster:
1. Data speed/capacity are stated using the units
1 TB = 1000000000000 bytes
1 GB = 1000000000 bytes
1 MB = 1000000 bytes
1 Gbit = 1000000000 bits
2. The hardware components must be identical in all compute nodes (including memory modules).
3. Redundancy is required if some hardware components are shared by several compute nodes.
Namely, no more than 2 nodes can fail in the case of failure of a single hardware component. In the
case of blade servers, it must be possible to replace individual components (switches, servers,
power supply, etc.) during operation.
4. All memory channels of all processors must be filled. The same number of DIMM modules must
occupy each channel. All DIMMs in all nodes must be identical.
5. MLC technology is acceptable for internal node SSD drives. The linear read and write speed must be
at least 500 MB/s. Each SSD drive must provide at least 50000 IOPS for random read and write (a verification sketch follows this list).
6. Booting the operating system over the network must be supported, as well as local booting from an external drive. It
must be possible to set the boot device sequence.
7. All compute nodes must be connected through InfiniBand in the non-blocking fat tree configuration.
8. Access to the console of each node must be provided through one central location (single monitor +
keyboard).
9. The mainboard must contain a management controller (BMC) compatible with IPMI 2.0 or higher, with
remote power management and monitoring of fans and of CPU and mainboard temperatures (an IPMI sketch covering items 6 and 9 follows this list).
10. All HW components must be supported in the kernel or using external driver with source code
provided.
11. The software installed on the HPC cluster must include compilers (gcc and Intel) and the MPI libraries OpenMPI, MVAPICH2 and Intel MPI. A module environment is required (an example session follows this list).
12. Open source or proprietary software for system management and administration, and scalable distributed
computing management and provisioning tools that provide a unified interface for hardware control,
discovery, and diskful/diskless OS deployment, must be provided. In the case of proprietary software, the price of the
software, its license, support including solving software conflicts and usability problems, and
supplying updates and patches for bugs and security holes for at least 3 years, must be included.
The documentation for all the software must be included and must be in English.
13. The results of performance tests must be supplied. The performance can be demonstrated by providing
official results from www.spec.org for an equivalent system or by running the benchmark on one of the
supplied compute nodes.
14. The supplier must verifiably and reproducibly demonstrate that the cluster meets the specified
performance parameters during the acceptance tests.
15. In case of failure to achieve the specified performance, the supplier will have the option to optimize
HW or SW so that the system reaches the stated performance, but the acceptance protocol will not be
signed until the stated performance is achieved.
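
Item 5: one possible way to verify the SSD bandwidth and IOPS figures is a direct fio run as sketched below; /dev/sdX, the test sizes and the queue depths are placeholders, and raw-device tests destroy data, so they would only be run on drives not yet in use.

    # WARNING: writing to a raw device erases its contents -- /dev/sdX is a placeholder
    # Sequential bandwidth (requirement: at least 500 MB/s linear read and write)
    fio --name=seqread  --filename=/dev/sdX --rw=read  --bs=1M --size=10G --direct=1 --ioengine=libaio --iodepth=32
    fio --name=seqwrite --filename=/dev/sdX --rw=write --bs=1M --size=10G --direct=1 --ioengine=libaio --iodepth=32
    # Random 4k IOPS (requirement: at least 50000 IOPS for random read and write)
    fio --name=randread  --filename=/dev/sdX --rw=randread  --bs=4k --size=10G --direct=1 --ioengine=libaio --iodepth=64 --numjobs=4 --group_reporting
    fio --name=randwrite --filename=/dev/sdX --rw=randwrite --bs=4k --size=10G --direct=1 --ioengine=libaio --iodepth=64 --numjobs=4 --group_reporting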
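
Items 6 and 9: the sketch below shows how boot order, power state and sensor readings can be handled remotely through an IPMI 2.0 compliant BMC using ipmitool; the BMC hostname and credentials are placeholders.

    BMC=node01-bmc            # placeholder BMC hostname or IP address
    USER=admin PASS=secret    # placeholder credentials
    # Select PXE (network) boot for the next start-up (item 6)
    ipmitool -I lanplus -H "$BMC" -U "$USER" -P "$PASS" chassis bootdev pxe
    # Remote power management (item 9)
    ipmitool -I lanplus -H "$BMC" -U "$USER" -P "$PASS" power status
    ipmitool -I lanplus -H "$BMC" -U "$USER" -P "$PASS" power cycle
    # Fan and temperature monitoring (item 9)
    ipmitool -I lanplus -H "$BMC" -U "$USER" -P "$PASS" sdr type Temperature
    ipmitool -I lanplus -H "$BMC" -U "$USER" -P "$PASS" sdr type Fan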
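
Item 11: a typical module-environment session is sketched below; the module names and versions are illustrative only, since the actual names depend on the delivered software stack.

    module avail                     # list the software modules provided on the cluster
    module load gcc openmpi          # illustrative names for the GNU compiler and OpenMPI modules
    mpicc -O2 -o hello hello.c       # build an MPI program with the loaded toolchain
    mpirun -np 16 ./hello            # run it on 16 cores
    module purge                     # unload everything
    module load intel impi           # hypothetical names for the Intel compiler and Intel MPI modules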
Additional specification of the storage system:
1. Each part of the storage system (HOME+SCRATCH) must be connected to the cluster through its
own front-end server. Another front-end server is required as a fail-over.
2. The front-end servers must have identical hardware and must be integrated in the InfiniBand
infrastructure.
3. The front-end servers' InfiniBand and 10 Gbit interfaces can be on the same card, but it must be
possible to use both of them at the same time.
4. Access to the console of each front-end server must be provided through one central location (single
monitor + keyboard).
5. The mainboards of the front-end servers must contain a management controller (BMC) compatible with
IPMI 2.0 or higher, with remote power management and monitoring of fans and of CPU and mainboard
temperatures.
6. All HW components of all front-end servers must be supported in the kernel or using external driver
with source code provided.
7. In the case of HOME storage, the front-end server must export NFSv4 and support Kerberos
authentication. NFSv4 can re-export another filesystem (a minimal export sketch follows this list).
8. Data speed/capacity are stated using the units
1 TB = 1000000000000 bytes
1 GB = 1000000000 bytes
1 MB = 1000000 bytes
1 Gbit = 1000000000 bits
9. The determination of the net usable capacity of a data storage solution must be stated for the
proposed/delivered configuration designed for the standard operation and must not be based on
presumptions, which cannot be ensured or which restrict the use of the data storage or other data
storage solutions or which do not comply with the requirements or possible interests of the Client.
10. The determination of the net usable capacity must not count on or take into account system features
or components as potential additional space for data storage based upon presumptions which
cannot be ensured (compression, deduplication, etc.), or allocate more space than is physically
possible or actually feasible without the need for other actions (oversubscription).
11. The tools, solutions used to determine the capacity must provide credible information and must work
with a known size of a data block or a known and accurate unit.
12. The determination of data storage speed must be stated for the proposed/delivered configuration
designed for standard operation (with full storage capacity). It must not be based on presumptions
which cannot be ensured or which restrict the use of the data storage or other data storage solutions
or which do not comply with the requirements or possible interests of the Client. For example the
performance of HOME data storage must not be influenced in any manner by the use of SCRATCH
data storage.
13. The determination of speed must not be based on a presumption of specific favourable conditions or
a specific favourable measurement mode (e.g. cache operations), unless such conditions or a mode
are explicitly required or stated.
14. Data storage solution components (disks, power supplies, RAIDs, switches, servers) must be
replaceable during operation without causing any failure of the data storage operation.
15. High density of the storage system is required: an average density of at least 5 TBit per 1U, including all components
of the storage system.
16. The HOME storage solution's disk arrays must ensure data protection: RAID6 in a 16+2 configuration
(or better), or an equivalent technology with the same level of protection (number of parity drives). An illustrative capacity calculation follows this list.
17. The SCRATCH storage solution's disk arrays must ensure data protection: RAID5 in a 16+1 configuration
(or better), or an equivalent technology with the same level of protection (number of parity
drives).
18. The HOME storage RAID6 array may consist of groups connected together on the front-end server,
but all groups must have the same configuration and each RAID group must be realized using an
external controller.
19. The SCRATCH storage RAID5 array may consist of groups connected together on the front-end
server, but all groups must have the same configuration and each RAID group must be realized
using an external controller.
20. At least 4 GB write-back cache is required for all hardware RAID controllers.
21. The configuration of HOME storage must allow rebuild within 48 hours during standard operation
(performance decrease is acceptable).
22. At least 6 hot spare drives must be included.
23. All hard drives in the HOME and SCRATCH storage must be of the same type and size.
24. Network components of SCRATCH and HOME can be shared, but it is necessary to maintain the
required performance parameters while testing both parts of the storage system at once.
25. The speed of the HOME and SCRATCH storage systems will be measured by writing a large data
file from 8 clients on a single compute node. It will be determined using:
iozone -t 8 -Mce -s1000g -r256k -i0 -i1 -F file1 file2 file3 file4 file5
file6 file7 file8
The important results are "Children see throughput for 8 initial writers" (for writing) and "Children see
throughput for 8 readers" (for reading). The results of all tests must be supplied for iozone version
3.347 (http://www.iozone.org). A sketch for extracting these values follows this list.
26. Open source or proprietary software for storage system management and administration, including
filesystems, must be provided. In the case of proprietary software, the price of the software, its license, support
including solving software conflicts and usability problems, and supplying updates and patches for
bugs and security holes for at least 3 years, must be included. The documentation for all the
software must be included and must be in English.
27. The supplier must verifiably and reproducibly demonstrate that the cluster meets the specified
performance parameters during the acceptance tests.
28. In case of failure to achieve the specified performance, the supplier will have the option to optimize
HW or SW so that the system reaches the stated performance, but the acceptance protocol will not be
signed until the stated performance is achieved.
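
Item 7: a minimal sketch of a Kerberos-secured NFSv4 export on the HOME front-end and the corresponding client mount; the path, network, hostname and security flavour are assumptions, and a working Kerberos realm with nfs/ service principals on both sides is presumed.

    # On the HOME front-end server -- example /etc/exports entry (placeholder path and network):
    #   /home  10.0.0.0/16(rw,sec=krb5p,no_subtree_check)
    exportfs -ra                                        # re-read the export table
    # On a login or compute node -- mount HOME over NFSv4 with Kerberos authentication:
    mount -t nfs4 -o sec=krb5p home-frontend:/home /home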
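
Items 9, 10 and 16: the net usable HOME capacity follows directly from the RAID6 16+2 layout; the arithmetic below uses a hypothetical 4 TB drive size and three RAID groups purely for illustration (neither figure is prescribed by this specification), and filesystem overhead would still reduce what a user actually sees.

    drive_tb=4                               # assumed drive size in TB (decimal units, 1 TB = 10^12 bytes)
    per_group=$((16 * drive_tb))             # a 16+2 RAID6 group keeps 16 data drives: 64 TB
    groups=3                                 # assumed number of identical RAID groups
    net_tb=$((groups * per_group))           # 192 TB usable before filesystem overhead
    drives=$((groups * 18 + 6))              # 54 array drives plus the 6 required hot spares = 60
    echo "${net_tb} TB usable from ${drives} drives"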
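
Item 25: the acceptance figures can be read directly from the iozone output, for example as sketched below; iozone.log is a placeholder name, and the test files must of course reside on the storage system under test.

    # Run the prescribed test on one compute node, keeping the full output
    iozone -t 8 -Mce -s1000g -r256k -i0 -i1 -F file1 file2 file3 file4 file5 file6 file7 file8 | tee iozone.log
    # The lines of interest are the "Children see throughput" summaries
    # for the 8 initial writers (writing) and the 8 readers (reading)
    grep "Children see throughput" iozone.log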