Introduction to High Performance Computing at ZIH

Center for Information Services and High Performance Computing (ZIH)
Getting started
Zellescher Weg 16
Trefftz-Bau (HRSK-Anbau) Room HRSK/151
Tel. +49 351 - 463 - 39871
Guido Juckeland (guido.juckeland@tu-dresden.de)
Agenda
• Before you can get on – paperwork
• When you first get on – using ssh, VPN, environment modules, available file systems
• Things to know about the hardware you are/will be using
Slides at: http://wwwpub.zih.tu-dresden.de/~juckel/slides
Before you can get on – Paperwork
Project Proposal
• No login without a valid HPC project!
• Every HPC user account has to be associated with at least one project
• The project has to be endorsed (headed) by a Saxon research group leader
• Applications (pdf):
http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare
• Online project application:
https://formulare.zih.tu-dresden.de/antraege/antrag/antrag_form.html
• A small amount of CPU time can be granted immediately
• The proposal is peer reviewed and decided upon (peers from all over Saxony)
• Projects have a lifetime – you need to reapply for follow-up projects
Login Application
• Paperwork at:
http://www.tu-dresden.de/zih/hpc
• You need a signature of your project leader on the application
• What you get:
• ZIH standard login (e-mail account, personal storage, anti-virus software, VPN access, WLAN access via eduroam, …)
• Account on the HPC systems you applied for
• Automatic entry in the ZIH HPC Maillists (Announcements and Forum)
• Accounts usually expire every year at the end of October! You need to extend
your login!
When you first get on – Using ssh, VPN, environment modules,
available file systems
Access from within the TUD-Network
• You are on the TUD campus (your IP address starts with 141.30 or 141.76)
• Simply "ssh/sftp" to the machine address (e.g. ssh deimos.hrsk.tu-dresden.de)
• No web access or similar (so do not try mars.hrsk.tu-dresden.de in your browser)
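For example, logging in and transferring a file from a campus machine could look like this (user name and file names are placeholders):

ssh <zih-login>@deimos.hrsk.tu-dresden.de                  # interactive login
scp input.dat <zih-login>@deimos.hrsk.tu-dresden.de:~/     # copy a file to your home directory
sftp <zih-login>@deimos.hrsk.tu-dresden.de                 # interactive file transfer session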
Access from the outside of the TUD Network
• You are sitting at an MPI or FhG institute (or at home)
• No direct access from outside the TUD local network (hardware firewall)
• 2 options:
• Double ssh connection (tough for file transfers; see the sketch after this list)
• First ssh to one of the central ZIH login servers (login1.zih.tu-dresden.de or login2.zih.tu-dresden.de) using your standard ZIH login
• ssh to the desired HPC machine from there
• Use a ZIH VPN connection (preferred solution)
• Download and install a ZIH VPN client (more information under: http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/datennetz_dienste/vpn)
• Establish a VPN connection using your ZIH standard login
• Then open a ssh/sftp connection from your computer to the desired
HPC system
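A minimal sketch of both options from outside the campus (user name is a placeholder):

# Option 1: double ssh over a central ZIH login server
ssh <zih-login>@login1.zih.tu-dresden.de
ssh deimos.hrsk.tu-dresden.de                          # second hop, run on the login server
# file transfers have to take the same detour, e.g.
scp input.dat <zih-login>@login1.zih.tu-dresden.de:~/
# then, on login1: scp ~/input.dat deimos.hrsk.tu-dresden.de:~/

# Option 2: with an active ZIH VPN connection you connect directly, as on campus
ssh <zih-login>@deimos.hrsk.tu-dresden.de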
SSH Fingerprints of the HRSK Machines
mars.hrsk.tu-dresden.de:
1024 cf:89:20:a8:aa:36:3f:1f:7b:5e:f4:8e:57:99:15:35 ssh_host_dsa_key.pub
1024 1a:cc:4e:4f:ff:5f:b0:bc:25:9d:84:9f:39:12:d7:6d ssh_host_key.pub
1024 08:3b:da:02:1d:ff:a8:cf:26:27:96:16:86:07:a2:a9 ssh_host_rsa_key.pub
neptun.hrsk.tu-dresden.de:
1024 b0:0b:2c:3d:66:d9:d2:49:ec:fc:d1:89:6d:5b:4c:f7 ssh_host_key.pub
deimos10[1-4].hrsk.tu-dresden.de:
1024 48:f7:d6:37:d0:cf:b0:f4:49:67:b6:1f:c1:44:7d:9f ssh_host_dsa_key.pub
1024 5f:11:98:8a:29:20:c8:65:78:75:d7:a0:bb:d4:74:93 ssh_host_key.pub
1024 22:42:72:c6:38:57:71:03:90:72:2b:2c:72:e7:d0:cd ssh_host_rsa_key.pub
phobos.hrsk.tu-dresden.de:
1024 91:bd:d0:b0:8b:60:75:40:bc:4a:54:9d:54:2a:dc:b8 ssh_host_dsa_key.pub
1024 1b:1c:29:1f:d2:5c:a9:0b:ac:e6:cf:28:1c:4f:92:8f ssh_host_key.pub
1024 b8:14:54:9a:f5:06:f8:d5:da:cb:51:a8:21:fb:db:bd ssh_host_rsa_key.pub
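On the very first connection ssh displays the host key fingerprint of the target machine and asks for confirmation; compare it against the list above before answering "yes". The dialogue looks roughly like this (illustration for deimos, the key type shown may differ):

ssh <zih-login>@deimos.hrsk.tu-dresden.de
The authenticity of host 'deimos.hrsk.tu-dresden.de' can't be established.
RSA key fingerprint is 22:42:72:c6:38:57:71:03:90:72:2b:2c:72:e7:d0:cd.
Are you sure you want to continue connecting (yes/no)?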
You are on – what do you find?
• HRSK: Standard Linux Enterprise installation (SuSE SLES 10 SP 2)
• Phobos: SuSE SLES 9 SP 3
• SX-6: SuperUX (Special UNIX environment)
• Similar to a Desktop Linux (some special programs missing)
• GCC, automake, and all the standard tools are there
• Only a limited number of GUI tools available (usually not needed)
• Caution: The amount of CPU time on the login nodes is limited to 5 minutes
• This can cause problems for large file transfers – contact us in this case
• 3rd party software, or anything that is not part of the Linux distribution, is provided via environment modules
Modules for environment variables
Non-standard software is installed into special paths (not in the standard search paths for applications)
Modules set environment variables so that applications and libraries find their binaries/shared objects
Show installed modules
module avail
Show currently loaded modules
module list
Load a module
module load <name>
Unload a module
module rm <name>
Exchange modules
module switch <1> <2>
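A typical session could look like this (the exact module names and versions differ per system – check module avail first):

module avail                    # what is installed?
module load intel               # put the Intel compilers into the environment
module load mkl                 # add the MKL library paths
module list                     # verify what is currently loaded
module switch intel pathscale   # exchange one compiler environment for another
module rm mkl                   # unload a module again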
HRSK-Software
Installed Software on the HRSK Systems (not complete, not all on all systems):
Compilers:
– GCC
– Intel
– Pathscale
– PGI
Debuggers:
– ddd
– ddt
– idb
Libraries:
– acml
– atlas
– blacs
– blas
– boost
– hypre
– lapack
– mkl/clustermkl
– netcdf
– petsc
Applications:
– Abaqus
– Ansys
– CFX
– Comsol
– CP2K
– Fluent
– Gamess
– Gaussian
– Gromacs
– Hmmer
– Lammps
– Totalview
– Valgrind
– LS-Dyna
– Maple
– Mathematica
– Matlab
– MSC
– Namd
– Numeca
– Octave
– R
– Tecplot
File system layout
Altix 4700
CXFS
– The same on all Altix partitions
– work [ /work ]
  • contains /work/home[0-9]/
  • 8.8 TB
  • Backup
– fastfs [ /fastfs ]
  • 60 TB
  • DMF, no backup
  • Fastest file system
– scratch [ /scratch ]
  • local – only visible per Altix partition
  • Fast alternative to /tmp
Deimos
Lustre
– work [ /work ]
  • contains /work/home[0-9]/
  • global 16 TB
  • Backup
– fastfs [ /fastfs ]
  • global 48 TB
  • no backup
  • Fastest available file system
local (ext3)
– scratch [ /scratch ]
  • local per node (about 40 GB per core)
Deimos (2)
NFS
– /hpc_fastfs
  • /fastfs from the Altix
  • dmf commands are also available here to access the archive
  • Deimos-only users can also archive data here
– /hpc_work
  • /work from the Altix
  • Includes the home directories there
Project directories
• You are by default in a user group that has the same name as your project
• Your project has a shared "Home" and "Fastfs" directory for you to share applications and data
• There are symbolic links in your home directory to the project directories
• Please use them and do not install software into each of your project members' home directories!
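For illustration, assuming your project is called myproject and the symbolic link in your home directory carries the project name (both names are hypothetical – check your own home directory), shared software would be installed once like this:

cd ~/myproject                     # symbolic link to the shared project home
mkdir -p software
cd /path/to/toolXY-source          # unpacked source of the (hypothetical) tool
./configure --prefix=$HOME/myproject/software/toolXY
make && make install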
DMF - Commands
DMF copies data back and forth automatically
Manual invocation possible to migrate data between disk and tape
dmput
– Moves data from disk to tape
– “-r” also removes the data from disk after moving
– Moving is done in the background
dmls
– Extended ls
– Displays the location of the file data (ONL = disk only, OFL = tape only, DUL = on disk and tape, MIG = currently being moved to tape, UNM = currently being moved back to disk)
dmget
– Recalls data from tape to disk
Use dmput/dmget calls on whole directories if needed!
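A sketch of migrating a finished result directory by hand (paths and names are examples):

cd /fastfs/myproject          # /hpc_fastfs on Deimos
dmput -r results_run42/*      # move the files to tape and free the disk space
dmls -l results_run42         # check the state (OFL = data only on tape)
dmget results_run42/*         # bring everything back to disk before using it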
I/O Recommendations
Temporary data -> /fastfs
Compile in /scratch
Source code etc. -> home
Checkpoints -> /fastfs
Archive results as tar files (no need to compress) to /fastfs or /hpc_fastfs and run dmput -r on them afterwards (commands after this list)
Parallel file systems are bad for small I/O! (e.g. compilation)
Large I/O bandwidth with
– Lots of clients
– Lots of processes (that may even write to the same file)
– Large I/O blocks
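Putting the archiving recommendation into commands (project and file names are placeholders):

cd /fastfs/myproject                      # or /hpc_fastfs on Deimos
tar cf results_2009.tar run01/ run02/     # plain tar, no compression needed
dmput -r results_2009.tar                 # migrate the archive to tape, free the disk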
Things to know about the hardware you are/will be using
SGI Altix 4700
SGI Altix 4700 (5 partitions)
1024 Itanium II (Montecito) CPUs, 1.6 GHz, 18 MB L3 cache (2048 cores)
13.1 TFlop/s peak performance
6.6 TB memory (4 GB/core)
NumaLink4
Local disks + 68 TB SAN
SuSE SLES 10 incl. SGI ProPack 4
Intel Compiler and Tools
Vampir
Allinea DDT debugger
Batch system: LSF
CPU
Intel Itanium II (Montecito), approx. 1.7 billion transistors
IA-64 (not x86!!!)
1.6 GHz
Dual-Core
per Core:
– L1: 16 KB Data (no floating-point data) / 16 KB instructions
– L2: 256 KB Data / 1024 KB Instructions
– L3: 9 MB
Instruction bundles of 128 bits
3 instructions per bundle
No out-of-order execution
Performance depends heavily on the compiler (do not use GCC!!)
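For example, on the Altix you would build with the Intel compilers instead of GCC (the module name is an assumption – check module avail):

module load intel
icc  -O3 -o mykernel mykernel.c     # C
ifort -O3 -o mysim   mysim.f90      # Fortran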
Connection to local memory and the rest of the system
[Diagram: each Itanium II socket reaches its local DDR2 DIMMs through the SHUB 2.0 at 10.7 GB/s; the SHUB connects to the rest of the system via NumaLink4 at 2 x 6.4 GB/s]
The whole system architecture
1 chip (2 cores) per blade
8 blades per IRU
4 IRUs per rack
32 racks
-> 1024 chips
2048 cores spread over 5 partitions
One partition = 1 computer (1 operating system instance)
jupiter - Topology
[Topology diagram of the jupiter partition; image source: SGI]
Altix partitions
On all partitions:
4 CPUs set aside for the operating system
mars:
– 384 GB main memory
– 32 processors for login
– 346 processors for batch operation
jupiter, saturn, uranus:
– 2 TB main memory
– 506 CPUs for batch operation
neptun:
– 124 processors for interactive use
– 2 FPGAs
– 4 graphics boards
User's view of the Altix
Login via SSH -> terminal emulation
Boot-CPU-Set with 4 processors
SuSE Enterprise Server 10 SP 2
Standard Linux-Kernel
The batch system places user requests on the rest of the available processors (also on the other partitions)
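Jobs are handed to LSF with bsub; a minimal sketch (queue names, limits, and the MPI start-up command are site specific – see the ZIH documentation):

bsub -n 16 -W 2:00 -o job.out mpirun -np 16 ./my_mpi_app   # 16 CPUs, 2 hours wall time
bjobs                                                      # show the state of your jobs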
Access
[Diagram: users reach the login partition mars via ssh through the firewall; from there LSF dispatches batch jobs to jupiter, saturn, and uranus; neptun (FPGAs, graphics boards) is used interactively]
Linux Networx PC-Farm (Deimos)
1292 AMD Opteron x85 dual-core CPUs (2.6 GHz)
726 compute nodes with 2, 4, or 8 CPU cores
Per core 2 GiByte main memory
2 Infiniband interconnects (MPI- and I/O-Fabric)
68 TByte SAN-Storage
Per node 70, 150, or 290 GByte scratch disk
OS: SuSE SLES 10
Batch system: LSF
Compiler: Pathscale, PGI, Intel, Gnu
3rd party applications: Ansys100, CFX, Fluent, Gaussian,
LS-DYNA, Matlab, MSC,…
Deimos - Partitions
2 Master Nodes
– Not accessible for users, PC-Farm management
4 Login Nodes
– 4-core nodes
– Accessible via DNS round robin under deimos.hrsk.tu-dresden.de
Single-, dual-, and quad-nodes
– 1, 2 or 4 CPUs
– 4, 8 or 16 GiByte main memory (24 Quads with 32 GiByte)
– 80, 160 or 300 GByte local disks
Split into phase 1 and phase 2 nodes
– Identical hardware
– Differences in the connection to the MPI and I/O fabric (see below)
Deimos – Layout of a single-CPU node
[Diagram: one AMD Opteron 185 with 4 GiByte of local memory, attached via HyperTransport to the peripheral devices (Infiniband, Ethernet, disk)]
Deimos – Layout of a dual-CPU node
[Diagram: two AMD Opteron 285, each with 4 GiByte of local memory, coupled via HyperTransport, with a HyperTransport link to the peripheral devices (Infiniband, Ethernet, disk)]
Deimos – Layout of a quad-CPU node
[Diagram: four AMD Opteron 885, each with 4 GiByte of local memory, connected via HyperTransport, with a HyperTransport link to the peripheral devices (Infiniband, Ethernet, disk)]
Deimos Infiniband-Layout (rough sketch)
[Diagram: every node is attached to two separate Infiniband fabrics – one MPI network and one I/O network]
Deimos MPI-Fabric
3 x 288-port Voltaire ISR 9288 IB switches with 4x Infiniband ports
– Switch 1 (Rack 05): all phase 1 nodes
– Switch 2 (Rack 20): phase 2 duals and quads
– Switch 3 (Rack 25): phase 2 singles
– Neighbouring switches are coupled with 30 links (30x) each
Deimos I/O Fabric
Tree structure with
– 1 x 192-port Voltaire ISR 9288 IB switch with 4x Infiniband ports (Rack 07)
– 36 x 24-port Mellanox IB switches (4x), passive
[Diagram: the Voltaire core switch fans out to the 24-port Mellanox leaf switches, which connect the phase 1 and phase 2 nodes]