Compute nodes - NCAS Computational Modelling Services

ARCHER
Advanced Research Computing High End Resource
Nick Brown
nick.brown@ed.ac.uk
Website Location
http://www.archer.ac.uk
support@archer.ac.uk
Machine overview
About ARCHER
ARCHER (a Cray XC30) is a Massively Parallel Processor (MPP) supercomputer built from many thousands of individual nodes.
There are two basic types of nodes in any Cray XC30:
• Compute nodes (4920)
  • These only do user computation and are always referred to as “Compute nodes”
  • 24 cores per node, therefore approx. 120,000 cores
• Service/Login nodes (72/8)
  • Login nodes: allow users to log in and perform interactive tasks
  • Other miscellaneous service functions
• Serial/Post-Processing Nodes (2)
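The approximate core count follows directly from the node count and the cores per node. A minimal bash sketch of that arithmetic, using the figures above (the function name total_cores is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: total cores = number of compute nodes x cores per node.
total_cores() {
    local nodes=$1 cores_per_node=$2
    echo $(( nodes * cores_per_node ))
}

total_cores 4920 24   # prints 118080, i.e. approx. 120,000 cores
```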
Interacting with the system
User guide
Users do not log directly into the system. Instead, they run commands via an esLogin server, which relays commands and information via a service node referred to as a “Gateway node”.
[Figure: Cray XC30 cabinets containing compute, serial, gateway and LNET nodes, with the esLogin node on the external network (Ethernet) and the Cray Sonnexion Lustre filesystem (OSS) attached via Infiniband links]
Job submission example
Quick start guide
The job script my_job.pbs requests 2 nodes for 20 minutes and charges the time to the budget “budget”:

    #!/bin/bash --login
    #PBS -N test-job
    #PBS -l select=2
    #PBS -l walltime=0:20:0
    #PBS -A budget

    # Make sure any symbolic links are resolved to absolute path
    export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

    # Change to the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Run 48 processes in total, 24 on each of the 2 nodes
    aprun -n 48 -N 24 ./hello_world

Submit it from the login node with qsub and check its status with qstat:

    nbrown23@eslogin008:~> qsub my_job.pbs
    50818.sdb
    nbrown23@eslogin008:~> qstat -u $USER
    50818.sdb  nbrown23  standard  test-job  --     2  48  --  00:20  Q  --
    50818.sdb  nbrown23  standard  test-job  29053  2  48  --  00:20  R  00:00

The job waits in the PBS queue (state Q) and then runs on the compute nodes (state R). When it finishes, its standard output and standard error are written to test-job.o50818 and test-job.e50818.
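In the script, select=2 requests two nodes, and aprun -n 48 -N 24 starts 24 processes on each of them, 48 in total. A minimal bash sketch of that relationship (the aprun_line helper is hypothetical; it only prints the command line and does not call aprun):

```bash
#!/bin/bash
# Hypothetical helper: build the aprun line for a job that fills every core
# of the nodes requested with "#PBS -l select=<nodes>".
#   -n : total number of processes (nodes x processes per node)
#   -N : number of processes placed on each node (24 cores on an ARCHER node)
aprun_line() {
    local nodes=$1 per_node=$2 exe=$3
    echo "aprun -n $(( nodes * per_node )) -N ${per_node} ${exe}"
}

aprun_line 2 24 ./hello_world   # prints: aprun -n 48 -N 24 ./hello_world
```

Keeping -n equal to the number of nodes times -N means no requested node sits idle.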
ARCHER Layout
Compute node architecture and topology
Cray XC30 node
The XC30 compute node features:
• 2 x Intel® Xeon® sockets/dies
  • 12-core Ivy Bridge
• 64GB in normal nodes
• 128GB in 376 “high memory” nodes
• 1 x Aries NIC
  • Connects to shared Aries router and wider network
[Figure: Cray XC30 compute node with two 12-core Intel® Xeon® dies, each forming a NUMA node (NUMA Node 0, NUMA Node 1) with 32GB of DDR3, linked by QPI; the node connects over PCIe 3.0 to the Aries NIC and Aries router on the XC30 compute blade, and from there to the Aries network]
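Dividing node memory by core count gives a rough per-core memory budget, which helps when deciding how many processes to place on each node. A minimal sketch, assuming memory is shared evenly across cores (the function name mem_per_core is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: memory available per core, in GB, assuming the
# memory is shared evenly across all of the cores given.
mem_per_core() {
    local mem_gb=$1 cores=$2
    awk -v m="$mem_gb" -v c="$cores" 'BEGIN { printf "%.2f\n", m / c }'
}

mem_per_core 64 24    # standard node:       2.67 GB per core
mem_per_core 128 24   # high-memory node:    5.33 GB per core
mem_per_core 32 12    # one NUMA node (die): 2.67 GB per core
```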
Cray XC30 Rank-1 Network
o Chassis with 16 compute blades
o 128 sockets
o Inter-Aries communication over backplane
o Per-packet adaptive routing
Cray XC30 Rank-2 Copper Network
o 2-cabinet group: 768 sockets
o 6 backplanes connected with copper cables in a 2-cabinet group
o 16 Aries connected by backplane
o 4 nodes connect to a single Aries
o Active optical cables interconnect groups
[Figure: copper and optical cabling, with copper connections within a 2-cabinet group and optical connections between groups]
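The socket counts on the two network slides follow from the node and blade figures. A small bash check of the arithmetic, assuming 2 sockets per node and 4 nodes per blade as stated above (variable and function names are hypothetical):

```bash
#!/bin/bash
# Socket counts implied by the slides (names are hypothetical).
SOCKETS_PER_NODE=2      # 2 x Intel Xeon sockets per compute node
NODES_PER_BLADE=4       # 4 nodes connect to a single Aries on each blade
BLADES_PER_CHASSIS=16   # Rank-1: one chassis (backplane) of 16 compute blades
CHASSIS_PER_GROUP=6     # Rank-2: 6 backplanes in a 2-cabinet group

sockets_per_chassis() {
    echo $(( BLADES_PER_CHASSIS * NODES_PER_BLADE * SOCKETS_PER_NODE ))
}

sockets_per_group() {
    echo $(( CHASSIS_PER_GROUP * $(sockets_per_chassis) ))
}

sockets_per_chassis   # prints 128
sockets_per_group     # prints 768
```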
ARCHER Filesystems
Brief Overview
[Figure: nodes and filesystems, showing the login/PP nodes, the compute nodes, /home, /work and the RDF]
ARCHER Filesystems
User guide
• /home (/home/n02/n02/<username>)
  • Small (200 TB) filesystem for critical data (e.g. source code)
  • Standard performance (NFS)
  • Fully backed up
• /work (/work/n02/n02/<username>)
  • Large (>4 PB) filesystem for use during computations
  • High-performance, parallel (Lustre) filesystem
  • No backup
• RDF (/nerc/n02/n02/<username>)
  • Research Data Facility
  • Very large (26 PB) filesystem for persistent data storage (e.g. results)
  • High-performance, parallel (GPFS) filesystem
  • Backed up via snapshots
Research Data Facility
RDF guide
• Mounted on machines such as:
  • ARCHER (service and PP nodes)
  • DiRAC BlueGene/Q (frontend nodes)
  • Data Transfer Nodes (DTN)
  • JASMIN
• Data Analytic Cluster (DAC)
  • Run compute-, memory- or I/O-intensive analyses on data hosted on the service
  • Nodes are specifically tailored for data-intensive work, with direct connections to the disks
  • Separate from ARCHER but with a very similar architecture
ARCHER Software
Brief Overview
Cray’s Supported Programming Environment
• Programming languages: Fortran, C, C++, Python
• Programming models
  • Distributed memory (Cray MPT): MPI, SHMEM
  • Shared memory: OpenMP 3.0, OpenACC
  • PGAS & Global View: UPC (CCE), CAF (CCE), Chapel
• Compilers
  • Cray Compiling Environment (CCE)
  • GNU
  • 3rd party compilers: Intel Composer
• Tools
  • Environment setup: Modules
  • Debuggers: Allinea (DDT), lgdb
  • Debugging support tools: Abnormal Termination Processing, STAT
  • Performance analysis: CrayPat, Cray Apprentice2
  • Scoping analysis: Reveal
• Optimized scientific libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
• I/O libraries: NetCDF, HDF5
Module environment
Best practice guide
• Software is available via the module environment
  • Lets you load different packages, and different versions of packages
  • Deals with potential library conflicts
• This is based around the module command
  • List currently loaded modules: module list
  • List all available modules: module avail
  • Load a module: module load x
  • Unload a module: module unload x
ARCHER SAFE
Service Administration
https://www.archer.ac.uk/safe
SAFE
SAFE user guide
• SAFE is an online ARCHER management system that all users have an account on
  • Request machine accounts
  • Reset passwords
  • View resource usage
• Primary way in which PIs manage their ARCHER projects
  • Management of project users
  • Track users’ project usage
  • Email users of the project
Project resources
User guide
• Machine usage is charged in kAUs
  • You are charged for the time your jobs spend running on compute nodes, at 0.36 kAUs per node hour
  • There is no usage charge for time spent working on the login nodes, post-processing nodes or RDF DAC
  • You can track usage via SAFE or the budgets command (calculated daily)
• Disk quotas
  • There is no specific charge for disk usage, but all projects have quotas
  • If you need more disk space, contact your PI, or contact us if you manage the project
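Because the charge scales with both the number of nodes and the time they are occupied, a job's cost can be estimated before submitting it. A minimal sketch, assuming a flat 0.36 kAUs per node hour as stated above (the function name kau_charge is hypothetical and is not the budgets command):

```bash
#!/bin/bash
# Hypothetical helper: estimate the kAU charge for a job, assuming it is
# charged 0.36 kAUs per node hour for the time it occupies compute nodes.
kau_charge() {
    local nodes=$1 minutes=$2
    awk -v n="$nodes" -v m="$minutes" \
        'BEGIN { printf "%.3f\n", n * (m / 60) * 0.36 }'
}

# The test-job example: 2 nodes for its full 20-minute walltime.
kau_charge 2 20    # prints 0.240
kau_charge 1 60    # one node hour: prints 0.360
```

A job that finishes before its walltime would, under this assumption, be charged only for the time it actually ran.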
To conclude…
• You will be using ARCHER during this course
• If you have any questions, let us know
• The documentation on the ARCHER website is a good reference
  • Especially the quick start guide
• In normal use, if you have any questions or cannot find something, contact the helpdesk
  • support@archer.ac.uk