ARCHER
Advanced Research Computing High End Resource

Nick Brown nick.brown@ed.ac.uk
Website: http://www.archer.ac.uk
Support: support@archer.ac.uk

Machine overview

About ARCHER
ARCHER (a Cray XC30) is a Massively Parallel Processor (MPP) supercomputer design, built from many thousands of individual nodes. There are two basic types of node in any Cray XC30:
• Compute nodes (4920)
  • These only do user computation and are always referred to as "compute nodes"
  • 24 cores per node, therefore approximately 120,000 cores in total
• Service/login nodes (72/8)
  • Login nodes allow users to log in and perform interactive tasks
  • Other miscellaneous service functions
• Serial/post-processing nodes (2)

Interacting with the system (User guide)
Users do not log directly into the system. Instead they run commands via an esLogin server. This server relays commands and information via a service node referred to as a "gateway node".
[Diagram: the external network and the esLogin and serial nodes connect over Ethernet to a gateway node in the Cray XC30 cabinets, which hold the compute nodes; LNET nodes link the cabinets over InfiniBand to the Lustre OSS servers of the Cray Sonexion filesystem.]

Job submission example (Quick start guide)
[Diagram: my_job.pbs is submitted to the PBS queue, runs across the compute nodes, and produces the output files Test-job.o50818 and Test-job.e50818.]

my_job.pbs:

  #!/bin/bash --login
  #PBS -l select=2
  #PBS -N test-job
  #PBS -A budget
  #PBS -l walltime=0:20:0

  # Make sure any symbolic links are resolved to absolute path
  export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

  # Move to the directory the job was submitted from
  cd $PBS_O_WORKDIR

  aprun -n 48 -N 24 ./hello_world

Submitting and monitoring the job:

  nbrown23@eslogin008:~> qsub my_job.pbs
  50818.sdb
  nbrown23@eslogin008:~> qstat -u $USER
  50818.sdb  nbrown23  standard  test-job     --  2  48  --  00:20  Q  --
  nbrown23@eslogin008:~> qstat -u $USER
  50818.sdb  nbrown23  standard  test-job  29053  2  48  --  00:20  R  00:00
  nbrown23@eslogin008:~>

ARCHER Layout

Compute node architecture and topology

Cray XC30 node
The XC30 compute node features:
• 2 x Intel® Xeon® sockets/dies
  • 12-core Ivy Bridge
  • 64 GB in normal nodes
  • 128 GB in 376 "high memory" nodes
• 1 x Aries NIC
  • Connects to shared Aries router and wider network
[Diagram: a Cray XC30 compute node has two NUMA nodes, each an Intel Xeon 12-core die with 32 GB of DDR3, linked by QPI; the node attaches over PCIe 3.0 to an Aries NIC, which shares an Aries router onto the Aries network.]

Cray XC30 Rank-1 Network
• Chassis with 16 compute blades
• 128 sockets
• Inter-Aries communication over the backplane
• Per-packet adaptive routing

Cray XC30 Rank-2 Copper Network
• 2-cabinet group, 768 sockets
• 6 backplanes connected with copper cables in a 2-cabinet group
• 16 Aries connected by backplane
• 4 nodes connect to a single Aries
• Active optical cables interconnect groups
[Diagram: copper and optical cabling, with copper connections within a group and optical connections between groups.]

ARCHER Filesystems

Brief overview
[Diagram: the login/PP nodes see /home, /work and the RDF; the compute nodes see only /work.]

ARCHER filesystems (User guide)
• /home (/home/n02/n02/<username>)
  • Small (200 TB) filesystem for critical data (e.g. source code)
  • Standard performance (NFS)
  • Fully backed up
• /work (/work/n02/n02/<username>)
  • Large (>4 PB) filesystem for use during computations
  • High-performance, parallel (Lustre) filesystem
  • No backup
• RDF (/nerc/n02/n02/<username>)
  • Research Data Facility
  • Very large (26 PB) filesystem for persistent data storage (e.g. results)
  • High-performance, parallel (GPFS) filesystem
  • Backed up via snapshots
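As a sketch of how these filesystems are typically used together (the n02 project code and <username> are the placeholders from the paths above, hello_world is the example program from the job script, and the build via make is assumed): keep and compile source on the backed-up /home filesystem, then run from /work, which is the filesystem the compute nodes can see.

  # Build on /home (backed up), then stage the executable to /work for the run
  cd /home/n02/n02/<username>/hello_world
  make                                              # compile the example program
  cp hello_world my_job.pbs /work/n02/n02/<username>/

  # Submit from /work so the job reads and writes the Lustre filesystem
  cd /work/n02/n02/<username>
  qsub my_job.pbs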
Research Data Facility (RDF guide)
• Mounted on machines such as:
  • ARCHER (service and PP nodes)
  • DiRAC BlueGene/Q (frontend nodes)
  • Data Transfer Nodes (DTNs)
  • JASMIN
• Data Analytic Cluster (DAC)
  • Run compute-, memory- or IO-intensive analyses on data hosted on the service
  • Nodes are specifically tailored for data-intensive work, with direct connections to the disks
  • Separate from ARCHER but very similar architecture

ARCHER Software

Brief overview

Cray's supported programming environment
• Programming languages: Fortran, C, C++, Python
• Programming models:
  • Distributed memory (Cray MPT): MPI, SHMEM
  • Shared memory: OpenMP 3.0, OpenACC
  • PGAS & global view: UPC (CCE), CAF (CCE), Chapel
• Compilers: Cray Compiling Environment (CCE), GNU, 3rd-party compilers (Intel Composer)
• Tools:
  • Environment setup: Modules
  • Debuggers: Allinea DDT, lgdb
  • Debugging support tools: Abnormal Termination Processing, STAT
  • Performance analysis: CrayPat, Cray Apprentice2
  • Scoping analysis: Reveal
• Optimized scientific libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
• I/O libraries: NetCDF, HDF5
The environment is a mixture of Cray-developed software, licensed ISV software, 3rd-party packaging, and 3rd-party software with Cray added value.

Module environment (Best practice guide)
• Software is available via the module environment
  • Allows you to load different packages, and different versions of packages
  • Deals with potential library conflicts
• This is based around the module command (a short example session is sketched at the end of these notes):
  • List currently loaded modules: module list
  • List all available modules: module avail
  • Load a module: module load x
  • Unload a module: module unload x

ARCHER SAFE

Service administration: https://www.archer.ac.uk/safe

SAFE (SAFE user guide)
• SAFE is an online ARCHER management system, on which all users have an account, used to:
  • Request machine accounts
  • Reset passwords
  • View resource usage
• It is the primary way in which PIs manage their ARCHER projects:
  • Management of project users
  • Tracking users' project usage
  • Emailing the users of the project

Project resources (User guide)
• Machine usage is charged in kAUs
  • This is the time your jobs spend running on each compute node; one node hour costs 0.36 kAUs
  • There is no usage charge for time spent working on the login nodes, post-processing nodes or RDF DAC
  • You can track usage via SAFE or the budgets command (usage figures are calculated daily)
• Disk quotas
  • There is no specific charge for disk usage, but all projects have quotas
  • If you need more disk space then contact your PI, or contact us if you manage the project

To conclude…
• You will be using ARCHER during this course
• If you have any questions then let us know
• The documentation on the ARCHER website is a good reference tool, especially the quick start guide
• In normal use, if you have any questions or cannot find something, contact the helpdesk: support@archer.ac.uk
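As referenced in the module environment section above, a minimal sketch of a module session on a login node might look like the following (cray-netcdf is just an illustrative package name; any package reported by module avail would do):

  nbrown23@eslogin008:~> module list                # show currently loaded modules
  nbrown23@eslogin008:~> module avail cray-netcdf   # list the available versions of a package
  nbrown23@eslogin008:~> module load cray-netcdf    # load the default version
  nbrown23@eslogin008:~> module list                # confirm it now appears among the loaded modules
  nbrown23@eslogin008:~> module unload cray-netcdf  # unload it again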