Getting Started on Emerald
ITS Research Computing Group

Course Objectives
Word for the Day: Heterogeneous
• Emerald: the Swiss army knife of computing, something for everyone :)
• Something you can use today
• A reference for something you can use tomorrow

Course Objectives, continued
• Educate users on the broader aspects of research computing
• Practical knowledge that lets you perform your research efficiently
• Pointers toward more advanced topics

Course Outline
• Course objectives
• What are compute clusters, and what is Emerald in particular?
• Accessing Emerald
  • login
  • file systems
• Running jobs on Emerald – job management
  • job schedulers
  • batch commands
  • submitting jobs
  • specialty scripts
• Available software
  • software
  • package space
• Compiling code

Help Documentation
• Getting Started on Emerald
  • http://help.unc.edu/6020
  • General overview of Emerald for a range of users
• Short Course – Getting Started on Emerald
  • http://help.unc.edu/6479
  • Detailed notes for beginning Emerald users

What is a compute cluster? What is Emerald?

Emerald Linux Cluster

What is Emerald?
• General-purpose Linux cluster
  • maintained by the Research Computing Group
• Appropriate for all users regardless of expertise level
• Other servers:
  • Cedar/Cypress (128-processor SGI Altix): a large shared-memory system
  • Topsail (4160-processor Dell Linux cluster): a homogeneous capability cluster with a fast interconnect
• Mass Storage
  • account access

What is a compute cluster? Some Typical Components
• Compute nodes
• Interconnect
• Shared file system
• Software
• Operating system (OS)
• Job scheduler/manager
• Mass storage

Emerald is a Heterogeneous Cluster
• Compute nodes
  • Xeon blades, IBM Power 4 and Power 5
• Interconnect
  • Gigabit Ethernet (aka gigE or GbE)
• Shared file systems
  • AFS, NFS, and GPFS
• Mass storage
  • ~/ms
• Software
  • much licensed and public-domain software in package space
• Operating systems (OS)
  • RH5 (64-bit), RH4 (32-bit), and AIX (64-bit)
• Job scheduler/manager
  • all handled by LSF

Emerald Overview

Advantages of Using Emerald
• High performance
• Large capacity
• Parallel processing
• Many available software packages
• Variety of compiling options
• Shared file systems
• Mass storage

Emerald Compute Nodes
• Mostly IBM BladeCenter Xeon blades
  • all are dual-socket Intel Xeons
  • 1, 2, or 4 cores per socket (i.e. 2, 4, or 8 processors per node)
  • 2.0, 2.8, 3.0, or 3.2 GHz processors
  • varying memory, mostly 2 or 4 GB per core
• IBM Power 4 and Power 5
  • large memory, varying processor speeds
• The cluster is constantly evolving

Emerald Blades
A chassis with 14 blades.

Emerald Summary
• Over 200 Intel Xeon blade nodes (hosts)
  • over 800 blade cores
  • typically 2-4 GB memory per core
• 4 IBM AIX p575s, Power 5
  • 64 cores, large memory
• 2 large-memory Intel "Nehalem" X5570 nodes
  • 8 cores, 96 GB memory, 2.93 GHz CPUs
• Gigabit Ethernet switching fabric
• Running 32- and 64-bit Linux and 64-bit AIX

Emerald Details
Run the lshosts command to see the resources available on each node (host).
Note the HOST_NAME, model, ncpus, maxmem, and RESOURCES columns:

    % lshosts
    HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
    bc12-n01   X86_64  Xeon_3_2  12.0  2      3954M   996M    Yes     (X64bit blade blade12 L26 lammpi mem3 mem4 mpich2 mpichp4 RH5 tmp25G xeon32)
    bc10-n10   X86_64  Xeon_2_8  11.7  2      3954M   996M    Yes     (X64bit blade blade10 L26 lammpi mem3 mem4 mpich2 mpichp4 RH5 tmp25G xeon28)
    bc09-n01   X86_64  Xeon_2_8  11.7  2      3954M   996M    Yes     (X64bit blade blade9 L26 lammpi mem3 mem4 mpich2 mpichp4 RH5 tmp25G xeon28)
    bc01-n01   X86_64  Xeon_3_0  11.9  8      32190M  29313M  Yes     (X64bit blade blade1 L26 lammpi mem32 mpich2 mpichp4 RH5 tmp100G xeon30)

Accessing Emerald

Logging Into Emerald
UNIX/Linux/OS X
• ssh my_onyen@emerald.unc.edu
• ssh -l my_onyen emerald.unc.edu
Windows: SSH Secure Shell
• X Windows software -> shareware.unc.edu
• Setting up a profile for Emerald
• Forwarding X11 packets

Head Nodes
Emerald has multiple head nodes (login nodes) for
• login and basic file manipulation
• compiling
• testing short (roughly under 1 minute), small-memory jobs
Login nodes run the Linux operating system
• take the Introduction to Linux class or see one of the many online tutorials if you are unfamiliar with Linux

Home Directory on Emerald
• /afs/isis/home/m/y/my_onyen/
• 250 MB quota
• ~/private/
• Files backed up daily [ ~/OldFiles ]
• Space quota/usage in your home directory: fs lq

Work Directories on Emerald
• No space limit, but periodically cleaned
• Not backed up!
• Work directories:
  • /netscr/my_onyen, /nas/my_onyen, /nas2/my_onyen (26.2 TB total)
  • /largefs, optimized for large file operations (> 1 MB), 23 TB
  • /smallfs, optimized for small file operations (< 1 MB), 16 TB

File Permissions
Your home directory is in AFS space. AFS is a distributed, networked file system.
• Permissions there are determined by ACLs (access control lists)
  • see Introduction to AFS (http://help.unc.edu/215)
• The other file systems, /largefs, /netscr, etc., are controlled by the usual Linux file permissions
  • to make everything under /netscr/myOnyen accessible: chmod -R a+rX /netscr/myOnyen

Mass Storage
• access via ~/ms
• looks like an ordinary disk file system, but the data is actually stored on tape
• "limitless" capacity
• data is backed up
• for storage only, not a work directory (i.e. don't run jobs from here)
• if you have many small files, use tar or zip to create a single file for better performance
• sign up for this service at onyen.unc.edu
"To infinity … and beyond" – Buzz Lightyear

Job Scheduling and Management

What does a job scheduler and batch system do?
Manage resources:
• allocate user tasks to resources
• monitor tasks
• process control
• manage input and output
• report status, availability, etc.
• enforce usage policies

LSF
• All Research Computing clusters use LSF for job scheduling and management
• LSF (Load Sharing Facility) is a (licensed) product from Platform Computing
  • fairly distributes compute nodes among users
  • enforces usage policies for the established queues
    • most common queues: int, now, week, month
  • RC uses fair-share scheduling, not first come, first served (FCFS)
• LSF commands typically start with the letter b (as in batch), e.g. bsub, bqueues, bjobs, bhosts, …
  • see the man pages for much more information!
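Putting the pieces above together, here is a minimal, hedged sketch of a first session on Emerald. Every command comes from the slides above; my_onyen is a placeholder for your own onyen.

    # log in to a head node (replace my_onyen with your onyen)
    ssh my_onyen@emerald.unc.edu

    # check home-directory quota and usage (AFS); run from your home directory
    fs lq

    # do real work from scratch space, not from your home directory
    cd /netscr/my_onyen

    # see the defined queues and any jobs you already have pending or running
    bqueues
    bjobs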
Simplified View of LSF
(Diagram: a user logged in to a login node submits a job, e.g. bsub -R X64bit -q week myjob; the job is routed to a queue along with other queued jobs (job_7, job_F, job_J, …); LSF then dispatches the job to run on an available host that satisfies the job's requirements.)

Common Batch Commands
• bsub – submit jobs
• bqueues – view info on the defined queues
  • bqueues -l week
• bkill – stop/cancel a submitted job
• bjobs – view submitted jobs
  • bjobs -u all
• bhist – job history
  • bhist -l <jobID>
• bhosts – status and resources of hosts (nodes)

Common Batch Commands, continued
• bpeek – display the output of a running job
• Use the man pages to get much more information!
  • man bjobs
• bfree – query LSF to find job slots currently available that fit your resource requirements
  • this is an RC command extension
  • bfree -help (or -h)
• jobmon – monitor changes in job status
  • this is an RC command; it typically runs in a separate window

Submitting Jobs: the bsub Command
• Submit jobs with bsub
• All files must be in scratch space, e.g. /netscr, /largefs, /smallfs
  • your home directory is not mounted on the compute nodes
• bsub [-bsub_opts] executable [-exec_opts]

bsub, continued
Common bsub options:
• -o <filename>    e.g. -o out.%J
• -q <queue name>    e.g. -q now
• -R "resource specification"    e.g. -R xeon30
• -n <number of processes>    used for parallel (MPI) jobs
• -a <application-specific esub>    e.g. -a mpichp4 (used for MPI jobs)

Two Methods to Submit Jobs
Example: submit the executable myexe to the week queue, to run on a 64-bit Linux OS, and redirect output to the file out.<jobID> (the default is to mail the output to you).
Method 1: command line
• bsub -q week -R X64bit -o out.%J myexe
Method 2: create a file (details follow) called, for example, myexe.bsub, and then submit that file. Note the redirect symbol, <
• bsub < myexe.bsub

Method 2, continued
The file you submit contains all the bsub options you want, so for this example myexe.bsub will look like this:
• #BSUB -q week
• #BSUB -o out.%J
• #BSUB -R X64bit
• myexe
This is actually a shell script, so the top line could be the normal #!/bin/csh, etc., and you can run any commands you would like (a fuller sketch appears at the end of this job-management section).
• if this doesn't mean anything to you, then never mind :)

Parallel Job Example
Batch command-line method:
• bsub -q week -o out.%J -n 30 -a mpichp4 mpirun.lsf myParallelExe
Batch file method:
• bsub < myexe.bsub, where myexe.bsub will look like this:
  #BSUB -q week
  #BSUB -o out.%J
  #BSUB -a mpichp4
  #BSUB -n 30
  mpirun.lsf myexe

Submitting Jobs: Specialty Scripts
Running a SAS job through batch (two ways):
• bsub -q week -R blade sas program.sas
• bsas test.sas
Running a Matlab job through batch (two ways):
• bsub -q week -R blade matlab -nodisplay -nojvm -nosplash program.m -logfile program.log
• bmatlab test.m

Interactive Jobs: Setup
X Windows
• Linux/OS X: X11 client
• Windows: X-Win32
  • offered on the UNC Software Acquisition site, https://shareware.unc.edu
• Port forwarding in SSH Secure Shell
• Setting up a session in X-Win32

Interactive Jobs: Submission
Use -Ip or -Is
• bsub -q int -R blade -Ip sas
• bsub -q int -R blade -Ip gv
• bsub -q int -R blade -Ip matlab
• bsub -q int -Is tcsh
Specialty scripts
• xsas
• xstata
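As promised under Method 2, here is a hedged sketch of a fuller batch script. The #BSUB directives mirror the examples above; the shebang line, the cd step, and the directory name are illustrative assumptions, not a prescribed layout.

    #!/bin/csh
    # run in the week queue, on a 64-bit Linux host,
    # and write output to out.<jobID> instead of mailing it
    #BSUB -q week
    #BSUB -R X64bit
    #BSUB -o out.%J

    # work from scratch space; home directories are not mounted on compute nodes
    cd /netscr/my_onyen/myrun

    # any ordinary shell commands can go here
    ./myexe

Submit it with bsub < myexe.bsub, watch it with bjobs, inspect its output while it runs with bpeek <jobID>, and cancel it with bkill <jobID> if needed.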
Software

Licensed Software
• Over 20 licensed software applications (some site licensed, others restricted)
  • Matlab, Maple, Mathematica, Gaussian, Accelrys Materials Studio and Discovery Studio modules, Sybyl, Schrodinger, SAS, Stata, ArcGIS, NAG, IMSL, TotalView, and more
• Compilers (licensed and otherwise)
  • Intel, PGI, Absoft, GNU, IBM
• Numerous other packages provided for research and technical computing
  • including BLAST, PyMol, SOAP, PLINK, NWChem, R, Cambridge Structural Database, Amber, Gromacs, PETSc, ScaLAPACK, NetCDF, Babel, Qt, Ferret, Gnuplot, Grace, iRODS, XCrySDen, and more

Available Software
• Most of the software is installed under AFS and is made available through package space.
• AFS (Andrew File System) is a distributed, networked file system. Your home directory and the software packages are mounted in AFS space.
  • A new token is issued at login and expires after 24 hours. Use klog to renew it.
• Changes made to your package space are preserved across login sessions.

Package Space
Use ipm (the Isis Package Manager) to manage your packages.
ipm commands
• ipm add (ipm a)
• ipm remove (ipm r)
• ipm query (ipm q)
Available packages
• http://help.unc.edu/1689
• man ipm

Compiling

Compiling on Emerald
Compilers
• FORTRAN 77/90/95
• C/C++
Parallel computing
• MPI (MPICH, LAM/MPI, MPICH-GM)
• OpenMP

Compiling Details on Emerald

    Compiler         Package name              Command
    Intel            intel_fortran, intel_CC   ifort, icc, icpc
    Portland Group   pgi                       pgf77, pgf90, pgcc, pgCC
    Absoft           profortran                f77, f90
    GNU              gcc                       gfortran, g77, gcc, g++

Compiling MPI Programs
Use the MPI wrappers to compile your program
• mpicc, mpiCC, mpif90, mpif77
• the wrappers find the appropriate include files and libraries and then invoke the actual compiler
• for example, mpicc will invoke gcc, icc, or pgcc, depending on which package you have loaded

Compiling Details on Emerald
• Add a compiler to your working environment
  • ipm add package_name
• Compile a code
  • command code.c -o executable
• Run the executable on a compute node using the bsub command
  • bsub -q week -R blade executable
(An end-to-end sketch of these steps appears at the end of these notes.)

Contacting Research Computing
Questions? For assistance with Emerald, please contact the Research Computing Group:
• Email: research@unc.edu
• Phone: 919-962-HELP
• Submit a help ticket at http://help.unc.edu
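To close, here is a hedged, end-to-end sketch of the compile-and-submit workflow from the compiling slides. The package name gcc, compiler command, queue, and resource string follow the table and examples above; hello.c, hello, and the scratch directory are placeholders.

    # work from scratch space, since home directories are not mounted on compute nodes
    cd /netscr/my_onyen

    # add the GNU compilers to your working environment (see the compiler table above)
    ipm add gcc

    # compile on a head node, producing the executable hello
    gcc hello.c -o hello

    # submit the executable to a blade node in the week queue, writing output to out.<jobID>
    bsub -q week -R blade -o out.%J ./hello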