Introduction to HPC at Case Western Reserve University
Friday, September 10, 2021 — Virtual Zoom Event, 11 a.m. - 1 p.m.
Email: hpc-support@case.edu

Research Computing Services
• High Performance Computing Support
• Pre-award Consultation
• Research Networking Services
• Education and Awareness
• Research Storage and Archival Solutions
• Database Design
• Secure Research Environment for computing on regulated data
• Facilitator for off-premise cyberinfrastructure services (XSEDE, OSC, AWS/Commercial Cloud)
• Programming Services
https://case.edu/utech
https://sites.google.com/a/case.edu/hpcc

HPC Account
You need an HPC account, sponsored by the PI:
• Username (UID) based on your Case Network ID (CaseID)
• Password (same as your Case SSO password)
• Primary group (the PI's name or CaseID), either a research or a class group, e.g. gxb43 or sxr358_csds651
• /home/<UID> directory where you can keep all your important files; a group quota is assessed
• Directory where you can keep additional data, e.g. /mnt/pan/courses
• Default shell: bash
Documentation: https://sites.google.com/a/case.edu/hpcc

HPC Cluster Resources
[Component diagram] University network edge; rider/markov.case.edu; admin nodes; data transfer nodes; web portal (ondemand.case.edu); head nodes (e.g. hpc3, hpctransfer); research storage (RDS); Science DMZ; SLURM master (resource manager); HPC storage; batch nodes (comptxxx); GPU nodes (gputxxx); SMP nodes (smptxx).

CWRU HPC Web Portal
● Access via browser at https://ondemand.case.edu
● Authenticate using SSO and Duo 2FA
● No need to connect with VPN from outside of campus
● Available tools:
  ○ Interactive shells/desktop
  ○ File Manager
  ○ Job Submission and Status
  ○ Specific interactive applications: Jupyter, Jupyter with TensorFlow, and RStudio

Demonstration: Authenticate to OnDemand

HPC Access via Terminal
Use the ssh command in a terminal (with a GUI) or in MobaXterm:
  ssh [options] <user>@<hostname>    e.g. ssh -X <CaseID>@rider.case.edu
This results in:
• a network connection through 'rider.case.edu' to a head node
• from the '-X' option, a graphical communication channel
Usage notes:
• run it for each local shell session
• it creates an independent process
• it needs a VPN connection from outside of campus, plus Duo 2FA
Documentation: https://sites.google.com/a/case.edu/hpcc/guides-and-training/faqs/accessing-hpcc#h.p_mOFBl5SFfnna

Arriving in the cluster: Your home
  cd
  cd ~
  cd ~<CaseID>        # can be someone else's home
  cd $HOME
  cd /home/<CaseID>
$HOME is an environment variable that points to /home/<CaseID>.
Keep your home tidy: create subdirectories underneath /home/<CaseID>; ideally each job gets its own subdirectory.

HPC File Structure Overview
/ [root]
/home — permanent storage: /home/<caseid>
/scratch — temporary (14-day) storage for scheduled jobs: /scratch/pbsjobs/job.<jobid>.hpc (e.g. job.16241609.hpc)
/mnt — purchased, large-scale storage: /mnt/pan (high-performance storage), /mnt/rstor (research storage)
/usr — system files, read-only: /usr/bin, /usr/lib; /usr/local — installed software

Orienting Yourself
Key ideas:
● Organize files with directories (folders)
● The 'working directory' is the current directory where you are working
● File structure: collections of directories
● Paths: the locations of directories and files
● Common orientation commands:
  ○ pwd : print the current working directory's full path (where am I?)
  ○ ls : list the files in the current working directory (what is located here?)
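A minimal sketch of the kind of orientation session these commands support (the subdirectory names are hypothetical):

  pwd            # where am I?            ->  /home/<CaseID>
  ls             # what is located here?  ->  job1  polymer  rna
  mkdir job2     # keep home tidy: one subdirectory per job
  cd job2
  pwd            #                        ->  /home/<CaseID>/job2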
Demonstration: From 'home', list common directories and paths

Files and filenames
● A file is a basic unit of storage
● The name can be long and contain almost any characters, but typically stick to letters, numbers, and - or _
● Avoid , / : ; ! * & " and blank spaces
● Linux is case-sensitive
● The file data will have a format, such as:
  ○ text
  ○ binary
  ○ specific formats: HDF5, NetCDF, etc.
● The suffix matters more to you than to the Linux shell

Relative vs. absolute paths
RELATIVE PATH:
  cd ..        # go one directory up
  ./myexec     # run the executable "myexec" found in the working directory
ABSOLUTE PATH:
  cd /home/CaseID/polymer    # will fail if 'polymer' is a file
  /home/CaseID/rna/bowtie    # run 'bowtie' from the directory '/home/CaseID/rna'

Linux help and preferences
Whenever in doubt about a command, access the man page:
  man <command>
  man -k <keyword>
  <command> --help
or do a web search for the command.
You can also set preferences that run during terminal startup, such as environment variables or aliases/shortcuts:
  For bash: ~/.bash_profile or ~/.bashrc
  For csh or tcsh: ~/.cshrc

More file commands
  cp - copy files
  mv - move files (also renames files)
  rm - remove files [watch out! - no 'take backs']
  rm -rf      removes all files/directories underneath, without warning
  rm -rf *    the most dangerous command in the world
Recursive copy of a directory (all directories/files underneath):
  cp -r /home/CaseID/job1 /home/CaseID/job2

Demonstration: Types of path; use of 'cp' and 'rm'

Permissions
3 levels: user/owner (u), group (g), world (o)
3 accesses: read (r), write (w), execute (x)
Use ls -l:
  drwxr-x---  2 hxd58 dormidontova 4096 Aug  4 13:59 anydir
  -rw-r-----  1 hxd58 dormidontova    0 Aug  4 13:58 anyfile
Default access: the user can read-write-execute, the group can read-execute, and the world cannot do anything.

Changing Ownership and Permissions
chmod: modify the permissions of a file
  chmod 700  - only the user can access
  chmod +x   - add execute (file) / traversal (directory) permission
  chmod 770  - shared folder within the group, full group permissions
chown and chgrp change the owner and the group owner of a file, respectively:
  chown -R CaseID <directory>
  chown CaseID:Groupname filename
chmod, chown, and chgrp all accept the -R (recursive) option.
  groups [uid]    # verify group membership

Editing and reading files
To read a text file:
  cat <filename>
  less <filename>
To edit a file, use a text editor program (it maintains the text file format):
  vi or vim -- available in essentially all Linux shells
  emacs, gedit -- available on the cluster head nodes
  nano -- a simpler interface

Demonstration: Review permissions; use 'cat', 'less', and 'vim'

Searching for files and Redirection
It is necessary to be able to find your files again:
  find <location path> -name <filename>
  find . -name foobar
You can redirect and write your output to a particular file:
  ./myexec > myoutputfile
or append to a particular file:
  ./myexec >> myoutputfile
Or just use redirection when the output printed to the screen would be too long:
  cat myexec > myoutputfile

Pipes to filter your output
Commands can be linked together by pipes, from left to right.
To find the number of files in your directory:
  ls | wc -l
To print the first 10 lines of a file:
  head <filename>
To print the last 5 lines of a file:
  tail -5 <filename>
To filter your command history:
  history | grep ssh
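A small sketch combining find, a pipe, and redirection (the search pattern and output file name are hypothetical):

  find ~/rna -name "*.fastq" | wc -l          # how many matching files are under ~/rna?
  find ~/rna -name "*.fastq" > filelist.txt   # save the list to a file instead of the screen
  tail -5 filelist.txt                        # peek at the last five entries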
Creating a specific query
Find how many times ssh has been called in my command history:
  $ history | grep ssh | wc -l
You can use other Linux commands such as 'awk', 'sed', 'cut', etc.:
  $ sed -i 's/foo/bar/g' myfile.txt
    replaces every 'foo' with 'bar' in myfile.txt
  $ history | awk '{print $2}' | sort | uniq
    lists all the unique commands in the history, sorted

Demonstration: Pipe and redirection examples

Storage space management
Check your storage routinely to avoid exceeding the quota:
  panfs_quota       # gives your current usage
  panfs_quota -G    # usage for all of your groups; quota in bytes
To check the space used by a directory, including all sub-directories:
  du -sh    # -h is 'human readable', using K, M, G, T
To check the partitions mounted on a system:
  df -h

File compression
The most common compression in Linux is tar combined with gzip:
  tar czvf compressedfile.tgz <filenames/directories>
  tar xzvf compressedfile.tgz
or with bzip2, which uses the jcvf and jxvf options and the .tbz2 extension.
Note: if you transfer files from Windows, the text files may contain hidden characters. Use the dos2unix command to remove the trailing ^M character. (See also unix2dos.)

Demonstration: Check group usage; compare compressed data size

Linux processes
top lists all the processes running on the node, starting with the most resource-consuming ones - detailed and refreshed. top also reports the load (in CPU units) and the memory utilized.
To list my processes:
  ps -ef | grep CaseID
To kill a process:
  kill <pid>  - terminates a process of yours
  <Ctrl-C>    - terminates the running process
  <Ctrl-D>    - terminates the current session

Running a process in the background
Running a process in the background (using the ampersand symbol &) allows you to continue working in the current terminal:
  ./myprocess -options &
If the process is already running, use the following sequence:
  <Ctrl-Z>
  bg
To bring the process back to the foreground:
  fg

Demonstration: Manipulate a 'du -sh' process in the shell

Copy files to and from the cluster
• globus.org (endpoint cwru#dtn3)
• scp, rsync, or sftp from the command line
• WinSCP
Documentation: https://sites.google.com/a/case.edu/hpcc/data-transfer

scp and rsync
To copy a file/directory from one location to the other:
  scp <-r> SOURCE TARGET
  scp -r input_directory CaseID@markov.case.edu:.
  scp -r CaseID@hpctransfer.case.edu:output_directory .
rsync behaves very similarly to scp, but it allows you to "synchronize" the source and the target directories:
  rsync -av --delete SOURCE TARGET
Or just write the differences without deleting the existing files:
  rsync -av CaseID@arc-login.case.edu:dir/* desktop/dir/.
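A minimal sketch tying compression and transfer together (the 'results' directory is hypothetical; the host name follows the examples above):

  tar czvf results.tgz results/                               # bundle and compress locally
  scp results.tgz CaseID@hpctransfer.case.edu:.               # one-time copy to your cluster home
  rsync -av results/ CaseID@hpctransfer.case.edu:results/     # or keep the remote copy synchronized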
Demonstration: scp from a campus server

Environment Variables on HPC
Keeping the environment organized; controlling change.
The traditional Linux approach is direct manipulation of the values:
  export PATH=<some new path>:$PATH
  echo $PATH    # locations where the shell looks for commands & programs
  /usr/local/intel-17/openmpi/2.0.1/bin:/usr/local/intel/17/compilers_and_libraries_2017/linux/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/dell/srvadmin/bin
  echo $LD_LIBRARY_PATH    # locations where the shell looks for run-time libraries
  /usr/local/intel-17/openmpi/2.0.1/lib:/usr/local/intel/17/tbb/lib/intel64/gcc4.7:/usr/local/intel/17/compilers_and_libraries_2017/linux/mkl/lib/intel64:/usr/local/intel/17/compilers_and_libraries_2017/linux/lib/intel64:/usr/local/lib
HPC uses a different approach.

Managing the Shell Environment
The 'module' command works in both interactive and batch shells.
● Functionally, modulefiles, written in Lua, hold information about:
  ○ file structure organization
  ○ dependencies
  ○ software name
  ○ software version
● and instructions to manipulate the environment variables:
  ○ set or push values
  ○ unset or remove values
  ○ read existing variables
  ○ construct appropriate path names
  ○ perform conditional testing
● See 'module --help' or 'man 1 module' for a synopsis
Documentation: https://lmod.readthedocs.io

GCC + OpenMPI Hierarchy: Understanding Module Availability
On the HPC cluster, you might need to load a particular version of a compiler and OpenMPI in order to find your module.

  Command                        Outcome
  module avail                   Shows the list of currently loadable modules in the hierarchy; it also shows, visually, which modules are loaded.
  module spider                  Shows the list of all modules and versions available.
  module spider <module>/<ver>   Shows how to load the specific module version.

Modules and Environment
● Default modules: StdEnv, intel/17, openmpi/2.0.1
● 'module' subcommands: load, display, unload, swap, purge
  ○ module load samtools fftw
  ○ module display fftw
  ○ module unload fftw
  ○ module swap intel gcc
  ○ module purge
● Caution: 'swap' will favor efficiency over accuracy
● After 'module purge', reload the default modules to restore them

Modules and Environment — Case Study: gcc-6_3_0/openmpi-2_0_1
Consider Python: what versions are installed/available?
1. which python
2. 'module avail python'
3. 'module spider python' — information about the python package

Demonstration: Assess Python versions in the shell, using 'module'

Modules and Environment — Case Study: gcc-6_3_0/openmpi-2_0_1 (results)
Consider Python: what versions are installed/available?
1. which python:
   /bin/python    # system install, version 2.7.5
2. 'module avail python':
   ------ /usr/local/share/modulefiles/MPI/gcc/6.3.0/openmpi/2.0.1 ------
   python/3.6.6    python/3.7.0    python2/2.7.13    spyder/3.2.0-python2
   ------ /usr/local/share/modulefiles/Compiler/gcc/6.3.0 ------
   python/3.8.6 (D)
3. 'module spider python' — information about the python package:
   Versions:
     python/3.5.1    python/3.6.6    python/3.7.0    python/3.8.6
   Other possible module matches: Python, python2, python3
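A minimal sketch of acting on those results with the module subcommands above (whether the gcc hierarchy and python/3.8.6 are reachable exactly this way depends on the compiler versions current on the cluster):

  module swap intel gcc       # move from the default intel hierarchy to the gcc hierarchy
  module avail python         # the gcc-built python modules should now be listed
  module load python/3.8.6    # select a specific version
  which python                # now resolves to the module's python rather than /bin/python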
A Few Closing Items
For bash:
  export <ENVAR>=<value>
  export PATH=$PATH:/new/path
  alias ll='ls -lrt'
  Customize the prompt: export PS1="[\u@\h \W]\\$"
For tcsh:
  setenv <ENVAR> <value>
  set PATH= ($PATH /new/path)
  alias ll "ls -lrt"
  Customize the prompt: set prompt = "[%n@%m:%c]%#"

Monitoring Resources
Storage:
  'quotachk <CaseID>'     - lists all storage available* to the user
  'quotagrp <resgrpname>' - lists the user and group quotas belonging to the group in multiple storage locations: /home, /scratch/pbsjobs, /scratch/users

  $ quotagrp gxb43_sybb412 | grep 412
  Partition     User/Group            Usage      Soft Quota  Hard Quota  #Files
  /home         group:gxb43_sybb412   338.48 GB  225.65      130.18      140.72
  /pan/courses  group:gxb43_sybb412   120.27 GB  0           0           38.01

  'quotapan <volume name>' - lists the volume quota and usage of the volume

  $ quotapan courses
  Volume        BladeSet  RAID           Space Used  Soft Quota  Hard Quota  Status
  /pan/courses  AS20      Object RAID6+  718.59 GB   9.50 TB     10.00 TB    Online

  * quotachk and quotagrp read from static files; they do not report real-time information.

Compute:
  'i'    # discussion of the Slurm tools continues during the Requesting Resources session

  [mrd20@classct001 Desktop]$ i
  ****Your SLURM's CPU Quota****
  tas35 24
  ****Your Current Jobs****
  JOBID     PRIOR  ST  ACCOUNT  PARTITION  NODES  CPU  MIN_MEMORY  TIME_LIMIT  NODELIST
  15527412  642    R   tas35    batch      1      6    36G         10:00:00    compt314
  ****Group's Jobs****  Account: tas35
  JOBID  USER  PRIOR  ST  PARTITION  NODES  CPU  MIN_MEMORY  TIME_LIMIT  NODELIST
  Status codes: [CG]-Completing  [PD]-Pending  [R]-Running

HPC Cluster Glossary
• Head Nodes: development, analysis, job submission -- not production
• Compute Nodes: CPU-, GPU-, and SMP-computers
• Disk Storage:
  • /home/<caseid> — limited, shared within the group, backed up
  • /scratch/pbsjobs — supports jobs; 14-day deletion rule
  • /mnt/pan — high-throughput project-expansion space
  • /mnt/projects — research data project storage space
• Internal network infrastructure (transparent to you, but essential!)
• SLURM: cluster workload manager & scheduler
• Data Transfer Nodes: [hpctransfer, dtn3, dtn2, dtn1].case.edu — science data routing: the lowest-"impedance" external data pathway

Summary
• Access to shell sessions, mainly through the OnDemand web portal
• Tools to manage work in a multiuser, multigroup environment
• The HPC module system helps manage the shell (and script) environment
• Keep a watchful eye on the cluster resources. The research group and the cluster resources are shared. Play nicely.
• Google for the specific keyword or task
• RCCI staff are on hand to help - jump in and learn! hpc-support@case.edu

RCCI Team: Roger Bielefeld, Mike Warfe, Hadrian Djohari, Brian Christian, Em Dragowsky, Jeremy Fondran, Cal Frye, Sanjaya Gajurel, Matt Garvey, Theresa Grigger, Cindy Martin, Sean Maxwell, Nasir Yilmaz, Lee Zickel

Thanks for attending and reviewing this material.