Introduction to HPC at Case Western Reserve University
Friday, September 10, 2021
Virtual Zoom Event, 11 a.m. - 1 p.m.
Email: hpc-support@case.edu
Research Computing Services
● High Performance Computing Support
● Pre-award Consultation
● Research Networking Services
● Education and Awareness
● Research Storage and Archival Solutions
● Database Design
● Secure Research Environment for computing on regulated data
● Facilitator for off-premise services (XSEDE, OSC, AWS/Comm. Cloud)
● Cyberinfrastructure
● Programming Services
https://case.edu/utech
https://sites.google.com/a/case.edu/hpcc
HPC Account
You need an HPC account, sponsored by the PI
• Username (UID) based on Case Network ID (CaseID)
• Password (same as the Case SSO)
• Primary group (PI's name or CaseID), either a research or a class group,
  e.g. gxb43 or sxr358_csds651
• /home/<UID> directory where you can keep all your important files.
  Group quota is assessed
• Directory where you can keep additional data: e.g. /mnt/pan/courses
• Default shell: bash
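Once logged in, a few standard Linux commands confirm these account details; a minimal sketch using generic utilities, not CWRU-specific tools:
whoami          # CaseID-based username
groups          # primary (research or class) group, e.g. gxb43
echo $HOME      # /home/<CaseID>
echo $SHELL     # default shell, normally /bin/bash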
Documentation
https://sites.google.com/a/case.edu/hpcc
HPC Cluster Resources
[Components cartoon: the university network edge leads to the head nodes (rider/markov.case.edu, e.g. hpc3, hpctransfer), admin nodes, data transfer nodes, and the web portal (ondemand.case.edu); a Science DMZ connects the Research Storage (RDS); the SLURM master acts as resource manager for the HPC storage and the batch (comptxxx), GPU (gputxxx), and SMP (smptxx) compute nodes.]
CWRU HPC Web Portal
● Access via browser at https://ondemand.case.edu
● Authenticate using SSO and Duo 2FA
● No need to connect with VPN from outside of campus
● Available tools:
  ○ Interactive shells/desktop
  ○ File Manager
  ○ Job Submission and Status
  ○ Specific interactive applications:
    Jupyter, Jupyter with Tensorflow, and RStudio
Demonstration
Authenticate to OnDemand
HPC Access via Terminal
Using the ssh command from a Terminal+GUI or MobaXterm
• ssh [options] <user>@<hostname>
  e.g. ssh -X <CaseID>@rider.case.edu -- results in:
  • a network connection through 'rider.case.edu' to a head node
  • from the '-X' option, a graphical communication channel is created
• Usage notes:
  • needs to be run for each local shell session
  • creates an independent process
  • needs a VPN connection from outside of campus - Duo 2FA
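As an optional convenience, an SSH client configuration entry can shorten the command; a minimal sketch in which the 'rider' alias is hypothetical, while the hostname and X11 forwarding come from the example above:
# ~/.ssh/config
Host rider
    HostName rider.case.edu
    User <CaseID>
    ForwardX11 yes
# afterwards:  ssh rider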
Documentation
https://sites.google.com/a/case.edu/hpcc/guides-and-training/faqs/accessing-hpcc#h.p_mOFBl5SFfnna
Arriving in the cluster: Your home
cd
cd ~
cd ~<CaseID> #can be someone else's home
cd $HOME
cd /home/<CaseID>
$HOME is an environment variable that points to
/home/<CaseID>
Keep your home tidy: Create subdirectories underneath your
/home/<CaseID>, ideally each job has its own subdirectory
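A minimal sketch of such a layout; the project and job names are illustrative:
mkdir -p $HOME/project1/job01   # one subdirectory per job
cd $HOME/project1/job01
pwd                             # /home/<CaseID>/project1/job01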
HPC File Structure Overview
/ [root]
● /home - permanent storage
  ○ /home/<caseid>
● /scratch - temporary (14-day) storage, for scheduled jobs
  ○ /scratch/pbsjobs/job.<jobid>.hpc [e.g. job.16241609.hpc]
● /mnt - purchased, large-scale storage
  ○ /mnt/pan - high-performance storage
  ○ /mnt/rstor - Research Storage
● /usr - system files: read-only
  ○ /usr/bin
  ○ /usr/lib
  ○ /usr/local - installed software
Orienting Yourself
Key Ideas
● Organize files with directories or folders
● 'Working directory' is the directory where you are currently working
● File structure: collections of directories
● Paths: the locations of directories and files
● Common orientation commands:
  ○ pwd : print the current working directory's full path (where am I?)
  ○ ls : list the files in the current working directory (what is located here?)
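A minimal sketch of these two commands, with example output shown as comments (paths are illustrative):
pwd        # /home/<CaseID>
ls         # files and subdirectories in the working directory
ls /       # top-level directories: home, scratch, mnt, usr, ...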
Demonstration
From 'home', list common directories and paths
File and filenames
● A file is a basic unit of storage
● The name can be long and contain almost any characters, but typically just
  choose letters, numbers, and - or _
● Avoid , / : ; ! * & quotes, and blank spaces
● Linux is case-sensitive
● The file data will have a format, such as:
  ○ Text
  ○ Binary
  ○ Specific: hdf5, netcdf, etc.
● The suffix matters more to you than to the Linux shell
Relative vs. absolute
RELATIVE PATH:
cd ..         # go one directory up
./myexec      # run the executable "myexec" found in the working directory
ABSOLUTE PATH:
cd /home/CaseID/polymer     # will fail if 'polymer' is a file
/home/CaseID/rna/bowtie     # run 'bowtie', which lives in '/home/CaseID/rna'
Linux help and preferences
Whenever in doubt about a command, access the man page:
- man <command>
- man -k <keyword>
- <command> --help
- Web Search for the command
You can also set preferences that run during the terminal startup, like certain
environment variables or aliases/shortcuts
- For bash: ~/.bash_profile or ~/.bashrc
- For csh or tcsh: ~/.cshrc
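For example, a minimal ~/.bashrc sketch; the entries are illustrative, not site defaults:
# ~/.bashrc
export EDITOR=vim       # preferred editor for tools that ask for one
alias ll='ls -lrt'      # long listing, sorted by modification time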
More file commands
cp - copy files
mv - move files, or also rename files
rm - remove files [watch out! - no 'take backs']
rm -rf <directory> - removes the directory and all files/directories underneath,
                     without warning
rm -rf * - the most dangerous command in the world
Recursive (-r): applies to all directories/files underneath
cp -r /home/CaseID/job1 /home/CaseID/job2
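A minimal sketch putting these together; file and directory names are illustrative:
cp input.txt input.bak    # copy a file
mv input.bak input.old    # rename (move) the copy
rm -i input.old           # -i asks before removing; plain rm has no 'take backs'
cp -r job1 job2           # recursive copy of a whole directory tree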
Demonstration
Types of path, use of 'cp' and 'rm'
Permissions
3 levels: user (u), group (g), world (o)
3 accesses: read (r), write (w), execute (x)
Use ls -l:
drwxr-x--- 2 hxd58 dormidontova 4096 Aug  4 13:59 anydir
-rw-r----- 1 hxd58 dormidontova    0 Aug  4 13:58 anyfile
(hxd58 is the user/owner, dormidontova is the group)
Default access: user can read-write-execute,
group can read-execute, world cannot do anything
Changing Ownership and Permissions
chmod : modify the permissions of a file
  chmod 700  - only the user can access
  chmod +x   - add executable (file) / traversal (directory) permission
  chmod 770  - shared folder within the group, full permissions
chown, chgrp : change the owner and the group owner of a file, respectively
  chown -R CaseID <directory>
  chown CaseID:Groupname filename
chmod, chown, and chgrp all allow the -R recursive option
groups [uid] : verify group membership
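A minimal sketch tying these together; directory, script, and group names are illustrative:
chmod 700 $HOME/private_dir                # rwx for the owner only
chmod 770 shared_dir                       # full access for owner and group
chmod +x run_analysis.sh                   # make a script executable
chown -R <CaseID>:<groupname> shared_dir   # change owner and group, recursively
groups <CaseID>                            # confirm group membership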
Editing and reading files
To read a text file:
  cat <filename>
  less <filename>
To edit a file, use a text editor program (maintains the text file format):
  vi or vim  -- available in essentially all Linux shells
  emacs, gedit  -- available on cluster head nodes
  nano  -- "simpler interface"
Demonstration
Review permissions, use 'cat' and 'less' and 'vim'
Searching for files and Redirection
You need to be able to find your files again:
find <location path> -name <filename>
find . -name foobar
You can redirect and write your output to a particular file
./myexec > myoutputfile
or append to a particular file
./myexec >> myoutputfile
Or use redirection whenever the output printed to the screen would be too long
cat <filename> > myoutputfile
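A minimal sketch combining find with redirection; the patterns and file names are illustrative:
find $HOME -name "*.log"  > logfiles.txt   # write the list of matches to a file
find $HOME -name "*.csv" >> logfiles.txt   # append a second search to the same file
less logfiles.txt                          # page through the result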
Pipes to filter your output
Commands can be linked together by pipes, from left to right
To find the number of files in your directory:
ls | wc -l
To print the first 10 lines of a file
head <filename>
To print the last 5 lines of a file
tail -5 <filename>
To search your command history, e.g. for previous ssh commands
history | grep ssh
Creating a specific query
Find how many times ssh is called in my command history
$ history | grep ssh | wc -l
Can use other Linux commands such as 'awk', 'sed', 'cut', etc.
$ sed -i 's/foo/bar/g' myfile.txt
replaces every 'foo' with 'bar', editing the file in place (-i)
$ history | awk '{print $2}' | sort | uniq
will list out all unique commands in the history, sorted
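Another minimal pipeline sketch (not from the slides): summing the size column of a long listing with awk:
ls -l | awk '{sum += $5} END {print sum, "bytes"}'   # total bytes of the files listed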
Demonstration
Pipe and redirection examples
Storage space management
Check your storage routinely to avoid exceeding the quota
panfs_quota       # gives your current usage
panfs_quota -G    # usage for all of your groups; quota in bytes
To check space used by a directory, including all sub-directories:
du -sh    # -h is 'human readable', using K, M, G, T
To check the partitions mounted on a system:
df -h
File compression
The most common compression in Linux is tar combined with gzip
tar czvf compressedfile.tgz <filenames/directories>
tar xzvf compressedfile.tgz
or with bzip2, which uses the jcvf and jxvf options and the .tbz2 extension
Note: if you transfer files from Windows, the text files may have hidden
characters. Use the dos2unix command to remove the trailing ^M characters.
(See also unix2dos)
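A minimal sketch; the archive and directory names are illustrative:
tar czvf results.tgz job1/ job2/   # create a gzip-compressed archive of two job directories
tar tzvf results.tgz               # list the archive contents without extracting
tar xzvf results.tgz               # extract it
du -sh results.tgz job1 job2       # compare compressed vs. original sizes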
Demonstration
Check group usage, compare compressed data size
Linux processes
top: lists all the processes running on the node, starting with the most
resource-consuming ones - detailed and continuously refreshed
top also shows the load (in CPU units) and the memory utilized
To list my processes:
ps -ef | grep CaseID
To kill a process:
kill <pid> - terminates a process of yours
<Ctrl-C> - terminates the running foreground process
<Ctrl-D> - terminates the current session
Running a process in the background
Running a process in the background (using the ampersand symbol &) allows you
to continue working in the current terminal
./myprocess -options &
If the process is already running in the foreground, use the following sequence:
<Ctrl-Z>
bg
To bring the process back to the front:
fg
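A minimal sketch using the 'du -sh' example from the demonstration that follows; 'jobs' is a standard shell builtin not shown on this slide:
du -sh /home/<CaseID> > du.out &   # start the command in the background
jobs                               # list background jobs in this shell
fg                                 # bring it back to the foreground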
Demonstration
Manipulate 'du -sh' process in the shell
Copy files to and from cluster
• scp, rsync, or sftp from the command line
• globus.org (CWRU endpoint: cwru#dtn3)
• WinSCP
Documentation
https://sites.google.com/a/case.edu/hpcc/data-transfer
scp and rsync
To copy a file/directory from one host to another
scp <-r> SOURCE TARGET
scp -r input_directory CaseID@markov.case.edu:.
scp -r CaseID@hpctransfer.case.edu:output_directory .
rsync behaves very similarly to scp, but it allows you to "synchronize"
the source and the target directory
rsync -av --delete SOURCE TARGET
Or just write the differences without deleting existing files
rsync -av CaseID@arc-login.case.edu:dir/* desktop/dir/.
Demonstration
scp from campus server
Environment Variables on HPC
Keeping the environment organized, controlling change
The traditional Linux approach is direct manipulation of the values:
export PATH=<some new path>:$PATH
‣ echo $PATH # locations shell looks for commands & programs
/usr/local/intel-17/openmpi/2.0.1/bin:/usr/local/intel/17/compilers_and_libraries_2017/linux/bin/intel64:
/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/dell/srvadmin/bin
‣ echo $LD_LIBRARY_PATH # locations shell looks for run-time libraries
/usr/local/intel-17/openmpi/2.0.1/lib:/usr/local/intel/17/tbb/lib/intel64/gcc4.7:/usr/local/intel/17/compile
rs_and_libraries_2017/linux/mkl/lib/intel64:/usr/local/intel/17/compilers_and_libraries_2017/linux/lib/in
tel64:/usr/local/lib
HPC uses a different approach
Managing Shell Environment
'module' command - for interactive and batch shells
● Functionally, modulefiles, written in Lua, hold information about:
  ○ file structure organization
  ○ dependencies
  ○ software name
  ○ software version
● and, instructions to manipulate the environment variables:
  ○ set or push values
  ○ unset or remove values
  ○ read existing variables to construct appropriate path names
  ○ perform conditional testing
'module --help', or 'man 1 module' for a synopsis
Documentation
https://lmod.readthedocs.io
Understanding Module Availability
Gcc + OpenMPI Hierarchy
On HPC, you might need to load a particular version of a compiler and OpenMPI in order to
find your module.
Command                         Outcome
module avail                    Shows the list of the current loadable modules of a hierarchy.
                                It also shows, visually, which modules are loaded.
module spider                   Shows the list of all modules and versions available.
module spider <module>/<ver>    Shows how to load the specific module version.
Modules and Environment
● Default modules: StdEnv, intel/17, openmpi/2.0.1
● 'module' subcommands: load, display, unload, swap, purge
  ○ module load samtools fftw
  ○ module display fftw
  ○ module unload fftw
  ○ module swap intel gcc
  ○ module purge
● Caution: 'swap' will favor efficiency over accuracy
● After 'module purge', reload the default modules (StdEnv, intel/17,
  openmpi/2.0.1) to restore the standard environment
Modules and Environment
Case Study: gcc-6_3_0/openmpi-2_0_1
Consider Python: what versions are installed/available?
1. which python:
2. ‘module avail python’
3. ‘module spider python’ — information about python package
Demonstration
Assess Python versions
in the shell, using 'module'
Modules and Environment
Case Study: gcc-6_3_0/openmpi-2_0_1
Consider Python: what versions are installed/available?
1. which python: /bin/python   # system install, ver 2.7.5
2. 'module avail python'
   ----- /usr/local/share/modulefiles/MPI/gcc/6.3.0/openmpi/2.0.1 -----
   python/3.6.6    python/3.7.0    python2/2.7.13    spyder/3.2.0-python2
   ----- /usr/local/share/modulefiles/Compiler/gcc/6.3.0 -----
   python/3.8.6 (D)
3. 'module spider python' - information about the python package
   Versions:
     python/3.5.1    python/3.6.6    python/3.7.0    python/3.8.6
   Other possible module matches:
     Python    python2    python3
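A minimal sketch of loading one of these versions; 'module list' is a standard Lmod subcommand not shown on this slide:
module swap intel gcc       # switch to the gcc/openmpi hierarchy
module avail python         # loadable python modules in this hierarchy
module load python/3.8.6    # load a specific version
module list                 # confirm what is loaded
which python                # now points to the module's python, not /bin/python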
A Few Closing Items
For bash:
  export <ENVAR>=<value>
  export PATH=$PATH:/new/path
  alias ll='ls -lrt'
  Customize prompt:  export PS1="[\u@\h \W]\\$"
For tcsh:
  setenv <ENVAR> <value>
  set PATH= ($PATH /new/path)
  alias ll "ls -lrt"
  Customize prompt:  setenv PROMPT "[%n@%m:%c]%#"
Monitoring Resources
Storage:
'quotachk <CaseID>' - lists all storage available* to the user
'quotagrp <resgrpname>' - lists the user and group quotas belonging to the group in
multiple storage locations: /home, /scratch/pbsjobs, /scratch/users
$ quotagrp gxb43_sybb412 | grep 412
[Example output: a table with columns Partition, User/Group, Usage, Soft Quota,
Hard Quota, and #Files, listing group:gxb43_sybb412 on /home (usage 338.48 GB)
and on /pan/courses]
'quotapan <volume name>' - lists the volume quota and usage of the volume
$ quotapan courses
Volume         BladeSet   RAID            Space Used   Soft Quota   Hard Quota   Status
/pan/courses   AS20       Object RAID6+   718.59 GB    9.50 TB      10.00 TB     Online
* quotachk and quotagrp read from static files; not real-time info
Monitoring Resources
Compute:
'i' - summarizes your SLURM CPU quota and current jobs (Slurm tools are discussed
in the Requesting Resources session)
[mrd20@classct001 Desktop]$ i
****Your SLURM's CPU Quota****
tas35   24
****Your Current Jobs****
JOBID      PRIOR   ST   ACCOUNT   PARTITION   NODES   CPU   MIN_MEMORY   TIME_LIMIT   NODELIST
15527412   642     R    tas35     batch       1       6     36G          10:00:00     compt314
****Group's Jobs****
Account: tas35
JOBID   USER   PRIOR   ST   PARTITION   NODES   CPU   MIN_MEMORY   TIME_LIMIT   NODELIST
[CG]-Completing   [PD]-Pending   [R]-Running
HPC Cluster Glossary
• Head Nodes: Development, Analysis, Job Submission -- not production
• Compute Nodes: CPU-, GPU- and SMP-computers
• Disk Storage
  • /home/<caseid> — limited, shared within group, backed up
  • /scratch/pbsjobs — supports jobs; 14-day deletion rule
  • /mnt/pan — high-throughput project-expansion space
  • /mnt/projects — research data project storage space
• Internal network infrastructure (transparent to you, but essential!)
• SLURM: Cluster workload manager & Scheduler
• Data Transfer Nodes: [hpctransfer, dtn3, dtn2, dtn1].case.edu
• Science data routing: Lowest "impedance" External Data Pathway
Summary
• Access to shell sessions, mainly through OnDemand web portal
• Tools to manage work in Multiuser, Multigroup Environment
• HPC Module System helps manage shell(script) environment
• Keep a watchful eye on the cluster resources. The research group's and the
  cluster's resources are shared. Play nicely.
• Google for a specific keyword or task
• RCCI Staff on-hand for aid - Jump in and learn!
hpc-support@case.edu
RCCI Team: Roger Bielefeld, Mike Warfe, Hadrian Djohari
Brian Christian, Em Dragowsky, Jeremy Fondran, Cal Frye,
Sanjaya Gajurel, Matt Garvey, Theresa Grigger, Cindy Martin,
Sean Maxwell, Nasir Yilmaz, Lee Zickel
Thanks for attending and reviewing this material.