High Performance Computing
John Zaitseff
September 2014

High Performance Computing architecture

Massively Parallel Distributed Computational Cluster
• Many individual servers (“nodes”): dozens to thousands
• Multiple processors per node: between 8 and 64 cores
• Interconnected by fast networks
• Almost always run Linux
  – In our case: Rocks Linux Distribution on top of CentOS 6.x

[Photo: The Trentino cluster. Image credit: John Zaitseff, UNSW]

[Diagram: High Performance Computing architecture. Components shown: Internet, head node, storage node, internal network switch, compute nodes 1 to n, and chassis 1 to m holding compute nodes 1-1 to m-n.]

The Newton cluster: newton.mech.unsw.edu.au
• 10 × Dell R415 server nodes
  – Head node: newton
  – Compute nodes: newton01 to newton09
• 160 × AMD Opteron 4386 3.1GHz processor cores
  – Two physical processors per node
  – Eight CPU cores per processor
  – Only four floating-point units per processor
• 320 GB of main memory (32 GB per node)
• 12 TB of storage: 6 × 3 TB drives in RAID 6
• 1Gb Ethernet network interconnect
http://cfdlab.unsw.wikispaces.net/

[Photo: The Newton cluster. Image credit: John Zaitseff, UNSW]

The Trentino cluster: trentino.mech.unsw.edu.au
• 16 × Dell R815 server nodes
  – Head node: trentino
  – Compute nodes: trentino01 to trentino15
• 1024 × AMD Opteron 6272 2.1GHz processor cores
  – Four physical processors per node
  – Sixteen CPU cores per processor
  – Only eight floating-point units per processor
• 2048 GB of main memory (128 GB per node)
• 30 TB of storage: 12 × 3 TB drives in RAID 6
• 4×1Gb Ethernet network interconnect
http://cfdlab.unsw.wikispaces.net/

[Photo: The back of the Trentino cluster. Image credit: John Zaitseff, UNSW]

The Leonardi cluster: leonardi.eng.unsw.edu.au
• 7 × HP BladeSystem c7000 blade enclosures
• 1 × HP ProLiant DL385 G7
  server: leonardi
• 56 × HP BL685c G7 compute nodes
  – Compute nodes: ec01b01-ec07b08
• 2944 × AMD Opteron 6174 2.2GHz processor cores and Opteron 6276 2.3GHz processor cores
  – Four physical processors per node
  – Twelve or sixteen CPU cores per processor
• 5888 GB of main memory (96 or 128 GB per node)
• 95 TB of storage: 60 × 2 TB drives in RAID 60
• 2×10Gb Ethernet network interconnect
http://leonardi.unsw.wikispaces.net/

[Photo: Nodes in the Leonardi cluster. Image credit: John Zaitseff, UNSW]

The Raijin cluster: raijin.nci.org.au
• 3592 × Fujitsu blade server nodes
• Multiple login nodes
• Multiple management nodes
• 57,472 Intel Xeon E5-2670 2.60GHz processor cores
• 160 TB of main memory
• 10 PB of storage using the Lustre distributed file system
• 14Gb InfiniBand FDR network interconnect
[Image credit: National Computational Infrastructure]
http://nci.org.au/nci-systems/national-facility/peak-system/raijin/

[Diagram: High Performance Computing architecture (Internet, head node, storage node, internal network switch, compute nodes in chassis 1 to m), annotated “Do not run your jobs here!”]

Connecting to an HPC system
• Use the Secure Shell protocol (SSH)
  – Under Linux: ssh username@hpcsystemname
  – Under Windows: PuTTY (Start » All Programs » PuTTY » PuTTY)
  – Can install Cygwin: “that Linux feeling under Windows”
• Command line prompt
  – Will look something like: z9693022@newton:~ $
  – May be different in different systems; may be customised
• Try it now: PuTTY, Host name newton.mech.unsw.edu.au
  – RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce
  – User name: your zID; Password: your zPass
• To exit: exit

Simple Linux commands
• List files in a directory: ls [pathname ...]
  – [] indicates optional parameters, ...
    indicates one or more parameters
  – Italic fixed-width font indicates replaceable parameters
• To show the current directory: pwd
• To change directories: cd directory
  – ~ is the home directory
  – .. is the directory above the current one
  – ~user is the home directory of user user
• Try it now:
    cd ~z9693022/src/trader-7.6
    ls                  # List files in current directory
    cd src
    pwd; ls             # More than one command at a time!
    cd ..; pwd          # You don’t have to enter the comments...

Directories and files: paths and pathnames
• Files and directories are organised into a hierarchical tree structure
• The top of the tree is called the root directory (or simply root), and is denoted as / (slash)
• The root directory contains directories, which in turn contain files and directories of their own:

[Directory tree diagram. Entries shown: / bin etc home share z9693022 apps bin Modules ansys matlab 14.5 15.0 usr share local bin]

Absolute pathnames
• Any file or directory can be represented as an absolute pathname:
  – gives the full name of the file or directory
  – starts with the root “/”
  – lists each directory along the way
  – has a “/” to separate each path (or pathname) component
• For example: the directory /share/apps/ansys/15.0

[Directory tree diagram, repeated from the previous slide]

Relative pathnames
• Second way of denoting a file or directory (a pathname)
• Relative to the current working directory
• Does not start with the root directory “/”
• Path components are still separated with slashes “/”
• Current directory is denoted by “.” (dot)
• Going up a level is denoted by “..” (dot-dot)
• Often just contains a filename with no directories listed
• Examples: Assume current directory is /home/z9693022/src/trader-7.6:
    README                → /home/z9693022/src/trader-7.6/README
    src/trader.c          → /home/z9693022/src/trader-7.6/src/trader.c
    ../trader-7.6.tar.xz  → /home/z9693022/src/trader-7.6.tar.xz
    src/.././README       → /home/z9693022/src/trader-7.6/README
    ./README              → /home/z9693022/src/trader-7.6/README

Important directories
• Home directory: /home/user (e.g., /home/z9693022)
• Scratch directory for temporary files: /share/scratch/user (but not available on Newton!)
• Binary directories for utility programs:
  – /bin: for essential utilities
  – /usr/bin: for other utilities and some applications
  – /usr/local/bin: for local utilities and applications
  – /home/user/bin: for your own utilities
• On our clusters, applications: /share/apps
• On our clusters, module files: /share/apps/Modules
• Note synonyms: path, pathname, filename

More with pathnames
• To change directories: cd dir
• To change to your home directory: cd ~ or cd $HOME or cd (by itself)
• To get current working directory: pwd
• To show the directory tree structure: tree, tree -d (directories only)
• To view a file page by page: less filename, “q” to quit, “h” for help
• Try it now:
    cd /home/z9693022/src/trader-7.6
    tree -d
    less README
    less src/trader.c
    cd src; pwd
    less README
    less ../README      # Different from README!

Getting help
• Many commands have a myriad of command line options
• For a brief summary of command line options, try command --help
• For a full explanation, try man command
• For some commands, try pinfo command
• To search for a keyword in the manual: man -k keyword
• Remember, “Google is your friend”
• Try it now:
    ls --help
    cd --help           # Does this work?
    man ls              # See “See Also” section at end
    pinfo coreutils     # “q” to quit
    man less            # 1571 lines!
    man cd              # What is “BASH_BUILTINS”?

The Bourne Again (Bash) shell
• Official manual page entry:
    Bash is an sh-compatible command language interpreter that executes commands read from the standard input or from a file. Bash also incorporates useful features from the Korn and C shells (ksh and csh).
    Bash is intended to be a conformant implementation of the Shell and Utilities portion of the IEEE POSIX specification (IEEE Standard 1003.1). Bash can be configured to be POSIX-conformant by default.
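The manual's statement that Bash executes commands read from the standard input or from a file can be tried directly. The following is a small sketch rather than part of the original slides; the variable names and the temporary file are chosen here purely for illustration.

```shell
# Commands read from the standard input: pipe a one-line "script" into bash.
from_stdin=$(echo 'echo hello from stdin' | bash)
echo "$from_stdin"

# Commands read from a file: write a similar line to a temporary file
# (a hypothetical scratch file created with mktemp) and run bash on it.
tmpscript=$(mktemp)
echo 'echo hello from a file' > "$tmpscript"
from_file=$(bash "$tmpscript")
echo "$from_file"
rm -f "$tmpscript"
```

In both cases a second bash process does the work, which is the same mechanism used later when a script file is run.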
• Interprets your typed commands and executes them
• Just another Linux program: nothing special about it!
• Started by the system when you log in
• You can then start another shell, if you like (e.g., ksh, tcsh, even python)
• You can start a subshell by running bash
• To exit a subshell (or the main shell): exit

Some features of Bash
• Powerful command line facilities (shortcuts):
  – Tab completion (press the TAB key to complete commands and pathnames, TAB TAB to list all possibilities)
  – Command line editing: try ↑ (Up-Arrow) to recall previous commands, CTRL-R (C-R or ^R) to search for previous commands, ← and → to move along current command line
• A full programming and scripting language:
  – Variables and arrays
  – Loops (for; while; until), control statements (if ... then ... else; case)
  – Functions and coprocesses
  – Text processing (“expansion” and “parameter substitution”)
  – Simple arithmetic calculations
  – Input/output redirection (e.g., redirect output to different files)
  – Much, much more! (The man page runs to over 5,300 lines)

Trying out some features of Bash
• Try it now:
  – cd ~z9693022/src/trader-7.6/src
  – Type “less”, then space, but do not press ENTER yet
  – Press TAB once: nothing appears
  – Press TAB a second time: all relevant completions appear
  – Type “f”, then press TAB: the filename is completed to “fileio.”
  – Press TAB TAB again: two files are listed
  – Type “h” to select the second file, then press ENTER (and “q” to quit)
• Try it now:
  – Press CTRL-R, then type “ls” (but do not press ENTER): previous commands with “ls” in them are listed
  – Press CTRL-R again a few times: will even list “pinfo coreutils”
  – Press ENTER when you get to the command you wish to execute
  – Press CTRL-C if you do not wish to execute any command

Listing files and directories
• Already know the ls command: List directory contents
• In full: ls [options] [pathname ...]
• Some options:
  – “-a” for all files (including those starting with “.”)
  – “-l” for long (detailed) listing
  – Options sometimes can be combined: “-alF”
• Try it now: ls -laF or dir (an alias to “ls -laF”); ll (“ls -lF”)
• Example of a line in a long listing:
    -rw-r--r-- 1 z9693022 unsw 1266 May 24 07:59 README
• The columns of information are: file permissions, number of links (usually 1 for files, 2 or more for directories), file owner, group owner, size in bytes (here, 1266), date last modified, the actual filename (README), with perhaps a trailing “*” for executable files and “/” for directories.

File and directory patterns
• The Bash shell interprets certain characters in the command line by replacing them with matching pathnames
• Called pathname expansion, pattern matching, wildcards or globbing
• For existing pathnames: “*” matches any string, “?” matches any single character, “[...]” matches any one of the enclosed characters
• Try it now:
    cd ~z9693022/src/trader-7.6/src; echo 1 2 3
    echo *c             # All filenames ending in “c”: “.” is not special
    echo ????.c         # All filenames six characters long (4 + “.c”)
    echo M*m            # All filenames starting with “M” and ending with “m”
    echo [it]*          # All filenames starting with either “i” or “t”
    echo ../lib/uni*    # All filenames in ../lib starting with “uni”
    echo ../*/*.c

More file and directory patterns
• Glob patterns “*”, “?” and “[...]” only match existing pathnames
• Even for pathnames that do not exist: “{alt1,alt2,...}” lists alternatives, “{n..m}” lists all numbers between n and m, “{n..m..s}” in steps of s
• Technically called brace expansion
• Try it now:
    ls test-*           # “No such file or directory”
    echo test-*         # What happens?
    echo test-{one,two,three}
    echo newdir/{one,two,three}
    echo test-{1..100}
    echo test-{001..100}      # Zero-padding
    echo test-{1..100..3}     # By steps of three
    echo test-{100..1..-3}    # By steps of negative three

Naming files and directories
• Linux allows any characters in filenames except “/” and the NUL byte
• You may create filenames with “weird” characters in them:
  – spaces and tabs
  – starting with “-”: conflicts with command line options
  – question marks “?”, asterisks “*”, brackets and braces
  – other characters with special meanings: “!”, “$”, “&”, “#”, “"”, etc.
• Just because you can does not mean you should!
• To match such files: use the glob characters “*” and “?”
• Linux file systems are case-sensitive: README.TXT is different from readme.txt, which is different from Readme.txt and ReadMe.txt!
• File type suffixes (e.g., “.txt”) are optional but recommended
• Filenames starting with “.” are usually hidden from globs and ls output.
• Recommendation: Use “a” to “z”, “A” to “Z”, “0” to “9”, “-”, “_” and “.” only.

Managing directories
• To create a directory: mkdir dir ...
• To create parent directories as well: mkdir -p dir ...
• To remove an empty directory: rmdir dir ...
• Try it now:
    cd ~; ls
    mkdir gsoe9400/dir{1,2,3}     # Why does this fail?
    mkdir -p gsoe9400/dir{1,2,3,99} gsoe9400/x
    ls gsoe9400
    rmdir gsoe9400/dir?
    ls gsoe9400                   # Should list dir99 and x only
    rmdir gsoe9400/*              # Be careful...

Managing files
• To output the contents of one or more files: cat filename ...
• To view one or more files page by page: less filename ...
• To copy one file: cp source destination
• To copy one or more files to a directory: cp filename ... dir
• To preserve the “last modified” time-stamp: cp -p
• To copy recursively: cp -pr source destination
• To move one or more files to a different directory: mv filename ... dir
• To rename a file: mv oldname newname
• To remove files: rm filename ...
• Recommendation: use “ls filename ...” before rm or mv: what happens if you accidentally type “rm *”? or “rm * .c”? (note the space!)

Managing files and directories, continued
• To copy whole directory trees: cp -pr filename ... destination
• To copy to and from another Linux system (e.g., from Leonardi to Trentino), use Secure Copy: scp [-p -r] source ... destination
  – Either source or destination (but not both) can contain a remote system identifier followed by a colon: [user@]system:
• Can also use rsync or insync: insync [-d] source destination
• Examples:
    cp -pr ~z9693022/src/trader-7.6 .
    scp -p ~/file1.txt leonardi:file2.txt
    scp -p john@zap.org.au:src/README .
    mkdir dir1; insync ~/orig dir1
    insync /share/scratch/$USER/data1 $HOME/data1
    insync leonardi:/share/scratch/$USER/data2 .

Managing files and directories, continued
• Try it now:
    cd ~/gsoe9400
    cp -pr ~z9693022/src/trader-7.6 .; ls
    cd trader-7.6; pwd
    cat build-aux/bootstrap
    ls */*.c
    rm */*.c; ls */*.c            # What is the output of ls?
    insync ~z9693022/src/trader-7.6 .
    mkdir ../new; cp src/trader.c ../new
    cd ../new; ls
    mv trader.c new.c; rm new.c
    cp -p ../trader-7.6/src/trader.* .
    cp trader.c new.c
    ls -l trader.c new.c          # What is the difference between these files?

Transferring files
• To copy files to another Linux system: use scp, rsync or insync
• To copy files to and from a Windows machine: use WinSCP or scp, rsync or insync under Cygwin
• Try it now:
  – Start WinSCP (Start » All Programs » WinSCP » WinSCP)
    o Host name newton.mech.unsw.edu.au
    o RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce
    o User name: your zID; Password: your zPass
  – Copy ~/gsoe9400/new/new.c to the Windows desktop
  – Rename it to newnew.c (using the usual Windows right-click or F2)
  – Copy it back
  – Under PuTTY: ls newnew.c

More Linux commands
• What machine am I on? hostname
• What is the date and time? date
• Who is logged in? who
• But who is user z1234567? finger [username ...]
• What is the user name for someone? finger part-of-name
• Which files contain a particular string? grep 'pattern' filename ...
• What is the difference between two files? diff [-u] file1 file2
• How do I rename multiple files at once? rename or prename
• Where is a file named filename? find dir ... -name filename
• How big is a file or directory? du -h [filename ...]
• How much space is available in a directory? df -h [dir ...]
• How much disk quota do I have? quota -s
  – “Blocks” is how many disk blocks you are using, in chunks of 1 kB
  – On Newton: “limit” is 10240M = 10 GB

Redirecting input and output
• The terminal is treated as just another file (/dev/tty); use CTRL-D to signify the end of file
• Other special files: /dev/null (an empty file), /dev/zero (an infinite number of binary zeros: can use up your quota in a hurry!)
• Input and output from a program can be redirected to a file or even piped to another program
• To redirect output to filename, use “>filename”
• To append output to filename, use “>>filename”
• To redirect input from filename, use “<filename”
• To connect the output from one program to the input of another (pipes), use “program1 | program2”
• Multiple pipes are allowed: “program1 | program2 | ... | programn”
• Many utility programs are designed to be used in this way, as filters
• Output can be substituted into a command line: $(commandline)

Redirecting input and output, continued
• Try it now:
    cd ~/gsoe9400/trader-7.6
    ls > ../dir-list1
    cat ../dir-list1
    cat ../dir-list1 | wc -l          # How many lines in ../dir-list1?
    ls ~/gsoe9400/trader-7.6 | wc -l  # Same as above
    rm ../dir-list1
    ls -l | grep May                  # How many files were last modified in May?
    ls -l | grep May | sort -nk5      # Same, but sort by file size (5th field)
    who | awk '{print $1}'            # Just list first field of “who” output
    finger $(who | awk '{print $1}')  # Full details of who is logged in
    finger $(who | awk '{print $1}') | less   # One page at a time

Simple scripting
• Shell scripts are just files containing a list of commands to be executed
• First line (“magic identifier”) must be #!/bin/bash
• Comments are introduced with “#”
• The script file must be made executable: chmod a+x filename
• Variables:
  – To set a variable, use varname=value (no spaces!)
  – To use a variable, use $varname or ${varname}
  – Variable names start with a letter, may contain letters, numbers and “_”
  – Variable names are case-sensitive (as with most things Linux)
• Functions (parameters are accessed using $1, $2, ...):
    funcname () {
        body of function
    }

Simple scripting, continued
• For loops:
    for varname in list ...; do
        process using ${varname}
    done
• Control statements (multiple “elif” allowed; “elif” and “else” clauses are optional):
    if [ comparison ]; then
        if-true statements
    elif [ second-comparison ]; then
        if-second-true statements
    else
        if-false statements
    fi
• Example of comparisons: string1 = string2 (is equal)
  – See the manual page for test (“man test”) for more information

Simple scripting, continued
• While loops:
    while [ comparison ]; do
        while-true statements
    done
• Until loops:
    until [ comparison ]; do
        while-false statements
    done
• Many, many other programming features available!
• Read the manual page: man bash
• Some books:
  – Cameron Newham, Learning the bash Shell, 3rd Edition, O’Reilly Media, March 2005. ISBN 9780596009656, 9780596158965
  – William E. Shotts Jr., The Linux Command Line, No Starch Press, January 2012. ISBN 9781593273897, 9781593274269

Editing files under Linux
• Use an editor to edit text files
• Many choices, leading to “religious wars”!
• Some options: GNU Emacs, Vim, Nano
• Nano is very simple to use: nano filename
  – CTRL-X to exit (you will be asked to save any changes)
• GNU Emacs and Vim are highly customisable and programmable
  – For example, see the file ~z9693022/.emacs
  – Debra Cameron et al., Learning GNU Emacs, 3rd Edition, O’Reilly Media, December 2004. ISBN 9780596006488, 9780596104184
  – Arnold Robbins et al., Learning the vi and Vim Editors, 7th Edition, O’Reilly Media, July 2008. ISBN 9780596529833, 9780596159351
• Try it now: cd ~/gsoe9400; nano script1

Creating a simple script file
• Try it now, continued: Enter the following text:
    #!/bin/bash
    # How much disk quota am I using?
    # (We want only the last line of "quota" output:
    # use the "tail" utility)
    blocks_used=$(quota | tail -n 1 | awk '{print $1}')
    blocks_limit=$(quota | tail -n 1 | awk '{print $3}')
    percent=$(( ${blocks_used} * 100 / ${blocks_limit} ))
    echo "I am using ${blocks_used} blocks (${percent}%)"
• Save the file and exit the editor, then:
    chmod a+x ./script1
    ./script1           # Execute the script! (Note the use of “./”)

Creating a script with loops
• Try it now:
  – Create and run the file script2, containing the following. What is the output?
    (Hint: remember “chmod a+x ./script2; ./script2”)
    #!/bin/bash
    module load matlab/2014a
    for n in {01..10}; do
        echo "n = $n;" >script${n}.m
        echo "sqrtn = sqrt(n);" >>script${n}.m
        echo "save('data${n}.txt', 'sqrtn', '-ascii');" \
            >>script${n}.m
        echo "quit" >>script${n}.m
        matlab -nojvm -r script${n} >/dev/null
        cat data${n}.txt
    done

Applications on the cluster
• Applications are managed using the module system
• Applications are stored in /share/apps
• Module files are stored in /share/apps/Modules
• Module files set shell environment variables such as PATH
• PATH controls where applications are searched (the search path)
  – Try it now: echo $PATH
• To see all available applications: module avail
• To see currently loaded applications: module list
• To load an application: module load application[/version]
• To unload an application: module unload application[/version]

Submitting jobs to the cluster
• So far, everything has been run on the head node: a very bad idea!
• To submit a job to the cluster compute nodes:
  – Create a shell script file as per normal
  – Add #PBS directives as required directly after “#!/bin/bash”
  – Add “cd $PBS_O_WORKDIR”
  – Execute qsub ./scriptfile
  – Wait for the job to run, checking its status as required
• Common #PBS directives (“man qsub” for full details):
  – #PBS -N scriptname            Set a name for the script
  – #PBS -M email                 Send notifications to an email address
  – #PBS -m abe                   What notifications to send
  – #PBS -l walltime=hh:mm:ss     How much time is required
  – #PBS -l vmem=sizegb           How much memory is required (GB)
  – #PBS -l nodes=1:ppn=n         Request n processors on one node
  – #PBS -q queuename             Which queue to submit to

Checking your job status
• Submit your jobs using “qsub”
• You will be given a job number in the form jobnumber.systemname
• Check job status: qstat [jobnumber]
• Another way: showq
• Yet another way: pestat or pestat | less -S
  – Use ← and → keys to scroll left and right (or expand your terminal!)
• Show which nodes are reserved: showres -n | less -S
• Get overall information about the cluster: visit http://systemname/ganglia/
  – e.g., http://newton.mech.unsw.edu.au/ganglia/
  – Currently only available within UNSW
• Try it now: view the Ganglia page for the Newton cluster.

Managing your jobs
• To see which nodes exist on the cluster: rocks list host or pestat
• To see jobs belonging to you: qstat | grep $USER
• To see when your job will start: showstart jobnumber
• For more detailed information: checkjob jobnumber
• To delete a queued job (whether running or not): qdel jobnumber ...
• To place a job on hold: qhold jobnumber ...
• To release a job currently on hold: qrls jobnumber ...
• To rerun a job (kill it and then restart it): qrerun jobnumber ...
• To move a job from one queue to another: qmove destqueue jobnumber ...

Submitting and checking a job
• Try it now:
  – Create and change to the directory ~/gsoe9400/job1:
      mkdir ~/gsoe9400/job1; cd ~/gsoe9400/job1
  – Copy the previously created script file script2:
      cp ../script2 job1
  – Edit the file job1 and add the following lines just after “#!/bin/bash”:
      #PBS -N job1
      #PBS -M J.Zaitseff@unsw.edu.au    # Do not replace: used to
      #PBS -m abe                       # assess you for this class!
      #PBS -l walltime=00:10:00
      #PBS -l vmem=2gb
      #PBS -l nodes=1:ppn=1

      cd $PBS_O_WORKDIR
  – Submit the script: qsub ./job1

Conclusion
You have begun your journey to using High Performance Computing clusters effectively. Well done!

John Zaitseff
J.Zaitseff@unsw.edu.au
Available for consultations on Tuesdays 9:30am–4pm by appointment only.
[Image credit: John Zaitseff, UNSW]
http://www.engineering.unsw.edu.au/hpc
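As a recap of the “Submitting jobs to the cluster” steps, the pieces can be pulled together in one place. The following is only a sketch under stated assumptions: the job name sketch1, the file name sketch1.pbs, the helper function make_job_script and the echoed payload are all hypothetical, and the resource values simply mirror the job1 exercise. It writes the job script and prints it, but deliberately does not call qsub, so it can be tried on any machine with Bash.

```shell
#!/bin/bash
# Sketch: generate a minimal PBS job script using the directives from the slides.
# The names sketch1, sketch1.pbs and make_job_script are hypothetical.

make_job_script () {
    # $1 = job name, $2 = walltime (hh:mm:ss), $3 = memory in GB
    # The backslashes keep $PBS_O_WORKDIR and $(hostname) unexpanded here,
    # so they are evaluated only when the job itself runs.
    cat <<EOF
#!/bin/bash
#PBS -N $1
#PBS -l walltime=$2
#PBS -l vmem=${3}gb
#PBS -l nodes=1:ppn=1

cd \$PBS_O_WORKDIR
echo "Running on \$(hostname)"
EOF
}

make_job_script sketch1 00:10:00 2 > sketch1.pbs
chmod a+x sketch1.pbs
cat sketch1.pbs

# On the cluster, the script would then be submitted with: qsub ./sketch1.pbs
```

Generating the file from a function is just one way to avoid retyping the directives for each job; editing a copy of an existing job script by hand, as in the job1 exercise, works equally well.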