Introduction to HPC Workshop
October 9, 2014

Rob Lane
HPC Support
Research Computing Services, CUIT

Introduction
• HPC Basics
• First HPC Workshop

Yeti
• 2 head nodes
• 101 execute nodes
• 200 TB storage

Yeti
• 101 execute nodes
  – 38 x 64 GB
  – 8 x 128 GB
  – 35 x 256 GB
  – 16 x 64 GB + Infiniband
  – 4 x 64 GB + nVidia K20 GPU

Yeti
• CPU
  – Intel E5-2650L
  – 1.8 GHz
  – 8 cores
  – 2 per execute node

Yeti
• Expansion round
  – 66 new systems
  – Faster CPU
  – More Infiniband
  – More GPU (nVidia K40)
  – ETA January 2015

Yeti
• HP S6500 chassis
• HP SL230 servers

Job Scheduler
• Manages the cluster
• Decides when a job will run
• Decides where a job will run
• We use Torque/Moab

Job Queues
• Jobs are submitted to a queue
• Jobs are sorted in priority order
• Not a FIFO

Access
Mac instructions:
1. Run Terminal

Windows instructions:
1. Search for putty on the Columbia home page
2. Select the first result
3. Follow the link to the PuTTY download page
4. Download putty.exe
5. Run putty.exe

Access
Mac (Terminal):
$ ssh UNI@yetisubmit.cc.columbia.edu

Windows (PuTTY):
Host Name: yetisubmit.cc.columbia.edu

Work Directory
$ cd /vega/free/users/<your UNI>
• Replace "<your UNI>" with your UNI:
$ cd /vega/free/users/hpc2108

Copy Workshop Files
• Files are in /tmp/workshop
$ cp /tmp/workshop/* .

Editing
• No single obvious choice of editor
  – vi – simple but difficult at first
  – emacs – powerful but complex
  – nano – simple but not really standard

nano
$ nano hellosubmit
• "^" means "hold down control"
  ^a : go to beginning of line
  ^e : go to end of line
  ^k : delete line
  ^o : save file
  ^x : exit

hellosubmit
#!/bin/sh

# Directives
#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

# Print "Hello World"
echo "Hello World"

# Sleep for 10 seconds
sleep 10

# Print date and time
date
hellosubmit
• The #PBS directives, one at a time:
  – -N HelloWorld sets the job's name
  – -W group_list=yetifree sets the group the job runs under
  – -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb requests one node, one processor per node, one minute of walltime, and 20 MB of memory
  – -M UNI@columbia.edu sets the address for job email (replace UNI with your UNI)
  – -m abe sends mail when the job aborts, begins, or ends; -m n sends no mail
  – -V exports your current environment variables to the job
  – -o and -e set where the job's standard output and standard error files go

hellosubmit
$ qsub hellosubmit
298151.elk.cc.columbia.edu
$

hellosubmit
$ qstat 298151
Job ID      Name         User       Time Use S Queue
----------- ------------ ---------- -------- - ------
298151.elk  HelloWorld   hpc2108           0 Q batch1

hellosubmit
• Once the job finishes, it leaves the queue:
$ qstat 298151
qstat: Unknown Job Id Error 298151.elk.cc.columbia.edu

hellosubmit
$ ls -l
total 4
-rw------- 1 hpc2108 yetifree 398 Oct  8 22:13 hellosubmit
-rw------- 1 hpc2108 yetifree   0 Oct  8 22:44 HelloWorld.e298151
-rw------- 1 hpc2108 yetifree  41 Oct  8 22:44 HelloWorld.o298151
hellosubmit
• HelloWorld.o298151 holds the job's standard output; HelloWorld.e298151 holds its standard error (empty here)
$ cat HelloWorld.o298151
Hello World
Thu Oct  9 12:44:05 EDT 2014

Any Questions?

Interactive
• Most jobs run as "batch"
• Can also run interactive jobs
• Get a shell on an execute node
• Useful for development, testing, troubleshooting

Interactive
$ cat interactive
qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb

Interactive
$ qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb
qsub: waiting for job 298158.elk.cc.columbia.edu to start

Interactive
qsub: job 298158.elk.cc.columbia.edu ready

(ASCII art of a yeti)
+--------------------------------+
|                                |
| You are in an interactive job. |
|                                |
| Your walltime is 00:05:00      |
|                                |
+--------------------------------+

Interactive
$ hostname
charleston.cc.columbia.edu

Interactive
$ exit
logout
qsub: job 298158.elk.cc.columbia.edu completed
$

GUI
• Can run GUIs in interactive jobs
• Need an X server on your local system
• See the user documentation for more information

User Documentation
• hpc.cc.columbia.edu
• Go to "HPC Support"
• Click on "Yeti user documentation"

Job Queues
• Scheduler puts all jobs into a queue
• Queue selected automatically
• Queues have different settings

Job Queues
Queue        Time Limit  Memory Limit  Max. User Run
Batch 1      12 hours     4 GB         512
Batch 2      12 hours    16 GB         128
Batch 3      5 days      16 GB          64
Batch 4      3 days      None            8
Interactive  4 hours     None            4

qstat -q
$ qstat -q

server: elk.cc.columbia.edu

Queue            Memory CPU Time Walltime Node   Run  Que Lm  State
---------------- ------ -------- -------- ----  ---- ---- --  -----
batch1           4gb    --       12:00:00 --      42   15 --  E R
batch2           16gb   --       12:00:00 --     129   73 --  E R
batch3           16gb   --       120:00:0 --     148  261 --  E R
batch4           --     --       72:00:00 --      11   12 --  E R
interactive      --     --       04:00:00 --       0    1 --  E R
interlong        --     --       48:00:00 --       0    0 --  E R
route            --     --       --       --       0    0 --  E R
                                                 ---- ----
                                                  330  362

yetifree
• Maximum processors limited
  – Currently 4 maximum
• Storage quota
  – 16 GB
• No email support

yetifree
$ quota -s
Disk quotas for user hpc2108 (uid 242275):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
hpc-cuit-storage-2.cc.columbia.edu:/free/
                   122M  16384M  16384M               8   4295m   4295m

email
from:    root <hpc-noreply@columbia.edu>
to:      hpc2108@columbia.edu
date:    Wed, Oct 8, 2014 at 11:41 PM
subject: PBS JOB 298161.elk.cc.columbia.edu

PBS Job Id: 298161.elk.cc.columbia.edu
Job Name:   HelloWorld
Exec host:  dublin.cc.columbia.edu/4
Execution terminated
Exit_status=0
resources_used.cput=00:00:02
resources_used.mem=8288kb
resources_used.vmem=304780kb
resources_used.walltime=00:02:02
Error_Path: localhost:/vega/free/users/hpc2108/HelloWorld.e298161
Output_Path: localhost:/vega/free/users/hpc2108/HelloWorld.o298161
Intern
• Research Computing Services (RCS) is looking for an intern
• Paid position
• ~10 hours a week
• Will be on LionShare next week

MPI
• Message Passing Interface
• Allows applications to run across multiple computers

MPI
• Edit the MPI submit file
• Load the MPI environment module
• Compile the sample program (a sketch of mpihello.c follows the last slide)

MPI
#!/bin/sh

# Directives
#PBS -N MpiHello
#PBS -W group_list=yetifree
#PBS -l nodes=3:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

# Load mpi module.
module load openmpi

# Run mpi program.
mpirun mpihello

MPI
$ module load openmpi
$ which mpicc
/usr/local/openmpi/bin/mpicc
$ mpicc -o mpihello mpihello.c

MPI
$ qsub mpisubmit
298501.elk.cc.columbia.edu

Questions?
Any questions?
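mpihello.c (reference sketch)
• The deck compiles mpihello.c but never shows its source. The following is a minimal sketch of what such a program typically looks like, written against the standard MPI C API; it is an assumed reconstruction, not the workshop's actual file.

/* mpihello.c - minimal MPI "Hello World" (assumed sketch; the
 * workshop's actual source is not shown in the deck). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
    MPI_Get_processor_name(name, &name_len); /* node this rank runs on */

    printf("Hello World from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                          /* shut down MPI */
    return 0;
}

• Compiled with mpicc -o mpihello mpihello.c as above. With nodes=3:ppn=1 in the submit file, mpirun starts three ranks, one per node, and each prints one line.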