• Local Resources
– HPCL Cluster
– PARIO Cluster
– PERE Cluster
– MU Grid hpcl.mscs.mu.edu
pario.eng.mu.edu
pere.marquette.edu
• Regional Resources
– Milwaukee Institute
– SeWhip
• National Resources
– NCSA
– ANL http://www.ncsa.illinois.edu/ http://www.anl.gov/
– TeraGrid Resources http://www.teragrid.org/
• Commercial Resources
– Amazon EC2 http://aws.amazon.com/ec2/
To MARQNET
134.48.
Head Node
Compute Node #1
Compute Node #2
Compute Node #3
128 HP ProLiant BL280c G6 Server Blade
1024 Intel Xeon 5550 Cores (Nehalem)
50 TB raw storage
3 TB main memory
Compute Node #128
1. Get the source code
– You can do it either on your local computer and then transfer to hpcl.mscs.mu.edu, or
– Use vi to edit a new one on hpcl.mscs.mu.edu
2. Compile your source code using mpicc, mpicxx or mpif77
3. Write a submission script for your job
– vi myscript.sh
4. Use qsub to submit the script.
– qsub myscript.sh
hello.c
You can write the code on your development machine using IDE and then transfer the code to the cluster. (Recommended)
For small code, you can also directly edit it on the cluster.
• Method 1: sftp (text or GUI) sftp mscs6060@pere.marquette.edu
put simple.c
bye
• Method 2: scp scp simple.c mscs6060@pere.marquette.edu:example/
• Method 3: rsync rsync -rsh=ssh -av example \ mscs6060@pere.marquette.edu:
• Method 1: Using MPI compiler wrappers
– mpicc: for c code
– mpicxx/mpic++/mpiCC: for c++ code
– mpif77, mpif90: for FORTRAN code
Examples: mpicc –o hello hello.c
mpif90 –o hello hello.f
• Method 2: Using standard compilers with mpi library
Note: MPI is just a library, so you can link the library to your code to get the executables.
Examples: gcc -o ping ping.c \
-I/usr/mpi/gcc/openmpi-1.2.8/include \
-L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi
• A kind of software that provide
– Job submission and automatic execution
– Job monitoring and control
– Resource management
– Priority management
– Checkpoint
– ….
• Usually implemented as master/slave architecture
Commonly used Job Schedulers
PBS: PBS Pro/TORQUE
SGE (Sun Grid Engine, Oracle)
LSF (Platform Computing)
Condor (UW Madison)
• Access
– ssh <your-marquette-id>@pere.marquette.edu
• Account management
– Based on Active Directory, you use the same username and password to login Pere as the one you are using for your Marquette email.
– Need your professor to help you sign up.
• Transfer files from/to Pere
• The Modules package is used to customize your environment settings.
– control what versions of a software package will be used when you compile or run a program.
• Using modules
– module avail check which modules are available
– module load <module> set up shell variables to use a module
– module unload
– module list
– module help remove a module show all loaded modules get help on using module
• Multiple MPI compilers available, each may need different syntax
– OpenMPI compiler (/usr/mpi/gcc/openmpi-1.2.8)
• mpicc –o prog prog.c
• mpif90 –o prog prog.f
– mvapich compiler (/usr/mpi/gcc/mvapich-1.1.0)
• mpicc –o prog prog.c
• mpif90 –o prog prog.f
– PGI compiler (/cluster/pgi/linux86-64/10.2)
• pgcc –Mmpi –o prog prog.c
• pgf90 –Mmpi –o prog prog.f
– Intel compiler
• icc –o prog prog.c –lmpi
• ifort –o prog prog.f -lmpi
• Pere current runs PBS/TORQUE
• TORQUE usage
– qsub myjob.qsub
– qstat
– qdel job-id
– pbsnodes
– pbstop submit job scripts view job status delete job show nodes status show queue status
• #!/bin/sh
•
•
#PBS -N hpl
Assign a name to the job
Request resources: 64 nodes, each with 8 processors, 1 hour
• #PBS -q batch
Submit to batch queue
• #PBS -j oe
Merge stdout and stderr output
• #PBS -o hpl-$PBS_JOBID.log
Redirect output to a file
• cd $PBS_O_WORKDIR
• cat $PBS_NODEFILE
Change work dir to current dir
Print allocated nodes (not required)
• mpirun -np 512 --hostfile `echo $PBS_NODEFILE` xhpl
• Contact me.
• User’s guide for pere
• Resources:
– http://www.cs.wisc.edu/condor/tutorials/
1. Write a submit script – simple.job
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple.out
Error = simple.error Queue
2. Submit the script to condor pool condor_submit simple.job
3. Watch the job run condor_q condor_q –sub <youusername>
Can put a collections of jobs in the same submit scripts to do a parameter sweep.
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple .$(Process).
out
Error = simple .$(Process).
error
Queue
Arguments = 4 11
Queue
Arguments = 4 12
Queue
Tell condor to use different output for each job
Use queue to tell the individual jobs
Can be run independently
DAGMAn, lets you submit complex sequences of jobs as long as they can be expressed as a directed acylic graph
Each job in the DAG can only one queue.
Commands: condor_submit_dag simple.dag
./watch_condor_q
Difference from serial jobs: use MPI universe machine_count > 1
When there is no shared file system, transfer executables and output from/to local systems by specifying should_transfer_file and when_to_transfer_output
• How to implement parameter sweep using
SGE/PBS?
• How to implement DAG on SGE/PBS?
• Is there better ways to run the a large number of jobs on the cluster?
• Which resource I should use and where I can find help?
To MARQNET
134.48.
Head Node
Compute Node #1
Compute Node #2
Compute Node #3
Compute Node #4
On Windows: Using SSH Secure Shell or PUTTY
On Linux: Using ssh command
Identify Problem & Analyze
Requirement
Analyze Performance
Bottleneck
Designing Parallel Algorithm
Coding
Writing Parallel Code
Building Binary Code
(Compiling)
Testing Code
Solving Realistic Problems
(Running Production Release)
Compiling
Running
1. Get the source code
– You can do it either on your local computer and then transfer to hpcl.mscs.mu.edu, or
– Use vi to edit a new one on hpcl.mscs.mu.edu
2. Compile your source code using mpicc, mpicxx or mpif77
They are located under /opt/openmpi/bin. Use which command to find it location;
If not in your path, add the next line to your shell initialization file
(e.g., ~/.bash_profile) export PATH=/opt/openmpi/bin:$PATH
3. Write a submission script for your job
– vi myscript.sh
4. Use qsub to submit the script.
– qsub myscript.sh
hello.c
You can write the code on your development machine using IDE and then transfer the code to the cluster. (Recommended)
For small code, you can also directly edit it on the cluster.
• Method 1: sftp (text or GUI) sftp mscs6060@hpcl.mscs.mu.edu
put simple.c
bye
• Method 2: scp scp simple.c mscs6060@hpcl.mscs.mu.edu:example/
• Method 3: rsync rsync -rsh=ssh -av example \ mscs6060@hpcl.mscs.mu.edu:
• Method 4: svn or cvs svn co \ svn+ssh://hpcl.mscs.mu.edu/mscs6060/example
• Method 1: Using MPI compiler wrappers
– mpicc: for c code
– mpicxx/mpic++/mpiCC: for c++ code
– mpif77, mpif90: for FORTRAN code
Examples: mpicc –o hello hello.c
mpif90 –o hello hello.f
Looking the cluster documentation or consulting system administrators for the types of available compilers and their locations.
• Method 2: Using standard compilers with mpi library
Note: MPI is just a library, so you can link the library to your code to get the executables.
Examples: gcc -o ping ping.c \
-I/usr/mpi/gcc/openmpi-1.2.8/include \
-L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi
• A kind of software that provide
– Job submission and automatic execution
– Job monitoring and control
– Resource management
– Priority management
– Checkpoint
– ….
• Usually implemented as master/slave architecture
Commonly used Job Schedulers
PBS: PBS Pro/TORQUE
SGE (Sun Grid Engine, Oracle)
LSF (Platform Computing)
Condor (UW Madison)
• HPCL cluster using SGE as job scheduler
• Basic commands
– qsub submit a job to the batch scheduler
– qstat examine the job queue
– qdel delete a job from the queue
• Other commands
– qconf SGE queue configuration
– qmon graphical user's interface for SGE
– qhost show the status of SGE hosts, queues, jobs
simple.sh
force to use bash for shell interpreter
Request Parallel Environment orte using 64 slots (or processors)
Run the job in specified director
Merge two output files (stdout, stderr)
Redirect output to a log file
Run mpi program
For your program, you may need to change the processor number, the program name at the last line, and the job names.
• SUN Grid Engine User’s Guide http://docs.sun.com/app/docs/doc/817-6117
• Command used commands
– Submit job: qsub
– Check status: qstat
– Delete job: qdel
– Check configuration: qconf
• Check the manual of a command
– man qsub