HPC Basics

High Performance Computing Basics
April 17, 2007
Dr. David J. Haglin
17-April-2007
Outline
• What is the HPC?
• Where did it come from?
• How can you get an account on hpc.mnsu.edu?
• How can you use it for your research?
• Where do you go from here?
What is the HPC?
• Many AMD Opteron computers (nodes) in a rack
• Connected by a high-speed network
• In the IT Services secure area (third floor of the library)
• All nodes run Linux
• http://www.mnsu.edu/hpc
What is the HPC?
• Head node has 8 GB RAM; 7.4 TB of disk
• Head node is for doing administrative work and starting long jobs
• The 34 worker nodes are for doing long computations
• Each worker has 8 GB RAM; 80 GB hard disk; 2 dual-core AMD Opteron processors
[Diagram: Head Node connected to Worker 1 … Worker 34]
What is the HPC?
• Software installed:
  - GNU languages: C/C++ (gcc/g++), Fortran (gfortran)
  - Message Passing Interface library: OpenMPI
• Software soon to be installed:
  - MATLAB
  - Fluent
  - Portland Group Fortran and C/C++
  - IMSL
• Email is “local delivery only”
Where did it come from?
• National Science Foundation grant
  - MRI Program (Major Research Instrumentation)
  - $140,000
  - Institutional Equipment funds upgraded the machine by adding five nodes
• PIs: Patrick Tebbe, Rebecca Bates, David Haglin
• Proposal focused on a college-wide need for HPC
• Vendor: PSSC Labs, Inc.
How can you get an account?
• We must submit a final report to NSF after July 31, 2009.
• The final report must include how much the machine was used within CSET (and within MSU).
• We need to track usage (research projects).
• To get an account, send an email to haglin@mnsu.edu with the information described at:
  - http://www.mnsu.edu/hpc/accounts.html
  - Your students can get accounts too!
• We are very interested in knowing about publications that result from your use of hpc.mnsu.edu.
Your Research
• Okay, so you got an account.
• Now what?
Your Research
• Learning to use HPC.
• Learning to use the OpenPBS/Torque job queuing software.
• Learning to “design” your usage.
• Tutorials will be maintained at www.mnsu.edu/hpc
Your Research
• Connect to hpc.mnsu.edu (head node) using ssh
  - ssh on Unix
  - PuTTY or SSH Windows Client (IT Services)
  - The firewall is pretty tight; you may need to request a new opening in the firewall for your location
• Line-mode (command-line) interface
• Basic Unix commands:
  - http://www.mnsu.edu/hpc/tutorials/linux_basics.doc
Your Research
• Disks on hpc:
Your Research
• Using OpenPBS/Torque job queuing software:
  - qstat -- Inspect the current job queue
  - qsub -- Add a new job to the queue
  - qdel -- Delete one of your jobs from the queue
  - pbsmon.py -- See the state of the entire machine
  - xpbsmon -- Uses X11 to display machine state
  - firefox localhost/ganglia
• Detailed information available at:
  - http://www.clusterresources.com/torquedocs21/usersmanual.shtml
Your Research
• Designing your usage.
  - Assume you have a program you want to run for parameter values 1 through 1000
  - Ex:
      $ myProgram -p1
      $ myProgram -p2
      .
      .
      $ myProgram -p1000
Your Research
• Create 1000 “start scripts” to queue 1000 jobs to the master queue.
• Start your jobs and monitor their progress.
• Combine results when they are all done.
• Organize experiments/runs in folders.
• Use scripting languages such as Python to generate start scripts.
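Generating the start scripts might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the cluster's official recipe: the file names (start_p<N>.sh), job names (sweep_p<N>), and the one-core resource request are hypothetical choices; only myProgram and its -p option come from the example above.

```python
from pathlib import Path

# Template for one start script. "myProgram -pN" is taken from the slides;
# the job name and the nodes=1:ppn=1 request are illustrative assumptions.
SCRIPT_TEMPLATE = """#!/bin/sh
#PBS -N sweep_p{p}
#PBS -l nodes=1:ppn=1
cd "$PBS_O_WORKDIR"
myProgram -p{p}
"""


def write_start_scripts(run_dir, first, last):
    """Write one start script per parameter value and return their paths."""
    run_dir = Path(run_dir)
    # Keep each experiment/run in its own folder.
    run_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for p in range(first, last + 1):
        path = run_dir / f"start_p{p}.sh"
        path.write_text(SCRIPT_TEMPLATE.format(p=p))
        paths.append(path)
    return paths
```

Calling write_start_scripts("run01", 1, 1000) would create run01/start_p1.sh through run01/start_p1000.sh. Each script could then be queued from inside that folder with qsub, for example qsub start_p1.sh.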
Your Research
• Input and output for your jobs:
  - Your script will start on a worker node
  - You can log in to a worker node to see its filesystem:
      ssh n04
      df
  - Standard output and standard error are kept separate
  - Files are written alongside your script when the job completes
  - There is no way to monitor the progress of your computation
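Because the output files only appear once a job finishes, combining results is usually a separate step after the queue drains. The sketch below assumes the default Torque/OpenPBS naming, where standard output lands in <jobname>.o<jobid> and standard error in <jobname>.e<jobid>; the function names and the combined file layout are hypothetical.

```python
import re
from pathlib import Path

# Default Torque/OpenPBS file names: <jobname>.o<jobid> and <jobname>.e<jobid>.
OUTPUT_NAME = re.compile(r"^(?P<job>.+)\.(?P<kind>[oe])(?P<jobid>\d+)$")


def collect_job_files(run_dir):
    """Return {job_name: {"o": path, "e": path}} for job files in run_dir."""
    jobs = {}
    for path in sorted(Path(run_dir).iterdir()):
        match = OUTPUT_NAME.match(path.name)
        if match:
            # If a job name was run more than once, the last file seen wins.
            jobs.setdefault(match.group("job"), {})[match.group("kind")] = path
    return jobs


def combine_stdout(run_dir, combined_path):
    """Concatenate each job's standard output into combined_path.

    Returns the names of jobs whose standard error file is not empty.
    """
    jobs_with_errors = []
    with open(combined_path, "w") as out:
        for job, files in sorted(collect_job_files(run_dir).items()):
            if "o" in files:
                out.write(f"# {job}\n")
                out.write(files["o"].read_text())
            if "e" in files and files["e"].stat().st_size > 0:
                jobs_with_errors.append(job)
    return jobs_with_errors
```

The returned list gives a quick way to spot jobs that wrote anything to standard error and may need to be rerun.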
Your Research
• Sample script to run from 501 to 505:
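As one possibility, a single start script that covers parameters 501 through 505 might look like the text this Python sketch prints. It is an illustration under assumptions rather than the script used on hpc.mnsu.edu: the job name and resource request are hypothetical, and myProgram with its -p option is borrowed from the earlier example.

```python
def range_start_script(first, last, program="myProgram"):
    """Return the text of one start script that runs `program` for each
    parameter value from first to last, one after another on one worker."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N run_{first}_{last}",   # hypothetical job name
        "#PBS -l nodes=1:ppn=1",         # assumed request: one core on one node
        'cd "$PBS_O_WORKDIR"',           # start in the directory qsub was run from
    ]
    for p in range(first, last + 1):
        lines.append(f"{program} -p{p}")
    return "\n".join(lines) + "\n"


print(range_start_script(501, 505))
```

Batching several parameter values into one job trades parallelism for fewer queue entries. With short runs this can reduce queuing overhead, while long runs are usually better left as one job each.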
Where do you go from here?
• www.mnsu.edu/hpc is a communication portal
• Find colleagues who can help
• Learn more about the capabilities:
  - New software
  - Parallel programming (MPI)
  - Parallel libraries, e.g., ScaLAPACK
• Keep this machine computing fast
• Other ideas?