VIPBG LINUX SERVER INTRODUCTION

VIPBG LINUX CLUSTER
By
Helen Wang
Sept. 10, 2014
What is a Beowulf cluster?
In Short
A Beowulf cluster is a group of commodity computers connected together
through a local area network. Each computer, or node, can have one or
more processors and runs its own copy of an open source Unix-like
operating system such as Linux, BSD, or Solaris.
Basic Beowulf Cluster Structure
What is our cluster configuration?
• VIPBG hosts a Linux Beowulf cluster, installed with applications supporting
computationally intensive processing. The following description can be
used for the "Computing Resources" section of grant applications.
- 21 Dell PE R610/R620 servers running 64-bit CentOS 6 Linux
- 200 cores using Intel Xeon 56XX processors (2.67GHz to 3.4GHz)
- 1.072TB total RAM (24GB-64GB per node)
- 100TB network attached storage with 50TB backup storage
- 2.28TB internal disk storage (120GB per node)
- GbE/10GbE network connections to all nodes and storage
- Fail-over redundant master servers
Software available on cluster
• R 3.1.1 with CRAN and Bioconductor packages
• C++/G++ and Fortran compilers
• Java compiler
• Perl
• Python 2.7 and 3.4 with Biopython
• SAS 9.3 for 64-bit Linux
• PLINK
• HAPLO
• MPI and OpenMPI
• PBS Pro 11.0.2 (a portable batch system for clusters)
• Additional open source software upon user request
Biostat Beowulf Cluster Login Info
• Server Name: group.vipbg.vcu.edu
  IP: 128.172.85.5
• 2nd server as failover: light.vipbg.vcu.edu
  IP: 128.172.85.6 (invisible on mission)
• Software recommended to access servers:
  PC users:
  1. MobaXterm
     http://mobaxterm.mobatek.net/
  2. ssh / OpenSSH / PuTTY / WinSCP
  Mac users: Mac Terminal, Fetch
• Samba connections on Windows
Access Cluster
Server and nodes
Master node (master1 / master2): group.vipbg.vcu.edu
Running CentOS (Red Hat kernel) version 6.4, x86-64
When downloading open source or other software, choose the 64-bit CentOS or RHEL 6 build if possible
Purposes:
front-end user interface;
slow; not for running any jobs. Jobs running on the master will be terminated without
notice;
accessible from outside through VPN
Slave nodes (nodes):
node2–node19: dual quad-core or 6-core Intel Xeon processors with up to 64GB RAM
Purposes:
computation;
not meant for direct user access; reached via the master and managed by the portable
batch system (PBS);
fast;
internal network (10.0.0.X), not accessible directly from outside
Access Group from your computer
How to use MobaXterm or ssh to access the server
• From outside VCU, connect with the VCU VPN client first
http://www.ts.vcu.edu/software-center/general-purpose/juniper-vpn/
• Open a new session and choose SSH to add the server name
• Open "SSH settings" and fill in the information:
remote hostname: 128.172.85.5
username: YOUR_ACCOUNT_NAME
port number: 22
• Open the session settings and enter a session name (for example, group)
• Use ssh -X USERACCT@SERVER_IP for graphical access
• Select the server to test the connection; enter your password to
exchange keys
• Create a profile or bookmark for easy access next time
UNIX Commands You Need to Know
• pwd
• clear
• mkdir
• cp
• cd
• mv
• ls
• head
• more, less
• wc
• man
• rm
• chmod
• grep with head and tail
• nano
• sed
• cut
• top
• scp acct1@server1:/yourpath/yourfiles acct2@server2:/yourpath/
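The commands above are easiest to learn by trying them in a throwaway directory. The following is a minimal practice sketch, not part of the original slides; the scratch directory, file names, and sample data are all made up for illustration.

```shell
#!/bin/bash
# Practice run in a scratch directory (hypothetical location from mktemp).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
pwd                                   # show the current directory

mkdir data                            # make a subdirectory
printf 'id,score\n1,10\n2,20\n3,30\n' > data/scores.csv
cp data/scores.csv backup.csv         # copy a file
mv backup.csv data/old.csv            # move (rename) the copy
ls data                               # list the directory contents

line_count=$(wc -l < data/scores.csv)               # count lines
first_match=$(grep ',' data/scores.csv | head -n 1) # grep with head
header=$(head -n 1 data/scores.csv | sed 's/score/SCORE/')  # edit a stream
last_score=$(tail -n 1 data/scores.csv | cut -d, -f2)       # pick field 2

echo "lines: $line_count"
echo "first match: $first_match"
echo "header: $header"
echo "last score: $last_score"

chmod 600 data/scores.csv             # owner read/write only
rm data/old.csv                       # remove a file

# Clean up the scratch directory when done.
cd / && rm -r "$workdir"
```

Use man COMMAND (for example, man grep) to read the full description of any command used here.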
How to use cluster to submit jobs
IMPORTANT
• The logon machine (GROUP.VIPBG.VCU.EDU) is only used for login.
Jobs running on the logon machine will be terminated without notice.
What do you need to submit a job via PBS
• An executable script. The script can be program code (R, SAS, or another language), a shell script, or
a collection of command lines.
• An estimate of the resources and time the job needs; test this before you submit multiple jobs.
• The queue to submit the job to.
Node and queue configuration
• Nodes: Nodes are the physical computer servers incorporated together to make the cluster.
• Queues: Queues are used by the PBS scheduler to send jobs to different nodes. Each node is
assigned to a queue that handles a different type of job.
• Most queue limits can be checked by running the command
qstat -q
Note: if you need a higher job limit, send a request to the system admin and your
supervisor to get a temporary expansion on job submissions.
Nodes and queues configuration
Queue Name   Nodes Assigned   Job Limit (proc/user)   Total Cores / RAM Size   Comments
express      2                2                       4 cores / 24GB           Run small urgent jobs
serial       4-7, 11-16       20                      120 cores / 600GB        Run R and generic jobs, RAM <5GB
openmx       8-10             10                      69 cores / 192GB         OpenMx and parallel jobs
workq        17-19            10                      24 cores / 128GB         Run large memory jobs, RAM >5GB
mxq          3                10                      24 cores / 128GB         Run traditional Mx jobs
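The queue table can be read as a simple decision rule. The sketch below is a hypothetical helper (pick_queue is not an existing command on the cluster) that maps a job's memory need and type to a queue name. It assumes that a job needing exactly 5GB goes to workq, which the table does not specify.

```shell
#!/bin/bash
# pick_queue MEM_GB [JOB_TYPE]
# Hypothetical helper: suggests a queue name based on the queue table.
# JOB_TYPE is one of: urgent, openmx, mx, generic (default).
pick_queue() {
    local mem_gb=$1 job_type=${2:-generic}
    case "$job_type" in
        urgent) echo "express" ;;   # small urgent jobs
        openmx) echo "openmx" ;;    # OpenMx and parallel jobs
        mx)     echo "mxq" ;;       # traditional Mx jobs
        *)
            # Generic jobs: split on the 5GB boundary from the table
            # (assumption: exactly 5GB is treated as a large memory job).
            if [ "$mem_gb" -lt 5 ]; then
                echo "serial"
            else
                echo "workq"
            fi
            ;;
    esac
}

pick_queue 2          # prints: serial
pick_queue 16         # prints: workq
pick_queue 1 urgent   # prints: express
```

The printed name can then be passed to qsub with the -q option.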
Submitting a Job
Jobs are submitted to a PBS queue so that PBS can dispatch
them to be run on one or more of a cluster's compute nodes.
There are two main types of PBS jobs:
• Non-interactive Batch Jobs: This is the most common PBS job. A
job script is created that contains PBS resource requests and the
commands necessary to execute the job. The job script is then
submitted to PBS to be run non-interactively.
• Interactive Batch Jobs: This is a way to get an interactive terminal
on one or more of the compute nodes of a cluster. Commands can
then be run interactively through that terminal directly on the
compute nodes for the duration of the job. Interactive jobs are
helpful for such things as program debugging and running many
short jobs.
A PBS script is a standard Unix/Linux shell script that contains a few extra comments
at the beginning that specify directives to PBS. These comments all begin with #PBS.
• The most important PBS directives are:
PBS Directive / Description
#PBS -l walltime=HH:MM:SS
    This directive specifies the maximum walltime (real time, not CPU time)
    that a job should take. If this limit is exceeded, PBS will stop the job.
    Keeping this limit close to the actual expected time of a job can allow a
    job to start more quickly than if the maximum walltime is always
    requested.
#PBS -l pmem=SIZEgb
    This directive specifies the maximum amount of physical memory used
    by any process in the job. For example, if the job would run four
    processes and each would use up to 2 GB (gigabytes) of memory,
    then the directive would read #PBS -l pmem=2gb
#PBS -l nodes=N:ppn=M
    This specifies the number of nodes (nodes=N) and the number of
    processors per node (ppn=M) that the job should use. PBS treats a
    processor core as a processor, so a system with eight cores per
    compute node can have ppn=8 as its maximum ppn request. Note that
    unless a job has some inherent parallelism of its own through something
    like MPI or OpenMPI, requesting more than a single processor on a
    single node is usually wasteful and can impact the job start time.
#PBS -q queuename
    This specifies what PBS queue a job should be submitted to. This is only
    necessary if a user has access to a special queue. This option can and
    should be omitted for jobs being submitted to a system's default queue.
#PBS -j oe
    Normally when a command runs it prints its output to the screen. This
    output is often normal output and error output. This directive tells PBS to
    put both normal output and error output into the same output file.
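The walltime directive expects HH:MM:SS, while a timed test run usually gives seconds. The following sketch shows one way to convert a measured run time into a walltime request. The function name to_walltime and the 20% safety margin are assumptions for illustration, not cluster policy.

```shell
#!/bin/bash
# to_walltime SECONDS
# Hypothetical helper: formats a number of seconds as HH:MM:SS.
to_walltime() {
    local total=$1
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

measured=3000                         # seconds taken by a test run (example value)
padded=$((measured * 120 / 100))      # add a 20% margin (assumed, adjust as needed)
echo "#PBS -l walltime=$(to_walltime "$padded")"   # prints: #PBS -l walltime=01:00:00
```

Keeping the request close to the real run time follows the advice in the walltime description above.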
An example of a PBS script
#This is a sample PBS script. It will request 1 processor on 1 node for 10 hours.
#
#Request 1 processor on 1 node
#
#PBS -l nodes=1:ppn=1
#
#Request 10 hours of walltime
#
#PBS -l walltime=10:00:00
#
#Request 1 gigabyte of memory per process
#
#PBS -l pmem=1gb
#
#Request that regular output and error output go to the same file
#
#PBS -j oe
#
#The following is the body of the script. By default, PBS scripts execute in your home directory, not the
#directory from which they were submitted. The following line places you in the directory from which the job
#was submitted.
#
cd $PBS_O_WORKDIR
#
#Now we want to run the program "hello". "hello" is in the directory that this script is being submitted from,
#$PBS_O_WORKDIR.
#
echo " "
echo " "
echo "Job started on `hostname` at `date`"
./hello
echo " "
echo "Job Ended at `date`"
echo " "
Template used on cluster
• Modify the template to create your own PBS script for running programs
#!/bin/bash
#PBS -q serial
#PBS -N MYSCRIPT
#PBS -V
#
# All #PBS lines must come before the first command.
# cd to the working directory for this job.
# Otherwise it will execute in my home directory.
#
WORKDIR=~/YOURWORKDIR
cd "$WORKDIR"
#echo "PBS batch job id is $PBS_JOBID"
echo "Working directory of this job is: $WORKDIR"
#
echo "Beginning to run job"
# Replace the next line with the command line that runs your job
/home/huan/bin/calculate PARAMETERS
• Job Submission Syntax
qsub SCRIPTFILE
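As a rough sketch of the whole cycle, the snippet below writes a small job script and submits it. The file name myjob.pbs, the job name, and the 10 minute walltime are placeholders. The sketch relies on qsub printing the new job's identifier to standard output, and it only calls qsub if the command is found, so it can be tried safely on a machine without PBS.

```shell
#!/bin/bash
# Write a small job script into a scratch directory (placeholder names).
jobdir=$(mktemp -d) || exit 1
script="$jobdir/myjob.pbs"

# The quoted 'EOF' keeps $PBS_O_WORKDIR and $(...) unexpanded until the job runs.
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -q serial
#PBS -N myjob
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -j oe
cd "$PBS_O_WORKDIR"
echo "Job started on $(hostname) at $(date)"
EOF

# Submit only where PBS is installed; keep the job ID for later qstat/qdel calls.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub "$script")
    echo "Submitted job $jobid"
else
    echo "qsub not found; run this on the cluster login machine"
fi
```

Saving the job ID in a variable makes it easy to check or delete the job later.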
Existing job submission scripts
– Location: /usr/local/bin/q*
– R users: qR YOUR_R_SCRIPT
– Large memory R jobs: qRL YOUR_R_SCRIPT
– SAS users: qsas YOUR_SAS_CODE
– Generic or other resources: qsub YOUR_OWN_SCRIPT
Interactive Batch Jobs
• Interactive PBS jobs are similar to non-interactive PBS jobs in that they are submitted to PBS via the command qsub.
When submitting an interactive PBS job, a PBS script is not necessary. All PBS directives can be specified on the command line.
The syntax for submitting an interactive PBS job is:
qsub -I ... pbs directives ...
• The -I flag above tells qsub that this is an interactive job.
• The following example uses qsub to submit an interactive job using one processor on one node for four hours:
merlot:~$ qsub -I -l nodes=1:ppn=1 -l walltime=4:00:00
qsub: waiting for job 1064159.merlot.bis.vcu.edu to start
qsub: job 1064159.merlot.bis.vcu.edu ready
node12:~$
• There are two things of note here. The first is that the qsub command doesn't exit when run with the interactive -I flag. Instead, it
waits until the job is started and gives a prompt on the first compute node assigned to the job. The second is the
prompt node12:~$, which shows that commands are now being executed on the compute node node12.
Monitoring and Managing Jobs
• Check job status using qstat
Command / Description
qstat
    Shows the status of all PBS jobs. The time displayed
    is the CPU time used by the job.
qstat -s
qstat -a
    Shows the status of all PBS jobs. The time displayed
    is the walltime used by the job.
qstat -u USERID
    Shows the status of all PBS jobs submitted by the user
    USERID. The time displayed is the walltime used by the
    job.
qstat -n
    Shows the status of all PBS jobs along with a list of
    compute nodes that each job is running on.
qstat -f JOBID
    Shows detailed information about the job JOBID.
Job Running Status
State   Meaning
Q       The job is queued and is waiting to start.
R       The job is currently running.
E       The job is currently ending.
H       The job has a user or system hold on it and will not be eligible
        to run until the hold is removed.
Managing jobs
• Deleting jobs
- qdel JOBID                     delete a job by its job ID
- qdel $(qselect -u USERNAME)    delete all jobs owned by USERNAME
• View job output
By default, PBS writes two files for each job:
JobName.oJobID: This file contains the non-error output that would
normally be written to the screen.
JobName.eJobID: This file contains the error output that would
normally be written to the screen.
If the PBS directive #PBS -j oe is used in a PBS script, the non-error and the
error output are both written to the JobName.oJobID file.
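To find a job's output, it helps to build the file name from the job name and job ID. The sketch below uses a hypothetical helper, output_file. It assumes that the JobID part of the file name is the numeric part of the full job identifier, that is, the digits before the first dot.

```shell
#!/bin/bash
# output_file JOBNAME JOBID [o|e]
# Hypothetical helper: prints the expected output (o) or error (e) file name.
output_file() {
    local jobname=$1 jobid=$2 stream=${3:-o}
    local seq=${jobid%%.*}      # assumption: keep only the digits before the first dot
    echo "${jobname}.${stream}${seq}"
}

output_file MYSCRIPT 1064159.merlot.bis.vcu.edu      # prints: MYSCRIPT.o1064159
output_file MYSCRIPT 1064159.merlot.bis.vcu.edu e    # prints: MYSCRIPT.e1064159
```

The result can be passed to less or tail to inspect the output once the job has finished.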
More to monitor a node
• To check a node's configuration
pbsnodes NODE#
• To check a node's status
nodestatus NODE#
• Limitations on the name of the SCRIPT
No more than 10 characters.
No spaces.
No special characters.
Use a temporary name if necessary and change it back when the job
is done.
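The naming limits can be checked before submitting. This is a minimal sketch with a hypothetical function, valid_script_name. Since the slides do not define "special characters", it assumes that only letters, digits, dots, underscores, and hyphens are safe.

```shell
#!/bin/bash
# valid_script_name NAME
# Hypothetical check: returns 0 if NAME fits the script name limits, 1 otherwise.
valid_script_name() {
    local name=$1
    # Length check: between 1 and 10 characters.
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 10 ] || return 1
    # Character check (assumed safe set): letters, digits, dot, underscore, hyphen.
    case "$name" in
        *[!A-Za-z0-9._-]*) return 1 ;;
    esac
    return 0
}

for n in run1.R "my job.R" analysis_v2.sas; do
    if valid_script_name "$n"; then
        echo "ok: $n"
    else
        echo "rename before submitting: $n"
    fi
done
```

A script that fails the check can be copied to a short temporary name with cp, submitted, and renamed back when the job is done.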
At Last
• Edit file using nano or vi
http://www.ts.vcu.edu/faq/unix/picoeditor.html
http://www.ts.vcu.edu/faq/unix/vieditor.html
• Use a samba connection to map a network drive on a PC;
"EditPad Lite" is a recommended editor
• Some users use Rmate on a MacBook
• Useful links
http://www.ts.vcu.edu/faq/unix/docs.html
• Wiki page for vipbg cluster – need vcu eID to login
https://wiki.vcu.edu/display/vipbgit/VIPBG+Cluster+System