Copyright © 2010 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

PBS Professional Administration Training
Rajiv Jaisankar, Technical Specialist, Altair APAC

Chapter One: Understanding PBS Professional

Chapter One
• What is PBS Professional?
• History of PBS Professional
• PBS Works Online Store
• PBS Professional Documentation
• Altair Global Offices & Technical Support
• Broad Hardware and Operating System Support
• Supported MPI Libraries
• PBS Professional Components & Roles

What is PBS Professional?
A workload management solution that maximizes the efficiency and utilization of high-performance computing (HPC) resources and improves job turnaround.

Robust Workload Management
• Floating flex-based licenses
• Scalability, with flexible queues
• Job arrays
• User and administrator interface
• Job suspend/resume
• Application checkpoint/restart
• Automatic file staging
• Accounting logs
• Access control lists

Advanced Scheduling Algorithms
• Resource-based scheduling
• Preemptive scheduling
• Optimized node sorting
• Enhanced job placement
• Advance & standing reservations
• Cycle harvesting across workstations
• Scheduling across multiple complexes
• Network topology scheduling
• Manages both batch and interactive work

Reliability, Availability and Scalability
• Server failover feature
• Automatic job recovery
• Provides system monitoring
• Provides integration with MPI solutions
• Tested to manage 1,000,000+ jobs per day

History of PBS Professional
• 1993-97: Developed for NASA to replace NQS
• 2000: Veridian formed a commercial version of PBS; released PBS Professional 5.0
• 2003: Altair acquired PBS Professional technology and engineering; released PBS Professional 5.3
• 2004: Released PBS Professional 5.4
• 2005: Released PBS Professional 7.0 and 7.1
• 2006: Released PBS Professional 8.0
• 2007: Released PBS Professional 9.0 and 9.1
• 2008: Released PBS Professional 9.2
• 2008: Released PBS Professional 10.0
• 2009: Released PBS Professional 10.1
• 2009: Released PBS Professional 10.2
• 2010: Released PBS Professional 10.4
• 2010: Released PBS Professional 11.0
• 2011: Released PBS Professional 11.1

Broad Hardware & Operating System Support
• AMD: Linux and Windows
• Intel: Linux and Windows
• IBM AIX on Power
• IBM Linux on Power
• HP-UX on Itanium 2
• Cray X2, XT, XT3, XT4, XT5, and XT6
• SGI Altix ICE, XI, and UV
• Sun Solaris on SPARC
• Windows 7, XP, Vista, Server 2003, and Server 2008
• Red Hat Enterprise 4, 5, and 6
• SLES 9, 10, and 11
Note: For a detailed list of supported systems and operating systems, refer to the latest release notes.

Supported MPI Libraries
Currently supported MPI libraries integrated with PBS:
• MPICH 1.2.5, 1.2.6 on Linux 2.4 on x86, AMD64, EM64T, Itanium 2
• MPICH 1.2.5, 1.2.6 on Linux 2.6 on x86, AMD64, EM64T
• MPICH 1.2.7 on x86 Linux
• MPICH-GM on Linux
• Intel MPI 2.0.22 on Linux
• MPICH2 1.0.3, 1.0.5, 1.0.7 on Linux
• IBM POE on AIX 5.x and 6.x, including HPS support
• HP MPI 1.08.03 on HP-UX 11 on Itanium 2
• HP MPI 2.0.0 on Linux 2.4 & 2.6 on x86, AMD64, EM64T, Itanium 2
• LAM/MPI 6.5.9/7.0.6/7.1.1 on Linux 2.4/2.6 on x86, AMD64, EM64T, Itanium 2
• SGI MPI (MPT) on Linux on Altix / Itanium 2 / x86_64 and XE
• SGI MPI (MPT) over InfiniBand
• MVAPICH 1.2.7/2.0 on Linux
• OpenMPI 1.4.2 on Linux

PBS Professional Components & Roles
Batch Server, Scheduler, MOM *

Batch Server
- referred to as the PBS Server
- central focus for a PBS complex
- routes job to compute host *
- processes all PBS-related commands *
- provides the basic batch services *
- maintains its own server and queue settings *
- daemon executes as pbs_server.bin

Scheduler
- referred to as the PBS Scheduler
- queries list of running and queued jobs from the PBS Server *
- queries queue, server, and node properties *
- queries resource consumption and availability from the PBS MOM *
- sorts available jobs according to local scheduling policies
- determines which job is eligible to run next
- daemon executes as pbs_sched

MOM
- referred to as the PBS MOM
- executes jobs at request of PBS Scheduler
- monitors resource usage of running jobs
- enforces resource limits on jobs
- reports system resource limits, configuration *
- daemon executes as pbs_mom

* This information is for debugging purposes only. It may change in future releases and should not be relied upon.

Complex Configurations
Single Execution System
[Diagram: Server, MOM, and Scheduler]
All 3 PBS components on a single host.

Complex Configurations, cont.
Multiple Execution System
[Diagram: Front End System, Server, Scheduler, and three MOMs]
Note: The PBS Server machine may be a different architecture (UNIX/Linux) from the execution hosts. A PBS complex can be either UNIX/Linux or Windows, but not both.

Chapter Two - Installation of PBS Professional

Chapter Two
• Pre-Installation
• Basic Installation
• Post-Installation
• PBS Installed Directory Structure

Post Installation – PBS Configuration File
How does the PBS init script determine which services to invoke?
• The init script reads the configuration file “/etc/pbs.conf”
• Format of a pbs.conf file:
    PBS_EXEC=/opt/pbs/default
    PBS_HOME=/var/spool/PBS
    PBS_START_SERVER=1
    PBS_START_MOM=1
    PBS_START_SCHED=1
    PBS_SERVER=traintb16
    PBS_DATA_SERVICE_USER=pbsuser01
• For each PBS_START_* variable:
    0 will prevent init from starting or stopping the daemon
    1 will have init start or stop the daemon

File System and File Transfer
Sites will need to determine how users will access data files
• Most common file sharing methods used by PBS customers:
    • NFS    Network File System
    • GFS    Global File System (most widely used)
What method of file copy will be used?
    • rcp    remote copy (default used by PBS)
    • scp    secure copy
    • cp     Linux/Unix copy

User’s PBS Environment
Delivery of STDOUT/STDERR files
• PBS should be able to copy the user’s STDOUT and STDERR files to the appropriate directory without a password challenge
Stage input/output files
• Users may need to import/export files related to the job before/after execution
Users’ Data Transfer
• Users should be able to transfer data without having to supply a password (e.g., via rcp/scp)
Users must have a valid account
• Users should be able to log onto the execution host(s) and should have a valid username and group

Altair LM-X License Management
• PBS Professional 11.0 is now licensed by the Altair License Management System (ALM), based on X-Formation’s LM-X license management system
• Altair’s ALM package for PBS can be downloaded from: https://secure.altair.com/UserArea/
• We recommend that Altair’s ALM be installed and configured before installing PBS Professional v11.0
• For additional information on Altair’s ALM, refer to the Altair License Manager System 11.0 Installation and Operations Guide

Chapter Three - PBS Administration

Chapter Three
• Process flow of a PBS job
• PBS installed directory structure
• Directory structure of $PBS_HOME
• Directory structure of $PBS_EXEC
• Understanding the PBS configuration file
• Manually starting and stopping PBS daemons
• Impact of PBS daemon restarts on running jobs
• Network ports used by PBS
• Status of PBS complex

Process Flow of a PBS Job – User Level
[Diagram: job 6.traintb16 passing between the user, PBS SERVER, PBS SCHEDULER, and HOST A, HOST B, HOST C (ncpus, mem, host); the job runs on HOST A]
1. User submits job
2. PBS Server returns a job ID
3. PBS Scheduler requests a list of resources from the Server *
4. PBS Scheduler sorts all the resources and jobs *
5. PBS Scheduler informs PBS Server which host(s) the job can run on *
6. PBS Server pushes job script to execution host(s)
7. PBS MOM executes job script
8. PBS MOM periodically reports resource usage back to PBS Server *
9. When job is completed, PBS MOM kills the job script
10. PBS Server de-queues job from PBS complex
Note: * This information is for debugging purposes only. It may change in future releases.

PBS Installed Directory Structure
PBS Professional software is installed in two separate directories
• $PBS_EXEC “/opt/pbs/default” contains:
    - PBS daemons
    - Libraries
    - Man pages
    - Support tools
    - Administrator and user PBS commands
• $PBS_HOME “/var/spool/PBS” contains:
    - PBS daemon configurations
    - PBS daemon logs
    - Other various file-related directories

PBS Directory Structure - PBS_HOME
Directory structure of $PBS_HOME
PBS_HOME
    server_priv, mom_priv, sched_priv: daemon configuration directories
    server_logs, mom_logs, sched_logs: daemon log directories
    spool, undelivered, checkpoint, aux, pbs_environment, pbs_version, datastore: misc directories/files
This information is for debugging purposes only. It may change in future releases.

PBS Directory Structure - PBS_EXEC
Directory structure of $PBS_EXEC
PBS_EXEC
    bin, sbin: binaries of PBS daemons and user/admin PBS commands
    lib, man, include: libraries, manual pages, and header files
    etc, tcltk, unsupported, python, pgsql
This information is for debugging purposes only. It may change in future releases.

Directory Structure of $PBS_HOME/server_priv
Detailed structure of $PBS_HOME/server_priv *
server_priv
    accounting       directory containing daily accounting logs
    db_password      database password - encrypted
    hooks            directory containing custom hook definitions
    jobs             directory containing users’ job scripts
    prov_tracking    OS provisioning directory
    server.lock      PBS server PID lock file
    svrlive          used for failover configuration
    tracking         PBS license related file
    usedlic          PBS license related file
* This information is for debugging purposes only. It may change in future releases.

PBS Configuration File – pbs.conf
PBS installs a configuration file “pbs.conf” located in the “/etc/” directory. This configuration file is used by PBS to determine:
• Which daemons to start/stop
• What PBS server to communicate with
• What file copy mechanism to use
Default contents of pbs.conf:
    PBS_EXEC=/opt/pbs/default
    PBS_HOME=/var/spool/PBS
    PBS_START_SERVER=1
    PBS_START_MOM=1
    PBS_START_SCHED=1
    PBS_SERVER=hostname.domain
    PBS_DATA_SERVICE_USER=pbsuser01
Each server/scheduler, execution, and client host has a pbs.conf file installed.
Refer to Administrator’s Guide; Chapter 13; Section 13.1.3; pages 715-716 for a complete listing of configuration file variables.

PBS Configuration File – pbs.conf, cont.
How pbs.conf differs between the PBS Server and PBS MOM hosts:
PBS SERVER HOST
    PBS_EXEC=/opt/pbs/default
    PBS_HOME=/var/spool/PBS
    PBS_START_SERVER=1
    PBS_START_MOM=0
    PBS_START_SCHED=1
    PBS_SERVER=traintb16
    PBS_DATA_SERVICE_USER=pbsuser01
PBS EXECUTION HOST
    PBS_EXEC=/opt/pbs/default
    PBS_HOME=/var/spool/PBS
    PBS_START_SERVER=0
    PBS_START_MOM=1
    PBS_START_SCHED=0
    PBS_SERVER=traintb16
Note: Only 1 active instance of a PBS Server and PBS Scheduler can be running within a PBS complex.

PBS Configuration File – pbs.conf, cont.
The variable PBS_START_<daemon> sets which daemon should be allowed to start when the “/etc/init.d/pbs” script runs.
For example, with the /etc/pbs.conf below, this is the expected behavior when executing “/etc/init.d/pbs start”:
    PBS_EXEC=/opt/pbs/default
    PBS_HOME=/var/spool/PBS
    PBS_START_SERVER=1        pbs_server daemon will be invoked
    PBS_START_MOM=0           pbs_mom daemon will not be invoked
    PBS_START_SCHED=1         pbs_sched daemon will be invoked
    PBS_SERVER=traintb16
    PBS_DATA_SERVICE_USER=pbsuser01

Starting/Stopping PBS Using Start/Stop Script
Starting/stopping PBS
• Why use the start/stop script?
    • Vnode definitions are created only when the start script is used; they are not created when the daemons are started manually
    • Vnode definitions are required if PBS is to manage cpusets on a machine
    • The pbs_mom daemon on the Altix and the Cray must be started via the start script
    • Using the pbs start/stop script to stop PBS will preserve jobs (the server gets a ‘qterm -t quick’)
• Location of start/stop script (Linux)
    /etc/init.d/pbs start

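To check in advance which daemons the init script would start on a given host, the PBS_START_* flags can be read straight out of pbs.conf. The following is a minimal sketch, not part of PBS: `list_started_daemons` is a hypothetical helper name, and it assumes the simple KEY=VALUE layout shown in the pbs.conf examples above (no quoting, no comments).

```shell
#!/bin/bash
# Hypothetical helper (not shipped with PBS): reads pbs.conf-style
# KEY=VALUE lines on stdin and prints the daemons whose
# PBS_START_<daemon> flag is set to 1.
list_started_daemons() {
    local key value
    while IFS='=' read -r key value; do
        case "$key" in
            PBS_START_SERVER|PBS_START_MOM|PBS_START_SCHED)
                # 1 means init will start/stop the daemon; 0 means it will not
                if [ "$value" = "1" ]; then
                    echo "${key#PBS_START_}"
                fi
                ;;
        esac
    done
}

# Demo using the PBS server host example from the slides.
list_started_daemons <<'EOF'
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=0
PBS_START_SCHED=1
PBS_SERVER=traintb16
PBS_DATA_SERVICE_USER=pbsuser01
EOF
```

On a real host the same function could be fed the live file, for example `list_started_daemons < /etc/pbs.conf`.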
Status of PBS Complex
Use qstat -Bf to view the status of a PBS complex
    Server: traintb16
        server_state = Active
        server_host = traintb16.prog.altair.com
        scheduling = True
        total_jobs = 0
        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
        default_queue = workq
        log_events = 511
        mail_from = adm
        query_other_jobs = True
        resources_default.ncpus = 1
        default_chunk.ncpus = 1
        scheduler_iteration = 600
        FLicenses = 33
        resv_enable = True
        node_fail_requeue = 310
        max_array_size = 10000
        pbs_license_info = 7788@localhost
        pbs_license_min = 1
        pbs_license_max = 2147483647
        pbs_license_linger_time = 3600
        license_count = Avail_Global:32 Avail_Local:1 Used:0 High_Use:0
        pbs_version = PBSPro_11.0.0.103450
        eligible_time_enable = False
        max_concurrent_provision = 5

Manually Starting/Stopping PBS Daemons
Manually starting/stopping PBS daemons
• PBS Server
    • Start
        • $PBS_EXEC/sbin/pbs_server
    • Stop
        • $PBS_EXEC/bin/qterm -t [quick|delay|immediate]
• PBS Scheduler
    • Start
        • $PBS_EXEC/sbin/pbs_sched
    • Stop
        • $PBS_EXEC/bin/qterm -s
        • kill -INT <pbs_sched_pid>
• PBS MOM
    • Start
        • $PBS_EXEC/sbin/pbs_mom
    • Stop
        • $PBS_EXEC/bin/qterm -m    (this will shut down all the MOMs)
        • kill -INT <pbs_mom_pid>

Network Ports Used By PBS Daemons
UNIX/Linux network ports
    Daemon            Port Number   Protocol   Connection
    pbs               15001         TCP        Client/Scheduler to Server
    pbs_server        15001         UDP        Server to MOM via RPP
    pbs_mom           15002         TCP        MOM to/from Server
    pbs_resmon        15003         TCP        MOM resource requests
    pbs_resmon        15003         UDP        MOM resource requests
    pbs_sched         15004         TCP        PBS Scheduler
    pbs_mom_globus    15005         TCP        MOM Globus
    pbs_mom_globus    15006         TCP        MOM Globus resource requests
    pbs_mom_globus    15006         UDP        MOM Globus resource requests

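When planning firewall rules between hosts in a complex, it can help to reduce the port table to one list per protocol. The sketch below simply embeds the default values from the table; the function names `pbs_default_ports` and `ports_for_protocol` are hypothetical, and a site that has changed its port assignments would need to edit the embedded data.

```shell
#!/bin/bash
# Default PBS ports copied from the table above: name, port, protocol.
# Hypothetical helper; a site with non-default ports must adjust this list.
pbs_default_ports() {
    cat <<'EOF'
pbs            15001 tcp
pbs_server     15001 udp
pbs_mom        15002 tcp
pbs_resmon     15003 tcp
pbs_resmon     15003 udp
pbs_sched      15004 tcp
pbs_mom_globus 15005 tcp
pbs_mom_globus 15006 tcp
pbs_mom_globus 15006 udp
EOF
}

# Print the unique port numbers used with the given protocol (tcp or udp).
ports_for_protocol() {
    pbs_default_ports | awk -v proto="$1" '$3 == proto { print $2 }' | sort -n -u
}

echo "TCP: $(ports_for_protocol tcp | tr '\n' ' ')"
echo "UDP: $(ports_for_protocol udp | tr '\n' ' ')"
```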
Chapter Four - Job Management

Chapter Four
• Defining a Job Script
• Types of Jobs
• Submitting Jobs
• Process Flow of a PBS Job
• Querying PBS Jobs
• Setting Job Attributes
• Requesting Job Resources
• Default Job Attributes
• Order of Default Resources Assigned to Jobs
• Job Exit Codes

Defining a Job Script
What is a job script?
• A file that contains a set of instructions to execute a series of commands. Also known as a “batch job”.
Example of a job script:
    Shell interpreter:
        #!/bin/bash
    Commands:
        sleep 5
        /home/altair/scripts/optistruct -cpu 2 handlebar.fem

Submitting Jobs - Using “qsub”
Submitting a job script to PBS
• Using the “qsub” command
    Usage:   qsub <job_attributes/resources> <job_script>
    Example: qsub -l select=1:ncpus=1 test_script
• If the job is accepted by PBS, a job identifier is returned. This job identifier consists of the job number and the host name of the server it was submitted to:
    0.traintb16
Note:
- If a job is rejected, no job identifier is returned, but the job ID counter is still incremented
- The largest possible job ID is 7 digits: 9,999,999. Once reached, the counter resets to zero

Requesting Job Resources – Built in Resources
    Resource    Description
    arch        System architecture
    cput        Amount of CPU time used by the job for all the processes on all the chunks
    mem         Amount of physical memory allocated to a job
    ncpus       Number of processors requested for a job
    walltime    Time requested for the job to run
Note: For a complete listing, refer to the PBS Reference Documentation Guide, pages 336-340.

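Scripts that wrap qsub often need to take the returned job identifier apart. As described on the qsub slide, the identifier is the job number, a dot, then the server host name, and the number wraps to zero after 9,999,999. A minimal sketch using only shell parameter expansion follows; `split_job_id` and `next_job_number` are hypothetical helper names, not PBS commands.

```shell
#!/bin/bash
# Split a job identifier such as 6.traintb16 into its two parts.
# Everything before the first dot is the job number; the rest is the server.
split_job_id() {
    local job_id="$1"
    local number="${job_id%%.*}"
    local server="${job_id#*.}"
    echo "number=$number server=$server"
}

# Job number that follows the given one, wrapping after 9,999,999
# as described in the note on the qsub slide.
next_job_number() {
    echo $(( ($1 + 1) % 10000000 ))
}

split_job_id 6.traintb16
next_job_number 9999999
```

Because only the first dot is treated as the separator, a fully qualified server name such as traintb16.prog.altair.com is kept whole.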
Types of Jobs
There are two types of PBS jobs
• Batch Job - A script that contains commands or tasks to execute site-specific applications
• Interactive Job - Runs like a batch job, but when it runs, the user’s terminal input and output are connected to the execution host, similar to a login session.
    • Allows users to debug a job script
    • Allows users to verify that a new application runs properly

Setting Job Attributes – Using PBS Directives
Job attributes can be set in 2 different ways:
• Method 1: on the qsub command line
    qsub -N <job_name> <job_script>
• Method 2: within a job script as a PBS directive
    #!/bin/bash
    #PBS -N test_run_01
    #PBS -l select=4:ncpus=4:mem=16GB
    #PBS -l place=scatter
    #PBS -j oe
    #PBS -o /home/pbsuser01/OUTPUTS
    optistruct -ncpu 2 handlebar.fem
Note:
- PBS expects the directives to begin on the second line and to be on consecutive lines thereafter. Directive processing stops at the first executable line; comment lines are ignored.
- Command line arguments will override PBS directives.

Requesting Job Resources – Understanding Resources
What are job resources?
• Applications sometimes need certain types and amounts of system resources such as:
    - memory
    - ncpus
    - scratch space
• During job submission, required resources can be requested
How can these resources be requested within PBS?
• PBS defines these resources as chunks or as job-wide resources
What are “chunks”?
• set of resources that are allocated as a unit to a job
• smallest set of resources that are allocated to a job
• for example: ncpus, mem
• requested in a “select” statement
    qsub -l select=<#>:ncpus=<#>:mem=<#>
What are “job-wide resources”?
• resources that are associated with the entire job
• for example: placement of jobs, walltime

Requesting Job Resources – Using Chunks & Select
Requesting resources in chunks
• Resources which are to be allocated as a unit to a job
    - Smallest set of resources to be allocated to a single job
    - Host/vnode level request
Syntax: qsub -l select=[N:]chunk[+[N:]chunk ...]
For example:
1. Job requesting 3 chunks with 2 CPUs per chunk:
    qsub -l select=3:ncpus=2
2. Job requesting 2 chunks with 1 CPU and 10GB of memory each, plus another set of 3 chunks with 2 CPUs and 8GB of memory each:
    qsub -l select=2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb

Requesting Job Resources – Job Placement
Placing jobs on hosts/vnodes
• Users can specify how their multi-node job is placed within a PBS complex based on the resources requested
• The place statement controls how the job is placed on the hosts/vnodes from which resources may be allocated for the job
• Using the “place” statement:
    Usage:   qsub -l place=[<type>][:<sharing>][:group=<resource>]
    Example: qsub -l select=1:ncpus=2:mem=100MB -l place=pack
    Type       Value         Description
    type       free          place job on any vnode(s), including hosts
               pack          all chunks will be taken from one host
               scatter       each chunk is allocated to a separate host
    sharing    excl          only this job uses the vnodes chosen
               shared        this job can share the vnodes chosen
    group      <resource>    chunks will be grouped according to a resource

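To sanity-check a select statement before submitting, the chunk counts and CPU totals can be added up by hand or with a few lines of awk. The sketch below follows the select syntax given above (`[N:]chunk[+[N:]chunk ...]`); `select_totals` is a hypothetical helper, and it assumes that a chunk without a leading count means one chunk and that a chunk without ncpus gets 1 CPU, matching the default_chunk.ncpus=1 setting shown in the qstat -Bf output.

```shell
#!/bin/bash
# Hypothetical helper: total chunks and CPUs requested by a select spec.
# Input is the part after "select=", e.g. 2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb
select_totals() {
    echo "$1" | awk -F'+' '{
        chunks = 0; cpus = 0
        for (i = 1; i <= NF; i++) {
            n = split($i, parts, ":")
            count = 1; ncpus = 1          # assumed defaults: 1 chunk, 1 CPU
            for (j = 1; j <= n; j++) {
                if (j == 1 && parts[j] ~ /^[0-9]+$/)
                    count = parts[j] + 0              # leading N
                else if (parts[j] ~ /^ncpus=/)
                    ncpus = substr(parts[j], 7) + 0   # text after "ncpus="
            }
            chunks += count
            cpus += count * ncpus
        }
        printf "chunks=%d ncpus=%d\n", chunks, cpus
    }'
}

select_totals "3:ncpus=2"
select_totals "2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb"
```

For the second example from the slides this gives 2 + 3 = 5 chunks and 2×1 + 3×2 = 8 CPUs.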
Requesting Job Resources – Job Wide Resources
Requesting job-wide limits
• Resources that are requested outside a select statement
    - Such as walltime or cput
• Requesting resources at server or queue level
• Resources that are not tied to specific host(s)/vnode(s)
For example:
    qsub -l select=1:ncpus=1:mem=100MB -l walltime=01:00:00 myscript

Requesting Job Resources – SMP Jobs
SMP jobs are meant to run on a single execution host
Submitting an SMP PBS job
    qsub -l select=x:ncpus=x -l place=pack
Note: all chunks will be placed on a single host
Additional options
• Place a job on a host that already has a job running on it
    qsub -l select=1:ncpus=2 -l place=pack:shared
• Place a job on a host on which no other jobs are running and make that host exclusive to it
    qsub -l select=1:ncpus=2 -l place=pack:excl

Requesting Job Resources – MPI Jobs
MPI jobs run on multiple hosts, using an MPI application
PBS has tightly integrated wrapper scripts for various MPI implementations
• Allows PBS to track spawned MPI processes
• More accurate tracking of all resources being consumed across all the hosts
• Accurately record CPU accounting utilization on all nodes
• Accurately enforce requested job limits
• Automatically "clean up" stray MPI processes on all nodes
• Require no changes other than wrapping

    1777 ?  Ss  0:00 /opt/pbs/default/sbin/pbs_mom
    1779 ?  Ss  0:00  \_ -bash
    1810 ?  S   0:00      \_ /bin/sh /var/spool/PBS_10.4.0.101257/mom_priv/jobs/1746.rhel5.lab.altair.com.SC
    1812 ?  S   0:00          \_ /opt/mpich2-install/bin/mpirun -f /var/spool/PBS_10.4.0.101257/aux/1746.rhel5.lab.altair.com /usr/local/gromacs_mpich2-1.3.2p1/bi
    1813 ?  S   0:00              \_ /opt/mpich2-install/bin/hydra_pmi_proxy -control-port rhel54:37470 --demux poll --pgid 0 --proxy-id 0
    1814 ?  R   0:14                  \_ /usr/local/gromacs_mpich2-1.3.2p1/bin/mdrun -f /test/bench/d.dppc/grompp.mdp -c /test/bench/d.dppc/conf.gro -p /test/benc

Requesting Job Resources – Submitting MPI Jobs
Method 1
• Request: 4-way MPI job with 2 CPUs and 2GB memory per MPI task, with one MPI task per host, where each host has 2 CPUs and 2 GB memory
    qsub -l select=4:ncpus=2:mem=2GB -l place=scatter
• Variable $PBS_NODEFILE contains list of vnodes
    VnodeA
    VnodeB
    VnodeC
    VnodeD
• Sample of an MPI job script
    #!/bin/bash
    #PBS -l select=4:mem=2GB:mpiprocs=2
    #PBS -l place=scatter
    mpirun -np 8 -mem 8GB file

Requesting Job Resources – Submitting MPI Jobs, cont.
Method 2
• Request: 4-way MPI job with 2 CPUs and 2GB memory per MPI task; request up to 4 hosts, where each host has 4 CPUs and 4 GB memory
    qsub -l select=4:ncpus=2:mem=2GB -l place=free
• Variable $PBS_NODEFILE contains list of vnodes
    VnodeA
    VnodeB
• Sample of an MPI job script
    #!/bin/bash
    #PBS -l select=4:mem=2GB:mpiprocs=2
    #PBS -l place=free
    mpirun -np 8 -mem 8GB file

Requesting Job Resources - Boolean Resources
A resource that can be requested as true or false
To request chunks that have the resource ‘optistruct’, the qsub request line would be:
    qsub -l select=1:ncpus=1:optistruct=true
The scheduler will only place this job on vnodes that have the resource “optistruct” set to “true”
If a boolean resource is requested as job-wide, e.g.:
    qsub -l select=1:ncpus=1 -l optistruct=true
PBS will check whether it is available at the server or queue level, not at the vnode/host level

Default Job Attributes
PBS includes default values for resources that the user doesn’t specify during job submission
The following are resource defaults assigned to a job:
• default_chunk.ncpus=1
• resources_default.ncpus=1
• resources_default.walltime=<5 years>
Note: Root and managers can specify additional default resources

Querying Jobs – Using “qstat”
To show the status of current PBS jobs
• Using the “qstat” command
    Usage:   qstat <-a, -n, -s, -1, -w>
    Example: qstat
    Job id            Name             User            Time Use S Queue
    ----------------  ---------------- --------------- -------- - -----
    6.traintb16       test_script      pbsuser01       00:00:00 R workq
    7.traintb16       jobA             pbsuser02       00:00:00 R workq
    8.traintb16       test_2           pbsuser04              0 Q workq
    9.traintb16       test_script      pbsuser01       00:00:00 R workq
Note: If a job was deleted or has completed, it can no longer be listed via qstat unless the PBS complex has enabled the job history functionality

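A common follow-up to a plain qstat listing is a quick count of jobs per state. The sketch below is one way to do that with awk; `count_job_states` is a hypothetical helper, and it assumes the six-column default layout shown above (two header lines, state in the fifth column) and job names without spaces.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state from default qstat output on stdin.
# Skips the two header lines; the state letter is the fifth field.
count_job_states() {
    awk 'NR > 2 && NF >= 6 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Demo using the sample listing from the slide.
count_job_states <<'EOF'
Job id            Name             User            Time Use S Queue
----------------  ---------------- --------------- -------- - -----
6.traintb16       test_script      pbsuser01       00:00:00 R workq
7.traintb16       jobA             pbsuser02       00:00:00 R workq
8.traintb16       test_2           pbsuser04              0 Q workq
9.traintb16       test_script      pbsuser01       00:00:00 R workq
EOF
```

On a live system the function would be used as `qstat | count_job_states`.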
Querying Jobs – Additional qstat Options
-a    job name, session id, # nodes req, # ncpus req, req’d mem, req’d time, and elapsed time
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    8.traintb16     pbsuser0 workq    test_scrip   6556   1   8    --    --  R 00:07
-s    same as option -a, but with comments
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    8.traintb16     pbsuser0 workq    test_scrip   5556   1   8    --    --  R 00:07
       Job run at Wed Jul 05 at 14:48 on (traintb16:ncpus=8)
-n    same as option -a, but indicates which execution vnode(s) the job is running on
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    8.traintb16     pbsuser0 workq    test_scrip   5556   1   8    --    --  R 00:07
       traintb16/0
Note:
- Adding the option “-1” will output each entry on a single line instead of wrapping around
- Using “-w” shows the full output of individual fields

Querying Jobs – Job States
    State   Description
    Q       Job is queued waiting for execution
    R       Job is running
    S       Job is suspended
    E       Job is exiting after execution
    H       Job is held or put on hold
    W       Job is waiting for its requested execution time or has been delayed 30 minutes because stage-in failed
    T       Job in transition is being moved between states
    F       Jobs that have finished, regardless of whether they completed successfully
    M       Jobs that have moved to another PBS complex

Job Attributes - Viewing Job Attributes
To view job attributes that were assigned to a particular job, use the qstat command.
    Usage:   qstat -f <job_id>
    Example: qstat -f 2.trainhp01
    Job Id: 1.traintb16
        Job_Name = sleep_job
        Job_Owner = pbsuser01@traintb16.prog.altair.com
        resources_used.cpupercent = 0
        resources_used.cput = 00:00:00
        resources_used.mem = 1028kb
        resources_used.ncpus = 1
        resources_used.vmem = 18440kb
        resources_used.walltime = 00:00:00
        job_state = R
        queue = workq
        server = traintb16
        Checkpoint = u
        ctime = Tue May 5 17:49:09 2010
        Error_Path = traintb16.prog.altair.com:/home/pbsuser01/boo/sleep_job.e1
        exec_host = traintb16/0
        exec_vnode = (traintb16:ncpus=1)
        Hold_Types = n
        Join_Path = n
        Keep_Files = n
        Mail_Points = a
        mtime = Tue May 5 17:49:09 2010
        Output_Path = traintb16.prog.altair.com:/home/pbsuser01/boo/sleep_job.o1
        Priority = 0
        qtime = Tue May 5 17:49:09 2010
        Rerunable = True

Job Attributes - Viewing Job Attributes, cont.
        Resource_List.ncpus = 1
        Resource_List.nodect = 1
        Resource_List.place = pack
        Resource_List.select = 1:ncpus=1
        stime = Tue May 5 17:49:11 2010
        session_id = 11535
        jobdir = /home/pbsuser01
        substate = 42
        Variable_List = PBS_O_HOME=/home/pbsuser01,PBS_O_LANG=en_US.UTF-8,
            PBS_O_LOGNAME=pbsuser01,
            PBS_O_PATH=/home/pbsuser01/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/pbs/default/bin:/opt/pbs/default/sbin,
            PBS_O_MAIL=/var/spool/mail/pbsuser01,PBS_O_SHELL=/bin/bash,
            PBS_O_HOST=traintb16.prog.altair.com,
            PBS_O_WORKDIR=/home/pbsuser01/boo,PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq
        comment = Job run at Tue May 05 at 17:49 on (traintb16:ncpus=1)
        etime = Tue May 5 17:49:09 2010
        Submit_arguments = -l select=1:ncpus=1 my_script
Note: Running as root or PBS Manager will output additional information

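Because qstat -f prints one `name = value` pair per attribute, a single attribute can be pulled out with a short filter. The following is a minimal sketch; `job_attr` is a hypothetical helper, and it assumes the value fits on one line (long values such as Variable_List wrap across several lines and would be truncated to the first one).

```shell
#!/bin/bash
# Hypothetical helper: print the value of one attribute from
# "qstat -f" style output supplied on stdin.
job_attr() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)      # drop leading indentation
        }
        # match only lines that begin with "<name> = "
        index(line, name " = ") == 1 {
            print substr(line, length(name) + 4)
            exit
        }
    '
}

# Demo with a few lines taken from the sample output above.
sample_output='Job Id: 1.traintb16
    Job_Name = sleep_job
    resources_used.ncpus = 1
    job_state = R
    exec_host = traintb16/0
    Resource_List.select = 1:ncpus=1'

printf '%s\n' "$sample_output" | job_attr job_state
printf '%s\n' "$sample_output" | job_attr exec_host
```

On a live system this would be used as `qstat -f <job_id> | job_attr job_state`.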
Querying Jobs – Using “tracejob”
Using tracejob to obtain comprehensive information about a job
• Using the “tracejob” command
    Usage:   tracejob -n<days> <job id>
    Example: tracejob -n4 0.traintb16

    Job: 0.traintb16

    05/05/2010 17:43:35  S  enqueuing into workq, state 1 hop 1
    05/05/2010 17:43:35  S  Job Queued at request of pbsuser01@traintb16.prog.altair.com, owner = pbsuser01@traintb16.prog.altair.com, job name = sleep_job, queue = workq
    05/05/2010 17:45:08  L  Considering job to run
    05/05/2010 17:45:08  S  Job Run at request of Scheduler@traintb16.prog.altair.com on exec_vnode (traintb16:ncpus=1)
    05/05/2010 17:45:08  M  Started, pid = 11491
    05/05/2010 17:45:10  S  Job Modified at request of Scheduler@traintb16.prog.altair.com
    05/05/2010 17:45:10  L  Job run
    05/05/2010 17:45:14  M  task 00000001 terminated
    05/05/2010 17:45:14  M  Terminated
    05/05/2010 17:45:15  S  Obit received momhop:1 serverhop:1 state:4 substate:42
    05/05/2010 17:45:15  S  Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3056kb resources_used.ncpus=1 resources_used.vmem=39392kb resources_used.walltime=00:00:07
    05/05/2010 17:45:15  M  task 00000001 cput= 0:00:00
    05/05/2010 17:45:15  M  traintb16 cput= 0:00:00 mem=3056kb
    05/05/2010 17:45:15  M  Obit sent
    05/05/2010 17:45:15  M  copy file request received
    05/05/2010 17:45:15  M  staged 2 items out over 0:00:00
    05/05/2010 17:45:15  M  delete job request received
    05/05/2010 17:45:15  S  dequeuing from workq, state 5
    05/05/2010 17:45:15  M  kill_job
    05/05/2010 17:45:15  M  work proc outstanding

    S = Server    L = Scheduler    M = MOM
Note: Information is taken from the server logs, scheduler logs, and MOM logs (local to that machine) for the past 24 hrs

Querying Jobs – Deleting Jobs
To delete jobs that are listed under qstat
• Using the “qdel” command
    Usage:   qdel <job id>
    Example: qdel 0.traintb16
To delete a job from the server regardless of the job’s state
    Usage:   qdel -W force <job id>
    Example: qdel -W force 0.traintb16
Note: Users can delete only their own jobs, unless the user’s name is in the managers list

Querying Jobs – Finished Job History
To view only jobs that have been deleted, moved, or finished
• qstat -H
    Job id            Name             User             Time Use S Queue
    ----------------  ---------------- ---------------- -------- - -----
    80.traintb16      sleep5           pbsuser01        00:00:00 F workq
    81.traintb16      sleep5           pbsuser01        00:00:00 F workq
    82.traintb16      sleep5           pbsuser01        00:00:00 F workq
    83.traintb16      sleep5           pbsuser01        00:00:00 F workq
To view all jobs, regardless of state
• qstat -x
    Job id            Name             User             Time Use S Queue
    ----------------  ---------------- ---------------- -------- - -----
    80.traintb16      sleep5           pbsuser01        00:00:00 F workq
    81.traintb16      sleep5           pbsuser01        00:00:00 F workq
    82.traintb16      sleep5           pbsuser01        00:00:00 F workq
    83.traintb16      sleep5           pbsuser01        00:00:00 F workq
    84.traintb16      sleep5           pbsuser01               0 Q workq
    85.traintb16      sleep5           pbsuser01        00:00:00 R workq
Note: The PBS Server attribute job_history_enable needs to be set in order to use these options

Querying Jobs – Estimated Start Time/Start Order

PBS can estimate the start time and start order of jobs using the qstat -T option
• New column: Est Start Time
• Job IDs are displayed in the order of estimated start time

$ qstat -T

traintb16:
                                                                    Est
                                                     Req'd  Req'd   Start
Job ID          Username  Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------- -------- ---------- ------ --- --- ------ ----- - -----
159.traintb16   pbsuser01 workq    STDIN        4302   1   2     -- 00:05 R    --
164.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 13:36
165.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 13:36
160.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 14:41
161.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 14:41

Note: The sorted job ids are NOT determined by the PBS Scheduler.

Querying Jobs – Re-Queuing Jobs

To re-queue a running job
• Using the “qrerun” command
  Usage:   qrerun <job id>
  Example: qrerun 0.traintb16

To re-queue a job even if that job’s execution host is not reachable
  Usage:   qrerun -W force <job id>
  Example: qrerun -W force 0.traintb16

Note: Only root or managers can perform this operation.

Job Exit Codes

The exit code from a batch job is a standard Unix termination status, the same sort of number you get in a shell script from checking the "$?" variable after executing a command.

• Typically, exit code 0 (zero) means successful completion.
• Codes 1-127 are typically generated by the job itself calling exit() with a non-zero value to terminate itself and indicate an error.
• Exit codes in the range 129-255 represent jobs terminated by Unix "signals". Each type of signal has a number, and what is reported as the job exit code is the signal number plus 128. Signals can arise from within the process itself (as for SEGV) or be sent to the process by some external agent (such as the batch control system). The specific meaning of each signal number is platform-dependent.
• Exit codes < 0 are set by PBS.

Job Exit Codes, cont.

  #   Name                         Description
  0   JOB_EXEC_OK                  Job execution was successful
 -1   JOB_EXEC_FAIL1               Job execution failed, before files, no retry
 -2   JOB_EXEC_FAIL2               Job execution failed, after files, no retry
 -3   JOB_EXEC_RETRY               Job execution failed, do retry
 -4   JOB_EXEC_INITABT             Job aborted on MOM initialization
 -5   JOB_EXEC_INITRST             Job aborted on MOM init, checkpoint, no migrate
 -6   JOB_EXEC_INITRMG             Job aborted on MOM init, checkpoint, ok migrate
 -7   JOB_EXEC_BADREST             Job restart failed
 -8   JOB_EXEC_GLOBUS_INIT_RETRY   Initialization of globus job failed, do retry
 -9   JOB_EXEC_GLOBUS_INIT_FAIL    Initialization of globus job failed, no retry
-10   JOB_EXEC_FAILUID             Invalid UID/GID for job
-11   JOB_EXEC_RERUN               Job rerun
-12   JOB_EXEC_CHKP                Job was checkpointed and killed
-13   JOB_EXEC_FAIL_PASSWORD       Job failed due to a bad password
-14   JOB_EXEC_RERUN_ON_SIS_FAIL   Job was re-queued or deleted due to communication failure between the head node and a sister node

PBS Accounting Records – Log Information

PBS accounting logs contain information about job statistics such as:
• Owner, queue, start time, end time, execution host, resources requested, exit status, and resources used

Accounting logs are stored on the machine where the pbs_server daemon is running
• Location: $PBS_HOME/server_priv/accounting
• A new log file is created every day; file name format: [YYYYMMDD]

The accounting logs are accessible only by root.

The accounting logs can be parsed by the “pbs-report” script.
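The exit status recorded in an accounting record can be read against the exit-code ranges described under Job Exit Codes. The following is a minimal sketch: both function names are hypothetical, the record is an abbreviated form of an accounting entry used in this course, and the value 128, which the ranges in this material do not cover, is simply grouped with job-generated errors here.

```shell
#!/bin/sh
# exit_status_of_record: hypothetical helper.
# Prints the value of the Exit_status=<n> field of one accounting record.
exit_status_of_record() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^Exit_status=//p'
}

# classify_exit_status: hypothetical helper applying the ranges from the
# Job Exit Codes slides. 128 is not covered there; it falls into the last
# branch as an assumption of this sketch.
classify_exit_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -lt 0 ]; then
        echo "set by PBS"
    elif [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
    else
        echo "job exited with error $code"
    fi
}

# Abbreviated accounting record (fields omitted for brevity).
record='05/05/2010 17:45:15;E;0.traintb16;user=pbsuser01 queue=workq Exit_status=0 resources_used.walltime=00:00:07'

classify_exit_status "$(exit_status_of_record "$record")"
```

Negative values can then be looked up by number in the Job Exit Codes table, for example -3 is JOB_EXEC_RETRY.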
PBS Accounting Records – Details of Accounting Log Entry

Sample of accounting log entry:

05/05/2010 17:45:15;E;0.traintb16;user=pbsuser01 group=users jobname=sleep_job queue=workq ctime=1241559815 qtime=1241559815 etime=1241559815 start=1241559910 exec_host=traintb16/0 exec_vnode=(traintb16:ncpus=1) Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:ncpus=1 session=11491 end=1241559915 Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3056kb resources_used.ncpus=1 resources_used.vmem=39392kb resources_used.walltime=00:00:07

Syntax: date-time;record_type;id_string;message_text

date-time      Date and time stamp. Format: mm/dd/yyyy hh:mm:ss
record_type    Single character indicating type of record
id_string      Job, reservation or reservation-job identifier
message_text   Contains detailed information for the job or reservation

Using Accounting Records: Using “pbs-report”

To parse information from the accounting logs, use the pbs-report script located in the $PBS_EXEC/sbin directory.

Information obtained from pbs-report helps sites determine how much work was done by PBS jobs during a specified time period.

Sample output of pbs-report:

PBS Pro Cluster Accounting Summary Statistics
-----------------------------------------
Report from Thu Sept 15 2010 00:00:00 to Thu Sept 17 2010 12:13:32

              # of   Total      Total                Average
Username      jobs   CPU Time   Wall Time   Efcy.    Wait Time   Muda
------------  -----  ---------- ----------  -----    ----------  -----
TOTAL           132           0     618322  0.000          2108  0.000
pbsuser01       127           0     616328  0.000          2191  0.000
pbsuser02         5           0       1994  0.000             4  0.000
------------
Minimum           5           0       1994  0.000             4  0.000
Maximum         127           0     616328  0.000          2191  0.000
Mean             66           0     309161  0.000          1097  0.000
Deviation        61           0     307167  0.000          1093  0.000
Median            5           0       1994  0.000             4  0.000

Job Set Summary
                                                    Standard
               Minimum     Maximum     Mean         Deviation    Median
               ----------  ----------  ----------   ----------   ----------
CPU time                0           0           0            0            0
Wall time               0       78616        4684        17559           60
Wait time               0       67778        2108         2070            2
Suspend time            0           0           0            0            0

Note: All times displayed in seconds.

Moving Jobs Between Queues

Users can move jobs from one queue to another queue by using the qmove command
• Using the “qmove” command
  Usage:   qmove <new_queue> <job_id>
  Example: qmove small_queue 0.traintb16

Jobs can also be moved to another PBS complex
  Example: qmove small_queue@traintb02 0.traintb16

Note:
• Running or suspended jobs cannot be moved
• Use qstat -H to check whether a job was moved
• The full job ID (job_id.server) must be specified for qstat

Holding and Releasing Jobs

Users can put a hold on their jobs, so that PBS will not schedule them for execution
• Using the “qhold” command
  Usage:   qhold <job_id>
  Example: qhold 0.traintb16

To release a held job, allowing PBS to consider it for execution:
• Using the “qrls” command
  Usage:   qrls <job_id>
  Example: qrls 0.traintb16

Deferring Job Execution

Users can specify a date/time at which their job becomes eligible for execution
  Usage:   qsub -a date_time
           date_time: [[[[CC]YY]MM]DD]hhmm[.SS]
  Example: qsub -a 201008281645 my_script

Note: Deferred jobs will be marked with the “W” (wait) state.

Specifying Email Notifications

Users can specify what type of email notification they want, depending on job status. The default is to notify the user only when the job is aborted or terminated.

Using the qsub command with the following options, users can set their own notification:
  Usage:   qsub -m <a|b|e|n>
  Example: qsub -m abe

Option   Description
a        job is aborted (default)
b        job has begun
e        job has finished execution
n        do not send any email

Chapter Five - Site Specific Configurations

Chapter Five
• Preserving job history
• Prologue/epilogue scripts
• PBS redundancy and failover

Preserving Job History - Concept

By default, once a job has been de-queued from a PBS complex, the job’s history is not retrievable using qstat.

To enable the job history feature, use qmgr:
  Qmgr: set server job_history_enable = True
• preserves job attributes
• preserves job resources requested and used

The default preservation time frame is 14 days. To change it:
  Qmgr: set server job_history_duration = <time>
• <time>: [[hours:]minutes:]seconds[.milliseconds]

To view job history:
• View all job ids, past and present:                       qstat -x |f|a|n|s
• View only jobs that were finished, moved, or deleted:     qstat -H |f|n|s
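Because the history duration is written as hours:minutes:seconds rather than in days, a small conversion avoids arithmetic slips. This sketch only prints the qmgr directives; the function names are hypothetical, and it assumes whole days are wanted.

```shell
#!/bin/sh
# days_to_duration: hypothetical helper.
# Converts a whole number of days to the hours:minutes:seconds form
# accepted by job_history_duration (for example 14 days is 336:00:00).
days_to_duration() {
    printf '%d:00:00\n' "$(( $1 * 24 ))"
}

# job_history_commands: hypothetical helper.
# Prints the two qmgr directives from the Preserving Job History slide,
# with the duration filled in for $1 days.
job_history_commands() {
    echo "set server job_history_enable = True"
    echo "set server job_history_duration = $(days_to_duration "$1")"
}

# Print the directives for a 30-day history window.
job_history_commands 30
```

A manager could review the printed directives and then feed them to qmgr on standard input, for example: job_history_commands 30 | qmgr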
Prologue & Epilogue Scripts

Sites can be set up to run custom scripts before jobs are executed and after each job has finished or been terminated
• These scripts can perform tasks such as network file staging for site-specific applications, file cleanup after a job has completed, or writing additional information to the user’s job output after completion
• These scripts are known as:
  • Prologue: script executed on the primary execution host before the job is run
    Located in: $PBS_HOME/mom_priv/prologue
  • Epilogue: script executed on the primary execution host after the job is run
    Located in: $PBS_HOME/mom_priv/epilogue
• Each execution host has its own prologue or epilogue script
  • Only runs on the primary execution host of a multinode job
  • Runs as root
• A timeout period can be set in $PBS_HOME/mom_priv/config:
  $prologalarm <seconds>

Prologue & Epilogue Scripts – Sequence of Events

Start of a Job (Prologue)
1. Licenses are obtained
2. Files are staged in if needed
3. $TMPDIR is created
4. The prologue script is executed
5. The PBS job script is executed

End of a Job (Epilogue)
1. The PBS job script finishes
2. The job’s cpusets are destroyed
3. The epilogue script is run
4. The obit is sent to the PBS server
5. Any file stageout takes place, including STDOUT and STDERR
6. Files staged in or out are removed
7. PBS job files are deleted
8. FLEX licenses are returned to the pool
Prologue & Epilogue Scripts – Sample Prologue Script

Prologue Script – reordering the vnodes in the PBS_NODEFILE

    #!/bin/bash
    PBS_NODEFILE="/var/spool/PBS/aux/$1"
    lines=`cat $PBS_NODEFILE | wc -l`
    nodes=`cat $PBS_NODEFILE | uniq`
    nodect=`echo $nodes | wc -w`
    loops=$(expr $lines / $nodect)
    for (( times = 0; times < $loops; times++ )); do
        nodefile=$nodefile$nodes" "
    done
    echo $nodefile | tr " " "\n" > $PBS_NODEFILE

Prologue & Epilogue Scripts – Sample Epilogue Script

Epilogue Script – cleaning up files/directories

    #!/bin/sh
    #
    # $Id: epilogue,v 3.3 2006/07/27 20:48:36
    #
    # $1 = job id
    # $2 = user name
    # $3 = group name
    # $4 = job name
    # $5 = session id
    # $6 = requested resource limits
    # $7 = resources used
    # $8 = queue name
    # $9 = account string
    # $10 = exit code from job

    UNIX95=XPG4; export UNIX95

    jobid=$1
    jobname=$4
    user=$2
    sid=$5

    if [ -z "$jobid" -o -z "$jobname" -o -z "$user" ]; then
        echo "`basename $0`: No arguments: exiting."
        exit 1
    fi

    # Defining a marker for utilization later.
    state=/tmp/cleanup${jobid}

    # Define the source location
    src=/scratch/`hostname`/$user/$jobname-`echo $jobid | cut -d. -f1`

    if [ -d $src -a ! -f $state ]; then
        touch $state
        if [ -x $src/pbs-cleanup ]; then
            if [ `whoami` != $user ]; then
                su - $user -c "$src/pbs-cleanup"
            else
                $src/pbs-cleanup
            fi
        fi
        if [ $? -eq 0 ]; then
            cd /
            rm -rf $src
            rmdir `dirname $src` 2>/dev/null
        fi
        rm -f $state
    fi

    until [ ! -f $state ]; do
        sleep 5
    done

    exit 0
PBS Redundancy and Failover - Concept

PBS provides the capability for a backup PBS Server to assume the workload of a failed Primary Server
• Primary Server - is the main PBS server
• Secondary Server - is usually inactive, but starts up when the primary fails

Requirements for a PBS failover configuration:
• Primary and secondary servers must run on two separate host machines
• Both servers and all the execution hosts must have the same PBS version
• Both servers must be the same architecture (same binary)
• Both servers must be able to communicate with each other and with all the execution hosts
• The primary and secondary servers must share the same PBS_HOME directory
• The PBS_HOME directory should be on a file system that is not local to either of the server hosts
• Root/administrator must have full read/write access to PBS_HOME

Note: It is not advisable to have a MOM running on either server host.

PBS Redundancy and Failover – Setting Up

Configuring Failover on the Primary Server
1. Install PBS on the primary server’s host
2. Check whether PBS is able to run jobs on execution hosts
3. If the test passes, move the $PBS_HOME directory to a shared file system
4. Check whether PBS is able to run jobs on execution hosts using the new directory
5. If the test passes, shut down the pbs_server and pbs_sched daemons
6. Configure the /etc/pbs.conf file to include the following settings:
   PBS_PRIMARY=<primary_host>
   PBS_SECONDARY=<secondary_host>
   PBS_SERVER=<short name for primary host>
7. Configure the primary server to run the scheduler but no MOM:
   PBS_START_MOM=0
   PBS_START_SCHED=1
8. Start the PBS daemons by executing:
   /etc/init.d/pbs start

PBS Redundancy and Failover – Setting Up, cont.

Configuring Failover on the Secondary Server
1. Install PBS on the secondary server’s host
2. Mount the $PBS_HOME directory from the same shared file system on which the primary’s $PBS_HOME resides
3. Configure the /etc/pbs.conf file to include the following settings:
   PBS_PRIMARY=<primary_host>
   PBS_SECONDARY=<secondary_host>
   PBS_SERVER=<short name for primary host>
4. Since only one instance of the PBS scheduler can be running, only the primary server is configured to run it; the secondary is configured not to start it:
   PBS_START_MOM=0
   PBS_START_SCHED=0
5. Start the PBS daemons by executing:
   /etc/init.d/pbs start

PBS Redundancy and Failover – Setting Up, cont.

Configuring Failover on Execution and Client Hosts
1. Install PBS on each execution host
2. On each execution host, configure the /etc/pbs.conf file to include the following parameters:
   PBS_PRIMARY=<primary_host>
   PBS_SECONDARY=<secondary_host>
   PBS_SERVER=<short name for primary host>
3. Install the client commands on each client host
4. On each client host, configure the /etc/pbs.conf file to include the following parameters:
   PBS_PRIMARY=<primary_host>
   PBS_SECONDARY=<secondary_host>
   PBS_SERVER=<short name for primary host>

PBS Redundancy and Failover – Behavior

What type of communication occurs between the primary and secondary servers when the daemons are running?
• The secondary server periodically attempts to connect to the primary server
• The primary server sends a “handshake” every few seconds to the secondary server
• Running “qstat -Bf” shows which of the two servers is active; look at the “server_host” line

What happens when the secondary server becomes active?
• PBS sends an email, from the account defined in the server’s “mail_from” attribute, stating that a failover has occurred
• The secondary tries to communicate with the primary’s scheduler
  • If it cannot communicate, the secondary server launches its own scheduler process
• The secondary server informs all the PBS MOMs that it is the active server

How does a failover impact PBS users?
• Users will not notice when a failover occurs
• When a user runs a PBS command such as qstat, the command tries to connect to the primary server first. If that fails, it tries the secondary server.
• If the secondary responds to the command, a local file is created so that this process is not repeated every time that user issues a PBS command
• This file is removed after the primary becomes active again

Chapter Six: Limiting Resource Usage

Chapter Six
• Concept
• Terminology
• Attributes
• Users
• Groups

Resource Usage: Concept

PBS allows sites to set up separate resource limits for individual users or groups, for generic users or groups, and for the total used by all users.

Different kinds of resource limits can be set:
• total number of jobs that can run in a PBS complex
• total number of jobs a single user can run (named or generic)
• total number of jobs a group can run (named or generic)
• maximum amount of a resource that a user can request per job
• maximum amount of a resource that a group can request per job
• total number of jobs that can be queued
• total number of jobs that a user can have in a queue
• total number of jobs that a group can have in a queue

Limit attributes are set within the qmgr utility
• at server level
• at queue level

PBS managers and operators can set limit attributes.
Resource Usage: Terminology

User Limits          Description                                                                 limit-spec
All users            A limit for the total amount of resources allocated to all users combined   o:PBS_ALL
Generic users        A limit for any single user                                                 u:PBS_GENERIC
An individual user   A limit for a named user                                                    u:<username>

Group Limits         Description                                                                 limit-spec
Generic groups       A limit for any group                                                       g:PBS_GENERIC
An individual group  A limit for a named group                                                   g:<groupname>

Note: <limit-spec> is case-sensitive.

Resource Usage: Attributes

Resource limit attributes

<limit attribute>             Description
max_run                       Maximum number of jobs allowed to be running
max_run_soft                  Soft limit on the number of jobs allowed to be running
max_run_res.<resource>        Maximum amount of the specified resource that can be allocated to running jobs
max_run_res_soft.<resource>   Soft limit on the amount of the specified resource that can be allocated to running jobs
max_queued                    Maximum number of jobs allowed in a queue
max_queued_res.<resource>     Total amount of the specified resource that can be allocated to queued or running jobs

Syntax
• Server level
  set server <limit_attribute> += "[<limit-spec>=<value>]"
• Queue level
  set queue <queue_name> <limit_attribute> += "[<limit-spec>=<value>]"
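The limit syntax is easy to mistype because of the nested brackets and quotes. As a sketch only, the hypothetical helpers below assemble the directive text from its parts so it can be checked before it is entered in qmgr; nothing is sent to the server.

```shell
#!/bin/sh
# limit_spec: hypothetical helper.
# $1 = entity type (o, u or g), $2 = entity name, $3 = limit value.
# Prints the bracketed "[<limit-spec>=<value>]" part.
limit_spec() {
    printf '[%s:%s=%s]\n' "$1" "$2" "$3"
}

# server_limit_cmd: hypothetical helper.
# $1 = limit attribute, then the three limit_spec arguments.
server_limit_cmd() {
    printf 'set server %s += "%s"\n' "$1" "$(limit_spec "$2" "$3" "$4")"
}

# queue_limit_cmd: hypothetical helper.
# $1 = queue name, $2 = limit attribute, then the three limit_spec arguments.
queue_limit_cmd() {
    printf 'set queue %s %s += "%s"\n' "$1" "$2" "$(limit_spec "$3" "$4" "$5")"
}

# Print a server-level and a queue-level directive.
server_limit_cmd max_run u pbsuser01 4
queue_limit_cmd workq max_queued u PBS_GENERIC 10
```

Since <limit-spec> is case-sensitive, PBS_ALL and PBS_GENERIC must be passed exactly as written.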
Resource Usage: Users

Limit the total number of running jobs for all users within a PBS complex to 4 jobs
• set server max_run = "[o:PBS_ALL=4]"

Limit the number of running jobs for each (generic) user to 4 jobs
• set server max_run = "[u:PBS_GENERIC=4]"

Limit the number of running jobs for user “pbsuser01” to 4 jobs
• set server max_run += "[u:pbsuser01=4]"

Limit the TOTAL number of running jobs for all users to 7, but allow user “pbsuser01” to run 5
• set server max_run += "[o:PBS_ALL=7], [u:pbsuser01=5]"

Limit generic users to 3 running jobs, user “pbsuser01” to 2, and user “pbsuser02” to 5
• set server max_run += "[u:PBS_GENERIC=3], [u:pbsuser01=2], [u:pbsuser02=5]"

Resource Usage – Groups

Limit the number of running jobs for any single group within a PBS complex to 4 jobs
• set server max_run = "[g:PBS_GENERIC=4]"

Limit the number of running jobs for the named group “opti” to 4 jobs
• set server max_run += "[g:opti=4]"

Chapter Seven - Job Attributes & Selective Query

Chapter Seven
• Altering requested job resources
• Handling output and error files
• Job’s staging and execution directory
• File staging
• Sending messages to PBS jobs
• Sending signals to PBS jobs
• Selective job querying
• Job dependencies
• Moving jobs between queues
• Holding and releasing jobs
• Deferring job execution
• Specifying email notifications
• Exercises

Altering Requested Job Resources – Using “qalter”

A job’s requested resources can be changed after the job has been submitted
• Using the “qalter” command:
  Usage:   qalter -l <resource_name>=<new_value> <job_id>
  Example: qalter -l select=1:ncpus=3 0.traintb16

Can a job’s requested resources be altered once that job has started execution?
• Yes, but only certain types of resources

Resource   Before Execution   After Execution
cputime    YES                YES - smaller amount
walltime   YES                YES
ncpus      YES                NO
memory     YES                NO

Note: Managers and Operators can grant more resources even if the job has started.

Handling Output and Error Files

Users can control how their output and error files are handled when their jobs complete
• Can be defined on the qsub command line or as a PBS directive
• By default files are copied using rcp; scp can be configured

Option #1: Specifying the path/filename of STDOUT/STDERR
• -o <path><filename>
• -e <path><filename>

Option #2: Where to retain STDOUT/STDERR files

Option   Description
-k e     STDERR is retained in the job’s staging/execution directory
-k o     STDOUT is retained in the job’s staging/execution directory
-k oe    Both files are retained in the job’s staging/execution directory
-k n     Neither file is retained

Note:
• Option #1 and Option #2 cannot be mixed together
• If the .O and .E files cannot be copied back, they are retained on the execution host in the directory $PBS_HOME/undelivered

Job’s Staging and Execution Directory

The default job staging and execution directory is the user’s home directory
  jobdir = <user’s home directory>

An alternative is to have PBS create a unique directory for each job; this is done by using the sandbox attribute
  Usage: qsub -W sandbox=<HOME|PRIVATE>
  Where:
    HOME      user’s home directory; default
    PRIVATE   PBS will create a job-specific directory
• The PRIVATE directory name has the form:
    pbs.<job_id.server_name>.<id_string>
  Example: pbs.21.traintb16.x8z
Job’s Staging and Execution Directory, cont.

If using sandbox=PRIVATE:
• jobdir = /home/pbsuser01/pbs.17.traintb16.x8z
• The .O and .E files are copied to the directory from which qsub was run
• After the job is completed, the PRIVATE ($jobdir) directory is deleted

If using sandbox=PRIVATE with the -k oe option:
• jobdir = /home/pbsuser01/pbs.17.traintb16.x8z
• The .O and .E files remain in the $jobdir directory
• After the job is completed, the PRIVATE ($jobdir) directory is deleted, including the .O and .E files

File Staging

Input/Output File Staging
• Users can specify which files/directories are copied onto the execution host before their job executes. This is known as STAGE IN.
• Users can specify which files/directories are returned to the submission host or a specified directory after the job completes. This is known as STAGE OUT.
• After a job is completed, all stage-in and stage-out files are removed.

Command line input argument:
  qsub -W stagein=<remote_path/file@server_name>:<local_path/file>
  qsub -W stageout=<local_path/file>:<remote_path/file@server_name>

PBS Directive:
  #PBS -W stagein=<remote_path/file@server_name>:<local_path/file>
  #PBS -W stageout=<local_path/file>:<remote_path/file@server_name>

Note: By default PBS uses RCP for file copying; SCP can be used instead. Walltime is not charged while files are staged in and out.

Sending Messages to PBS Jobs – Using “qmsg”

String messages can be sent to a job’s output (.O) or error (.E) file

Why?
• To have external events recorded in the job’s files
• Useful for administrators to notify a job that system events occurred where that job was running

• Using the “qmsg” command:
  Output file: qmsg -O "<msg>" <job_id>
  Error file:  qmsg -E "<msg>" <job_id>

Note: If neither the “O” nor the “E” flag is specified, the message is sent to the error file.

Sending Signals to PBS Jobs - Concept

Why send a signal?
• To force a program to take a specific action

Most commonly used signals:

Signal    Description
SIGHUP    Hangs up the program process
SIGTERM   Terminates the program process
SIGINT    Interrupts the program process
SIGKILL   Kills now regardless of the state of the program
suspend   Suspends a job process
resume    Resumes a job process

Sending Signals to PBS Jobs – Using “qsig”

Sending a signal
• Using the “qsig” command
  Usage:   qsig -s <signal> <job_id>
  Example: qsig -s suspend 0.traintb16
           qsig -s resume 0.traintb16

Note: Here, <signal> can be either the name of the signal or its corresponding unsigned number.

Selective Job Querying – Using “qselect”

Using qstat will output the status of all current jobs. The qselect command returns a list of job IDs that meet specific criteria.

Usage: qselect -<option> <value>

Option                                    Description
-N <name>                                 Job name
-q <queue>                                Queue name
-s <job state>                            Job states R, Q, etc.
-u <user name>                            User name
-H                                        Finished or moved jobs
-l <res.OP.value>                         By resources
-t <.sub_option.time_attribute.value>     By certain time type

OP     Description
.eq.   equal to
.ne.   not equal to
.ge.   greater than or equal to
.gt.   greater than
.le.   less than or equal to
.lt.   less than

sub_option   time_attribute   Description
a            Execution_Time   time job began execution
c            ctime            job creation time
e            etime            job end time
g            eligible_time    accrued eligible time
m            mtime            modification time
q            qtime            job queued time
s            stime            job start time

Selective Job Querying – Using “qselect”, cont.
Examples:
• To find job IDs of jobs belonging to a particular user:
  qselect -u user01
• To find job IDs of running jobs that have requested more than 4 ncpus:
  qselect -s R -l ncpus.gt.4
• To query jobs that are currently in the run state, wrapped in qstat:
  qstat `qselect -s R`
• To delete all jobs in a PBS complex, wrapped in qdel:
  qdel `qselect`
• To list all jobs in a PBS complex, including finished or moved jobs:
  qselect -x
• To list jobs whose start time falls within a time range:
  qselect -ts.gt.09251200 -ts.lt.09251500

Note: Using qselect without any options outputs all job IDs.

Job Dependencies - Concept

Users can specify dependencies between their jobs, such as:
• Specify the order of execution
• Execute the next job only if the previous job finished
• Place jobs on hold until a particular job starts or completes

Using the “qsub” command
  Usage:   qsub -W depend=<type>:<arg_list> <job_script>
  Example: qsub -W depend=afterok:1.traintb16 my_script

To find out if a job has dependencies:
  qstat -f <jobid>
    job_state = H
    depend: afterok:1.traintb16@prog.altair.com

Note: Jobs that request a dependency will be placed in the “H” state.

Job Dependencies – Dependency Types

Dependency Type          Description
after:<arg_list>         Job may be scheduled for execution after all jobs in <arg_list> have started execution
afterok:<arg_list>       Job may be scheduled for execution only after all jobs in <arg_list> have terminated with no errors
afternotok:<arg_list>    Job may be scheduled for execution only after all jobs in <arg_list> have terminated with errors
afterany:<arg_list>      Job may be scheduled for execution after all jobs in <arg_list> have terminated, with or without errors
before:<arg_list>        Jobs in <arg_list> may begin execution once this job has begun execution
beforeok:<arg_list>      Jobs in <arg_list> may begin execution once this job terminates without errors
beforenotok:<arg_list>   Jobs in <arg_list> may begin execution once this job terminates execution with errors
beforeany:<arg_list>     Jobs in <arg_list> may begin execution once this job terminates execution, with or without errors
on:<count>               Job may be scheduled for execution after <count> dependencies on other jobs have been satisfied

Chapter Eight - PBS Server & Site Configurations

Chapter Eight
• Viewing and setting server, queue, and vnode attributes
• Server log information
• Creating a backup of the PBS environment
• Exercises

Viewing PBS Server Configuration – Using “qmgr”

PBS Administrators can use the PBS utility “$PBS_EXEC/bin/qmgr” to view and modify PBS server, queue and vnode attributes.
• The qmgr command prints out the commands to re-create server and queue settings. The values shown below are the defaults.
Default queue settings:
  create queue workq
  set queue workq queue_type = Execution
  set queue workq enabled = True
  set queue workq started = True

Default server settings:
  set server scheduling = True
  set server default_queue = workq
  set server log_events = 511
  set server mail_from = adm
  set server query_other_jobs = True
  set server resources_default.ncpus = 1
  set server default_chunk.ncpus = 1
  set server scheduler_iteration = 600
  set server resv_enable = True
  set server node_fail_requeue = 310
  set server max_array_size = 10000
  set server pbs_license_info = 7788@localhost
  set server pbs_license_min = 1
  set server pbs_license_max = 2147483647
  set server pbs_license_linger_time = 3600
  set server license_count = "Avail_Global:32 Avail_Local:1 Used:0 High_Use:0"
  set server eligible_time_enable = False
  set server max_concurrent_provision = 5

Qmgr Commands

Helpful qmgr commands
• List of qmgr commands and PBS version:
  qmgr: help
• Print out commands to re-create server/queue:
  qmgr: print server|queue @default
• Print server/queue attributes and their values:
  qmgr: list server|queue @default
• Print attributes and values of a specific queue:
  qmgr: list queue <queue_name>
• Print out commands to re-create named queue:
  qmgr: print queue <queue_name>
• To delete a queue:
  qmgr: delete queue <queue_name>
• Print out commands to re-create vnodes:
  qmgr: print nodes @default
• Print attributes and values of a specific vnode:
  qmgr: list node <node_name>
• To set the value of an attribute:
  qmgr: set server|queue|node <attribute>
• To unset the value of an attribute:
  qmgr: unset server|queue|node <attribute>
• To create a new queue or vnode:
  qmgr: create queue|node
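Because "print server" emits plain "set server <attribute> = <value>" lines, a single value can be pulled out of saved output with standard text tools. The sketch below assumes that line format; the helper name server_attr is hypothetical, and the sample lines are taken from the default settings shown in this course.

```shell
#!/bin/sh
# server_attr: hypothetical helper, not a qmgr subcommand.
# Reads "print server" style text on stdin and prints the value of the
# attribute named in $1. The name is used as a simple pattern, so a "."
# in it matches any character, which is harmless for this purpose.
server_attr() {
    sed -n "s/^set server $1 = //p"
}

# A few lines from the default server settings.
sample='set server scheduling = True
set server default_queue = workq
set server scheduler_iteration = 600
set server resources_default.ncpus = 1'

# Look up one attribute in the sample.
printf '%s\n' "$sample" | server_attr default_queue
```

Against a running server the input would come from qmgr itself, for example: qmgr -c "print server" | server_attr scheduler_iteration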
PBS Server – Understanding PBS Server Attributes

Setting server attributes allows PBS Administrators to specify who can submit jobs, how many jobs can be running, resource limits (min, max, available, and default), reservations, access control lists (ACLs), etc.

Three levels of privilege: User, Operator, and Manager. Managers have the greatest privilege.
• All users can list or print attributes
• Operators can additionally set or unset attribute values
• Managers can additionally create or delete queues and vnodes

The PBS server daemon must be running in order to execute the qmgr utility.

Any changes made to server attributes via qmgr go into effect as soon as they are entered; the pbs_server daemon does not need to be restarted.

PBS Server - Server Configuration Attributes

Attribute                  Description
scheduling                 Specifies whether or not the scheduler will schedule jobs. Values: T|F
default_queue              Queue to which jobs are sent when users don’t specify a target queue. This is set to ‘workq’ by the install script.
log_events                 Specifies which events are logged by the server
mail_from                  Username from which the server sends mail. Default: “adm”
query_other_jobs           Specifies whether users can query other users’ job stats. Values: T|F
resources_default.ncpus    Default value for ncpus assigned to a job if not requested at qsub
default_chunk.ncpus        Default value for ncpus per chunk
scheduler_iteration        Time between non-event-driven scheduling iterations
resv_enable                Enables/disables requesting reservations
node_fail_requeue          Time the server waits for the primary execution vnode to come back up before it re-queues or deletes the vnode’s jobs
max_array_size             Maximum number of subjobs allowed in a job array
eligible_time_enable       Controls whether a job’s eligible_time attribute is used as its starving time
max_concurrent_provision   The maximum number of vnodes allowed to be in the process of being provisioned

PBS Queues – Understanding Queues

PBS uses a resource-based scheduling system, where submitted jobs are held in a container waiting for execution. This container is known as a “queue”.

There are two types of queues: Execution and Route
• Execution queue – holds jobs that are waiting for execution or running
• Route queue – routes jobs to either an execution queue or another route queue

Queues can be set up with attributes such as:
• Number of jobs running
• Max queued
• Resources available
• Which users/groups/hosts have access

PBS comes with a predefined default execution queue: workq
PBS Queues – Attributes of an Execution Queue

PBS administrators use the PBS "qmgr" utility to view, modify, and delete queues.

To view the attributes of queue workq: list queue workq

Qmgr: list queue workq
  Queue workq                                     (name of queue)
    queue_type = Execution                        (type of queue)
    total_jobs = 0                                (number of jobs in queue)
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
                                                  (number of jobs in each state)
    enabled = True                                (whether queue accepts new jobs)
    started = True                                (whether queue's jobs can be run)

PBS Queues – Creating an Execution Queue

Only PBS Administrators can create and delete queues.

To print out the commands to recreate queue workq: print queue workq

Qmgr: print queue workq
  create queue workq                              (creation of a given queue)
  set queue workq queue_type = Execution          (indicates what type of queue)
  set queue workq enabled = True                  (True|False: jobs can be enqueued)
  set queue workq started = True                  (True|False: jobs can be scheduled for execution)

PBS Queues – Creating an Execution Queue, cont

Creating a new queue named "my_queue":
1. create queue my_queue
   Naming and creating the new queue
2. set queue my_queue queue_type = Execution
   Defining this queue as an Execution (or Route) queue
3. set queue my_queue enabled = True
   Setting the enabled attribute to True allows jobs to be enqueued
4. set queue my_queue started = True
   Setting the started attribute to True allows jobs to run from this queue
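The four queue-creation steps above can be sketched as a small generator of qmgr directives. This is a minimal illustration, not part of PBS: the function name `execution_queue_commands` is hypothetical, and the script only builds text, it does not talk to a server.

```python
def execution_queue_commands(name, enabled=True, started=True):
    """Return the qmgr directives that create an execution queue.

    Hypothetical helper: it only formats the four directives shown on the
    slide; nothing is sent to pbs_server.
    """
    return [
        f"create queue {name}",
        f"set queue {name} queue_type = Execution",
        f"set queue {name} enabled = {enabled}",
        f"set queue {name} started = {started}",
    ]


if __name__ == "__main__":
    # Print the directives, one per line.
    print("\n".join(execution_queue_commands("my_queue")))
```

The printed lines could be saved to a file and reviewed before being fed to qmgr on standard input by a Manager, in the same way a saved settings file is restored.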
PBS Queues - Execution Queue Attributes

Attribute                         Description
max_queuable                      Maximum number of jobs allowed in the queue
max_running                       Maximum number of jobs allowed to be running
resources_default.<res_name>      Default resource assigned to a job if that resource is not specified via the qsub command
resources_max.<res_name>          Maximum amount of a resource request for jobs that are allowed into this queue
resources_min.<res_name>          Minimum amount of a resource request for jobs that are allowed into this queue
resources_available.<res_name>    Maximum amount of a resource allowed to be used by all running jobs in this queue

PBS Queues - Why Use Multiple Execution Queues?

Why would a PBS complex have multiple queues instead of a single queue?
• Having multiple queues could help with the following:
  • Various types of applications
  • Access by different users, groups, or hosts
  • Long, medium, or short running jobs
  • Different architectures
  • Various resources
  • Assigning a dedicated queue to a host/vnode
  • Peering jobs to another PBS complex

PBS Queues – Setting Access Control on Queues

Queues can be configured so that only certain users, groups, or hosts can submit jobs to a particular queue.
• This functionality is called an Access Control List – "ACL"
• There are 3 types of access level <acl_type>:
    "user"   a list of users who are allowed to enqueue jobs
    "group"  a list of groups who are allowed to enqueue jobs
    "host"   a list of hosts that are allowed to enqueue jobs

To set an ACL on a queue:
1. Enable the ACL functionality for that queue:
   set queue <queue_name> acl_<acl_type>_enable = True
2. Assign the list of UNIX/Linux users, groups, or hosts that will have access:
   set queue <queue_name> acl_<acl_type>s += "<list of users, groups, or hosts>"
3. To restrict a user, use the minus operator symbol:
   set queue <queue_name> acl_<acl_type>s = "- <user>"

PBS Queues – Creating a Routing Queue

Routing queues route jobs to an execution queue or to another routing queue.
• How can a routing queue be beneficial?
  • Allows users to submit to one queue instead of specifying a destination queue at qsub
  • Destination queues can be set up by ACL or resource restrictions
  • Jobs can be routed to another PBS complex

To create a routing queue named "routeq":
1. create queue routeq
2. set queue routeq queue_type = Route
3. set queue routeq route_destinations += "my_queue"
   - List of execution or route queues to be routed to
   - Comma-separated
4. set queue routeq enabled = True
5. set queue routeq started = True

PBS Queues - Routing Queue Attributes

Routing queues may also be configured with queue attributes such as:
• route_lifetime
• max_queuable
• resources_max
• resources_min
• access control list (ACL)

To prevent users from submitting jobs directly to an execution queue (thus bypassing the route queue), you can set the following attribute on the execution queue:
Usage: set queue <queue_name> from_route_only = True

To assign multiple execution queues as "route_destinations":
Usage: set queue <queue_name> route_destinations += "queue1, queue2, queue3"
PBS Queues – Assigning Queue Priorities

Queues can be assigned a priority level between -1024 and +1023
• By default a new queue has a priority level set to 0
• Setting a non-default priority level serves two functions:
  1) The PBS Scheduler sorts the queues from high to low using this priority level for job sorting
  2) Enables the queue to be an Express Queue (by default, priority >= 150)
     • useful in determining which job to preempt when using Preemptive Scheduling

Usage:   set queue <queue_name> priority = <value>
Example: set queue my_queue priority = 100

PBS Queues - How Updates Affect Jobs

Any modifications made via the qmgr utility take effect immediately and do not require the pbs_server daemon to be restarted.

Certain types of attributes will affect jobs that are already queued but not yet running.

Using qmgr to delete a queue that has jobs enqueued or running is not allowed.

Alternative methods:
• Stop enqueuing jobs into the queue by setting enabled = False and let the queue drain
• If waiting for the queue to drain is not an option:
  - Option 1: use qdel to delete the jobs
  - Option 2: use qmove to move jobs to a different queue or to another PBS complex

PBS Queues – Queue Status

To obtain the status of all the queues within a PBS complex: qstat -Q[f]

Output of: qstat -Q
Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq                0     0 yes yes     0     0     0     0     0     0 Exec

Output of: qstat -Qf
Queue: workq
    queue_type = Execution
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
    resources_assigned.ncpus = 0
    resources_assigned.nodect = 0
    enabled = True
    started = True
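For scripted monitoring, the tabular `qstat -Q` listing shown above can be turned into one record per queue. The sketch below works on a pasted copy of the sample output rather than calling qstat; it assumes whitespace-separated columns with a header row and a dashed separator row, which may not hold for every PBS version. `parse_qstat_q` is a hypothetical name.

```python
SAMPLE = """\
Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq                0     0 yes yes     0     0     0     0     0     0 Exec
"""


def parse_qstat_q(text):
    """Parse qstat -Q style text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Skip the dashed separator row under the header.
        if set(ln.strip()) <= {"-", " "}:
            continue
        fields = ln.split()
        # Ignore rows that do not line up with the header.
        if len(fields) != len(header):
            continue
        rows.append(dict(zip(header, fields)))
    return rows


if __name__ == "__main__":
    for queue in parse_qstat_q(SAMPLE):
        print(queue["Queue"], "enabled:", queue["Ena"], "started:", queue["Str"])
```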
PBS Nodes - Understanding PBS Vnodes

What is a host?
• An instance of a single OS running
• A machine

What is a PBS MOM?
• Executes the job script
• Reports back to the server when the job is completed
• Enforces some job resource limits
• Can manage multiple vnodes
• Tracks job resource usage

What are vnodes?
• An abstract object representing a set of resources which form a usable part of a machine
  - Can be one of the following: host, nodeboard, or blade
• A single host can be made up of multiple vnodes

PBS Nodes - Viewing Existing Vnodes

There are two methods to view the list of vnodes and their attributes in a PBS complex:

Method 1: within qmgr: list nodes @default
Method 2: using pbsnodes -av at the command line

Both methods report the same attributes. Method 1 starts each entry with "Node traintb16"; Method 2 starts it with "traintb16":

Node traintb16
    Mom = traintb16
    Port = 15002
    pbs_version = PBSPro_11.0.0.103450
    ntype = PBS
    state = free
    pcpus = 1
    resources_available.arch = linux
    resources_available.host = traintb16
    resources_available.mem = 1027124kb
    resources_available.ncpus = 1
    resources_available.vnode = traintb16
    resources_assigned.mem = 0kb
    resources_assigned.ncpus = 0
    resources_assigned.netwins = 0
    resources_assigned.vmem = 0kb
    resv_enable = True
    sharing = default_shared
PBS Nodes – Setting Vnode Attributes

Attribute                    Description
comment                      Assign a comment
max_running                  Maximum number of jobs that can run on this vnode
priority                     Vnodes can be sorted by a priority level
state                        Shows or sets the state of the vnode. Useful for setting a vnode's state to online/offline
queue                        Associate a queue with a vnode
sharing                      Defines whether more than one job at a time can use this vnode's resources
resources_available.<res>    List of resource amounts available on this vnode. If not explicitly set, the amount shown is that reported by the pbs_mom running on the vnode.

PBS Nodes – Using "pbsnodes"

Use "pbsnodes" to obtain a detailed listing of all the hosts or vnodes in a PBS complex.

Usage:   pbsnodes <options>
Example: pbsnodes -a

Option   Description
-a       List all hosts and their attributes
-av      List all vnodes and their attributes
-l       List all hosts or vnodes with state=DOWN or state=OFFLINE

PBS Nodes – Output of "pbsnodes -a"

pbsnodes -a
traintb16
    Mom = traintb16
    Port = 15002
    pbs_version = PBSPro_11.0.0.103450
    ntype = PBS
    state = free
    pcpus = 1
    resources_available.arch = linux
    resources_available.host = traintb16
    resources_available.mem = 1027124kb
    resources_available.ncpus = 1
    resources_available.vnode = traintb16
    resources_assigned.mem = 0kb
    resources_assigned.ncpus = 0
    resources_assigned.netwins = 0
    resources_assigned.vmem = 0kb
    resv_enable = True
    sharing = default_shared
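The `pbsnodes -a` listing above is easy to load into a dictionary for ad hoc checks, such as finding vnodes that are not free. The sketch below parses a shortened copy of the sample output; it assumes that each entry starts with an unindented name line followed by indented `key = value` lines, and `parse_pbsnodes` is a hypothetical helper, not a PBS command.

```python
SAMPLE = """\
traintb16
     Mom = traintb16
     state = free
     pcpus = 1
     resources_available.mem = 1027124kb
     resources_available.ncpus = 1

"""


def parse_pbsnodes(text):
    """Map each node name to a dict of its attributes (all values as strings)."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        if line[0].isspace():
            # Indented line: an attribute of the current node.
            if current is not None and "=" in line:
                key, _, value = line.partition("=")
                nodes[current][key.strip()] = value.strip()
        else:
            # Unindented line: the start of a new node entry.
            current = line.strip()
            nodes[current] = {}
    return nodes


if __name__ == "__main__":
    for name, attrs in parse_pbsnodes(SAMPLE).items():
        print(name, attrs.get("state"), attrs.get("resources_available.ncpus"))
```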
PBS Server - Server Log Information

Server logs are stored on the host where the pbs_server daemon is running
• Location: $PBS_HOME/server_logs
• A new log file is created every day – file name format: [YYYYMMDD]

The logging level is configurable using the qmgr utility

Usage: set server log_events = <value>

Where <value> can be between 0 and 2047
- 0     nothing is logged
- 511   default log level
- 2047  everything is logged; useful for debugging hooks

Note: When changing the server's log_events attribute it is not necessary to restart the pbs_server daemon

PBS Server – Details of Server Log Entry

Sample of server log entries:
09/14/2010 08:17:31;0002;Server@trainhp01;Svr;Log;Log opened
09/14/2010 08:17:45;0002;Server@trainhp01;Node;traintb16.prog.altair.com;node up
09/14/2010 08:18:36;0040;Server@trainhp01;Svr;traintb16;Scheduler sent command 3

Syntax: date-time;event_code;server_name;object_type;object_name;message_text

Field          Description
date-time      Date and time stamp, format: mm/dd/yyyy hh:mm:ss
event_code     Numerical code for type of event
server_name    Name of the server which logged the message
object_type    Type of object which the message is about:
               Svr=server Que=queue Job=job Req=request Fil=file Node=vnode Hook=hooks
object_name    Name of the specific object
message_text   Text of the log message

Backing up Server, Queue & Vnode Settings

PBS Administrators can safely back up their qmgr settings at the command line:
1. Output the server and queue settings:
   qmgr -c "print server" > server_queue_settings
2. This command will print all attributes for all vnodes:
   qmgr -c "print node @default" > vnodal_settings
3. This command will print all attributes for all hooks:
   qmgr -c "print hook" > hook_definitions

To restore settings: qmgr < <input_file>
Chapter Nine - PBS MOM Configuration

Chapter Nine
  What is the PBS MOM?
  Directory structure of $PBS_HOME/mom_priv
  Contents of $PBS_HOME/mom_priv/jobs
  Configuration parameters
  Enforcing resource limits
  Restricting user logins
  Checkpoint and restart
  MOM log information
  Details of MOM logs
  Exercises

What is the PBS MOM?

The PBS MOM is the component responsible for monitoring and executing PBS jobs, as well as the following:
• Reports resource usage
• Enforces resource usage limits
• Notifies the server when the job has finished
• Executes prologue/epilogue script

Each execution host (MOM) has its own configuration file
• Located in $PBS_HOME/mom_priv/config
• Provides several types of runtime information
  - Access control
  - Static resource names and values
  - External resources provided by a program to be run on request via a shell script
• Each parameter is on a separate line and component parts are separated by white space
• Default contents of mom_priv/config:
    $clienthost traintb16
    $restrict_user_maxsysid 499

Directory Structure of $PBS_HOME/mom_priv

Directory structure of $PBS_HOME/mom_priv *

mom_priv
  config      Configuration file
  jobs        When jobs are running the job script is placed in this directory
  vnodemap    List of vnodes in a PBS complex
  mom.lock    MOM pid lock file

* This information is for debugging purposes only. It may change in future releases.
Contents of $PBS_HOME/mom_priv/jobs

Contents of $PBS_HOME/mom_priv/jobs *

-rw------- 1 root      root  3427 Jun 10 00:40 2.traintb16.JB
-rwx------ 1 pbsuser01 users   22 Jun 10 00:40 2.traintb16.SC
drwx------ 2 root      root  4096 Jun 10 00:40 2.traintb16.TK

When a job is running on a given host, 2 files and 1 directory are created for that job in the mom_priv/jobs directory:

<job_id>.<server_name>.JB   Contains job information such as resources used
<job_id>.<server_name>.SC   User job script
<job_id>.<server_name>.TK   Directory containing that job's tasks

* This information is for debugging purposes only. It may change in future releases.

PBS MOM – MOM Configuration Parameters

Parameter                 Description
$clienthost               List of hosts allowed to connect to MOM
$cputmult                 Factor to adjust CPU time used by each job
$ideal_load               Declares the ideal mark for load on a vnode
$max_load                 Declares the high water mark for load on a vnode
$kbd_idle                 Enables idle workstation cycle harvesting
$logevent                 Determines the kind of information logged to MOM logs
$max_check_poll           Maximum time between polling cycles
$min_check_poll           Minimum time between polling cycles
$prologalarm              Timeout period for prologue/epilogue script
$restricted               List of hosts that are allowed to connect to MOM without needing a privileged port
$restrict_user            Controls whether normal users without a job running can log into the host
$restrict_user_maxsysid   Any user with UID less than this value is exempt from $restrict_user
$suspendsig               Alternative signal to suspend job instead of SIGSTOP
$usecp                    Tells MOM to use cp instead of rcp/scp for stdout/err file transfers
$wallmult                 Factor used to adjust walltime usage by a job
$tmpdir                   Specifies location of job scratch directory
$jobdir_root              Job-specific staging and execution directories

Note: After modifying the MOM's config file, a SIGHUP must be sent to that pbs_mom daemon
PBS MOM – Enforcing Resource Limits

Each MOM can be configured to enforce job resource limits by setting the $enforce parameter in the mom_priv/config file

Attribute              Type      Description                                                    Default
average_cpufactor      float     Modifies cpuaverage; ncpus limit multiplier                    1.025
average_percent_over   int       Modifies cpuaverage; percentage over ncpus limit to allow      50
average_trialperiod    int       Modifies cpuaverage; minimum walltime before enforcement       120s
cpuaverage             boolean   Enforce this limit                                             off
cpuburst               boolean   Enforce this limit                                             off
delta_cpufactor        float     Modifies cpuburst; ncpus limit multiplier                      1.5
delta_percent_over     int       Modifies cpuburst; percentage over the limit to allow          50
delta_weightup         float     Modifies cpuburst; weighting when average is moving up         0.4
delta_weightdown       float     Modifies cpuburst; weighting when average is moving down       0.1
mem                    boolean   Enforces each job's memory limit                               off

PBS MOM - Restricting User Logins

PBS Professional can be configured to kill user-owned processes when that user does not have a job running on that host through PBS
• To configure this functionality, add the following parameter to the $PBS_HOME/mom_priv/config file:
    $restrict_user on
  Note: When this feature is turned on, processes belonging to users who log onto that execution host without a running PBS job are terminated, which logs those users off
• To create a list of users who are allowed when this feature is enabled:
    $restrict_user_exceptions userA, userB, userC
  Note: Up to 10 user names are allowed
• To restrict users whose user ID is greater than a specified number:
    $restrict_user_maxsysid <number>
PBS MOM – Checkpoint and Restart

PBS administrators can use their own site-defined external checkpoint facility
• This is useful on systems that don't support OS-level checkpointing
• Provided by the application or other external means

Site-specific checkpointing is configured in the MOM configuration file mom_priv/config by using the $action parameter and an action

Action                  Argument                          Description
checkpoint              TIME_OUT !SCRIPT_PATH ARGS[…]     Specifies that the script in SCRIPT_PATH is run and the job is left running
checkpoint_abort        TIME_OUT !SCRIPT_PATH ARGS[…]     Specifies that the script in SCRIPT_PATH is run and the job is terminated
restart                 TIME_OUT !SCRIPT_PATH ARGS[…]     Specifies the script to be used to restart the job

Parameter               Argument                          Description
$restart_background     true|false                        Specifies how the job is restarted
$restart_transmogrify   true|false                        Controls how MOM launches the restart script/program

PBS MOM - MOM Log Information

Each execution host has its own MOM log files
• Location: $PBS_HOME/mom_logs
• A new log file is created every day - file name format: [YYYYMMDD]

The logging level is configurable in $PBS_HOME/mom_priv/config

Usage: $logevent <value>

Where <value> can be between 0 and 0xffffffff
- 0            Nothing is logged
- 0xffffffff   All information is logged

Note: After changing the log event value, send a SIGHUP to the pbs_mom daemon so that it rereads the mom_priv/config file

PBS MOM – Details of MOM Log Entry

Sample of MOM log entries:
09/14/2010 11:35:55;0008;pbs_mom;Job;1.traintb16;Started, pid = 24073
09/14/2010 11:36:01;0080;pbs_mom;Job;1.traintb16;task 00000001 terminated
09/14/2010 11:36:01;0008;pbs_mom;Job;1.traintb16;Terminated

Syntax: date-time;event_code;pbs_daemon;object_type;object_name;message_text

Field          Description
date-time      Date and time stamp. Format: mm/dd/yyyy hh:mm:ss
event_code     Numerical code for type of event
pbs_daemon     pbs_mom
object_type    Type of object which the message is about:
               Svr=server Que=queue Job=job Req=request Fil=file Node=vnode
object_name    Name of the specific object
message_text   Text of the log message

Chapter Ten - PBS Scheduler Configuration

Chapter Ten
  What is the PBS scheduler?
  Directory Structure of $PBS_HOME/sched_priv
  Default behavior of the scheduler
  Scheduler configuration file
  Default scheduling parameters
  Scheduler log information

What is the PBS Scheduler?

• The PBS daemon that is responsible for enforcing site policy, by choosing the order in which jobs are run, and on what resources
• The scheduler provides various scheduling policies such as:
  - First in First Out (FIFO)
  - Sort jobs based on multiple resources (high to low or low to high)
  - Sort nodes based on resources or priority level
  - Sort queues based on priority level or by qstat -Q output order
  - Allow jobs from higher priority queues to be eligible to run first
  - Allow jobs to move between two or more PBS complexes
  - Allow jobs to run in a dedicated time space
  - Enforce fair portions of a site's resources and usage

Directory Structure of $PBS_HOME/sched_priv

Directory structure of $PBS_HOME/sched_priv *

sched_priv
  sched_config     Scheduler configuration file
  dedicated_time   Specifies dedicated time
  resource_group   Specifies relative percentages between fairshare entities
  holidays         Lists holidays to be treated as "nonprimetime"
  sched_out        Debug messages
  sched.lock       pbs_sched pid lock file

* This information is for debugging purposes only. It may change in future releases.
Default Behavior of the Scheduler

What events happen within a scheduling cycle?
1. The Server sends the list of MOM resources to the Scheduler
2. The Scheduler sorts all the resources based on the default scheduling policies
3. The Scheduler sorts the queue(s)
   - If one or more queues have the priority attribute set, queues are sorted by queue priority
   - If no queue priority is set, queues are sorted randomly or in qstat -Q output order
   - If a queue's priority is set to 150 or higher, jobs from this queue will be eligible for execution first
   - If a queue's priority is set to 150 or higher and preemption is enabled, then preemptive scheduling will be enforced, allowing jobs from this queue to preempt other jobs
4. The Scheduler sorts the jobs from the first queue
   - Jobs are sorted based on when they were enqueued
   - If a job has been marked "starving" and the help_starving_jobs scheduling policy is turned on, that job is moved up in sort priority

Using the sched_config file

Parameter format: name: value [prime | non_prime | all | none]

Name        Description                                                                     Notes
name        Name of the scheduler parameter                                                 non-changeable
value       Type: string, string array, integer, boolean, time                              case-sensitive
prime       Applies only to primetime period                                                case-sensitive
non_prime   Applies only to non-primetime period                                            case-sensitive
all         Applies to both primetime and non-primetime periods; default if                 case-sensitive
            prime/non_prime is not specified
none        Not used

• Primetime and non-primetime periods are set in the sched_priv/holidays file
• Must send a "kill -HUP <pbs_sched_pid>" in order for the Scheduler to re-read the configuration file
• Any modifications may affect not only queued jobs but also running jobs
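The parameter format `name: value [prime | non_prime | all | none]` can be illustrated with a small line parser. This is a sketch for reading or linting a sched_config file offline, not how pbs_sched itself parses it; `parse_sched_line` is a hypothetical name, and treating lines that start with `#` as comments is an assumption based on the commented-out entries in the default file.

```python
import shlex

PERIODS = {"prime", "non_prime", "all", "none"}


def parse_sched_line(line):
    """Return (name, value, period) for one sched_config line, or None.

    Assumption: a leading '#' marks a comment. The period defaults to
    'all' when the line does not name one.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, sep, rest = line.partition(":")
    if not sep:
        return None
    tokens = shlex.split(rest)  # keeps quoted values such as "cput LOW" together
    period = "all"
    if len(tokens) > 1 and tokens[-1] in PERIODS:
        period = tokens.pop()
    return name.strip(), " ".join(tokens), period


if __name__ == "__main__":
    for text in ('backfill: true all',
                 'node_sort_key: "sort_priority HIGH" all',
                 'max_starve: 24:00:00'):
        print(parse_sched_line(text))
```

Splitting on the first colon only keeps time values such as 24:00:00 intact.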
Default Scheduling Parameters – sched_config

Parameter                      Value           Parameter                       Value
round_robin:                   false           load_balancing:                 false
by_queue:                      true            smp_cluster_dist:               pack
strict_ordering:               false           #unknown_shares:                10
help_starving_jobs:            true            fairshare_usage_res:            cput
max_starve:                    24:00:00        fairshare_entity:               euser
backfill:                      true            half_life:                      24:00:00
backfill_prime:                false           sync_time:                      1:00:00
prime_exempt_anytime_queues    false           #fairshare_enforce_no_shares:   true
#prime_spill:                  1:00:00         preemptive_sched:               true
primetime_prefix:              p_              preempt_queue_prio:             150
nonprimetime_prefix:           np_             preempt_prio:                   "express_queue, normal_jobs"
#job_sort_key:                 "cput LOW"      preempt_order:                  "SCR"
node_sort_key:                 "sort_priority HIGH"   preempt_sort:            min_time_since_start
sort_queues:                   true            dedicated_prefix:               ded
resources:                     "ncpus, mem, arch, host, vnode"
log_filter:                    3328
#sched_cycle_length            20:00:00

PBS Scheduler – Scheduler Log Information

Scheduler logs are stored on the machine where the pbs_sched daemon is running (default)
• Location: $PBS_HOME/sched_logs
• A new log file is created every day – file name format: [YYYYMMDD]

The logging level is configurable in $PBS_HOME/sched_priv/sched_config:

Usage: log_filter: <value>

Where <value> can be between 0 and 4095
• 0      Log everything
• 3328   Default value
• 4095   Log nothing

Note: When changing the scheduler log filter it is necessary to do a kill -HUP on the pbs_sched pid

PBS Scheduler – Details of Scheduler Log Entry

Sample of scheduler log entries:
09/14/2010 16:48:36;0080;pbs_sched;Req;;Starting Scheduling Cycle
09/14/2010 16:48:36;0080;pbs_sched;Req;;Leaving Scheduling Cycle
09/14/2010 21:45:47;0002;pbs_sched;Svr;Log;Log closed

Syntax: date-time;event_code;pbs_daemon;object_type;object_name;message_text

Field          Description
date-time      Date and time stamp. Format: mm/dd/yyyy hh:mm:ss
event_code     Numerical code for type of event
pbs_daemon     pbs_sched
object_type    Type of object which the message is about:
               Svr=server Que=queue Job=job Req=request Fil=file Node=vnode
object_name    Name of the specific object
message_text   Text of the log message

Chapter Twelve - Scheduling Custom Resources

Chapter Twelve
  Custom Resources
  Resource Types
  Resource Flags
  Understanding the resourcedef file
  Different examples of using custom resources
  • Host/vnode level resource
  • Boolean resource
  • Server level resource
  • Queue level resource
  • Query execution hosts
  • Query FLEXlm server

Scheduling Resources - Custom Resources

The PBS Scheduler supports arbitrary resources, e.g. to track disk space or application licenses.

Limiting resource usage for users, groups, queues, and vnodes influences the order in which jobs are started.

Resources may be tracked in two ways:
• Internally by PBS: resources which are consumed by PBS jobs only
• External scripts: resources which might be consumed by PBS jobs and/or outside of PBS

Resources can exist at various levels
• Host (vnode) level
• Server and queue level

Resource matching
• Via arithmetic comparison for number and size type resources
• Via string matching for Boolean and string resources
Scheduling Resources – Resource Types

Data Type      Description                                             Consumable/NON
boolean        • defined at vnode level                                non-consumable
               • used within a select statement
float          • values [+-] 0-9 [[0-9] …][.][[0-9]…]                  consumable, non-consumable
long           • values 0-9[[0-9]…]                                    consumable, non-consumable
size           • number of bytes or words                              consumable, non-consumable
string         • string value                                          non-consumable
string_array   • multiple string values separated by comma             non-consumable
time           • maximum time period that resource can be used         consumable, non-consumable
               • format: [hh:mm:ss[.ms]]

Scheduling Resources – Resource Flags

Flag        Description                                                Consumable/NON
h           • host level resource, static or dynamic                   non-consumable
            • used within select statement
n           • host level resource, "n" means static                    consumable
            • must also use flag "h"
f           • host level resource                                      consumable @ 1st vnode
            • must also use flag "h"
q           • server level resource                                    consumable
            • queue level resource
<no flag>   • server level resource, no flag means dynamic             non-consumable
            • queue level resource
i           • invisible
            • users cannot request or qalter this resource
            • users cannot view the value using qstat -f
r           • read only
            • users cannot request or qalter this resource
            • users can view the value using qstat -f

Scheduling Resources – resourcedef

Custom resources are defined in: $PBS_HOME/server_priv/resourcedef
• File needs to be created manually
• Permissions must be set to 644

Format: resource_name type=<resource type> flag=<flag>

Sample of resourcedef:
optistruct    type=long      flag=hn
motionsolve   type=boolean   flag=h
radioss       type=long      flag=q
jobtype       type=string
scratch       type=size      flag=h
gwu           type=long

Note: Any modifications to the resourcedef file require pbs_server to be restarted
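The resourcedef format `resource_name type=<resource type> flag=<flag>` is simple enough to check with a few lines of script before restarting pbs_server. The sketch below parses the sample entries; `parse_resourcedef_line` and `is_host_level` are hypothetical helpers, and the only rule they apply is the one on the flags slide, that flag "h" marks a host level resource.

```python
SAMPLE = """\
optistruct type=long flag=hn
motionsolve type=boolean flag=h
radioss type=long flag=q
jobtype type=string
scratch type=size flag=h
gwu type=long
"""


def parse_resourcedef_line(line):
    """Return {'name', 'type', 'flag'} for one resourcedef line, or None if blank."""
    parts = line.split()
    if not parts:
        return None
    entry = {"name": parts[0], "type": None, "flag": ""}
    for token in parts[1:]:
        key, _, value = token.partition("=")
        if key in ("type", "flag"):
            entry[key] = value
    return entry


def is_host_level(entry):
    """A resource is host level when its flags include 'h'."""
    return "h" in entry["flag"]


if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        entry = parse_resourcedef_line(line)
        if entry:
            level = "host" if is_host_level(entry) else "server/queue"
            print(entry["name"], entry["type"], level)
```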
Custom Resource – Unset Resources

Where resources are not set at the host, server, or queue level, PBS assigns default values based on the type of resource.

Host/Vnode Level
Resource Type   Unset Resource   Request Value
boolean         False            False
float           0.0              0.0
long            0                0
size            0                0
string          ""               No match value
string_array    ""               No match value
time            0:00             0:00

Server/Queue Level
• Numerical resources = infinite

Unset custom resources can be treated as infinite at the host, server, or queue level by setting the scheduler parameter: resource_unset_infinite

Custom Resource – Host/vnode-Level

Create a custom resource to be applied at the vnode level to indicate how much of that resource is available at a given time
• Define the custom resource in resourcedef:
    optistruct type=long flag=hn
• Set the value of the custom resource in qmgr:
    set node traintb01 resources_available.optistruct=2
    set node traintb02 resources_available.optistruct=0
• Add the custom resource to the sched_config file:
    resources: "ncpus, mem, arch, host, vnode, optistruct"
• Request the custom resource:
    qsub -l select=1:ncpus=2:optistruct=1

Custom Resource – Boolean Resource

Create a custom resource to be applied at the vnode level. This custom resource will indicate whether or not that resource is available on a given vnode
• Define the custom resource in resourcedef:
    motionsolve type=boolean flag=h
• Set the value of the custom resource using qmgr:
    set node traintb01 resources_available.motionsolve=true
    set node traintb02 resources_available.motionsolve=false
• Add the custom resource to the sched_config file:
    resources: "ncpus, mem, arch, host, vnode, motionsolve"
• Request the custom resource:
    qsub -l select=1:ncpus=2:motionsolve=true
Custom Resource - Server Level Resource

Create a custom resource to be applied at the server, to track how much of that resource is available globally at a given time

• Define the custom resource within resourcedef:
    radioss type=long flag=q
• Set the value of the custom resource using qmgr:
    set server resources_available.radioss=8
• Add the custom resource to the sched_config file:
    resources: "ncpus, mem, arch, host, vnode, radioss"
• Request the custom resource:
    qsub -l select=1:ncpus=2 -l radioss=1

Custom Resource - Queue Level Resource

Create a custom resource to be applied at the queue, to control whether or not a job can be enqueued based on whether the job requests this resource

• Define the custom resource within resourcedef:
    jobtype type=string
• Set the value of the custom resource using qmgr:
    set queue radioss resources_available.jobtype=radioss
    set queue radioss resources_min.jobtype=radioss
    set queue radioss resources_max.jobtype=radioss
    set queue radioss resources_default.jobtype=" "
• Add the custom resource to the sched_config file:
    resources: "ncpus, mem, arch, host, vnode, jobtype"
• Request the custom resource:
    qsub -l select=1:ncpus=2 -l jobtype=radioss

Custom Resource - Query Vnodes

Create a custom resource to query vnodes using a call-out script

• Define the custom resource within resourcedef:
    scratch type=size flag=h
• Add the custom resource to the sched_config file:
    resources: "ncpus, mem, arch, host, vnode, scratch"
    mom_resources: "scratch"
• Set the path to the script in the mom_priv/config file:
    scratch !/usr/local/bin/scratch.pl
• Request the custom resource:
    qsub -l select=1:ncpus=2:scratch=1GB
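The slides do not show the contents of scratch.pl. As a rough idea of what such a call-out script does, here is a Python sketch that reports the free space of a scratch directory as a single size value on stdout. The directory /scratch, the function names, and the choice to report 0kb when the directory cannot be read are all assumptions of this sketch, as is the expectation that the script prints one value such as 2048kb.

```python
import shutil

# Hypothetical scratch location; adjust to the site's layout.
SCRATCH_DIR = "/scratch"


def free_scratch_kb(path):
    """Return the free space under path in kilobytes, or 0 if it cannot be read."""
    try:
        return shutil.disk_usage(path).free // 1024
    except OSError:
        # Assumption: an unreadable or missing directory offers no scratch space.
        return 0


def format_size(kb):
    """Format a kilobyte count as a size string with a kb suffix."""
    return f"{kb}kb"


if __name__ == "__main__":
    # The value is written to stdout on a single line.
    print(format_size(free_scratch_kb(SCRATCH_DIR)))
```

Reporting 0kb on failure is a conservative choice: a vnode whose scratch area is broken then advertises no scratch space instead of causing the script to fail.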
Custom Resource - Query FLEXlm Server

Create a custom resource to query the FLEXlm server to determine if enough FLEX tokens are available for execution

• Define the custom resource within resourcedef:
    gwu type=long
• Add the custom resource to the sched_config file:
    resources: "ncpus, mem, arch, host, vnode, gwu"
• Set the path to the script in the sched_config file:
    server_dyn_res: "gwu !/var/spool/altair/scripts/lmstat"
• Request the custom resource:
    qsub -l select=1:ncpus=2 -l gwu=50

Chapter Eleven - Various Scheduling Policies

• Job priorities in PBS
• Sorting queues
• Helping starving jobs
• Eligible time
• Backfill
• Strict ordering
• True FIFO
• Preemptive scheduling
• Hard & soft limits
• Sorting jobs
• Tunable formula
• Round robin
• SMP cluster scheduling
• Sort execution hosts
• Placement sets
• Primetime & non-primetime
• Dedicated time
• Fairshare
• Peer scheduling
• Advance reservations
• Exercises

Chapter Fifteen - Hooks

• Concept
• Hook commands
• Setting up a custom hook
• Viewing hook definitions
• Exporting hook contents
• Exercises

Hooks - Concept

What are hooks?
• Custom call-out executables that give more precise control over submitting jobs
• Written in the Python programming language
• Example applications of hooks:
    - Allow/disallow enqueueing jobs based on user/group ID, amount of requested resources, timeframe
    - Allow/disallow modifying job attributes of already-submitted jobs
    - Allow/disallow moving jobs to another execution queue or PBS complex
    - Allow/disallow requesting an advance/standing reservation
    - Look up a 3rd party database for credentials
• To see hook log messages in the server logs, set the server's log_events attribute to 2047
• Only root can create hooks

Hooks - Commands

Hook commands in qmgr:

Command                                                       Description
list hook <hook name>                                         List a hook's attributes and their values
print hook <hook name>                                        Print a hook's creation commands
create hook <hook name>                                       Create a new hook
set hook <hook name> <attribute name> = <value>               Set a hook's attribute
unset hook <hook name> <attribute name>                       Unset a hook's attribute
import hook <hook name> <content-type> <content-encoding>     Import a hook's Python script file
    <input file>|-
export hook <hook name> <content-type> <content-encoding>     Export a hook's Python script to a file
    <output file>|-
delete hook <hook name>                                       Remove a hook and its definition

Hooks - Adding a Hook

Steps to add a hook, using qmgr:

1. Add the hook name
   • first character must be alphabetic
    Qmgr: create hook <hook_name>

2. Set the type of trigger event
   • a single hook can have multiple events associated with it, using "+="
    Qmgr: set hook <hook_name> event = <event_name>

   <event_name>   Description
   queuejob       To allow/disallow enqueueing a job into a queue
   modifyjob      To allow/disallow modifying job attributes
   resvsub        To allow/disallow reservation requests by users
   movejob        To allow/disallow moving jobs to another queue or PBS complex
   runjob         To allow modifications of running jobs
   provision      To provision a host

Hooks - Adding a Hook, cont.

3. Specify the path and name of the Python script
    Qmgr: import hook <hook_name> application/x-python <content-encoding> \
          <path/filename>

   <content-encoding>: default (7bit) or base64

   Note: When importing a hook, PBS will try to evaluate the script. If it cannot, it will report the information at the command line and in the server logs

Additional options:

4. Relative order of hook execution; default = 1 (highest level)
    Qmgr: set hook <hook_name> order = <n>

5. Specify a timeout value for hook execution; default = 30 seconds
    Qmgr: set hook <hook_name> alarm = <n>

6. Enable or disable a particular hook; default = true
    Qmgr: set hook <hook_name> enabled = <Boolean>

Note: The pbs_server daemon does not need to be restarted for a hook to be active
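For orientation, the Python script imported in step 3 could look like the following minimal sketch of a queuejob hook that rejects jobs submitted without a walltime. This is not the script supplied with the course exercises; the function name rejection_reason and the message text are made up for illustration. The pbs module is only available inside the PBS server's hook interpreter, so the sketch keeps the decision logic in a plain function that can be tried outside PBS.

```python
"""Sketch of a queuejob hook: reject jobs that do not request a walltime."""

try:
    import pbs  # provided by the PBS server when it runs the hook
except ImportError:
    pbs = None  # allows the decision logic below to be exercised outside PBS


def rejection_reason(walltime):
    """Return a message if the job should be rejected, otherwise None."""
    if walltime is None:
        # Hypothetical wording; any site-specific message can be used.
        return "Job rejected: please request a walltime (-l walltime=hh:mm:ss)"
    return None


# Only run the hook body when the PBS hook interface is actually present.
if pbs is not None and hasattr(pbs, "event"):
    event = pbs.event()
    reason = rejection_reason(event.job.Resource_List["walltime"])
    if reason is not None:
        event.reject(reason)
    else:
        event.accept()
```

Keeping the check in its own function makes it easy to test the policy on a workstation before importing the script into a hook with qmgr.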
Hooks - Viewing Hook Information

Printing hook creation commands:

    Qmgr: print hook hookA
    create hook hookA
    set hook hookA type = site
    set hook hookA enabled = true
    set hook hookA event = '""'
    set hook hookA user = pbsadmin
    set hook hookA alarm = 30
    set hook hookA order = 1
    import hook hookA application/x-python base64 -

Listing hook attributes:

    Qmgr: list hook hookA
    Hook hookA
        type = site
        enabled = true
        event = ""
        user = pbsadmin
        alarm = 30
        order = 1

Hooks - Exporting Hook Contents

Reasons to use the export command:
• To view the current script content
• To make a backup of the Python script
• To make modifications to the Python script

To export a hook's Python script to a file:
    Qmgr: export hook <hook_name> application/x-python <content-encoding> \
          <path/filename>

Note: If no output file is specified, the script is written to stdout

To back up hook information:
    qmgr -c "print hook <hook_name>" > hook_file

Exercises

• Reject a job that doesn't specify a walltime (event = queuejob)
• Prevent users from altering any of their job attributes once submitted (event = modifyjob)
• Prevent users from requesting a reservation (event = resvsub)
• Prevent users from moving their job to another queue or to another PBS complex (event = movejob)

Exercise - queuejob Hook

Objective: To reject jobs at submission time that do not request the walltime resource; the Python script is already provided

Prerequisites:
• Disable any existing hooks

PBS Administrator Tasks:
1. Use qmgr to create a hook called queuejob
2. Set the event as queuejob
3. The Python script is located in /root/hook_scripts/queuejob.py
4. Leave the default attribute values as they are

PBS User Task:
1. Submit a job without requesting any walltime resource

Observation:
• When submitting a job without requesting the walltime resource, what, if any, message appears at the command line?
• Was the job enqueued or was it rejected?

Exercise - modifyjob Hook

Objective: To prevent users from qaltering any of their submitted jobs' attributes/resources; the Python script is already provided

Prerequisites:
• Disable any existing hooks

PBS Administrator Tasks:
1. Use qmgr to create a hook called modifyjob
2. Set the event as modifyjob
3. The Python script is located in /root/hook_scripts/modifyjob.py
4. Leave the default attribute values as they are

PBS User Tasks:
1. Submit a job
2. qalter any one of the job's attributes

Observation:
• Was the user able to qalter the job?
• If not, what error message, if any, was output?

Exercise - resvsub Hook

Objective: To prevent users from requesting reservations

Prerequisites:
• Disable any existing hooks

PBS Administrator Tasks:
1. Use qmgr to create a hook called resvsub
2. Set the event as resvsub
3. The Python script is located in /root/hook_scripts/resvsub.py
4. Leave the default attribute values as they are

PBS User Task:
1. Request a reservation using pbs_rsub

Observation:
• Was the user able to request a reservation?
• If not, what error message, if any, was output?

Exercise - movejob Hook

Objective: To prevent users from moving their jobs to another queue or PBS complex

Prerequisites:
• Disable any existing hooks
• There should be at least 2 active queues

PBS Administrator Tasks:
1. Use qmgr to create a hook called movejob
2. Set the event as movejob
3. The Python script is located in /root/hook_scripts/movejob.py
4. Leave the default attribute values as they are

PBS User Tasks:
1. qsub a job that will remain queued (not ready for execution)
2. Using the qmove command, try to move it to another queue

Observation:
• When trying to qmove, was the job moved to the queue you requested?
• If not, what error message, if any, was output?

Chapter Sixteen - Miscellaneous

• PBS user and administrator commands
• PBS_EXEC/etc directory
• PBS_EXEC/unsupported/pbs_diag *
• PBS_EXEC/unsupported/pbs_dtj *
• Re-installation of PBS Professional

* These scripts are not supported.

PBS User & Administrator Commands

User Commands                            Administrator Commands
Command     Purpose                      Command      Purpose
nqs2pbs     Convert from NQS             pbs-report   Report job statistics
pbs_rdel    Delete reservation           pbs_hostid   Report host identifier
pbs_rstat   Status reservation           pbs_hostn    Report host name(s)
pbs_rsub    Submit reservation           pbs_probe    PBS diagnostic tool
pbsdsh      PBS distributed shell        pbs_rcp      File transfer tool
qalter      Alter job                    pbs_tclsh    TCL with PBS API
qdel        Delete job                   pbsfs        Show fairshare usage
qhold       Hold a job                   pbsnodes     Node manipulation
qmove       Move job                     printjob     Report job details
qmsg        Send message to job          qdisable     Disable a queue
qorder      Reorder jobs                 qenable      Enable a queue
qrls        Release hold on job          qmgr         Manager interface
qselect     Select jobs by criteria      qrerun       Re-queue running job
qsig        Send signal to job           qrun         Manually start a job
qstat       Status job, queue, server    qstart       Start a queue
qsub        Submit a job                 qstop        Stop a queue
tracejob    Report job history           qterm        Shut down PBS
$PBS_EXEC/etc directory

The directory $PBS_EXEC/etc contains backup copies of PBS configuration files, such as the following, in case you ever need to revert to the default configuration:

Filename             Description
pbs_dedicated        Dedicated time file
pbs_holidays         Holidays file
pbs_init.d           PBS init run script
pbs_postinstall      PBS postinstall script
pbs_resource_group   Fairshare resource group file
pbs_sched_config     Scheduler configuration file

$PBS_EXEC/unsupported directory - pbs_diag

The pbs_diag* script is an interactive script that collects information from PBS configuration files and job-related history

Information that is collected:
• qmgr settings for server, queues, and nodes
• pbs_probe information about file permissions
• pbs.conf master configuration information
• pbsnodes node configuration/state information
• qstat information about current state of the queues and server
• information about existing reservations
• pbs_hostn name resolution information
• operating system version information
• server, scheduler, and mom configuration files
• tracejob and logging information for jobs specified by the user
• server, scheduler, and mom logs for dates specified by the user
• cpuset configuration information and current state if on a cpuset-aware system
• vnode definition files
• FLEXlm license server status

* pbs_diag is not supported.
$PBS_EXEC/unsupported directory - pbs_dtj

pbs_dtj* (Distributed TraceJob) is a command that enables a user to gather tracejob information from ALL of the nodes where a PBS Professional job ran

By default, the script uses rsh to connect to the nodes. It also checks the pbs.conf file to see whether PBS_SCP is set, and uses ssh in that case

Usage: pbs_dtj <option>

Option          Description
-u <username>   Specify a user name under which to run
-r <rcommand>   Override the rsh/ssh settings in pbs.conf
-n              Number of days of log files to query

* pbs_dtj is not supported.

Re-installation of PBS Professional

Procedure to remove PBS Professional from the server or an execution host before re-installing it:

1. Shut down any PBS daemons running on that host:
    /etc/init.d/pbs stop
2. Verify the PBS daemons are no longer running:
    ps -ef | grep pbs
3. Obtain the appropriate PBS rpm package name:
    rpm -qa | grep pbs
4. Remove the PBS rpm package:
    rpm -e pbs-11.0.0.103450
5. Remove the directories $PBS_HOME and $PBS_EXEC
6. Remove the file /etc/pbs.conf
7. Remove the file /etc/init.d/pbs

Refer to the Installation part of the PBS Professional Installation and Upgrade Guide for the complete installation procedure

Chapter Seventeen - Troubleshooting

• pbs_probe
• pbs_hostn
Using pbs_probe

If a site has a post-installation issue, running the pbs_probe command may help identify the cause and a possible fix

The pbs_probe command returns the following information:

    ====== System Information =======
    sysname=Linux
    nodename=traintb16
    release=2.6.22.5-31-default
    version=#1 SMP 2007/09/21 22:29:00 UTC
    machine=i686

    === No PBS Infrastructure Problems Detected ===

Options:
    -v   verbose mode
    -f   fix mode (checks & fixes directory permissions)

Using pbs_hostn

If a PBS site has a hostname resolution issue, the pbs_hostn command will help identify the problem

The command reports the results of the gethostbyname and gethostbyaddr calls

Example:

    pbs_hostn -v traintb16
    primary name: traintb16.prog.altair.com (from gethostbyname())
    aliases: traintb16
    address length: 4 bytes
    address: 204.235.21.130 (33554559 dec)
    name: traintb16.prog.altair.com

Conclusion - Survey Monkey

Please take the opportunity to help us by filling out a quick online survey about this training class

The web link is bookmarked under the Bookmarks pull-down menu in Firefox

Please make sure you click on "SUBMIT" when finished

THANK YOU