PBS Job Management and Taskfarming

advertisement
PBS Job Management and Taskfarming
Joachim Wagner 2008-07-24
Outline
Why do we need a cluster?
Architecture
Machines
Software
Job Management
Running jobs
Commands
PBS Job discriptions
Taskfarming
Plans
Why do we need a cluster?
Resource conflicts
Waiting for colleague’s job to finish
Trouble, e.g. disk full
Medium-size jobs
Too big for desktop PC
Too small for ICHEC
Preparation of ICHEC runs
Learning
Cluster Architecture
School
Network
maia.computing
Separate
.dcu.ie
Network
Logins
Home
Software
Job Queue
Nodes
Installed Software
OpenMPI
SRILM
MaTrEx, Moses, GIZA++
XLE, Sicstus
Johnson & Charniak’s reranking parser
In progress:
LFG AA, incl. function labeller
PBS Job Management
User: Job
Description
Job Submission
Job Queue
Job Scheduler
Job Execution
Nodes are allocated
job-exclusive for the
duration of the job
PBS Job Management Commands
qsub myjob.pbs
submits a job
PBS description: shell script with #PBS commands
(ignored by shell, see next slide)
qstat, qstat –f jobnumber
qdel jobnumber
pbsnodes –a
list all nodes with status and properties
PBS Job Description
Number of nodes
CPUs/node
Notification: end,
begin and abort
Maximum runtime
Example: Memory-Intensive Job
Node Properties
min4GB, min8GB: at least this much
mem4GB, mem8GB: exactly this much
long: please use this property for long jobs
will leave nodes that do not have this property
available for other jobs
no limit enforced, but 24 h seems reasonable
Future:
16 and 32 GB
CPU
local disk space (/tmp and swap)
CPU-Intensive Jobs
Parallelisable, for example
Sentence by sentence processing
Cross-validation runs
Parameter search
Split into parts
Run each part on a different CPU
Taskfarming
PBS Job
Description
Taskfarming
Executable
(n instances)
child
process
reading
1 Master
Task file
(.tfm):
one task
per line
n-1 Worker
MPI
Communication
Task
execution
Taskfarming Executable
If Instance ID == 0
Run master code loop:
Read .tfm file (arg 1)
Send lines to worker
Exit if no more task and all worker finished
Else
Run worker loop:
Ask master for a task
Execute task
Exit if master has no more tasks
Example: Taskfarming PBS File
Example: Taskfarming TFM File
Example: Taskfarming Helper Script
run-package.sh
Example: Taskfarming in Action
CPU 1
CPU 2
CPU 3
CPU 4
000
001
002
003
006
005
004
008
007
011
009
010
012
idle
Master: reads .tfm and distributes tasks
time
Example: Non-Terminating Task
CPU 1
CPU 2
CPU 3
CPU 4
000
003
001
002
006 (does not terminate)
005
004
008
007
009
010
011
idle
idle
012
Master: reads .tfm and distributes tasks
Killed at
Walltime
Limit
Estimating the PBS Walltime Parameter
Collect durations from test run
Usually high variance of execution time
Long sentences
Parameters
Don’t use #packages x avg. time per package
High risk (~50 %) that more time is needed
Instead: Random sampling with observed
package durations:
/home/jwagner/tools/walltime.py
Effect of Task Size
Job will wait for last task to finish (or be killed
when walltime limit is reached)
What if a task crashes?
Results are incomplete
Next tasks is executed
What if a task does not terminate?
Results are incomplete
Fewer CPUs available for remaining tasks
Overhead of starting tasks
Considering Multiple CPUs per Node
4 CPU cores per node
8 GB node -> 2 GB per core
CPUs compete for RAM
Swapping of one task effects the 3 other tasks
Relatively slow CPUs in nodes
Compared to new desktop PCs
Optimise throughput of cluster / EUR
Not: throughput of node or CPU
Depends on application
Plans
Fix sporadic errors of taskfarm.py
re-implentation in C
XML-RPC-based taskfarming
http-based
Run master on maia
Run workers also outside the cluster
Set parameters at runtime
Add more nodes (CNGL)
Install additional software
Questions
?
Contact:
Joachim Wagner
CNGL System Administrator
jwagner@computing.dcu.ie
(01) 700 6915
Download