Cost-effective clustering with OpenPBS
Ben Webb
WGR Research Group
Physical and Theoretical Chemistry Lab.
University of Oxford
13/02/2003
Overview
• History of PBS
• Interests of the WGR group
• OpenPBS architecture: portability,
security, scheduling
• Grid integration
• Alternatives
History of PBS
• PBS is the “Portable Batch System”
• Developed from 1993 to 1997 for NASA
• Intended to replace NQS
• Currently available as:
– OpenPBS (open source)
– PBSPro (commercial)
Interests of the WGR group
• High throughput
– Virtual screening (cancer screensaver)
– Met by loose “grid” of over 2 million PCs;
United Devices/Intel
• High performance
– Ab initio chemistry
– Simulation of chemical reactions (free energy)
– Met by OpenPBS at zero software cost
OpenPBS architecture
• Server: keeps track of all jobs
• Scheduler: tells the server when and where to
run jobs
• MOM (Machine Oriented Miniserver): runs on
each node to start, monitor, and terminate jobs,
under instruction from the server
• POSIX compliant batch system
• Supports file staging for executables and data
• No need for shared filesystem (e.g. NFS)
although this does simplify communication
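File staging can be sketched with a small Python helper that writes a job script. This is an illustration only: the helper `build_job_script`, the host name `fileserver.example.org`, and all paths are hypothetical. The `#PBS -N`, `-l walltime=` and `-W stagein=`/`stageout=` directives are standard qsub options, with each staged file written as `local_file@hostname:remote_file`.

```python
def build_job_script(name, walltime, command, stage_in=(), stage_out=()):
    """Return the text of a PBS job script.

    stage_in and stage_out are sequences of (local_path, host, remote_path)
    tuples; each tuple becomes one entry in a '-W stagein=' or
    '-W stageout=' file list.
    """
    lines = ["#!/bin/sh", f"#PBS -N {name}", f"#PBS -l walltime={walltime}"]
    for directive, files in (("stagein", stage_in), ("stageout", stage_out)):
        if files:
            entries = ",".join(
                f"{local}@{host}:{remote}" for local, host, remote in files
            )
            lines.append(f"#PBS -W {directive}={entries}")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Hypothetical job: copy the input in before the run, copy the log out after.
script = build_job_script(
    name="freeenergy",
    walltime="01:00:00",
    command="./simulate input.dat > output.log",
    stage_in=[("input.dat", "fileserver.example.org", "/data/input.dat")],
    stage_out=[("output.log", "fileserver.example.org", "/data/output.log")],
)
print(script)
```

The resulting text would be handed to qsub; with staging in place the execution node needs no shared filesystem to see the input or return the output.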
An example OpenPBS setup
[Figure: diagram of the example setup]
Advantages of PBSPro
• Pre-emptive job scheduling
• Scheduler backfilling
• Improved fault tolerance
• “Desktop Cycle Harvesting”
• Paid support (all OpenPBS support is via
mailing lists)
• Largely compatible with OpenPBS
Portability
• Runs on most Unix-like systems, e.g.
Linux, Irix, Unicos, HP-UX, IA64
• MOMs for various architectures take
advantage of system-specific features
– e.g. checkpointing supported on certain
architectures
• Full server/client/MOM support for
heterogeneous networks
Queues and nodes
• Unlike NQS, PBS does not rely on queues
for scheduling decisions
• Queues are not tied to nodes, but can
specify resources
• Routing queues can pass jobs to
execution queues, possibly on different
PBS servers
• Nodes can have any number of virtual
processors
Resource definition
• Server-defined properties group nodes
into classes - e.g. “intel” for all Intel
architecture machines
• Additional resources (e.g. tape drives,
software licences) can be specified by
each MOM
– Custom resources are not utilised by the
default scheduler
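As a rough illustration of how properties group nodes into classes, the sketch below parses text in the style of the server's nodes file, where each line gives a node name (with a `:ts` suffix for a timeshared node), optional properties, and an optional `np=` count of virtual processors. The node names and the helpers `parse_nodes` and `nodes_with` are made up for this example.

```python
# Hypothetical nodes file: two dual-processor Intel cluster nodes and one
# timeshared Irix machine.
NODES_FILE = """\
node01 intel np=2
node02 intel np=2
sgi01:ts irix
"""


def parse_nodes(text):
    """Map node name -> dict with 'np', 'properties' and 'timeshared'."""
    nodes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, *fields = line.split()
        timeshared = name.endswith(":ts")
        if timeshared:
            name = name[: -len(":ts")]
        np_count = 1  # assume one virtual processor when np= is absent
        properties = []
        for field in fields:
            if field.startswith("np="):
                np_count = int(field[len("np="):])
            else:
                properties.append(field)
        nodes[name] = {
            "np": np_count,
            "properties": properties,
            "timeshared": timeshared,
        }
    return nodes


def nodes_with(nodes, prop):
    """Return the sorted names of all nodes carrying the given property."""
    return sorted(n for n, info in nodes.items() if prop in info["properties"])


nodes = parse_nodes(NODES_FILE)
print(nodes_with(nodes, "intel"))  # ['node01', 'node02']
```

A job asking for the `intel` property would only be eligible for the nodes returned by `nodes_with(nodes, "intel")`.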
Resource usage
• Timeshared nodes: balanced by load
• Cluster nodes: jobs allocated to virtual
processors, usually exclusively
• MOMs track jobs and kill any that exceed
resource limits (e.g. CPU or wall time, memory)
• No unified mechanism for accounting of running
and finished jobs
– qstat for running jobs
– Server accounting logs for finished jobs
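Since finished jobs are only visible in the server accounting logs, a site typically post-processes those logs itself. The sketch below assumes the usual record layout `date time;record_type;job_id;key=value ...` with `E` marking a job-end record. The sample records, user names, and helper functions are invented for illustration, and real records may carry fields this simple parser does not handle.

```python
from collections import defaultdict

# Invented sample records: one job start (S) and three job ends (E).
SAMPLE_LOG = [
    "02/13/2003 09:15:37;S;101.master;user=alice group=wgr queue=batch",
    "02/13/2003 10:15:42;E;101.master;user=alice group=wgr queue=batch "
    "resources_used.cput=00:58:10 resources_used.walltime=01:00:05",
    "02/13/2003 11:00:00;E;102.master;user=bob group=wgr queue=batch "
    "resources_used.cput=00:09:30 resources_used.walltime=00:10:00",
    "02/13/2003 12:00:00;E;103.master;user=alice group=wgr queue=batch "
    "resources_used.cput=00:29:00 resources_used.walltime=00:30:00",
]


def hms_to_seconds(text):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def walltime_by_user(log_lines):
    """Sum resources_used.walltime over job-end records, per user."""
    totals = defaultdict(int)
    for line in log_lines:
        parts = line.strip().split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # not a job-end record
        fields = dict(
            item.split("=", 1) for item in parts[3].split() if "=" in item
        )
        if "user" in fields and "resources_used.walltime" in fields:
            totals[fields["user"]] += hms_to_seconds(
                fields["resources_used.walltime"]
            )
    return dict(totals)


print(walltime_by_user(SAMPLE_LOG))  # {'alice': 5405, 'bob': 600}
```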
Scheduling
• Scheduler is just a privileged client
• Well-defined PBS scheduling API
• Facilities to write schedulers in C/BaSL/Tcl
• OpenPBS provides a simple FIFO scheduler, as
well as custom schedulers to take advantage of
system-specific features
• Maui scheduler (third party) also integrates with
other batch systems, and provides powerful
scheduling
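To make the idea of a scheduling policy concrete, here is a toy model of one strict first-in, first-out scheduling cycle. It is not the PBS scheduling API and is not claimed to match the scheduler shipped with OpenPBS: the function `fifo_schedule`, the job IDs, and the assumption that every job fits on a single node are simplifications for illustration.

```python
from collections import deque


def fifo_schedule(jobs, free_procs):
    """Run one strict FIFO scheduling cycle.

    jobs: list of (job_id, procs_needed) in submission order.
    free_procs: dict mapping node name -> free virtual processors.
    Returns a list of (job_id, node) for the jobs started this cycle.
    The cycle stops at the first job that does not fit anywhere, so
    later, smaller jobs are not started ahead of it (no backfilling).
    """
    free = dict(free_procs)  # work on a copy; leave the caller's dict alone
    queue = deque(jobs)
    started = []
    while queue:
        job_id, needed = queue[0]
        node = next((n for n in sorted(free) if free[n] >= needed), None)
        if node is None:
            break  # head of the queue is blocked, so the cycle ends
        free[node] -= needed
        started.append((job_id, node))
        queue.popleft()
    return started


jobs = [("101", 2), ("102", 4), ("103", 1)]
free = {"node01": 2, "node02": 2}
print(fifo_schedule(jobs, free))  # [('101', 'node01')]
```

Job 103 would fit on node02 but waits behind job 102. Filling that gap is what a backfilling scheduler does.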
Security
• Uses rhosts mechanism for authentication of
clients to the server (consistent user name
space not required), but does not require rsh
• MOMs can use rsh, ssh or cp (via NFS) to stage
files in and out
• Access Control Lists can also be used to provide
extra security
• PBS daemons use non-random port numbers,
and TCP for most communication, allowing
straightforward firewalling
• All daemons run as root! (No reported
vulnerabilities to date, however.)
Parallel support
• Conventional MPI mechanisms rely on well-behaved
users, and lack resource tracking
• OpenPBS provides a Task Manager (TM) API
– Allows parallel PBS jobs to spawn processes on
nodes other than the master
– mpiexec (third party) allows start-up of MPI jobs via
the TM mechanism (MPICH/EMP/LAM)
– Current LAM CVS also has a PBS-TM boot SSI
(system services interface) for job start-up
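For context on the conventional route: inside a running job, PBS sets the environment variable `PBS_NODEFILE` to the path of a file listing the allocated nodes, one line per allocated virtual processor, and an MPI launcher is usually pointed at that list. The minimal sketch below only reads that file; the helper name `procs_per_node` is hypothetical, and nothing here uses the TM API.

```python
import os
from collections import Counter


def procs_per_node(nodefile_path):
    """Count how many virtual processors the job was given on each node."""
    with open(nodefile_path) as handle:
        return Counter(line.strip() for line in handle if line.strip())


# Inside a PBS job the variable is set; elsewhere this block does nothing.
if "PBS_NODEFILE" in os.environ:
    allocation = procs_per_node(os.environ["PBS_NODEFILE"])
    for node, count in sorted(allocation.items()):
        print(f"{node}: {count} processor(s)")
```

A launcher that simply starts remote processes from this list is outside the batch system's control. Starting them through TM lets the MOMs track and terminate them.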
Customisation
• Full source code available, for commercial or
non-commercial use
• Site-specific modification routines allow easy
customisation of “likely targets”
• Defined C API for job submission, query etc.
• Third-party projects and patches, e.g. mpiexec,
Cplant (fault tolerance), PyPBS, scalability
patches, AFS token management
Grid integration
• Globus Resource Allocation Manager
(GRAM) available for PBS
• The Maui scheduler and the PBSPro default
scheduler both support advance reservations
• Silver metascheduler is grid-aware, has
full support for PBS, and can work with or
without Globus
Comparison with Sun Grid Engine
• Both systems perform balancing of
jobs/load between managed nodes
• PBS server is a single point of failure;
SGE supports shadow masters
• SGE seems to now be more actively
developed than OpenPBS
Summary and acknowledgements
• OpenPBS is a cheap solution for Linux
clustering, conventional supercomputer
management, and/or use of idle
workstations
• Can upgrade easily to PBSPro if desired
PBS includes software developed by NASA Ames Research Center, Lawrence
Livermore National Laboratory, and Veridian Information Solutions, Inc. Visit
www.OpenPBS.org for OpenPBS software support, products, and information.
WGR group webpages: http://bellatrix.pcl.ox.ac.uk/