PBS, LSF and ARC integration
Zoltán Farkas
zfarkas@sztaki.hu
MTA SZTAKI LPDS
06/08/10
Outline
•Introduction
•Requirements
•PBS and LSF
•ARC
•Architecture of P-GRADE Portal runtime layer
•PBS/LSF integration
•ARC integration
•Summary
06/08/10
PBS, LSF and ARC
2
Introduction
•P-GRADE Portal already supported gLite and Globus
•ETHZ requirement:
•Make use of PBS local clusters
•Make use of LSF local clusters (Brutus)
•Sometimes make use of ARC grid resources
•All this should be integrated within P-GRADE Portal
PBS (and LSF)
•Portable Batch System
•(Load Sharing Facility)
•Schedule users' jobs on a cluster
•Interactive login to a submission node
•Users execute different commands:
•qsub (bsub): submit
•qstat (bjobs): status
•qdel (bkill): abort
[Figure: a submission node, a scheduler node, and six cluster nodes]
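The PBS and LSF commands listed above pair up one to one, so a caller can treat both schedulers through a single lookup. The following is a minimal sketch of that idea; the command names come from the slide, while the table layout and the `command_for` helper are illustrative assumptions, not part of P-GRADE Portal.

```python
# Command names are taken from the slide; the table layout and the
# helper function are illustrative assumptions.
COMMANDS = {
    "PBS": {"submit": "qsub", "status": "qstat", "abort": "qdel"},
    "LSF": {"submit": "bsub", "status": "bjobs", "abort": "bkill"},
}


def command_for(scheduler: str, action: str) -> str:
    """Return the command-line tool that performs `action` on `scheduler`."""
    try:
        return COMMANDS[scheduler][action]
    except KeyError:
        raise ValueError(
            f"unsupported scheduler/action: {scheduler}/{action}"
        ) from None


if __name__ == "__main__":
    for scheduler in COMMANDS:
        tools = [command_for(scheduler, a) for a in ("submit", "status", "abort")]
        print(scheduler, tools)
```

Keeping the mapping in one table means the calling code never needs to branch on the scheduler type.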
ARC
•Advanced Resource Connector
•Complete grid middleware with:
•Information system
•Command-line clients with integrated broker
•Data management stack (GridFTP)
•Usable through client programs:
•Job description: xRSL
•ngsub: submit
•ngstat: status update
•ngkill: cancel
•ngget: get results
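To make the xRSL job description more concrete, here is a minimal sketch that renders a few attributes as an xRSL string. The `to_xrsl` helper, the file names, and the job name are hypothetical; the attributes used (`executable`, `arguments`, `jobName`, `stdout`) are common xRSL attributes. Values that contain a double quote are rejected to keep the sketch simple.

```python
def to_xrsl(attributes: dict) -> str:
    """Render attribute/value pairs as one xRSL conjunction: &(a="x")(b="y")."""

    def quote(value) -> str:
        text = str(value)
        # Simplification (assumption): refuse embedded quotes instead of escaping.
        if '"' in text:
            raise ValueError(f"double quote not supported in value: {text!r}")
        return f'"{text}"'

    parts = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # A list becomes a space-separated sequence of quoted strings.
            rendered = " ".join(quote(v) for v in value)
        else:
            rendered = quote(value)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    # Hypothetical job: file names and job name are placeholders.
    print(to_xrsl({
        "executable": "run.sh",
        "arguments": ["input.dat", "10"],
        "jobName": "demo",
        "stdout": "out.txt",
    }))
```

The resulting string would then be handed to the ARC client for submission.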
P-GRADE Portal Architecture
•Workflow Editor-related components
•Portlet-related components
•Workflow data storage
•Execution layer
•See next slide!
[Figure: P-GRADE Portal Machine. Components shown: Apache Tomcat servlet container; GridSphere portal framework; four P-GRADE Portal portlets; Workflow Editor servlet; Workflow Editor client; P-GRADE Portal's filesystem; user workflow data; DAGMan; common workflow and job execution scripts; Globus scripts; PBS scripts; EGEE scripts; Globus Grid; PBS cluster; EGEE Grid]
LSF and PBS integration I.
•Principal idea:
•The user should be able to configure a remote ssh connection to submission nodes through the Settings portlet
•The connection is established using ssh keypairs
•Established connections are reused in order to minimize ssh connection attempts
•Connections are used on a:
•Per-user,
•Per-resource basis
→ a given user's connection isn't accessible by other users
→ different resources use different connections
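The per-user, per-resource reuse described above can be sketched as a small cache keyed by the (user, resource) pair. This is an illustration under stated assumptions, not the portal's implementation: the `ConnectionPool` class and the injected `connect` factory are hypothetical, and a real version would open an ssh session where the example uses a stand-in object.

```python
import threading
from typing import Callable, Dict, Tuple


class ConnectionPool:
    """Caches one connection per (user, resource) pair and reuses it."""

    def __init__(self, connect: Callable[[str, str], object]):
        # Hypothetical factory: (user, resource) -> open connection.
        self._connect = connect
        self._connections: Dict[Tuple[str, str], object] = {}
        self._lock = threading.Lock()

    def get(self, user: str, resource: str) -> object:
        """Return the cached connection, opening it on first use only."""
        key = (user, resource)
        with self._lock:
            if key not in self._connections:
                self._connections[key] = self._connect(user, resource)
            return self._connections[key]


if __name__ == "__main__":
    opened = []

    def fake_connect(user: str, resource: str) -> object:
        # Stand-in for establishing an ssh connection with the user's keypair.
        opened.append((user, resource))
        return object()

    pool = ConnectionPool(fake_connect)
    first = pool.get("user1", "lsf-resource-1")
    again = pool.get("user1", "lsf-resource-1")
    other = pool.get("user2", "lsf-resource-1")
    print(first is again, first is other, len(opened))
```

Because the key includes the user, one user's connection is never handed to another, and because it includes the resource, each resource gets its own connection.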
LSF and PBS integration II.
[Figure: the Portal Machine holds a connection pool for User 1 and one for User 2, each marked PRIV; LSF resources 1, 2 and 3 and PBS resources 1 and 2 are each marked PUB]
LSF and PBS integration III.
•Job preparation:
•wkf_pre_LSF.sh: prepare job, wrapper, collect files
•wkf_pre_PBS.sh: prepare job, wrapper, collect files
•Job execution:
•wkf_LSF.sh: submit and observe job using b* commands
•wkf_PBS.sh: submit and observe job using q* commands
•Wrappers:
•LSF_fake.sh: handle generator and collector jobs, run exe
•PBS_fake.sh: handle generator and collector jobs, run exe
•Job post-processing:
•No real task (wkf_post_LSF.sh and wkf_post_PBS.sh)
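The script names above follow a regular pattern: one preparation, one execution, and one post-processing script per middleware. A minimal sketch of deriving those names is shown below; the `script_name` and `lifecycle` helpers and the phase labels are illustrative assumptions, and only the resulting file names come from the slide.

```python
from typing import List

# Phase labels are an assumption made for this sketch.
PHASES = ("pre", "run", "post")


def script_name(phase: str, middleware: str) -> str:
    """Build the script name for one phase, e.g. wkf_pre_LSF.sh."""
    if phase == "pre":
        return f"wkf_pre_{middleware}.sh"
    if phase == "run":
        return f"wkf_{middleware}.sh"
    if phase == "post":
        return f"wkf_post_{middleware}.sh"
    raise ValueError(f"unknown phase: {phase}")


def lifecycle(middleware: str) -> List[str]:
    """Return the scripts for a job in the order they would be invoked."""
    return [script_name(phase, middleware) for phase in PHASES]


if __name__ == "__main__":
    for middleware in ("LSF", "PBS"):
        print(middleware, lifecycle(middleware))
```

With such a convention, supporting a new middleware amounts to supplying three scripts that follow the naming pattern.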
LSF and PBS integration features
•Full PS (parameter study) support
•Very quick response time compared to grid middleware
•Support for any kind of executable
ARC integration I.
•Very similar to the EGEE support
•An ARC client stack has to be installed on the P-GRADE Portal machine
•Users can gain access with X.509 proxy certs
•Two possible resource selections:
•User can specify the target cluster
•Cluster can be selected by client broker
ARC integration II.
•Job preparation: wkf_pre_nordugrid.sh
•Wrapper script preparation
•Generator-related cleanups (as needed)
•Autogenerator-related file uploads (as needed)
•Job execution: wkf_nordugrid.sh
•xRSL prepared based on job properties
•Job submission and management using ng* commands
•Wrapper script: manage generator and collector jobs if needed
•Job post-processing: wkf_post_nordugrid.sh
•No real job to perform
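The "submission and management" step boils down to submitting a job and then polling its status until it reaches a final state. The sketch below shows only the polling part, under stated assumptions: the `observe` function, the injected `get_status` callable, and the state names are hypothetical placeholders and do not reproduce the output of the ng* commands.

```python
import time
from typing import Callable, Iterable


def observe(
    job_id: str,
    get_status: Callable[[str], str],
    terminal: Iterable[str] = ("FINISHED", "FAILED"),  # placeholder state names
    interval: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Poll get_status(job_id) until a terminal state appears, then return it."""
    terminal_states = set(terminal)
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in terminal_states:
            return status
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status query.
    states = iter(["QUEUED", "RUNNING", "FINISHED"])
    print(observe("job-1", lambda _job_id: next(states)))
```

Injecting the status function keeps the loop independent of the middleware, so the same logic could wrap any status command.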
ARC integration features
•Full PS support
•Offers the possibility to select the execution resource
•Support for any kind of executable
•Multi-node job support
•Offers the possibility to specify runTimeEnvironment attributes
Summary
•PBS, LSF and ARC integration was relatively simple thanks to the pluggable architecture of P-GRADE Portal
•However, the devil is in the details:
•ssh connection sharing + parallel connection limits
•Proper LSF job cancellation
•…