PBS, LSF and ARC integration
Zoltán Farkas
zfarkas@sztaki.hu
MTA SZTAKI LPDS
06/08/10

Outline
•Introduction
•Requirements
•PBS and LSF
•ARC
•Architecture of P-GRADE Portal runtime layer
•PBS/LSF integration
•ARC integration
•Summary

Introduction
•P-GRADE Portal already supported gLite and Globus
•ETHZ requirement:
  •Make use of PBS local clusters
  •Make use of LSF local clusters (Brutus)
  •Sometimes make use of ARC grid resources
•All this should be integrated within P-GRADE Portal

PBS (and LSF)
•Portable Batch System
•(Load Sharing Facility)
•Schedule users' jobs on a cluster
•Interactive login to a submission node
•Users execute different commands:
  •qsub (bsub): submit
  •qstat (bjobs): status
  •qdel (bkill): abort
[Figure: cluster layout. Labels: submission node; scheduler node; cluster nodes]

ARC
•Advanced Resource Connector
•Complete grid middleware with:
  •Information system
  •Command-line clients with integrated broker
  •Data management stack (GridFTP)
•Usable through client programs:
  •Job description: xRSL
  •ngsub: submit
  •ngstat: status update
  •ngkill: cancel
  •ngget: get results

P-GRADE Portal Architecture
•Workflow Editor-related components
•Portlet-related components
•Workflow data storage
•Execution layer
  •See next slide!

[Figure: architecture diagram. Labels: P-GRADE Portal machine; Apache Tomcat servlet container; GridSphere portal framework; P-GRADE Portal portlets; Workflow Editor servlet; Workflow Editor client; P-GRADE Portal's filesystem; user workflow data; DAGMan; common workflow and job execution scripts; Globus scripts; EGEE scripts; PBS scripts; Globus Grid; EGEE Grid; PBS cluster]

LSF and PBS integration I.
•Principal idea:
  •User should be able to configure a remote ssh connection to submission nodes through the Settings portlet
  •Connection is established using ssh keypairs
  •Established connections are reused in order to minimize ssh connection attempts
  •Connections are used on a:
    •Per-user,
    •Per-resource basis
    → a given user's connection isn't accessible by other users
    → different resources use different connections

LSF and PBS integration II.
[Figure: connection pools. Labels: portal machine; connection pool of User 1 (PRIV); connection pool of User 2 (PRIV); LSF resources 1, 2, 3 (PUB); PBS resources 1, 2 (PUB)]

LSF and PBS integration III.
•Job preparation:
  •wkf_pre_LSF.sh: prepare job, wrapper, collect files
  •wkf_pre_PBS.sh: prepare job, wrapper, collect files
•Job execution:
  •wkf_LSF.sh: submit and observe job using b* commands
  •wkf_PBS.sh: submit and observe job using q* commands
•Wrappers:
  •LSF_fake.sh: handle generator and collector jobs, run exe
  •PBS_fake.sh: handle generator and collector jobs, run exe
•Job post-processing:
  •No real task (wkf_post_LSF.sh and wkf_post_PBS.sh)

LSF and PBS integration features
•Full PS support
•Very quick response time compared to grid middlewares
•Support for any kind of executable

ARC integration I.
•Very similar to the EGEE support
•An ARC client stack has to be installed on the P-GRADE Portal machine
•Users can gain access with X.509 proxy certs
•Two possible resource selections:
  •User can specify the target cluster
  •Cluster can be selected by client broker

ARC integration II.
•Job preparation: wkf_pre_nordugrid.sh
  •Wrapper script preparation
  •Generator-related cleanups (as needed)
  •Autogenerator-related file uploads (as needed)
•Job execution: wkf_nordugrid.sh
  •xRSL prepared based on job properties
  •Job submission and management using ng* commands
  •Wrapper script: manage generator and collector jobs if needed
•Job post-processing: wkf_post_nordugrid.sh
  •No real job to perform

ARC integration features
•Full PS support
•Offers the possibility to select execution resource
•Support for any kind of executable
•Multi-node job support
•Offers possibility to specify runTimeEnvironment attributes

Summary
•PBS, LSF and ARC integration was relatively simple thanks to the pluggable architecture of P-GRADE Portal
•However, the devil is in the details:
  •Ssh connection sharing + parallel connection limits
  •Proper LSF job cancel
  •…
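
Sketch: ssh connection sharing
The per-user, per-resource connection reuse and the parallel connection limit mentioned in the summary can be illustrated with a small sketch. This is not the portal's actual code: `ConnectionPool`, `FakeConnection`, the `connect` factory and the `max_per_resource` cap are hypothetical names, and modelling the limit as a cap on open connections per resource is an assumption.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for an ssh session to a submission node."""

    def __init__(self, user, resource):
        self.user = user
        self.resource = resource
        self.closed = False

    def close(self):
        self.closed = True


class ConnectionPool:
    """Reuses one connection per (user, resource) pair.

    Assumption: the parallel connection limit is modelled as a cap on
    the number of open connections per resource.
    """

    def __init__(self, connect, max_per_resource=2):
        self._connect = connect      # factory: (user, resource) -> connection
        self._max = max_per_resource
        self._lock = threading.Lock()
        self._conns = {}             # (user, resource) -> connection

    def get(self, user, resource):
        key = (user, resource)
        with self._lock:
            # Reuse an established connection instead of reconnecting.
            if key in self._conns:
                return self._conns[key]
            # Count connections already open towards this resource.
            open_count = sum(1 for (_, r) in self._conns if r == resource)
            if open_count >= self._max:
                raise RuntimeError(
                    "connection limit reached for resource %s" % resource
                )
            conn = self._connect(user, resource)
            self._conns[key] = conn
            return conn

    def close(self, user, resource):
        # Drop the pooled connection so that a later get() reconnects.
        with self._lock:
            conn = self._conns.pop((user, resource), None)
        if conn is not None:
            conn.close()
```

Keying the pool on the (user, resource) pair keeps one user's connection out of reach of other users and gives each resource its own connection. Raising an error at the cap is only one option; a real implementation might queue the request instead.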