WS-JDML: A Web Service Interface for Job Submission and Monitoring
Stephen McGough, William Lee
London e-Science Centre, Department of Computing, Imperial College London

What Services do we need to make the Grid Work?
• One of the key services required is job submission
  – The ability to transparently submit a job to a resource (potentially through a DRM) where it will run
• Many DRM systems exist (Condor, Globus, SGE etc.)
  – Each has its own way to define a job (language)
  – Each has its own submission mechanism (command line, API, Service)

The Problem
• Submitting jobs requires
  – Knowledge of the job definition procedure
  – The ability to interface with the appropriate DRM
• The Solution
  – One common job description language that can be used with all resources (e.g. RSL)
  – A generic submission system for jobs
• Using community-based standards that are in common use

Generic Job Submission
[Figure: JDML, Web Services, WS-JDML]

Web Service
• We are using a plain "Vanilla" Web Service
  – Don't rely on any proposed WS standards
  – Don't need anything more than core standards for this simple service
• Developed in Java
• Our work has been deployed into the J2EE enterprise platform
  – This enables
    • Scalability
    • Fault tolerance

Job Description Markup Language (JDML)
• Originally developed from Condor ClassAds
• Developed for the European DataGrid project
• Used within the Imperial College ICENI project
• This work is now feeding into the Global Grid Forum Job Submission Description Language standardisation work
• JDML will morph to become JSDL

JDML (2)
• JDML documents are written in sections
  – What job to run
  – The environment to run the job in
  – Where to get files from
  – Where to send files to at the end
• JDML is strongly typed
• Consists of name/value pairs

JDML (3)
• Can have DRM-specific sections
  – It must be safe to ignore such a section and have the job still work correctly
  – Seen as a set of hints to the DRM
• File transfer is defined for multiple protocols
  – GridFTP, HTTP, copy etc.
  – Each file may have several of these definitions
    • The DRM can select the appropriate ones to use

WS-JDML Architecture
[Figure: architecture diagram]

Job Submission Port Type
• Takes a JDML document describing the job to run
• Validates the JDML so that an immediate response can be given
• Validates user credentials, passed as part of the SOAP header, using WS-Security
• The job is then placed into a queue before being processed into a DRM-specific version and deployed locally

Job Submission Port Type (2)
• Various results
  – Unrecognised Job Term
    • The JDML contains some term that the Service doesn't understand
  – Invalid Job Term
    • The JDML has a term which has the wrong type or an invalid value
  – Successful Submission
    • A URI to identify the job instance is returned

Job Monitoring Port Type
• This port provides a means to observe the current status of a job and manipulate the output transfer mechanism
• Requires the URI representing a job, as returned by job submission
• Current job status is returned
  – pending, scheduled, running, suspended, done, exit
  – Not all DRMs support all states

Job Monitoring Port Type (2): File Transfers
• Port provides the ability to
  – Get portions of the files specified in the JDML transferred
  – Override the transfer methods given in the JDML
  – Indicate that files should be transferred back as attachments to the SOAP document
• Allows easy monitoring of the job progress

Deployment
• DRM-specific translators have been obtained from existing code within the ICENI project
  – These include Shell, SGE, Globus and Condor
• The Web Service architecture has been deployed on the Java J2EE 1.4 platform
  – This provides a number of support features for the services.
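
The polling pattern implied by the Job Monitoring Port Type can be sketched in Java as follows: a client holds the job URI returned by submission and asks for the status repeatedly until the job finishes or a poll budget runs out. This is only an illustrative sketch. `JobMonitor`, `JobPoller`, `ScriptedMonitor` and the `urn:example:job:1` identifier are hypothetical names, not part of WS-JDML, and treating done and exit as the terminal states is an assumption. The in-memory `ScriptedMonitor` stands in for a real SOAP client stub.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Job states as listed for the Job Monitoring Port Type. */
enum JobStatus {
    PENDING, SCHEDULED, RUNNING, SUSPENDED, DONE, EXIT;

    /** Assumption: done and exit are the only states a job never leaves. */
    boolean isTerminal() {
        return this == DONE || this == EXIT;
    }
}

/** Hypothetical client-side view of the monitoring port (not the real WS-JDML API). */
interface JobMonitor {
    JobStatus getStatus(URI jobId);
}

final class JobPoller {
    private JobPoller() { }

    /**
     * Polls the monitor until the job reaches a terminal state or maxPolls
     * status requests have been made, and returns the last status seen.
     */
    static JobStatus pollUntilFinished(JobMonitor monitor, URI jobId,
                                       int maxPolls, long delayMillis) {
        if (maxPolls < 1) {
            throw new IllegalArgumentException("maxPolls must be at least 1");
        }
        JobStatus status = monitor.getStatus(jobId);
        for (int polls = 1; polls < maxPolls && !status.isTerminal(); polls++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return status;
            }
            status = monitor.getStatus(jobId);
        }
        return status;
    }

    public static void main(String[] args) {
        JobMonitor monitor =
                new ScriptedMonitor(JobStatus.PENDING, JobStatus.RUNNING, JobStatus.DONE);
        URI jobId = URI.create("urn:example:job:1"); // hypothetical job URI
        System.out.println(pollUntilFinished(monitor, jobId, 10, 0L));
    }
}

/** In-memory stand-in for a remote service: replays a fixed list of statuses. */
final class ScriptedMonitor implements JobMonitor {
    private final Deque<JobStatus> script;

    ScriptedMonitor(JobStatus... statuses) {
        if (statuses.length == 0) {
            throw new IllegalArgumentException("at least one status is required");
        }
        this.script = new ArrayDeque<>(Arrays.asList(statuses));
    }

    @Override
    public JobStatus getStatus(URI jobId) {
        // Replay the script in order, then keep repeating the final status.
        return script.size() > 1 ? script.poll() : script.peek();
    }
}
```

Capping the number of polls keeps a client from waiting forever on a DRM that never reports a terminal state. A notification-based design would remove the loop altogether.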

Demo
• Hopefully
• http://rhea.lesc.doc.ic.ac.uk:9999/jdmljobservice
• Need to run over SSH

Further Work
• Job State Transition
  – The ability to represent the status of a job running within a resource
• Notification
  – Monitoring a job currently requires polling the monitoring port
    • It would be better to send notifications to a sink service through, say, WS-Notification
• Job Term Semantics
  – Job terms are currently defined using natural language
  – The lack of a formal model makes JDML transformation error-prone
  – Develop an ontology for job submission terms

What do you use to build your service?
• Widely Implemented Standard Specification (1pt)
  – <Demonstrable Multiple Implementations, e.g. SOAP, WSDL>
• Implemented draft specification (2pt)
  – <Specification in standards body and supported by most/many companies. One/few implementations exist (e.g., WS-Security, BPEL)>
• Implemented draft specification (3pt)
  – <Specification in standards body but alternatives exist. Industry is divided. One/few implementations exist (e.g., Transactions, coordination, notification, etc.)>
• Implemented proposal (4pt)
  – An implementation of an idea, a proposal but not submitted to standards body yet (e.g., WS-Addressing, WS-Trust, etc.)
• Non-implemented proposal (5pt)
  – <An idea that exists as a white paper, but no code and no specification details>
• Concept (6pt)
  – <An idea that exists only as PowerPoint slides!!>
• TOTAL: SOAP, WSDL, WS-Security = 3

Service Dependencies
• What else does your service depend on (i.e. external dependencies)?
  – RDBMS / J2EE EJBs
  – Logging (Java Logging)
  – Message Queue (JMS)
• What does your implementation depend on?
  – Java
  – J2EE 1.4 compliant

AAA & Security
• What authentication mechanism do you use?
  – WS-Security
• What authorisation mechanism do you use?
  – Flexible composition of authorisation plugins.
• What accounting mechanism do you use?
  – Java logging
• Does service interaction need to be encrypted?
• If these are not used now, will they be in the future?

Exploiting the Service Architecture
• What features from your 'plumbing' do you use in your service?
  – Event notification
  – Meta-data

Service Activity
• Multiple interaction or single user?
  – Multiple
• Throughput (1 per day or 100 per second?)
• Typical data volume moved in
• Typical data volume moved out

Service Failure
• Required Reliability
  – Failure semantics?
    • Positive ack (might need WS-ReliableMessaging)
• Required Persistence
  – A job entered into the queue is always persisted
• Required Availability
  – One of many or unique requirement

Required Service Management
• Remote access to:
  – Usage statistics
  – Job Progress
  – Job Diagnostic and repair interfaces

Acknowledgements
• Director: Professor John Darlington
• Research Staff:
  – Anthony Mayer, Nathalie Furmento
  – Stephen McGough, James Stanton
  – Yong Xie, William Lee
  – Marko Krznaric, Murtaza Gulamali
  – Asif Saleem, Laurie Young, Gary Kong
• Contact:
  – http://www.lesc.ic.ac.uk/
  – e-mail: lesc@ic.ac.uk