A PPARC-funded project
Workflow and Job Control in Astrogrid
Jeff Lusted
Dept Physics and Astronomy
University of Leicester

Astrogrid: the UK's first Virtual Observatory
  Interrogate multiple data centers in a seamless and transparent way
  Provide powerful new analysis and visualization tools
  Give data centers and providers a standard framework for publishing and delivering services

7 Dec 2003 Workflow and Job Control in Astrogrid 1

Standards Background
  Open source
  Web Services: pluggable components
  Movement towards grid-based computing
  XML based
  Current implementation in Java
  Server-side development
  GUI: browser based, with some possible use of applet components (e.g. a visualization tool)

Problem Space: Developer's
  Large quantities of data
  Archives distributed around the world
  Historically based on RDBMSs of various persuasions, or roll-your-own
  Different sources
  Data not necessarily held in a uniform manner

Problem Space: Science, Astronomer's
Brown Dwarf Selection
  What is the contribution of brown dwarfs to the stellar mass budget?
  Brown dwarfs are intrinsically faint and rare objects, so their detection is not straightforward
  The current solution requires local access to some of the relevant databases, plus the download of significant quantities of data from other datasets to a local workstation
  VO solution: the ability to make queries combining attributes of astronomical objects stored in different databases

Problem Space 2: Science, Layman's
Cone Search
  Something interesting has happened in this area of the sky.
  Search the same area in other archives around the world
  Search other sources (X-ray, infrared)
  The search is obviously time based, with observations potentially extending over many years

Simple Workflow
  Step One: search an archive for some interesting data
  Step Two: search another archive for similar data
  Step Three: merge the two sets of results in a meaningful way
  Step Four: analyse the results, perhaps for visualization

Astrogrid Components
  VOSpace: Manager and Server
  Datacenter: query tools and hosting
  Registry: publishing
  Portal: includes Workflow design and submission
  JES: the Job Entry System
  Application Controller
  Tools
  Message Log
  Community

Workflow Designs
  XML based
  Designed within the portal environment
  Held as files within VOSpace
  On job submission, the relevant design is sent to the Job Entry System

Workflow Example

<?xml version="1.0" encoding="UTF-8"?>
<workflow name="TwoQueriesAndVOTableMerge">
  <sequence>
    <flow>
      <step name="QueryOne">
        <tool name="conesearch2Mass">
          <input>
            <parameter name="query" type="VOSpaceRefADQL" location="vospace://queries/query1" />
          </input>
          <output>
            <parameter name="result" type="VOSpaceRefVOTABLE" location="vospace://votables/votable1" />
          </output>
        </tool>
      </step>
      <step name="QueryTwo">
        ...
      </step>
    </flow>
    <step name="VOTableMerge" joinCondition="true">
      <tool name="DataFederator">
        <input>
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable1" />
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable2" />
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable3" />
          <parameter name="config" type="VOSpaceRefDFCONFIG" location="vospace://datafederator/config1" />
        </input>
        <output>
          <parameter name="out" type="VOSpaceRefVOTABLE" location="vospace://votables/votable4" />
        </output>
      </tool>
    </step>
  </sequence>
</workflow>

Job Entry System

Job Example

<?xml version="1.0" encoding="UTF-8"?>
<job name="TwoQueriesAndVOTableMerge">
  <community>
    <token>xxxxxxx</token>
    <credentials>
      <account>jl99@star.le.ac.uk</account>
      <group>xray@star.le.ac.uk</group>
    </credentials>
  </community>
  ...
</job>

JES Components
  JobController
    Security check
    Inserts job data into the system
    Keeps an audit trail: the original job document
  JobScheduler
    Dispatches one or more job steps
    Interrogates Registry for location information
  JobMonitor
    A listener on the ApplicationController
    When a job step finishes, decides whether the job has finished
  The JES Database
    Required for reliability and recoverability
    Updates have transactional semantics

JES Cooperators
  ApplicationController
    Deals with one job step
    Location dependent on tool type
  ...
  Bottom leaf: Tool instance
    Command-line tool co-located with ApplicationController
    Web-service-based tool can be remote

Interesting Points (one hopes)
  Workflow execution is managed centrally by JES
  A workflow design is both workflow descriptor and input message.
  Each step represents a tool invocation: “where” can be decided at run-time.
  Resources will often be references rather than instream.
  Referenced resources need to be managed
  To minimize data movement, data and tools should be co-located

Astrogrid Home Page
  www.astrogrid.org
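
Sketch: reading a workflow design

The workflow example earlier in the deck nests its steps inside sequence and flow elements. The following Python sketch is not Astrogrid code (the deck states the implementation is in Java); it only illustrates how a scheduler might turn such a document into an ordered list of stages and collect the VOSpace references each step touches. It assumes that children of sequence run one after another and that children of flow may be dispatched together, which the slides suggest but do not state. The function names plan and references are hypothetical, and the QueryTwo body is left empty because the slides elide it.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the workflow from the slides. The QueryTwo body is
# elided in the source, so it is left empty here rather than guessed.
WORKFLOW_XML = """<workflow name="TwoQueriesAndVOTableMerge">
  <sequence>
    <flow>
      <step name="QueryOne">
        <tool name="conesearch2Mass">
          <input>
            <parameter name="query" type="VOSpaceRefADQL"
                       location="vospace://queries/query1" />
          </input>
          <output>
            <parameter name="result" type="VOSpaceRefVOTABLE"
                       location="vospace://votables/votable1" />
          </output>
        </tool>
      </step>
      <step name="QueryTwo"><!-- elided in the slides --></step>
    </flow>
    <step name="VOTableMerge" joinCondition="true">
      <tool name="DataFederator">
        <input>
          <parameter name="in" type="VOSpaceRefVOTABLE"
                     location="vospace://votables/votable1" />
          <parameter name="in" type="VOSpaceRefVOTABLE"
                     location="vospace://votables/votable2" />
          <parameter name="in" type="VOSpaceRefVOTABLE"
                     location="vospace://votables/votable3" />
          <parameter name="config" type="VOSpaceRefDFCONFIG"
                     location="vospace://datafederator/config1" />
        </input>
        <output>
          <parameter name="out" type="VOSpaceRefVOTABLE"
                     location="vospace://votables/votable4" />
        </output>
      </tool>
    </step>
  </sequence>
</workflow>"""


def plan(element):
    """Return a list of stages; each stage is a list of step names.

    Assumed semantics (not defined in the slides): <sequence> children run
    in order, <flow> children may be dispatched at the same time.
    """
    if element.tag == "step":
        return [[element.get("name")]]
    if element.tag in ("workflow", "sequence"):
        stages = []
        for child in element:
            stages.extend(plan(child))
        return stages
    if element.tag == "flow":
        # Merge branches stage by stage: the k-th stage of every branch
        # lands in the k-th stage of the flow.
        merged = []
        for child in element:
            for k, stage in enumerate(plan(child)):
                if k == len(merged):
                    merged.append([])
                merged[k].extend(stage)
        return merged
    raise ValueError("unexpected element: " + element.tag)


def references(root):
    """Map each step name to (input locations, output locations)."""
    refs = {}
    for step in root.iter("step"):
        ins = [p.get("location") for p in step.findall("./tool/input/parameter")]
        outs = [p.get("location") for p in step.findall("./tool/output/parameter")]
        refs[step.get("name")] = (ins, outs)
    return refs


root = ET.fromstring(WORKFLOW_XML)
print(plan(root))  # [['QueryOne', 'QueryTwo'], ['VOTableMerge']]
for name, (ins, outs) in references(root).items():
    print(name, "reads", ins, "writes", outs)
```

Keeping the plan as a list of stages mirrors the point that JES manages execution centrally: each stage could be handed to the scheduler as a batch, and the next stage released only once the monitor reports that every step in the batch has finished.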