
A PPARC funded project
Workflow and Job Control in
Astrogrid
Jeff Lusted
Dept Physics and Astronomy
University of Leicester
Astrogrid – UK's first Virtual Observatory
Interrogate multiple data centers in a seamless
and transparent way
Provide powerful new analysis and visualization
tools
Give data centers and providers a standard
framework for publishing and delivering services
7 Dec 2003
Workflow and Job Control in Astrogrid
Standards Background
Open source
Web Services - Pluggable components
Movement towards grid-based computing
XML based
Current implementation in Java
Server side development
GUI – browser based, with possible use of applet
components (e.g. a visualization tool)
Problem Space: Developer's
Large quantities of data
Archives distributed around the world
Historically based
RDBMSs of various persuasions, or roll-your-own
Different sources
Data not necessarily held in a uniform manner
Problem Space: Science – Astronomer's
Brown Dwarf Selection
What is the contribution of brown dwarfs to the stellar mass
budget?
Brown dwarfs are intrinsically faint and rare objects, so their
detection is not straightforward
Current solution requires local access to some of the relevant
databases but also download of significant quantities of data
from other datasets to a local workstation.
VO Solution - the ability to make queries combining
attributes of astronomical objects stored in different
databases.
Problem Space 2: Science – Layman's
Cone Search
Something interesting has happened in this area of
the sky.
Search the same area in other archives around the
world
Search other sources (X-ray, infrared)
Obviously search is time based, with observations
potentially extending over many years
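A cone search of this kind can be sketched in a few lines. This is only an illustration, not Astrogrid code: the row layout (`ra`, `dec`, `epoch`) and the sample catalogue are assumptions, and a real archive would return a VOTable rather than a list of dictionaries.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: stays accurate for the small angles typical of cone searches.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalogue, ra, dec, radius_deg, t_start=None, t_end=None):
    """Return rows within radius_deg of (ra, dec), optionally limited to an epoch range."""
    hits = []
    for row in catalogue:
        if angular_separation_deg(ra, dec, row["ra"], row["dec"]) > radius_deg:
            continue
        if t_start is not None and row["epoch"] < t_start:
            continue
        if t_end is not None and row["epoch"] > t_end:
            continue
        hits.append(row)
    return hits

# Hypothetical catalogue rows, invented for this sketch.
catalogue = [
    {"id": "a", "ra": 10.00, "dec": 20.00, "epoch": 1998.5},
    {"id": "b", "ra": 10.05, "dec": 20.02, "epoch": 2002.1},
    {"id": "c", "ra": 50.00, "dec": -5.00, "epoch": 2001.0},
]
nearby = cone_search(catalogue, ra=10.0, dec=20.0, radius_deg=0.1)
recent = cone_search(catalogue, ra=10.0, dec=20.0, radius_deg=0.1, t_start=2000.0)
```

Running the same function against several archives, each with its own catalogue, gives the multi-archive, multi-waveband search described above.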
Simple Workflow
Step One – search archive for some interesting data
Step Two – search another archive for similar data
Step Three – merge the two sets of results in a
meaningful way
Step Four – analyse the results, perhaps for
visualization
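The four steps can be sketched as ordinary function calls. Everything here is illustrative: the merge rule (a join on a shared object name), the column names and the sample rows are assumptions, chosen only to show the shape of the pipeline.

```python
def search(archive, predicate):
    """Steps one and two: select rows of interest from one archive."""
    return [row for row in archive if predicate(row)]

def merge(left, right, key):
    """Step three: join two result sets on a shared key (an assumed merge rule)."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]

def analyse(rows):
    """Step four: derive a quantity ready for plotting, here an R-K colour."""
    return [(row["name"], round(row["r_mag"] - row["k_mag"], 2)) for row in rows]

# Hypothetical archives, invented for this sketch.
optical = [{"name": "obj1", "r_mag": 21.0}, {"name": "obj2", "r_mag": 18.2}]
infrared = [{"name": "obj1", "k_mag": 15.5}, {"name": "obj3", "k_mag": 14.0}]

faint_optical = search(optical, lambda r: r["r_mag"] > 20)      # step one
bright_infrared = search(infrared, lambda r: r["k_mag"] < 16)   # step two
merged = merge(faint_optical, bright_infrared, key="name")      # step three
colours = analyse(merged)                                       # step four
```

In a distributed setting each call would run at a different site, which is why the steps need to be described in a document rather than hard-wired in one program.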
Astrogrid Components
VOSpace – Manager and Server
Datacenter – Query tools and hosting
Registry – publishing
Portal – includes Workflow design and submission
JES – the Job Entry System
Application Controller
Tools
Message Log
Community
Workflow Designs
XML based
Designed within the portal environment
Held as files within VOSpace
On job submission, the relevant design is sent to
the Job Entry System
Workflow Example
<?xml version="1.0" encoding="UTF-8"?>
<workflow name="TwoQueriesAndVOTableMerge">
  <sequence>
    <flow>
      <step name="QueryOne">
        <tool name="conesearch2Mass">
          <input>
            <parameter name="query" type="VOSpaceRefADQL" location="vospace://queries/query1" />
          </input>
          <output>
            <parameter name="result" type="VOSpaceRefVOTABLE" location="vospace://votables/votable1" />
          </output>
        </tool>
      </step>
      <step name="QueryTwo"> ... </step>
    </flow>
    <step name="VOTableMerge" joinCondition="true">
      <tool name="DataFederator">
        <input>
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable1" />
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable2" />
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable3" />
          <parameter name="config" type="VOSpaceRefDFCONFIG" location="vospace://datafederator/config1" />
        </input>
        <output>
          <parameter name="out" type="VOSpaceRefVOTABLE" location="vospace://votables/votable4" />
        </output>
      </tool>
    </step>
  </sequence>
</workflow>
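One way to read such a document is to walk the tree and work out which steps may run together. The sketch below assumes that the children of a flow element are independent of each other and that the children of a sequence element run in order; the slides do not spell out these semantics, so treat them as an assumption. The embedded document is an abbreviated copy of the example: the QueryTwo step, whose body is elided on the slide, is left empty, and the merge step is trimmed to one input.

```python
import xml.etree.ElementTree as ET

WORKFLOW = """<workflow name="TwoQueriesAndVOTableMerge">
  <sequence>
    <flow>
      <step name="QueryOne">
        <tool name="conesearch2Mass">
          <input>
            <parameter name="query" type="VOSpaceRefADQL" location="vospace://queries/query1"/>
          </input>
          <output>
            <parameter name="result" type="VOSpaceRefVOTABLE" location="vospace://votables/votable1"/>
          </output>
        </tool>
      </step>
      <step name="QueryTwo"/>
    </flow>
    <step name="VOTableMerge" joinCondition="true">
      <tool name="DataFederator">
        <input>
          <parameter name="in" type="VOSpaceRefVOTABLE" location="vospace://votables/votable1"/>
        </input>
        <output>
          <parameter name="out" type="VOSpaceRefVOTABLE" location="vospace://votables/votable4"/>
        </output>
      </tool>
    </step>
  </sequence>
</workflow>"""

def execution_stages(node):
    """Flatten a workflow tree into stages; steps within one stage may run together."""
    if node.tag == "step":
        return [[node.get("name")]]
    child_stages = [execution_stages(child) for child in node]
    if node.tag == "flow":
        # Assumed semantics: children of <flow> are independent, so line their stages up side by side.
        depth = max((len(s) for s in child_stages), default=0)
        return [[name for s in child_stages if i < len(s) for name in s[i]]
                for i in range(depth)]
    # <workflow> and <sequence>: children run one after another.
    return [stage for s in child_stages for stage in s]

def data_references(root):
    """Yield (step, direction, location) for every parameter in the document."""
    for step in root.iter("step"):
        for direction in ("input", "output"):
            for param in step.findall(f"tool/{direction}/parameter"):
                yield step.get("name"), direction, param.get("location")

root = ET.fromstring(WORKFLOW)
stages = execution_stages(root)
refs = list(data_references(root))
```

Under these assumptions the two queries fall into the first stage and the merge into the second, and every piece of data appears only as a VOSpace reference.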
Job Entry System
Job Example
<?xml version="1.0" encoding="UTF-8"?>
<job name="TwoQueriesAndVOTableMerge">
  <community>
    <token>xxxxxxx</token>
    <credentials>
      <account>jl99@star.le.ac.uk</account>
      <group>xray@star.le.ac.uk</group>
    </credentials>
  </community>
  ...
</job>
JES Components
JobController
  Security check
  Inserts job data into the system
  Keeps an audit trail – the original job document
JobScheduler
  Dispatches one or more job steps
  Interrogates Registry for location information
JobMonitor
  A listener on the ApplicationController
  When job step finishes, decides whether job has finished
The JES Database
  Required for reliability and recoverability
  Updates have transactional semantics
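How the three components might cooperate around a shared database can be sketched as follows. This is a toy model, not the JES implementation: the class name, table layout and status values are invented, SQLite stands in for the JES database, and the `dispatch` callable stands in for the ApplicationController. The Registry lookup is left out.

```python
import sqlite3

class Jes:
    """Toy stand-in for JobController, JobScheduler and JobMonitor sharing one database."""

    def __init__(self, dispatch):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE job (name TEXT PRIMARY KEY, document TEXT, status TEXT)")
        self.db.execute("CREATE TABLE step (job TEXT, stage INTEGER, name TEXT, status TEXT)")
        self.dispatch = dispatch  # hypothetical hook standing in for the ApplicationController

    def submit(self, name, document, stages, token):
        """JobController: security check, insert job data, keep the original document."""
        if not token:
            raise PermissionError("missing community token")
        with self.db:  # one transaction: the job and all of its steps, or nothing
            self.db.execute("INSERT INTO job VALUES (?, ?, 'RUNNING')", (name, document))
            self.db.executemany(
                "INSERT INTO step VALUES (?, ?, ?, 'PENDING')",
                [(name, i, s) for i, stage in enumerate(stages) for s in stage])
        self.schedule(name)

    def schedule(self, job):
        """JobScheduler: dispatch every pending step of the earliest unfinished stage."""
        stage = self.db.execute(
            "SELECT MIN(stage) FROM step WHERE job = ? AND status != 'DONE'",
            (job,)).fetchone()[0]
        if stage is None:
            return
        pending = self.db.execute(
            "SELECT name FROM step WHERE job = ? AND stage = ? AND status = 'PENDING' "
            "ORDER BY rowid", (job, stage)).fetchall()
        for (step,) in pending:
            with self.db:
                self.db.execute(
                    "UPDATE step SET status = 'DISPATCHED' WHERE job = ? AND name = ?",
                    (job, step))
            self.dispatch(job, step)

    def step_finished(self, job, step):
        """JobMonitor: record completion, then decide whether the job has finished."""
        with self.db:
            self.db.execute(
                "UPDATE step SET status = 'DONE' WHERE job = ? AND name = ?", (job, step))
            left = self.db.execute(
                "SELECT COUNT(*) FROM step WHERE job = ? AND status != 'DONE'",
                (job,)).fetchone()[0]
            if left == 0:
                self.db.execute("UPDATE job SET status = 'COMPLETED' WHERE name = ?", (job,))
        if left:
            self.schedule(job)

dispatched = []
jes = Jes(dispatch=lambda job, step: dispatched.append(step))
jes.submit("TwoQueriesAndVOTableMerge", "<job>...</job>",
           [["QueryOne", "QueryTwo"], ["VOTableMerge"]], token="xxxxxxx")
first_wave = list(dispatched)
for finished in ("QueryOne", "QueryTwo", "VOTableMerge"):
    jes.step_finished("TwoQueriesAndVOTableMerge", finished)
status = jes.db.execute("SELECT status FROM job").fetchone()[0]
```

Keeping every state change inside a transaction is what lets a restarted service pick up where it left off: a step is either recorded as dispatched or it is still pending.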
JES Cooperators
ApplicationController
  Deals with one job step
  Location dependent on tool type ...
Bottom leaf: Tool instance
  Command line tool colocated with ApplicationController
  Webservice based tool can be remote
Interesting Points (one hopes)
Workflow execution is managed centrally by JES
A workflow design is both workflow descriptor and
input message.
Each step represents a tool invocation - “where” can
be decided at run-time.
Resources will often be passed by reference rather than in-stream.
Referenced resources need to be managed
To minimize data movement, data and tools should
be co-located
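Deciding "where" at run-time with co-location in mind might look like the sketch below. The registry content, the endpoint URLs and the idea that references have already been resolved to physical URLs are all assumptions made for illustration; the real Registry interface is not shown in these slides.

```python
from urllib.parse import urlparse

# Hypothetical registry content: tool name -> endpoints offering that tool.
REGISTRY = {
    "DataFederator": [
        "http://datacentre-a.example.org/applications",
        "http://datacentre-b.example.org/applications",
    ],
}

def choose_endpoint(tool, input_locations, registry=REGISTRY):
    """Pick the endpoint whose host holds the most inputs, so the least data has to move."""
    input_hosts = [urlparse(loc).hostname for loc in input_locations]

    def colocated_inputs(endpoint):
        return input_hosts.count(urlparse(endpoint).hostname)

    # max() keeps the first endpoint on a tie, so registry order acts as the tie-break.
    return max(registry[tool], key=colocated_inputs)

# Hypothetical resolved locations of the step's inputs.
inputs = [
    "http://datacentre-a.example.org/votables/votable1",
    "http://datacentre-b.example.org/votables/votable2",
    "http://datacentre-b.example.org/votables/votable3",
]
endpoint = choose_endpoint("DataFederator", inputs)
```

Counting inputs per host is the crudest possible cost model; weighting by file size would be a natural refinement.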
Astrogrid Home Page
www.astrogrid.org