"Workflow" in Data Access and Integration
An OGSA-DAI/DAIS Perspective
Mario Antonioletti, EPCC
mario@epcc.ed.ac.uk
e-Science Workflow Services - www.ogsadai.org.uk

Talk Overview
  Background: OGSA-DAI and DAIS
  Motivation and Definitions
  Hierarchies of Service Coordination
  Conclusions

OGSA-DAI and DAIS
  GGF DAIS WG
    Database Access and Integration Services
    Attempting to standardise interfaces based on OGSI
  OGSA-DAI
    Aim to provide an implementation of DAIS
    Serve UK e-Science Community
  OGSA-DAI and DAIS
    Currently not aligned
    Data service interface in OGSA-DAI is coarse grained
      Based on an earlier version of DAIS
    Data service interface in DAIS is currently fine grained
      Scope for more coarse grained interfaces
    OGSA-DAI will realign with DAIS once the latter stabilizes

OGSA-DAI Project Partners
  Powered by …

Simple Data Service Scenario
  [Diagram: Client, Data Service, Data Resources]
  1. Provides access to a data resource.
  2. May provide integration of several data resources.

Some Definitions
  Data Resource
    An object that can source/sink data
    Currently databases in scope
    Files and file systems may come in scope
  Data Services
    Grid services
    Provide a common interface to data resources
    Expose some capabilities of a data resource
      SQL Queries, XPath, BinX, …
    Can also provide additional capabilities
      Transformations, third party data delivery, etc …

Motivation
  Want common interfaces for:
    Data access
    Data integration
  As requests to a data service may produce lots of data
    Want to minimise data movement
    Hence encapsulate interactions with service
      Serialise multiple interactions into one interaction
      Abstract each interaction into an "activity"
      Data flows between activities
    Use a document mechanism to describe this
  DAIS and OGSA-DAI
    Concerned with data flow
    Currently do not have control constructs
      No looping, conditionals, splits, joins, …

Service Coordination Patterns
  [Diagram: Client, Data Services]
  1. Coordination of activities performed at one Data Service.
  2. Client choreographs a set of services to work together
     … or a service may orchestrate on behalf of the client.
  3. Orchestration of services using a document directed to one service.
  4. Possibly interface with standard workflow languages, e.g. BPEL4WS, WSCI, …
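
The Motivation points above (several interactions serialised into one document, each abstracted as an activity, with data flowing between them) can be sketched in a few lines of Python. This is a minimal illustration, not OGSA-DAI code: the element names are borrowed from the perform-document examples later in this talk, the namespace declaration is left out for brevity, the helper names `build_perform` and `dangling_inputs` are hypothetical, and the rule that a stream must be produced before it is consumed is an assumption made for the example (the talk itself notes that the semantics are not specified).

```python
import xml.etree.ElementTree as ET


def build_perform(query, target_url, stream="output1"):
    """Build one document holding two activities linked by a named stream.

    Hypothetical helper: element names follow the talk's examples, but the
    real perform document also declares a namespace and schema location.
    """
    root = ET.Element("gridDataServicePerform")

    # Activity 1: run the query and publish its result under a stream name.
    statement = ET.SubElement(root, "sqlQueryStatement", name="statement")
    ET.SubElement(statement, "expression").text = query
    ET.SubElement(statement, "resultSetStream", name=stream)

    # Activity 2: consume that stream and deliver it to a URL.
    deliver = ET.SubElement(root, "deliverToURL", name="deliverOutput")
    ET.SubElement(deliver, "fromLocal", {"from": stream})
    ET.SubElement(deliver, "toURL").text = target_url
    return root


def dangling_inputs(root):
    """Return stream names that are consumed but not produced earlier.

    Assumption for illustration only: activities are read in document order
    and a stream must be produced before an activity consumes it.
    """
    produced = set()
    missing = []
    for activity in root:
        for child in activity:
            if child.tag == "fromLocal" and child.get("from") not in produced:
                missing.append(child.get("from"))
        for child in activity:
            if child.tag == "resultSetStream":
                produced.add(child.get("name"))
    return missing


if __name__ == "__main__":
    doc = build_perform(
        "select * from myTable where id=10",
        "ftp://anon:frog@ftp.example.com/home",
    )
    print(ET.tostring(doc, encoding="unicode"))
    print("dangling inputs:", dangling_inputs(doc))
```

Only sequencing and data flow are expressed here; there are no loops, conditionals, splits or joins, which matches the scope described on the Motivation slide.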

Coordination Hierarchies
  Service coordination may take place:
    Intra service
    Inter services – application driven
      Choreographed/orchestrated by a client or service
    Inter service – document driven
      Document based orchestration
      Ideally would look the same as the intra service document based interface
    Combined with other workflow languages

Intra Service Processing
  Service processing described by a document
  Possible activities (OGSA-DAI perspective):
    Statement
      SQL Query, XPath Query
    Delivery
      Input data from third party
      Output data to a third party
      Deliver data in the response
    Transformations
      XSL Transformations, compression
  OGSA-DAI has produced a framework for this

Simple Example: no data flow
  [Diagram: sqlQueryStatement, DeliverToURL (not connected)]

  <sqlQueryStatement name="statement">
    <expression>
      select * from myTable where id=10
    </expression>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <toURL>
      ftp://anon:frog@ftp.example.com/home
    </toURL>
  </deliverToURL>

Simple Example: with data flow
  [Diagram: sqlQueryStatement feeding DeliverToURL]

  <sqlQueryStatement name="statement">
    <expression>
      select * from myTable where id=10
    </expression>
    <resultSetStream name="output1"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output1"/>
    <toURL>
      ftp://anon:frog@ftp.example.com/home
    </toURL>
  </deliverToURL>

The Perform Document

  <?xml version="1.0" encoding="UTF-8"?>
  <gridDataServicePerform
      xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types
        ../../../../schema/ogsadai/xsd/activities/activities.xsd">
    <documentation>
      This example performs a simple select statement to retrieve one row from
      the test database. The results are delivered to the given URL.
    </documentation>
    <sqlQueryStatement name="statement">
      <expression>
        select * from littleblackbook where id=10
      </expression>
      <resultSetStream name="output"/>
    </sqlQueryStatement>
    <deliverToURL name="deliverOutput">
      <fromLocal from="output"/>
      <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
    </deliverToURL>
  </gridDataServicePerform>

Predefined Building Blocks
  sqlQueryStatement  sqlUpdateStatement  sqlStoredProcedure  sqlBulkLoadRowset
  relationalResourceManager
  xPathStatement  xQueryStatement  xUpdateStatement
  xmlCollectionManagement  xmlResourceManagement
  DeliverFromURL  DeliverToURL  DeliverFromGFTP  DeliverToGFTP
  DeliverFromGDT  DeliverToGDT  DeliverToStream
  inputStream  outputStream
  xslTransform  zipArchive  gzipCompression

Activities: positives
  Simple sequence pattern
    Data-flow
  Avoid multiple message exchanges
    Minimise data movement
  Extensible
    XML Schema excerpt gives syntax
    Associate an implementation with activity
    Done at configuration
  Allows optimisation
    Enactment engine can optimise interaction

Activities: negatives
  Incomplete syntax
  Activities not exposed at the interface level
  Semantics are not specified
  Activity inputs and outputs are not typed
    No typing of data streams
    Possible issue in coming up with a sensible document
  Activity implementation & XML schema loosely coupled
    Keeping activity and implementation in sync
  Puts workload on the server
    Workloads on the server may need to be managed
  This may change in line with DAIS
    Perform document factored out from DAIS base specs
    Standardisation to become a DAIS informational document
    Scope may be bigger than DAIS

Inter Service Application Defined "Workflow"
  Services stitched together by an application
    Could be a client
    Could be another service
  Use the OGSA-DAI GridDataTransport (GDT) portType
  Distributed Query Processing (DQP)
    Service configured separately
    Each performs its part in the workflow

Client Driven Scenario (aka poor man's data integration)
  [Diagram: the Client drives two Data Services; data moves between them over GDT]
  Client creates Data Services.
  Sent to the first Data Service:
    <sqlQueryStatement> … </sqlQueryStatement>
    <deliverToGDT … />
  Sent to the second Data Service:
    <inputStream … />
    <sqlUpdateStatement> … </sqlUpdateStatement>

Service Driven Scenario
  Distributed Query Processing
  [Diagram: Client, one GDQS, several GQES]
  GDQS: query planning, compilation, scheduling, evaluation, partitioning
  GQES: evaluate sub-queries

More Complex DQP Scenario
  [Figure: a distributed query plan spread over nodes N0 to N4. The Client sends perform(Query) to the GDQS; the GDQS calls createService on GQES factories and sends perform(QuerySubplan) to each GQES. Subplan operators shown: sequential_scan, sequential_scan (term=8372), reduce (proteinID), reduce (proteinID, sequence), reduce (p.proteinID, blast), hash_join (p.proteinID=t.proteinID), and operation_call blast(p.sequence) against a BLAST Web Service. Results flow between GDS instances over GDT and back to the Client.]

Application Driven "Workflow"
  Labour intensive
  Client driven (service choreography)
  Service driven (service orchestration)
    DQP hides details
    There may be other examples …
    Need to explore this space further
  Restricted to small numbers of services
  Need tooling
    Even then this is best done through other means
  Can probably accommodate these patterns in an existing workflow language
  For more general data integration need:
    Describe more sophisticated behaviour

Inter Service Document Coordination
  Currently evolving
  Document describes:
    Sequence of operations that may span multiple services
  Single document includes enough information to:
    Run an expression on a source data service
    Deliver the results to a target data service
    Run an expression on the target data service
  Informational document to be presented at GGF10

A Dataset Example
  [Diagram: Client, two Data Services]
  Request (DataRequest.xsd):
    <dataRequest> … </dataRequest>
  RemoteRequiredTable (DataAccessRecipe.xsd):
    <dar>
      <gsh> … </gsh>
      <type> … </type>
      <dataSet> … </dataSet>
    </dar>

Document Driven "Workflow"
  Work in this area is tentative
    No implementations as yet
  Shows versatility
    Carries over some of the OGSA-DAI activity framework
    Focused on data
  OGSA-DAI needs to see how it matures
  Can track provenance in the dataSet
  Needs to be positioned against general workflow languages

Traditional Workflow
  OGSA-DAI has not explored this space … yet
  Traditionally workflow:
    Revolves around the execution of atomic activities
    Uses a processing model, e.g. WfMC based
    Akin to how people talk about service orchestration
  Want to use existing frameworks as far as possible
    May need such a framework to facilitate data integration
    OGSA-DAI does not want to define its own workflow
    DAIS may come up with something
  Clearly:
    Activity model can be used to implement a workflow
  Collecting use cases

Workflow Issues
  OGSA-DAI needs to play to see what works
  Standards still evolving
  IP rights:
    BPEL4WS: Royalty-free … ?
    WSCI: Royalty-free
  Need workflow engines
  Tooling to construct workflow
    Ptolemy II … Triana … ?

Summary & Conclusions
  Base standards in a state of flux
    DAIS not settled down yet
  Need to examine use cases
  Positioning of OGSA-DAI
    Successful for data access
    Shied away from real workflow
    Should try to use emerging standards if possible
  Data integration will require workflow patterns
  Document based interface needs to be re-worked
  OGSA-DAI implemented simple "workflow" patterns
  If you don't like what you see, get involved and change it
  Want it to be the leaves of your complex workflow graphs
    Wrap your data sources and sinks
  Try OGSA-DAI and give feedback!

Further information
  The OGSA-DAI Project Site: http://www.ogsadai.org.uk
  The DAIS-WG site: http://cs.man.ac.uk/grid-db
  OGSA-DAI Users Mailing list: users@ogsadai.org.uk
    General discussion on grid DAI matters
  Formal support for OGSA-DAI releases
    http://www.ogsadai.org.uk/support
    support@ogsadai.org.uk
  OGSA-DAI training courses