Workflow Access and Integration Mario Antonioletti EPCC

“Workflow” in Data
Access and Integration
An OGSA-DAI/DAIS Perspective
Mario Antonioletti
EPCC
mario@epcc.ed.ac.uk
Talk Overview

Background: OGSA-DAI and DAIS

Motivation and Definitions

Hierarchies of Service Coordination

Conclusions
e-Science Workflow Services - www.ogsadai.org.uk
OGSA-DAI and DAIS

GGF DAIS WG
  Database Access and Integration Services
  Attempting to standardise interfaces based on OGSI

OGSA-DAI
  Aims to provide an implementation of DAIS
  Serves the UK e-Science community

OGSA-DAI and DAIS
  Currently not aligned
    Data service interface in OGSA-DAI is coarse grained
      Based on an earlier version of DAIS
    Data service interface in DAIS is currently fine grained
      Scope for more coarse grained interfaces
  OGSA-DAI will realign with DAIS once the latter stabilises
OGSA-DAI Project Partners
Powered by ….
Simple Data Service Scenario
[Figure: a Client connected to a Data Service, which sits in front of several Data Resources]
A Data Service:
1. Provides access to a data resource.
2. May provide integration of several data resources.
Some Definitions

Data Resource
  An object that can source/sink data
  Currently databases in scope
    Files and file systems may come in scope

Data Services
  Grid services
  Provide a common interface to data resources
  Expose some capabilities of a data resource
    SQL Queries, XPath, BinX, …
  Can also provide additional capabilities
    Transformations, third party data delivery, etc. …
Motivation

Want common interfaces for:
  Data access
  Data integration

As requests to a data service may produce lots of data
  Want to minimise data movement

Hence encapsulate interactions with service
  Serialise multiple interactions into one interaction
  Abstract each interaction into an “activity”
  Data flows between activities
  Use a document mechanism to describe this

DAIS and OGSA-DAI
  Concerned with data flow
  Currently do not have control constructs
    No looping, conditionals, splits, joins, …
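
The idea of folding several interactions into one request, with data flowing from activity to activity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `query_activity`, `upper_case_names` and `perform` are hypothetical names, the rows are made up, and nothing here is OGSA-DAI code. Note that the chain is strictly linear, matching the absence of control constructs.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict
Activity = Callable[[Iterable[Row]], Iterator[Row]]


def query_activity(_: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for a statement activity: produces rows (hard-coded here)."""
    yield {"id": 10, "name": "Ada"}
    yield {"id": 11, "name": "Grace"}


def upper_case_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for a transformation activity applied to the stream."""
    for row in rows:
        yield {**row, "name": row["name"].upper()}


def perform(activities: List[Activity]) -> List[Row]:
    """Run every activity inside one request; only the final result leaves."""
    stream: Iterable[Row] = iter(())
    for activity in activities:
        # The output stream of one activity is the input of the next.
        stream = activity(stream)
    return list(stream)


result = perform([query_activity, upper_case_names])
```

The client makes a single call and intermediate rows never cross the network, which is the data-movement saving the slide is after.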
Service Coordination Patterns
[Figure: a Client, Data Services and further Services illustrating the patterns below]
1. Coordination of activities performed at one Data Service.
2. Client choreographs a set of services to work together,
   … or a service may orchestrate on behalf of the client.
3. Orchestration of services using a document directed to one service.
4. Possibly interface with standard workflow languages, e.g. BPEL4WS, WSCI, …
Coordination Hierarchies

Service coordination may take place:
  Intra service
    Document based
  Inter service – application driven
    Choreographed/orchestrated by a client or service
  Inter service – document driven
    Orchestration
    Ideally would look the same as the intra service document based interface
  Combined with other workflow languages
Intra Service Processing

Service processing described by a document

Possible activities (OGSA-DAI perspective):
  Statement
    SQL Query, XPath Query
  Delivery
    Input data from a third party
    Output data to a third party
    Deliver data in the response
  Transformations
    XSL Transformations, compression

OGSA-DAI has produced a framework for this
Simple Example: no data flow
[Figure: two unconnected activities, sqlQueryStatement and deliverToURL]
<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
</sqlQueryStatement>
<deliverToURL name="deliverOutput">
  <toURL>
    ftp://anon:frog@ftp.example.com/home
  </toURL>
</deliverToURL>
Simple Example: with data flow
[Figure: the output of sqlQueryStatement feeds deliverToURL]
<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
  <resultSetStream name="output1"/>
</sqlQueryStatement>
<deliverToURL name="deliverOutput">
  <fromLocal from="output1"/>
  <toURL>
    ftp://anon:frog@ftp.example.com/home
  </toURL>
</deliverToURL>
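
To make the wiring explicit, here is a minimal Python sketch that recovers the data-flow edges from such a fragment. It assumes a simplified reading of the example: an activity declares an output stream through a child element with a `name` attribute and consumes one through a child with a `from` attribute. The `<perform>` wrapper element and the `data_flow_edges` helper are hypothetical, introduced only so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a hypothetical <perform> root element.
FRAGMENT = """
<perform>
  <sqlQueryStatement name="statement">
    <expression>select * from myTable where id=10</expression>
    <resultSetStream name="output1"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output1"/>
    <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
  </deliverToURL>
</perform>
"""


def data_flow_edges(xml_text):
    """Return (producer activity, stream name, consumer activity) triples."""
    root = ET.fromstring(xml_text)
    producers = {}  # stream name -> name of the activity that produces it
    consumers = []  # (stream name, name of the activity that consumes it)
    for activity in root:
        for child in activity:
            if "from" in child.attrib:
                consumers.append((child.attrib["from"], activity.get("name")))
            elif "name" in child.attrib:
                producers[child.attrib["name"]] = activity.get("name")
    return [(producers[s], s, c) for s, c in consumers if s in producers]


edges = data_flow_edges(FRAGMENT)
```

Run against the earlier fragment without `resultSetStream` and `fromLocal`, the same helper returns an empty list, which is exactly the "no data flow" case.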
The Perform Document
<?xml version="1.0" encoding="UTF-8"?>
<gridDataServicePerform
    xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types
        ../../../../schema/ogsadai/xsd/activities/activities.xsd">
  <documentation>
    This example performs a simple select statement to retrieve
    one row from the test database. The results are delivered
    to the URL given in the deliverToURL activity.
  </documentation>
  <sqlQueryStatement name="statement">
    <expression>
      select * from littleblackbook where id=10
    </expression>
    <resultSetStream name="output"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
  </deliverToURL>
</gridDataServicePerform>
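
A client would normally generate such a document rather than type it by hand. The sketch below builds the same structure with Python's standard `xml.etree.ElementTree`; the `build_perform` helper and its parameters are hypothetical and not part of any OGSA-DAI client toolkit. Only the namespace URI and element names are taken from the example above.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the perform document on the slide.
GDS_NS = "http://ogsadai.org.uk/namespaces/2003/07/gds/types"


def build_perform(query, stream_name, target_url):
    """Build a perform document that runs a query and delivers it to a URL."""
    ET.register_namespace("", GDS_NS)  # serialise GDS_NS as the default namespace

    def tag(local_name):
        return f"{{{GDS_NS}}}{local_name}"

    root = ET.Element(tag("gridDataServicePerform"))
    statement = ET.SubElement(root, tag("sqlQueryStatement"), name="statement")
    ET.SubElement(statement, tag("expression")).text = query
    ET.SubElement(statement, tag("resultSetStream"), name=stream_name)

    delivery = ET.SubElement(root, tag("deliverToURL"), name="deliverOutput")
    # "from" is a Python keyword, so the attribute is passed as a dict.
    ET.SubElement(delivery, tag("fromLocal"), {"from": stream_name})
    ET.SubElement(delivery, tag("toURL")).text = target_url
    return ET.tostring(root, encoding="unicode")


doc = build_perform(
    "select * from littleblackbook where id=10",
    "output",
    "ftp://anon:frog@ftp.example.com/home",
)
```

Passing the stream name once and using it for both `resultSetStream` and `fromLocal` avoids the mismatch that an untyped, hand-written document makes easy.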
Predefined Building Blocks
DeliverFromGDT
xmlCollectionManagement
relationalResourceManager
xmlResourceManagement
sqlBulkLoadRowset
sqlUpdateStatement
sqlStoredProcedure
sqlQueryStatement
xQueryStatement
xUpdateStatement
xPathStatement
DeliverToGDT
DeliverToStream
outputStream
DeliverFromGFTP
DeliverToGFTP
DeliverToURL
DeliverFromURL
inputStream
xslTransform
zipArchive
gzipCompression
Activities: positives

Simple sequence pattern
Data-flow
Avoid multiple message exchanges
Minimise data movement
Extensible
  XML Schema excerpt gives syntax
  Associate an implementation with each activity
    Done at configuration
Allows optimisation
  Enactment engine can optimise interaction
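
The extensibility point, associating an implementation with an activity at configuration time, can be pictured as a lookup table from activity element name to code. The following Python sketch is an assumption-laden illustration: `ActivityRegistry` is a hypothetical class, not the OGSA-DAI framework, and gzip compression from the standard library merely stands in for a real activity implementation.

```python
import gzip
from typing import Callable, Dict

ActivityImpl = Callable[[bytes], bytes]


class ActivityRegistry:
    """Maps activity element names to implementations, filled at configuration."""

    def __init__(self) -> None:
        self._implementations: Dict[str, ActivityImpl] = {}

    def register(self, element_name: str, implementation: ActivityImpl) -> None:
        self._implementations[element_name] = implementation

    def run(self, element_name: str, data: bytes) -> bytes:
        try:
            implementation = self._implementations[element_name]
        except KeyError:
            raise ValueError(
                f"no implementation configured for activity {element_name!r}"
            ) from None
        return implementation(data)


# "Configuration": bind the gzipCompression element name to an implementation.
registry = ActivityRegistry()
registry.register("gzipCompression", gzip.compress)

compressed = registry.run("gzipCompression", b"row1\nrow2\n")
```

New activities are added by registering another name, without touching the code that interprets the document.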
Activities: negatives

Incomplete syntax




Activity implementation & XML schema loosely coupled



Workloads on the server may need to be managed
Activities not exposed at the interface level


Keeping activity and implementation in synch
Semantics are not specified
Puts workload on the server


Activity inputs and outputs are not typed
No typing of data streams
Possible issue in coming up with a sensible document
This may change in line with DAIS
Perform document factored out from DAIS base specs


Standardisation to become a DAIS informational document
Scope may be bigger than DAIS
Inter Service Application Defined "Workflow"

Services stitched together by an application
  Could be a client
    Use the OGSA-DAI GridDataTransport (GDT) portType
  Could be another service
    Distributed Query Processing (DQP)

Services configured separately
  Each performs its part in the workflow
Client Driven Scenario
(aka poor man's data integration)
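Client creates Data Services.
[Figure: the Client sends a perform document to each of two Data Services; data moves between the two services over GDT]
Perform document sent to the first Data Service:
<sqlQueryStatement>
  …
</sqlQueryStatement>
<deliverToGDT … />
Perform document sent to the second Data Service:
<inputStream … />
<sqlUpdateStatement>
  …
</sqlUpdateStatement>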
Service Driven Scenario
Distributed Query Processing
[Figure: a Client talks to a GDQS, which coordinates three GQES instances]
GDQS: query planning, compilation, scheduling, evaluation, partitioning
GQES: evaluate sub-queries
More Complex DQP Scenario
[Figure: the Client sends perform(Query) to the GDQS. The GDQS uses GQES factories (createService) to create GQES instances on nodes N0 to N4 and sends each one a perform(QuerySubplan). The GQES instances read from GDS instances, exchange data over GDT, call a BLAST Web Service, and results flow back to the Client.]
Query plan operators shown in the figure:
  sequential_scan (term=8372)
  reduce (proteinID)
  sequential_scan
  reduce (proteinID, sequence)
  hash_join (p.proteinID=t.proteinID)
  operation_call blast(p.sequence)
  reduce (p.proteinID, blast)
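
The relational operators named in the figure (sequential_scan, reduce, hash_join) can be illustrated with a small in-memory Python sketch. It makes several assumptions: "reduce" is read as a projection onto the listed columns, the tables `protein` and `protein_term` and their rows are invented, and everything runs in one process rather than across GQES instances. The BLAST operation_call is left out.

```python
from collections import defaultdict


def sequential_scan(table, predicate=lambda row: True):
    """Read every row of a table, keeping those that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def reduce_columns(rows, columns):
    """'reduce' in the plan, read here as a projection onto the named columns."""
    return [{column: row[column] for column in columns} for row in rows]


def hash_join(left, right, key):
    """Equi-join: build a hash table on the left input, probe with the right."""
    buckets = defaultdict(list)
    for row in left:  # build phase
        buckets[row[key]].append(row)
    joined = []
    for row in right:  # probe phase
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Invented sample data, for illustration only.
protein = [
    {"proteinID": "P1", "sequence": "MKV", "length": 3},
    {"proteinID": "P2", "sequence": "MAL", "length": 3},
]
protein_term = [
    {"proteinID": "P1", "term": 8372},
    {"proteinID": "P2", "term": 1},
    {"proteinID": "P3", "term": 8372},
]

p = reduce_columns(sequential_scan(protein), ["proteinID", "sequence"])
t = reduce_columns(
    sequential_scan(protein_term, lambda row: row["term"] == 8372), ["proteinID"]
)
result = hash_join(p, t, "proteinID")
```

In the distributed setting each of these steps could run on a different evaluator, with the intermediate lists replaced by streams between services.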
Application Driven "Workflow"

Labour intensive
Client driven (service choreography)
  Restricted to small numbers of services
    Need tooling
    Even then this is best done through other means
Service driven (service orchestration)
  DQP hides details
  There may be other examples …
Need to explore this space further
  Can probably accommodate these patterns in an existing workflow language
For more general data integration need:
  Describe more sophisticated behaviour
Inter Service Document Coordination

Currently evolving

Document describes:
  Sequence of operations that may span multiple services
  Single document includes enough information to:
    Run an expression on a source data service
    Deliver the results to a target data service
    Run an expression on the target data service

Informational document to be presented at GGF10
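
The three steps a single document has to carry (run an expression on the source, deliver the results, run an expression on the target) can be mocked up with two in-memory SQLite databases standing in for data services. This is a sketch under stated assumptions: the dictionary layout of `document`, the service names and the `enact` function are all hypothetical, and the real document format was still being defined.

```python
import sqlite3

# Two in-memory databases stand in for the source and target data services.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO littleblackbook VALUES (?, ?)", [(10, "Ada"), (11, "Grace")]
)
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE contacts (id INTEGER, name TEXT)")

services = {"source-service": source, "target-service": target}

# Hypothetical single document spanning both services.
document = {
    "source": "source-service",
    "source_expression": "SELECT id, name FROM littleblackbook WHERE id = 10",
    "target": "target-service",
    "target_expression": "INSERT INTO contacts VALUES (?, ?)",
}


def enact(document, services):
    """Carry out the three steps described by the document."""
    source_db = services[document["source"]]
    target_db = services[document["target"]]
    # 1. Run an expression on the source data service.
    rows = source_db.execute(document["source_expression"]).fetchall()
    # 2. Deliver the results to the target (here simply passed in memory).
    # 3. Run an expression on the target data service.
    target_db.executemany(document["target_expression"], rows)
    target_db.commit()
    return len(rows)


moved = enact(document, services)
copied = target.execute("SELECT id, name FROM contacts").fetchall()
```

The client only hands over the document; which party actually enacts it is the orchestration question the slide leaves open.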
A Dataset Example
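[Figure: a Client and two Data Services exchanging the documents below; one element of the figure is labelled RemoteRequiredTable]
Request (DataRequest.xsd):
<dataRequest>
  …
</dataRequest>
Data access recipe (DataAccessRecipe.xsd):
<dar>
  <gsh> … </gsh>
  <type> …</type>
  <dataSet>
    …
  </dataSet>
</dar>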
Document Driven "Workflow"

Work in this area is tentative
No implementations as yet
  OGSA-DAI needs to see how it matures
Shows versatility
  Carries over some of the OGSA-DAI activity framework
Focused on data
  Can track provenance in the dataSet
Needs to be positioned against general workflow languages
Traditional Workflow

OGSA-DAI has not explored this space … yet


Traditionally workflow:




Revolves around the execution of atomic activities
Use a processing model, e.g. WfMC based
Akin to how people talk about service orchestration
Want to use existing frameworks as far as possible



May need such a framework to facilitate data integration
OGSA-DAI does not want to define its own workflow
DAIS may come up with something
Clearly:


Activity model can be used to implement a workflow
Collecting use cases
Workflow Issues


OGSA-DAI needs to play to see what works
Standards still evolving
IP rights:
  BPEL4WS
    Royalty-free … ?
  WSCI
    Royalty-free
Need workflow engines
Tooling to construct workflow
  Ptolemy II … Triana … ?
Summary & Conclusions

Base standards in a state of flux

DAIS not settled down yet





Need to examine use cases
Positioning of OGSA-DAI



Successful for data access
Shied away from real workflow
Should try to use emerging standards if possible
Data integration will require workflow patterns


Document based interface needs to be re-worked
OGSA-DAI implemented simple "workflow" patterns


If you don't like what you see get involved and change it
Want it to be the leaves of your complex workflow graphs
Wrap your data sources and sinks
Try OGSA-DAI and give feedback!
Further information

The OGSA-DAI Project Site:
  http://www.ogsadai.org.uk

The DAIS-WG site:
  http://cs.man.ac.uk/grid-db

OGSA-DAI Users Mailing list
  users@ogsadai.org.uk
  General discussion on grid DAI matters

Formal support for OGSA-DAI releases
  http://www.ogsadai.org.uk/support
  support@ogsadai.org.uk

OGSA-DAI training courses