Optimising the OGSA-DAI Enactment Model
Konstantinos Karasavvas
Research Associate
NeSC, University of Edinburgh
eSI - WODE, 19 Oct. 06

Overview
● Background: OGSA-DAI
    Workflow distinctive features
● Enactment Model optimisation
● Reconstruction
    Scenarios
    And future work
● Summary

OGSA-DAI in a nutshell
● An extensible framework for data access and integration
● Expose heterogeneous data resources to a grid through web services
● Interact with data resources
    Queries and updates
    Data transformation / compression
    Data delivery
    Application specific functionality
● A base for higher-level services
    Federation, data mining, visualisation
● Reduce development cost of data centric grid applications by providing useful functionality readily packaged

OGSA-DAI Request/Response
[Diagram: Request, OGSA-DAI Data Service (Activity A, Activity B, Activity C), DB, Response, 3rd Party]

Example OGSA-DAI Request
[Diagram: data flow between activities]
sqlQueryStatement → ResultSet → sqlResultsToXML → WebRowSet → xslTransform → HTML → deliverToURL
deliverFromURL → XSL → xslTransform
    • sqlQueryStatement: SELECT * FROM Bands WHERE name = 'Bangles';
    • deliverFromURL: http://www.someplace.org/stylesheets/webRowSetToHTML.xsl
    • deliverToURL: ftp://www.musicplace.org/bands/Bangles.html

Workflow Distinctive Features (1)
● Keep everything local
    We have homogeneity
    Avoid data movement
        • Call with whole result
        • Separate calls
    E.g. we can pass object references
● Activities are configured at service deployment
    Cannot use any “service” you want in your workflow
[Diagram: sqlQuery → RS → transform, compared with sqlQuery Service → SOAP/XML → transform Service]

Workflow Distinctive Features (2)
● Simple, efficient workflow language
    Sequence, flow
    Fits our data processing needs
    Other constructs can be used at the activity level (e.g. exclusive choice, a.k.a. if-split)
[Diagram: A, if-split begin, if-split end; x: pipe closes]
[Diagram, continued: B; y: pipe closes]

Workflow Distinctive Features (3)
● Streaming model
    Large quantities of data
    Parallel processing (pipelining)
    Implicit iteration via streaming

Enactment Model
● A.k.a. Activity Framework
● Core component of OGSA-DAI
● It is responsible for performing tasks (activities) and streaming data
[Diagram: DB → Query → block → Pipe → block → Delivery]
    • Query: produces data in blocks
    • Pipe: stores and provides access to data blocks
    • Delivery: consumes data blocks

Activity Processing (Current Release)
● Processing of blocks (and therefore activities) is controlled by the pipe, from outside the activity
    processBlock() is called many times until processing is complete
    Usually consumes and produces a single block per call

Activity Processing (New)
● Processing of blocks will be controlled by the activity
    process() is called exactly once
    Consumes and produces blocks as necessary
    Each activity in a pipeline processes within its own thread
    Pipes receive and may buffer blocks until they are requested

Current Model
[Diagram: Request Processor, Activity A processBlock(), Pipe, Activity B processBlock(); calls: processBlock (called repeatedly), getBlock; a block is returned from each call]

New Model
[Diagram: Processing Service, Activity A process(), Pipe, Activity B process(); calls: initialise, process (called once), putBlock, getBlock; block returned as result]

Improvements to the Enactment Model
● Each activity is run in its own thread
    Rather than one thread per chain
● Each activity is called to process only once
    Rather than once per block
● Each activity sends its result to the pipe when ready
    Rather than waiting to be called
● The pipe now buffers
    Rather than delegating the request for a new block
● Performance improvements
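The new model described above (one thread per activity, a single process call per activity, and a buffering pipe) can be illustrated with a minimal Python sketch. This is not OGSA-DAI code: the names `Pipe`, `put_block`, `blocks`, `query`, `transform`, `deliver` and `run_pipeline`, the end-of-stream marker, and the buffer capacity are all assumptions chosen to mirror the putBlock/getBlock vocabulary of the slides.

```python
import queue
import threading

_END = object()  # hypothetical end-of-stream marker, not an OGSA-DAI name


class Pipe:
    """Buffers blocks between two activities until they are requested."""

    def __init__(self, capacity=8):
        # Bounded buffer: a fast producer waits when the pipe is full.
        self._blocks = queue.Queue(maxsize=capacity)

    def put_block(self, block):
        self._blocks.put(block)

    def close(self):
        self._blocks.put(_END)

    def blocks(self):
        """Yield buffered blocks until the producer closes the pipe."""
        while True:
            block = self._blocks.get()
            if block is _END:
                return
            yield block


def query(out_pipe, rows, block_size):
    """Producer activity: called once, pushes every block it produces."""
    for i in range(0, len(rows), block_size):
        out_pipe.put_block(rows[i:i + block_size])
    out_pipe.close()


def transform(in_pipe, out_pipe):
    """Middle activity: called once, consumes and produces as necessary."""
    for block in in_pipe.blocks():
        out_pipe.put_block([row.upper() for row in block])
    out_pipe.close()


def deliver(in_pipe, sink):
    """Consumer activity: called once, drains the pipe into a sink list."""
    for block in in_pipe.blocks():
        sink.extend(block)


def run_pipeline(rows, block_size=2):
    """Run each activity in its own thread and return the delivered rows."""
    p1, p2 = Pipe(), Pipe()
    sink = []
    threads = [
        threading.Thread(target=query, args=(p1, rows, block_size)),
        threading.Thread(target=transform, args=(p1, p2)),
        threading.Thread(target=deliver, args=(p2, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

Calling `run_pipeline(["a", "b", "c"])` streams the rows through the three activities in blocks of two. Because every activity runs concurrently and the pipes buffer, a downstream activity can start on the first block while the upstream one is still producing later blocks, which is the pipelining effect the slides attribute to the new model.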
Mobile Code
● Lets you ‘embed’ functionality in your workflow
    Allows dynamic inclusion of new code
    Extends the workflow functionality without calling a remote service
    Simple but ‘prepared’ code
[Diagram: Activity A, Activity B, Mobile Code]

Workflow Reconstruction
● Be able to identify patterns
    Describe workflow nodes/edges
● Reconstruct part of the workflow graph
    Performance
● Some example scenarios

Scenario (1)
[Diagram: two workflows]
Before: sqlQuery → RS → RSToWebRowSet → WRS → WebRowSetProjection → WRS
After: sqlQuery → RS → resultsetProjection → RS → RSToWebRowSet → WRS

Scenario (2)
[Diagram: one FTP activity (inputs: filename, file, URL) handling A, compared with a Split of A feeding three FTP activities, each with inputs filename, file, URL]

Thoughts on Reconstruction
● Currently recommendations
    Not automated
● How and what do we describe?
    Scenario 1
        • Semantics of projection
        • Activities’ I/O typing
    Scenario 2
        • Can be applied in many scenarios
        • Rules needed
    When are two workflows equivalent?
        • Models of enactment and activities
    How do we estimate/predict the cost of a workflow?
● Future work
    Need to investigate the above

Thoughts on Parallelisation
● Parallelising the execution of OGSA-DAI
    Already exploits thread parallelism
● Multiple nodes/JVMs running activities
● Requires serialisation of objects passing between activities
    When and where do we serialise?
● Future work
    Need to investigate the above

Summary
● Overview of OGSA-DAI workflow
    Motivation
    Distinctions
● Enactment optimisations
    Improved pipeline processing
● Workflow reconstruction and parallelisation

Questions?
www.ogsadai.org.uk
kostas@nesc.ac.uk
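The reconstruction in Scenario (1) can be read as a pattern-based rewrite over a chain of activities. The sketch below is only an illustration of that idea, assuming a workflow is a flat list of activity names; the `RULES` table and the `reconstruct` function are hypothetical, and the rule simply asserts the equivalence that the slides say still needs to be justified through projection semantics and activity I/O typing.

```python
# Hypothetical rewrite rules: (pattern, replacement), both tuples of
# activity names. The single rule moves the projection in front of the
# conversion so that less data is converted to WebRowSet.
RULES = [
    (("RSToWebRowSet", "WebRowSetProjection"),
     ("resultsetProjection", "RSToWebRowSet")),
]


def reconstruct(chain, rules=RULES):
    """Apply the rewrite rules to a chain until no pattern matches."""
    chain = list(chain)  # work on a copy, leave the caller's list intact
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            n = len(pattern)
            for i in range(len(chain) - n + 1):
                if tuple(chain[i:i + n]) == pattern:
                    chain[i:i + n] = replacement
                    changed = True
                    break  # rescan from the start after each rewrite
    return chain
```

Applied to `["sqlQuery", "RSToWebRowSet", "WebRowSetProjection"]`, the function returns the chain with `resultsetProjection` placed before `RSToWebRowSet`. The loop terminates for this rule because each rewrite removes one `WebRowSetProjection`; an arbitrary rule set could cycle, which is one reason the cost and equivalence questions raised in the slides matter before automating such rewrites.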