Patterns & Notations for OGSA-DAI

Malcolm Atkinson
Director, UK National e-Science Centre & e-Science Institute
www.nesc.ac.uk
9th February 2006

Contents of Talk
• Part-complete work
• Paper in progress
Caveat: these are half-baked thoughts, only partially implemented
• Illustrate patterns
  – To stimulate application design ideas
  – To focus architectural and implementation design
  – To pose questions about future architecture
• Notation to illustrate use of patterns
  – Yet another workflow notation
  – A data flow notation
  – A context for type and semantic discussion

OGSA-DAI Computational context
[Diagram. Labels: Request Documents; OGSA-DAI Data Service; Authenticate & Accept; Parse & Transform; Authorise; Start or Resume Session; Status, Error & Direct Results; Third-party Data Delivery; OGSA-DAI Engine; Session State; RDB; XML; Files; RA; DA; DRA]
[Diagram. Labels: OGSA-DAI Engine; Session State; DAG Enactment Builder; Enactment Controller; Set up; Task 1 to Task 7; A to G; C; RA; DA; DRA; Enactment Container & Activity Framework]

Two Activities connected by a stream

sqlQueryStatement
  <sqlQueryStatement name="query">
    <expression>
      select city, population from UKcensus2000
    </expression>
    <resultstream name="tuples" />
  </sqlQueryStatement>

tuples Stream

sqlResultsToXML
  <sqlResultsToXML name="convert">
    <resultSet from="tuples" />
    <webRowSet name="xmlRows" />
  </sqlResultsToXML>

Textual shorthand for Request Doc.
XML

  { query: sqlQueryStatement( "select city, population from UKcensus2000" ) [ tuples ]
    || convert: sqlResultsToXML() [ tuples xmlRows ] }

Diagrammatic representation
[Diagram. Labels: query: sqlQueryStatement( "select city, population from UKcensus2000" ); tuples; convert: sqlResultsToXML; xmlRows]

Simple Intermediary Pattern
[Diagram. Labels: Client; Request & Response; OGSA-DAI; DR messages; Data Resource]

Using Simple Intermediary 1

  { sqlQueryStatement( "select city, population from UKcensus2000" ) [ tuples ] }

Using Simple Intermediary 2

  { xPathStatement( collection = "socio-economic-data/2003",
                    prefix = "c",
                    expansion = "http://socio-economic-data/2003",
                    expression = "/c:regionId /c:polygon /c:income" ) [ regions] }

Using Simple Intermediary 3

  { fileAccessActivity( categoryCoding.txt, 9, 21) [ categories] }

Using Simple Intermediary 4

  { sqlQueryStatement( "select city, population from UKcensus2000" ) [ tuples ]
    || sqlResultsToXML() [ tuples xmlRows ]
    || deliverFromURL( "http://xsltTransformations?id=format1") [ xsltToFormat1]
    || xslTransform() [xmlRows, xsltToFormat1 format1]
    || gzipCompress() [format1 compressed] }

Using Simple Intermediary 5

  { sqlQueryStatement( "select x, y, z, … from skySurvey1 where …" ) [ stars ]
    || sqlResultsToCSV() [stars cstars]
    || project(3) [cstars zofcstars]
    || histogram( [0; 1; 10; 100; 1000; 10000; 100000; 1000000]) [zofcstars zDistribution ] }

Using Simple Intermediary 6 part 1

  { startSession(); // allocates session id "sid27"
    { sqlQueryStatement( "select x, y, z, … from skySurvey1 where …" ) [ stars ]
      || sqlResultsToCSV() [stars cstars]
      || tee(2) [cstars stars1, stars2]
      || head( 1000 ) [stars1 shortStars]
      || project(3) [shortStars zofcstars]
      || histogram( [0; 1; 10; 100; 1000; 10000; 100000; 1000000]) [zofcstars zDistribution ]
      || dataStore( "allStars" ) [stars2] }}

Using Simple Intermediary 6 part 2

  { session = "sid27"; // requests session resumption
    endSession() }

OR
  { session = "sid27"; // requests session resumption
    { project(3) [allStars zofAllStars]
      || histogram( [0; 1; 10; 100; 1000; 10000; 100000; 1000000]) [zofAllStars zDistribution2 ] };
    endSession() }

Using Simple Intermediary 7

  { fileAccessActivity( Anne’sCollection.bib, 0, 10000) [ ac]
    || fileAccessActivity( Bill’sCollection.bib, 0, 10000) [ bc]
    || fileAccessActivity( Cathy’sCollection.bib, 0, 10000) [ cc]
    || merge() [ac, bc, cc all]
    || grep( "[OGSA-DAI]|[OGSA DAI]|[OGSADAI]" ) [all ODrecords]
    || gzipCompress() [ODrecords ODrc]
    || encrypt( "publicKey" ) [ODrc ODsecret] }

Persistent Intermediary Pattern
[Diagram. Labels: Client; Request & Response; OGSA-DAI; OGSA-DAI Private Store; DR messages; Data Resource]

Using Persistent Intermediary Step 1

  {{ deliverFromURL( "ftp://www.ncbi.nlm.nih.gov/Omim/download" ) [ omim]
     || deliverToFile( "omimSnapshot" ) [omim];
     addIndexFile( "omimSnapshot", "omimIndex" ) [] }
   ||
   { deliverFromURL( "http://www.geneontology.org/ontology/gene_ontology.obo") [ go]
     || deliverToFile( "goSnapshot" ) [go];
     addIndexFile( "goSnapshot", "goIndex" ) [] }}

Using Persistent Intermediary Step 2

  { searchIndexedFile( "goIndex", "sox9" ) [ sox9synonyms]
    || searchIndexedFile( "omimIndex" ) [ sox9synonyms omimGenes ] }

Redirector Pattern
[Diagram. Labels: Client; Request & Response; OGSA-DAI; DR messages; Data Resource; Data delivery; Consumer]

Using the Redirector

  { startSession();
    { sqlQueryStatement( "select x, y, z, … from skySurvey1 where …" ) [ stars ]
      || tee(2) [stars stars1, stars2]
      || sqlResultsToCSV() [stars1 cstars]
      || head( 1000 ) [cstars shortStars]
      || project(3) [shortStars zofcstars]
      || histogram( [0; 1; 10; 100; 1000; 10000; 100000; 1000000]) [zofcstars zDistribution ]
      || sqlResultsToXML() [stars2 xstars]
      || deliverFromURL( "http://xsltTransformations?id=format1" ) [ xsltToFormat1]
      || xslTransform() [xstars, xsltToFormat1 format1]
      || gzipCompress() [format1 xStarsCompressed]
      || deliverToStream( "consumerURI" ) [xStarsCompressed] };
    endSession() }

Redirector: OGSA-DAI as the consumer
[Diagram. Labels: Client; Request & Response (twice); OGSA-DAI; OGSA-DAI consumer; DR messages (twice); Data delivery; Data Resource (twice)]

Using Redirector 2 part 1

  { sqlQueryStatement( "select x, y, z, … from skySurvey1 where …" ) [ stars ]
    || sqlResultsToCSV() [stars cstars]
    || gzipCompress() [cstars xStarsCompressed]
    || deliverToGDT( "consumerURI" ) [xStarsCompressed] }

Using Redirector 2 part 2

  { sqlUpdateStatement( "create table wantedStars( … )" ) [];
    { inputStream( "xStarsCompressed" ) [ xStarsCompressed]
      || gzipDecompress() [ xStarsCompressed cStars ]
      || sqlBulkLoadRowSet( "wantedStars" ) [cStars] } }

Coordinator Pattern
[Diagram. Labels: Client; Request & Response; OGSA-DAI; DR messages (three times); Data Resources DR1, DR2, DR3]

Using Coordinator

  { sqlQueryStatement( dataResource = "DR1",
                       sql = "select a, b from aTable where …" ) [ dr1Data]
    || sqlQueryStatement( dataResource = "DR2",
                          sql = "select a, b from bTable where …" ) [ dr2Data]
    || sqlQueryStatement( dataResource = "DR3",
                          sql = "select a, b from cTable where …" ) [ dr3Data]
    || unionMerge( equalityTest = "a=a & b=b", timeout = 5 )
         [ dr1Data, dr2Data, dr3Data allAvailableData ] }

Transactionally using Coordinator

  { startTransaction() [];
    { deliverFromURL( "http://simulationResults/run277/temperature.4D" ) [ data ]
      || fileManipulationActivity( dataResource = "DR1",
                                   operation = "create run277_4D_temperature" ) [ data ]
      || sqlUpdateStatement( dataResource = "DR2",
                             sql = "insert into simulation_metadata values ( 'run277_4D_temperature', date, 'john smith', 'dan jones method', … )" ) [] };
    commitTransaction() [] }
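A minimal sketch, in Python rather than the talk's notation, of what the unionMerge step in the "Using Coordinator" request might do. It assumes that equalityTest = "a=a & b=b" means two rows count as the same when both a and b match; the slides do not define this. Timeout handling is left out entirely, and the function name union_merge and the dictionary row format are hypothetical.

```python
def union_merge(*streams, key=lambda row: (row["a"], row["b"])):
    """Yield rows from every stream, skipping rows whose key was already seen.

    Assumption: rows are dicts and equality is tested on columns a and b.
    """
    seen = set()
    for stream in streams:
        for row in stream:
            k = key(row)
            if k not in seen:
                seen.add(k)
                yield row


# Hypothetical results from DR1, DR2 and DR3.
dr1_data = [{"a": 1, "b": 2}]
dr2_data = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
dr3_data = []

all_available_data = list(union_merge(dr1_data, dr2_data, dr3_data))
print(all_available_data)
```

This version drains the streams one after another; a real activity would presumably interleave them and give up on a resource once the timeout expires.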
Data Assembly Pattern
[Diagram. Labels: Client; Request & Response; several OGSA-DAI services; Request, Response & Data Transport (between OGSA-DAI services); DR messages; Data Resources including DR1, DR2, DR3]

Pattern / Features / Facilities provided

Intermediary
  Features: Data service interposed between client applications and data resource.
  Facilities provided: Consistent interface for different kinds of data; data filtering, sampling, transformation, composition and transport; movement of computation to data; latency reduction from multiple actions per request; authorisation gateways; sessions; and concurrency via pipelining and parallelism.

Persistent Intermediary
  Features: Data storage permits results to be used by subsequent requests.
  Facilities provided: As above, plus the assembly and caching of results for use by subsequent requests, providing replication, snapshots and acceleration.

Redirector
  Features: Third-party data transfers.
  Facilities provided: As above, plus reduction in data transport costs through (a) using protocols suitable to the data and recipient and (b) avoiding transfer via intermediaries and double handling.

Coordinator
  Features: Multiple data resources per data service.
  Facilities provided: As above, plus integration of data from these resources, efficient movement of data between them and transactional integrity of multiple resource operations.

Data Assembler
  Features: Data services using other data services as well as data resources.
  Facilities provided: As above, plus data federation, distributed query and distributed transactions.
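The Intermediary facilities above include "concurrency via pipelining". A minimal sketch of that idea, using Python generators to stand in for the head, project and histogram activities from the Simple Intermediary requests. The semantics are assumptions, not taken from the slides: project(3) is read as "keep the third column", and the histogram argument is read as a list of bin edges. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import islice


def head(items, n):
    """Pass through only the first n items of a stream."""
    return islice(items, n)


def project(rows, column):
    """Yield a single column from each row (1-based index, an assumption)."""
    for row in rows:
        yield row[column - 1]


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); out-of-range values are dropped."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect_right(edges, value) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Hypothetical (x, y, z) rows standing in for the "stars" stream.
stars = [(1, 2, 0.5), (1, 2, 5), (1, 2, 50), (1, 2, 7)]

# head -> project -> histogram, each stage pulling rows lazily from the previous one.
z_distribution = histogram(project(head(stars, 1000), 3), [0, 1, 10, 100])
print(z_distribution)  # [1, 2, 1]
```

Because each stage is a generator, no stage materialises the whole stream, which is the property the pipelined activities rely on.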
Generic High-Level Compositions 1
Step 1: Obtain a new body of data & metadata
Step 2: Compute integrated & standardised data & metadata + provenance
Step 3: Validate data & validate metadata
Step 4: Deposit data in store & store metadata
[Diagram: Metadata & Data Service. Labels: Client; Integrated service for Data & Metadata; Metadata Manager; MD messages; Storage Manager; BD messages; Naming Service; Data Resources]

Generic High-Level Compositions 2
Step 1: Perform DAI & queries to locate metadata
Step 2: Use results to request bulk data
Step 3: Process retrieved data for application
Step 4: Validate & prepare for despatch & create / record metadata
Step 5: Despatch data + metadata
[Diagram. Labels: Client; Request & Response; three OGSADAI services; DR1, DR2, DR3; Request, Response & Data Transport; Control Messages & Data Transport; Mainframes; Servers; Computation services; Storage and Archival Services]

Generic Dynamic Change Management
[Diagram. Labels: Applications; Change Absorbing Libraries (three times); Change Manager (three times); Services and Networks providing DAI; Information Providers; Storage & Compute Providers]

Summary & Conclusions
• Five Patterns identified
  – Can we find a small number of Patterns?
  – Which we can support well?
  – Which can be composed to meet most requirements?
• Are the Five Patterns a good start?
• The Patterns provide a Computation Context
  – Demonstrably powerful
  – But is it a sufficiently well provisioned context?
• A notation introduced
  – Needs development
  – Is it a suitable starting place?
  – Is it worth having?
• What are the high-level DAI use patterns?
  – In the larger scale, longer term, and full computation context?
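As one reference point for the notation questions above, here is a minimal Python sketch of how the shorthand "query: sqlQueryStatement( … ) [ tuples ] || convert: sqlResultsToXML() [ tuples xmlRows ]" might be expanded into the XML fragments shown on the "Two Activities connected by a stream" slide. Element and attribute names are copied from that slide. The helper function names are hypothetical, and no enclosing request element is generated because the slides do not show one.

```python
import xml.etree.ElementTree as ET


def sql_query_statement(name, sql, output):
    """Build the sqlQueryStatement activity element with one output stream."""
    el = ET.Element("sqlQueryStatement", {"name": name})
    ET.SubElement(el, "expression").text = sql
    ET.SubElement(el, "resultstream", {"name": output})
    return el


def sql_results_to_xml(name, source, output):
    """Build the sqlResultsToXML activity element reading from a named stream."""
    el = ET.Element("sqlResultsToXML", {"name": name})
    ET.SubElement(el, "resultSet", {"from": source})
    ET.SubElement(el, "webRowSet", {"name": output})
    return el


def to_text(element):
    """Serialise an element to a Unicode string."""
    return ET.tostring(element, encoding="unicode")


query = sql_query_statement(
    "query", "select city, population from UKcensus2000", "tuples"
)
convert = sql_results_to_xml("convert", "tuples", "xmlRows")

# The two activities are connected because convert reads the stream query writes.
for activity in (query, convert):
    print(to_text(activity))
```

The stream name "tuples" is the only link between the two elements, which mirrors how the shorthand connects activities through the names in square brackets.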