Design Decisions: Interoperability in a changing architecture
Andrew Jones
BiodiversityWorld GRID Workshop, NeSC, Edinburgh – 30 June and 1 July 2005

BiodiversityWorld requirements (1)
• Biodiversity Problem Solving Environment
  • Heterogeneous, diverse resources
  • Facilitating integration of both legacy and newly developed resources
  • Flexible workflows
• Main challenges centre on metadata, interoperability, resource discovery, etc.
• High-performance computing secondary (though relevant)

BiodiversityWorld requirements (2)
• Distinctive features:
  • A biodiversity informatics GRID
  • Interoperability with heterogeneous data, complex in structure
  • Resilience to infrastructure change & interoperation with other GRIDs
  • Interactive collaboration a secondary concern
• Assumptions about resources:
  • A resource worked either:
    • Essentially in ‘batch’ mode, or
    • Supporting a sequence of operations on a single resource, but involving exchange of minimal data
  • Reasonable to treat each resource (including databases) as a service offering its own, defined set of operations

BiodiversityWorld architectural overview
[Diagram. Components shown: User interface; Metadata repository; Workflow enactment engine; Presentation; Native BiodiversityWorld Resources; Wrapped resources; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID]

The BGI concept
• Standardised invocation mechanism
• Wrappers notionally divided into Grid-facing and resource-facing parts
[Class diagram. Elements shown: <<interface>> BgiWrapperInterface; Bgi Implementation_1, Bgi Implementation_2; <<abstract>> BdwAbstractWrapper; Concrete Wrapper_1, Concrete Wrapper_2, ...; an association with multiplicity 1 at each end]

Why we protected ourselves from ‘the Grid’(!)
• Rapidly evolving standards
• Previous experience in GRAB
  • Globus 2 approach needed ‘canned queries’, temporary files, etc. … unnatural for distributed request/response model
• BiodiversityWorld
  • Globus and other software still evolving
  • Globus 3: Grid Services; Globus 4: WSRF; …
• Trade-off: abstraction layer (BGI); invocation mechanism
  • Insulates from change
  • Performance penalty
    • Assume computationally intensive applications lie in a single BDW resource
  • Proprietary invocation mechanism hinders interoperation with other Grid/Web services

Implementations of BGI
• RMI
• GT3 Grid Services (incomplete)
• Web services
• GT4/WSRF/Grid-Service-as-portal

Benefits & limitations
• Too many standards, so we defined a new one!!
• Interoperability with other projects restricted
  • Could wrap non-BDW resources, or
  • Implement alternative Grid-facing “glue” replacing invocation mechanism with some other standard
• Restrictions on highly interactive applications
  • BGI OK for coarse-grained interaction; not for dynamic interaction with potentially large data volumes
• Transmission and storage of intermediate results: method not specified
  • Can pass URI instead of data, but no specifications restricting what this might refer to

Transmission/storage of data
• Desirable to have uniform mechanisms for transmission and storage of data for:
  • Efficient operation of workflows
  • Re-use; composition of workflows
  • Supporting more flexible experimentation

Are workflows sufficient for flexible experimentation?
• Creating a workflow:
  • Workflows clearly good for capturing complex tasks
  • Good for ‘tweaking’ tasks
  • But is this how users think?
• If not, we should provide an environment that supports a more exploratory approach too, e.g.
  • User tries out some small subtasks
  • (S)he joins results together
  • Builds larger workflows from fragments
• This requires recording of interactions, so re-usable workflows can be composed
  • Storage of intermediate data sets
  • Provenance metadata (extending MDR)

How to achieve dynamic interaction?
• Some possibilities for future development
• Remote direct manipulation (and other remote interactions?)
  • BGI not well suited to fine-grained interaction with resources
  • Some resources may not be accessible except as stand-alone
  • May need (less portable) ‘by-pass’ mechanisms, e.g.
    • New BGI protocol
    • Using existing techniques, such as VNC
• Local direct manipulation, etc.
  • Achievable via component-based ‘plug-in’ approaches (e.g. using JavaBeans), but component interface must be defined
  • Requires data to be present locally; bandwidth concerns
  • Some bandwidth problems can be addressed by combining local specialised client component & remote server component (e.g. passing vectors, not bitmaps)
    • BGI may or may not be fast enough in this case

How to achieve data transmission/intermediate result storage?
• Low level
  • E.g. orchestrate facilities such as GridFTP, GRAM, …
• Higher level
  • E.g. Inferno, SRB

Additional considerations
• Again, have problem of committing to other, evolving standards
  • Need at least a thin API layer to protect resources from change
• And don’t want to break existing BDW system

More direct database exploitation with OGSA-DAI
• BioDA project is investigating relevance & suitability of OGSA-DAI in relation to bioinformatics projects
• 2 main possibilities within BDW:
  1. Augment BGI to support inclusion of queries in workflows and to be sent directly to OGSA-DAI enabled databases.
     • Distributed query processing facilities could assist in planning execution & distribution of data-orientated parts of a workflow. (For the current status of OGSA-DQP see Section 4.)
     • Very major revision to BDW protocols; also, many resources of interest are simply not exposed as databases.
  2. Provide facilities within individual wrappers that benefit from OGSA-DAI.
• Current exemplar (under development) takes approach (2) …

BDW OGSA-DAI initial exemplar
[Diagram. Components shown: Wrapper Module; Wrappers; BDWQueryActivity; OGSA-DAI R5 GDS; OGSA-DAI Client; Web DBs; XSLTransform; Format file (xsl); "pull data". Numbered steps:]
1. BGI invokeOperation()
2. Create GDS and query
3. Invoke wrapper
4. Query
5. Download URL
6. url, deliverFromURL(url)
7. XSL transform to BDW format
8. getOutPut()

BDW OGSA-DAI exemplar extension
[Diagram. Components shown: OGSA-DAI R5 GDS; OGSA-DAI Client; XSLTransform (three instances); mergeOutput; deliverToURL/GFTP. Numbered steps shown:]
1. BGI InvokeOperation([ ])
7. XSL transform to BDW format
8. Integrate output
9. To WF unit

Conclusions
• BDW interoperation layer designed to meet requirements we were given
  • Suitable for high-level interactions
  • Not so good for dynamic interaction with resources (need for this now generally recognised)
  • Doesn’t specify how data is to be moved around
  • Applicable to other domains meeting similar criteria
• Interesting possibilities for extension
• But we have achieved a sustainable architecture; this is an important feature to retain in future systems

Some discussion points (arising from Jaspreet’s and Andrew’s talks)
1. Balance of requirements for different kinds of GRIDs (performance, resource discovery, sustainability, …): how does this affect decisions about architectures, protocols, …?
2. How can BDW protocols best be enhanced in future projects?
3. How can we best achieve interoperability between grids from different projects (including BDW)?
4. How can we make it easier for 3rd parties to
   • Introduce their resources to an existing BgiWrapperService?
   • Develop their own additional BgiWrapperServices?
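
Illustrative sketch (not from the talk)
As a starting point for discussion point 4, the split between Grid-facing and resource-facing wrapper parts described under "The BGI concept" might look roughly like the Java below. Only the names BgiWrapperInterface, BdwAbstractWrapper and invokeOperation come from the slides; their signatures, the NameLookupWrapper resource, the InProcessBgi dispatcher and the "normalise" operation are all assumptions made up for illustration, and the real BDW classes may differ considerably.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Grid-facing contract. The name is from the slides; the signature is assumed.
interface BgiWrapperInterface {
    String invokeOperation(String resource, String operation, List<String> args);
}

// Resource-facing base class. The name is from the slides; the members are assumed.
abstract class BdwAbstractWrapper {
    abstract String resourceName();
    abstract String perform(String operation, List<String> args);
}

// Hypothetical concrete wrapper around a trivial name-normalising resource.
class NameLookupWrapper extends BdwAbstractWrapper {
    @Override String resourceName() { return "nameLookup"; }

    @Override String perform(String operation, List<String> args) {
        if (!operation.equals("normalise")) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        return args.get(0).trim().toLowerCase(Locale.ROOT);
    }
}

// Hypothetical Grid-facing implementation using plain in-process dispatch.
// An RMI or Web-services implementation would sit behind the same interface,
// so concrete wrappers would not need to change when the transport does.
class InProcessBgi implements BgiWrapperInterface {
    private final Map<String, BdwAbstractWrapper> wrappers = new LinkedHashMap<>();

    void register(BdwAbstractWrapper wrapper) {
        wrappers.put(wrapper.resourceName(), wrapper);
    }

    @Override
    public String invokeOperation(String resource, String operation, List<String> args) {
        BdwAbstractWrapper wrapper = wrappers.get(resource);
        if (wrapper == null) {
            throw new IllegalArgumentException("Unknown resource: " + resource);
        }
        return wrapper.perform(operation, args);
    }
}

class BgiSketchDemo {
    public static void main(String[] argv) {
        InProcessBgi bgi = new InProcessBgi();
        bgi.register(new NameLookupWrapper());
        // prints "quercus robur"
        System.out.println(
            bgi.invokeOperation("nameLookup", "normalise", List.of("  Quercus Robur ")));
    }
}
```

In this sketch a third party introducing a resource only writes a BdwAbstractWrapper subclass and registers it; swapping the invocation mechanism only touches the BgiWrapperInterface implementation.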