Design decisions: architecture

Richard J. White
Cardiff School of Computer Science

BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005
Richard White Design decisions: architecture 1 July 2005


BDWorld architecture – user perspective (original ideas)

Diagram labels: Analytic tool; Proxy; BDGrid; Taxonomic index (Species 2000 & ITIS Catalogue of Life); GSD; Ontology: Metadata, Intelligent links, Resource & analytic tool descriptions, Maintenance tools; Thematic data source; Problem Solving Environment: Broker agents, Facilitator agents, Presentation agents; User; Abiotic data source; Local tools; Problem Solving Environment user interface


Design principles 1: the Grid

• Creating a Grid for biodiversity informatics
• Current Grid practice and software keep changing
• Architecture and much of the implementation should be insulated from changes in Grid technology: such changes should require no change to resource software (other than rebuilding wrappers); only our interface to the Grid would need to change


Architecture 1: interfacing to a Grid

Diagram labels: A Software Component in BDWorld; BGI API; BDWorld-GRID Interface (BGI); The GRID


Design principles 2: services

• Data sources, analytical tools, etc. should be made available as services which can be invoked remotely by clients
• Service-oriented computing is a Good Thing – users do not need to install or adapt resources to their own environments
• Potential for interoperability with other Grids in related domains such as environmental, molecular and genomic biology (e.g.
  myGrid)


Architecture 2: invocation via the Grid

Diagram labels: A Software Component; Another Software Component; BGI API (×2); BDWorld-GRID Interface (BGI) (×2); The GRID


Design principles 3: wrappers

• The services are made available as Operations provided by a Resource
• A Resource is connected to the BDWorld Grid through a Wrapper
• Any program could therefore be a resource; only the difficulty of wrapping it would vary
• Resources and wrappers should be able to be implemented in any language


Architecture 3: wrapped resources

Diagram labels: User; Workflow enactment engine; Remote Resource; Wrapper; BGI API (×2); BDWorld-GRID Interface (BGI) (×2); The GRID


Design principles 4: workflows

• User metaphor based on the concept of workflows – requires a workflow manager for design and enactment of workflows
• Flexible use and re-use of workflows
• Resource interoperability with heterogeneous data, complex in structure
• Need to be able to select suitable resources which “fit together” in a workflow – requires metadata
• Need to record activities, data generated, etc.
• Did I say we also need a user interface?


Architecture 4: (as we planned it)

Diagram labels: User; User interface; Legacy user interfaces; Presentation layer; Metadata repository; Workflow enactment engine; Local tools e.g.
Input and Output Units; Native BDWorld resources; Wrapped “legacy” resources; BGI API; BDWorld-GRID Interface (BGI); The GRID


Design principles 5: desiderata

Extensibility and flexibility are important:

• Minimise the joining ‘cost’ for
  • users (easy installation of local components)
  • providers (adding a new resource of a type not previously encountered)
• Adding attributes to the metadata requires no change to the MDR
• Challenges with handling non-portable resources and inflexible user interfaces …


Legacy resource issues addressed

• We planned to deal with resources which:
  • interact with their user locally in real time
  • have not been designed to be scripted
  • cannot support multiple simultaneous invocations
  • run on specific platforms only
  • have other unexpected requirements
• Using techniques such as
  • capturing input and/or output and emulating a real user’s actions
  • providing user access to remote desktops
  • limiting where the user of a workflow can be sited
  • providing instructions for direct user control
  • modifying the source code
  • avoiding the use of the resource altogether


Architecture (as we built it)

Diagram labels: User; User interface (Protégé); User interface; Presentation layer; Metadata repository (MDR); Workflow enactment engine; Local tools (e.g. WFDA, Input and Output Units, MDA); Legacy user interfaces; Native BDWorld resources; Wrapped “legacy” resources; BGI API; BDWorld-GRID Interface (BGI); The GRID


Glossary of components (existing or Real Soon Now)

• Problem-solving Environment (PSE)
  • Workflow Designer, Enactment Engine, User Interface [Triana]
  • local Units in Toolboxes
    • proxies for remote Operations
    • local functions, including
      • Input and Output Units
      • Workflow Design Assistant (WFDA)
      • Metadata Agent (MDA)
• BDWorld-Grid Interface (BGI)
  • BGI Comms Layer, API, Wrappers
• Remote Resources
  • provide Operations (services)
• Metadata Repository (MDR)
  • BDWorld ontology, metadatabase, user interface


Current evaluation; future flexibility

We believe our architecture ensures that BDWorld is:

• not limited to a specific application domain
• extensible to cope with unanticipated uses and resources

Because:

• new resources can be added
• domain-specific knowledge resides
  • only in the resources and the MDR
  • not in the BGI or the workflow engine or its user interface or the Metadata Agent
• MDR contents come from the resources and from humans customising the MDR to assist in new domains


A dream

• Desktop environment in which scientists “drag & drop” data sources, analysis and modelling tools, and visualisation interfaces into desired sequence of operations which can be run automatically
• Essentially a component-based visual programming environment for scientific tasks
• With additional features (some described earlier), the environment could be made richer, more productive, and support research groups.
• Not just for biodiversity!


Where do we go from here?

• Present system is a proof of concept
  • Limited
  • Restricted domain of exemplars
• Needs
  • more data resources
  • more PSE functionality (described next)
  • additional features
    • User interaction (described earlier)
    • Virtual organisations (described later)


Extra PSE functionality

Some of these topics are becoming available within the present BDWorld project

• Enhanced metadata
  • Provenance and data lineage
  • Automatic electronic “notebook”
• Stored workflows
  • Repeatability, reproducibility
  • Re-use with different data, changed parameters
• Ontologies
  • Resource discovery and improved selection
• Usability
  • Dynamic interaction of users with resources


Virtual organisations

These are not going to be addressed during the present BDWorld project, but would make a good Computer Science component in future proposals

• Collaborative working environments
  • Shared and private resources: data, tools
  • Shared experimentation
• User authentication
• Access control
• Controlled release of data, tools and results
• Dynamic
  • Membership
  • Resources


The way forward

• New domain exemplars
• Links with national and international organisations, resources
• “End users”
  • Applied use, driven by scientific priorities
  • Input for planning
  • Feedback for evaluation and improvement
• …
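

Illustrative sketch: wrappers and the BGI

The wrapper and insulation principles from the "Design principles 1" and "Design principles 3" slides can be illustrated with a minimal Python sketch. It is not BDWorld code: the class names (Wrapper, InProcessTransport, BGI), the method names, and the toy word-count resource are all hypothetical, chosen only to show a Resource exposing Operations through a Wrapper while clients talk only to a BGI-like interface whose transport could be swapped.

```python
from typing import Callable, Dict, Protocol


class GridTransport(Protocol):
    """Hypothetical stand-in for the layer that talks to the Grid."""

    def send(self, resource: str, operation: str, args: dict) -> object: ...


class Wrapper:
    """Connects one Resource to the interface by exposing named Operations."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._operations: Dict[str, Callable[..., object]] = {}

    def add_operation(self, op_name: str, func: Callable[..., object]) -> None:
        # Register an existing function as an Operation without changing it.
        self._operations[op_name] = func

    def invoke(self, op_name: str, args: dict) -> object:
        if op_name not in self._operations:
            raise KeyError(f"{self.name} has no operation {op_name!r}")
        return self._operations[op_name](**args)


class InProcessTransport:
    """Toy transport: dispatches to wrappers held in the same process."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}

    def register(self, wrapper: Wrapper) -> None:
        self._wrappers[wrapper.name] = wrapper

    def send(self, resource: str, operation: str, args: dict) -> object:
        return self._wrappers[resource].invoke(operation, args)


class BGI:
    """Clients call only this; replacing the transport leaves them unchanged."""

    def __init__(self, transport: GridTransport) -> None:
        self._transport = transport

    def invoke(self, resource: str, operation: str, **args: object) -> object:
        return self._transport.send(resource, operation, args)


def legacy_word_count(text: str) -> int:
    """Stand-in for an existing program that knows nothing about the Grid."""
    return len(text.split())


# Wrap the legacy function and make it reachable through the interface.
wrapper = Wrapper("toy_resource")
wrapper.add_operation("count_words", legacy_word_count)

transport = InProcessTransport()
transport.register(wrapper)

bgi = BGI(transport)
result = bgi.invoke("toy_resource", "count_words", text="design decisions architecture")
print(result)
```

In this sketch, moving to a different Grid technology would mean writing another class with the same send method and passing it to BGI; the wrapped function and the calling code would stay as they are, which is the property the slides ask for.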