DRAFT

Defining the Role of the UK e-Science Architectural Task Force

Malcolm Atkinson
19th October 2001

1 Context

The Architectural Task Force (ATF) has been set up by the e-Science Core Programme Directorate to provide medium-term guidance to the development of UK e-Science. This general goal is refined below, based on a meeting of the ATF in Cambridge, 19th October 2001.

The ATF will take a fresh look at the issues and strategies for building, maintaining and operating the large-scale, distributed systems that are needed to support e-Science and eCommerce. Our challenge is to identify the frameworks that allow the requirements and structure of typical applications to be understood, and then to identify the existing components, or the research and development, required to fill these frameworks. We believe that at present much work is still required before the systems that are envisaged can be built economically and routinely, with sufficient durability and flexibility.

The ATF will not be able to provide immediate advice; that role will be filled by the Grid Support Team (GST). In the longer term it will assess project plans against its system frameworks and identify which components are readily available and which require R&D. It will also suggest which areas of research are most necessary or have the greatest potential rewards. The resulting mosaic, denoting the status of components within frameworks, will be used to explain the progress achieved by Grid projects and to identify the issues that must be addressed before the vision of the infrastructure that supports e-Science and similar commercial activities can be realised.

2 Generic Issues

The large-scale distributed systems that are built today are often developed using bespoke technology or rely heavily on special properties of a particular application or host organisation. As a result, the intellectual and engineering investment in their construction and operation does not transfer to other applications.
Similarly, they are typically brittle, in the sense that they cannot easily be changed to extend their function. Encouraging work that will overcome these limitations is a priority for the ATF.

The ATF’s initial enumeration of the issues that must be considered includes:

1. The intrinsic unreliability of subsystems within a large distributed system. This must be considered, e.g. through a recognition of trade-offs between the costs of redundancy and the reduction of risk.
2. Durability is a requirement before systems can support routine use. Premature dependence on an infrastructure will result in failures that lead to an excessive backlash against the general and achievable goals.
3. Systems have to operate without components, designers, implementers or operators having universal knowledge about the total system. Consequently, they have to operate via dynamic knowledge discovery mechanisms that enable local operation in terms of what can be discovered about relevant properties of the encompassing system.
ATF_Role+MOv1-19Oct01MPA.doc 1 of 4 19/03/2002
4. Heterogeneity is intrinsic to large-scale and distributed systems. Thus they must support information integration technology and a variety of translation subsystems, “digital Babel fish”☺.
5. Security and trust have to be established throughout these systems.
6. Some components must support dynamic information discovery and some must use such information to dynamically optimise operations.
7. Autonomy has to be supported, as local subsystems must change to meet new management, operating or functional requirements. This generates a concomitant requirement for evolution mechanisms and dynamic modification of subsystem specifications.
8. It is often appropriate to aggregate subsystems and services to achieve economic and accounting benefits. Accounting and economic models are a prerequisite for routine and sustained use.
9.
Systems under consideration interact with other systems that already exist or are developed independently. Consequently, adequate mechanisms are needed to handle interfaces at the boundaries of large systems and to handle legacy models, components and systems.
10. The scale of systems envisaged will raise technological challenges.
11. These large-scale systems require new software development methods and tools.
12. Similarly, tools are required to support the operations and management of these systems.

These considerations interact, and appropriate compositions of choices will differ for different applications. It is therefore necessary to identify and consider a number of representative frameworks, which will characterise the major patterns required for the variety of applications considered.

3 ATF Operational Plan

The following activities will be undertaken by the ATF with the outputs indicated. Readers should be aware that this is a complex field and our programme of work will need to adapt as our understanding develops.

1. Identify a small set of frameworks that characterise relevant large-scale distributed systems. Each framework will identify components and show how those components relate to the issues given above. To validate and motivate these frameworks, they will be used to describe existing and planned application systems. We do not anticipate that there will be a useful one-size-fits-all framework.
2. The frameworks will be used to expose problems and issues. This will lead to an identification of areas warranting a focus of research or development attention. The frameworks should allow researchers and application developers to focus on system properties of interest and to leave other issues to other workers or automated mechanisms.
3. Existing and contemporaneous work, from both current industry practice and a broad range of research programmes, will be reviewed to identify significant inputs to the exposed issues and problems.
4. Active liaison will be maintained with GGF teams, relevant standards groups and practitioners, with the goal of influencing the design of future systems via emerging standards and direct communication with development teams.
5. A summary of progress and outstanding issues will be produced. This will be composed from the four activities above and is motivated by two considerations: it is important that potential users and funders understand what remains to be done, and it is essential to ensure that adequate investment is directed into addressing infrastructure research and development.

3.1 ATF Outputs and Services

The ATF will produce outputs available to the UK e-Science Core-Programme Directorate, the UK e-Science community, computer-science researchers and wider audiences. The outputs will vary from the following illustrative list as our understanding develops and as needs are recognised.

1. Reports describing the distributed system frameworks, their motivation and use will be produced.
2. Meetings will be held with relevant groups and practitioners. As an example, a joint meeting with the database architecture task force and with Ian Foster has been arranged for 12th to 14th December 2001, at the e-Science Institute, Edinburgh.
3. Issues and subsystems will have their status classified to indicate whether solutions are available in production quality, or whether they require development or research.
4. Topics requiring development or research will be identified, together with a review of potentially significant inputs to their treatment.
5. White papers and other documents will be produced to influence standards. In some cases, these will need a supporting prototype implementation to validate their design and to offer a public reference implementation. The mechanisms for achieving such substantial bodies of work are still a matter of discussion.
The ATF could operate entirely through encouraging others to develop implementations or could manage some of these developments itself, if granted suitable resources.
6. Publicly understandable reviews of progress and the road ahead will be produced to help manage expectations, to assist those contemplating using these systems and to influence funding decisions.
7. From Q2 2002 we will offer a service to pilot e-Science projects, and others planning to use grid-like infrastructure, in which we review their project plans against our frameworks and our assessment of progress. We will seek a summary of their distributed system implementation and comment on which components will be needed and their status, paying particular attention to spotting potential areas of difficulty.

3.2 ATF Requirements

In order to operate, the ATF has a few requirements.

1. Funds to support meetings, attendance at meetings and meeting administration.
2. Staff time to prepare material, attend meetings and develop outputs.
3. Web site and mail system support. (This will be provided by NeSC.)

3.3 ATF Membership and Life-Cycle

The membership should be chosen to meet the requirements for a set of skills and to provide authoritative links with other bodies, e.g. TAG, the Grid Network Team (GNT), etc. It should also be kept small to make it feasible to arrange meetings and achieve progress. It should be kept fresh through a process of renewal and review.

3.3.1 Initial Membership

Name              Initials  e-mail
Malcolm Atkinson  MPA       mpa@nesc.ac.uk
Jon Crowcroft     JC        j.crowcroft@cs.ucl.ac.uk
David De Roure    DDR       dder@ecs.soton.ac.uk
Vijay Dialani     VKD       vkd00r@ecs.soton.ac.uk
Andy Herbert      AH        aherbert@microsoft.com
Ian Leslie        IL        Ian.Leslie@cl.cam.ac.uk
Tony Storey       TS        tony_storey@uk.ibm.com

The group listed above met on Friday 19th October 2001, without Andy Herbert, who was unavailable, and this document is a result of that first meeting.
3.3.2 Skills and Contacts Required

The membership of the ATF needs to cover the following list of skills. The table shows the coverage from the existing group. We have identified people to be approached in order to better meet our skill requirements.

Skill area: Dependable Computing Programming Models Security W3C Information Models Software Engineering Networks Operating Systems & Distributed Systems Databases Applications (preferably biomedical) Information Engineering
Existing Coverage: TS & DDER TS & ? JC & IL AH & IL TS & MPA

The ATF also needs effective communication with important groups in its context. An initial identification of this requirement is given in the following table.

Group               Existing Links
TAG                 JC & IL
NeSC                MPA
GGF                 MPA
W3C                 DDR
GNT                 JC
GST                 MPA
DB Architecture TF  MPA & ST

3.3.3 Review Cycle

Members of the ATF require fixed-term commitments to provide a framework for scheduling deliverable outputs, to limit commitments and to ensure that new views and new energy are brought into the discussion on a regular basis. We therefore propose to operate for 12 months under an initial regime and then invite a review. We expect after that review to revise our goals, method of working and membership.