ATF2 AGENDA: MEETING 2
Malcolm Atkinson, V0.1, Tuesday, 28 October 2003

Agenda for meeting to be held at the National e-Science Institute, Edinburgh, 29th October 2003.

1 Comment on web site & note of last meeting: http://www.nesc.ac.uk/teams/atf.html (Malcolm)
2 Clarify ATF2 mission (Malcolm)
3 Identify strategy for producing architectural road map; see pages 2 and 3 for some first thoughts (All)
4 Tabulate categories (and examples) of inputs into strategy (All)
5 Case studies: examples of architecturally significant perturbing forces (All)
6 Method of working & plan for next meetings
DONM
AOCB

1 MALCOLM'S FIRST THOUGHTS ON FUTURE GRIDS

The primary purpose of future grid deployments will be to enable information grids. The incessant growth in data is powered by Moore's law, by investment in shared instruments and sensor grids, and by the increased performance of simulation systems. The nature of this data is changing: more and more is structured and documented with annotations about its production, purpose, history and quality. It will be essential to invest in systematic approaches to metadata and integration in order to reap the benefits of extensive inter-organisational collaboration in creating and analysing this wealth of data. Indeed, significant data will be the result of collaborative annotation, which is already a powerful new communication medium for collaborative work.

An information grid develops improved infrastructure that enables good use of these rich data resources in all disciplines and commercial applications. It will require significant advances in registries to allow both users and programs to find relevant data and to combine it successfully for analysis and discovery. Systematic approaches to the description of services, software and data will automate many of the detailed steps that are required today, substantially accelerating application development and reducing errors.
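The registry idea above can be sketched in a few lines. This is a minimal illustration, not a design from the document: every name, field and class here is a hypothetical assumption, chosen only to show how description records carrying provenance, purpose and quality metadata would let both users and programs discover resources.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceDescription:
    """Hypothetical description record for a data resource, carrying the
    annotations about production, purpose, history and quality that the
    text envisages."""
    name: str
    kind: str                 # e.g. "data", "service", "software"
    produced_by: str          # provenance: who or what created it
    purpose: str
    quality: str              # e.g. "raw", "curated", "validated"
    annotations: dict = field(default_factory=dict)

class Registry:
    """Toy registry: programs discover resources by matching descriptions
    rather than by knowing names in advance."""
    def __init__(self):
        self._entries = []

    def register(self, desc: ResourceDescription):
        self._entries.append(desc)

    def find(self, **criteria):
        # Return every description whose fields match all given criteria.
        return [d for d in self._entries
                if all(getattr(d, k, None) == v for k, v in criteria.items())]

registry = Registry()
registry.register(ResourceDescription(
    name="sensor-stream-2003", kind="data",
    produced_by="sensor-grid", purpose="climate", quality="curated"))
registry.register(ResourceDescription(
    name="sim-output-17", kind="data",
    produced_by="simulation", purpose="climate", quality="raw"))

# Discovery by description, not by name:
curated = registry.find(purpose="climate", quality="curated")
```

A real registry would of course persist its entries, federate across organisations, and match against shared ontologies rather than exact strings; the sketch shows only the discovery-by-description pattern.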
Computation will be intimately connected with data management, and it will rarely be appropriate to think of one or the other dominating an application or infrastructure architecture. In particular, resource management, complex task planners, workflow enactment and schedulers will all co-optimise data operations and code execution. This will require increased use of code movement, as data volumes will grow faster than code size, which raises new issues of safety and predictability.

These grids will also require much greater dynamic capabilities than today's grids. Many computations and data operations will be intimately connected with external data flows, e.g. streams from sensor networks, control flows from participating humans, and output to control instruments, support systems and visualisation environments. Some of these will dynamically change the path and characteristics of workloads, and some will have very demanding real-time requirements. This will require new infrastructure that supports dynamic re-optimisation and dynamic application redeployment.

The production use of grids will lead to dependent applications that cannot tolerate service interruption and will be unwilling to adapt to changing grid infrastructure. However, grids will themselves need to evolve: to deploy improved engineering, to respond to changing use, and to take advantage of changing network and hardware provisioning. These pressures ineluctably require incremental and dynamic infrastructure deployment while sustaining an uninterrupted service. The operational environment will therefore be a federation of heterogeneous grids, and major advances in dynamic adaptation and infrastructure management will be required to support it.

Current grids require far too much effort to support their operations, to develop applications and to become a proficient grid user.
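The argument that growing data volumes favour moving code to the data can be made concrete with a toy cost model. All figures and function names below are illustrative assumptions, not from the source; a real co-optimising scheduler would also weigh CPU availability, queue depth, security policy and result-shipping costs.

```python
def transfer_time(size_bytes: float, bandwidth_bps: float) -> float:
    """Seconds needed to move size_bytes over a link of bandwidth_bps."""
    return size_bytes * 8 / bandwidth_bps

def plan_placement(data_bytes: float, code_bytes: float,
                   bandwidth_bps: float) -> str:
    """Toy placement decision keeping only the data-vs-code transfer term:
    run the job wherever the cheaper transfer ends up."""
    move_data = transfer_time(data_bytes, bandwidth_bps)
    move_code = transfer_time(code_bytes, bandwidth_bps)
    return "move code to data" if move_code < move_data else "move data to code"

# A 10 TB dataset vs. a 50 MB analysis code over a 1 Gb/s link:
# shipping the data takes ~80,000 s, shipping the code ~0.4 s,
# so the planner sends the code to the data.
decision = plan_placement(10e12, 50e6, 1e9)
```

Because data volumes grow faster than code size, the code-movement branch wins ever more often, which is exactly why the text flags the attendant safety and predictability questions of executing shipped code at remote sites.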
Future grids will therefore have to establish a framework that supports and encourages the development of tools that address these socio-economic inhibitors to grid exploitation. There will be high-level systems for building and debugging grid applications, pervasive tools to support operations that automatically handle many local failures, and consistent, well-integrated portals. The development of effective tools depends on a highly consistent and high-level description of all components: resources, data, services and software. As the grid develops, these descriptions will evolve to use consistent semantics and ontologies across all components, and the components themselves will evolve.

The envisioned grids will depend on more complex and ambitious software than today's systems, yet they must deliver dependability, resilience, flexibility, security and economic operation. This requires significantly improved architectures and engineering processes to build, maintain and operate the required infrastructure.

2 ANOTHER SET OF THOUGHTS

Today's Grids are the product of design and implementation driven by specific goals and the urgent requirement to serve communities and demonstrate Grid potential, as in the European Data Grid project. In the next five years, the e-Infrastructure will transform as operational effectiveness, flexibility and application lifetime costs become much more significant. Architectural evolution in response to the new balance of requirements will reconcile:

• Increased application effectiveness through the development of supported and well-explained programming paradigms and design patterns. That is, this evolution must reduce lifetime application development and maintenance costs so that Grid computing becomes beneficial to many more businesses and scientific endeavours.
• The drive for a robust, reliable and resilient infrastructure to reduce operational costs, including enhancement of the technologies to install, operate, manage and evolve the e-Infrastructure.

• The wider application of the Grid approach to distributed computation, which will ineluctably require many additional interfaces to other operational infrastructures and diversity in technical solutions. For example, many Grid applications will interact intimately with ambient computing networks of environmental sensors, mobile personal health systems and engineering instrumentation.

These three forces all have interacting impacts on the required architectures. A further challenge is the concomitant evolution of IT technology, e.g. the evolution of web service standards, toolsets and prevalent programming models. Finally, we may expect the architecture to evolve to improve our ability to implement the e-Infrastructure middleware itself: on the one hand, exploitation of generic web services encourages a mix-and-match approach from a large set of orthogonal standards, whilst on the other, convenience in application and management is achieved by agreeing on certain standard combinations, such as those in OGSI.

3 PRINCIPLES WITH WHICH TO APPROACH ARCHITECTURE

Basic principles that will guide the formulation of architectural strategies include:

• Take due account of existing investment, emerging and existing standards, operational practices and established application programming methods.

• Continuous availability and economic operational cost require partitioned autonomous components, so that automated failure responses can limit the impact of local failures and minimise the risk of global failures.

• Heterogeneity must be embraced, so that multiple systems, adapted to specific requirements, produced by different organisations and corresponding to different stages in our architectures and implementations, can be used together.
• Dynamic evolution and replacement of infrastructure is essential to combine scale, longevity, diversity and continuity.

• All aspects of computation must be embraced within one framework; for example, data management and computation must be considered together.

• Diversity of applications, application development and workload has to be accommodated and may be expected to increase.

• Similarly, diversity of resource providers and their policies should be welcomed.

• The proposed systems must be open, that is, they must be prepared to interact with many other co-evolving systems, such as those produced in other regions and those performing other functions. The scale of the envisaged system is such that it cannot demand any form of central control.

4 SOME TOPICS FROM THE NEXTGRID BID

• Pervasive and systematic descriptions to support discovery, integration and automation.
• Advanced registry technologies generating and using those descriptions.
• Automated support for integration and federation of data, services and computation.
• Core services, including unified resource models spanning data access and computation.
• Planning and scheduling services that co-optimise data management and computation using the unified resource model.
• Dynamic mechanisms to allow applications and infrastructure to adapt to changing load, context and external inputs.
• Service quality: performance, reliability and resilience.
• Service security, including protocols for security token exchange and propagation, and intrusion detection.
• Service management, including autonomous and decentralised management.
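The "unified resource model spanning data access and computation" can be hinted at with a small sketch. Everything here is a hypothetical illustration (the record shape, field names and sites are assumptions, not from the bid): the point is only that one record type describing both compute and data resources lets discovery, planning and scheduling reason over them through a single interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    """Hypothetical unified resource record: the same shape describes a
    compute node and a data store, so a planner need not switch models
    when co-optimising data movement against computation placement."""
    resource_id: str
    kind: str                  # "compute" or "data"
    capacity: float            # FLOP/s for compute, bytes for data
    link_bandwidth_bps: float  # network reachability of the resource
    site: str

resources = [
    Resource("cpu-edi-01", "compute", 2.0e9, 1e9, "Edinburgh"),
    Resource("store-gla-01", "data", 5.0e13, 1e8, "Glasgow"),
]

# One query interface serves both kinds, e.g. to find compute capacity
# or to find everything co-located at a site:
compute = [r for r in resources if r.kind == "compute"]
at_edinburgh = [r for r in resources if r.site == "Edinburgh"]
```

A scheduler built over such a model can weigh moving data toward computation against moving computation toward data using the same bandwidth and capacity fields, which is what the co-optimisation topic above calls for.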