Databases and the Grid
OGSA-DAI Architecture & Requirements

Malcolm Atkinson
OGSA-DAI Chief Architect
Director of National e-Science Centre
www.nesc.ac.uk
30th May 2002
OGSA Early Adopters’ Workshop
Argonne National Laboratories

Overview
    UK e-Science
        Scale, Coordination, Structure, Projects
    Database Task Force & GGF DAI-WG
    OGSA-DAI Project
        Scope, Scale, Participants, Plans
    Architecture
    Relationship with OGSA
    Requirements

UK e-Science Programme
    Tony Hey
    DG Research Councils
    E-Science Steering Committee
    Director’s Awareness and Co-ordination Role
    Academic Application Support Programme
        Research Councils (£74m), DTI (£5m)
        PPARC (£26m)
        BBSRC (£8m)
        MRC (£8m)
        NERC (£7m)
        ESRC (£3m)
        EPSRC (£17m)
        CLRC (£5m)
        £80m
    Grid TAG
    Director
    Director’s Management Role
    Generic Challenges
        EPSRC (£15m), DTI (£15m)
    Collaborative projects
        Industrial Collaboration (£40m)

UK Grid Network
    National e-Science Centre
    AccessGrid always-on video walls
    [Map labels: Edinburgh, Glasgow, Newcastle, Belfast, Daresbury Lab, Manchester, Cambridge, Hinxton, Oxford, Cardiff, RAL, London, Southampton]

NeSC’s Roles
    Coordination, Stimulation & Education
    e-Science Centres
    Application Pilots
    IRCs
    …
    e-Scientists, Grid users, Grid services & Grid Developers
    TAG, ETF, GNT, DBTF, ATF, NeSC, STF, GSC
    UK Core Directorate
    eSI
    CS Research
    Global Grid Forum
    …

UK Architectural Task Force (ATF)
    Malcolm Atkinson (NeSC)
    Jon Crowcroft (Cambridge U.)
    Vijay Dialani (Southampton U.)
    Ian Leslie (Cambridge U.)
    Ken Moody (Cambridge U.)
    Tony Storey (IBM)
    Geof. Coulson (Lancaster U.)
    David De Roure (Southampton U.)
    Andrew Herbert (Microsoft)
    Andrew Martin (Oxford U.)
    Steven Newhouse (ICSTM & LeSC)
    …
    Plus consultations
    UK Role in Open Grid Services Architecture, Version 0.6, 11th March 2002
        www.nesc.ac.uk → teams → ATF
    Obtained Agreement: OGSA as Foundation for UK work, 18 April 2002

e-Science Institute
    National e-Science Centre
        Edinburgh + Glasgow Universities
        Physics & Astronomy × 2
        Informatics, Computing Science
        EPCC
        £6M EPSRC/DTI + £2M SHEFC over 3 years
    e-Science Institute
        visitors, workshops, co-ordination, outreach
    middleware development
        50 : 50 industry : academia
    ‘last-mile’ networking
    www.nesc.ac.uk

UK Pilot Projects
    Research Councils Autonomy
        > 30 Projects
        $5 million to $0.3 million
        Wide Range of Disciplines
        Industrial Involvement
        Integration and Access to Information
    e-Science Centre Projects
        > 50% Industrial Involvement
    IRC ‘Grand Challenge’ Projects
        Equator: Technological innovation in physical and digital life
        AKT: Advanced Knowledge Technologies
        DIRC: Dependability of Computer-Based Systems
        MIAS: From Medical Images and Signals to Clinical Information
    From presentation by Tony Hey

Particle Physics and Astronomy e-Science Projects
    GridPP
        links to EU DataGrid, CERN LHC Computing Project, US GriPhyN and PPDataGrid Projects, and iVDGL Global Grid Project
    AstroGrid
        links to EU AVO and US NVO projects
        OGSA-DAI Early Adopter
    From presentation by Tony Hey

EPSRC e-Science Projects (1)
    Comb-e-Chem: Structure-Property Mapping
        Southampton, Bristol, Roche, Pfizer, IBM
    DAME: Distributed Aircraft Maintenance Environment
        York, Oxford, Sheffield, Leeds, Rolls Royce
    Reality Grid: A Tool for Investigating Condensed Matter and Materials
        QMW, Manchester, Edinburgh, IC, Loughborough, Oxford, Schlumberger, …
    From presentation by Tony Hey

EPSRC e-Science Projects (2)
    MyGrid: Personalised Extensible Environments for Data Intensive in silico Experiments in Biology
        Manchester, EBI, Southampton, Nottingham, Newcastle, Sheffield, GSK, Astra-Zeneca, IBM, Sun
        OGSA-DAI Early Adopter
    GEODISE: Grid Enabled Optimisation and Design Search for
    Engineering
        Southampton, Oxford, Manchester, BAE, Rolls Royce
    Discovery Net: High Throughput Sensing Applications
        Imperial College, Infosense, …
    From presentation by Tony Hey

MyGrid e-Science Workbench
    Goal is to develop ‘workbench’ to support:
        Experimental process of data accumulation
        Use of community information
        Scientific collaboration
    Provide facilities for resource selection, data management and process enactment
    Bioinformatics applications
        Functional genomics, pattern database annotation
    Manchester, EBI, Newcastle, Nottingham, Sheffield, Southampton
    GSK, AstraZeneca, Merck, IBM, Sun, ...
    From presentation by Tony Hey

Overview
    UK e-Science
        Scale, Coordination, Structure, Projects
    Database Task Force & GGF DAI-WG
    OGSA-DAI Project
        Scope, Scale, Participants, Plans
    Architecture
    Relationship with OGSA
    Requirements

DBTF Web Pages
    http://www.cs.man.ac.uk/grid-db

DBTF Membership
    Malcolm Atkinson (NeSC)
    Vijay Dialani (Southampton University)
    Norman Paton (Manchester University)
    Dave Pearson (Oracle UK)
    Tony Storey (IBM Hursley)
    Paul Watson (Newcastle University)

DBTF: Aims & Actions
    Requirements Capture
        Pilot Project Meetings
        Report (Dave Pearson)
    Roadmap
        UK Coordination
        GGF Articulation
    Standards
        BoF GGF4
        Papers GGF5
    Implementation Projects
        OGSA-DAI
    Architecture
        Liaise with ATF
        Liaise with Globus team
    Education
        e-Science Institute
        Pilot Projects
        GSC
    Evolving
        GGF DAIS WG
        Broader community

Overview
    UK e-Science
        Scale, Coordination, Structure, Projects
    Database Task Force & GGF DAI-WG
    OGSA-DAI Project
        Scope, Scale, Participants, Plans
    Architecture
    Relationship with OGSA
    Requirements

OGSA-DAI Partners
    EPCC & NeSC
    IBM USA
    IBM UK
    IBM Hursley
    Oracle
    Manchester e-SC
    Newcastle e-SC
    $5 million, 18 months, started 1st February 2002
    [Map labels: Glasgow, Newcastle, Belfast, Daresbury Lab, Manchester, Oxford, Cambridge, Hinxton, RAL, Cardiff, London, Southampton]

OGSA-DAI Scope
    Definition and development of generic Grid data services which provide access to and integration of
    data held in databases, and the management of data within a distributed environment.
    Database
        A stored, structured collection of data
        Accessed using an API that takes account of the structure of the data stored
        Includes
            Relational and object databases
            XML repositories
            Adequately described collections of files

Databases in the Grid
    [Figure labels: Data Complexity, Computational Complexity]

Scope of Database Services
    Discovery of Data by Content
    Query and Update Statements
    Metadata Management & Evolution
    Transactions (Flavours of)
    Distributed queries and updates
    Specialised types
    Encapsulated (safe) Function application
    Notification (driven by triggers, etc.)

OGSA-DAI Objectives
    Produce specifications for generic data services based on a common design framework consistent with Open Grid Services Architecture
    Design specifications as basis of standards recommendations via Database Access and Integration Services Working Group to the Global Grid Forum
    Deliver Grid data services software in future releases of the Globus Toolkit (GT3 December 2002)
    Refine identified requirements
    evaluate design options
    develop demonstrators
    transfer skills to the Grid community
    Develop reference implementations of generic data services
    Ensure that the Grid model and OGSA standards address fully the needs of data access and integration
    Ensure Grid data services meet the levels of service required
        performance, scalability, resilience, availability, and manageability
        evolution and distribution
        large user populations and large data volumes

OGSA-DAI Plan
    Two Phases
    Phase 1: Started Feb 02, ends GGF5
        Detailed Plan – Requirements, Designs & Prototypes
        6 Work Packages
            Project Management (Oracle, EPCC)
            Architecture (NeSC, DBTF)
            XML Data Management (NeSC & EPCC)
            Distributed Query Systems (Manchester & Newcastle)
            Metadata & Registries (NeSC & EPCC)
            Relational Databases (IBM UK)
    Phase 2: 12 months
        Structure and Objectives to be Refined in Major Review
        GGF5 DAIS WG meeting a major input

OGSA-DAI Time
Line
    [Timeline figure]
    Events shown:
        WS + GSI UK support (> 60 downloads)
        XML + OGSA Prototypes for Early Adopters
        RDB + GT2 / OGSA Prototypes for Early Adopters
        Design Documents & Demos for DAIS WG @ GGF5
        XML + OGSA Prototype Available
        RDB + GT2 / OGSA Prototypes Available
        Ship for GT3 Integration
        Phase 1 Starts
        Phase 2 Starts
    Time axis: Feb ’02, May ’02, Jul ’02, Sep ’02, Dec ’02, Feb ’03, May ’03, Sep ’03

Milestones & Deliverables
    3rd Jul 2002: GGF 5
        1st Draft – OGSA-DAI Design Specification
        Working Grid data service prototype with workshop material
        Draft Phase 2 functional scope for each Work Package
    30th Sept 2002: End Phase 1
        Phase 1 Review Report and recommendations including: revisions to Phase 2 streams of work, Work Package structure, content, and scope
        Completed, Tested, Work Package prototypes with evaluation report detailing functional scope and deficiencies, design options, measures for acceptance
        RDBMS/Globus-2 prototype implementation
        Phase 2 scope Agreed
        2nd Draft – OGSA-DAI design specification
        Dissemination programme for UK e-Science community
        Transition programme for UK Grid Support Team and Globus Development Team
    31st Dec 2002: Globus Toolkit Release
        1st Grid data services reference implementation for Globus Toolkit 3
        1st Grid data services specification for Globus Toolkit 3
        Scope of functional content for 2nd Globus Toolkit release and specification
        1st release training and support courses
    31st Mar 2003: Interim UK e-Science community release
        Interim Grid data services implementation for UK e-Science community
        Release training and support courses, with documentation
    31st Jul 2003: Globus Toolkit Release
        2nd Grid data services reference implementation for Globus Toolkit 3
        2nd Grid data services specification for Globus Toolkit 3
        2nd release training and support courses
        Publications and papers to support reference implementations through WG discussions and GGF standards processes
        Final Project Report

OGSA-DAI: Key Components
    Grid Database Services (GDS)
        GXDS, GRDS, GSFDS, …
        Perform DB
        actions
        Extra Data Service Elements
        DB-action-Management Functions
        Notifications from Triggers
    Grid Database Service Factories (GDSF)
        Create the above
        Extra Data Service Elements
    Database Service Registries (DSR)
        Specialised Registries to find DBs, Services & Factories
    Grid Data Transfer Services (GDTS)
        Described at Requirement Level
        Flexible & mapped to grid-FTP, MQ Series, …

OGSA-DAI Architecture
    [Diagram built up step by step; components: client, DSR, GDSF, GDS1, GDS2, GDS3]
    1 request for factory
    2 response with GDSFs’ GSHs
    3 script for 3 GDSs
    4 creation of 3 GDSs
    5 response with 3 GSHs
    6 scripts requesting DB actions
    7 transfer data batch to GDS2; stream to GDS3
    8 stream data to GDS2
    9 transfer data batch to client
    10 stream data to specified destination

OGSA-DAI & OGSA
    Description, e.g. portType
        Works Well
        Adding only one portType / GDS(F) | DSR
    Expect to make extensive use of Data Service Elements
        Special to DBs: Static & Dynamic
    Component Management
    Notification
    Grid-FTP
    Accounting
    Security:
        Authentication, Authorisation & Privacy
    Reliable invocation
    …

OGSA-DAI & OGSA
    Lifetime Issues
        Conditions for termination
        Controlled clean-up opportunity
        Scope of State Evolution
    Notification Issues
        Registering & using same notification system
        For DBs, e.g. triggers: do we have to construct a dummy Service Data Element?
    Type System Issues
        Standards needed for wide range of types
    Service Definition Issues
        How to create / obtain standard definitions for common services

OGSA-DAI Summary
    On Schedule & Going Well
        Expect Contributions via DAIS-WG @ GGF5
        Expect Contributions to GT3 Releases
    Early Days
        Testing Architectural Design
        Using OGSA
    Working with Early Adopter Pilot Projects
        AstroGrid & MyGrid
        Planned release of prototypes
    Influence OGSA-DAI direction
        Via DAIS-WG
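
Illustrative sketch (not from the slides)
    The numbered exchange on the OGSA-DAI Architecture slides (client, DSR, GDSF and three GDSs) may be easier to follow as a small in-memory simulation. The slides name only the roles and the messages, so every class, method and handle string below is a hypothetical stand-in, and the sender of each data transfer in steps 7 to 9 is an assumption, since the flattened diagram does not show which service sends what. Step 10 is left out because the "specified destination" is not identified. This is a sketch of the interaction pattern, not the OGSA-DAI API.

```python
from dataclasses import dataclass, field
from itertools import count

# All names here are hypothetical; the slides define roles, not an API.


@dataclass
class GridDatabaseService:
    """A GDS: performs DB actions and can receive data from other services."""
    gsh: str  # Grid Service Handle, modelled as a plain string
    inbox: list = field(default_factory=list)

    def perform(self, script: str) -> list:
        # Stand-in for running a query or update against a real database.
        return [f"{self.gsh}:{script}:row{i}" for i in range(2)]

    def receive(self, rows: list) -> None:
        self.inbox.extend(rows)


class GridDatabaseServiceFactory:
    """A GDSF: creates GDS instances on request (steps 3 to 5)."""

    def __init__(self, gsh: str):
        self.gsh = gsh
        self.services = {}
        self._ids = count(1)

    def create(self, how_many: int) -> list:
        handles = []
        for _ in range(how_many):
            gsh = f"{self.gsh}/gds{next(self._ids)}"
            self.services[gsh] = GridDatabaseService(gsh)
            handles.append(gsh)
        return handles


class DatabaseServiceRegistry:
    """A DSR: answers a request for factories with their handles (steps 1, 2)."""

    def __init__(self):
        self._factories = {}

    def register(self, factory: GridDatabaseServiceFactory) -> None:
        self._factories[factory.gsh] = factory

    def find_factories(self) -> list:
        return list(self._factories)

    def resolve(self, gsh: str) -> GridDatabaseServiceFactory:
        return self._factories[gsh]


def run_walkthrough() -> dict:
    dsr = DatabaseServiceRegistry()
    dsr.register(GridDatabaseServiceFactory("gsh://example/gdsf"))

    # Steps 1-2: client asks the DSR for a factory and gets GDSF handles back.
    factory = dsr.resolve(dsr.find_factories()[0])

    # Steps 3-5: client asks the GDSF for three GDSs and gets three GSHs back.
    handles = factory.create(3)
    gds1, gds2, gds3 = (factory.services[h] for h in handles)

    # Step 6: client sends a script requesting a DB action (here, to GDS1).
    rows = gds1.perform("query")

    # Step 7 (assumed sender GDS1): one batch to GDS2, row-by-row stream to GDS3.
    gds2.receive(rows)
    for row in rows:
        gds3.receive([row])

    # Step 8 (assumed sender GDS3): stream derived data to GDS2.
    for row in gds3.perform("derive"):
        gds2.receive([row])

    # Step 9 (assumed sender GDS2): hand everything collected to the client.
    client_batch = list(gds2.inbox)
    return {"handles": handles, "client_batch": client_batch}


result = run_walkthrough()
```

    In this sketch the client only ever holds handles and scripts; the data itself moves between services, which is the point the build-up of the architecture diagram appears to make.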