Databases and the Grid
Norman Paton
University of Manchester
(Chair of UK e-Science Programme Database Taskforce)

Databases and the Grid
• Why they matter.
• What the issues are.
• What the e-Science Programme is doing about it.

The Grid in Context
[Figure: axes labelled Data Complexity and Computational Complexity.]

Grids
• Classical Grids emphasise sharing of physical resources.
• Existing Grid middleware allows resource discovery, resource allocation, data movement, …
• NASA Information Power Grid (http://www.ipg.nasa.gov/)

Managing Data
Data collections need to be managed to provide:
• Scalability.
• Reliability.
• Concurrency.
• Evolution.
Both size and complexity matter.

Databases and the Grid
• Why they matter.
• What the issues are.
• What the e-Science Programme is doing about it.

Data Management Complexity
Many different:
• Models of data.
• Domains of control.
• Locations.
• Patterns of use.
No well-defined application boundaries:
• Grid applications combine data access and update with computation.

Middleware Complexity
[Figure: Combining Grid and Web Services. A layered architecture diagram; the labels recoverable from it are listed below.]
• Clients: Problem Solving Environments (AVS, SciRun, Cactus), PDA, Web Browser, X Windows.
• Application Portals: Discipline / Application Specific Portals (e.g. SDSC TeleScience); Environment Management (LaunchPad, HotPage).
• Web Services: XML / SOAP over Grid Security Infrastructure; http, https, etc.; Python, Java, etc., JSPs; CoG Kits implementing Web Services in servlets, servers, etc.; Apache SOAP, .NET, etc.; Apache Tomcat & WebSphere & Cold Fusion = JVM + servlet instantiation + routing; Grid Web Service Description (WSDL) & Discovery (UDDI).
• Composition frameworks (e.g. XCAT).
• Services: Job Submission / Control, Grid ssh, File Transfer, CORBA, GRAM, Data Management, Monitoring, Events, ……, Credential Management, Workflow Management.
• Other services: visualization, interface builders, collaboration tools, numerical grid generators, etc.
• Grid Services: Collective and Resource Access: Condor-G, SRB / Metadata Catalogue, Data Replica and Metadata Catalog, GridFTP, Grid Monitoring Architecture, Grid X.509 Certification Authority, Grid Information Service, MPI, Secure, Reliable Group Comm.
• Grid Protocols and Grid Security Infrastructure.
• Resources: Compute (many), Storage (many), Communication, Instruments (various).

The Issues
• Identifying the most important services.
• Agreeing consistent interfaces.
• Integrating with other Grid services.
• Implementing services for database:
  • Access.
  • Integration.

Databases and the Grid
• Why they matter.
• What the issues are.
• What the e-Science Programme is doing about it.

Pilot Projects
• Several application-based pilot projects need database functionality.
• These projects:
  • Push current technologies.
  • Identify generic requirements.

Database Task Force
• Collating requirements.
• Developing standards.
• Developing reference implementations.

Development Project
• Developing database access and integration services.
• Joint between industry and e-Science Centres.
• Exploiting the Open Grid Services Architecture (OGSA).

Summary
• Early Grid middleware largely overlooks the importance of databases.
• The UK e-Science programme was quick to identify this omission.
• The UK e-Science programme is working within GGF to establish standards.
• Reference implementations of the standards will be pioneered within the UK.