The OGSA-DAI Project
Databases and the Grid

Neil Chue Hong
Project Manager
EPCC, Edinburgh
N.ChueHong@epcc.ed.ac.uk
http://www.ogsadai.org.uk

What is OGSA-DAI?

It is a project:
– OGSA Data Access and Integration: funded by the UK eScience Grid Core Programme
It is a vision:
– From simple database access to truly virtualised data resources
It is a standard:
– The GridDataService Specification from the Data Access and Integration Working Group (DAIS-WG) of the Global Grid Forum (GGF)
It is software that you can use:
– Current version is R2.5

OGSA-DAI Objective

To define:
– open standards and open source based uniform service interfaces for accessing heterogeneous data sources within the Open Grid Services Architecture (OGSA) framework
Why?
– Because we increasingly want to integrate data sources from different organisations
– The Grid and OGSA appear to provide a framework for producing software to do this

Who are we?

Contributing to the global grid computing community
EPCC & NeSC, IBM UK, IBM USA, Manchester e-SC, Newcastle e-SC, Oracle
– 373 man months
– £3 million, 18 months, started February 2002
– Funded by the Grid Core Programme
[Map labels: IBM USA, EPCC & NeSC, Glasgow, Newcastle, Belfast, Manchester, Daresbury Lab, Oxford, Cardiff, IBM Hursley, RAL, Cambridge, Oracle, Hinxton, London, Southampton]

What are we doing?
[Figure: layered view, top to bottom, with the people involved at each layer]
– Data Intensive Application Scientists
– Data Intensive Applications (App. Developers)
– Scientific Data Mining & Integration Technology (Tech. Developers)
– Monitoring, Diagnosis, Logging, Scheduling, Accounting, Authorisation, Data Integration, Data Access
– Grid Plumbing & Security Infrastructure (Operations Team)
– Data & Storage Resources: Distributed, Structured Data (Owners, Data Providers, Data Curators)

DAIS WG

GridDatabaseService Specification
– DAIS WG of the GGF
– Aim to produce a V1.0 specification by early 2004
– Defines an interface for a GridDatabaseService
– Many contributors, not just the OGSA-DAI Project
– OGSA-DAI (the software) seeks to be a reference implementation of this standard
  • But does not necessarily track it exactly just now
– Requirements and Overview Informational documents also published

The OGSA-DAI Approach

Reuse existing technologies and standards
– OGSA, Query languages, Java, transport
Three key services:
– GridDataService (GDS)
– GridDataServiceFactory (GDSF)
– DAIServiceGroupRegistry (DAISGR)
Benefits:
– Location independence
– Hides heterogeneity
– Scalable
– Flexible
– Dynamic

OGSA-DAI Positioning - Today

[Figure: layered positioning diagram. Labels: OGSA-DAI Distributed Query; OGSA-DAI Basic Services (GDS, GDSF, DAISGR; Delivery, Data Format, Query (Create, Retrieve, Update, Delete), Drivers, Meta Data); OGSA (Notification, Lifetime, Location); Database, Communication, OS… Technology]

OGSA-DAI To Date

Assuming that OGSA becomes the standard framework
– Have adopted the OGSA approach
Have first concentrated on data access
– Released software has only limited data integration so far
– Distributed query processor prototype due in July
Implementation focuses on basic functionality first
– But architecturally we have tried to answer many pertinent questions
– Functionality will increase over subsequent releases

GDS in action

Participants: Analyst (client), Registry (DAISGR), Factory (GDSF), Grid Data Service (GDS), Consumer, Database (Xindice, MySQL, Oracle, DB2)
1a. Request to Registry for sources of data about "x"
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with SQL, XPath, XQuery etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML
OR
3d. Results of query delivered to consumer as XML
[Figure legend: SOAP/HTTP, service creation, API interactions]

Activities

OGSA-DAI is structured around the concept of activities
This framework allows new functionality to be added easily
Three types of activity at present:
– statement (e.g. SQLQuery, XUpdate)
– transformation (e.g. XSL translation, compression)
– delivery (e.g. GridFTP)
OGSA-DAI provides implementations of common functionality; others can extend

Documents

Accessing a Grid Data Resource is done using Documents
– caveat: this may change
A document allows you to:
– define parameters
– execute activities
– deliver results
Written in XML, normally used by a client.
Example:

  <gridDataServicePerform>
    <request name="myRequest">
      <parameter name="idname">
        <value name="idvalue">10</value>
      </parameter>
      <sqlQueryStatement name="myStatement">
        <sqlParameter position="1" from="idvalue"/>
        <expression>
          SELECT * FROM littleblackbook WHERE id=?
        </expression>
        <webRowSetStream name="statementresult"/>
      </sqlQueryStatement>
      <deliverToResponse name="d1">
        <fromLocal from="statementresult"/>
      </deliverToResponse>
    </request>
  </gridDataServicePerform>

OGSA-DAI Core Services

OGSA-DAI Release 2.5 – out now
– Java, Tomcat, Globus Toolkit 3 Beta
– Supports MySQL, DB2, Xindice; SQL92, XPath, XUpdate
OGSA-DAI Release 3 – end July
– Java, Tomcat, Globus Toolkit 3.0
– Supports MySQL, DB2, Oracle, Xindice; SQL92, XPath, XUpdate
– Adds Notification, Internationalisation, Transactions, Caching
Continue to track Globus Toolkit 3 releases
– Experimental, then production, GT3 grids will help

Asynchronous Delivery

[Figure: Asynchronous delivery – Pull. Components: Client, GDS, GDS Instance, DB, Consumer, GDT. Numbered steps 1 to 3 with message labels Q, D + GDH, Rs, DT, GSH/R + data id, Ra]
[Figure: Asynchronous delivery – Push. Components: Client, GDS, GDS Instance, DB, Consumer, GDT. Numbered steps 1 to 3 with message labels Q + D + GSH/R, Rs, DT, GSH/R, Ra]

GDS Composition

[Figure: five numbered arrangements (1 to 5) of Clients, GDS Operations and DBs, showing GDS operations chained and combined between clients and databases]

Distributed Query Service

A higher level service:
– Extension of the Polar* query processor; partitions and schedules queries
– Sits on top of OGSA and OGSA-DAI
Defines new portTypes and services
– GridDistributedQuery (GDQ) PortType
– GridDistributedQueryService (GDQS) – wraps Polar*
– GridQueryEvaluatorService (GQES) – performs subqueries
Currently based on OGSA-DAI Release 1.5

DQS Architecture

[Figure: DQS architecture]

DQP in action

[Figure: DQP in action]

DQS: the future

The GridDistributedQueryService
– is an example of a higher level data integration service which utilises OGSA-DAI core services
– Assumes that GDSF, GDQS Factory and client live in different containers
– Really requires a well-defined meta-model for the physical schema of a database
  • Being partially addressed in DAIS WG
– Shows how a GDS can be both client and service
  • Service hierarchy and composition
DAIT (proposed follow-on to OGSA-DAI) would produce a robust reference implementation of the DQP components

Projects using OGSA-DAI

Industry:
– FirstDIG: business process analysis (with First Transport Group)
  • OGSA-DAI with datamining
Collaborative:
– Bridges: database integration over six geographically distributed genomics research sites (with IBM UK)
  • OGSA-DAI with DiscoveryLink
– eDIKT: porting OGSA-DAI to other platforms
  • OGSA-DAI with performance
– DEISA: linking Europe’s HPC centres
  • OGSA-DAI with distributed accounting
– MS .Net Grid: porting OGSA-DAI to the .Net framework (with Microsoft Research UK)
  • OGSA-DAI with .Net

ODD Genes

OGSA-DAI used to query gene expression data resources at GTI and HGU
– One data resource: low spatial resolution, high gene resolution
– Other resource: high spatial resolution, low gene resolution
– Query one database and use the data to find the correct data resource, then run a more detailed query and produce a visualisation
– Simple example of data integration at work
[Figure labels: Client, GDS Query (GTI), GDS Query (HGU), EPCC, Render]

Project Timeline

[Figure: timeline with axis marks Feb ’02, May ’02, Jul ’02, Sep ’02, Dec ’02, Feb ’03, May ’03, Sep ’03 and a "today" marker]
Labels on the slide:
– Phase 1 Starts
– Design Documents & Demos for DAIS WG @ GGF5
– XML + OGSA Prototype Available
– XML + OGSA Prototypes for Early Adopters
– RDB + GT2 / OGSA Prototypes Available
– GGF6 WG Papers & Prototypes
– Early Adopters Workshop @ NeSC
– Phase 2 Starts
– Ship Release 1 (Jan 15th 2003)
– OGSA-DAI Tutorial @ NeSC
– Release 1.5 (Feb 28th 2003)
– Tutorial @ GGF7
– Release 2
– Tutorial @ NeSC
– Release 2.5
– Release 3
– WS + GSI UK support (> 100 downloads)
Also marked: TP4, TP5, GT3 A1, GT3 A2, GT3 A3, GT3 A4, GT3 Beta, GT3 Final

A DAIT for the Future

DAIT (Data Access and Integration Two)
– follow on project from OGSA-DAI, funded for two years
– continue to research, prototype and productise
– release every six months, R4 in December 2003
– R4:
  • support for SQL Server and structured filesystems
  • extended DBMS management functionality (e.g. archive)
  • bulk load operations (where supported)
  • support for DFDL file access
  • triggers exposed through notification
– R5:
  • Distributed Query Processing, Distributed Transactions
  • Virtualised views across databases

Further information

The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk
The DAIS-WG site:
– http://cs.man.ac.uk/grid-db
OGSA-DAI Users Mailing list
– users@ogsadai.org.uk
– General discussion on grid data access and integration
Formal support for OGSA-DAI releases
– http://www.ogsadai.org.uk/support + support@ogsadai.org.uk
OGSA-DAI training courses
– http://www.ogsadai.org.uk/courses/
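
Sketch: building a perform document in code

The Documents slide shows a perform document written by hand. As an illustration only, the same document could be assembled programmatically before being sent to a GDS. The sketch below uses Python's standard xml.etree.ElementTree and copies the element and attribute names from the slide. The function name build_perform_document is hypothetical, XML namespaces are omitted as an assumption, and nothing here calls the OGSA-DAI client libraries or contacts a service.

```python
import xml.etree.ElementTree as ET

# SQL text taken from the example on the Documents slide.
DEFAULT_EXPRESSION = "SELECT * FROM littleblackbook WHERE id=?"


def build_perform_document(id_value, expression=DEFAULT_EXPRESSION):
    """Return a perform document as an XML string.

    Hypothetical helper: element names follow the slide example,
    namespaces are left out (an assumption), and no service is contacted.
    """
    perform = ET.Element("gridDataServicePerform")
    request = ET.SubElement(perform, "request", {"name": "myRequest"})

    # Define a parameter whose value can be referenced by later activities.
    parameter = ET.SubElement(request, "parameter", {"name": "idname"})
    value = ET.SubElement(parameter, "value", {"name": "idvalue"})
    value.text = str(id_value)

    # Statement activity: a parameterised SQL query.
    statement = ET.SubElement(
        request, "sqlQueryStatement", {"name": "myStatement"}
    )
    ET.SubElement(statement, "sqlParameter", {"position": "1", "from": "idvalue"})
    ET.SubElement(statement, "expression").text = expression
    ET.SubElement(statement, "webRowSetStream", {"name": "statementresult"})

    # Delivery activity: return the statement output in the response.
    delivery = ET.SubElement(request, "deliverToResponse", {"name": "d1"})
    ET.SubElement(delivery, "fromLocal", {"from": "statementresult"})

    return ET.tostring(perform, encoding="unicode")


if __name__ == "__main__":
    print(build_perform_document(10))
```

The id value is carried as a document parameter and bound to the "?" placeholder by position, as in the slide, rather than being pasted into the SQL text.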