OGSA-DAI
Data Access and Integration for the Grid
Neil Chue Hong
N.ChueHong@epcc.ed.ac.uk
http://www.ogsadai.org.uk

Overview
Motivation
Goals
Partners
Features
Projects
Further information
Overview and demo of FirstDIG/INWA

OGSA-DAI Motivation
Entering an age of data
– Data Explosion
  • CERN: LHC will generate 1 GB/s = 10 PB/y
  • VLBA (NRAO) generates 1 GB/s today
  • Pixar generates 100 TB/movie
– Storage getting cheaper
Data stored in many different ways
– Data resources
  • Relational databases
  • XML databases
  • Flat files
Need ways to facilitate
– Data discovery
– Data access
– Data integration
Empower e-Business and e-Science
– The Grid is a vehicle for achieving this

Goals for OGSA-DAI
Aim to deliver application mechanisms that:
– Meet the data requirements of Grid applications
  • Functionality, performance and reliability
  • Reduce development cost of data-centric Grid applications
  • Provide consistent interfaces to data resources
– Are acceptable to and supportable by database providers
  • Trustable, imposed demand is acceptable, etc.
  • Provide a standard framework that satisfies standard requirements
A base for developing higher-level services
– Data federation
– Distributed query processing
– Data mining
– Data visualisation

Integration Scenario
A patient moves hospital
Data A, Data B and Data C are combined into an amalgamated patient record
Data resources shown: DB2, Oracle, CSV file
– A: (PID, name, address, DOB)
– B: (PID, first_contact)
– C: (PID, first_name, last_name, address, first_contact, DOB)

Why OGSA-DAI?
Why use OGSA-DAI over JDBC?
– Language independence at the client end
  • Do not need to use Java
– Platform independence
  • Do not have to worry about connection technology and drivers
– Can handle XML and file resources
– Can embed additional functionality at the service end
  • Transformations, compression, third-party delivery
  • Avoiding unnecessary data movement
– Provision of metadata is powerful
– Usefulness of the Registry for service discovery
  • Dynamic service binding process
– The quickest way to make data accessible on the Grid
  • Installation and configuration of OGSA-DAI is fast and straightforward

Project Partners
Powered by ….
Funded by the Grid Core Programme
OGSA-DAI
– £3 million, 18 months, from Feb 2002
– Three major releases, three interim releases
DAIT (DAI-Two)
– Keep the OGSA-DAI brand name
– £1.5 million, 24 months, from Oct 2003
– Four major releases
GGF DAIS WG
– Strong involvement. Standardise the interfaces
– OGSA-DAI to be a reference implementation

Core features
An extensible framework for building applications
– Supports relational, XML and some file resources
  • MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV, EMBL
– Supports various delivery options
  • SOAP, FTP, GridFTP, HTTP, files, email, inter-service
– Supports various transforms
  • XSLT, ZIP, GZip
– Supports message-level security using X509 certificates
– Client Toolkit library for application developers
– Comprehensive documentation and tutorials
Third production release is coming in November
– OGSI/GT3 based
– Also previews of WS-I and WS-RF/GT4 releases

Activities are the drivers
Express a task to be performed by a GDS (Grid Data Service)
Three broad classes of activities:
– Statement
– Transformations
– Delivery
Extensible:
– Easy to add new functionality
– Does not require modification to the service interface
– Extensions operate within the OGSA-DAI framework
Functionality:
– Implemented at the service
– Works where the data is (no need to move data
back)

Client Toolkit
Why? Nobody wants to write XML!
A programming API which makes writing applications easier
– Now: Java
– Next: Perl, C, C#?, ML!?

    // Create a query activity and wrap it in a request
    SQLQuery query = new SQLQuery(SQLQueryString);
    ActivityRequest request = new ActivityRequest();
    request.addActivity(query);

    // Perform the query on the Grid Data Service (gds)
    Response response = gds.perform(request);

    // Display the result
    ResultSet rs = query.getResultSet();
    displayResultSet(rs, 1);

e-Digital MammOgraphy National Database
Built a prototype of a national database of mammographic images in support of the UK Breast Screening Programme
Employ Grid technologies to facilitate this process

[Figure: architecture diagram. Site labels: CHU, KCL, UED, UCL. Component labels include Data, Training and Load applications, Core & Training API, Core Services, Training Services, OGSA-DAI, Content Manager, DB2, DB2 Federation, Training Application, Database, Files]

GeneGrid
Grid Based Framework for Bioinformatics
– Virtual Bioinformatics Laboratory
– Integration of Existing Technologies & Data Sets
– Gene Study in Silico
– Develop Specialist Data Sets
– Grid Services for Commercial or 3rd Party Use
Data resources as XML collections (Xindice), flat files and relational databases (MySQL)
– OGSA-DAI plus custom extensions
– Beta testers for file-based activities
http://www.qub.ac.uk/escience/projects/genegrid/

Distributed Query Processing
Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
– Use exchange operators
Prototype available from:
– http://www.ogsadai.org.uk
[Figure: partitioned query plan. Labels: 3,4 op_call (Blast); hash_join (proteinId); exchange and reduce operators]
[Figure labels: 1 table_scan (protein); 2 table_scan termID=S92 (proteinTerm)]

GridMiner
Test application area: medical
– Traumatic brain injury treatment
– Predicting the outcome of seriously ill patients
– Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
Target:
– Provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
– Building on and extending OGSA-DAI
http://www.gridminer.org/

GridMiner Scenario
Heterogeneities:
– Name in A is "First Last" (as the target format)
– Name in C has to be combined
Distribution:
– 3 data sources

Future work
Architecture review
– Better concurrency model
– Better AAA framework
– Better definition of extensibility points
  • Security, activities, dynamic configuration, mobile code, …
Improved support for
– WS Security profiles
– Stored procedures
– Data transport
– XQuery
– Database-specific datatypes and SQL
Additionally
– JDBC and ODBC driver for OGSA-DAI
– Contribution process

Further information
The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk
The DAIS-WG site:
– http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list
– users@ogsadai.org.uk
– General discussion on grid DAI matters
Formal support for OGSA-DAI releases
– http://www.ogsadai.org.uk/support
– support@ogsadai.org.uk
OGSA-DAI training courses

Project Membership
[Figure: project organisation chart. Names: Malcolm, Kostas, Norman, Paul, Neil, Charaka, Mike, Ally, Mario, Amy, Tom, Andy, Simon, Dave, Neil, Patrick. Roles and teams: Principal Investigators, Research Team, Programme Management Board Chair, Technical Review Board Chair, Project Manager, EPCC Team, IBM Development Team, IBM Dissemination Team]

The End
Questions?