OGSA-DAI: data access and integration
NERC GridGIS workshop, eSI, 1 February 2006
Neil Chue Hong, Project Manager, EPCC
N.ChueHong@epcc.ed.ac.uk
+44 131 650 5957

Overview
• The Data Deluge
  – challenges of increasing data availability
  – benefits of bringing data together
• OGSA-DAI
  – overview
  – use as a data integration base layer

Data Services: challenges to management
• Scale
  – Many sites, large collections, many uses
• Longevity
  – Research requirements outlive technical decisions
• Diversity
  – No “one size fits all” solutions will work
  – Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
  – Independently owned & managed
  – No common goals
  – No common design
  – Work hard for agreements on foundation types and ontologies
  – Autonomous decisions change data, structure, policy, …
  – Geographically distributed
• and I haven’t even mentioned security yet!

What is a data service?
• An interface to a stored collection of data
  – e.g. Google and Amazon
  – web services
• But the data could be:
  – replicated
  – shared
  – federated
  – virtual
  – incomplete
• Don’t care about the underlying representation
  – do care about the information it represents
• Adding a service layer to existing data sources can improve composability

Use Cases for Data Services
• Data Filtering:
  – single source producing large amounts of data distributed to many sites downstream
• Data Discovery:
  – many sources, many query entry points in a linked system
• Data Translation:
  – source to sink, conversion of data model / structure
• Data Federation:
  – many sources, linked to provide view as a single source
• Data Replication:
  – full or partial copies to improve throughput
• Data Integration (model aggregation):
  – e.g. integration of time-variant data, streams, files
• Data Integration (knowledge expansion):
  – forming links between databases to increase knowledge

OGSA-DAI In One Slide
• An extensible framework for data access and integration.
• Expose heterogeneous data resources to a grid through web services.
• Interact with data resources:
  – Queries and updates.
  – Data transformation / compression.
  – Data delivery.
• Customise for your project using
  – Additional Activities
  – Client Toolkit APIs
  – Data Resource handlers
• A base for higher-level services
  – federation, mining, visualisation, …

The OGSA-DAI Framework
[Diagram: layered architecture, top to bottom]
• Application, using the Client Toolkit
• OGSA-DAI service, containing the Engine
• Activities: SQLQuery, readFile, XPath, XSLT, GZip, GridFTP
• Data Resources: JDBC, XMLDB, File
• Databases: MySQL, DB2, SQL Server, Xindice, SWISS-PROT

Intermediary
• Simple intermediary
  [Diagram: Client exchanges Request & Response with OGSA-DAI, which exchanges DR messages with the Data Resource]
  – potential to accelerate development, logging, or filtering
• Persistent intermediary
  [Diagram: as above, with a Private Store attached to OGSA-DAI]
  – e.g. to allow efficient local indexing

Redirector, Coordinator, Network
• Allowing composition and decentralisation
[Diagrams: clients exchange Request & Response with OGSA-DAI services, which exchange DR messages with Data Resources DR1, DR2 and DR3; other labels: Request, Response & Data Transport; Data delivery; consumer]

Extensibility Example
[Diagram labels: OGSA-DAI service, Engine, SQLQuery, Multiple SQL, GDS, several SQL / JDBC pairs, MySQL]

Map Retrieval: Current
[Diagram labels: browser, Internet, EDINA, OGC Service, GIS, Oracle]

Map Retrieval: Grid Prototype
• Basic client to demonstrate proof of concept
[Diagram labels: EDINA, SO-OGC Client, OGC, OGSA-DAI 1, GIS, Oracle]

Map Retrieval: Security
• Exploit NGS infrastructure to provide secure access layer
[Diagram labels: EDINA, NGS Authentication, Allowed users dn, SO-OGC Portlet, OGC, ODS 1, GIS, Oracle]

Map Retrieval: Integration
• Exploit OGSA-DAI extensibility to add e.g. overlay
[Diagram labels: NGS Authentication, SO-OGC Portlet, OGC, ODS 1, ODS 2, ODS 3, SO-OGC, GIS, Oracle, JDBC, Oracle, Census, SQL/XML, Application data]

OGSA-DAI / EDINA prototyping work
• Stage 1: Using existing OGSA-DAI technology
• Stage 2: Extending OGSA-DAI
[Diagram labels: GIS Client; Input URL, Parameters; Image/XML File; OGSA-DAI service; DeliverFromURL; GIS Activities; HTTP Data Resource; HTTP Request; HTTP Response; WMS Server]

Distributed Query Processing
• Higher level services building on OGSA-DAI
  – specialised metadata extraction
• Execute queries in parallel over multiple data resources
• Queries mapped to algebraic expressions for evaluation
• Parallelism represented by partitioning queries
  – Use exchange operators
• Equality based joins in current release
  – supported types: long, integer, string, double and float
[Diagram: query plan with operators reduce, op_call (Blast), exchange, hash_join (proteinId), table_scan (protein) and table_scan termID=S92 (proteinTerm); partition labels 1, 2 and 3,4]

DQP architecture
[Diagram labels: Query (SQL & OQL); OGSA-DAI activity; Co-ordinator; three Evaluators; WS-I only; Using client toolkit; four OGSA-DAI services; All interfaces that are supported by toolkit]

Contributing to OGSA-DAI
• Additional functionality:
  – Provide activities which implement specific functionality
  – Provide extra client functionality
  – Provide different security mechanisms
  – Provide higher level components and applications
• Different levels of contributions
  – Based on OGSA-DAI?
  – Works with OGSA-DAI?
  – Part of OGSA-DAI?
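
Sketch: partitioned equality join
The Distributed Query Processing slide describes equality-based joins evaluated in parallel, with exchange operators splitting the work into partitions. The Python sketch below illustrates that general idea only. It is not the DQP implementation: the function names (exchange, hash_join, partitioned_join), the in-memory rows and the sample protein data are all assumptions made for illustration, and the partitions are processed one after another rather than on separate evaluators.

```python
from collections import defaultdict


def exchange(rows, key, n_partitions):
    """Hypothetical exchange operator: route rows to partitions by join key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        # Rows with equal keys always land in the same partition.
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Equality join: build a hash table on `left`, then probe it with `right`."""
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    joined = []
    for row in right:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def partitioned_join(left, right, key, n_partitions=2):
    """Join each pair of partitions independently and concatenate the results."""
    left_parts = exchange(left, key, n_partitions)
    right_parts = exchange(right, key, n_partitions)
    result = []
    # In a distributed setting each pair could go to a different evaluator;
    # here they are simply processed in turn.
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(hash_join(left_part, right_part, key))
    return result


# Illustrative data only, loosely following the table names on the slide.
proteins = [
    {"proteinId": "P1", "sequence": "MKT"},
    {"proteinId": "P2", "sequence": "GAV"},
]
protein_terms = [
    {"proteinId": "P1", "termID": "S92"},
    {"proteinId": "P2", "termID": "S10"},
]

# table_scan with the termID=S92 filter, then the partitioned join on proteinId.
selected_terms = [t for t in protein_terms if t["termID"] == "S92"]
result = partitioned_join(proteins, selected_terms, "proteinId")
```

Because both inputs are partitioned on the same key with the same hash function, matching rows always meet in the same partition, so the partitioned result contains the same rows as a single unpartitioned hash join.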

In the near future
• A new version of the OGSA-DAI Engine
  – should look mostly the same externally
  – better support for concurrency, sessions and monitoring
• Implementing new versions of specifications
  – DAIS Specifications
• Key things that we will be addressing:
  – Performance
  – A Security Model which can be applied across platforms
  – Full Transactions framework, distributed transactions
  – More data integration facilities
  – Better abstraction over DBMS variation
• Application centric queries
  – collaborating with other projects
• Research projects looking at:
  – schema mapping
  – extended data resources

Associated Meetings and Workshops
• DIALOGUE Workshops (http://www.datagrids.org)
  – Data Integration Applications: Linking Organisations to Gain Understanding and Experience
  – Bringing together Data Integration middleware and application providers with users
  – Next one at NeSC: 9-10th February 2006
  – http://www.nesc.ac.uk/esi/events/636/
• Next Generation Distributed Data Management (HPDC15, Paris)
  – http://www.isi.edu/~annc/distributedDataWorkshop.html
• Data Management on Grids (VLDB’06, Seoul)

Conclusions
• The benefits of trying to integrate data are hindered by challenges such as heterogeneity, scale and distribution
• A common data service layer should make data integration easier
• OGSA-DAI provides an extensible, data service based framework which makes it easier to implement data integration
• GIS data is amenable to integration using data services

Further information
• The OGSA-DAI Project Site:
  – http://www.ogsadai.org.uk
• The DAIS-WG site:
  – http://forge.gridforum.org/projects/dais-wg/
• OGSA-DAI Users Mailing list
  – users@ogsadai.org.uk
  – General discussion on grid DAI matters
• Formal support for OGSA-DAI releases
  – http://bugs.ogsadai.org.uk/
• OGSA-DAI training courses
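
Sketch: chaining activities
The "OGSA-DAI In One Slide" and framework slides describe a request as a chain of activities: query a data resource, transform or compress the result, then deliver it. The Python sketch below mimics that pattern with the standard library only. It does not use or reproduce the OGSA-DAI API: the function names (sql_query, rows_to_xml, gzip_compress, run_request), the in-memory SQLite database standing in for a data resource, and the sample station table are all assumptions made for illustration.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET


def sql_query(connection, statement):
    """Query step: run a statement and return rows as dictionaries."""
    cursor = connection.execute(statement)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_to_xml(rows):
    """Transformation step: serialise the rows as a small XML document (bytes)."""
    root = ET.Element("rows")
    for row in rows:
        element = ET.SubElement(root, "row")
        for name, value in row.items():
            ET.SubElement(element, name).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def gzip_compress(data):
    """Compression step: gzip the serialised bytes before delivery."""
    return gzip.compress(data)


def run_request(connection, statement, activities):
    """Feed the query output through each later activity in order."""
    output = sql_query(connection, statement)
    for activity in activities:
        output = activity(output)
    return output


# Hypothetical data resource: an in-memory SQLite table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE station (name TEXT, easting INTEGER)")
connection.executemany(
    "INSERT INTO station VALUES (?, ?)", [("A", 100), ("B", 200)]
)

# Query, transform to XML, compress; the returned bytes are what would be delivered.
delivered = run_request(
    connection,
    "SELECT name, easting FROM station ORDER BY name",
    [rows_to_xml, gzip_compress],
)
```

Keeping each step as an independent function with a single input and output means a new step, for example a different transformation, can be added to the list without changing the others, which is the extensibility point the slides emphasise.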