Grid-based Database Integration in AIST
Isao KOJIMA, Said Mirza Pahlevi
Data Intensive Computing Team, GTRC, AIST
{kojima,mirza}@ni.aist.go.jp
National Institute of Advanced Industrial Science and Technology

Overview
- Background for Database Integration
  - Distributed and heterogeneous
- Target
  - Database Discovery
  - Multi-level application-specific view (under an autonomous/dynamic environment)
- Approach
  - Bottom-up
  - We have just started; current results are still modest
  - GT3.0/OGSA-DAI based tools
    - Bring external web databases into the Grid environment
    - Query conversion service to integrate different XML schemas
- Demo

Background
- AIST is a national research institute
- Research information databases are online on the Web
  - Bio/Life science
  - Geo/Earth science
  - Chemical/Material
  - Patent/Bibliographic

AIST & Tsukuba Area
- In AIST: nearly 100 online DBs (URLs)
- Tsukuba Science City
  - 96 research institutes
  - 52 public/governmental research labs
  - More than 880 URLs
- The number is not very large, but the problem is the same (heterogeneous, distributed)

Current Status
- Each database is separate and distributed
- They can share some information
  - Chemical Structure, CAS Registry No., Latitude, Longitude
  - Metadata structure (Dublin Core, GILS, MARC, ...)
- Integration/interconnection is useful
  - New research aspects/views
  - Multi Integration View (Organization, Area, Research Domain)
- Most of them support form queries only
  - Limitation of Web interfaces
- Need to combine with computing
  - Data Mining
  - Distributed Computation

Target
- Integrate existing database/computing resources within the Grid framework
  - OGSA (OGSI), OGSA-DAI(S) framework
- Provide a Database Discovery function
  - Advanced Information Service
  - Autonomous Resource Management
- Provide an Application Specific Database View
  - Schema Integration, Virtualization, Ontology
  - Database Autonomy/Dynamism

Bottom-Up Approach for Research & Deployment
[Figure: layered diagram. Labels: Workflow; Our Target (Advanced Database Discovery, Application Specific View, Database Autonomy); Transaction; Distributed Query Processing; Practical Bottom-Up Approach; Remote Access; Web-Service (WS-XX); GGF-DAIS, OGSA-DAI; Our Application Field (Scientific Data); Existing external web databases (EC site, Search Portal, etc.)]

Result & Summary
- Problem: Bring web databases into the OGSA environment
  - How to access external web databases by using the OGSA/DAIS framework
  - Approach: Grid Proxy/Mediator database service to access web databases
    - Provide uniform OGSA-DAI SQL access to external web databases (including join)
    - Integrate within the OGSA/DAIS framework
- Problem: How to integrate (bottom-up) different database schemas
  - Approach: Query Conversion Service based on XQuery and XML schema
    - Provide an integrated view of multiple databases with different XML schemas
    - CrossSearch over multiple databases (not join)

Grid-based Integration Overview
[Figure: layered diagram. Labels: Our Target; Our Application Field (Scientific Data); Workflow; Transaction; Feedback; Distributed Query Processing; Remote Access; apply; GGF-DAIS, OGSA-DAI; Query Conversion Grid Service to make CrossSearch; Proxy/Mediator Grid Database Service to access existing external WebDBs; Existing external web databases (EC site, Search Portal, etc.)]

1: Grid Proxy/Mediator database service for external web databases
[Figure: OGSA Environment with OGSA-DAI compliant database access (SQL) to local/remote databases and to Mediator/Proxy databases; wrappers connect to the web databases DBLP, siteseer and Delphion (login); joins are performed across DBLP and siteseer results]

Architecture
[Figure: OGSA-DAI based system inside an OGSA-based Grid environment. Labels: Grid Database Service (SQL); Invoke Mediator; Mediator (SQL); Wrapper (site specific query; XML, HTML); Proxy Database with proxy relations; Resource Management Grid Database Service (SQL) with management relations; Wrapper Load & Exec; Resource DB (wrappers, URLs, data formats); Internet; data services outside the Grid: e-Commerce site, search portal, web databases]

Implementation
[Figure: components GDSF_WebDB, WebDB.jar, Client, GDS, ProxyDBi, ProxyDB, Mediator (SQL processing), proxy relations, management relations, database connection, Internet connection, wrappers (dynamic loading), web databases. Numbered steps shown in the figure:]
0. create
1. perform (Client, SQLQueryStatementActivity)
2. invoke(SQL_statement, username, databasename)
3. proxy relation & column names
4. authentication & field mapping
5. mapping information
6. Boolean query
7. search result
8. XML format
9. SQL INSERT (proxy relations)
10. SQL statement (ProxyDB)

Features
- Globus 3.0/OGSA-DAI based
  - Compatible with the OGSA-DAI RDB version
  - Platform independent (DBMS, wrapper)
  - Combined with wrapper generator tools (XFetch, WebL, ...)
- Management functions are Grid services
  - Wrapper registration/deployment, external database definition
- Proxy relation within the Grid
  - Works as a cache for the external webDB
  - The SQL condition is converted to a webDB query statement (not a SQL function)
  - Handled as a table, not as a SQL function (though that is possible)
  - Simple query optimization utilizing proxy relations
  - Approximate query => exact processing is done on the proxy

2: Query/Schema Conversion Service
- Provides a schema/query conversion function between different XML schemas
- Users can define multiple resources with XML schemas
  - Databases for DB resources
- Users can define relationships/conversions between schemas (a simple kind of ontology)
- XQuery-XQuery conversion service based on this information
[Figure: Conversion Service takes an XQuery and produces converted XQueries for the database resources; XML Schema Management Service supplies schema and location from the resource databases]

Application Prototype
- Geographic Metadata Query System
- Multiple XML databases
  - Dublin Core
  - Dublin Core + Longitude/Latitude
  - GILS
[Figure: Distributed Metadata Query System; an application-specific DC XQuery is converted into XQueries for DC+, GILS and JMP]
- The current version is not OGSI based (in development)

Summary & Directions
- 2 prototype services
  - OGSA-DAI compliant grid service to bring web databases into the grid
    - Lesson: dynamic scheduling is needed to cope with the uncertainty of the external web
  - Schema/Query conversion service to handle multiple XML schemas
    - Lesson: a concise set is needed to handle ontology
- Directions
  - Advanced Grid Database Discovery Service
  - Active/Autonomous Functions for DBMS

Demo Video
- Grid Proxy Database Demo
  - OGSA-DAI compliant
  - Extensible wrappers
  - Performs joins for multiple sites
- Query Conversion service
  - Manages relationships between multiple XML schemas
  - Geographic Metadata Query System
    - Based on XQuery
    - CrossSearch for multiple databases

Back Slides

Discussion(1)
- Share information with other grid DB R&D activities
  - Data integration problem for scientific databases
  - Our interest includes functions to provide an application-specific integrated view for different information grid resources. Ontology?
- Software & Specification: need for roadmap & timelines (GGF/DAIS, OGSA-DAI)
  - High-level functions for database integration projects (Transactions, Views, ...)

Discussion(2)
Research & Development Direction
- Our current interests/directions include:
  - Advanced resource discovery services
    - How to find appropriate databases on the grid
    - How to combine/convert them easily for a specific application
  - Active/Autonomous functions for DBMSs
    - Active databases re-evaluated
    - Example:
      - When a failure occurs (Event)
      - Evaluate the resource status (Condition)
      - Decide whether to wait another 10 sec or go to an alternative resource (Action)

Comparison with DB2 Information Integrator
Policy
- Our system is not an information integrator
  - The focus is to bring external resources into the grid easily
  - Integration is done within the DAIS framework
    - No need for Oracle/DB2 integration
  - Schema/Query conversion is done by our other system
Aim
- Our system is meant to support web databases
  - Form-like websites
  - The SQL condition clause is converted to an AND/OR form query
    - Not an external function embedded in the SQL statement
  - We could not provide a (general purpose) predefined wrapper
    - How to construct wrappers in an easy manner is important

Comparison with DB2 II (cont'd)
Architecture
- Our system is not a broker approach
  - Mediator/Wrapper
    - Each SQL request is converted to an 'approximate' request for the website
  - Proxy database
    - Provides the exact answer for each SQL request
    - Provides a cache database on the grid
- The target is web resources/web services
  - Network transfer is a dominant factor
  - Simple query optimization can work well

XQuery-XQuery conversion
- Currently supports simple types
  1. Name: Author = 著者 ("author" in Japanese)
  2. Data format conversion: 2002/01/13 <-> 01/13/02 <-> 13th,Jan,2002
  3. Combination/Abstraction: Name = Last | Middle | First = Last | First
- Extensible (XSLT-like attachment)
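
The three simple conversion types on the last slide could be sketched roughly as follows. This is a minimal Python illustration of the value-level rules only, not the service's actual implementation, and it does not rewrite XQuery text. The names NAME_MAP, convert_name, parse_date, ordinal_date and combine_name are hypothetical, and reading `|` as space-separated concatenation is an assumption.

```python
from datetime import date, datetime

# Type 1: element-name equivalence between schemas (hypothetical rule table).
NAME_MAP = {"Author": "著者"}

# Fixed month abbreviations so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def convert_name(element: str) -> str:
    """Map an element name to its counterpart; unknown names pass through."""
    return NAME_MAP.get(element, element)


def parse_date(text: str, fmt: str) -> date:
    """Type 2: parse a date written in one schema's format."""
    return datetime.strptime(text, fmt).date()


def ordinal_date(d: date) -> str:
    """Type 2: render a date in the '13th,Jan,2002' style from the slide."""
    if 11 <= d.day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(d.day % 10, "th")
    return f"{d.day}{suffix},{MONTHS[d.month - 1]},{d.year}"


def combine_name(parts: dict, order=("Last", "Middle", "First")) -> str:
    """Type 3: build one Name value from whichever parts a schema provides."""
    return " ".join(parts[key] for key in order if parts.get(key))


if __name__ == "__main__":
    d = parse_date("2002/01/13", "%Y/%m/%d")
    print(convert_name("Author"))
    print(d.strftime("%m/%d/%y"))
    print(ordinal_date(d))
    print(combine_name({"Last": "Kojima", "First": "Isao"}))
```

Keeping each rule as a small, separate function mirrors the "extensible" point on the slide: a new rule type would be added alongside the others rather than changing them.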