Grid-based Database Integration in AIST
Isao KOJIMA, Said Mirza Pahlevi
Data Intensive Computing Team, GTRC, AIST
{kojima,mirza}@ni.aist.go.jp
National Institute of Advanced Industrial Science and Technology

Overview
  - Background for Database Integration
    - Distributed and Heterogeneous
  - Target
    - Database Discovery
    - Multi-level application-specific View (under Autonomous/Dynamic environment)
  - Approach
    - Bottom-up
    - We just started; current results are not so large
  - GT3.0/OGSA-DAI based tools
    - Bring external web databases into Grid environment
    - Query conversion service to integrate different XML schema
  - Demo

Background
  - A.I.S.T. = National Research Institute
  - Research Information Databases Online on the Web
    - Bio/Life science
    - Geo/Earth science
    - Chemical/Material
    - Patent/Bibliographic

AIST & Tsukuba Area
  - In AIST: nearly 100 online DBs (URLs)
  - Tsukuba Science City
    - 96 research institutes
    - 52 public/governmental research labs
    - >880 URLs
  - Number is not so large, but the problem is the same (heterogeneous, distributed)

Current Status
  - Each database is separated/distributed
  - Can share some information
    - Chemical Structure, CAS Registry No., Latitude, Longitude
    - Metadata Structure (Dublin Core, GILS, MARC, ...)
  - Integration/Interconnection is useful
    - New research aspects/views
    - Multi Integration View (Organization, Area, Research Domain)
  - Most of them support Form-query only
    - Limitation of Web interfaces
  - Need to combine with computing
    - Data Mining
    - Distributed Computation

Target
  - Integrate existing database/computing resources within Grid framework
    - OGSA (OGSI), OGSA-DAI(S) framework
  - Provide Database Discovery function
    - Advanced Information Service
    - Autonomous Resource Management
  - Provide Application Specific Database View
    - Schema Integration, Virtualization, Ontology
    - Database Autonomy/Dynamism

Bottom-Up Approach for Research & Deployment
  [Figure: layer diagram. Labels: Existing external web databases (EC site, Search Portal, etc.); Our Application Field (Scientific Data); Remote Access (Web-Service (WS-XX), GGF-DAIS, OGSA-DAI); Distributed Query Processing; Transaction; Workflow; Practical Bottom-Up Approach; Our Target (Advanced): Database Discovery, Application Specific View, Database Autonomy.]

Result - Summary
  - Problem: Bring Web Databases into OGSA Environment
    - How to access external web databases by using the OGSA/DAIS framework
    - Approach: Grid Proxy/Mediator database service to access web databases
      - Provide uniform OGSA-DAI SQL access to external web databases (including join)
  - Problem: Integrate within OGSA/DAIS framework
    - How to integrate (bottom-up) different database schema
    - Approach: Query Conversion Service based on XQuery and XML schema
      - Provide integrated view of multiple databases with different XML schema
      - CrossSearch over multiple databases (not join)

Grid-based Integration Overview
  [Figure: layer diagram. Labels: Our Target; Our Application Field (Scientific Data); Workflow; Transaction; Distributed Query Processing; Remote Access (GGF-DAIS, OGSA-DAI); arrows "apply" and "Feedback"; Query Conversion Grid Service to make CrossSearch; Proxy/Mediator Grid Database Service to access existing external web DBs; Existing external web databases (EC site, Search Portal, etc.).]

1: Grid Proxy/Mediator database service for external web databases
  [Figure: inside the OGSA Environment, a Mediator gives OGSA-DAI compliant database access (SQL, join) over Local/Remote Databases and Proxy Databases. Wrappers connect the Proxy Databases to the external Web Databases: DBLP, CiteSeer, Delphion (login).]

Architecture
  [Figure: OGSA-DAI based system. Labels inside the OGSA-based Grid Environment: SQL; Invoke; Mediator (Grid Database Service); Proxy Database (Grid Database Service) with Proxy relations and Management Relations; Resource Management; Resource DB (wrappers, URLs, data formats); Wrapper (Load & Exec). Labels across the Internet: Site specific query; XML; HTML; data services outside the Grid (e-Commerce site, Search portal, web databases).]

Features
  - Globus 3.0/OGSA-DAI based
    - Compatible with OGSA-DAI RDB version
    - Platform independent (DBMS, Wrapper)
    - Combined with Wrapper generator Tool (XFetch, WebL..)
  - Management functions are Grid services
    - Wrapper registration/deployment, External database definition
  - Proxy Relation within the Grid
    - Works as a cache for external webDB
    - SQL condition is converted to webDB query statement (not SQL function)
    - Handled as a table, not as a SQL function (it is possible)
    - Simple query optimization utilizing proxy relations
    - Approximate query => exact processing is done on proxy

2: Query/Schema Conversion Service
  - Provide Schema/Query Conversion Function between different XML schema
    - User can define multiple resources with XML schema
      - Databases for DB resources
    - User can define relationship/conversion between schema (simple kind of Ontology)
    - XQuery-to-XQuery conversion service based on this info
  [Figure: labels: XQuery; Conversion Service; Converted XQueries; Database Resources; XML Schema Management Service; Resource databases.]

Application Prototype
  - Geographic Metadata Query System
    - Multiple XML databases
      - Dublin Core
      - Dublin Core + Longitude/Latitude
      - GILS
      - Application Specific
  [Figure: labels: DC XQuery; Distributed Metadata Query System; Converted XQueries; DC, DC+, GILS, JMP.]
  - Current version is not OGSI based (in development)

Summary & Directions
  - 2 prototype services
    - OGSA-DAI compliant grid service to bring web databases into the grid
      - Lesson: Need for dynamic scheduling to cope with the uncertainty of external web databases
    - Schema/Query conversion service to handle multiple XML schema
      - Lesson: Need for a concise set to handle ontology
  - Directions
    - Advanced Grid Database Discovery Service
    - Active/Autonomous Functions for DBMS
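
Illustration: proxy relation query flow
  The proxy relation items on the Features slide (cache for the external webDB, SQL condition converted to a webDB query, approximate query followed by exact processing on the proxy) can be read as a two-step flow. The sketch below is only an illustration under stated assumptions, not the AIST implementation: wrapper_search, query_via_proxy, the proxy_papers table, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the proxy database.

```python
import sqlite3

# Hypothetical stand-in for an external web database that only offers a
# keyword search form. The rows are made-up sample data.
WEB_SITE_ROWS = [
    ("Grid Data Access", "Author A", 2003),
    ("Grid Scheduling", "Author B", 2001),
    ("XML Query Conversion", "Author C", 2003),
    ("Data Grid Basics", "Author D", 1999),
]


def wrapper_search(keyword):
    """Simulated wrapper: send a keyword form query, return the matching rows."""
    kw = keyword.lower()
    return [row for row in WEB_SITE_ROWS if kw in row[0].lower()]


def query_via_proxy(conn, keyword, min_year):
    """Answer 'title contains keyword AND year >= min_year' through a proxy relation."""
    # Step 1 (approximate): only the keyword condition can be expressed as a
    # form query, so the site returns a superset of the wanted rows.
    rows = wrapper_search(keyword)

    # Step 2 (cache): load the fetched rows into the proxy relation.
    # The UNIQUE constraint keeps repeated queries from duplicating rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy_papers ("
        "title TEXT, author TEXT, year INTEGER, "
        "UNIQUE (title, author, year))"
    )
    conn.executemany("INSERT OR IGNORE INTO proxy_papers VALUES (?, ?, ?)", rows)

    # Step 3 (exact): evaluate the full SQL condition on the proxy relation.
    cur = conn.execute(
        "SELECT title, author, year FROM proxy_papers "
        "WHERE title LIKE ? AND year >= ? ORDER BY title",
        (f"%{keyword}%", min_year),
    )
    return cur.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in query_via_proxy(connection, "grid", 2000):
        print(row)
```

  In this sketch the year condition never reaches the simulated web site; it is applied only in step 3, which mirrors the "approximate query => exact processing is done on proxy" item.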