Grid-based Database Integration in AIST

Isao KOJIMA
Said Mirza Pahlevi
Data Intensive Computing Team
GTRC,AIST
{kojima,mirza}@ni.aist.go.jp
National Institute of Advanced Industrial Science and Technology
Overview
- Background for database integration
  - Distributed and heterogeneous
- Target
  - Database discovery
  - Multi-level, application-specific views (under an autonomous/dynamic environment)
- Approach
  - Bottom-up
    - We have just started; current results are still limited
  - GT3.0/OGSA-DAI based tools
    - Bring external web databases into the Grid environment
    - Query conversion service to integrate different XML schemas
- Demo
Background
- AIST = national research institute
- Research information databases
  - Online on the Web
    - Bio/life science
    - Geo/earth science
    - Chemical/material
    - Patent/bibliographic
AIST & Tsukuba Area
- In AIST
  - Nearly 100 online databases (URLs)
- Tsukuba Science City
  - 96 research institutes
  - 52 public/governmental research labs
  - More than 880 URLs
- The numbers are not so large, but the problem is the same (heterogeneous, distributed)
Current Status
- Each database is separate and distributed
  - They can share some information
    - Chemical structure, CAS Registry Number
    - Latitude, longitude
    - Metadata structure (Dublin Core, GILS, MARC, ...)
  - Integration/interconnection is useful
    - New research aspects/views
    - Multiple integration views (organization, area, research domain)
- Most of them support form queries only
  - Limitation of web interfaces
  - Need to combine with computing
    - Data mining
    - Distributed computation
Target
- Integrate existing database/computing resources within the Grid framework
  - OGSA (OGSI), OGSA-DAI(S) framework
- Provide a database discovery function
  - Advanced information service
  - Autonomous resource management
- Provide application-specific database views
  - Schema integration, virtualization, ontology
  - Database autonomy/dynamism
Bottom-Up Approach for Research & Deployment
[Figure: layered diagram of the bottom-up approach. Layer labels: existing external web databases (EC sites, search portals, etc.); remote access (Web services (WS-XX), GGF-DAIS, OGSA-DAI); distributed query processing; transaction; workflow; our application field (scientific data). Labeled as our targets: advanced database discovery, application-specific views, database autonomy. The bottom-up approach is labeled "practical".]
Summary - the Problems
- Bring web databases into the OGSA environment
  - How to access external web databases by using the OGSA-DAI framework
- Integrate within the OGSA/DAIS framework
  - How to integrate (bottom-up) different database schemas

Result & Approach
- Grid Proxy/Mediator database service to access web databases
  - Provides uniform OGSA/DAIS SQL access to external web databases (including joins)
- Query Conversion Service based on XQuery and XML Schema
  - Provides an integrated view of multiple databases with different XML schemas
  - CrossSearch over multiple databases (not a join)
Grid-based Integration Overview
[Figure: the layered diagram from the bottom-up approach (our application field (scientific data), workflow, transaction, distributed query processing, remote access via GGF-DAIS/OGSA-DAI, existing external web databases such as EC sites and search portals), annotated with the two prototype services: a Query Conversion Grid Service to make CrossSearch, and a Proxy/Mediator Grid Database Service to access existing external web databases. The labels "Feedback" and "apply" also appear, next to "Our Target".]
1: Grid Proxy/Mediator Database Service for External Web Databases
[Figure: within the OGSA environment, OGSA-DAI compliant database access sends SQL to local/remote databases and, through a mediator, to proxy databases. Wrappers connect the proxy databases to external web databases (DBLP, CiteSeer, and Delphion, which requires a login). Results from different sites can be joined.]
Architecture
[Figure: OGSA-DAI based system. Inside the OGSA-based Grid environment, a Grid Database Service receives SQL and invokes the mediator. The mediator loads and executes wrappers, which send site-specific queries over the Internet to data services outside the Grid (e-commerce sites, search portals, web databases) and receive XML or HTML. Results are stored as proxy relations in the proxy database. A second Grid Database Service provides resource management through SQL on management relations in the resource DB (wrappers, URLs, data formats).]
Implementation
[Figure: implementation of GDSF_WebDB (WebDB.jar). Components: Client, GDS with SQLQueryStatementActivity, Mediator (SQL processing), ProxyDBi, ProxyDB holding the proxy relations, management relations (reached over a database connection), dynamically loaded wrappers (reached over an Internet connection), and the web databases. Numbered labels in the figure:
 0. create
 1. perform
 2. invoke(SQL_statement, username, databasename)
 3. proxy relation & column names
 4. authentication & field mapping
 5. mapping information
 6. Boolean query
 7. search result
 8. XML format
 9. SQL INSERT
10. SQL statement]
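The numbered labels in the figure (2. invoke through 10. SQL statement) suggest a flow that can be sketched roughly as follows. This is a minimal illustration, not the GDSF_WebDB code: the `MANAGEMENT` dictionary, the `sample_site_wrapper` function, the `papers` relation, and the canned XML are all assumptions made up for the example, and SQLite stands in for the proxy database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical management relation: for each proxy relation, the wrapper to
# load and the mapping from SQL column names to web form fields.
MANAGEMENT = {
    "papers": {"wrapper": "sample_site",
               "fields": {"author": "q_author", "title": "q_title"}},
}

def sample_site_wrapper(form_query):
    """Stand-in for a site wrapper. A real one would send form_query to the
    website; this one returns canned XML so the sketch runs offline."""
    return ("<results>"
            "<row><author>A. Author</author><title>Grid Databases</title><year>2003</year></row>"
            "<row><author>A. Author</author><title>Web Wrappers</title><year>2001</year></row>"
            "</results>")

WRAPPERS = {"sample_site": sample_site_wrapper}

def mediate(conn, relation, conditions, sql):
    """Answer `sql` over `relation`; `conditions` holds its equality terms."""
    info = MANAGEMENT[relation]
    # Approximate Boolean (AND) query: keep only terms the web form supports.
    form_query = {info["fields"][col]: val
                  for col, val in conditions.items() if col in info["fields"]}
    xml_text = WRAPPERS[info["wrapper"]](form_query)
    rows = [(r.findtext("author"), r.findtext("title"), int(r.findtext("year")))
            for r in ET.fromstring(xml_text).iter("row")]
    # Load the search result into the proxy relation (SQL INSERT).
    conn.execute(f"CREATE TABLE IF NOT EXISTS {relation} "
                 "(author TEXT, title TEXT, year INTEGER)")
    conn.executemany(f"INSERT INTO {relation} VALUES (?, ?, ?)", rows)
    # Exact processing: run the original SQL statement on the proxy database.
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
result = mediate(
    conn, "papers", {"author": "A. Author"},
    "SELECT title FROM papers WHERE author = 'A. Author' AND year > 2002")
print(result)
```

The `year > 2002` term never reaches the website in this sketch; it is applied only when the SQL statement runs on the proxy relation, which is one way to read "approximate" request followed by exact processing.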
Features
- Globus 3.0/OGSA-DAI based
  - Compatible with the OGSA-DAI RDB version
- Platform independent (DBMS, wrapper)
  - Can be combined with wrapper generator tools (XFetch, WebL, ..)
- Management functions are Grid services
  - Wrapper registration/deployment, external database definition
- Proxy relations within the Grid
  - Work as a cache for external web databases
- SQL conditions are converted to web database query statements (not SQL functions)
  - The external database is handled as a table, not as an SQL function (although that is possible)
  - Simple query optimization utilizing proxy relations
  - An approximate query goes to the website; exact processing is done on the proxy
2: Query/Schema Conversion Service
- Provides a schema/query conversion function between different XML schemas
  - Users can define multiple resources with XML schemas
  - Databases for DB resources
  - Users can define relationships/conversions between schemas (a simple kind of ontology)
  - XQuery-to-XQuery conversion service based on this information
[Figure: the Conversion Service takes an XQuery and produces converted XQueries for the database resources, using schema and location information from the XML Schema Management Service and its resource databases.]
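As a rough illustration of XQuery-to-XQuery conversion driven by user-defined relationships between schemas, the sketch below rewrites element names in a query for each target resource. It is only an assumption of how such a conversion could look: the `RESOURCES` table, the element names, the file names, and the regex-based rewriting are hypothetical, and a real service would parse the XQuery rather than patch its text.

```python
import re

# Hypothetical mapping from an application-level vocabulary to the element
# names and document location of each resource (names are illustrative).
RESOURCES = {
    "dc":   {"doc": "dc.xml",   "names": {"author": "creator", "name": "title"}},
    "gils": {"doc": "gils.xml", "names": {"author": "Originator", "name": "Title"}},
}

def convert_xquery(query, resource):
    """Rewrite path steps and the doc() location for one target resource."""
    info = RESOURCES[resource]
    # Replace each element name that directly follows a '/' if it is mapped.
    rewritten = re.sub(r"(?<=/)\w+",
                       lambda m: info["names"].get(m.group(0), m.group(0)),
                       query)
    return rewritten.replace('doc("app")', f'doc("{info["doc"]}")')

APP_QUERY = 'for $r in doc("app")//record where $r/author = "Smith" return $r/name'
converted = {res: convert_xquery(APP_QUERY, res) for res in RESOURCES}
for res, q in converted.items():
    print(res, "->", q)
```

One application-level query fans out into one converted query per resource, which matches the "XQuery in, converted XQueries out" shape of the figure.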
Application Prototype
- Geographic Metadata Query System
  - Multiple XML databases
    - Dublin Core
    - Dublin Core + longitude/latitude
    - GILS
[Figure: an application-specific XQuery enters the Distributed Metadata Query System, which sends converted XQueries to the DC, DC+, GILS, and JMP databases.]
- The current version is not OGSI based (in development)
Summary & Directions
- Two prototype services
  - OGSA-DAI compliant Grid service to bring web databases into the Grid
    - Lesson: dynamic scheduling is needed to cope with the uncertainty of external web sites
  - Schema/query conversion service to handle multiple XML schemas
    - Lesson: a concise set is needed to handle ontologies
- Directions
  - Advanced Grid database discovery service
  - Active/autonomous functions for DBMSs
Demo Video
- Grid Proxy Database demo
  - OGSA-DAI compliant
  - Extensible wrappers
  - Performs joins across multiple sites
- Query Conversion Service
  - Manages relationships between multiple XML schemas
- Geographic Metadata Query System
  - Based on XQuery
  - CrossSearch over multiple databases
Back Slides
Discussion(1)
- Share information with other Grid database R&D activities
- Data integration problem for scientific databases
  - Our interest includes:
    - Functions to provide application-specific integrated views over different information Grid resources
    - Ontology?
- Software & specification: need for a roadmap & timelines
  - GGF/DAIS, OGSA-DAI
  - High-level functions for database integration projects (transactions, views, ...)
Discussion(2)
- Research & development direction
- Our current interests/directions include:
  - Advanced resource discovery services
    - How to find appropriate databases on the Grid
    - How to combine/convert them easily for a specific application
  - Active/autonomous functions for DBMSs
    - Active databases re-evaluated
    - Example: when a failure occurs (event), evaluate the resource status (condition), then decide whether to wait another 10 seconds or switch to an alternative resource (action)
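The event-condition-action example can be written as a small rule. The sketch below is only an illustration under assumptions: the function name `on_failure`, the `is_recovering` status check, and the way alternatives are listed are hypothetical and do not come from the slides.

```python
import time

def on_failure(resource, alternatives, is_recovering, wait_seconds=10,
               sleep=time.sleep):
    """ECA rule for a failed resource access.

    Event:     access to `resource` failed (the caller invokes this function).
    Condition: is_recovering(resource) evaluates the resource status.
    Action:    wait and retry the same resource, or pick an alternative.
    Returns the resource to try next, or None if nothing is left.
    """
    if is_recovering(resource):
        sleep(wait_seconds)      # wait another 10 seconds by default
        return resource
    for alt in alternatives:
        if alt != resource:
            return alt           # go to an alternative resource
    return None

# Usage: db1 is reported as down for good, so the rule switches to db2.
chosen = on_failure("db1", ["db1", "db2"], is_recovering=lambda r: False)
print(chosen)
```

Passing `sleep` as a parameter keeps the rule testable without actually waiting.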
Comparison with DB2 Information Integrator
- Policy
  - Our system is not an information integrator
    - The focus is on bringing external resources into the Grid easily
  - Integration is done within the DAIS framework
    - No need for Oracle/DB2 integration
    - Schema/query conversion is done by our other system
- Aim
  - Our system is meant to support web databases
    - Form-like websites
  - The SQL condition clause is converted to an AND/OR form query
    - It is not an external function embedded in the SQL statement
  - We cannot provide a (general-purpose) predefined wrapper
    - How to construct wrappers easily is important
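A sketch of converting an SQL condition clause into an AND/OR form query is given below. The tuple-based condition tree, the `FIELD_MAP` names, and the output syntax are assumptions for illustration; each real website would need its own query format. Terms the form cannot express are dropped only where doing so widens the result, so the website returns a superset that can be filtered exactly afterwards.

```python
# Hypothetical mapping from SQL column names to the search form's field names.
FIELD_MAP = {"author": "au", "title": "ti"}

def to_form_query(cond):
    """Return a form query string, or None when the condition cannot restrict
    the web search (the rows are then filtered exactly after retrieval)."""
    op = cond[0]
    if op in ("AND", "OR"):
        parts = [to_form_query(child) for child in cond[1:]]
        if op == "OR" and None in parts:
            return None  # dropping one OR branch would lose matching rows
        parts = [p for p in parts if p is not None]
        if not parts:
            return None
        return parts[0] if len(parts) == 1 else "(" + f" {op} ".join(parts) + ")"
    column, value = cond[1], cond[2]
    if op != "=" or column not in FIELD_MAP:
        return None      # e.g. range conditions are not expressible here
    return f'{FIELD_MAP[column]}="{value}"'

# WHERE author = 'Smith' AND (title = 'Grid' OR title = 'Web') AND year > 2002
where = ("AND", ("=", "author", "Smith"),
                ("OR", ("=", "title", "Grid"), ("=", "title", "Web")),
                (">", "year", 2002))
form_query = to_form_query(where)
print(form_query)
```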
Comparison with DB2 II (cont’d)
- Architecture
  - Our system does not take a broker approach
- Mediator/Wrapper
  - Each SQL request is converted to an ‘approximate’ request for the website
- Proxy database
  - Provides an exact answer for each SQL request
  - Provides a cache database on the Grid
- Target is web resources/web services
  - Network transmission is a dominant factor
  - Simple query optimization can work well
XQuery-XQuery conversion
- Currently supports simple types
  1. Name
     Author = 著者 (Japanese for "author")
  2. Data format conversion
     2002/01/13 <-> 01/13/02 <-> 13th,Jan,2002
  3. Combination/abstraction
     Name = Last | Middle | First = Last | First
- Extensible (XSLT-like attachment)
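The three simple conversion types can be illustrated with a few small functions. This is a sketch under assumptions, not the service's implementation: the function names are hypothetical, the name table holds only the one pair shown on the slide, and the ordinal date form (13th,Jan,2002) is left out.

```python
from datetime import datetime

# 1. Name conversion: element names that mean the same thing.
NAME_MAP = {"著者": "Author"}

def convert_name(element):
    """Map an element name to its counterpart, or keep it unchanged."""
    return NAME_MAP.get(element, element)

# 2. Data format conversion between two date layouts.
def convert_date(value, src="%Y/%m/%d", dst="%m/%d/%y"):
    """Parse `value` in the source layout and print it in the target layout."""
    return datetime.strptime(value, src).strftime(dst)

# 3. Combination/abstraction: build one name from its parts.
def combine_name(last, first, middle=None):
    """Join Last | Middle | First, or Last | First when no middle name exists."""
    return " ".join(part for part in (last, middle, first) if part)

print(convert_name("著者"))
print(convert_date("2002/01/13"))
print(convert_date("01/13/02", src="%m/%d/%y", dst="%Y/%m/%d"))
print(combine_name("Kojima", "Isao"))
```

Each conversion is an independent function, so further types could be attached in the same way, loosely in the spirit of the "XSLT-like attachment" on the slide.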