OGSA-DQP/DAI related work at AIST

Isao Kojima, Said Mirza Pahlevi, Steven Lynden, Akiyoshi Matono, Masahiro Kimoto
Information Technology Research Institute
National Institute of Advanced Industrial Science and Technology (AIST)
OGSA-DAI/DQP based work
• OGSA-WebDB
– Accessing Web databases through Grid services
– Optimises joins between multiple DBs
• OGSA-DAI-RDF
– OGSA-DAI activities for accessing RDF stores
• OGSA-DQP extensions
– Enable XML and integration with OGSA-WebDB
– Try out with a more complex application; support the GeoGrid project (www.geogrid.org)
Research Areas
[Figure: research areas. Labels in the figure: e-Science Applications; Semantic Grid/Web Infrastructures; Service-based (OGSA) Data Grid; Distributed Databases; Support RDF Databases; S-MDS; OGSA-DAI-RDF; AIST-SOA; SPARQL-based Matchmaking; GEO Grid Service Registry; GEO Grid Federated SPARQL; Data Integration; Bloom Filter Query Processing; MapReduce (Provide Scalable Database Processing); Support Web Databases; OGSA-WebDB; OGSA-DQP with XML & WebDB; OGSA-DQP + OGSA-WebDB.]
OGSA-WebDB
• Contributes activities to OGSA-DAI that support Web database access and integration
• Uses a mediator-wrapper architecture
• Wrappers exist for popular Web databases, e.g. DBLP, Citeseer, etc.
[Figure labels: Client; OGSA-DAI with WDB SQL activities; JDBC; Proxy Database: RDB View (MySQL); Internet; Pubmed; PDB; FDA.]
WebDB System Architecture
Example query: select * from rdblp where title like '%grid%' and year > 2000
[Figure: system architecture. Components shown: Web and Grid/OGSA-DAI clients; Data Service with WebDB activity; MEDIATOR; SQL Analyzer; Wrapper Management; proxy relations (rdblp); user relations; DBLP wrapper and PDB wrapper; bibliography Web databases (DBLP, Bibs); medical Web databases (PDB). Numbered steps recoverable from the figure: 1. SQL; 2. Invoke Mediator; 3. supported conditions; 5. Boolean query; 6. results; 7. insert results; 8. SQL; 9. results; 10. results.]
Key features of OGSA-WebDB
• Can integrate data from multiple Web DB sources declaratively
– Parser supports SQL
• Queries are optimised at runtime
– Statistical analysis of intermediate results can be used to determine join order.
• Interface is consistent with OGSA-DAI relational resources
– Web DB types mapped to SQL types
– Each WebDB looks like a relational table
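The runtime join ordering mentioned above can be illustrated with a small greedy heuristic. This is only a sketch, not the OGSA-WebDB optimiser: the function name `greedy_join_order` and the row counts are hypothetical, and a real optimiser would also consider join selectivity and Web database response times.

```python
from typing import Dict, List


def greedy_join_order(cardinalities: Dict[str, int]) -> List[str]:
    """Order sources so the smallest observed inputs are joined first.

    Illustrative heuristic only; ties are broken alphabetically so the
    result is deterministic.
    """
    return sorted(cardinalities, key=lambda name: (cardinalities[name], name))


# Hypothetical row counts observed for intermediate results at runtime.
observed = {"dblp": 1200, "employee": 40, "citeseer": 300}
order = greedy_join_order(observed)
print(order)
```

Re-running the function as new statistics arrive lets the remaining joins be re-ordered while the query executes.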
Extensions to OGSA-DQP
• Support XML and OGSA-WebDB
• Motivation
– Integration of heterogeneous data
– Declarative service orchestration
• Difficult to do without manipulating XML
[Figure labels: OGSA-DQP client; OGSA-DAI services for a Relational DB, an XML DB and a Web DB (OGSA-DAI + OGSA-WebDB); Analysis service.]
DQP with OGSA-WebDB
• OGSA-WebDB already behaves a lot like a relational resource
• Different language constraints and response times required changes to the optimiser
• Replication/parallelism is beneficial for joins with relational data to improve response times
Example: select * from employee, dblp where dblp.author=employee.name
[Figure labels: Client; OGSA-DQP; OGSA-WebDB; OGSA-DAI; employee DB.]
XML processing
• The approach is to add a set of XML functions for manipulating XML and converting to/from relational data
• The functions are based on those from various sources, including
– The fifth revision of the SQL language (SQL 2003)
– DBMS vendors such as Oracle 10g
– Other projects, e.g. Tukwila data integration system
• A mix-and-match from these sources is employed as compliance with SQL in this area is far from universal.
Example functions
1. Construction of XML from relational data
select XMLGen(
'<person>
<name>{$p.name}</name>
<age>{$p.age}</age>
</person>'
) from person as p
Input table:
name | age
John | 31
David | 47
Result:
<person>
<name>John</name>
<age>31</age>
</person>
<person>
<name>David</name>
<age>47</age>
</person>
2. Extracting values from XML and type conversion
select avg(XMLIntValue(p,'//age'))
from ExtractXML(people,'//person')
as p
Result:
avg
39
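The two examples above can be emulated outside OGSA-DQP to make their semantics concrete. The following is a minimal sketch using the Python standard library, not the actual implementation: `xml_gen` and `xml_int_values` are hypothetical helper names, the template uses the name and age columns of the example table, and ElementTree paths are written `.//age` because ElementTree supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_gen(template: str, alias: str, row: dict) -> str:
    """Fill {$alias.column} placeholders with XML-escaped row values."""
    pattern = re.compile(r"\{\$" + re.escape(alias) + r"\.(\w+)\}")
    return pattern.sub(lambda m: escape(str(row[m.group(1)])), template)


def xml_int_values(fragment: str, path: str) -> list:
    """Return the integer text of every element matching the path."""
    return [int(e.text) for e in ET.fromstring(fragment).findall(path)]


# The example table from the slide.
people_rows = [{"name": "John", "age": 31}, {"name": "David", "age": 47}]
template = "<person><name>{$p.name}</name><age>{$p.age}</age></person>"

# Example 1: one XML value per input row.
people = [xml_gen(template, "p", row) for row in people_rows]

# Example 2: extract the ages as integers and average them.
ages = [v for person in people for v in xml_int_values(person, ".//age")]
average = sum(ages) / len(ages)
print(people[0])
print(average)
```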
Joining data
• Integration of WebDB, relational, XML data
Sources: AUTHOR (XML), PUBLICATION, CITESEER
<authors>
<author>
<name>Cliff Jones</name>
<field>Dependability</field>
</author>
<author>
<name>…</name>
<field>…</field>
</author>
…
</authors>
OpenXML exposes each <author> element as a row (xauthor):
select xauthor.name, publication.title,
citeseer.url, xauthor.field
from publication, citeseer,
OpenXML(
author,
'//author',
'//name/text() name, //field/text() field'
) as xauthor
where publication.title=citeseer.title
and xauthor.name=publication.author;
Result columns:
name | title | url | field
… | … | … | …
… | … | … | …
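The effect of the OpenXML join can be sketched in plain Python. This is an illustration under assumptions, not OGSA-DQP code: `open_xml` is a hypothetical helper, and the second author, the publication row and the citeseer row are made-up sample data; only the Cliff Jones entry comes from the slide.

```python
import xml.etree.ElementTree as ET

AUTHORS_XML = """
<authors>
  <author><name>Cliff Jones</name><field>Dependability</field></author>
  <author><name>A. N. Other</name><field>Databases</field></author>
</authors>
""".strip()


def open_xml(document: str, row_path: str, columns: dict) -> list:
    """Turn each element matching row_path into a dict of column -> text."""
    root = ET.fromstring(document)
    return [
        {col: elem.findtext(path) for col, path in columns.items()}
        for elem in root.findall(row_path)
    ]


# Hypothetical relational and Web database rows.
publication = [{"author": "Cliff Jones", "title": "Sample title"}]
citeseer = [{"title": "Sample title", "url": "http://example.org/sample"}]

# Shred the XML into a relational view, then join as in the query.
xauthor = open_xml(AUTHORS_XML, ".//author", {"name": "name", "field": "field"})

result = [
    (a["name"], p["title"], c["url"], a["field"])
    for p in publication
    for c in citeseer
    for a in xauthor
    if p["title"] == c["title"] and a["name"] == p["author"]
]
print(result)
```

Once the XML is exposed as rows, the join conditions are ordinary equality predicates, which is what lets the optimiser treat all three sources uniformly.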
Example application
• The application is based on an e-Science project, eXSys (e-Science tools for the analysis of complex systems).
• Newcastle University 2003/2004
• This project uses graph theory-based techniques to predict target proteins for in-silico drug design.
• It later matured into a drug discovery company (from http://www.etherapeutics.co.uk/)
• Protein interaction network = nodes (proteins) and edges (binding capability, formation of complex). (Static representation)
• Analysing topology allows the extension of a protein's properties to encompass its role relative to the network in which it exists.
• Properties of protein interaction networks have been studied: structurally important implies functionally important.
Process
• Data from separate relational databases are integrated to form an undirected graph structure representing protein interactions from a specific organism.
• An analysis program is executed with the graph structure as input. Each protein is ranked according to its significance to the topology of the interaction network.
• More information is retrieved about highly ranked proteins using Web databases such as UniProt.
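The first two steps can be pictured with a toy example. The sketch below is an assumption-laden illustration: the slides do not say which topological measure the analysis program uses, so degree (number of distinct interaction partners) stands in for it, and the protein identifiers are invented.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def rank_by_degree(graph):
    """Rank proteins by number of partners, highest first (ties by name)."""
    return sorted(graph, key=lambda p: (-len(graph[p]), p))


# Hypothetical interaction pairs, as if joined from interaction/species tables.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
graph = build_graph(interactions)
ranking = rank_by_degree(graph)
print(ranking)
```

The top of the ranking would then drive the lookups against a Web database such as UniProt.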
[Figure: annotated OGSA-DQP query over the locally owned interaction and species tables, with example XML input/output for the topology analysis. Annotations on the query: result: protein name and gene; convert back to a relational view; extract values from XML data; invoke Web service; construct XML input to analysis web service; select important proteins only; join condition with web database.]
Query Plan
[Figure: query plan. Operators shown: ExtractXML, UniProt, XMLOccurs, function call (Analyser), XMLElement, XMLAgg, XMLGen, and a join on interactions.Interaction_id = species.interaction_id over the species and interactions tables, which are wrapped using OGSA-DAI.]
The function call is implemented by calling an external Web service.
Implementation
• Based on OGSA-DQP 3.2.1
• At least Java 1.5 required
• Installation
– Coordinator
• Install DQP 3.2.1, extensions over-write the DQP Jars deployed onto Tomcat
– Evaluator
• Install extended evaluator from scratch (changes to more than just the Java code were made)
Configuration process
– XML data sources imported in a similar manner to relational data sources.
<xs:complexType name="ImportedXMLDataSourceType">
  <xs:sequence>
    <xs:element name="URI" type="xs:anyURI" />
    <xs:element name="CollectionName" type="xs:string" />
    <xs:element name="ResourceID" type="xs:anyURI" />
    <xs:element name="SchemaURI" type="xs:anyURI" minOccurs="0"/>
    <xs:element name="cardinality" type="xs:long" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
– Cardinality (optional) must be specified by the client if it is to be used during optimisation, otherwise a default is used
Web DBs
<xs:complexType name="ImportedWebDataSourceType">
  <xs:sequence>
    <xs:element name="URI" type="xs:anyURI" minOccurs="1"/>
    <xs:element name="ResourceID" type="xs:anyURI" />
    <xs:element name="SupportsOR" type="xs:boolean" minOccurs="0"/>
    <xs:element name="table" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="cardinality" type="xs:long" use="required"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
• "SupportsOR": to speed up loop joins, indicates whether the Web database allows multiple bindings from input tuples to be grouped together by building a Web database query of the form
SELECT * from table where attribute = 'value1' or attribute = 'value2' …
• Cardinality, if not supplied, is assumed to be very large to reflect the fact that most Web databases are relatively large compared to most relational database tables
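The benefit of SupportsOR for loop joins can be shown by counting the queries sent to the Web database. This is a minimal sketch under assumptions, not the OGSA-DQP evaluator: `build_lookups`, the batch size of 3 and the author names are hypothetical.

```python
from typing import Iterable, List


def quote(value: str) -> str:
    """Quote a string literal for SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_lookups(table: str, attribute: str, bindings: Iterable[str],
                  supports_or: bool, batch_size: int = 3) -> List[str]:
    """Build the queries needed to probe `table` for each binding value."""
    values = list(dict.fromkeys(bindings))  # drop duplicates, keep order
    step = batch_size if supports_or else 1  # one binding per query without OR
    queries = []
    for i in range(0, len(values), step):
        batch = values[i:i + step]
        condition = " OR ".join(f"{attribute} = {quote(v)}" for v in batch)
        queries.append(f"SELECT * FROM {table} WHERE {condition}")
    return queries


# Hypothetical bindings coming from the outer (relational) side of the join.
names = ["Jones", "Smith", "Jones", "O'Neil", "Lee"]
with_or = build_lookups("dblp", "author", names, supports_or=True)
without_or = build_lookups("dblp", "author", names, supports_or=False)
print(len(with_or), len(without_or))
```

With grouping enabled the four distinct bindings need two round trips instead of four, which matters when each Web database request is slow.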
Required modifications
• Variable assignment
– Fields in the FROM and SELECT clauses can now be assigned names. For example, the following queries are now supported:
select p.name from person_person as p where p.id<5;
select name as surname from person_person;
• This is a direct consequence of the XML extensions, for example
select xml from ExtractXML(othello,'//actor') as xml;
• Limited support for nested queries in the FROM/WHERE clauses and functions returning collections (as in the above example) introduced.
• Web service function calls: limited support, i.e. one statically defined function is supported that allows arbitrary XML to be passed as a parameter and arbitrary XML returned as a result. JAX-WS looked at as a potential implementation mechanism for this.
Implemented functions
• XMLForest( value1, value2, ..., valueN ) returns XML
Converts relational columns to XML and merges them together. Each XML element is assigned a name equal to the column name. Parameter types can be any type that is not XML. For example, given a tuple with two fields, name=john, age=19, the output of XMLForest(name,age) is <name>john</name><age>19</age>.
• XMLConcat( XML value1, XML value2, ..., XML valueN ) returns XML
Basically does the same thing as XMLForest except that the values are already XML typed.
• XMLGen( String template ) returns XML
Already seen
• XMLAgg( collection<XML> values ) returns XML
Aggregate function that creates a single XML value from a collection of XML values. Limited applicability as OGSA-DQP doesn't support GROUP-BY
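The construction functions above can be mimicked with plain string building to show what each one returns. This is a rough sketch with hypothetical Python helpers, not the OGSA-DQP implementation; XML values are represented as strings, and the second row (mary) is invented sample data.

```python
from xml.sax.saxutils import escape


def xml_forest(row: dict, *columns: str) -> str:
    """Wrap each named column value in an element named after the column."""
    return "".join(f"<{c}>{escape(str(row[c]))}</{c}>" for c in columns)


def xml_concat(*fragments: str) -> str:
    """Concatenate values that are already XML (per-row, scalar function)."""
    return "".join(fragments)


def xml_agg(fragments) -> str:
    """Combine the XML values of many rows into one value (aggregate)."""
    return "".join(fragments)


rows = [{"name": "john", "age": 19}, {"name": "mary", "age": 23}]

forest = xml_forest(rows[0], "name", "age")
concat = xml_concat("<name>john</name>", "<age>19</age>")
agg = xml_agg(xml_forest(r, "name") for r in rows)
print(forest)
print(agg)
```

In this string representation XMLConcat and XMLAgg look alike; the difference is that the former combines values within one row while the latter combines values across rows.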
Implemented functions (continued)
• ExtractXML( String collectionName, String XPathStatement ) returns collection<XML>
Applies an XPath statement to a named XML collection residing in an XML database.
• ExtractXMLValue( XML value, String XPathStatement ) returns collection<XML>
Applies an XPath statement to an XML value.
• ExtractXMLStringValue( XML value, String XPathStatement ) returns collection<String>
Applies an XPath statement to an XML value, converts results to String values [Variants exist for other types, e.g. ExtractXMLIntValue etc.]
• XMLOccurs( XML value, String XPathStatement ) returns boolean
Applies an XPath statement to an XML value, returns true if the statement returns any results, false otherwise. Can be used in the WHERE clause.
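The extraction functions can be approximated with Python's ElementTree to show the intended return types. This is a sketch with hypothetical helper names, not the real functions; ElementTree implements only a subset of XPath, so paths are written relative to the value (`.//age` rather than `//age`).

```python
import xml.etree.ElementTree as ET


def extract_xml_string_values(value: str, path: str) -> list:
    """Apply the path to an XML value and return the matches as strings."""
    return [e.text or "" for e in ET.fromstring(value).findall(path)]


def extract_xml_int_values(value: str, path: str) -> list:
    """Typed variant: convert each matched text node to an integer."""
    return [int(s) for s in extract_xml_string_values(value, path)]


def xml_occurs(value: str, path: str) -> bool:
    """True if the path matches at least one element, as in a WHERE clause."""
    return ET.fromstring(value).find(path) is not None


person = "<person><name>John</name><age>31</age></person>"

name_values = extract_xml_string_values(person, ".//name")
age_values = extract_xml_int_values(person, ".//age")
has_age = xml_occurs(person, ".//age")
has_email = xml_occurs(person, ".//email")
print(name_values, age_values, has_age, has_email)
```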
Status/future work
• Respond to requirements from GeoGrid project
– One potential issue here is support for VOMS security
• Code is not up to the standards required for OGSA-DAI contributions (extensions and OGSA-WebDB)
• OGSA-WebDB wrappers require constant updates
• More work needed on optimisation strategies when invoking functions
• Processing XML data with elements named <row> or <col> may fail due to the way that OGSA-DQP encodes results internally using these tags (bug)
OGSA-DAI-RDF
• Activities for accessing RDF data resources, e.g.:
– DataSetManagement: create, remove, list
– GraphManagement: insert, delete triples
– SPARQLQuery: forward a query to underlying DB
• Current work looking at implementing the WS-DAI-RDF specifications