The GridMiner project
Alexander Wöhrer
University of Vienna
Institute for Software Science
email: woehrer@par.univie.ac.at
www.gridminer.org
… Intelligent Grid Solutions
Outline
- GridMiner overview
  - Members, hosts
  - KDD process
  - Work packages
- OGSA-DAI introduction
- Grid Data Mediation Service
  - Motivation: data integration scenarios
  - Requirements => Principles
  - Concepts => Architecture
  - Example
  - Current prototype
  - Future work
www.gridminer.org
NeSC, 6. Sept. 04
Alexander Wöhrer
GridMiner Overview I

- Start: Jan. 2003
- Hosts:
  - University of Vienna
  - Vienna University of Technology
- Test application area: medical
  - traumatic brain injury treatment
  - predicting the outcome of seriously ill patients
  - the analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
- Target:
  - provide tools to discover and access relevant knowledge and information from distributed, heterogeneous data sources
GridMiner Overview II
- Current status:
  - prototypes based on GT3 (Globus Toolkit 3)
  - GDMS functionality implemented as an OGSA-DAI R3 activity
- Going to support WSRF when it is more stable
- Generally applicable
  - not bound to a special application area
Project members
- Project leaders:
  - Prof. A Min Tjoa, Vienna University of Technology
  - Prof. Peter Brezany, University of Vienna
- Visualization: Radoslav Ivanov
- Data streaming: Nguyen Manh Tho
- Data mediation: Alexander Wöhrer
- Knowledge management: Ivan Janciak
- Job control: Günter Kickinger
- Sequence rules: Michael Rinner
- Decision rules: Christian Kloner
- GUI: Paul Panhofer
- OLAP: Bernhard Fiser, Umut Onan, Ibrahim Elsayed
- Clustering: Markus Mayer
The process to cover
- Data distributed over participating hospitals
- Access from different platforms (handheld, PC, ...) for data generation, querying and analysis
- The process needs to access various data sources
Work packages
GMGUI - Graphical User Interface
Demo at SC 04 - research exhibition booth 2437
Outline

- GridMiner overview
  - Members, hosts
  - KDD process
  - Work packages
- OGSA-DAI introduction
  - Target
  - Main services
  - Perform/Response document
  - Overall architecture
- Grid Data Mediation Service
  - Requirements => Principles
  - Concepts => Architecture
  - Example
  - Current prototype
  - Future directions
OGSA-DAI introduction I
- Target of OGSA-DAI:
  - to incorporate data resources into the OGSA framework and make them accessible via a standard interface
- OGSA-DAI consists of 3 parts:
  - Grid Data Service (GDS):
    - primary service
    - gives access to one particular physical data resource
    - provides this access via a document-oriented model
  - Grid Data Service Factory (GDSF):
    - service creation facility
  - Grid Data Service Registry (GDSR):
    - directory facility for OGSA-DAI services
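To make the division of labour concrete, here is a minimal Java sketch of how a client might use the three roles: look up a factory in the registry, let it create a GDS bound to one resource, then send that GDS a document listing activities. All class names, the resource URL `jdbc:patientsDB` and the activity names are hypothetical; this is not the OGSA-DAI API, and the perform/response documents are reduced to lists of strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the three OGSA-DAI roles; NOT the real OGSA-DAI API.
class DaiRolesSketch {

    // GDS: bound to exactly one physical data resource, driven by documents.
    static final class GridDataService {
        private final String resource;
        GridDataService(String resource) { this.resource = resource; }

        // The "perform document" is modelled as a list of activity names;
        // the returned list plays the role of the response document.
        List<String> perform(List<String> performDocument) {
            List<String> response = new ArrayList<>();
            for (String activity : performDocument) {
                response.add(activity + " executed on " + resource);
            }
            return response;
        }
    }

    // GDSF: creates GDS instances for the resource it is configured with.
    static final class GridDataServiceFactory {
        private final String resource;
        GridDataServiceFactory(String resource) { this.resource = resource; }
        GridDataService createService() { return new GridDataService(resource); }
    }

    // GDSR: directory mapping resource names to factories.
    static final class GridDataServiceRegistry {
        private final Map<String, GridDataServiceFactory> factories = new HashMap<>();
        void register(String name, GridDataServiceFactory factory) { factories.put(name, factory); }
        GridDataServiceFactory lookup(String name) {
            GridDataServiceFactory factory = factories.get(name);
            if (factory == null) throw new IllegalArgumentException("unknown resource: " + name);
            return factory;
        }
    }

    // Client flow: registry -> factory -> service -> perform document.
    static List<String> clientFlow() {
        GridDataServiceRegistry registry = new GridDataServiceRegistry();
        registry.register("patients", new GridDataServiceFactory("jdbc:patientsDB"));
        GridDataService gds = registry.lookup("patients").createService();
        return gds.perform(List.of("sqlQuery", "deliverToURL"));
    }

    public static void main(String[] args) {
        clientFlow().forEach(System.out::println);
    }
}
```

Keeping the factory separate from the service is what lets each client get its own GDS instance for the same resource.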
OGSA-DAI introduction II
OGSA-DAI introduction III
- The engine performs the specified activities
- Solves many aspects of accessing data via the Grid well:
  - metadata
  - flexible:
    - chains of activities
    - support for new activities
  - transformation
  - delivery (to 3rd parties)
  - rights management
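A minimal sketch of the activity-chain idea, assuming each activity simply consumes the output of the previous one. The `Activity` interface, the engine loop and the sample rows are hypothetical illustrations, not OGSA-DAI classes; the query, transformation and delivery steps mirror the bullets above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical engine sketch; activity names and data are illustrative only.
class ActivityChainSketch {

    // An activity consumes the previous activity's output and produces its own.
    interface Activity extends UnaryOperator<List<String>> {}

    // The engine runs the activities of one request strictly in order.
    static List<String> runChain(List<Activity> chain) {
        List<String> data = new ArrayList<>();
        for (Activity activity : chain) {
            data = activity.apply(data);
        }
        return data;
    }

    static List<String> demo() {
        List<String> delivered = new ArrayList<>(); // stands in for a 3rd-party sink
        Activity query = in -> List.of("1;Woehrer", "2;Doe");  // "query" a resource
        Activity transform = in -> {                           // per-row transformation
            List<String> out = new ArrayList<>();
            for (String row : in) out.add(row.replace(';', ','));
            return out;
        };
        Activity deliver = in -> { delivered.addAll(in); return in; }; // delivery step
        runChain(List.of(query, transform, deliver));
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1,Woehrer, 2,Doe]
    }
}
```

Because every activity has the same shape, new activities can be added without changing the engine loop.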
Outline

- GridMiner overview
  - Members, hosts
  - KDD process
  - Work packages
- OGSA-DAI introduction
  - Target
  - Main services
  - Perform/Response document
  - Overall architecture
- Grid Data Mediation Service
  - Motivation: data integration scenarios
  - Requirements => Principles
  - Concepts => Architecture
  - Example
  - Current prototype
  - Future directions
Data Integration I
- Single data source
- Federated data source
Data Integration II
- Heterogeneities to overcome:
  - Technical: OS, hardware, ...
  - Interface: access language
  - Data model: OO, relational, ...
  - Semantic: equal names for different concepts
  - Schematic: the same concept encoded with different elements of a data model
  - Structural: attributes are grouped into different tables
GDMS Requirements






- Ease of use
  - language, location and schema transparency
  - SQL subset
  - appears as one homogeneous, single RDBMS (read-only)
- Maintainability
  - easy installation & setup
  - maintenance (tool-supported)
- Access/Authorization/Authentication
- View support
- Performance
- Semantic issues
- Security
- Extensible
  - open to new data sources
- Flexible
  - user-defined functions to resolve various kinds of heterogeneity
GDMS Principles
- Tight federation:
  - global (relational) schema
- Virtual integration:
  - leave the data where it is
  - always up-to-date data
- No proprietary solution:
  - inherit well-solved aspects from OGSA-DAI
  - not bound to a special architecture
GDMS Concepts I
- Mapping schema
  - describes how the virtual data source is built
  - one description per table
  - operators as building blocks:
    - SELECT: to query data sources
    - UNION/JOIN: to combine the results
  - expressed as an XML document
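A minimal in-memory sketch of the mapping-schema operators, assuming rows are maps from global column names to values and that each SELECT leaf has already applied its source's mapping. `Operator`, `Select`, `UnionAll`, `InnerJoin` and the sample rows are hypothetical; the tree in `patientTable()` has the union-of-a-join shape used in the talk's patient example, and the nested-loop join is simply the easiest correct choice, not necessarily what GDMS does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of the mapping-schema operators (not GDMS code).
class MappingOperatorsSketch {

    // Every operator yields rows: global column name -> value.
    interface Operator { List<Map<String, String>> evaluate(); }

    // SELECT: leaf returning the (already mapped) rows of one data source.
    record Select(List<Map<String, String>> sourceRows) implements Operator {
        public List<Map<String, String>> evaluate() { return sourceRows; }
    }

    // UNION ALL: concatenates both inputs and keeps duplicates.
    record UnionAll(Operator left, Operator right) implements Operator {
        public List<Map<String, String>> evaluate() {
            List<Map<String, String>> out = new ArrayList<>(left.evaluate());
            out.addAll(right.evaluate());
            return out;
        }
    }

    // INNER JOIN on one key column present in both inputs (nested-loop join).
    record InnerJoin(Operator left, Operator right, String key) implements Operator {
        public List<Map<String, String>> evaluate() {
            List<Map<String, String>> out = new ArrayList<>();
            List<Map<String, String>> rightRows = right.evaluate();
            for (Map<String, String> l : left.evaluate()) {
                for (Map<String, String> r : rightRows) {
                    if (l.get(key) != null && Objects.equals(l.get(key), r.get(key))) {
                        Map<String, String> merged = new HashMap<>(l);
                        merged.putAll(r); // columns of both sides in one row
                        out.add(merged);
                    }
                }
            }
            return out;
        }
    }

    // union all( join(A, B on pid), C ) over tiny hypothetical sources.
    static List<Map<String, String>> patientTable() {
        Operator a = new Select(List.of(Map.of("pid", "1", "p_name", "Alexander Woehrer")));
        Operator b = new Select(List.of(Map.of("pid", "1", "dob", "24/07/1980")));
        Operator c = new Select(List.of(Map.of("pid", "2", "p_name", "Jane Doe", "dob", "01/01/1970")));
        return new UnionAll(new InnerJoin(a, b, "pid"), c).evaluate();
    }

    public static void main(String[] args) {
        patientTable().forEach(System.out::println);
    }
}
```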
GDMS Concepts II
- Transformation functions:
  - Why: it is hard to predict all the functionality one will need
  - Flexibility
  - Static/dynamic Java functions
  - Use logical names for parameters
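A minimal sketch of calling a static transformation function by name and binding its parameters through logical column names, as in `transform="combine(fn,ln)"`. It assumes every parameter is a `String` and uses plain Java reflection; `TransformInvokerSketch` and `apply` are hypothetical names, and dynamic (non-static) functions are not covered.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real GDMS loading/binding mechanism is not shown on the slides.
class TransformInvokerSketch {

    // Example user-defined transformation class with a static function.
    static class TestTransform {
        public static String combine(String one, String two) { return one + " " + two; }
    }

    // Matches "name(arg1,arg2,...)".
    private static final Pattern CALL = Pattern.compile("(\\w+)\\(([^)]*)\\)");

    // Resolves e.g. "combine(fn,ln)" against a row keyed by logical column names.
    static Object apply(Class<?> functions, String expression, Map<String, String> row) {
        Matcher m = CALL.matcher(expression.trim());
        if (!m.matches()) throw new IllegalArgumentException("bad expression: " + expression);
        String[] names = m.group(2).isBlank() ? new String[0] : m.group(2).split(",");
        Object[] args = new Object[names.length];
        Class<?>[] types = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            args[i] = row.get(names[i].trim()); // logical name -> actual value
            types[i] = String.class;            // assumption: all parameters are Strings
        }
        try {
            Method method = functions.getMethod(m.group(1), types);
            return method.invoke(null, args);   // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + expression, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("fn", "Alexander", "ln", "Woehrer");
        System.out.println(apply(TestTransform.class, "combine(fn,ln)", row)); // Alexander Woehrer
    }
}
```

Binding by logical name keeps the mapping independent of column positions in any particular source.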
GDMS Architecture
Example I - Scenario
- Heterogeneities:
  - name in A is "First Last" (the target format)
  - name in C has to be combined from separate first-name and last-name fields
- Distribution:
  - 3 data sources
Example II – the basic mapping
<VDSTable name="patient">
  <union kind="all">
    <join>
      <select source="xmldb:.." name="A">
        <mapSource>….</mapSource>
        <sourcePart>collectionXYZ</sourcePart>
      </select>
      <select source="jdbc://…" name="B">
        <mapSource>….</mapSource>
        <sourcePart>databaseXYZ</sourcePart>
      </select>
      <joinInfo kind="inner">
        <left keys="pid"/>
        <right keys="pid"/>
      </joinInfo>
    </join>
    <select source="file://..." name="C">
      <mapSource>….</mapSource>
      <sourcePart></sourcePart>
    </select>
  </union>
</VDSTable>
…
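To show how the nested elements encode an operator tree, a minimal Java sketch that parses a trimmed copy of this mapping (source attributes and `mapSource` bodies dropped) with the JDK's DOM parser and prints it as an expression. `MappingTreeSketch` and its output notation are hypothetical; skipping `joinInfo` when printing is a simplification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: read a mapping schema and render its operator tree.
class MappingTreeSketch {
    static final String MAPPING =
        "<VDSTable name=\"patient\"><union kind=\"all\"><join>"
      + "<select name=\"A\"/><select name=\"B\"/>"
      + "<joinInfo kind=\"inner\"><left keys=\"pid\"/><right keys=\"pid\"/></joinInfo>"
      + "</join><select name=\"C\"/></union></VDSTable>";

    // select -> its name; union/join -> tag(child,child,...); VDSTable -> its single child.
    static String describe(Element e) {
        String tag = e.getTagName();
        if (tag.equals("select")) return e.getAttribute("name");
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element child && !child.getTagName().equals("joinInfo")) {
                if (sb.length() > 0) sb.append(',');
                sb.append(describe(child));
            }
        }
        return tag.equals("VDSTable") ? sb.toString() : tag + "(" + sb + ")";
    }

    static String parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return root.getAttribute("name") + " = " + describe(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid mapping schema", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(MAPPING)); // patient = union(join(A,B),C)
    }
}
```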
Example III – CSV mapSource
CSV file line example:
1;Woehrer;Alexander;Vienna;24/07/1980;01/01/2004
<mapSource>
  <ColSeperator>;</ColSeperator>
  <LineSeperator>\r\n</LineSeperator>
  …
  <column ref="p_name" transform="combine(fn,ln)">
  </column>
  <column ref="ln">
    <source>2</source>
  </column>
  <column ref="fn">
    <source>3</source>
  </column>
  …
</mapSource>
// Transformation function for the CSV file:
// combine(fn, ln) yields the target format "First Last"
public class TestTransform {
    public static String combine(String one, String two) {
        return one + " " + two;
    }
}
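A minimal sketch of what this mapSource does to the sample line, assuming `<source>` positions are 1-based (column 2 holds the last name in the line above) and that `combine(fn,ln)` is evaluated after the plain columns are bound. `CsvMapSourceSketch` and `mapLine` are hypothetical; `combine` copies `TestTransform.combine`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of applying the CSV mapSource to one line (not GDMS code).
class CsvMapSourceSketch {

    // Same body as TestTransform.combine.
    static String combine(String one, String two) { return one + " " + two; }

    // Splits on the column separator and binds 1-based positions to logical names.
    static Map<String, String> mapLine(String line, String colSeparator, Map<String, Integer> positions) {
        String[] fields = line.split(Pattern.quote(colSeparator), -1);
        Map<String, String> row = new HashMap<>();
        for (Map.Entry<String, Integer> e : positions.entrySet()) {
            row.put(e.getKey(), fields[e.getValue() - 1]); // <source>2</source> -> fields[1]
        }
        // Derived column: p_name = combine(fn, ln)
        row.put("p_name", combine(row.get("fn"), row.get("ln")));
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = mapLine("1;Woehrer;Alexander;Vienna;24/07/1980;01/01/2004",
                ";", Map.of("ln", 2, "fn", 3));
        System.out.println(row.get("p_name")); // Alexander Woehrer
    }
}
```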
Example IV – XML mapSource
<mapSource>
  …
  <column ref="p_id">
    <source>/entry/@id</source>
  </column>
  <column ref="p_name">
    <source>/entry/name/text()</source>
  </column>
  …
</mapSource>
Example entry in XMLDB:
<entry id="1">
  <name>Alexander Woehrer</name>
  <Address>Edinburgh</Address>
  <dob>24/07/1980</dob>
</entry>
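The `<source>` expressions are XPath, so a minimal sketch can evaluate them against the example entry with the JDK's `javax.xml.xpath` API. `XmlMapSourceSketch` is a hypothetical name, and the entry is read from a string here instead of from the Xindice collection.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: evaluate mapSource XPath expressions on one XML entry.
class XmlMapSourceSketch {
    static final String ENTRY = "<entry id=\"1\"><name>Alexander Woehrer</name>"
            + "<Address>Edinburgh</Address><dob>24/07/1980</dob></entry>";

    // Returns the string value of one <source> expression for the given entry.
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(ENTRY, "/entry/@id"));         // 1
        System.out.println(evaluate(ENTRY, "/entry/name/text()")); // Alexander Woehrer
    }
}
```

Since the name in A already has the target format, no transformation function is needed for `p_name` here.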
Example Query execution
- Query:
  SELECT p_name FROM patient WHERE id=10
- [Figure: standard vs. optimized execution plan]
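Assuming the optimized plan pushes the WHERE condition down to each source before the union (a common mediator optimization, not confirmed by the slide), this minimal Java sketch checks that filtering each source first returns the same rows as filtering after UNION ALL, while only matching rows would leave each source. `PushdownSketch` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of selection pushdown below UNION ALL; not the GDMS optimizer.
class PushdownSketch {

    // A mapped row of the virtual "patient" table (two columns for brevity).
    record Row(int id, String name) {}

    // Standard plan: ship all rows from every source, union them, then filter.
    static List<String> standard(List<List<Row>> sources, Predicate<Row> where) {
        List<Row> union = new ArrayList<>();
        for (List<Row> source : sources) {
            union.addAll(source);
        }
        List<String> result = new ArrayList<>();
        for (Row r : union) {
            if (where.test(r)) result.add(r.name());
        }
        return result;
    }

    // Optimized plan: apply the predicate per source, so only matching rows are shipped.
    static List<String> optimized(List<List<Row>> sources, Predicate<Row> where) {
        List<String> result = new ArrayList<>();
        for (List<Row> source : sources) {
            for (Row r : source) {
                if (where.test(r)) result.add(r.name()); // would run at the source
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Row>> sources = List.of(
                List.of(new Row(10, "Alexander Woehrer"), new Row(11, "Jane Doe")),
                List.of(new Row(10, "Alexander Woehrer")));
        Predicate<Row> idIs10 = r -> r.id() == 10; // WHERE id=10
        System.out.println(standard(sources, idIs10).equals(optimized(sources, idIs10))); // true
    }
}
```

The rewrite is safe for UNION ALL because a row passes the filter after the union exactly when it passes it in its own source.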
Current GDMS Prototype
- Supported data sources:
  - RDBMS (via JDBC)
  - XMLDB (Xindice)
  - CSV files
- Operators: "union all" and "inner join"
- Centralized version inside OGSA-DAI R3
- SQL subset:
  SELECT column [, column] FROM table
  WHERE condition [AND|OR condition]
  ORDER BY column [, column] [ASC|DESC]
- Operators are XQuery-based (using SAXON)
Future Work
- Evolve to a distributed mediator
- Investigate the use of a proxy database to increase performance
- Improve tools for
  - installation/setup
  - maintenance
- Semantic issues
- Performance issues
References
- GridMiner project page: http://www.gridminer.org
- OGSA-DAI and DQP: http://www.ogsadai.org
- SAXON XSLT/XQuery Processor: http://saxon.sf.net/