OGSA-DAI data access and integration NERC GridGIS workshop

advertisement
OGSA-DAI
data access
and integration
NERC GridGIS workshop
eSI, 1 February 2006
Neil Chue Hong
Project Manager, EPCC
N.ChueHong@epcc.ed.ac.uk
+44 131 650 5957
Overview
• The Data Deluge
– challenges of increasing data availability
– benefits of bringing data together
• OGSA-DAI
– overview
– use as a data integration base layer
NERC GridGIS workshop - 1 February 2006
2
Data Services: challenges to management
• Scale
– Many sites, large collections, many uses
• Longevity
– Research requirements outlive technical decisions
• Diversity
– No “one size fits all” solutions will work
– Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
– Independently owned & managed
– No common goals
– No common design
– Work hard for agreements on foundation types and ontologies
– Autonomous decisions change data, structure, policy, …
– Geographically distributed
• and I haven’t even mentioned security yet!
NERC GridGIS workshop - 1 February 2006
6
What is a data service?
• An interface to a stored collection of data
– e.g. Google and Amazon
– web services
• But the data could be:
–
–
–
–
–
replicated
shared
federated
virtual
incomplete
• Don’t care about the underlying representation
– do care about the information it represents
• Adding a service layer to existing data sources can improve
composability
NERC GridGIS workshop - 1 February 2006
8
Use Cases for Data Services
• Data Filtering:
– Single source producing large amounts of data distributed to many sites
downstream
• Data Discovery:
– many sources, many query entry points in a linked system
• Data Translation:
– source to sink, conversion of data model / structure
• Data Federation:
– many sources, linked to provide view as a single source
• Data Replication
– full or partial copies to improve throughput
• Data Integration (model aggregation)
– e.g. integration of time variant data, streams, files
• Data Integration (knowledge expansion)
– forming links between databases to increase knowledge
NERC GridGIS workshop - 1 February 2006
10
OGSA-DAI In One Slide
• An extensible framework for data
access and integration.
• Expose heterogeneous data
resources to a grid through web
services.
• Interact with data resources:
– Queries and updates.
– Data transformation / compression
– Data delivery.
• Customise for your project using
– Additional Activities
– Client Toolkit APIs
– Data Resource handlers
• A base for higher-level services
– federation, mining, visualisation,…
NERC GridGIS workshop - 1 February 2006
13
The OGSA-DAI Framework
Application
Client Toolkit
OGSA-DAI service
Engine
SQLQuery
readFile
XPath
XSLT
GZip
GridFTP
Activities
JDBC
XMLDB
File
Data
Resources
SQL
MySQL
DB2
Server
XIndice
SWISS
PROT
Databases
NERC GridGIS workshop - 1 February 2006
17
Intermediary
• Simple intermediary
Request &
Response
OGSA-DAI
DR messages
Client
Request &
Response
OGSA-DAI
DR messages
Data
Resource
Client
Data
Resource
– potential to accelerate development, logging, or filtering
• Persistent intermediary
Request &
Response
OGSA-DAI
DR messages
OGSA-DAI
Private Store
Client
Data
Resource
– e.g. to allow efficient local indexing
NERC GridGIS workshop - 1 February 2006
18
DR messages
po
rt
&
Da
ta
Tr
an
s
sa
ge
s
e
DR1
consumer
Client
Request &
Response
DR
OGSA-DAI
es
m
s
ge
sa
DR messages
DR
m
es
e
qu
Re
sa
ge
s
Data
Resource
DR messages
Data
Resource
po
ns
e
OGSA-DAI
m
es
DR2
Data
Resource
po
ns
,R
es
R
es
st
&
Re
qu
e
st
DR messages
Data
Resource
R
eq
ue
OGSA-DAI
Data
delivery
Client
Data
Resource
DR
Request &
Response
s
ge
sa
Data
Resource
DR
OGSA-DAI
es
m
Data
Resource
• Allowing composition and decentralisation
Data
Resource
Redirector, Coordinator, Network
DR3
es
,R
st
DR messages
DR
m
es
sa
ge
s
Data
Resource
s
ge
sa
Data
Resource
DR
OGSA-DAI
es
m
Data
Resource
Data
Resource
Data
Resource
DR2
rt
po
sa
ge
s
s
an
m
es
Tr
DR
ta
Da
DR messages
&
OGSA-DAI
DR1
e
Request &
Response
s
ge
sa
Data
Resource
ns
po
Client
DR
es
m
DR3
NERC GridGIS workshop - 1 February 2006
19
Extensibility Example
OGSA-DAI service
Engine
SQLQuery
Multiple
JDBC
SQL GDS
SQL
SQL
JDBC
JDBC
MySQL
SQL
SQL
JDBC
JDBC
NERC GridGIS workshop - 1 February 2006
20
Map Retrieval: Current
browser
Internet
EDINA
OGC
Service
GIS
Oracle
NERC GridGIS workshop - 1 February 2006
21
Map Retrieval: Grid Prototype
Basic client to demonstrate
proof of concept
EDINA
SO-OGC
Client
OGC
OGSA-DAI 1
GIS
NERC GridGIS workshop - 1 February 2006
Oracle
22
Map Retrieval: Security
• Exploit NGS infrastructure to provide secure access layer
EDINA
NGS Authentication
Allowed users dn
SO-OGC
Portlet
OGC
ODS 1
GIS
NERC GridGIS workshop - 1 February 2006
Oracle
23
Map Retrieval: Integration
• Exploit OGSA-DAI extensibility to add e.g. overlay
JDBC
NGS Authentication
Oracle
Census
ODS 1
SQL/XML
SO-OGC
Portlet
OGC
ODS 2
GIS
Oracle
SO-OGC
ODS 3
NERC GridGIS workshop - 1 February 2006
Application data
24
OGSA-DAI / EDINA prototyping work
• Stage 1: Using existing OGSA-DAI technology
• Stage 2: Extending OGSA-DAI
GIS
Client
InputURL
Parameters
Image/XML File
OGSA-DAI
service
DeliverFrom
GIS
Activities
URL
HTTP Data
Resource
HTTP Request
WMS
HTTP Response Server
NERC GridGIS workshop - 1 February 2006
25
Distributed Query Processing
• Higher level services building on
OGSA-DAI
– specialised metadata extraction
3,4
• Execute queries in parallel over multiple
op_call
(Blast)
exchange
data resources
• Queries mapped to algebraic
hash_join
(proteinId)
expressions for evaluation
• Parallelism represented by partitioning
reduce
queries
–Use exchange operators
• Equality based joins in current release
– supported types: long, integer, string,
double and float
reduce
exchange
reduce
1
table_scan
(protein)
NERC GridGIS workshop - 1 February 2006
2
table_scan
termID=S92
(proteinTerm)
28
DQP architecture
Query
SQL & OQL
OGSA-DAI activity
Co-ordinator
Evaluator
Evaluator
Evaluator
WS-I only
Using client toolkit
OGSA
-DAI
OGSA
-DAI
OGSA
-DAI
OGSA
-DAI
All interfaces that are
supported by toolkit
NERC GridGIS workshop - 1 February 2006
29
Contributing to OGSA-DAI
• Additional functionality:
–
–
–
–
Provide activities which implement specific functionality
Provide extra client functionality
Provide different security mechanisms
Provide higher level components and applications
• Different levels of contributions
– Based on OGSA-DAI?
– Works with OGSA-DAI?
– Part of OGSA-DAI?
NERC GridGIS workshop - 1 February 2006
37
In the near future
• A new version of the OGSA-DAI Engine
– should look mostly the same externally
– better support for concurrency, sessions and monitoring
• Implementing new versions of specifications
– DAIS Specifications
• Key things that we will be addressing:
–
–
–
–
–
Performance
A Security Model which can be applied across platforms
Full Transactions framework, distributed transactions
More data integration facilities
Better abstraction over DBMS variation
• Application centric queries
– collaborating with other projects
• Research projects looking at:
– schema mapping
– extended data resources
NERC GridGIS workshop - 1 February 2006
38
Associated Meetings and Workshops
• DIALOGUE Workshops (http://www.datagrids.org)
– Data Integration Applications: Linking Organisations to Gain
Understanding and Experience
– Bringing together Data Integration middleware and application
providers with users
– Next one at NeSC: 9-10th February 2006
– http://www.nesc.ac.uk/esi/events/636/
• Next Generation Distributed Data Management (HPDC15,
Paris)
– http://www.isi.edu/~annc/distributedDataWorkshop.html
• Data Management on Grids (VLDB’06, Seoul)
NERC GridGIS workshop - 1 February 2006
39
Conclusions
• The benefits of trying to integrate data are hindered by
challenges such as heterogeneity, scale and distribution
• A common data service layer should make data integration
easier
• OGSA-DAI provides an extensible, data service based
framework which makes it easier to implement data
integration
• GIS data is amenable to integration using data services
NERC GridGIS workshop - 1 February 2006
40
Further information
• The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk
• The DAIS-WG site:
– http://forge.gridforum.org/projects/dais-wg/
• OGSA-DAI Users Mailing list
– users@ogsadai.org.uk
– General discussion on grid DAI matters
• Formal support for OGSA-DAI releases
– http://bugs.ogsadai.org.uk/
• OGSA-DAI training courses
NERC GridGIS workshop - 1 February 2006
41
Download