A Grid Data Integration Service (OGSA-DQP) Paul Watson, University of Newcastle-upon-Tyne

based on the work of…
Norman Paton, Tasos Gounaris,
Alvaro Fernandes, Rizos Sakellariou
University of Manchester
Jim Smith, Arijit Mukherjee, Paul Watson
University of Newcastle-upon-Tyne
www.neresc.ac.uk
The Problem
• Many grid applications would benefit from access to distributed data
• Data sources are scattered and autonomous
• Integration is often done by tedious manual process
  • or (recently) hand-coded workflows
• We are interested in how to simplify the process of querying distributed data
  • Focussing initially on information held in (relational) databases
Distributed Query Processing
• Queries are expressed in OQL
  • allows computations to be included in the query
• A single query may reference data at multiple sites
  • the data locations may be transparent to the query author
select p.proteinId, Blast(p.sequence)
from protein p, proteinTerm t
where t.termId = 'S92' and
      p.proteinId = t.proteinId
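To make the meaning of the query concrete, here is a minimal single-machine sketch in Python of what it computes: filter proteinTerm on termId, join to protein on proteinId, and apply a function to each matching sequence. The sample rows, `fake_blast` and `run_query` are hypothetical names invented for illustration; they are not part of OGSA-DQP, and the real Blast call would go to a remote service.

```python
# Hypothetical sample rows standing in for the two remote tables.
PROTEINS = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIPFW"},
    {"proteinId": "P3", "sequence": "STCMNQ"},
]
PROTEIN_TERMS = [
    {"proteinId": "P1", "termId": "S92"},
    {"proteinId": "P2", "termId": "S10"},
    {"proteinId": "P3", "termId": "S92"},
]


def fake_blast(sequence):
    """Placeholder for Blast(); assumption: the real call invokes a remote service."""
    return f"hits-for-{sequence}"


def run_query(proteins, protein_terms, term_id, blast):
    """Evaluate the example query locally.

    Assumes each (proteinId, termId) pair occurs at most once, so collecting
    the matching ids in a set does not change the result.
    """
    # where t.termId = 'S92'
    wanted = {t["proteinId"] for t in protein_terms if t["termId"] == term_id}
    # where p.proteinId = t.proteinId; select p.proteinId, Blast(p.sequence)
    return [
        (p["proteinId"], blast(p["sequence"]))
        for p in proteins
        if p["proteinId"] in wanted
    ]
```

Calling `run_query(PROTEINS, PROTEIN_TERMS, "S92", fake_blast)` returns one pair per protein annotated with term S92. The point of the declarative query is that the author writes only the three-line OQL form and leaves choices such as where the filter, join and function call run to the system.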
Query Compiler
OGSA-DQP automatically compiles and executes the query on a set of Grid nodes, in parallel where possible
[Figure: query compiler pipeline. An OQL query passes through the Parser, Logical Optimiser, Physical Optimiser, Partitioner and Scheduler before reaching the Evaluator; the stages are grouped into a single-node optimiser and a multi-node optimiser.]
Execution Plan
select p.proteinId, Blast(p.sequence)
from protein p, proteinTerm t
where t.termId = 'S92' and
      p.proteinId = t.proteinId
• The plan is split into a set of partitions
• Grid resources are acquired to execute the partitions
  • in parallel where possible, required and affordable
[Figure: execution plan. table_scan (protein) and table_scan (proteinTerm, termID=S92) each feed a reduce and an exchange; these meet at hash_join (proteinId), followed by exchange, op_call (Blast) and reduce. Partition labels shown: 1, 2, 3-8 and 9,10.]
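The role of the exchange and hash_join operators in the plan above can be sketched in Python as follows. This is an illustrative in-memory version under stated assumptions, not the OGSA-DQP implementation: `exchange`, `hash_join` and `partitioned_join` are hypothetical helper names, rows are plain dictionaries, and the partitions are processed in a loop rather than on separate Grid nodes.

```python
from collections import defaultdict
from zlib import crc32


def exchange(rows, key, n_partitions):
    """Route each row to a partition chosen by hashing its join key.

    crc32 is used so the routing is stable across runs; rows with equal
    keys always land in the same partition.
    """
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        index = crc32(str(row[key]).encode()) % n_partitions
        partitions[index].append(row)
    return partitions


def hash_join(build_rows, probe_rows, key):
    """In-memory hash join: build a table on one input, probe with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]


def partitioned_join(left, right, key, n_partitions):
    """Join each pair of matching partitions independently.

    Each pair could run on a different node; here they run one after another.
    """
    left_parts = exchange(left, key, n_partitions)
    right_parts = exchange(right, key, n_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(hash_join(left_part, right_part, key))
    return result
```

Because both inputs are routed with the same hash function, every pair of rows that can join ends up in the same partition, so the partitions can be joined independently. That independence is what allows a plan of this shape to be spread over several machines.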
Evaluation on the Grid
• OGSA-DQP builds on OGSA-DAI
  • accesses relational databases wrapped by OGSA-DAI
    • Oracle, DB2, MySQL
• Data streams between nodes
  • flow control
• All services are OGSI-compliant
  • built on GT3
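The slide does not say how flow control between nodes is implemented. As a generic illustration of the idea, the following Python sketch streams rows from a producer to a consumer through a bounded buffer, so a fast producer is paused whenever the consumer falls behind. The names `producer`, `consume` and `stream`, the sentinel, and the use of threads in place of Grid nodes are all assumptions made for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def producer(rows, channel):
    """Send rows downstream; put() blocks while the buffer is full."""
    for row in rows:
        channel.put(row)
    channel.put(_DONE)


def consume(channel):
    """Receive rows until the end-of-stream sentinel arrives."""
    received = []
    while True:
        item = channel.get()
        if item is _DONE:
            return received
        received.append(item)


def stream(rows, buffer_size=2):
    """Stream rows through a bounded buffer of at most buffer_size items."""
    channel = queue.Queue(maxsize=buffer_size)
    worker = threading.Thread(target=producer, args=(rows, channel))
    worker.start()
    received = consume(channel)
    worker.join()
    return received
```

The bounded `queue.Queue` is the whole mechanism here: the buffer never holds more than `buffer_size` rows, whatever the relative speeds of the two sides, and order is preserved.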
Execution on the Grid
[Figure: execution on the Grid. A Client issues perform(Query) to the GDQS and receives results. GQES instances (GQES 1, GQES 2, GQES 3) are created through GQES factories with createService and receive perform(QuerySubplan). Sub-plans shown across nodes N0 to N4: sequential_scan (term=8372) with reduce (proteinID); sequential_scan with reduce (proteinID, sequence); hash_join (p.proteinID=t.proteinID); and operation_call blast(p.sequence) against Web Services (BLAST) with reduce (p.proteinID, blast). Other labels: GDS, GDT, GDQ.]
Mutual Benefit
The Grid needs DQP:
• Declarative, high-level resource integration with implicit parallelism
• Cost-based optimisation
DQP needs the Grid:
• Systematic access to remote data and computational resources
• Dynamic resource discovery and allocation
Summary
• DQP is a potentially important technology for the Grid
• OGSA-DQP supports:
  • declarative expression of queries
  • location transparency
  • access to both data and computational resources
  • dynamic deployment on Grid resources
  • implicit parallelism
• First release made in September 2003
  • available for download
• Dynamic adaptation now being investigated
  • fault-tolerance, performance, cost
Experiences and Issues
• Remote service deployment not yet available for Grids, but some work…
• PhD Project at Newcastle (Chris Fowler)
  • dynamically deploy individual services remotely
  • initial prototype by end of November 2003
  • working on security issues
  • WS only
• GridShed project (Newcastle + BT)
  • design of hosting environments for Grids
  • install execution images on nodes as required
Experiences & Issues
• DQP vs Workflow?
  • for what space of problems is each better?
• DQP advantages?
  • declarative expression of intent
  • cost-based choice of execution plans
  • implicit parallelisation
• Investigating with Bioinformatics applications in the myGrid project
  • DQP with workflows & workflows with DQP
Projects/Sponsors
Projects
• OGSA-DAI
• Polar
• Polar*
• myGrid
Sponsors