Service-Based Distributed Query Processing on the Grid
Declarative Grid Service Orchestration with OGSA-DQP
Alvaro A A Fernandes
Department of Computer Science, University of Manchester
16-17 October 2003
Grids and Applied Language Theory: Declarative Grid Service Orchestration with OGSA-DQP (A A A Fernandes)

places, people, funding, projects
Manchester: M Nedim Alpdemir, Anastasios Gounaris, Norman W Paton, Alvaro A A Fernandes, Rizos Sakellariou
Newcastle upon Tyne: Arijit Mukherjee, Jim Smith, Paul Watson

motivation
• Pull by applications:
– overwhelming amounts of semantically complex data in
– very diverse, structurally dissimilar, and autonomous, geographically dispersed data sources
– requiring computationally demanding analysis.
• Push from context and infrastructure:
– Web service impetus combined with
– Grid abstractions and protocols that enable,
– not just dynamic resource discovery but also,
– dynamic resource allocation and use.

context
1. High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid.
2. Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available.
3. Distributed query processing technology is one approach to delivering (1.) given the availability of (2.).
4. Declarative service orchestration falls out.

OGSA-DQP approach
[Figure: a query goes into OGSA-DQP and results come back; OGSA-DQP sits over two OGSA-DAI wrappers, each in front of a DBMS and its data.]
• OGSA-DQP uses a middleware approach.
• It can be seen as a mediator over OGSA-DAI wrappers.
• It promises bottom lines regarding:
– efficiency: "leave it to OGSA-DQP to schedule in parallel";
– effectiveness: "leave it to OGSA-DQP to orchestrate your services";
– usability: "use it as a Grid data service".

OGSA-DQP example
• Given two DBMSs and one analysis tool (e.g., a WS):
– proteinTerm, held in a GO Gene Ontology database running as a remote MySQL DB,
– protein, held in a GIMS Genome Warehouse running as a remote ODMG-compliant DB,
– Blast (sequence alignment scoring);
• We can obtain alignment scores for a sequence against proteins of a certain kind:
    select p.proteinId, Blast(p.sequence)
    from protein p, proteinTerm t
    where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
• Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid.
[Figure: partitioned query plan, top to bottom: reduce over op_call(Blast) over exchange (partition labelled 3,4); hash_join (proteinId) over two exchange operators (partition labelled 2); reduce over table_scan (protein) (partition labelled 1) and reduce over index_scan termId=GO:0005942 (proteinTerm) (partition labelled 5).]

OGSA-DQP extends/depends on
extends
• Leonidas Fegaras's DB system and OPTGEN optimiser generator. [1997-2000]
• Polar: a parallel query processing engine. [1998-2001]
• Polar*: an MPICH-G distributed extension of Polar. [2002]
• Leonidas Fegaras and David Maier's work on a formal semantics for OQL. [TODS 25(4), 2000]
depends on
• OGSA/OGSI/GT3 Grid Services (GSs).
• OGSA-DAI Grid Data Services (GDSs).

OGSA-DQP manages/provides
provides
• Grid Distributed Query Services (GDQSs) that:
– interact with clients;
– find and retrieve service descriptions;
– parse, compile, partition and schedule the query execution over a union of distributed data sources.
• The query plan is an orchestration of GQESs.
manages
• Grid Query Evaluation Services (GQESs) that:
– implement the physical query algebra;
– implement the query execution model and semantics;
– run a partition of a query execution plan generated by a GDQS;
– interact with other GQESs/GDSs/WSs but not with clients.

OGSA-DQP a brief tour (1)
• It builds upon GDSs, which build upon GSs.
• A GDS is a leaf in a query execution plan, from which data ultimately flows upwards.
• Data resources are, thereby, virtualised.
• Since they are GSs, they can be dynamically created by dynamically discovered factories and then disposed of.
• A GDQS is a GDS capable of integration and distributed retrieval and analysis of data.
• To perform a request, a GDQS spawns as many GQESs on as many hosts as the partitioning and scheduling policies of the GDQS recommend for that request.

OGSA-DQP a brief tour (2)
• To obtain an execution plan, a GDQS:
– Interacts with registries to fetch information about the data and computational services deemed of interest by the requestor;
– Interacts with GDSs and (in future) Index Services to acquire relevant metadata;
– Compiles, optimises, partitions and schedules the query execution.

OGSA-DQP a brief tour (3)
• Given a distributed query plan, a GDQS:
– Interacts with GDS factories to create the leaf services in the plan;
– Interacts with WSs that front-end analysis capabilities;
– Commands the creation of GQESs as stipulated by the partitioning and scheduling decided on by the compiler;
– Coordinates the GQESs into executing the plan.
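The three steps of the tour (obtain a plan, create the services it needs, coordinate their execution) can be sketched as a toy, in-memory model of the example query. This is only an illustration under stated assumptions, not OGSA-DQP code: the names `Partition`, `plan_query`, `run_plan` and `blast_stub`, the host labels, the sample rows and the scoring stub are all hypothetical, and real GDQSs and GQESs interact through Grid service interfaces rather than local function calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical stand-ins for the two remote databases exposed through GDSs.
PROTEIN: List[Row] = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIPFM"},
    {"proteinId": "P3", "sequence": "MKKLLPTA"},
]
PROTEIN_TERM: List[Row] = [
    {"proteinId": "P1", "termId": "GO:0005942"},
    {"proteinId": "P2", "termId": "GO:0000001"},
    {"proteinId": "P3", "termId": "GO:0005942"},
]


def blast_stub(sequence: str) -> int:
    """Placeholder for the Blast web service; NOT a real alignment score."""
    return sum(1 for ch in sequence if ch in "MK")


@dataclass
class Partition:
    """One sub-plan, run by one (hypothetical) evaluator on one host."""
    name: str
    host: str
    inputs: List[str]  # names of the partitions whose output is consumed
    run: Callable[[Dict[str, List[Row]]], List[Row]]


def plan_query(term_id: str) -> List[Partition]:
    """Stands in for compile/partition/schedule; the plan here is fixed."""

    def scan_protein(_: Dict[str, List[Row]]) -> List[Row]:
        # table_scan(protein) followed by reduce(proteinId, sequence)
        return [{"proteinId": r["proteinId"], "sequence": r["sequence"]}
                for r in PROTEIN]

    def scan_terms(_: Dict[str, List[Row]]) -> List[Row]:
        # index_scan(termId = term_id) followed by reduce(proteinId)
        return [{"proteinId": r["proteinId"]}
                for r in PROTEIN_TERM if r["termId"] == term_id]

    def hash_join(inp: Dict[str, List[Row]]) -> List[Row]:
        build = {r["proteinId"] for r in inp["scan_terms"]}   # build side
        return [r for r in inp["scan_protein"]                # probe side
                if r["proteinId"] in build]

    def call_blast(inp: Dict[str, List[Row]]) -> List[Row]:
        # op_call(Blast) followed by reduce(proteinId, score)
        return [{"proteinId": r["proteinId"],
                 "score": blast_stub(r["sequence"])}
                for r in inp["hash_join"]]

    # Partitions are listed in dependency order (leaves first).
    return [
        Partition("scan_protein", "hostA", [], scan_protein),
        Partition("scan_terms", "hostB", [], scan_terms),
        Partition("hash_join", "hostC",
                  ["scan_protein", "scan_terms"], hash_join),
        Partition("call_blast", "hostD", ["hash_join"], call_blast),
    ]


def run_plan(partitions: List[Partition]) -> List[Row]:
    """Stands in for coordination: run each partition once its inputs exist."""
    outputs: Dict[str, List[Row]] = {}
    for p in partitions:
        outputs[p.name] = p.run({name: outputs[name] for name in p.inputs})
    return outputs[partitions[-1].name]


results = run_plan(plan_query("GO:0005942"))
print(results)
```

In this sketch the hand-off of one partition's output to the next plays the role that the exchange operators play in the plan, and the sequential loop replaces what would be parallel execution across hosts.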
what is going on behind the scenes (1)
[Figure: interaction diagram among the Client, the Registry (GDSR), the Factory (GDQSF), the GDQS, GDS instances and GQES 1 ... GQES n. The numbered interactions are labelled registerService, createService, importSchema, findServiceData, findServiceData(DBSchema), perform(query), perform(gqes_query) and perform(querySubPlan).]

what is going on behind the scenes (2)
[Figure: execution of the example query across nodes N0 to N4. The Client sends perform(Query) to the GDQS, which uses GQESF factories (createService) to create GQES 1 to GQES 3 and sends each a perform(QuerySubplan); results flow back to the Client. Operators shown: sequential_scan and reduce (proteinID, sequence); sequential_scan (term=8372) and reduce (proteinID); hash_join (p.proteinID=t.proteinID); operation_call blast(p.sequence), calling the BLAST Web Service; reduce (p.proteinID, blast).]

the Khalaf-Leymann taxonomy for web services aggregation
[Figure: taxonomy tree rooted at aggregation; node labels include unconstrained, constrained, grouping, recursive, wiring, choreography, service domains and agreements.]

OGSA-DQP various kinds of service aggregation
• There is interface inheritance from GSs and GDSs.
• The execution plan can be seen as encapsulating a wiring of GQESs,
• But constrained, and constructed on-the-fly, as in an orchestration.
• As in service domains, there is competition of GQESs for a role to play in the orchestration.
• As in agreements, the orchestration is opportunistic, responsive to the prevailing resource levels, and short-lived.

summary
• OGSA-DQP is a service-based distributed query processor for the Grid that is:
– Exposed as a service;
– Implemented as an orchestration of services.
• OGSA-DQP is an enactor of declarative Grid service orchestrations that:
– Improves on Grid portals when only retrieval and analysis are involved;
– Fills the gap left by the lack of a service orchestration framework in the OGSA.

where to find out more: papers
1. M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. An Experience Report on Designing and Building OGSA-DQP: A Service Based Distributed Query Processor for the Grid. GGF9 Workshop on Designing and Building Grid Services, 2003.
2. M N Alpdemir, A Mukherjee, A Gounaris, N W Paton, P Watson, A A A Fernandes, J Smith. Service-Based Distributed Querying on the Grid. 1st Int. Conf. on Service Oriented Computing, 2003. LNCS, to appear.
3. M N Alpdemir, A Mukherjee, A Gounaris, N W Paton, P Watson, A A A Fernandes, J Smith. OGSA-DQP: A Service-Based Distributed Query Processor for the Grid. 2nd UK e-Science All Hands Meeting, 2003.
4. J Smith, A Gounaris, P Watson, N W Paton, A A A Fernandes, R Sakellariou. Distributed Query Processing on the Grid. GRID 2002, LNCS 2536.
(papers available from http://www.cs.man.ac.uk/~alvaro/publications.html )

where to find out more: software
• OGSA-DQP: Grid middleware to query distributed data sources. www.ogsadai.org.uk/dqp
• OGSA-DAI: Grid middleware to interface with data(bases). www.ogsadai.org.uk/
• Globus Toolkit: Open-source implementation of OGSA/OGSI. www.globustoolkit.org/