Grid DAIS: Database Access and Integration Services
Greg Riccardi
Florida State University
riccardi@cs.fsu.edu

Overview of Presentation
- Goals of DAIS
- Conceptual model of Grid database access
- Examples of client-service interactions
  - Discovery and creation of services
  - Asynchronous query processing and datasets
  - Updating from datasets
- Representing Sky Query in DAIS
- Other topics/issues

Goals of DAIS
The group seeks to promote standards for the development of grid database services, focusing principally on providing consistent access to existing, autonomously managed databases. A DAIS service should:
- Provide service-based access to existing data management systems.
- Accommodate several widely used data management paradigms (e.g., relational, object, XML) within a consistent framework.
- Provide enough information about itself that it can be used given only the specification of the service and the metadata the service provides.
- Peacefully coexist with other Web and Grid Service standards.
- Be orthogonal to Grid authentication and authorization mechanisms.
- Support higher-level information-integration and federation services.

Desirable Properties of DAIS Systems
- OGSI/A compliant (letter and spirit)
- Pluggability/extensibility
- Different kinds of data resources
- Many access mechanisms
- Evolvable
- Easy to understand and apply
- GridServices and WebServices applicable
- Supports current technology
- Access AND integration
- Existing standards/designs
- Tooling
- Integration of different models at the data level
- Implementable
- Can be integrated into customer scenarios
- Technology independent

The Model: External Artifacts
[Diagram: external data resource manager (DBMS), external data resource (DB), external data set (Resultset).]
External = external to the OGSI-compliant grid.

The Model: Logical Artifacts
[Diagram: data resource manager (over the DBMS), data resource (over the DB), data activity session, data request, data set (over the Resultset).]

Data Resource Manager
External data resource manager (edrm)
- A data management system such as a relational database management system or a file system
Data resource manager (drm)
- A grid service that represents the external data resource manager
- Binds to an existing edrm
- Supports management operations such as start and stop
- Mainly out of scope of DAIS; a placeholder for interaction with other working groups

Data Resources
External data resource (edr)
- A data construct managed by the external data resource manager, for example a database or a directory structure
- An external data resource manager may manage many external data resources
Data resource (dr)
- A grid service that represents an external data resource
- Represents the point of contact to the data structures managed by the edrm
- Exposes metadata about the structure of the edr
- Defines the target for queries across the edr
- Can act as a notification source for notifications associated with the edr
- Is bound to an existing or newly created edr
- Has similarities with a data set (more on this later)
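
The wrapper relationship described on the Data Resource Manager and Data Resources slides (a drm representing an edrm, and a dr bound to an edr and exposing its structural metadata) can be sketched in a few lines. This is only an illustration under stated assumptions, not the DAIS interface: the class names, `create_resource`, and `metadata` are hypothetical, plain Python objects stand in for grid services, and an in-memory SQLite database stands in for the external data resource.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataResource:
    """dr: wrapper bound to one external data resource (edr)."""
    name: str
    connection: sqlite3.Connection  # stands in for the edr (a database)

    def metadata(self) -> Dict[str, List[str]]:
        """Expose structural metadata: table name -> column names."""
        tables = [row[0] for row in self.connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            # PRAGMA table_info yields (cid, name, type, ...); index 1 is the name
            table: [col[1] for col in self.connection.execute(
                f"PRAGMA table_info({table})")]
            for table in tables
        }


@dataclass
class DataResourceManager:
    """drm: wrapper representing the edrm; may manage many resources."""
    resources: Dict[str, DataResource] = field(default_factory=dict)

    def create_resource(self, name: str) -> DataResource:
        """Create a new edr (an in-memory database) and bind a dr to it."""
        resource = DataResource(name, sqlite3.connect(":memory:"))
        self.resources[name] = resource
        return resource


# Usage: create a resource through the manager, then inspect its metadata.
drm = DataResourceManager()
dr = drm.create_resource("sky")
dr.connection.execute("CREATE TABLE objects (id INTEGER, ra REAL, dec REAL)")
print(dr.metadata())  # {'objects': ['id', 'ra', 'dec']}
```

In this sketch the manager only creates and tracks resources; start and stop operations are left out because the slides treat them as largely outside the scope of DAIS.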

Data Sets
External data set (eds)
- Data logically separated from an external data resource manager
- Could be a snapshot (query) of a relational database or data generated by some process prior to being inserted into a database
- Will be typed and identifiable
Data set (ds)
- A service wrapper for the eds
- Exposes metadata about the type, description, and format of the eds
- Immutable
- Exposes simple data access operations depending on the type of data: getAllData, createIterator, getTuple, getFile, getByte, etc.
- Can be moved while maintaining its handle and data identity
- Can be copied or replicated while maintaining its data identity
- Can be delivered to a data manager for persistence
- Query and update could be supported

Putting It Together
[Diagram: the external world (edrm, edr, eds) alongside the DAIS world (drm, dr, das, ds) and a data requester. Arrow labels: create, bind, bind/create, locate, access, data request. Note on slide: Logical Artifact = Service.]

Exploiting The Logical Artifacts: Data Sets
[Diagram: Analyst1 working with edr, dr, das, and ds services and their GSHs. Arrow labels: launch, create, reference, query, move, copy, move service, copy service, insert/update, target details.]

Client-Server Interaction Patterns
[Diagram: eight numbered interaction patterns grouped under Retrieve, Update/Insert, and Pipeline. The node and message labels are single letters whose meaning is not recoverable from the extracted text.]

Examples of client-service interactions: Discovery and creation of services
[Diagram: A, drm, dr, das, External Data Resource Manager, Database. Message labels: Create, create, GSH, Query, Result; steps numbered 1 to 3.]

Examples of client-service interactions: Asynchronous query processing and datasets
[Diagram: A, C, das, ds, Database. Message labels: Query, create, Id, Get, Result; steps numbered 1 to 5.]

Examples of client-service interactions: Updating from datasets
[Diagram: A, das, ds, Database. Message labels: Update(Id), Get, Result, Status; steps numbered 1 to 3.]

Example of performance estimation
[Diagram: A, das, Database. Message labels: Query, prepare, Statistics, Estimate, Status.]

SkyQuery Cross Match Query

Cross Match Estimation in DAIS
[Diagram: Client, Query Manager, three das services, three Databases. Message labels: Query, Prepare, Spatial Query, Statistics.]

Cross Match in DAIS
[Diagram: Client, Query Manager, three das services, data sets ds1, ds2, ds3, three Databases. Message labels: Query, Run Query, Match Id1, Match Id2, create, get, Id1, Id2, Id3, Result.]

Other topics and issues for DAIS
- Data provenance management
- Transaction management
- Fault tolerance
- Security, logging, auditing
- Supporting many concurrent users
- Establishing the identity and provenance of datasets
- Creating pipelines and other workflows
- Querying streams of data
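
The dataset-based examples above (a query that returns an Id, a later Get against the data set, and an update driven from a data set) can be sketched as follows. This is a minimal illustration, not the DAIS interface: every class and method name is hypothetical, `get_all_data` and `create_iterator` are Python-style stand-ins for the getAllData and createIterator operations named on the Data Sets slide, "das" is assumed to be the data activity session, and SQLite stands in for the databases. The query itself runs synchronously here; only the handle-based retrieval of results is shown.

```python
import sqlite3
import uuid
from typing import Dict, Iterator, List, Tuple


class DataSet:
    """ds: immutable wrapper around a query result (the eds)."""

    def __init__(self, rows: List[Tuple]) -> None:
        self._rows = tuple(rows)  # a tuple keeps the snapshot immutable

    def get_all_data(self) -> Tuple[Tuple, ...]:
        return self._rows

    def create_iterator(self) -> Iterator[Tuple]:
        return iter(self._rows)


class DataActivitySession:
    """das: runs requests against one database, hands back data set ids."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection
        self._datasets: Dict[str, DataSet] = {}

    def query(self, sql: str) -> str:
        """Run the query, create a data set, and return only its id."""
        rows = self._connection.execute(sql).fetchall()
        dataset_id = str(uuid.uuid4())
        self._datasets[dataset_id] = DataSet(rows)
        return dataset_id

    def get(self, dataset_id: str) -> DataSet:
        """Look up a previously created data set by its id."""
        return self._datasets[dataset_id]

    def update_from(self, dataset: DataSet, insert_sql: str) -> int:
        """Insert every tuple of a data set; return the number of rows."""
        rows = dataset.get_all_data()
        self._connection.executemany(insert_sql, rows)
        return len(rows)


# Usage: query one database, then load the result into another.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE stars (id INTEGER, mag REAL)")
source.executemany("INSERT INTO stars VALUES (?, ?)", [(1, 4.5), (2, 7.1)])
das = DataActivitySession(source)
dataset_id = das.query("SELECT id, mag FROM stars WHERE mag < 5")
print(das.get(dataset_id).get_all_data())  # ((1, 4.5),)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE bright (id INTEGER, mag REAL)")
target_das = DataActivitySession(target)
inserted = target_das.update_from(
    das.get(dataset_id), "INSERT INTO bright VALUES (?, ?)")
print(inserted)  # 1
```

Returning an id rather than the rows is what lets a client, or a different client, fetch the result later or pass it to another service, which is the role the data set plays in the cross-match example.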