MAGIK-I: Managing Grids Containing Information and Knowledge that are Incomplete

Scenario: a Semantic Grid for Astronomers

Imagine a Semantic Grid that is used by astronomers around the world for pooling their data and for running computations on that data.

Underpinning their Grid is a “semantic layer”: an interface that allows both astronomers and Grid components to locate and query information that has been published.

• Databases register a description of the data that they publish.
• Satellites and telescopes advertise data streams.
• Results of computations run on a Grid can be stored.
• Users query the semantic layer to find relevant information.

Figure 1: Scenario: An astronomer’s Grid (diagram labels: Data; Grid; Semantic Layer; Meta data; Publishing and Querying; Information; Results)

Information in a Grid may be Incomplete

Attribute Level
Query: Give me the x-ray flux of all optical sources that are redder than X.
Problem: The precision of the stored data is not detailed enough.
Details: The x-ray fluxes may be found by:
• Extracting sources redder than X from an optical database.
• Finding counterparts in an x-ray database.
If the red source is faint in the x-ray image then it may not appear in the x-ray database. An astronomer with access to the raw x-ray image would be able to estimate an upper bound for the flux.
Requirement: Semantic data should contain details of precision.

Answer Level
Query: Give me the x-ray flux of all objects within distance D of position P of the sky.
Problem: Some data sources may be temporarily unavailable.
Details: To obtain an answer to the query, all databases that contain x-ray flux information about the region with radius D, centred on position P of the sky, will be contacted. One of these databases may be unavailable at the time of the query, but still registered.
Requirement: Return the answer currently available, along with details of the source currently unavailable.

Global Level
Query: Give me all galaxies that are not detected in the radio range.
Problem: Sources may be incomplete.
Details: Say we have two databases:
1. Galaxy database
2. Radio emitting objects database
The radio database is incomplete. Some galaxies emitting radio signals are missing from database 2. An answer returned by taking the galaxies from database 1 and removing objects found in database 2 could be wrong.
Requirement: Mechanism to deal with incomplete sources in query answering.

Aim and Objectives of MAGIK-I

MAGIK-I aims to develop a logical model for integrating and querying incomplete information that is published on a Grid. Problems addressed will include:
• How to describe incompleteness in data? New constructs are needed for query languages and data descriptions.
• How to express an answer that takes account of incompleteness?
• What is the meaning of an incomplete answer?
• What algorithms can be devised for integrating data from different sources, and computing (possibly incomplete) answers to a query?

The Framework

R-GMA is a Grid information system built by the DataGrid project. R-GMA has a producer/consumer architecture (figure 2), and includes a mediator that can answer queries posed against a global relational schema. We plan to extend R-GMA’s query execution engine in order to evaluate the concepts and algorithms developed in MAGIK-I.

Figure 2: The architecture of R-GMA (diagram labels: Consumer; Republisher; Producer; Registry; Schema; Data; Register Query; Register Query & View (Q=V); Register View)

Project Team
Werner Nutt
Howard Williams
Andy Cooke

Collaborators
Steve Fisher (RAL)
Robert Mann (ROE)

Website: http://www.macs.hw.ac.uk/magik-i
Email: magik-i@macs.hw.ac.uk
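
Illustration: the galaxy query under incomplete sources

The galaxy query above ("all galaxies that are not detected in the radio range") can be sketched in a few lines of Python. This is only an illustration of why a plain set difference is unsafe, not the MAGIK-I model or any part of R-GMA. The `Source` class, its `complete` and `available` flags, and the split into "certain" and "possible" answers are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical description of a published data source (not an R-GMA type)."""
    name: str
    objects: frozenset
    complete: bool = True   # assumed flag: does the source list every object of its kind?
    available: bool = True  # assumed flag: can the source be contacted right now?


def galaxies_without_radio(galaxy_db: Source, radio_db: Source) -> dict:
    """Answer 'all galaxies not detected in the radio range' with caveats."""
    caveats = []
    # Only use the radio catalogue if it can actually be contacted.
    known_radio = radio_db.objects if radio_db.available else frozenset()
    # Naive answer: galaxies minus known radio emitters.
    candidates = galaxy_db.objects - known_radio

    if not radio_db.available:
        caveats.append(f"{radio_db.name} is registered but currently unavailable")
    if not radio_db.complete:
        caveats.append(f"{radio_db.name} may be missing radio emitters")
    if not galaxy_db.complete:
        caveats.append(f"{galaxy_db.name} may be missing galaxies")

    # A candidate is a certain answer only if the radio catalogue is known
    # to be complete and was actually consulted.
    trusted = radio_db.available and radio_db.complete
    certain = candidates if trusted else frozenset()
    return {"certain": certain, "possible": candidates, "caveats": caveats}


galaxies = Source("galaxy database", frozenset({"G1", "G2", "G3"}))
radio = Source("radio database", frozenset({"G2"}), complete=False)

answer = galaxies_without_radio(galaxies, radio)
print(sorted(answer["certain"]))   # []
print(sorted(answer["possible"]))  # ['G1', 'G3']
print(answer["caveats"])
```

With the radio database marked incomplete, G1 and G3 are returned only as possible answers, since either could be a radio emitter that is missing from database 2. The same structure returns a partial answer plus a note when a registered source is unavailable.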