DIALOGUE

Dialogue DataGrid
• Relational databases, files, XML databases, object stores
• Strongly typed
• Multi-tiered metadata management system
• Incorporates elements from OGSA-DAI, Mobius, caGrid, STORM, DataCutter, GT4 …
• Scales to very large data, high end platforms

Requirements
• Support or interoperate with caGrid, eScience infrastructure
• Interoperate with or replace SRB
• Well defined relationship to Globus Alliance
• Services to support high end large scale data applications
• Design should include semantic metadata management
• Well thought out relationship to commercial products (e.g. Information Integrator, Oracle)

Topics
• Overview of products
  – Identify standard interfaces, where things could be done the same
    • Ship results from one product to another
    • Interface for a data service (DAIS mapping etc)
  – What scenarios do we have?
    • What patterns of queries will be run on a federated system
  – How do we make all our products look like they come from a single vendor?
    • All are fairly non-overlapping
    • Should client APIs look similar, as well as service interfaces?
    • Should security be similar, handled in a similar way
  – Other products, how do they fit
    • Root (from CERN) for data access (object oriented data analysis framework)
    • II, SRB, GridFTP, RFT
• Distributed querying
  – Is this part of DIALOGUE?
• Multi-product data integration
  – E.g. join across Mobius and OGSA-DAI
  – What’s missing from DAIS specs to let us do this
    • Output format
    • Standard interfaces for e.g. query planners, plugins,…
  – Locating relevant data
    • How to do data discovery
  – Common user tool at top level to aid installation, configuration and service development
    • E.g. “Introduce” suite for deploying strongly typed GT4 services
  – Metadata
    • Minimal amounts of metadata for
      – Discovery / Registries
      – Integration
      – Querying
      – Performance
• Hindering uptake
  • Is web services / SOAP / XML the right technology for data integration?
  – Branding
    • How to do distributions outside of just the academic area
    • What’s the model for contributing to something like this
    • Awareness of components
  – Common vision
    • Understanding how to cross-sell projects
    • Multi-project data integration
  – Collaboration difficulties
    • Engaging the right people
    • Choosing the right process
• Quality assurance, brand assurance
  – Agreeing a model of QA together
• Integration and ease of use/install
• What product components are generic, which are application/distribution specific?
• Getting more effort
  – Target joint funding
• Strongly typed, strongly classified data
  – Is this necessary for data integration?
  – caBIG approach is probably not generic enough
  – Need programmatic interoperability, rather than user instructed
  – Too much burden on data provider?
• What’s the minimum barrier to entry?
• Generic components and tools
  – Which could be leveraged between products
• Are there other projects which use our products together?
  – If not, why not
  – Are the problem spaces too far apart?
    • They are both generic, but they have different focus
      – Not contradictory, but not aligned
• Does it make sense to develop a tightly integrated set of products?
  – Possibly, if funding allows: the DIALOGUE software

• Common Vision
• Standard interfaces and where DAIS is not enough
• Naming
• User tools that help using products together
• Metadata
  – Binding metadata and data, internally and externally
• Collaboration

Common Vision
• Either:
  – Plug and play world where components fit together, but no restrictions on what sets
  – A single generic data service powerful enough to satisfy all applications
  – Combinations of tightly integrated components which satisfy a targeted application area
• DIALOGUE should produce a convincing demonstration of how things should work
  – A portfolio of how our bits work, what needs to be changed, translated, etc.
  – Which could later be made robust

Standard Interfaces: Is DAIS enough?
• Data exploration tools, administration of sets of data resources, discovery of data resources
  – Can we do all of this with data integration tech on top of DAIS interfaces
  – Does DAIS give us the minimal set of metadata we require
• Don’t want to force a particular representation
  – But all, say XML operations should compose well
  – Also need to define transfer operators between representational models (structured binary, semi-structured textual, XML, relational, objects (tbd))
    • Is RDF different, or a special case of XML?
  – Do you need to force a set of formats for each representation
    • Assume small set to allow proof of concept
• A standard way of specifying
  – Query languages
  – Representational models
  – Representational formats
  – Transfer mechanisms
  – Endpoints
• A way of binding data constraints and rules to data

An aside
• If data contains details of how it can be represented as a service, plus rules and constraints on it
  – How do constraints change as you do operations
    • E.g. what happens when you derive, copy data
    • Operations which change the rules

Friday Breakout groups
• Lunchtime
  – Stocktaking of components / Collaborative work / low hanging fruit (Ally, Steve, Peter, Lucas) - Cramond
  – Metadata (Jessie, Mario, Scott, Alex, Leena, Larry, Elias) - Newhaven
  – Movement of data between components (Kostas, Neil, Umit, Shannon, Ivan) Breakout
  – Beyond DIALOGUE (Joel, Malcolm, Peter) - Dean
• Afternoon
  – Metadata
  – Collaborative architecture
• Wrapup
  – Organising next meeting
• Unassigned
  – Mapping of components to scenarios
  – Schema federation / integration collaboration
  – Data Warehousing needs
  – Interface to bulk data / metadata

Actions
• Share commonalities between toolkits
  – White paper on choke points common to models (editor Shannon) e.g.
    • Common Data Model - Representing models (HDM and GME)
    • Schema Mappings (?+IQL and Java->XPath?)
    • Query translation
  – What’s gained and lost by each combination / layering of components
    • Expressed as use case, maybe tied to application scenarios (publish on our web sites, Ally)
    • Cross-referencing of these between sites (each group choose the 5 or so papers which describe them)
    • Later expand to include “external” components
• Define a glossary of agreed terminology (editor Neil)
  – E.g. data model, data integration, global schema
  – Informational document in DAIS/OGSA Data
• Are there common things needed from “the Grid”
• Common schema format representation (across our projects) from data access services (Amy) e.g. XSD for XML, CIM for relational
• Component linkups
  – Explore integration of OGSA-DAI and DataCutter for image processing (Edinburgh MSc project?)
  – STORM and OGSA-DAI, with MRC Human Genetics Unit application (Edinburgh Summer Intern?)
  – Send across grad students from Ohio

Actions
• Metadata
  – What added functionality would you get if you added semantics to the registry as opposed to an external ontology?
  – Describe how to insert semantic annotations into the OGSA-DAI data resource configuration (Larry)
  – Can you uniformly present histograms and data required for optimisation (Alex)
    • Compare against Susan Malaika’s set of statistics
  – Send reference to survey of scalability techniques for reasoning with ontologies (Alex)
  – Produce strawman documents for a set of metadata required for access; optimisation; discovery and integration to be provided by a data service (Mario to ask for examples)
  – How can we maintain metadata for access (asked by Dave Berry)
• Proposals for future projects
• Send notes of discussions to participants and subscribe them to mailing list
• Put it up on datagrids.org site

DIALOGUE 3/4
• Venue: near GGF, Washington DC
• Date: 15-16 September 2006
• Focus:
  – Proposal generation
  – Update on documents
• Venue: Vienna
• Date: 28-30 March 2007 (PB to confirm)
• Focus:
  – Small group discussion and document production
  – Finish off deliverables