DIALOGUE

Dialogue DataGrid
• Relational databases, files, XML databases,
object stores
• Strongly typed
• Multi-tiered metadata management system
• Incorporates elements from OGSA-DAI, Mobius,
caGrid, STORM, DataCutter, GT4 …
• Scales to very large data, high end platforms
Requirements
• Support or interoperate with caGrid, eScience
infrastructure
• Interoperate with or replace SRB
• Well defined relationship to Globus Alliance
• Services to support high end large scale data
applications
• Design should include semantic metadata
management
• Well thought out relationship to commercial
products (e.g. Information Integrator, Oracle)
Topics
• Overview of products
– Identify standard interfaces, where things could be done the
same
• Ship results from one product to another
• Interface for a data service (DAIS mapping etc)
– What scenarios do we have?
• What patterns of queries will be run on a federated system?
– How do we make all our products look like they come from a
single vendor?
• All are fairly non-overlapping
• Should client APIs look similar, as well as service interfaces?
• Should security be similar, handled in a similar way?
– Other products, how do they fit
• Root (from CERN) for data access (object oriented data analysis
framework)
• II, SRB, GridFTP, RFT
• Distributed querying
– Is this part of DIALOGUE?
• Multi-product data integration
– E.g. join across mobius and ogsa-dai
– What’s missing from DAIS specs to let us do this?
• Output format
• Standard interfaces for e.g. query planners, plugins,…
– Locating relevant data
• How to do data discovery
– Common user tool at top level to aid installation, configuration
and service development
• E.g. “Introduce” suite for deploying strongly typed GT4 services
– Metadata
• Minimal amounts of metadata for
– Discovery / Registries
– Integration
– Querying
– Performance
• Hindering uptake
• Is web services / SOAP / XML the right technology for data
integration?
– Branding
• How to do distributions outside of just the academic area
• What’s the model for contributing to something like this?
• Awareness of components
– Common vision
• Understanding how to cross-sell projects
• Multi-project data integration
– Collaboration difficulties
• Engaging the right people
• Choosing the right process
• Quality assurance, brand assurance
– Agreeing a model of QA together
• Integration and ease of use/install
• Which product components are generic, which are
application/distribution specific?
• Getting more effort
– Target joint funding
• Strongly typed, strongly classified data
– Is this necessary for data integration?
– caBIG approach is probably not generic enough
– Need programmatic interoperability, rather than user instructed
– Too much burden on data provider?
• What’s the minimum barrier to entry?
• Generic components and tools
– Which could be leveraged between products
• Are there other projects which use our products together?
– If not, why not?
– Are the problem spaces too far apart?
• They are both generic, but they have different focus
– Not contradictory, but not aligned
• Does it make sense to develop a tightly integrated set of products?
– Possibly, if funding allows: the DIALOGUE software
• Common Vision
• Standard interfaces and where DAIS is not
enough
• Naming
• User tools that help using products together
• Metadata
– Binding metadata and data, internally and externally
• Collaboration
Common Vision
• Either:
– Plug and play world where components fit together,
but no restrictions on what sets
– A single generic data service powerful enough to
satisfy all applications
– Combinations of tightly integrated components which
satisfy a targeted application area
• DIALOGUE should produce a convincing
demonstration of how things should work
– A portfolio of how our bits work, what needs to be
changed, translated, etc.
– Which could later be made robust
Standard Interfaces: Is DAIS enough?
• Data exploration tools, administration of sets of data resources, discovery of
data resources
– Can we do all of this with data integration tech on top of DAIS interfaces?
– Does DAIS give us the minimal set of metadata we require?
• Don’t want to force a particular representation
– But all, say, XML operations should compose well
– Also need to define transfer operators between representational models
(structured binary, semi-structured textual, XML, relational, objects (tbd))
• Is RDF different, or a special case of XML?
– Do you need to force a set of formats for each representation?
• Assume small set to allow proof of concept
• A standard way of specifying
– Query languages
– Representational models
– Representational formats
– Transfer mechanisms
– Endpoints
• A way of binding data constraints and rules to data
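As a purely illustrative sketch (not taken from DAIS or from any of the products discussed), the items a data service would have to specify could be gathered in a single descriptor. Every name below is hypothetical: `DataServiceDescriptor`, its fields, `can_ship_to`, and the example endpoints are assumptions made only to show how such a specification might be compared between two services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataServiceDescriptor:
    """Hypothetical descriptor: what a data service declares about itself."""
    endpoint: str                         # where the service is reached
    query_languages: frozenset            # e.g. {"SQL"} or {"XPath"}
    representational_model: str           # e.g. "relational", "XML"
    representational_formats: frozenset   # e.g. {"text/csv"}
    transfer_mechanisms: frozenset        # e.g. {"GridFTP", "SOAP"}
    constraints: tuple = ()               # rules bound to the data


def can_ship_to(producer, consumer):
    """True if results can move from producer to consumer unchanged:
    same model, at least one shared format, at least one shared transfer."""
    return (
        producer.representational_model == consumer.representational_model
        and bool(producer.representational_formats & consumer.representational_formats)
        and bool(producer.transfer_mechanisms & consumer.transfer_mechanisms)
    )


# Two made-up services used only for illustration.
relational = DataServiceDescriptor(
    endpoint="https://example.org/services/relational",
    query_languages=frozenset({"SQL"}),
    representational_model="relational",
    representational_formats=frozenset({"text/csv"}),
    transfer_mechanisms=frozenset({"GridFTP", "SOAP"}),
    constraints=("read-only",),
)
xml_store = DataServiceDescriptor(
    endpoint="https://example.org/services/xml",
    query_languages=frozenset({"XPath"}),
    representational_model="XML",
    representational_formats=frozenset({"application/xml"}),
    transfer_mechanisms=frozenset({"SOAP"}),
)

print(can_ship_to(relational, relational))  # identical descriptors on both sides
print(can_ship_to(relational, xml_store))   # models differ: a transfer operator is needed
```

In this sketch, a failed `can_ship_to` check marks exactly the place where a transfer operator between representational models would have to be inserted.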
An aside
• If data contains details of how it can be
represented as a service, plus rules and
constraints on it
– How do constraints change as you do
operations
• E.g. what happens when you derive, copy data
• Operations which change the rules
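The question raised in this aside can be pictured with a toy model in which every operation states how it transforms the rules attached to a data set. Everything here is an assumption for illustration, not a proposal recorded at the meeting: the `Dataset` type, the rule labels, and the two policies (a copy keeps all rules; a derivation declares which rules it drops and adds) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical data set with the rules/constraints bound to it."""
    name: str
    rules: frozenset


def copy_dataset(ds, new_name):
    # Assumed policy: a copy carries every rule of its source.
    return Dataset(new_name, ds.rules)


def derive_dataset(ds, new_name, dropped=frozenset(), added=frozenset()):
    # Assumed policy: a derivation may change the rules, but it must say
    # explicitly which rules it removes and which it adds.
    return Dataset(new_name, (ds.rules - dropped) | added)


source = Dataset(
    "patients",
    frozenset({"no-redistribution", "contains-identifiers"}),
)
backup = copy_dataset(source, "patients-copy")
summary = derive_dataset(
    source,
    "patients-aggregated",
    dropped=frozenset({"contains-identifiers"}),   # aggregation removes identifiers
    added=frozenset({"derived-from:patients"}),    # provenance recorded as a rule
)

print(sorted(backup.rules))
print(sorted(summary.rules))
```

Making the rule change an explicit argument of the operation is one possible design; it keeps "operations which change the rules" visible rather than implicit.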
Friday Breakout groups
• Lunchtime
– Stocktaking of components / Collaborative work / low hanging fruit (Ally, Steve,
Peter, Lucas) - Cramond
– Metadata (Jessie, Mario, Scott, Alex, Leena, Larry, Elias) - Newhaven
– Movement of data between components (Kostas, Neil, Umit, Shannon, Ivan) - Breakout
– Beyond DIALOGUE (Joel, Malcolm, Peter) - Dean
• Afternoon
– Metadata
– Collaborative architecture
• Wrapup
– Organising next meeting
• Unassigned
– Mapping of components to scenarios
– Schema federation / integration collaboration
– Data Warehousing needs
– Interface to bulk data / metadata
Actions
• Share commonalities between toolkits
– White paper on choke points common to models (editor Shannon) e.g.
• What’s gained and lost by each combination / layering of components
• E.g. data model, data integration, global schema
• Informational document in DAIS/OGSA Data
– Are there common things needed from “the Grid”
• Expressed as use case, maybe tied to application scenarios (publish on our web sites, Ally)
• Cross-referencing of these between sites (each group choose the 5 or so papers which describe them)
• Later expand to include “external” components
– Define a glossary of agreed terminology (editor Neil)
– Common Data Model - Representing models (HDM and GME)
• Schema Mappings (?+IQL and Java->XPath?)
• Query translation
• Common schema format representation (across our projects) from data access services (Amy) e.g. xsd for xml, cim for relational
• Component linkups
– Explore integration of OGSA-DAI and DataCutter for image processing (Edinburgh MSc project?)
– STORM and OGSA-DAI, with MRC Human Genetics Unit application (Edinburgh Summer Intern?)
– Send across grad students from Ohio
Actions
• Metadata
– What added functionality would you get if you added semantics to the registry as
opposed to an external ontology?
– Describe how to insert semantic annotations into the OGSA-DAI data resource
configuration (Larry)
– Can you uniformly present histograms and data required for optimisation (Alex)
• Compare against Susan Malaika’s set of statistics
– Send reference to survey of scalability techniques for reasoning with ontologies
(Alex)
– Produce strawman documents for a set of metadata required for access;
optimisation; discovery and integration to be provided by a data service (Mario to
ask for examples)
– How can we maintain metadata for access (asked by Dave Berry)
• Proposals for future projects
• Send notes of discussions to participants and subscribe them to mailing list
• Put it up on datagrids.org site
DIALOGUE 3/4
• Venue: near GGF, Washington DC
• Date: 15-16 September 2006
• Focus:
– Proposal generation
– Update on documents
• Venue: Vienna
• Date: 28-30 March 2007 (PB to confirm)
• Focus:
– Small group discussion and document production
– Finish off deliverables