Meta Data

• Larry, Stirling – metadata on data access: data types, domain metadata discovery
• Scott, Ohio State – caBIG: metadata-driven architecture, semantic metadata
• Alexander, Vienna – metadata tailored to optimise data integration: do we need one data source or another in the integration? Performance, capability
• Elias – OGSA-DAI: any metadata aspects
• Leena – schema integration: add semantics to schema elements to aid finding elements and integrating them (for matching)
• Mario – OGSA-DAI: extracting metadata, third-party registries for OGSA-DAI, capabilities? OGSA-DAI does not build registries but uses Globus registries
• Jessie – domain semantics in metadata, discovery, integration
Meta Data
• For What?
– Discovery
– Data Access
– Data Integration
– Optimisation
– Service composition?
• What?
– Ontologies – most conceptual
– Schema – data types
– Content
– Capability
MetaData for Discovery & Integration
• caBIG - Scott
• Content
– Structure (XML schema)
– Semantics (relationship to ontology)
• Data Registry
– Describes the data model (UML), no constraints
• Review – curated semantic ontology (EVS, UMLS – proprietary)
• Bind the UML model (map it) to the ontology
• ISO standard metadata repositories – class, attribute, value domain, semantic concept
• If a data type doesn’t exist – add a concept to the ontology
• UML objects -> XML (GME, data types in XML schemas)
• Issues
– Hard to get new types into the ontology
• Review process – restricted to certain users
– What aspects of this could be relaxed without collapsing the whole system?
• e.g. remove the requirement for a centralised ontology
– Currently the ontology resides external to the registry; the registry doesn’t understand the ontology
• What added functionality would you get if you added semantics to the registry?
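A minimal sketch of the binding discussed above, assuming a registry entry records the class, attribute, value domain and a reference to an externally curated concept. All names here (ConceptRef, AttributeMetadata, DataRegistry, the concept codes) are hypothetical and are not the caBIG API. The registry matches concept codes by plain equality, which illustrates the issue that it does not understand the ontology itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptRef:
    """Pointer to a concept held in an external ontology (hypothetical codes)."""
    ontology: str
    code: str


@dataclass
class AttributeMetadata:
    """One registry entry: a UML class attribute bound to a semantic concept."""
    uml_class: str
    attribute: str
    value_domain: str
    concept: ConceptRef


class DataRegistry:
    """Toy registry: stores bindings, knows nothing about concept relationships."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def find_by_concept(self, concept):
        # Exact match only; no reasoning over broader or narrower concepts.
        return [e for e in self._entries if e.concept == concept]


registry = DataRegistry()
registry.register(AttributeMetadata(
    "Specimen", "tissueType", "string", ConceptRef("EXAMPLE-ONT", "C-0001")))
registry.register(AttributeMetadata(
    "Patient", "ageAtDiagnosis", "integer", ConceptRef("EXAMPLE-ONT", "C-0002")))

matches = registry.find_by_concept(ConceptRef("EXAMPLE-ONT", "C-0001"))
print([(m.uml_class, m.attribute) for m in matches])
```

Adding semantics to the registry would amount to replacing the equality test in find_by_concept with a query against the ontology.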
MetaData for Optimisation
• SemDIG (GRIDMiner) - Alex
• Lots of data sources – want to choose the right one for data integration based on the data (does it contribute to the answer?)
– If several sources contribute, which one(s) would be the best?
• Technical information required, not semantics as such
– Information on distribution, ranges etc. (summary data)
• Defining a common metadata set for data sources
• Solved:
– the metadata to be collected (data statistics)
– the collection of the data
• Questions:
– Can we uniformly present histograms and the data required?
• PMML? Predictive Model Markup Language (XML schema for describing decision trees, dictionaries etc.)
– What is the architecture for presenting it?
• Can OGSA-DAI, DataCutter etc. help?
– Possibly integrate ideas in OGSA-DAI
– Requires background threads
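A rough sketch of how summary metadata could drive source selection, assuming each source publishes a value range and row count per column. The ColumnSummary record and the scoring rule (overlap fraction times row count, which assumes values are spread uniformly over the range) are illustrative assumptions and not the SemDIG method. A histogram would give a better estimate than a plain range.

```python
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Summary statistics a source might publish for one column (hypothetical)."""
    source: str
    column: str
    minimum: float
    maximum: float
    row_count: int


def overlap_fraction(summary, low, high):
    """Fraction of the source's value range that falls inside [low, high]."""
    span = summary.maximum - summary.minimum
    if span <= 0:
        # Single-valued column: it either lies in the query range or not.
        return 1.0 if low <= summary.minimum <= high else 0.0
    covered = min(summary.maximum, high) - max(summary.minimum, low)
    return max(0.0, covered) / span


def rank_sources(summaries, low, high):
    """Drop sources that cannot contribute, rank the rest by estimated rows."""
    scored = [(s.source, overlap_fraction(s, low, high) * s.row_count)
              for s in summaries]
    contributing = [item for item in scored if item[1] > 0]
    return sorted(contributing, key=lambda item: item[1], reverse=True)


summaries = [
    ColumnSummary("A", "temperature", 0.0, 100.0, 1000),
    ColumnSummary("B", "temperature", 50.0, 150.0, 1000),
    ColumnSummary("C", "temperature", 200.0, 300.0, 500),
]
ranked = rank_sources(summaries, 60.0, 120.0)
print([name for name, _ in ranked])
```

Source C is excluded because its range cannot contribute to the answer, which is the "does it contribute?" test from the slide.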
MetaData for Discovery
• GEODE (Larry)
• Has syntactic discovery based on DB schema
– Supported by OGSA-DAI: really data access but not discovery
– Doesn’t tell the users what the exposed data is about
• Use OGSA-DAI to expose the semantics, for domain-specific discovery
– Is this possible or sensible? (Japanese team - RDF; OMII UK team - GRIMOIRES)
• How?
– Representation mechanism
• Mark-up of the semantics of the domain in whatever the external ontology uses
– Storage of the semantics for discovery
• Need a metadata registry
• We’ll record terms bound to the local data structure, which point to the external ontology
• If multiple ontologies are used, need to refer to the specific ontology used
– Reason about the concepts for discovery - external
• Good if you know what you want
• Requires a reasoning engine
• Return possible concepts for checking in the data registry
• Issues
– Creation/association of data with the ontology terms (similar to schema mapping)
– How can OGSA-DAI help – using the registry
• Are there plans for OGSA-DAI to support 1 schema (to insert semantics within the existing provision)?
• How do you tie the term to the data item?
– (Similar to AutoMed schema to RDFS ontology mapping)
– Look at WSDL-S (W3C) for inserting references in a document that point to an external ontology
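The discovery flow above can be sketched as follows, assuming the registry stores bindings from local data structures to ontology terms and an external step expands a query concept into candidate concepts. The toy subclass table stands in for a real reasoning engine, and every name and term here is hypothetical. Terms carry an ontology prefix so that the specific ontology is identifiable when several are in use.

```python
# Stand-in for the external ontology: child concept -> parent concept.
SUBCLASS_OF = {
    "ontoA:Granite": "ontoA:IgneousRock",
    "ontoA:IgneousRock": "ontoA:Rock",
}

# Metadata registry: terms bound to local data structures.
REGISTRY = [
    {"resource": "survey_db", "structure": "samples.rock_type",
     "term": "ontoA:Granite"},
    {"resource": "core_archive", "structure": "cores.lithology",
     "term": "ontoA:IgneousRock"},
    {"resource": "other_db", "structure": "t.rock", "term": "ontoB:Rock"},
]


def narrower_concepts(concept, subclass_of):
    """External reasoning step: the concept plus all of its descendants."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in subclass_of.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def discover(concept, registry, subclass_of):
    """Check the candidate concepts against the registry bindings."""
    candidates = narrower_concepts(concept, subclass_of)
    return [binding for binding in registry if binding["term"] in candidates]


hits = discover("ontoA:Rock", REGISTRY, SUBCLASS_OF)
print([(h["resource"], h["structure"]) for h in hits])
```

The registry itself only does term lookup; all knowledge of how concepts relate stays with the external ontology, as in the notes.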
MetaData for Access
• Dave
• Maintaining metadata separately from data
– e.g. a metadata catalogue linked to files
• Issue
– How can we maintain the metadata?
• Have service controlling both
• Notification system
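One way to picture the two maintenance options together is a single service that owns both the files and the catalogue and notifies subscribers on every change. This is only an illustrative sketch; ManagedStore and its methods are hypothetical names, and a real deployment would persist both stores.

```python
class ManagedStore:
    """Toy service controlling both the data files and their metadata."""

    def __init__(self):
        self._files = {}       # file name -> content
        self._catalogue = {}   # file name -> metadata record
        self._listeners = []   # callbacks taking (event, file name)

    def subscribe(self, callback):
        self._listeners.append(callback)

    def _notify(self, event, name):
        for callback in self._listeners:
            callback(event, name)

    def put(self, name, content, description):
        # Data and metadata change in the same operation, so they cannot drift.
        self._files[name] = content
        self._catalogue[name] = {"size": len(content),
                                 "description": description}
        self._notify("updated", name)

    def delete(self, name):
        self._files.pop(name, None)
        self._catalogue.pop(name, None)
        self._notify("deleted", name)

    def metadata(self, name):
        return self._catalogue.get(name)


events = []
store = ManagedStore()
store.subscribe(lambda event, name: events.append((event, name)))
store.put("run1.dat", b"12345", "first test run")
print(store.metadata("run1.dat"), events)
```

If the catalogue were held by a separate service, it would subscribe to these notifications instead of being updated inline.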
Generation of MetaData & Ontologies
• All preceding technologies rely on good metadata & ontologies
– How do we encourage users/communities to create metadata or ontologies?
– How do we know they are correct?
– What are the limits (performance etc.)?
• Service infrastructure for representing controlled vocabularies
– LexGrid
– API for graph operations etc.
• Point at which the size of ontologies makes reasoning too expensive
– Survey of scalability techniques for reasoning with ontologies
• What they are doing about it…