Meta Data
• Larry, Stirling – metadata on data access – data types, domain metadata discovery
• Scott, Ohio State – caBIG metadata-driven architecture, semantic metadata
• Alexander, Vienna – metadata tailored to optimise data integration – do we need one data source or another in integration, performance capability
• Elias – OGSA-DAI – any metadata aspects
• Leena – schema integration: add semantics to schema elements to aid finding elements and integrating them – for matching
• Mario, OGSA-DAI – extracting metadata, 3rd party registries for OGSA-DAI – capabilities? OGSA-DAI does not build registries but uses Globus registries
• Jessie – domain semantics in metadata, discovery, integration

Meta Data
• For What?
  – Discovery
  – Data Access
  – Data Integration
  – Optimisation
  – Service composition?
• What?
  – Ontologies – most conceptual
  – Schema – data types
  – Content
  – Capability

MetaData for Discovery & Integration
• caBIG - Scott
• Content
  – Structure (XML schema)
  – Semantics (relationship to ontology)
• Data Registry
  – Describe data model (UML), no constraints
• Review – curated semantic ontology (EVS, UMLS – proprietary)
• Bind UML model: map to ontology
• ISO standard metadata repositories – class, attribute, value domain, semantic concept
• If data type doesn’t exist – add concept to ontology
• UML objects -> XML (GME, data type in XML schemas)
• Issues
  – Hard to get new types into ontology
    • Review process – restrictive to certain users
  – What aspects of this could be relaxed without collapsing the whole system?
    • e.g. remove requirement for centralised ontology
  – Currently ontology resides external to the registry; registry doesn’t understand ontology
    • What added functionality would you get if you added semantics to the registry?

MetaData for Optimisation
• SemDIG (GRIDMiner) - Alex
• Lots of data sources – want to choose the right one for data integration based on the data (does it contribute to the answer?)
  – If several sources contribute, which one(s) would be the best?
• Technical information required, not semantics as such
  – Information on distribution, ranges etc. (summary data)
• Defining a common metadata set for data sources
• Solved
  – the metadata to be collected (data statistics)
  – the collection of the data
• Questions:
  – Can we uniformly present histograms and data required?
    • PMML? Predictive Model Markup Language (XML schema for describing decision trees, dictionaries etc.)
  – What is the architecture for presenting it?
    • Can OGSA-DAI, DataCutter etc. help?
  – Possibly integrate ideas in OGSA-DAI
  – Requires background threads

MetaData for Discovery
• GEODE: (Larry)
• Has syntactic discovery based on DB schema
  – Supported by OGSA-DAI: really data access but not discovery
  – Doesn’t tell the users what the data being exposed is about
• Use OGSA-DAI to expose the semantics - for domain specific discovery
  – Is this possible or sensible? (Japanese team - RDF + OMII UK team - GRIMOIRES)
• How?
  – Representation mechanism
    • Mark up of the semantics of the domain in whatever the external ontology uses
  – Storage of the semantics for discovery
    • Need a metadata registry
    • We’ll record terms bound to the local data structure, which point to the external ontology
    • If multiple ontologies used – need to refer to the specific ontology used
  – Reason about the concepts for discovery - external
    • Good if you know what you want
    • Requires a reasoning engine
    • Return possible concepts for checking in the data registry
• Issues
  – Creation/association of data with the ontology terms (similar to schema mapping)
  – How can OGSA-DAI help – using the registry
    • Are there plans for OGSA-DAI to support 1 schema (to insert semantics within the existing provision)?
    • How do you tie the term to the data item? (Similar to AutoMed schema to RDFS ontology mapping)
  – Look at WSDL-S (W3C) for inserting refs in a document that point to an external ontology.
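
A minimal sketch of the registry idea from the discovery notes, assuming a simple in-memory store: each entry binds a local data structure (here a table/column pair) to a term in a named external ontology, so that candidate concepts returned by an external reasoner can be checked against the registry. All names (`TermBinding`, `TermRegistry`, the resource names, ontology identifiers and URIs) are hypothetical and are not part of OGSA-DAI or any registry mentioned in these notes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermBinding:
    """Binds one local data structure to a term in an external ontology."""
    resource: str   # hypothetical data resource name
    table: str
    column: str
    ontology: str   # which ontology the term comes from
    term_uri: str   # pointer into that ontology; the registry does not interpret it


class TermRegistry:
    """In-memory registry: finds local data structures by ontology term."""

    def __init__(self):
        self._by_term = defaultdict(list)

    def register(self, binding):
        # Key on (ontology, term) so several ontologies can coexist.
        self._by_term[(binding.ontology, binding.term_uri)].append(binding)

    def lookup(self, ontology, term_uris):
        """Return bindings for any of the candidate terms, e.g. the concepts
        an external reasoning engine suggested for a user's query."""
        found = []
        for uri in term_uris:
            found.extend(self._by_term.get((ontology, uri), []))
        return found


registry = TermRegistry()
registry.register(TermBinding("clinic_db", "patient", "dob",
                              "example-onto", "http://example.org/onto#DateOfBirth"))
registry.register(TermBinding("trial_db", "subject", "birth_date",
                              "example-onto", "http://example.org/onto#DateOfBirth"))
registry.register(TermBinding("clinic_db", "patient", "sex",
                              "other-onto", "http://example.org/other#Sex"))

# Candidate concepts as an external reasoner might return them (made up here).
candidates = ["http://example.org/onto#DateOfBirth", "http://example.org/onto#Age"]
matches = registry.lookup("example-onto", candidates)
for b in matches:
    print(f"{b.resource}.{b.table}.{b.column} -> {b.term_uri}")
```

Keying on the (ontology, term) pair reflects the note that, with multiple ontologies, each binding has to name the specific ontology used. Reasoning stays outside the registry, which only stores and matches references.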

MetaData for Access
• Dave
• Maintaining metadata separately from data
  – e.g. a metadata catalogue linked to files
• Issue
  – How can we maintain the metadata?
    • Have service controlling both
    • Notification system

Generation of MetaData & Ontologies
• All preceding technologies rely on good metadata & ontologies
  – How do we encourage users/communities to create metadata or ontologies?
  – Know they are correct
  – What are the limits - performance etc.
• Service infrastructure for representing controlled vocabularies
  – LEXGRID
  – API for graph operations etc.
• Point at which size of ontologies makes reasoning too expensive
  – Survey of scalability techniques for reasoning with ontologies
    • What they are doing about it…
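
Returning to the "MetaData for Access" issue of keeping a separate metadata catalogue in step with the files it describes: the two options noted there (one service controlling both, and a notification system) can be combined. The sketch below is only an illustration under that assumption; `ManagedStore`, its methods and the metadata fields are hypothetical names, not an existing service API.

```python
import hashlib


class ManagedStore:
    """Single service owning both the data (files) and the metadata catalogue."""

    def __init__(self):
        self._files = {}      # file name -> bytes
        self.catalogue = {}   # file name -> metadata record
        self._listeners = []  # callbacks told about every catalogue change

    def subscribe(self, callback):
        self._listeners.append(callback)

    def write(self, name, content):
        # Data and metadata are updated in the same operation, so the
        # catalogue cannot drift from the stored file.
        self._files[name] = content
        record = {
            "size": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
        }
        self.catalogue[name] = record
        # Notify interested parties (e.g. an external registry) of the change.
        for notify in self._listeners:
            notify(name, record)


events = []
store = ManagedStore()
store.subscribe(lambda name, record: events.append((name, record["size"])))
store.write("run1.dat", b"abc")
store.write("run1.dat", b"abcdef")
print(store.catalogue["run1.dat"]["size"])
print(events)
```

If the data can also be changed behind the service's back, this guarantee no longer holds and the catalogue would need to be rebuilt or checked periodically.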