RDA_DFT_P5_combined Notes_20150312

Data Foundation Interest Group Meeting Notes
Gary Berg-Cross kicked off the meeting, with Co-Chair Raphael Ritz participating remotely.
This was the first meeting of the group as an interest group; it was formerly a working
group, whose products have been delivered. Gary provided slides on this and the nature of
the WG effort.
See https://www.rd-alliance.org/briefing-slides-introducing-new-dft-ig.html
The DFT WG was one of the initial working groups. It produced a document describing our
vocabulary and a term definition tool called “TeD-T”, which has 100+ terms.
Gary provided an overview of term development showing a graph of terms and
relationships in early tool development. This moved from general to more specific
definitions based on discussions.
There are graphical models of relationships for things like data and digital
objects/entities, but more work is needed on these and other items.
The TeD-T tool allows you to view a list of all terms (alphabetical or hierarchical).
The Processes view includes many practical policy terms.
Most are in the Data Organization area.
You can add your own term, with fields such as name, definition, explanation,
example, etc. Each term also has a group discussion page so we can capture what
people think about it.
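As a rough illustration of the term fields listed above, a term entry could be modelled as a small record. This is only a sketch: the `Term` class, the `alphabetical` helper, and the placeholder definitions are hypothetical and do not reflect how TeD-T actually stores terms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    """One vocabulary entry. Field names follow the notes; the class itself is hypothetical."""
    name: str
    definition: str
    explanation: str = ""
    example: str = ""
    # Comments captured on the term's group discussion page.
    discussion: List[str] = field(default_factory=list)


def alphabetical(terms: List[Term]) -> List[Term]:
    """Return the terms sorted by name, ignoring case (the alphabetical view)."""
    return sorted(terms, key=lambda t: t.name.lower())


# Placeholder definitions, not the actual DFT wording.
terms = [
    Term("Digital Collection", "placeholder definition"),
    Term("Access", "placeholder definition"),
]
terms[0].discussion.append("Should this cover aggregations of collections?")
names = [t.name for t in alphabetical(terms)]
```

A hierarchical view would need an extra field (for example a parent term), which is left out here because the notes do not describe how the hierarchy is encoded.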
There were several lessons learned from the work and things to follow up on.
We need, for example, virtual sessions between plenaries to inform discussion.
Going forward the tool and the term development will be useful for capturing
information and defining new terms. This will be done in coordination with several RDA
groups including:
• data fabric
• metadata
• practical policy and others
Objectives for P5 include:
• leverage existing work and approach but improve both
• facilitate community discussion on core concepts
This reflects the fact that the WG completed preliminary work, but the need for
common vocabularies in discussing things continues. One of the goals of this meeting
was to assess the community’s interest.
Briefings from other Groups
Keith Jeffery Briefing on “Metadata Data Foundation and Terminology”
(see https://www.rdalliance.org/metadata%0Bdata-group-briefing-data-foundation-and-terminology.html for slides)
There are multiple data working groups, but the only difference between metadata and
data is mode of use or the role to which the data is put. Some data is descriptive of
other data. But metadata is not just for data, it is also for users, software services,
computing resources. (See also Mark Gehagan’s talk at P5 - Just how open are we,
really? )
MD is not just for description and discovery; it is also needed for context.
In summary we need metadata that has:
• formal syntax (structure of metadata)
• declared semantics (terms in ontological structure) [1]
[1] Adding semantics to metadata and definitions, such as in DFT, is a long-term goal, and
perhaps the MD and DFT groups can cooperate on this. Some effort to put definitions in an
RDF form with links to appropriate ontologies may be tried within DFT. The current TeD-T
has some potential here that will be explored.
Metadata Plan
Use cases (collecting use cases to improve the template for collecting). The plan is to pull
these into a repository and move from a Directory to a Catalog. In a catalog we make the
MD machine readable. This requires formalization in processable/logical languages.
Recommended Metadata Packages for Purposes (canonical package good for discovery,
archiving, etc.)
For open data we want to conceptualize MD as relationships, not elements:
• Unique Identifier (for later use including citation)
• Location (URL)
• Description
• Keywords (terms)
• Temporal coordinates
• Geospatial coordinates
• Originator (organisation(s) / person(s))
• Project
• Facility / equipment
• Quality
• Availability (licence, persistence)
• Provenance
• Citations
• Related publications (white or grey)
• Related software
• Schema
• Medium / format
These relationships (provenance for example) are mapped to processes (discovery,
context, detail).
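To make the "relationships, not elements" idea concrete, one possible reading is that each metadata statement links a described object to another entity or value, so the same relationship can occur several times. The sketch below assumes that reading; the `Relationship` type, the `targets` helper, and all example values are made up for illustration and are not taken from the MD group's materials.

```python
from typing import List, NamedTuple


class Relationship(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # kind of relationship, e.g. "originator"
    target: str     # the related entity or value


def targets(rels: List[Relationship], subject: str, predicate: str) -> List[str]:
    """Return every target linked to `subject` by the given kind of relationship."""
    return [r.target for r in rels if r.subject == subject and r.predicate == predicate]


# Made-up example values.
rels = [
    Relationship("dataset-1", "location", "https://example.org/dataset-1"),
    Relationship("dataset-1", "originator", "Example Organisation"),
    Relationship("dataset-1", "originator", "Example Person"),
    Relationship("dataset-1", "project", "Example Project"),
]
originators = targets(rels, "dataset-1", "originator")
```

Unlike a flat record with one "originator" field, this form handles several originators naturally and lets the target itself be described by further relationships.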
Note: we want to support e-research through metadata.
Four models (user, processing, data and resource) are used to discuss data from the
researcher to the Information and Communication Technology (ICT) environment for research.
Request from MD groups - Please:
• complete use case profiles that you come across
• document directory
Question: Are you using existing ontologies for packages?
A: We will collect all standards within 3 months and present at P6.
Packages provide:
1. syntactic:
2. semantic: describe elements
Question: How are packages delivered?
Frequency abuse in standards and schemes to develop common understanding for
processes
Question: Is this like building a UMLS for an interdisciplinary area?
A: Yes
Reagan Moore Presentation on “Working Group Practical Policy”
(see https://www.rd-alliance.org/practical-policy-wg-slides-dft.html for slides)
In the PP WG, computer-actionable policies are used to enforce data management and
automate administrative tasks.
Practical policy means an assertion or assurance that is enforced about a (data)
collection.
Example properties
• can be preservation assertions such as authenticity, integrity, chain of custody,
and original arrangement
• or be based on digital collection assertions such as description and arrangement
by subject
We have examples of 11 types of policies and an implementation framework for policies.
A visualization of our policy components (Policy-based data management Concept
Graph) was developed with Gary Berg-Cross for DFT and PP use:
[Figure: Policy-based data management Concept Graph]
Reagan walked through the diagram, noting areas of community consensus on policy and
how assertions are represented:
• Community consensus: must define a purpose; must define the properties they
want their policy to have
• Computer Actionable Implementation: each property created in community has
a policy created which includes various
• Procedure: created from policy – build procedures by linking together functions
Identifiers are defined by the operations that their resolvers support. For example:
• GUID – unique identifier
• Handle – add location information
• Ticket – add access controls
• Data grid logical name – add arrangement and metadata
• Workflow – add parsing and subset extraction
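The identifier list above can be sketched as a capability ladder. This sketch assumes that each "add" is cumulative, that is, each identifier type supports everything the previous one does; the notes do not state this explicitly. `LEVELS` and `operations` are hypothetical names.

```python
from typing import List, Tuple

# Each entry: (identifier type, operations it adds), in the order given in the notes.
LEVELS: List[Tuple[str, List[str]]] = [
    ("GUID", ["unique identification"]),
    ("Handle", ["location information"]),
    ("Ticket", ["access controls"]),
    ("Data grid logical name", ["arrangement", "metadata"]),
    ("Workflow", ["parsing", "subset extraction"]),
]


def operations(kind: str) -> List[str]:
    """Return all operations supported by `kind`, assuming cumulative levels."""
    ops: List[str] = []
    for name, added in LEVELS:
        ops.extend(added)
        if name == kind:
            return ops
    raise KeyError(f"unknown identifier type: {kind}")


handle_ops = operations("Handle")
```

If the levels are in fact independent rather than cumulative, the same table still works with `operations` returning only the `added` list for the matching entry.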
There is a challenge with identifiers across the large variety of objects, because what
happens depends on someone else’s control. This can be ephemeral.
We associate metadata with the procedures themselves and, as do Keith and the MD
group, distinguish several types of metadata:
• Provenance
• Structural
• Description
• Internal features
We do feature-based indexing, which extracts all words from text and all degrees of
freedom from a data set. We automate metadata extraction.
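The text half of feature-based indexing (extracting all words) can be illustrated with a small inverted index. This is a generic sketch, not the PP WG's implementation; `build_word_index` and the sample documents are hypothetical, and extracting degrees of freedom from data sets is not covered.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_word_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


# Made-up sample documents.
docs = {
    "a": "Chain of custody",
    "b": "Custody record",
}
index = build_word_index(docs)
```

Looking up `index["custody"]` then returns the ids of every document mentioning that word, which is the basis for discovery over automatically extracted metadata.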
Comment:
Q: Do you cover preservation terminology and policy?
A: Yes, about 70 policies, but there is not yet a lot of foundation vocabulary,
especially where there are different types of preservation mechanisms.
Adoption of RDA-DFT Terminology and Data Model to the Description and Structuring
of Atmospheric Data (Aaron Addison, Rudolf Husar, Cynthia Hudson-Vitale), presented
by Cynthia.
See https://www.rd-alliance.org/adoption-rda-dft-terminology-and-datamodel-description-and-structuring-atmospheric-data.html
Overview of DataFed & the Air Quality: This effort involves a collection of collections
and includes a data model to encourage interoperability.
On the “back end” represented in the diagram we can cite the catalog effectively.
DataFed developed an RDA Data Foundation and Terminology (DFT) adoption plan:
● Map DFT model to DataFed/AQ Com Cat data model
● Assess potential RDA/DFT compliance
● This is an effort to be consistent with (if not compliant with) the DFT
model
● Real-world evaluation of outcome
Work started in February and will complete in August. Comments and suggestions from
the DFT IG are welcome and encouraged by the IG chair.
We have noted some gaps in the DFT model such as:
• Where does the user fit?
• What is the granularity of PIDs?
• For use we need some best practices such as what workflow is needed to be DFT
compliant?
• How does this work for a system of aggregated datasets – or a data mediator?
• Where does the non-domain user fit into the DFT data model?
Legal Interoperability: Paul Uhlir commented that he needs our method for developing
vocabularies so he could apply it to his domain and perhaps use our tool to
define some terms from a legal perspective.
Linking to definitions
There is an issue of linking to a specific definition from our tool.
Cyndy Chandler would like to do this as part of her work. Citing the page is not really
the granularity you might want so we will have to think about how to get to what you
want.
Gary has reached out to the developers (e.g. Thomas Zastrow at Max Planck Institute
RZG) who noted:
• It's possible to link to individual wiki pages, for example:
http://smw-rda.esc.rzg.mpg.de/index.php/Access
• But of course, such a page may contain more than one description.
There is work to improve the tool products, and the WordCloud
(http://smw-rda.esc.rzg.mpg.de/index.php/File:Dftwordcloud.png) is one of the
"intermediate" results.
Charles Vardeman noted that interestingly you can get to the RDF version of a page. For
example:
• Human readable:
o http://smw-rda.esc.rzg.mpg.de/index.php/Digital_Collection
• Machine version:
o http://smw-rda.esc.rzg.mpg.de/index.php/Special:ExportRDF/Digital_Collection
But Semantic MediaWiki is throwing up some errors about the ExportRDF extension.
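The pattern in the two example URLs can be sketched as a pair of helpers that derive both forms from a page title. The base URL and the `Special:ExportRDF/` prefix come from the examples above; the function names are hypothetical, and replacing spaces with underscores is an assumption based on the usual MediaWiki title convention. The sketch only builds the URLs and does not fetch them, since the export was reported to be throwing errors.

```python
from urllib.parse import quote

# Base taken from the human-readable example URL in the notes.
BASE = "http://smw-rda.esc.rzg.mpg.de/index.php/"


def _page_name(title: str) -> str:
    """Convert a page title to its URL form (spaces become underscores)."""
    return quote(title.replace(" ", "_"))


def page_url(title: str) -> str:
    """URL of the human-readable wiki page for a term."""
    return BASE + _page_name(title)


def rdf_export_url(title: str) -> str:
    """URL of the machine-readable RDF export for the same term."""
    return BASE + "Special:ExportRDF/" + _page_name(title)


human = page_url("Digital Collection")
machine = rdf_export_url("Digital Collection")
```

Note that both URLs still address a whole page, so they do not solve the granularity issue raised above when a page holds more than one definition.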
Discussion of terminology evolution
Keith noted that you need a simple relationship between concept and role, plus a
temporal component, which makes things complicated. Gary replied that the same issue
applies to MD, so we are in the same boat and need a common solution for definitions
and MD. This is not surprising since definitions are MD, but it is perhaps easier to see.
Next steps were discussed, covering getting input and requirements from:
• Metadata
• Practical Policy and perhaps Data Fabric
As part of tool development we are interested in collecting additional requirements,
such as supporting synonyms and a formal taxonomic structure.
Some interest groups and domain groups will push the IG to support term development.
An open question for term development is: “Can it be usefully extended to the
domain?” Dimitris Koureas (Natural History Museum London, UK)
from the IG Biodiversity Data Integration was very much interested in trying to use the
tool and the IG for their vocabulary.
EnVIVO(sp) ontology for biodiversity; ratifies standards for biodiversity work; mission is
to broaden views and perspectives from other models
Further Discussion:
Need to have some adopters to demonstrate impact of the activities
Q: What does it mean to be DFT compliant?
A: The goal is not to be a validator of compliance.
Q: How will you handle the overlapping terms?
A: Usually we form distinctions and formalize these.