RDA Meeting Minutes (Combined)

Research Data Alliance (RDA)
2nd Working Group Collaboration Meeting
November 13 – 14, 2014
All meetings will be held at:
National Institute of Standards and Technology (NIST)
Administration Building, Lecture Room A
100 Bureau Drive, Building 101, Gaithersburg, MD 20899-1060
Objective
This meeting is a continuation of a meeting series that was started in Garching last February.
The objective is to have a detailed update from and focused discussion among Working Groups,
something that has proved elusive at the busy Plenary meetings, and to generate discussion on
how the WGs can work together to improve overall RDA outputs. The focus of the Garching
meeting was on the Data Foundations and Terminology WG, and the focus of this meeting will
be on Metadata, on the relationships among the Working Groups, and on the evolving notion
of the Data Fabric as a framework into which the WG outputs would fit.
Schedule (flexible)
Thursday
1:00 – 1:30 Welcome, NIST Introduction
1:30 – 2:30 Metadata Standards Directory
James Warren
Materials Genome Initiative, Data, Open Science and NIST
Goal: develop materials innovation infrastructure
Achieve national goals in energy, etc.
Design process highlighted
NIST Role in MGI

Data inputs range from the quantum scale to manufacturing capabilities
Vision: improve upon the process and resolve issues around digital data
Goals:
1. Establish data and model exchange protocols
2. Create means to ensure quality
3. New methods: data-driven science, big data possibilities
Use case example to demonstrate constraints: what can the data show us?
Objective: Search over multiple repositories for data on all materials that fulfill use case
constraints, simultaneously
Success scenario: can search through models to design new materials with improved properties
DISCUSSION
Use case
Incorporating metadata into the use case has helped the presentation connect with materials
scientists’ understanding.
Problem: Very few materials scientists seem to appreciate the potential
Problems with terminology, labels that exist across domains
Raphael: How does NIST meet the international need beyond the US focus?
Warren: This information is a form of publishing… In short, he believes that it should be open
with an emphasis on urgency.
Write use cases using RDA to solve the problem
Rainier – repositories: Are they distributed or centralized?
Warren: desires an open architecture; talks are under way to gain that advantage
2:30 – 2:50 Data Foundations and Terminology
Jane Greenberg
Metadata Standards Directory
Goals and work plan: on target
Progress made thanks to intern participation!
Goals: develop an open collaborative approach to metadata standards
Establish a working group.
Analytics show people are visiting the established directory often to evaluate standards.
Master’s students will write papers and earn credit for this research.
Accomplishments
Attempt to move copy of DCC into GitHub, though not yet ready for wider use
Policy Development
Outreach success
Action items from RDA 4:
Assess the GitHub approach, seek new technology, and solicit feedback and participation.
Do we need firm objectives?
What is it specifically that we want to move forward on?
Hope to get updates from all eight working groups today, and to get working groups to respond
to the objectives.
Where does metadata fit into data fabric notion?
Kathy Fontaine: Any issues with RDA website?
Difficulties with email success…
What proving, convincing impact can we share to drive our initiative?
Jane – RDA has helped to spread the word, need numbers to express compelling use of the
website.
RDA needs to prove what can be done. How does prototype help to underline our cause?
Rebecca
Metadata Standards Directory
Metadata Directory on GitHub
Changes are moderated
Former directory: more difficult to navigate
The current prototype appears to be more user-friendly for those who are interested.
Barrier to making changes on GitHub: high complexity for entry
Is the page usable?
How to automate policy of permitting greater openness to directory?
Move beyond human readability to computer readability.
Kathy: If there is more to be done, working groups should get to a point where there is a
self-contained conclusion and report on next steps.
Peter: The argument to make is where we invest. We are at a critical point where we should be
blunt in this respect.
Need a good presentation package that communicates the value RDA has to offer
Keith Jeffery
Metadata Interest Group
Problem with many different standards. Need standards to share the same elements to permit
wider use.
Group sessions trialed template use cases and received feedback. The plan is now to provide a
revision before P5.
Need to overcome the threshold for participation in GitHub
Vision: the difference between data and metadata is the mode of use.
Metadata not just for data, also for users, software services, computing resources
Metadata is not just for description and discovery; the desire is to build a virtual research
environment
Metadata must be machine-understandable as well as human-understandable
Management of (meta)data is also relevant.
Concentration is on datasets
What metadata is required?
Assertions and Questions
Plan: involve not only metadata groups but all RDA
Please test packages for feedback and use separate feedback and apply to all packages.
What is needed from other groups?
Interact to encourage human knowledge base
DISCUSSION
Peter: happy with these plans
Primary issue in previous meeting: length of time.
What is needed to accomplish goals? What is possible in RDA framework?
Keith: Need an active project from both groups, from the European effort to the U.S.
Keith: Advocates for a technical project, not necessarily a collaborative one
Larry: What are specific plans?
Jane: Exploring at a high level.
Kathy: Belmont Forum for funding.
Demonstrate prototype to pull awareness for organizational assembly.
Peter: Do we need to anticipate the situation of having too much data? The directory doesn’t
help yet because it’s not yet automated.
Keith: That’s the direction we need. Make the directory into a package so that it is
machine-understandable.
Keith: will document list of recommendations
Peter: still does not see how it’s feasible
Rainier: Groups are struggling to define metadata, especially the details. Concentration on
defining vocabulary for interviews. How are packages to be validated?
Peter: how to make use of the knowledge?
Jane: two-pronged answer: at least go to the directory.
Peter: Is use of directory traceable? Is it documented for use?
Jane: Directory has use cases, but they aren’t standardized.
Keith: problem of people developing their own standards.
Raphael: need not just wide acceptance of standards, but also tools that can really improve use
Mary: Harvesting existing data on web. Automated harvester to search catalogue.
RDA Data Foundation and Terminology
Looking to propose a data terminology interest group. Its main activity is to support other
working group interests and foster communication. Need to “gear up” properly to give support.
Peter: There is a meeting on RDA in Germany next week. He has a flyer explaining RDA
results. Unfortunately, not as much was accomplished as was hoped.
Many meetings, across disciplines. Intense conversations on data. People have begun
considering the small steps needed for metadata outcomes using PIDs.
Start developing training courses on the working groups. This will take a lot of time, but we
have the people necessary for this in our new round of funding. The plan is to intensify further
community discussion.
Peter: Need a process model that will benefit these new initiatives.
2:50 – 3:10 Break
3:10 – 3:30 Data Type Registries
Larry
Addressed problem of implicit assumptions in data.
In order to share data, it needs to be understood. If it’s not understood and agreed upon, it’s
not worth sharing.
Goal: Explicate and share assumptions using types and type registries
What is a data type? A unique and resolvable identifier.
Need to further automate data collection so we can collect from different sources and conduct
data processing.
Raphael: What is scope of automated type registries? What different types of data will be
collected?
Larry: In short, there is interest in many.
Gary: How does this relate to other metadata efforts?
Larry: By meeting standards. Being a good “metadata citizen”
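Larry’s point that a data type is named by a unique, resolvable identifier, and that explicit types let shared data be understood, can be illustrated with a minimal in-memory sketch. It assumes a hypothetical `TypeRegistry` class, a hypothetical `TypeRecord` with name and unit fields, and made-up identifiers of the form `example:types/...`; actual Data Type Registries define their own record schemas and resolution services, which this does not reproduce.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TypeRecord:
    """Hypothetical record that makes a value's implicit assumptions explicit."""
    identifier: str              # unique, resolvable identifier (made-up format)
    name: str
    unit: Optional[str] = None   # e.g. the unit a bare number is assumed to carry
    description: str = ""


class TypeRegistry:
    """Hypothetical in-memory registry mapping identifier -> type record."""

    def __init__(self) -> None:
        self._records: Dict[str, TypeRecord] = {}

    def register(self, record: TypeRecord) -> None:
        # Identifiers must be unique, so refuse to overwrite an existing entry.
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._records[record.identifier] = record

    def resolve(self, identifier: str) -> TypeRecord:
        # Resolution turns an identifier into the shared, agreed definition.
        try:
            return self._records[identifier]
        except KeyError:
            raise LookupError(f"unresolvable type identifier: {identifier}") from None


def describe(registry: TypeRegistry, identifier: str, value: object) -> str:
    """Interpret a raw value using the type its producer declared."""
    record = registry.resolve(identifier)
    unit = f" {record.unit}" if record.unit else ""
    return f"{record.name}: {value}{unit}"


registry = TypeRegistry()
registry.register(TypeRecord("example:types/temperature-kelvin", "temperature", "K"))
print(describe(registry, "example:types/temperature-kelvin", 293.15))  # temperature: 293.15 K
```

A consumer that receives only the identifier and the number 293.15 can recover that it is a temperature in kelvin, which is the kind of assumption the group wants to make explicit rather than implicit.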
3:30 – 3:50 Persistent Identifier Types
WG PID Information Types
Tobias
Report of current status:
RDA outcome process on the move
Ongoing TR / PIT discussions
Future PIT Processes
Checkpoint next spring?
The idea is good, but the goals cannot be reached within the scope of the current working group.
Need to go back to the users (communities)
Develop practical, central types
Data fabric wish list
PIT relevance:
Every object in the DF should bear PITs that enable automated management
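One way to read “every object should bear PITs that enable automated management” is a PID record whose entries are keyed by type identifiers, so that software knows what each entry means without human interpretation. The sketch assumes hypothetical type identifiers for a SHA-256 checksum and a location, and uses a plain dictionary as the record; it is not the WG’s record format. It shows an automated fixity check driven only by the typed checksum entry.

```python
import hashlib

# Hypothetical PIT identifiers; in practice these would be registered types.
CHECKSUM_TYPE = "example:types/checksum-sha256"
LOCATION_TYPE = "example:types/location"


def make_pid_record(pid: str, data: bytes, location: str) -> dict:
    """Build a PID record whose entries are keyed by type identifier."""
    return {
        "pid": pid,
        "entries": {
            CHECKSUM_TYPE: hashlib.sha256(data).hexdigest(),
            LOCATION_TYPE: location,
        },
    }


def fixity_ok(record: dict, data: bytes) -> bool:
    """Automated check: recompute the checksum and compare it with the typed entry."""
    expected = record["entries"].get(CHECKSUM_TYPE)
    if expected is None:
        raise KeyError(f"record {record['pid']} has no {CHECKSUM_TYPE} entry")
    return hashlib.sha256(data).hexdigest() == expected


record = make_pid_record("example:pid/123", b"observations", "https://example.org/objects/123")
print(fixity_ok(record, b"observations"))  # True
print(fixity_ok(record, b"observation"))   # False
```

Because the checker looks up the entry by its type identifier rather than by position or label, any record carrying that type can be managed the same way.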
3:50 – 4:10 Practical Policies
Rainier
Practical Policies
Policies that can be automated
Identification of 11 policy areas
Policy information should be carried by metadata
→ Integration in the data fabric
Templates create a crude structure of a vocabulary
→ Creation of a human + machine accessible vocabulary
How to build a sound vocabulary for practical policies?
Need time, money, and people who will do the grunt work
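The idea of practical policies that can be automated, with policy information carried by metadata, can be sketched as named rules evaluated against an object’s metadata. The two policies below (checksum recorded, at least two replicas) and the metadata keys `checksum` and `replicas` are illustrative assumptions, not items from the group’s 11 policy areas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    """A machine-actionable policy: a human-readable name plus a predicate over metadata."""
    name: str
    check: Callable[[Dict], bool]


# Illustrative policies only; the metadata keys are assumptions.
POLICIES = [
    Policy("integrity: checksum recorded", lambda md: "checksum" in md),
    Policy("replication: at least 2 replicas", lambda md: md.get("replicas", 0) >= 2),
]


def violations(metadata: Dict, policies: List[Policy] = POLICIES) -> List[str]:
    """Return the names of all policies the object's metadata fails."""
    return [p.name for p in policies if not p.check(metadata)]


print(violations({"checksum": "abc123", "replicas": 3}))  # []
print(violations({"replicas": 1}))
```

Keeping each policy as a name plus a check is one way to make the same vocabulary readable by people and executable by machines, in the spirit of the templates discussed.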
4:10 – 4:30 Wheat Data
Wheat Data
Context for creation of this interest group
Need an interoperability framework for collecting this data
Achieving semantic interoperability:
Two paths towards semantic interoperability:
Make everyone speak the same language
Provide “translations” among the existing metadata
Possible interactions with other WGs
Biosharing registries WG
Data type registries WG
Biodiversity Data Integration IG
Metadata Interest Group
Noted similarities of requirements
4:30 – 4:50 Data Seal of Approval
Repository Audit and Certification
DSA (Data Seal of Approval)-WDS (World Data System) Partnership WG
Mary
Goals:
Develop a common catalogue, and more!
General Findings:
Two catalogues have similarities and differences
Mission / Scope:
Next steps:
Map to Nestor and ISO
Finalize the harmonized requirements
Begin to work on aligning procedures
Determine the relationship of DSA and WDS to each other…
Create testbed for certification
Investigate shared pool of reviewers
4:50 – 5:10 Brokering Governance
Global and Multidisciplinary Interoperability: building on existing infrastructures
Standardization is at the base of interoperability
Brokering Benefits
Lowers barriers
Accelerates interconnection of disparate systems
Facilitates sustainability
…
Brokering Concerns
New paradigms pose a cultural challenge
Complexity is shifted to brokering framework
That’s a new tier to be organized and governed
Scalability of brokering framework
Goal: Address the governance of the brokering framework middleware and interconnect
existing international e-infrastructures.
Expected outcomes:
Position paper
Test of a selected governance model
Recommendation document for the RDA
Hope that metadata will help to reduce the existing models
Try to push for a more common solution
Push complexity to the broker. This is a practical way of addressing the fact that there will
never be proper standardization of terminology
5:10 – 5:30 Discussion
Wrap up, dinner plans
Friday
09:00 – 09:30 Metadata WG reflections on Wednesday
All groups expressed a need to work more closely with the Metadata Groups in general:
  Advice on what standards to use
  Assistance in applying metadata standards
Implications
  Syntax
  Semantics
  Temporal information
  Integrity
  Represented in some form of first-order logic
Keith - Metadata Principles are up for discussion
Noted the need for a more formalized version of Dublin Core
Keith’s plan: drive these harder projects first so as to draw out proper builds for the simpler
ones
Keith - New groups utilized at a domain level can be difficult
Objective to get some traction with current WGs to make the problem more visible.
The community appears to be eager to do the work that needs to be done.
Talpady – Groups should stay vigilant with recording and sharing best practices and guidelines
that can support the push forward
Gary – Also, cross-interest use cases can create branches between work groups. This can help
highlight “sweet spots” of collaboration and innovation.
Peter: Sort out how to move forward with a viable process in order to validate our statements
of what is possible. Urges an answer or decision that fills this need.
Larry – Suggests a metadata help desk for inter-group support; thought of as part of a service
model
Jane / Kathy - Importance of distinguishing between the focus of RDA US and European funding.
Traditionally, US has been used for coordination and Europe for new, continued projects.
Kathy – There is still work to be done by working groups
Keith – Questions the need to continue with the 18-month project intervals to help working
groups commit to a time constraint.
Rainier – Asks where the interest-group-type conversations are taking place.
Keith & Jane in agreement – There currently are problems with the forum and mailing list that
have confounded participation efforts.
Jane – Leaders of these boards have definitely tried to keep the interest up
Keith – WGs should be able to interoperate, and we need the software tools that can support
this.
Peter – Also semantics, terminology must be clear for true interoperation.
Keith – Elaborates on need for tool that allows non-local metadata to be accessible to students
for interoperation and creating opportunity for corresponding interests
Rainier – Metadata is a core problem of RDA. Is there a need for implementation support?
Beth stresses the importance of this question…
Jane, Mary, Keith respond
Keith – problem is that we have many participants, groups are fractured
Raphael – Make the distinctions and maintain domain differences in metadata. Contextual
metadata is harder to work with because producing it is a tremendous task, so it can be tedious
to get scientists to participate in or back this kind of project. Then there is the unfortunate
problem of false metadata production; the integrity problem posed by falsified data needs
serious consideration.
Keith – The idea is to find the origin of elements in research data and store it, or even cache it,
so that it’s not so tedious and researchers are supported while filling in forms. It needs to be
referenced in the right way for this to be successful.
Discuss possibility of organizing more meetings. More frequent conversations, and working
through important conflicts, will help shared understanding come to the surface.
Gary – Also supports different categories of metadata that are mutable and related for the
purpose of discovery.
Beth – in agreement
Kathy – Is it feasible to do a metadata track?
Keith – would be obliged
Larry – seems clear metadata bunch has to drive continued efforts.
Discussion of the development of marketing material documents to address stakeholder
interest. These are in progress.
09:30 – 10:00 Five-minute responses from each of the WGs
10:00 – 10:30 Open discussion, agenda bashing
Beth – urgent matter of organizing activities into areas will be handled in the afternoon.
Machine froze, lost content
Mark – process of iterative review
Kathy – bundle must stand alone
Tobias – comfort of new documentation license
Beth – is the documentation open sourced (i.e., on Git)?
Tobias – yes and seems stable.
10:30 – 10:45 Break
Tentative schedule for the rest of Friday
10:45 – 11:45 Data Fabric as WG integrator
Peter
White paper, promised as step 1 in the case statement, should be simple and declarative so that
those outside of the discussion will understand what this is about. Diagrams should be simple
and well-explained.
Basic terminology ought to be agreed upon
Working through legal aspects of inter-operability.
Use cases – demonstrate good solutions that come close to what is meant by “data fabric”.
Motivation…
One issue not discussed at the P4 meeting: large-scale infrastructure projects that need direction
to prevent “island” solutions from arising again.
Peter seeks agreement from the group on this issue.
Mark in agreement – definitely should not be run by publishers.
Peter - Find the right moment to interact, no need to integrate.
Working groups to continue:
There will be a terminology group, PID group, DTR, PP (policy) WG
Terminology…
Acknowledge John Henry’s suggested use of system engineering terminology
John’s diagram – an abstraction of terminology use, up for discussion on whether this agreed
upon terminology should make it into the Wiki.
RDA does not want to get into the business of over-explaining terminology. The group is getting
hammered in discussion to explain what words we’re using. Need to find a better term than
data fabric because it’s a loaded term. The need to dwell on explanations so much has become
a major hindrance.
Beth – It is a global scope
Peter – need to come up with a joint-view so that it can be of use.
Mark – architecture is a loaded term that conflicts with people’s assumptions. Data fabric is
our unique term and we don’t have an RDA architecture.
Peter – what term gets used in the white paper?
John – Not a new concept, framework captures a lot of this need, but lacks a clear goal.
Gary – Supports viewpoint of John. Believes framework to be the most useful, we’re not trying
to build an architecture, though there is a process of structuring that resembles assembled
components and is suggestive of architecture.
Peter – White Paper is not the place to hold a dictionary of our terms as this can create more
questions than answers
Gary – If we don’t go through the details somewhere, people will be confused or misled by the
simpler diagrams.
Peter – not easy to get active working groups into Simple Diagram 2 (indicating machine
character).
Beth – Comments that the diagram has been useful. Happy that it looks more
comprehensive.
Stefano – Would like to say that brokering is part of processing
Between Keith and Peter: “problematic” comments on the registry discussion.
Gary - challenge of diagrams adding complexity to original visual. Lacking a clear description of
changes taking place.
Next steps…
Mark – scope of data fabric has gotten bigger than originally conceived. How does TAB group
collaborate?
Peter – TAB must make a statement. Should be monitoring what other groups are doing.
Beth – If indeed we agree, there will be an overlap of organization. Strongly encourages action
on behalf of this group.
Gary – Logical connections with RDA going forward that are clarified by discussion
Talpady – What is focus of working group? (Mark clarifies)
Larry - How would we relate this to the proposal of test beds?
Peter – we have code out there (defining test beds). Need to assess whether or not it is
working together.
Larry – for it to function can’t be a paper exercise.
Mark – in agreement.
Peter – Clarifies that there is no top-down rule; we need to come to grips with not getting so up
in arms over terminology, “architecture” being an example.
John – value is that it helps you make decisions with certainty. We need to help groups see
what they can trust and invest in. Do this by backing up with data and scientific evidence.
Kathy, Peter – offer to take this discussion outside of RDA meeting
11:45 – 12:45 RDA Working Group Processes
Proposal to RDA Technical Advisory Board
Beth
Timothy – happy that TAB community should be able to identify gaps.
Kathy – Is the grouping visual intended to show that working groups are working together?
Rebecca - Why is “long tail of research data” within Trust grouping? Asks that representation is
consistent in terminology.
Larry – Visual suggests how grouping by domain focus is difficult.
Mark – problem with first diagram is that it isolates activities
Peter – The visuals can help bring groups together to understand the common ground.
Mark – As we communicate with our stakeholders, encourage appropriate collaboration
between working groups and interest groups. Who are we clustering for?
Mary – for purposes of communication, it’s important bins come together as something people
can relate to
Dan – raises question of visibility of this on our website.
Peter – Need for applications grouping
Gary – Attempt to rationalize between two diagrams.
Beth – Need to elucidate the dual-role.
Tobias – Usefulness of diagram is that it communicates the different purposes of groups to the
outside. Distinct point where we want others to note our cause and join in. Need for
orthogonal track.
Rainier – Who came up with area director? Not the impression we want to give.
Beth – Have to do something to shed the prevailing reputation we’ve made for ourselves, that
we’re very incoherent and made up of 50 disparate groups.
Peter – We’re not big data analytics, that’s not part of our effort.
…
Beth – idea to reject proposal C
Larry – need for communication is abundantly clear. We should have as much structure as
needed and no more.
Beth closes and expresses gratitude for feedback
Larry – Shown how difficult it is to keep track of what RDA is doing. Maybe need for an internal
awareness service. For example, put out short summaries once a month.
Timothy – General news feed directing users to recent work group activities.
Peter – Agrees, and suggests someone who can keep track of the pace.
Mark – better staffed than we were, so a regular newsletter is possible. Pitch idea example of
“how to use your group guidelines”
Kathy – in search of Drupal module to support this type of functionality
Jane – Notes possible confusion with another RDA registry (Resource Description and Access), an
international effort that has been around for a long time.
Kathy- Any new work groups that need a process, refer to her as she is trying to come up with
these types of templates. If you form something new, be prepared to hear from her on this
matter.
Kathy… Focus on Adoption Day: Look for finished adopted prototype projects.
Vast email list to come out
Mark - His effort is moving away from a start-up approach toward doling out responsibilities in a
much more structured manner.
12:45 – 1:45 Lunch
1:45 – 3:00 3rd WG Collaboration Meeting, Open Discussion, Final thoughts
Rainier
Next WG Chairs Working Meeting
Next meeting is in Karlsruhe, June 11-12 2015 @ Karlsruhe Institute of Technology
Hotel Eden recommended and is less than two miles away from meeting location
Discussion of travel funds