RDA005-Publishing_Data_Workflows-v02-1

Publishing Data
Workflows
RDA Plenary 5 -- March 11, 2015
Session Chairs: Amy Nurnberger and Mary Vardigan
Please sign in:
http://bit.ly/1Hju0LM
Agenda
• Introduction:
Objectives
• Progress so far
• Workflow Examples
• Get involved
• Dataverse workflow presentation
• SoftwareX workflow presentation
• Use case development
Group notes document: http://bit.ly/1MlXysR
The working group members (currently)
• Theodora Bloom (BMJ) [CO-CHAIR]
• Sünje Dallmeier-Tiessen (Switzerland,
CERN) [CO-CHAIR]
• Elizabeth Newbold (BL) [CO-CHAIR]
• Merce Crosas (US, Harvard University)
• Michael Diepenbroek (PANGAEA)
• Kim Finney (Australia, AADC)
• John Helly (US, UCSD)
• Brian Hole (Ubiquity Press, UK)
• Varsha Khodiyar (Nature Scientific Data)
• Hylke Koers (The Netherlands, Elsevier)
• Rebecca Lawrence (UK, F1000 Research Ltd.)
• Fiona Murphy (UK, Wiley-Blackwell)
• Amy Nurnberger (US, Columbia University
Libraries)
• Lisa Raymond (US, Library Woods Hole
Oceanographic Institution)
• Johanna Schwarz (Germany, Springer)
• Jonathan Tedds (UK, University of Leicester)
• Mary Vardigan (US, ICPSR)
• Ruth Wilson (UK, Nature)
• Eva Zanzerkia (US, NSF)
• Angus Whyte (UK, DCC)
• And growing…
Others are very welcome ☺
Background and Motivation
• Only a small fraction of research data is preserved and shared, often with
a bare minimum of metadata
• Often due to the lack of “established” or “trusted” services and workflows
But there are established or emerging workflows!
• Usually in selected disciplines, e.g., Earth Sciences
• Some provide credit via citation mechanisms
Objectives
• Provide an analysis of a representative range of existing
and emerging workflows and standards for data
publishing
• Including deposit and citation
• Provide reference models, a “classification”
• Test implementations of key components for application
in new workflows
• Illustrate the benefits of the reference models for
researchers and organisations
Relevance
• Information about workflows is crucial for researchers and
other stakeholders to understand the options available for
practicing open science
• Helps to illustrate different possibilities for data sharing,
leading to more efficient and reliable reuse of research
data
• Shows those involved in research data where they fit in the
overall scheme of things
More detailed work programme
• Identification of a smaller set of reference models covering a range of such
workflows to include:
• For example, when and where QA/QC and data peer-review fit into the
publishing process
• Who does what and when…
• Automated vs. “manual” processes
• Selection of key use cases and organizations in which components of a
reference model can be implemented and tested for suitability
• For example: dedicated data peer review
• For example: metadata checks
First results of workflow
analysis
http://tinyurl.com/mvtbrek
Workflows in the current list
- STFC Data centre
- NSIDC Data centre
- ENVRI reference model
- OJS / Dataverse
- INSPIRE Digital library
- NPG (PubChem & Scientific Data) Publisher
- UK Data Archive/Service
- PREPARDE (NCAR CISL)
- Ocean Data Publication Cookbook (UNESCO IOC)
- PURR Institutional repository
- ICPSR
- Edinburgh Datashare
- F1000 Research
- Ubiquity Press: Open Health Data Journal+...
- PANGAEA - Data Publisher for Earth and
Environmental Sciences
- WDC Climate - Data Publisher for Climate
Sciences
- CMIP / IPCC DDC - International project series
in Climate Sciences
- GigaScience
- Dryad digital repository with integrated
journals workflow
- Stanford Digital Repository
- Academic Commons: Columbia University
Institutional Research Repository
- Elsevier: Data in Brief
- Integrated data publishing solution at Elsevier
[through “traditional” journals]
Categories we are looking at
• Discipline
• Function of workflow
• PID assignment to dataset
• PID type -- e.g., DOI, ARK, etc.
• Peer review of data (e.g., by researcher & editorial review)
• Curatorial review of metadata (e.g., by institutional or subject repository?)
• Technical review & checks (e.g., for data integrity at repository/data centre on ingest)
• Discoverability: Indexing of the data -- if yes, where?
• Formats covered
• Persons/Roles involved, e.g., editor, publisher, data repository manager, etc.
• Link to data paper or “standalone” data
• Links to grants, usage of author PIDs
• Data citation facilitated
• Data life cycle referred to
• Standards compliance
Observations
• The researcher/author generally initiates the workflow
• Discipline-specific repositories have the most rigorous ingest and
review processes -- more general institutional repositories have a
lighter touch
• Journals vs. repositories: For the former, any peer review is
conducted externally; for many of the latter, it is internal
Repository view
Simplified generic repository
workflow
Researcher with a central role: submission/deposition
[Figure: generic repository workflow. Stages shown: Producer, Data Deposit, Ingest, Quality Assurance (review/QA mainly internal), Data Management, LT Archiving, Dissemination, Access, Consumer/Reuse (disciplinary and interdisciplinary consumers).]
[Figure: project repositories compared with a long-term archive. Labels shown: Producer, Data Deposit, Ingest, Quality Assurance (light), Quality Assurance (detailed), Data Management, LT Archiving, Dissemination, Access.]
Project Repositories:
• Data are published in a federated
data infrastructure
• Data are added and corrected
• Poor documentation
• Usually no data backup
• Light-weight quality assurance
against intl. and project standards
• Tendency that the project data
never become stable
• Currently no PIDs assigned or
reserved but Handles planned
Long-term Archive:
• Data are archived for the long term at a
single location
• Data are stable and curated
• Detailed documentation
• Data backup/redundancy
• Quality assurance process is more
detailed and includes a review
• Data is a “snapshot” of the project
data at a certain time
• DOIs assigned to data collections
Designed by
M. Stockhause
Lessons Learnt and questions
• Very diverse landscape
• Discipline-specific and cross-discipline actions
• Quality assurance a big topic in discipline-specific repositories
• Widespread persistent identification
• Data citation awareness
• Challenge: Bidirectional data-publication linking
• Challenge: Versioning
Publisher’s perspective
Simplified generic publisher workflow
Researcher takes over several roles: submitter, reviewer,
editor potentially?
[Figure: generic publisher workflow. Stages shown: Producer, Article preparation, Data Submission, Article submission, Peer Review Process, Editing, Publishing, Consumer/Reuse.]
Who takes on which role and responsibility?
Publishing options:
- Article/data container
- Separate article and datasets
Example: Dryad repository integrated
with journals
Lessons learnt and questions
• Recommended repositories for collaboration? Who
decides/how?
• External review
• Open, plus invitation
• Closed, upon invitation
• Blind
• Emerging data and software journal landscape: no
information yet on uptake
Current and future work
How to get involved
• Contribute to the workflow analysis: http://bit.ly/1BBQQPW
• Contribute your own workflow “walk-throughs” and use cases
• Tell us what is needed for a “successful” workflow in your
institute/discipline
… Moving to implementation
• Tell us if you are interested in learning from a specific example or are
considering implementing data publishing workflows
• Tell us if you have code/documentation to share
Break for presentations
Dataverse: Eleni Castro
SoftwareX: Hylke Koers
DATA PUBLISHING WORKFLOWS
WITH DATAVERSE
Eleni Castro (ecastro@fas.harvard.edu)
Institute for Quantitative Social Science (IQSS)
Harvard University
RDA 5th Plenary
WG RDA/WDS Publishing Data Workflows
March 11, 2015
An Integrated & Automated
Journal / Data Publishing Workflow
Journal
Repository
Current Workflows in Dataverse:
To Connect Data to Journals
A. Journals include Dataverse as a Recommended Repository
B. Authors Contribute Directly to a Journal’s Dataverse
C. Automated Integration of Journal + Dataverse (e.g., OJS)
Example of Option C: Phase 1
OJS / Dataverse Integration
Project Details: 2012-2014
• Integrating Open Journal Systems (OJS) with Dataverse
• Reference Implementation: Automated via SWORD API
• Pilot with ~ 50 journals + expand to 1000s using OJS.
• Dataverse plugin is automatically available w/ OJS.
• Future: Embed Dataverse widgets into journal article.
http://projects.iq.harvard.edu/ojs-dvn
In the Backend: Technical Workflow
Client sends:
• XML file: AtomPub "entry" with Dublin Core Terms (e.g., title, creator, isReferencedBy (article citation), …)
• Zip file: All data files associated with that dataset.
Repository sends:
• XML file: "Deposit Receipt" sends data citation from repository to client.
Plus updates from client to server during lifecycle (CRUD):
In review, reject (delete), publish first version, update new versions.
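As an illustration of the client-side XML file described above, the following is a minimal sketch of an Atom entry carrying Dublin Core terms, built with Python's standard library. The function name `build_entry` and all field values are hypothetical; the exact set of terms a given repository accepts is not stated on the slide and should be checked against that repository's SWORD documentation.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for Atom and Dublin Core Terms.
ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Use readable prefixes when the tree is serialized.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_entry(title, creators, article_citation):
    """Return an Atom entry (as a string) with title, creator(s) and
    isReferencedBy expressed as Dublin Core terms.

    Hypothetical helper: only the three terms named on the slide are emitted.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}title").text = title
    for name in creators:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = name
    # isReferencedBy carries the citation of the journal article.
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}isReferencedBy").text = article_citation
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_entry(
        "Example dataset",
        ["Doe, Jane", "Roe, Richard"],
        "Doe, J. and Roe, R. (2015). Example article. Example Journal.",
    ))
```

In a deposit, a client would send XML like this together with the zip file of data files; the HTTP calls themselves are omitted here because the endpoint details are repository-specific.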
On the Frontend:
OJS Dataverse Plugin Walkthrough
Journal Manager Sets Up Plugin in OJS
Journal Manager Sets Up Data Policies
Including Guidelines for:
1) Authors (data citation)
2) Reviewers
3) Copyeditors
Read full Data Policies / Guidelines Template: http://bit.ly/1xkLjoZ
Author Submits Manuscript + Data (1)
Author Submits Manuscript + Data (2)
To-Do: Support for adding multiple datasets to a journal article.
Option to: (a) deposit into Dataverse; OR (b) if the data is already in a
repository, include the data citation (w/ persistent URL/identifier).
Editor Reviews Article + Data
Approved = Data Published in Dataverse
When issue is published:
1) URL to Article displays in Dataverse.
2) Data Citation shows up in OJS Article (see next slide).
Article in OJS: Published w/ Data Citation
Video of OJS Dataverse Plugin Demo
http://bit.ly/1D1hphu
Phase 2: Expansion of API + Workflows
2015-2016 (collaboration w/ Odum Institute)
Project Goals
1. Expand to more journals, publishing systems, & workflows
2. Develop Community-Based Repository API Standard:
Work w/ RDA, WDS, Data FAIRport, FORCE11, CODATA, etc…
Project Questions
• Should we extend the Repository API beyond SWORD?
• Support for additional Metadata Schemas & fields (non-DC)?
• Support for more/which dataset review workflows?
How Do I Get Involved?
1. Find Out More:
* Visit our Collaborations page: http://bit.ly/1Bg2nkw
* Dataverse Project Site: http://dataverse.org
2. Sign up to Contribute:
Repositories Workshop + Dataverse Community Meeting
June 9-11, 2015 @ Harvard http://bit.ly/1A51atJ
3. Contact Project Coordinator:
Eleni Castro (ecastro@fas.harvard.edu)
Thank You! Any Questions?
Contact Me: Eleni Castro (ecastro@fas.harvard.edu)
SoftwareX – a home for
research software
Hylke Koers, Head of Content Innovation, Elsevier
RDA Plenary 5, San Diego
Open Access
Software (like data) is high-value but hard to access
[Figure: importance of access plotted against ease of access, with regions labelled "High value & easy access" and "High value & difficult to access". Source: researcher survey, 3824 respondents (Publishing Research Consortium, 2010).]
Why SoftwareX?
• Many scholars develop software, but the current
paper-based system does not capture this “born
digital” research output systematically
• Users (readers) can’t find this valuable content
• Developers (authors) can’t claim credit
• Software is a research method in its own right
and deserves to receive full academic
recognition
SoftwareX: a home for research software
SoftwareX aims to acknowledge the impact of software on today's
research practice, and on new scientific discoveries in almost all
research domains. SoftwareX also aims to stress the importance of
the software developers who are, in part, responsible for this impact.
To this end, SoftwareX aims to support publication of research software in such a
way that:
• The software is provided with a peer-reviewed recognition of scientific impact
• The software developers are given the academic credit they deserve;
• The software is citable, allowing traditional metrics of scientific excellence to
apply;
• The academic career paths of software developers are supported rather
than hindered;
• The software is publicly available for inspection, validation, and re-use.
Above all, SoftwareX aims to inform researchers about software applications,
tools and libraries with a (proven) potential to impact the process of scientific
discovery in various domains
From “Aims & Scope”, see http://www.journals.elsevier.com/softwarex
SoftwareX: a home for research software
• Publishing “Original Software Publications”:
- The software and code can include post publication updates
- Metadata is systematically captured
• Article is Open Access under CC-BY license
• All software and code published is, and will remain, fully owned by
their developers.
• Peer-reviewed; dedicated software Editors & Reviewers
• Multi-disciplinary
• Submission in 3 easy steps
• GitHub repository to store and expose all software and code
• Launched at FORCE15
See http://www.journals.elsevier.com/softwarex/news/you-can-now-submit-your-software-to-softwarex/
How does it work?
How to submit your software to SoftwareX in 3 easy steps:
1. Select a repository for your software or pack your software into a
zip file or archive. Remember to make your software public so
that the reviewers and readers can find it.
2. Download the template for the OSP manuscript, and write your
article describing your software following this template.
3. Submit your OSP manuscript via the SoftwareX submission site.
After review and acceptance, software and/or code will be copied to
the journal archive on GitHub and integrated with the online version of
your Original Software Publication available on ScienceDirect.
See http://www.journals.elsevier.com/softwarex
Template contains structured metadata
Nr | Code metadata description | Please fill in this column
C1 | Current code version | For example v42
C2 | Permanent link to code/repository used of this code version | For example: https://github.com/mozart/mozart2
C3 | Legal Code License | List one of the approved licenses
C4 | Code versioning system used | For example svn, git, mercurial, etc. put none if none
C5 | Software code languages, tools, and services used | For example C++, python, r, MPI, OpenCL, etc.
C6 | Compilation requirements, operating environments & dependencies |
C7 | If available Link to developer documentation/manual | For example: http://mozart.github.io/documentation/
C8 | Support email for questions |
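The code metadata fields above lend themselves to a simple completeness check before submission. The sketch below is illustrative only: the names `CODE_METADATA_FIELDS`, `OPTIONAL_FIELDS` and `missing_fields` are hypothetical, and treating every field except C7 (marked "if available") as required is an assumption, not something the template states.

```python
# Field identifiers and descriptions as listed in the template table.
CODE_METADATA_FIELDS = {
    "C1": "Current code version",
    "C2": "Permanent link to code/repository used of this code version",
    "C3": "Legal Code License",
    "C4": "Code versioning system used",
    "C5": "Software code languages, tools, and services used",
    "C6": "Compilation requirements, operating environments & dependencies",
    "C7": "If available Link to developer documentation/manual",
    "C8": "Support email for questions",
}

# Assumption: only C7 is optional, because the template says "if available".
OPTIONAL_FIELDS = {"C7"}


def missing_fields(metadata):
    """Return the identifiers of required fields that are absent or blank.

    `metadata` maps field identifiers (e.g. "C1") to string values.
    """
    return [
        field_id
        for field_id in CODE_METADATA_FIELDS
        if field_id not in OPTIONAL_FIELDS
        and not (metadata.get(field_id) or "").strip()
    ]


if __name__ == "__main__":
    # Values for C1, C2, C4 and C5 reuse the template's own examples;
    # the C3 value is a placeholder chosen for illustration.
    draft = {
        "C1": "v42",
        "C2": "https://github.com/mozart/mozart2",
        "C3": "MIT",
        "C4": "git",
        "C5": "C++",
    }
    for field_id in missing_fields(draft):
        print(f"{field_id} is missing: {CODE_METADATA_FIELDS[field_id]}")
```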
Template contains structured metadata
Nr | (Executable) software metadata description | Please fill in this column
S1 | Current software version | For example 1.1, 2.4 etc.
S2 | Permanent link to executables of this version | For example: https://github.com/combogenomics/DuctApe/releases/tag/DuctApe-0.16.4
S3 | Legal Software License | List one of the approved licenses
S4 | Computing platforms/Operating Systems | For example Android, BSD, iOS, Linux, OS X, Microsoft Windows, Unix-like, IBM z/OS, distributed/web based etc.
S5 | Installation requirements & dependencies |
S6 | If available, link to user manual - if formally published include a reference to the publication in the reference list | For example: http://mozart.github.io/documentation/
S7 | Support email for questions |
Flexible range of open-source licenses for computer
code
• Apache License, 2.0 (Apache-2.0)
• BSD 3-Clause "New" or "Revised" license (BSD-3-Clause)
• BSD 2-Clause "Simplified" or "FreeBSD" license (BSD-2-Clause)
• GNU General Public License (GPL)
• GNU Library or "Lesser" General Public License (LGPL)
• MIT license (MIT)
• Mozilla Public License 2.0 (MPL-2.0)
• Common Development and Distribution License (CDDL-1.0)
• Eclipse Public License (EPL-1.0)
• Creative Commons Zero (CC0)
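A submission tool could check the license entered in fields C3 or S3 against this list. The following is a minimal sketch under stated assumptions: the identifiers are the short forms given in parentheses on the slide, and the names `APPROVED_LICENSES` and `normalize_license` are hypothetical, not part of any SoftwareX tooling.

```python
# Short identifiers exactly as they appear in parentheses on the slide.
APPROVED_LICENSES = frozenset({
    "Apache-2.0",
    "BSD-3-Clause",
    "BSD-2-Clause",
    "GPL",
    "LGPL",
    "MIT",
    "MPL-2.0",
    "CDDL-1.0",
    "EPL-1.0",
    "CC0",
})

# Lookup table for case-insensitive matching.
_BY_LOWERCASE = {identifier.lower(): identifier for identifier in APPROVED_LICENSES}


def normalize_license(license_id):
    """Return the listed identifier matching `license_id`, ignoring case and
    surrounding whitespace, or None if the license is not on the list."""
    return _BY_LOWERCASE.get(license_id.strip().lower())


if __name__ == "__main__":
    for candidate in ("mit", "Apache-2.0", "Unlisted-License"):
        print(candidate, "->", normalize_license(candidate))
```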
And now.. The moment you have all been waiting for…
A workflow diagram
[Figure: Researcher has code and paper → Submits to journal as OSP + code (supp. mat.) → Editorial + peer-review process → OSP published on ScienceDirect, with bi-directional links to the code made available on journal GitHub instance.]
A workflow diagram
[Figure: Code deposited to (or built on) code repository → OSP linked with code → OSP submitted to journal → Editorial + peer-review process → OSP published on ScienceDirect, with bi-directional links to the code made available on journal GitHub instance.]
Thank you!
Any questions?
Discussion
Use case development
Developing use cases for
workflows
● The tools
○ Part A: http://goo.gl/forms/Wkc7KyxvX5
○ Part B: http://goo.gl/forms/ZFRrzG6krX
● The process
○ Walk through the tools
○ Form up in groups
○ Generate use cases
The tools: Part A
http://goo.gl/forms/Wkc7KyxvX5
Thank you! You have completed Part A of this
use case. For the next part, you will be
completing multiples of a form, to address each
individual actor listed in this use case. Click this
to get to Part B:
http://goo.gl/forms/ZFRrzG6krX
The tools: Part B
http://goo.gl/forms/ZFRrzG6krX
Group up!
● The tools
○ Part A: http://goo.gl/forms/Wkc7KyxvX5
○ Part B: http://goo.gl/forms/ZFRrzG6krX