P1_Access_and_Reuse-15-08-14-SDH

1. Access and Reuse
Facilitate access to and reuse of research data. In order to enable the maximum degree of
interoperability, access to and reuse of research data should be either open and unrestricted
by default or otherwise be granted to users with the fewest limitations possible. Data in the
public domain ensures that there are no restrictions.
Implementation Guidelines: Definition of terms
Interoperability of data:
Interoperability may be defined as the ‘property of a product or system … to work with other
products or systems, present or future, without any restricted access or implementation’
[http://interoperability-definition.info/en]. Interoperability is an attribute that greatly facilitates
usability of research data. For example, semantic interoperability depends on shared and
unambiguous properties to which data refer, allowing comparison or integration at scale. Similarly,
legal interoperability facilitates the reuse and recombination of research data through waiving
proprietary rights and providing clarity about any restrictions.
It is widely recognized that the value of data lies in reuse: government and funder policies require
that data created by publicly funded research should, in the public good, be made available for
reuse.
The facility to reuse data is impaired when there is an absence of clarity about the legal conditions
under which the data may be reused and when restrictions are placed on the reuse of datasets. In
most circumstances legal restrictions on reuse run counter to the obligation to make research data
publicly available.
Restrictions can inhibit reuse to a greater extent than is sometimes realized. This can be illustrated
by analogy to the idea of a ‘lowest common denominator’. In the mathematics of fractions,
addition and subtraction require a common denominator for all the fractions involved. Similarly, when considering
the legal restrictions on reuse of datasets, the ‘lowest common denominator’ means that for a
derivative dataset that is the result of the combination of parts of two or more other datasets, the
most restrictive terms and conditions of the underlying datasets will be transferred to the entire
derivative dataset. In this way, the legal restrictions, perhaps unnecessarily imposed, can have
broader, unwanted effects limiting the reuse of derived datasets in which most of the components
may be subject to unrestrictive licences or rights waivers.
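The ‘lowest common denominator’ effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that each source dataset's terms can be reduced to a set of restriction labels; the function name, dataset names and labels below are hypothetical and not part of any licence standard. Real licence terms require legal analysis, so this shows only the accumulation of restrictions, not a compatibility test.

```python
from typing import FrozenSet, Iterable


def derived_restrictions(sources: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the union of the restrictions attached to every source dataset.

    Simplifying assumption: a derivative dataset inherits every restriction
    carried by any of the datasets it was built from.
    """
    return frozenset().union(*sources)


# Hypothetical source datasets and their (simplified) restriction labels.
sources = {
    "dataset_a": frozenset(),                                   # public domain
    "dataset_b": frozenset({"attribution"}),
    "dataset_c": frozenset({"attribution", "non_commercial"}),
}

result = derived_restrictions(sources.values())
print(sorted(result))  # ['attribution', 'non_commercial']
```

Although two of the three hypothetical sources allow commercial reuse, the single non-commercial component restricts the whole derivative dataset.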
The full definition of legal interoperability is provided above. It occurs most readily when data are
clearly labeled with a waiver of rights or public domain dedication that communicates that the data
may be reused without restriction.
Open Access:
Definitions of Open Access started in debates to promote the wider availability of scientific
literature. Open Access to scholarly literature is defined by the Budapest (Feb 2002), Bethesda (June
2003) and Berlin (Oct 2003) statements or declarations. The following text is shared by the Bethesda
and Berlin definitions:
‘The author(s) and copyright holder(s) grant(s) to all users a free, irrevocable, worldwide,
perpetual right of access to, and a license to copy, use, distribute, transmit and display the
work publicly and to make and distribute derivative works, in any digital medium for any
responsible purpose, subject to proper attribution of authorship’.
[http://dash.harvard.edu/bitstream/handle/1/4725199/suber_bethesda.htm?sequence=1]
The Budapest initiative is more specific, stating that the types of reuse include computer-assisted
processing and analysis at scale:
By "open access" to this literature, we mean its free availability on the public internet,
permitting any users to read, download, copy, distribute, print, search, or link to the full texts
of these articles, crawl them for indexing, pass them as data to software, or use them for any
other lawful purpose, without financial, legal, or technical barriers other than those
inseparable from gaining access to the internet itself. The only constraint on reproduction
and distribution, and the only role for copyright in this domain, should be to give authors
control over the integrity of their work and the right to be properly acknowledged and cited.
[http://www.budapestopenaccessinitiative.org/]
Open Access to research data derives many principles from the Open Access movement and from
such definitions. A key statement is found in the OECD Principles and Guidelines for Access to
Research Data from Public Funding, which offers the following definition of openness and Open
Access to Research Data:
Openness means access on equal terms for the international research community at the
lowest possible cost, preferably at no more than the marginal cost of dissemination. Open
access to research data from public funding should be easy, timely, user-friendly and
preferably Internet-based. [OECD Principles and Guidelines for Access to Research Data from
Public Funding (2007), http://www.oecd.org/sti/sci-tech/38500813.pdf]
The OECD Principles and Guidelines have had widespread influence and are mentioned in many
research funder policy documents. There has, in recent years, been an effort to clarify that for
publicly-funded research data, it is not enough simply to make the data available, but that usability
must be facilitated. Influenced by the definition of ‘intelligent openness’ advanced in the Royal
Society’s Science as an Open Enterprise report, the G8 Science Ministers Statement declared that:
Open scientific research data should be easily discoverable, accessible, assessable,
intelligible, useable, and wherever possible interoperable to specific quality standards. [G8
Science Ministers Statement, 13 June 2013, https://www.gov.uk/government/news/g8-science-ministers-statement].
In turn, the G8 definition is used in the documents presenting and supporting the European
Commission’s Guidelines on Open Access to Scientific Publications and Research Data
[http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf; see also Guidelines on Data Management in Horizon 2020,
p.6, http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf].
From each of these definitions it can be seen that Open Access means unrestricted access to and use
of scientific information and data. Open Access exists to facilitate reuse and legal interoperability is
an important component of this process.
Unrestricted Reuse:
There are widely acknowledged and necessary restrictions which may – and sometimes by
definition must – be placed on the reuse of research data. These include the need to protect
personal privacy, issues of national security and in some instances commercially confidential
information. All statements and policies on Open Access to research data acknowledge these
limitations.
Unrestricted reuse means, therefore, the absence of any restrictions over and above these and
similar principles. For example, the Budapest Open Access Initiative statement observes: ‘The only
constraint on reproduction and distribution, and the only role for copyright in this domain, should be
to give authors control over the integrity of their work and the right to be properly acknowledged
and cited.’ [Ibid] An analogous principle holds for the assertion of IPR over data produced by
publicly funded research.
The OECD Principles and Guidelines argue that where data produced by publicly funded research are
protected by intellectual property rights, ‘the holders of these rights should nevertheless facilitate
access to such data particularly for public research or other public-interest purposes.’ [p.17]
Rights Waivers and Licences:
The principle of legal interoperability maintains that data should be clearly labeled with conditions of
reuse. When data is already in the public domain, this should be clearly stated. Otherwise, a waiver
of rights or non-restrictive licence should be used.
Specific Creative Commons licences were established to formally allow reuse, subject to certain
conditions.
CC-NC (non-commercial) allows only uses that are non-commercial, defined as uses that are not
primarily intended for commercial advantage or monetary compensation.
CC-SA (share alike) compels any user to license a derivative product under the same terms as
the original dataset. It may create incompatibilities among licences.
Each of these licences imposes conditions which may create incompatibilities and licensing
difficulties in derivative datasets. It is recommended not to use licences that impose these
restrictions.
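How share-alike and non-commercial conditions can make a combination impossible is sketched below. This is a deliberately simplified model and not legal advice: the licence table, condition labels and function name are assumptions made for illustration, licence versions and other conditions are ignored, and the only rule encoded is that a share-alike licence requires the derivative to carry that licence's own terms, so no other source may impose a condition it lacks.

```python
# Hypothetical, simplified table of the conditions each licence imposes.
LICENCE_CONDITIONS = {
    "CC0": set(),
    "CC BY": {"attribution"},
    "CC BY-NC": {"attribution", "non_commercial"},
    "CC BY-SA": {"attribution", "share_alike"},
    "CC BY-NC-SA": {"attribution", "non_commercial", "share_alike"},
}


def can_combine(licences):
    """Return True if the sources can be combined under this simplified model.

    A share-alike source forces the derivative under its own terms, so every
    other source's conditions must already be contained in those terms.
    """
    for name in licences:
        terms = LICENCE_CONDITIONS[name]
        if "share_alike" in terms:
            for other in licences:
                if not LICENCE_CONDITIONS[other] <= terms:
                    return False
    return True


print(can_combine(["CC0", "CC BY", "CC BY-SA"]))   # True
print(can_combine(["CC BY-NC", "CC BY-SA"]))       # False
print(can_combine(["CC BY-SA", "CC BY-NC-SA"]))    # False
```

Sources carrying only a waiver or an attribution requirement never block a combination in this model, which is consistent with the recommendation above.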
CC-BY (attribution) allows unrestricted use of data so long as the creator is attributed.
As attribution is a fundamental principle of scholarly discourse it is commonly deemed acceptable to
impose a legal requirement for attribution in this way. However, it can be argued that attribution is
something covered by the conventions and norms of scholarly discourse and by the author's or
creator's moral rights, so that such a licence is unnecessary. [See discussion under Principle 6:
Attribution and Credit]
Ideally, then, to achieve the objectives of legal interoperability, data should be placed in the public
domain with no restrictions. This can be done by assigning a CC0 waiver of rights, for example, or
other form of dedication to the public domain.
Public Domain:
Works in the public domain are those whose intellectual property rights have expired, are
inapplicable (for whatever reason the data is deemed not to be subject to IPR), or have been explicitly
waived and dedicated to the public domain through the use of an appropriate waiver or licence. It is the
recommendation of the Implementation Guidelines for the Principles on the Legal Interoperability of
Research Data that legal interoperability is best achieved when datasets are explicitly and clearly
dedicated to the public domain.
Access and Reuse: Guidelines for Implementation
On the basis of the foregoing definitions and discussion we provide the following guidelines for
implementation to promote access and reuse.
The conditions of reuse should be clearly stated, ideally with an appropriate rights waiver
The data should be clearly labelled with the conditions for reuse. Ideally, this should be by means of
one of the following:

A clear label or statement that the data is already in the public domain;

A waiver of rights that dedicates the data to the public domain;

An open licence that imposes no greater restriction than the requirement to acknowledge the
data creator.
The waiver of rights should be clearly displayed and explained. Ideally, the information contained
should be both human and machine readable. For humans, the explanation of the waiver of rights
and its implications should be clear and consistent. Where restrictions have been imposed, the
implications of these should be made clear to the potential user. Where the waiver of rights imposes
no conditions this should be equally transparent and should not be taken for granted.
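One possible way to publish the same rights information in both human- and machine-readable form is sketched below. It assumes a schema.org `Dataset` description with its `license` property as the machine-readable form; the function name, the dataset name and the wording of the notice are hypothetical, and other metadata schemas could serve equally well.

```python
import json

# URL of the Creative Commons CC0 1.0 public domain dedication.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def rights_statement(dataset_name, licence_label, licence_url, notice):
    """Build matching machine-readable and human-readable rights statements."""
    machine = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": dataset_name,
        "license": licence_url,
    }
    human = f"{dataset_name} is released under {licence_label} ({licence_url}). {notice}"
    return {
        "machine_readable": json.dumps(machine, indent=2),
        "human_readable": human,
    }


statement = rights_statement(
    dataset_name="Example survey data",  # hypothetical dataset
    licence_label="CC0 1.0",
    licence_url=CC0_URL,
    notice="No conditions are imposed on reuse.",  # illustrative wording only
)
print(statement["human_readable"])
print(statement["machine_readable"])
```

Generating both forms from the same inputs keeps the label shown to people consistent with the statement read by software.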
Legal interoperability is best achieved if the dataset is assigned a waiver of rights
Some data by definition may already be in the public domain. Where this is the case, the data should
be clearly labelled as such.
Conditions under which datasets may be dedicated to the public domain: the ‘Anglo-Saxon’ definition
of copyright and the European sui generis database right mean that in most cases the
creator can assign a public domain rights waiver to a database which has required reasonable effort
or originality in its making.
Where the data has been created as a result of public funding, the data should be made available
and easily accessible and should be dedicated to the public domain using an appropriate waiver of
rights.
Examples of rights waivers and unrestrictive licences and how they may be applied.
The GEO Data Sharing Working Group’s White Paper: Mechanisms to Share Data as Part of GEOSS
Data-CORE recommends the following ‘voluntary waivers or standard common-use
licenses’, which meet the objectives of legal interoperability. [White Paper: Mechanisms to Share
Data as Part of GEOSS Data-CORE, https://www.earthobservations.org/documents/dswg/Annex%20VI%20%20%20Mechanisms%20to%20share%20data%20as%20part%20of%20GEOSS%20Data_CORE.pdf]
a. Creative Commons Public Domain Mark.
b. Statutory waiver of copyright.
c. Creative Commons Public Domain Waiver (CC0).
d. Open Data Commons Public Domain Dedication and Licence (PDDL).
e. Creative Commons Attribution Licence (CC BY 4.0).
For example, using CC0 the rights owner can waive all copyrights and related or neighbouring rights
over the work, covering database rights and rights protecting the extraction, dissemination and reuse
of data.
A clear legal and administrative framework for public data should be developed
The GEO Data Sharing Working Group has recommended ‘legislative, regulatory or administrative
and other government measures placing all data and information produced by government entities
in the public domain’. In the meantime, it rests with data creators and curators to apply public
domain waivers of rights. There is a need, then, to promote this practice through education, training
and by clarifying the normative language and understanding around the waiving of rights to research
data.
Basic prerequisites for open access and unrestricted reuse
It should be observed that in order to be ‘open’ and ‘unrestricted’, access and reuse must be
facilitated. This requires online visibility, facilitated by good practices in data publication and citation,
as well as data preservation and curation standards (encoding, formats, protocols, PIDs, metadata
schemas, etc.). Some of these aspects will be taken up in other parts of the Guidelines.