1. Access and Reuse

Facilitate access to and reuse of research data. In order to enable the maximum degree of interoperability, access to and reuse of research data should be either open and unrestricted by default or otherwise be granted to users with the fewest limitations possible. Placing data in the public domain ensures that there are no restrictions.

Implementation Guidelines

Definition of terms

Interoperability of data: Interoperability may be defined as the ‘property of a product or system … to work with other products or systems, present or future, without any restricted access or implementation’ [http://interoperability-definition.info/en]. Interoperability is an attribute that greatly facilitates the usability of research data. For example, semantic interoperability depends on shared and unambiguous properties to which data refer, allowing comparison or integration at scale. Similarly, legal interoperability facilitates the reuse and recombination of research data by waiving proprietary rights and providing clarity about any restrictions.

It is widely recognized that the value of data lies in reuse; hence government and funder policies require that data created by publicly funded research be made available for reuse in the public good. The ability to reuse data is impaired when the legal conditions under which the data may be reused are unclear, and when restrictions are placed on the reuse of datasets. In most circumstances, legal restrictions on reuse run counter to the obligation to make research data publicly available.

Restrictions can inhibit reuse to a greater extent than is sometimes realized. This can be illustrated by analogy with the idea of a ‘lowest common denominator’. In the arithmetic of fractions, fractions can only be added or subtracted once they are all expressed over a shared denominator, conventionally the ‘lowest common denominator’.
Similarly, when considering legal restrictions on the reuse of datasets, the ‘lowest common denominator’ means that for a derivative dataset that combines parts of two or more other datasets, the most restrictive terms and conditions of the underlying datasets will be transferred to the entire derivative dataset. In this way, legal restrictions, perhaps unnecessarily imposed, can have broader, unwanted effects, limiting the reuse of derived datasets in which most of the components may be subject to unrestrictive licences or rights waivers. The full definition of legal interoperability is provided above. It occurs most readily when data are clearly labelled with a waiver of rights or a public domain dedication that communicates that the data may be reused without restriction.

Open Access: Definitions of Open Access originated in debates to promote the wider availability of scientific literature. Open Access to scholarly literature is defined by the Budapest (Feb 2002), Bethesda (June 2003) and Berlin (Oct 2003) statements or declarations. The following text is shared by the Bethesda and Berlin definitions: ‘The author(s) and copyright holder(s) grant(s) to all users a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship’.
[http://dash.harvard.edu/bitstream/handle/1/4725199/suber_bethesda.htm?sequence=1] The Budapest Initiative is more specific, stating that the permitted types of reuse include computer-assisted processing and analysis at scale:

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. [http://www.budapestopenaccessinitiative.org/]

Open Access to research data derives many principles from the Open Access movement and from such definitions. A key statement is found in the OECD Principles and Guidelines for Access to Research Data from Public Funding, which offers the following definition of openness and Open Access to research data:

Openness means access on equal terms for the international research community at the lowest possible cost, preferably at no more than the marginal cost of dissemination. Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based. [OECD Principles and Guidelines for Access to Research Data from Public Funding (2007), http://www.oecd.org/sti/sci-tech/38500813.pdf]

The OECD Principles and Guidelines have had widespread influence and are cited in many research funder policy documents. In recent years, there has been an effort to clarify that, for publicly funded research data, it is not enough simply to make the data available: usability must also be facilitated.
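The ‘lowest common denominator’ effect described under Definition of terms can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each dataset's reuse conditions can be reduced to a set of flags, and it gives a derivative dataset the union of its sources' conditions. The condition names, the dataset names and the `derive` helper are hypothetical. Real licences are legal texts, and some conditions (share-alike terms, for instance) can make licences incompatible outright rather than simply additive.

```python
from dataclasses import dataclass

# Hypothetical condition flags for this sketch only; real licences are
# legal texts, and mapping them to flags would need legal review.
ATTRIBUTION = "attribution"
NON_COMMERCIAL = "non_commercial"


@dataclass(frozen=True)
class Dataset:
    name: str
    conditions: frozenset  # reuse conditions attached to this dataset


def derive(name: str, *sources: Dataset) -> Dataset:
    """Build a derivative from parts of several source datasets.

    Every condition attached to any source carries over, so the derivative
    is bound by the union of all source conditions: the most restrictive
    terms end up governing the whole result.
    """
    combined = frozenset().union(*(src.conditions for src in sources))
    return Dataset(name, combined)


# Three hypothetical sources, from unrestricted to most restrictive.
climate = Dataset("climate", frozenset())                             # e.g. CC0
soils = Dataset("soils", frozenset({ATTRIBUTION}))                    # e.g. CC BY
survey = Dataset("survey", frozenset({ATTRIBUTION, NON_COMMERCIAL}))  # e.g. CC BY-NC

model_input = derive("model_input", climate, soils, survey)
print(sorted(model_input.conditions))  # ['attribution', 'non_commercial']
```

Although two of the three sources carry at most an attribution requirement, the single non-commercial component restricts the entire derivative dataset.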
Influenced by the definition of ‘intelligent openness’ advanced in the Royal Society’s Science as an Open Enterprise report, the G8 Science Ministers Statement declared that:

Open scientific research data should be easily discoverable, accessible, assessable, intelligible, useable, and wherever possible interoperable to specific quality standards. [G8 Science Ministers Statement, 13 June 2013 https://www.gov.uk/government/news/g8science-ministers-statement]

In turn, the G8 definition is used in the documents presenting and supporting the European Commission’s Guidelines on Open Access to Scientific Publications and Research Data [http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oapilot-guide_en.pdf, see also Guidelines on Data Management in Horizon 2020, p.6; http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020hi-oa-data-mgt_en.pdf]. Each of these definitions shows that Open Access means unrestricted access to and use of scientific information and data. Open Access exists to facilitate reuse, and legal interoperability is an important component of this process.

Unrestricted Reuse: There are widely acknowledged and necessary restrictions which may, and sometimes by definition must, be placed on the reuse of research data. These include the need to protect personal privacy, issues of national security and, in some instances, commercially confidential information. All statements and policies on Open Access to research data acknowledge these limitations. Unrestricted reuse therefore means the absence of any restrictions over and above these and similar principles.
For example, the Budapest Open Access Initiative statement observes: ‘The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.’ [Ibid] An analogous principle holds for the assertion of intellectual property rights (IPR) over data produced by publicly funded research. The OECD Principles and Guidelines argue that where data produced by publicly funded research are protected by intellectual property rights, ‘the holders of these rights should nevertheless facilitate access to such data particularly for public research or other public-interest purposes.’ [p.17]

Rights Waivers and Licences: The principle of legal interoperability maintains that data should be clearly labelled with the conditions of reuse. When data are already in the public domain, this should be clearly stated. Otherwise, a waiver of rights or a non-restrictive licence should be used.

Specific Creative Commons licences were established to formally allow reuse, subject to certain conditions. CC-NC (non-commercial) allows only non-commercial uses, defined as uses that are not primarily intended for commercial advantage or monetary compensation. CC-SA (share alike) compels any user to license a derivative product under the same terms as the original dataset. Each of these licences imposes conditions which may create incompatibilities among licences and licensing difficulties in derivative datasets; it is therefore recommended not to use licences that impose these restrictions. CC-BY (attribution) allows unrestricted use of data so long as the creator is attributed. As attribution is a fundamental principle of scholarly discourse, it is commonly deemed acceptable to impose a legal requirement for attribution in this way.
However, it can be argued that attribution is already covered by the conventions and norms of scholarly discourse and by the author's or creator's moral rights, so that such a licence is unnecessary. [See discussion under Principle 6: Attribution and Credit] Ideally, then, to achieve the objectives of legal interoperability, data should be placed in the public domain with no restrictions. This can be done, for example, by assigning a CC0 waiver of rights or another form of dedication to the public domain.

Public Domain: Works in the public domain are those whose intellectual property rights have expired, are inapplicable (when, for whatever reason, the data are deemed not to be subject to IPR), or have been explicitly waived, with the work dedicated to the public domain through an appropriate waiver or licence. The Implementation Guidelines for the Principles on the Legal Interoperability of Research Data recommend that legal interoperability is best achieved when datasets are explicitly and clearly dedicated to the public domain.

Access and Reuse: Guidelines for Implementation

On the basis of the foregoing definitions and discussion, we provide the following guidelines for implementation to promote access and reuse.

The conditions of reuse should be clearly stated, ideally with an appropriate rights waiver

The data should be clearly labelled with the conditions for reuse. Ideally, this should be by means of one of the following:
a. A clear label or statement that the data are already in the public domain;
b. A waiver of rights that dedicates the data to the public domain;
c. An open licence that imposes no greater restriction than the requirement to acknowledge the data creator.

The waiver of rights should be clearly displayed and explained. Ideally, the information it contains should be both human- and machine-readable. For humans, the explanation of the waiver of rights and its implications should be clear and consistent.
Where restrictions have been imposed, their implications should be made clear to the potential user. Where the waiver of rights imposes no conditions, this should be equally transparent and should not be taken for granted.

Legal interoperability is best achieved if the dataset is assigned a waiver of rights

Some data may by definition already be in the public domain. Where this is the case, the data should be clearly labelled as such. Conditions under which datasets may be dedicated to the public domain: the ‘Anglo-Saxon’ definition of copyright and the European sui generis database right mean that, in most cases, the creator can assign a public domain rights waiver to a database which has required reasonable effort or originality in its making. Where the data have been created as a result of public funding, they should be made available and easily accessible, and should be dedicated to the public domain using an appropriate waiver of rights.

Examples of rights waivers and unrestrictive licences and how they may be applied

The GEO Data Sharing Working Group’s White Paper: Mechanisms to Share Data as Part of GEOSS Data-CORE recommends the following ‘voluntary waivers or standard common-use licenses’, which meet the objectives of legal interoperability. [White Paper: Mechanisms to Share Data as Part of GEOSS DataCORE, https://www.earthobservations.org/documents/dswg/Annex%20VI%20%20%20Mechanisms%20to%20share%20data%20as%20part%20of%20GEOSS%20Data_CORE.pdf]
a. Creative Commons Public Domain Mark.
b. Statutory waiver of copyright.
c. Creative Commons Public Domain Waiver (CC0).
d. Open Data Commons Public Domain Dedication and Licence (PDDL).
e. Creative Commons Attribution Licence (CC BY 4.0).

For example, using CC0, the rights owner can waive all copyright and related or neighbouring rights over the work, including database rights and rights protecting the extraction, dissemination and reuse of data.
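The guidelines ask for rights statements that are both human- and machine-readable. One possible way to do this, assumed here rather than prescribed by the guidelines, is sketched in Python below: it emits a schema.org `Dataset` record in JSON-LD whose `license` property carries a licence URI for software alongside a name and plain-language explanation for people. The short keys, the `rights_label` helper and the example title are hypothetical. Only options a, c, d and e of the GEO list are covered, since a statutory waiver has no single URI.

```python
import json

# Hypothetical lookup for options a, c, d and e of the GEO list;
# a statutory waiver (b) has no single URI and is left out.
RECOMMENDED = {
    "PDM": ("Creative Commons Public Domain Mark 1.0",
            "https://creativecommons.org/publicdomain/mark/1.0/"),
    "CC0": ("CC0 1.0 Universal",
            "https://creativecommons.org/publicdomain/zero/1.0/"),
    "PDDL": ("Open Data Commons Public Domain Dedication and License (PDDL) v1.0",
             "https://opendatacommons.org/licenses/pddl/1-0/"),
    "CC-BY-4.0": ("Creative Commons Attribution 4.0 International",
                  "https://creativecommons.org/licenses/by/4.0/"),
}


def rights_label(title: str, key: str, explanation: str) -> str:
    """Return a schema.org Dataset record (JSON-LD) whose `license` pairs a
    machine-readable URI with a human-readable name and explanation.

    Raises ValueError for anything outside the recommended set, such as
    licences that add non-commercial or share-alike conditions.
    """
    if key not in RECOMMENDED:
        raise ValueError(f"{key!r} is not one of the recommended waivers or licences")
    name, uri = RECOMMENDED[key]
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "license": {
            "@type": "CreativeWork",
            "name": name,                # human-readable licence name
            "url": uri,                  # machine-readable identifier
            "description": explanation,  # plain-language reuse terms
        },
    }
    return json.dumps(record, indent=2)


print(rights_label(
    "Example soil moisture series",  # hypothetical dataset title
    "CC0",
    "Dedicated to the public domain: may be reused for any purpose without restriction.",
))
```

Rejecting keys outside the recommended set is a design choice of this sketch: it keeps a dataset from being labelled, even by accident, with a licence whose extra conditions would carry over to every derivative work.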
A clear legal and administrative framework for public data should be developed

The GEO Data Sharing Working Group has recommended ‘legislative, regulatory or administrative and other government measures placing all data and information produced by government entities in the public domain’. In the meantime, it rests with data creators and curators to apply public domain waivers of rights. There is therefore a need to promote this practice through education and training, and by clarifying the normative language and understanding around the waiving of rights to research data.

Basic prerequisites for open access and unrestricted reuse

It should be observed that in order to be ‘open’ and ‘unrestricted’, access and reuse must be actively facilitated. This requires online visibility, supported by good practice in data publication and citation, as well as by data preservation and curation standards (encoding, formats, protocols, persistent identifiers (PIDs), metadata schemas, etc.). Some of these aspects will be taken up in other parts of the Guidelines.