Data Access Technical Guidance Note

This document provides guidance on technical solutions for data access to assist grantees and vendors in completing the Data Access Module.

Background and Criteria

The foundation is strongly committed to maximizing the public value of data produced with our funding and to practicing organizational transparency. We therefore seek to ensure that data from foundation-sponsored investments are made as widely available as possible.

Grantees and vendors who are required to complete the Data Access Module must specify the technological means by which data will be made accessible, and must comply with the following criteria:

- Datasets must be publicly accessible for a minimum of 5 years.¹
- Data should be easily discoverable through conventional search mechanisms by an informed lay person (e.g. researchers and graduate students in the field).
- Metadata describing the dataset should be made available.
- Data must be anonymized to protect personally identifiable information.
- The open data platform should honor any special ownership and access preferences agreed between the foundation and the data producer.

Please speak with your Program Officer if any of the following apply:

- You are unsure how to prepare datasets.
- You are sharing datasets for which no standards exist.
- You have questions about open data repositories and would like guidance on selecting one.

Your Program Officer will work with our Business Intelligence and IT departments to advise you on a solution. That solution will comply with the Data Access Requirements, Guiding Principles and Definitions, and with this Data Access Technical Guidance Note.

The preferred means of providing public access to data is to deposit them in one of the following (described below):

- a publicly available data repository,
- an archive, or
- another well-established open data platform.

Other methods are also acceptable (described below):

- data enclaves,
- direct sharing by investigators or their institutions, and
- mixed mode sharing.
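The discoverability and metadata criteria above are often met by embedding a machine-readable description on the dataset's landing page, so that general-purpose search engines can index it. Below is a minimal sketch in Python. It assumes the schema.org `Dataset` vocabulary, serialized as JSON-LD. This is one common convention, not a foundation requirement. The function name `build_dataset_jsonld` and all example values are hypothetical.

```python
import json


def build_dataset_jsonld(name, description, landing_url, download_url,
                         creator, license_url, keywords):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper: all values come from the caller, and nothing
    here is specific to any particular repository or platform.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": landing_url,
        "creator": {"@type": "Organization", "name": creator},
        "license": license_url,
        "keywords": keywords,
        "isAccessibleForFree": True,
        # A DataDownload entry tells crawlers where the actual file lives.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": download_url,
        }],
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    print(build_dataset_jsonld(
        name="Household Survey 2012 (anonymized)",
        description="Anonymized household-level survey responses.",
        landing_url="https://example.org/data/household-survey-2012",
        download_url="https://example.org/data/household-survey-2012.csv",
        creator="Example Grantee Institute",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["household survey", "agriculture"],
    ))
```

The resulting JSON would typically be placed inside a `<script type="application/ld+json">` element on the dataset's landing page.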
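A common first step toward the anonymization criterion is to remove direct identifiers and replace any linking key with a pseudonym. Below is a minimal sketch in Python. It assumes a tabular dataset held as a list of dictionaries. The column names and the keyed-hash (HMAC-SHA256) approach are illustrative assumptions, not a foundation requirement. Pseudonymization alone is not full anonymization: quasi-identifiers such as exact birth dates or small geographic areas may still allow re-identification and need separate review.

```python
import hashlib
import hmac

# Hypothetical column names; adapt to the actual dataset.
DIRECT_IDENTIFIERS = {"name", "phone", "national_id"}
LINK_KEY = "household_id"


def pseudonymize(rows, secret_key):
    """Drop direct identifiers and replace the linking key with a keyed hash.

    secret_key (bytes) stays with the data producer and is never published;
    without it, nobody can recompute the pseudonym for a known original ID.
    The same original ID always maps to the same pseudonym, so records
    can still be linked within the released dataset.
    """
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        if LINK_KEY in out:
            digest = hmac.new(secret_key, str(out[LINK_KEY]).encode("utf-8"),
                              hashlib.sha256).hexdigest()
            out[LINK_KEY] = digest[:16]  # shortened hex pseudonym
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    rows = [
        {"household_id": 1017, "name": "A. Person", "phone": "555-0100",
         "district": "North", "income": 420},
    ]
    print(pseudonymize(rows, secret_key=b"keep-this-secret"))
```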
The most cost-effective method of providing data access is likely to depend on several factors. These include, but are not limited to, the volume, sensitivity, and complexity of the dataset and the anticipated volume of access requests. An IT platform facilitates wide availability and may offer the benefits of professional curation and maintenance services. Existing public access repositories often provide guidance on preparing data for sharing and public access. Some also provide technical assistance in meeting the data standards to which deposited data must conform.

Acceptable IT Options for Data Access

Data Archives

Data archives can be particularly attractive for those publishing a large volume of data to a broad set of users. Archives frequently require applications for data use and have well-developed data use agreements.

Many data archives charge fees for hosting data. Pricing varies and generally depends on the size of the dataset to be made accessible. Many data archive providers also offer professional services to help grantees and vendors prepare data for publication. These may include identifying appropriate metadata and structuring the dataset to improve search and data discovery.

Some data archives include data enclaves. An enclave is a controlled, secure environment, managed by the grantee or vendor, in which researchers can access the data and even perform some analyses. Enclaves can be particularly useful when participant confidentiality concerns, third-party licensing, or data use agreements prohibit wider sharing or distribution.

Alternative Solutions

Less frequently, or where IT infrastructure or connectivity is very limited, the foundation may agree to grantees and vendors providing access under their own auspices. However, the other dissemination methods are recommended, because they help ensure widespread awareness that the data are available.
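In a data enclave, approved researchers can analyze the data, but usually only summary results leave the secure environment. One widely used safeguard for those results is small-cell suppression: counts below a threshold are withheld so that rare combinations of attributes cannot point to individuals. Below is a minimal sketch of that technique. The threshold of 10 and the function name `suppressed_counts` are assumptions for illustration. Real enclaves apply their own disclosure-review rules as set by the relevant data use agreements.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # assumed threshold; set according to the data use agreement


def suppressed_counts(rows, group_by, min_cell=MIN_CELL_SIZE):
    """Count records per group, withholding groups smaller than min_cell.

    Returns a dict mapping each group value to its count, or to None when
    the count is suppressed. Only this summary, not the underlying rows,
    would be released from the enclave.
    """
    counts = Counter(row[group_by] for row in rows)
    return {group: (n if n >= min_cell else None)
            for group, n in counts.items()}


if __name__ == "__main__":
    rows = [{"district": "North"}] * 25 + [{"district": "South"}] * 3
    print(suppressed_counts(rows, "district"))
    # prints {'North': 25, 'South': None}
```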
Providing access under their own auspices may entail simply posting information about data availability on the institutional website and/or in published work, and mailing a CD containing the datasets to requestors. Alternatively, grantees and vendors may post the data on their institutional or personal website and include the URL in published work.

Mixed Mode Data Sharing

Partners may also wish to develop a "mixed mode" for data sharing. This allows for more than one version of the dataset and provides different levels of access depending on the version. For example, datasets with sensitive information may be partially censored or redacted for general use. Stricter controls through a data enclave would then apply if access to the more sensitive data were required. Some data archives offer mixed mode sharing options.

¹ See Data Access Requirements, Guiding Principles and Definitions for definitions of data, datasets, metadata, and other relevant terms.

Page 1 of 2 © 2013 Bill & Melinda Gates Foundation / For Internal Use Only Data Access Technical Guidance Note v1.1

Recommended IT Solutions

There are a number of open data platforms available. The following vendor platforms have been used previously or are recommended by the foundation. In most cases they will enable the data publisher to meet the minimum criteria specified in the Data Access Requirements, Guiding Principles and Definitions.

Data Access Platforms | Socrata | Junar | Dataverse
Subject Area Focus | No | No | Social science research data
Platform Access | Both | Both | Public
Branding | Both | Both | Link
Pricing Model (U.S.$) | $250 to $18,500/year* | Starting at $3,000/year | $0
Professional Services Availability | Yes | Yes | No defined service. Will provide some assistance/guidance if the data are of particular interest to the curators of the site (Harvard University)
URL | http://www.socrata.com/ | http://www.junar.com | http://thedata.org/home
Contact | gatesfoundation@socrata.com | |
Comments/Notes | Socrata offers the ability to collaborate in data creation and management. It also offers multiple channels for data access, including automated interfaces and visualizations. | Junar offers the ability to create charts and dashboards from the data. It also provides some social features, such as the ability to "Like" datasets and charts. | The Dataverse Network is an open source application to publish, share, reference, extract, and analyze research data. It standardizes the citation of datasets from published work. For more information see http://thedata.org/citation

TABLE NOTES

Subject Area Focus: Indicates whether the platform focuses on data related to specific topics or subject areas.

Platform Access: Describes how, and to whom, data access is granted.
- PUBLIC: Open to all without intervention by the data publisher.
- LIMITED: The data publisher can selectively grant permission to a limited audience. The data publisher will be required to administer access and to identify the audience by email address. Additional fees may apply.
- BOTH: Both PUBLIC and LIMITED access models are available.

Branding: Some platforms allow the data publisher to publish the data under the branding of their affiliated organization. The primary mechanisms are:
- SUB-SITE: The data publisher creates a "sub-site" with customized branding on the platform. The page through which data are accessed can then be customized with logos, links, and contact information. Please note that additional costs may be incurred.
- LINK: A URL to the dataset is provided, which the data publisher can publish on their own site.
- BOTH: Both branding models are supported.

Pricing Model: All pricing information is based on PUBLIC platform access unless otherwise specified. In almost all cases, pricing may vary based on:
- the number of data publishers, i.e., the number of people who require access to create and/or manage the dataset;
- the size of the dataset being published; and
- *optional services that may be requested (e.g. a dedicated site hosting multiple datasets, an advanced security model).

Professional Services Availability: All platforms are designed to let the data publisher self-serve the publication of the data. Most vendors will provide some upfront assistance/guidance to orient the data publisher to the platform's features and functions. Some vendors may also provide consulting services to advise the data publisher on structuring the data and/or metadata. Data publishers who choose to create restricted access and/or customized sub-sites may require assistance from the platform vendor.

If you have further questions, please contact Shami Reddy in Business Intelligence.