Data Access Technical Guidance Note

This document provides guidance on technical solutions for data access to assist grantees and vendors when completing the Data Access Module.
Background and Criteria
The foundation has a strong commitment to maximizing the
public value of data produced with our funding, and to
practicing organizational transparency. Therefore, we seek to
ensure that data from foundation-sponsored investments are
made as widely available as possible. Grantees and vendors
who are required to complete the Data Access Module need to
specify the technological means of ensuring data accessibility,
and comply with the following criteria:
- Datasets must be publicly accessible for a minimum of 5
  years [1]
- Data should be easily discoverable through conventional
  search mechanisms by an informed lay person (e.g.
  researchers and graduate students in the field)
- Metadata on the dataset should be made available
- Data must be anonymized to protect individuals' personally
  identifiable information
- The open data platform should honor any special ownership
  and access preferences as agreed between the foundation
  and the data producer.
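As an illustration of the metadata and availability criteria above, the following is a minimal sketch in Python of a metadata record that could accompany a deposited dataset. The field names and the build_metadata helper are assumptions made for this example only; they are not defined by the foundation, and each repository specifies its own metadata schema. The only value taken from this note is the 5-year minimum availability period.

```python
import json
from datetime import date

MIN_AVAILABILITY_YEARS = 5  # minimum public availability, from the criteria above


def build_metadata(title, description, keywords, published_on):
    """Return a metadata record for one dataset (hypothetical field names)."""
    target_year = published_on.year + MIN_AVAILABILITY_YEARS
    try:
        available_until = published_on.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        available_until = published_on.replace(year=target_year, day=28)
    return {
        "title": title,
        "description": description,
        "keywords": sorted(keywords),  # search terms that aid discovery
        "published_on": published_on.isoformat(),
        "available_at_least_until": available_until.isoformat(),
    }


record = build_metadata(
    title="Example household survey",  # illustrative values only
    description="Anonymized survey responses.",
    keywords=["survey", "household"],
    published_on=date(2013, 6, 1),
)
print(json.dumps(record, indent=2))
```

A record of this kind can be published alongside the dataset so that search mechanisms can index the title, description, and keywords.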
If you are unsure how to prepare datasets, are sharing datasets
for which standards do not exist, or have questions about open
data repositories and would like further guidance on how to
select one, please speak with your Program Officer. They will
work with our Business Intelligence and IT departments to
advise you on a solution that complies with the Data Access
Requirements, Guiding Principles and Definitions and Data
Access Technical Guidance Note.
The preferred means of providing public access to data is to
deposit them in a publicly available data repository, archive,
or other well-established open-data platform (described
below). Other methods are acceptable, such as data enclaves,
direct sharing by investigators or their institutions and mixed
mode sharing (described below). The most cost-effective
method for providing data access is likely to depend on
several factors, including but not limited to the volume,
sensitivity, and complexity of the dataset, and the volume of
access requests anticipated.
Use of an IT platform facilitates wide availability and may
offer the benefits of a professional curator and maintenance
services. Existing public access repositories often provide
guidance on the preparation of data for sharing and public
access and sometimes provide technical assistance to meet
the data standards to which deposited data must conform.
Acceptable IT Options for Data Access
Data Archives
Data archives can be particularly attractive for those
making a large volume of data available to a broad set of
users. Archives frequently require applications for data use
and have well-developed data use agreements.
Many data archives charge fees for hosting data. Pricing varies
and generally depends on the size of the dataset to be made
accessible. Many data archive providers also supply
professional services to assist grantees and vendors in
preparing the data for publishing. This may include
identifying appropriate metadata and structuring the dataset
to improve search and data discovery.
Some data archives include data enclaves, which provide a
controlled, secure environment, managed by the grantee or
vendor, in which researchers can access the data and even
perform some analyses. This can be particularly useful when
participant confidentiality concerns, third-party licensing or
data use agreements prohibit wider sharing or distribution.
Alternative Solutions
Less frequently, or in cases where there is very limited IT
infrastructure or connectivity, the foundation may agree to
grantees and vendors providing access under their own
auspices. In such cases, additional dissemination methods are
recommended to ensure widespread knowledge of data
availability. This may entail simply posting information about
data availability on the institutional website and/or in
published work, and mailing a CD containing the datasets to
requesters. Alternatively, grantees and vendors may post the
data on their institutional or personal website and include
the URL in published work.
Mixed Mode Data Sharing
Partners may also wish to develop a “mixed mode” for data
sharing that allows for more than one version of the dataset
and provides variable levels of access depending on the
version. For example, datasets with sensitive information may
be partially censored or redacted for general use, but stricter
controls through a data enclave would be applied if access to
more sensitive data were required. Some data archives offer
mixed mode sharing options.
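To make the idea of two dataset versions concrete, the following is a minimal sketch in Python, assuming the records are available as a list of dictionaries. The field names in SENSITIVE_FIELDS and the split_versions helper are hypothetical and chosen only for illustration. Removing fields in this way is a first step and does not by itself guarantee that the general-use version is anonymized.

```python
# Hypothetical list of fields treated as sensitive for this example.
SENSITIVE_FIELDS = frozenset({"name", "date_of_birth", "village"})


def split_versions(rows, sensitive_fields=SENSITIVE_FIELDS):
    """Return (general_use, restricted) versions of the same records."""
    # General-use version: sensitive fields are removed from every record.
    general = [
        {key: value for key, value in row.items() if key not in sensitive_fields}
        for row in rows
    ]
    # Restricted version: full records, intended for access through a
    # data enclave or a similar controlled environment.
    restricted = [dict(row) for row in rows]
    return general, restricted


rows = [
    {"name": "A. Example", "village": "Northfield",
     "age_group": "20-29", "outcome": "yes"},
]
general, restricted = split_versions(rows)
print(general)
```

The general-use version could be deposited on an open platform, while the restricted version would be made available only under the stricter controls described above.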
[1] See Data Access Requirements, Guiding Principles and Definitions for
definitions of data, datasets, metadata, and other relevant terms.
© 2013 Bill & Melinda Gates Foundation / For Internal Use Only
Data Access Technical Guidance Note v1.1
Recommended IT Solutions
There are a number of open data platforms available. The following vendor platforms have been previously used or are
recommended by the foundation and in most cases will enable the data publisher to meet the minimum criteria specified in the
Data Access Requirements, Guiding Principles and Definitions.
Data Access Platforms
Socrata
  Subject Area Focus: No
  Platform Access: Both
  Branding: Both
  Pricing Model (U.S.$): $250 to $18,500/year*
  Professional Services Availability: Yes
  URL: http://www.socrata.com/
  Contact: mailto://gatesfoundation@socrata.com
  Comments/Notes: Socrata offers the ability to collaborate in
  data creation and management. It also offers multiple
  channels for data access including automated interfaces and
  visualizations.
Junar
  Subject Area Focus: No
  Platform Access: Both
  Branding: Both
  Pricing Model (U.S.$): Starting at $3,000/year
  Professional Services Availability: Yes
  URL: http://www.junar.com
  Comments/Notes: Junar offers the ability to create charts and
  dashboards from the data. It also provides some social
  features like the ability to "Like" datasets and charts.
Dataverse
  Subject Area Focus: Social Science research data
  Platform Access: Public
  Branding: Link
  Pricing Model (U.S.$): $0
  Professional Services Availability: No defined service. Will
  provide some assistance / guidance if data is of particular
  interest to the curators of the site (Harvard University)
  URL: http://thedata.org/home
  Comments/Notes: The Dataverse Network is an open source
  application to publish, share, reference, extract and analyze
  research data. It standardizes the citation of data sets from
  published work. For more information see
  http://thedata.org/citation
TABLE NOTES:
Subject Area Focus: Whether the platform hosts data related to
specific topics or subject areas.
Platform Access: Describes how and to whom data access is
granted.
- PUBLIC: Open to all without intervention by the data
  publisher
- LIMITED: Data publisher can selectively grant permission
  to a limited audience. The data publisher will be required
  to administer access and identify the audience by email
  address, which may incur additional fees
- BOTH: Both PUBLIC and LIMITED access models are
  available.
Branding: Platforms may allow the data publisher to publish
the data under the branding of their affiliated organization.
The primary mechanisms are:
- SUB-SITE: Platforms allow data publishers to create a
  "sub-site" with customized branding on their platform.
  This allows the data publisher to customize the page
  through which data is accessed with logos, links and
  contact information. Please note that additional costs may
  be incurred.
- LINK: A URL to the dataset can be provided. The URL can
  be published on the data publisher's site.
- BOTH: Both branding models are supported
Pricing Model: All pricing information is based on PUBLIC
platform access unless otherwise specified. In almost all cases,
pricing may vary based on:
- the number of data publishers, i.e., the number of people who
  require access to create and/or manage the dataset
- the size of the dataset being published
- *optional services that may be requested (e.g. dedicated
  site hosting multiple datasets, advanced security model)
Professional Services Availability: All platforms are designed
to allow the data publisher to self-serve the publication of the
data. Almost all vendors will provide some upfront assistance /
guidance to orient the data publisher to the platform features
and functions. In addition, some vendors may provide
consulting services to advise the data publisher on structuring
the data and/or metadata. Furthermore, if data publishers choose
to create restricted access and/or customized 'sub-sites', they
may require assistance from the platform vendor.
If you have further questions, please contact Shami Reddy in
Business Intelligence.