Data Access Requirements Guiding Principles and Definitions

This document articulates the minimum expectations for access to datasets funded in whole or in part by the Bill & Melinda Gates Foundation.
Your Program Officer has determined that a Data Access
Module must be completed as part of developing this
proposed investment.
This document articulates the foundation’s minimum
expectations with respect to data access. For further guidance
please speak with your foundation Program Officer.
Background and Applicability
1. Information generated during the course of activities
funded by the foundation – in the form of research
studies, datasets and evaluation results – can be
significant public goods. Data of value to the foundation
and to our partners can and should be shared to make
better, faster and more well-informed decisions and to
advance fields of technical endeavor.
2. Accelerating the translation of knowledge into products,
delivery models and policies can save and improve lives.
The completion of the Data Access Module will result in a
data access plan that adheres to our principles, promotes
accessibility of data generated by foundation-sponsored
investments and demonstrates our commitment to
organizational transparency.
1. The requirement to produce a Data Access Module
(resulting in a Data Access Plan) may apply to all new and
renewing investments, as well as investments that
receive supplemental funding. For completed or existing
investments, grantees and vendors are encouraged but
not required to provide access to relevant datasets in a
manner consistent with the foundation’s principles.
2. The requirement to provide data access applies to data
generated from activities sponsored in whole or in part
by the foundation, where “data” includes information
stored in electronic form resulting from experimental or
clinical measurements; observations obtained via
surveys; interviews; questionnaires; modeling or
simulation; or abstraction of documents. Quantitative and
qualitative information stored in datasets and
accompanying metadata, including codebooks, data
dictionaries, and questionnaires (see below for key
definitions related to data access) are the focus of our
commitment to increased access to data that we fund.
3. The requirement to provide for data access applies to
data generated at all phases of the value chain, i.e. from
discovery and solution development, to pilot or proof of
deliverability testing and scaled implementation, to
policy and advocacy, and evaluation.
4. The following table lists some, but not all, of the data
types generated by our investments and indicates the
types of data to which the Data Access Module may apply.
Data Type | Requirements Apply
Datasets generated by focused research studies, and clinical or community trials |
Data from surveillance systems or surveys |
Datasets generated by modeling or simulation |
Datasets generated by evaluation studies |
Financial and Information datasets | No, unless data are of clear scientific, evaluative, or policy relevance as determined in discussion with a foundation program officer
Physical material such as tissue samples, blood-spots, or assays |
© 2013 Bill & Melinda Gates Foundation / For Internal Use Only
Data Access Requirements, Guiding
Principles and Definitions v1.1
1. In order to complete the Data Access Module, you will
need to provide information about what data and
datasets will be generated; what will be made available
and accessible; how access will be ensured; the
technological means of ensuring accessibility (see the
Data Access Technical Guidance Note); the costs of
making data available; and a timeframe for data release.
2. We expect the chosen solution for making datasets
available will be implemented as soon as possible
following finalization of the identified datasets, and in
accordance with the timeline agreed with your Program Officer.
3. The data access plan that results from completing the
module should also align with the Guiding Principles
articulated below.
4. Data should be made accessible for a period of at least
five years.
5. Satisfactory implementation of data access plans may be
taken into consideration for future funding requests and
Guiding Principles
The following principles underpin our approach to data access:
Respect: Respect must be given to matters of identity,
privacy, and confidentiality as they pertain to the
individuals and communities from or about whom data
are collected. Respect must also be given to matters of
attribution as they pertain to researchers, evaluators, and
their collaborators.
Accountability: All processes and procedures for data
access will be transparent, clear, and consistent with data
management standards that ensure quality data,
appropriate security, and equitable access.
Stewardship: All who produce, share, and use data are
stewards of those data. They share responsibility for
ensuring that data are collected, accessed, and used in
appropriate ways, consistent with applicable laws,
regulations, and international standards of ethical
research conduct.
Cost-effectiveness: We recognize that making data
available can be costly, and therefore not all data
generated in the course of a foundation-funded activity
needs to be made publicly available. There are also
multiple options for providing access. The foundation
Program Officer therefore has discretion in deciding what
datasets should be shared and made accessible, and the
most cost-appropriate means of making them available.
Proportionality: The needs of investigators must be
balanced against those of communities and sponsors that
expect benefits to arise from the activities to which they
contribute information or resources.
Innovation: Data access encourages diversity of analysis
and opinion; facilitates the evaluation of alternative
hypotheses; permits meta-analyses; and allows synthesis
of data from individual projects into a larger whole.
Efficiency: Providing widespread access to datasets
prevents unnecessary duplication of effort, enabling the
redirection of scarce resources to the most promising
new research endeavors, and maximizing the potential
impact of investments.
Capacity Strengthening: Data access can expedite
professional development among up-and-coming
researchers and evaluators, particularly in the global
Collaboration: Ensuring access to data among institutions
and across disciplines can also result in greater
productivity and creativity.
Data Access: Key Definitions
Data Access/Accessibility
The procedures by which any individual or organization can
freely acquire and use datasets collected or generated by
foundation grantees or vendors with funding provided by the foundation.
Data access generally involves activities such as cleaning,
storage and retrieval of data. A grantee or vendor has
provided data access when a party who is not a member of the
funded study team is able to undertake new analysis and
generate new knowledge using study data accessible through
implementing these data access principles. An example of
satisfying data access is depositing data in a public access data
Data
Factual information, especially information organized for
analysis or used to make decisions or produce research
outputs such as publications or working papers.
In this context, ‘data’ includes experimental measurements;
clinical measurements; or observations obtained via surveys;
interviews; questionnaires; modeling or simulation, and
abstraction of documents. For the purposes of a Data Access
Plan, data do not include laboratory notebooks, partial
datasets, preliminary analyses, communication with
colleagues, drafts of scientific papers, unpublished research
protocols, or physical objects such as tissue samples or
Dataset
An electronically stored collection of data and associated files.
The data contained in a dataset may be from primary data
collection (e.g. a survey) or secondary data generation via
aggregation or synthesis. Datasets may contain one or more
files and should include files that contain the data themselves;
that document and explain the individual variables; and that
explain the collection or synthesis methodology. Some of
the information describing the data may be contained in
‘metadata’ stored with the dataset (see below).
Data Repository or Enclave
An online storage solution for datasets that meets the following
set of criteria and satisfies the Data Access Principles:
- Data must be accessible for a minimum of 5 years
- Data should be easily discoverable through conventional
  search mechanisms by an informed lay person (e.g.
  researchers and graduate students in the field)
- Metadata on the dataset should be made available (see
  Metadata definition below)
- Data must be anonymized to protect individuals' personally
  identifiable information (PII)
- Open data platforms should honor any special ownership
  and access preferences as agreed between the foundation
  and the data producer; data access may be limited to a
  specific audience or granted on a case-by-case basis
Metadata
Information stored electronically with or as part of the dataset
that should be provided along with the data whenever they are
downloaded, accessed, or shared.
This may include items such as:
- Year of data production
- Data Dictionary
- Known Data Quality profile/issues
- Data completeness
- Other salient features of the data and dataset
- Methodology used to collect/compile/create data
Discoverable (or Findable)
Datasets are discoverable when reference links to the datasets
are included in online directories (e.g. from repositories); a
reference link to the dataset is provided in any publications or
reports, or on the project/institution website; and/or the dataset
is returned when running a standard internet search.
A common internet search engine should return a clear
description of the data and a working link to the dataset or
the repository where the data are housed.
Research Outputs
Reports, publications, scientific presentations, policy briefs, and
working papers that present summary statistics, analysis and
conclusions derived from primary or secondary data.
Research outputs are distinct from datasets. Reporting on or
sharing research results and outputs (e.g. summary statistics
or tables) fulfills some of the objectives of the foundation’s
global access principles, but does not satisfy the requirement
of data access.
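As an illustration only, a grantee preparing a dataset for deposit might run a simple pre-deposit check against the metadata items and the five-year minimum described in the definitions above. The sketch below is not part of the foundation's requirements: the field names, the function names, and the choice to treat every listed metadata item as expected are all assumptions made for the example (the definition itself says only that metadata "may include" these items).

```python
from datetime import date

# Hypothetical field names mirroring the metadata items listed in the
# Metadata definition; the source document does not prescribe a schema.
SUGGESTED_METADATA_FIELDS = (
    "year_of_data_production",
    "data_dictionary",
    "known_data_quality_issues",
    "data_completeness",
    "other_salient_features",
    "methodology",
)

# From the repository criteria: accessible for a minimum of 5 years.
MINIMUM_ACCESS_YEARS = 5


def missing_metadata(metadata):
    """Return the suggested fields that are absent or empty in a record."""
    return [name for name in SUGGESTED_METADATA_FIELDS if not metadata.get(name)]


def meets_minimum_access_period(released, available_until):
    """True if the dataset stays accessible at least five years after release."""
    target_year = released.year + MINIMUM_ACCESS_YEARS
    try:
        earliest_end = released.replace(year=target_year)
    except ValueError:
        # Release date is 29 February and the target year is not a leap year.
        earliest_end = released.replace(year=target_year, day=28)
    return available_until >= earliest_end


# Example record with three of the six suggested items filled in.
record = {
    "year_of_data_production": 2013,
    "data_dictionary": "codebook.pdf",
    "methodology": "household survey",
}
print(missing_metadata(record))
print(meets_minimum_access_period(date(2013, 1, 1), date(2018, 1, 1)))
```

A check like this covers only the mechanical parts of the criteria. Anonymization, discoverability, and any access restrictions agreed with the foundation still need to be reviewed separately.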