Data Access Requirements, Guiding Principles and Definitions

This document articulates the minimum expectations for access to datasets funded in whole or in part by the Bill & Melinda Gates Foundation. Your Program Officer has determined that a Data Access Module must be completed as part of developing this proposed investment.

Background and Applicability

Background

1. Information generated during the course of activities funded by the foundation, in the form of research studies, datasets and evaluation results, can be a significant public good. Data of value to the foundation and to our partners can and should be shared to make better, faster and more well-informed decisions and to advance fields of technical endeavor.
2. Accelerating the translation of knowledge into products, delivery models and policies can save and improve lives. The completion of the Data Access Module will result in a data access plan that adheres to our principles, promotes accessibility of data generated by foundation-sponsored investments, and demonstrates our commitment to organizational transparency.

Applicability

1. The requirement to produce a Data Access Module (resulting in a Data Access Plan) may apply to all new and renewing investments, as well as investments that receive supplemental funding. For completed or existing investments, grantees and vendors are encouraged but not required to provide access to relevant datasets in a manner consistent with the foundation’s principles.
2. The requirement to provide data access applies to data generated from activities sponsored in whole or in part by the foundation, where “data” includes information stored in electronic form resulting from experimental or clinical measurements; observations obtained via surveys; interviews; questionnaires; modeling or simulation; or abstraction of documents. Quantitative and qualitative information stored in datasets and accompanying metadata, including codebooks, data dictionaries, and questionnaires (see below for key definitions related to data access), are the focus of our commitment to increased access to data that we fund.
3. The requirement to provide for data access applies to data generated at all phases of the value chain, i.e., from discovery and solution development; to pilot or proof of deliverability testing and scaled implementation; to policy and advocacy; and evaluation. This document articulates the foundation’s minimum expectations with respect to data access. For further guidance please speak with your foundation Program Officer.
4. The following table lists some, but not all, of the data types generated by our investments and indicates the types of data to which the Data Access Module may apply.

Data Type | Requirements Apply
Datasets generated by focused research studies, and clinical or community trials | Yes
Data from surveillance systems or surveys | Yes
Datasets generated by modeling or simulation studies | Yes
Datasets generated by evaluation studies | Yes
Financial and Management Information datasets | No, unless data are of clear scientific, evaluative, or policy relevance as determined in discussion with a foundation Program Officer
Physical material such as tissue samples, blood-spots, or assays | No

© 2013 Bill & Melinda Gates Foundation / For Internal Use Only
Data Access Requirements, Guiding Principles and Definitions v1.1

Requirements

1. In order to complete the Data Access Module, you will need to provide information about what data and datasets will be generated; what will be made available and accessible; how access will be ensured; the technological means of ensuring accessibility (see the Data Access Technical Guidance Note); the costs of making data available; and a timeframe for data release.
2. We expect that the chosen solution for making datasets available will be implemented as soon as possible following finalization of the identified datasets, and in accordance with the timeline agreed with your Program Officer.
3. The data access plan that results from completing the module should also align with the Guiding Principles articulated below.
4. Data should be made accessible for a period of at least five years.
5. Satisfactory implementation of data access plans may be taken into consideration for future funding requests and decisions.

Guiding Principles

The following principles underpin our approach to data access:

Respect: Respect must be given to matters of identity, privacy, and confidentiality as they pertain to the individuals and communities from or about whom data are collected. Respect must also be given to matters of attribution as they pertain to researchers, evaluators, and their collaborators.
Accountability: All processes and procedures for data access will be transparent, clear, and consistent with data management standards that ensure quality data, appropriate security, and equitable access.
Stewardship: All who produce, share, and use data are stewards of those data. They share responsibility for ensuring that data are collected, accessed, and used in appropriate ways, consistent with applicable laws, regulations, and international standards of ethical research conduct.
Cost-effectiveness: We recognize that making data available can be costly, and therefore not all data generated in the course of a foundation-funded activity need to be made publicly available. There are also multiple options for providing access. The foundation Program Officer therefore has discretion in deciding which datasets should be shared and made accessible, and the most cost-appropriate means of making them available.
Proportionality: The needs of investigators must be balanced against those of communities and sponsors that expect benefits to arise from the activities to which they contribute information or resources.
Innovation: Data access encourages diversity of analysis and opinion; facilitates the evaluation of alternative hypotheses; permits meta-analyses; and allows synthesis of data from individual projects into a larger whole.
Efficiency: Providing widespread access to datasets prevents unnecessary duplication of effort, enabling the redirection of scarce resources to the most promising new research endeavors, and maximizing the potential impact of investments.
Capacity Strengthening: Data access can expedite professional development among up-and-coming researchers and evaluators, particularly in the global south.
Collaboration: Ensuring access to data among institutions and across disciplines can also result in greater productivity and creativity.

Data Access: Key Definitions

Data Access/Accessibility
The procedures by which any individual or organization can freely acquire and use datasets collected or generated by foundation grantees or vendors with funding provided by the foundation. Data access generally involves activities such as cleaning, storage and retrieval of data. A grantee or vendor has provided data access when a party who is not a member of the funded study team is able to undertake new analysis and generate new knowledge using study data accessible through implementing these data access principles. An example of satisfying data access is depositing data in a public access data repository.

Data
Factual information, especially information organized for analysis or used to make decisions or produce research outputs such as publications or working papers.
In this context, ‘data’ includes experimental measurements; clinical measurements; or observations obtained via surveys; interviews; questionnaires; modeling or simulation; and abstraction of documents. For the purposes of a Data Access Plan, data does not include laboratory notebooks, partial datasets, preliminary analyses, communication with colleagues, drafts of scientific papers, unpublished research protocols, or physical objects such as tissue samples or specimens.

Dataset
An electronically stored collection of data and associated files. The data contained in a dataset may be from primary data collection (e.g., a survey) or secondary data generation via aggregation or synthesis. Datasets may contain one or more files and should include files that contain the data themselves; that document and explain the individual variables; and that explain the collection or synthesis methodology. Some of the information describing the data may be contained in ‘metadata’ stored with the dataset (see below).

Data Repository or Enclave
An online storage solution for datasets that meets the following set of criteria and satisfies the Data Access Principles:
- Data must be accessible for a minimum of 5 years
- Data should be easily discoverable through conventional search mechanisms by an informed lay person (e.g., researchers and graduate students in the field)
- Metadata on the dataset should be made available (see Metadata definition below)
- Data must be anonymized to protect individuals’ personally identifiable information (PII)
- Open data platforms should honor any special ownership and access preferences as agreed between the foundation and the data producer; data access may be limited to a specific audience or granted on a case-by-case basis

Metadata
Information stored electronically with or as part of the dataset, which should be provided along with the data whenever they are downloaded, accessed, or shared. This may include items such as:
- Year of data production
- Content
- Data dictionary
- Known data quality profile/issues
- Data completeness
- Other salient features of the data and dataset
- Methodology used to collect/compile/create data

Discoverable (or Findable)
Datasets are discoverable when reference links to the datasets are included in online directories (e.g., from repositories); a reference link to the dataset is provided in any publications or reports, or on the project/institution website; and/or the dataset is returned when running a standard internet search. A common internet search engine should return a clear description of the data and a working link to the dataset or the repository where the data are housed.

Research Outputs
Reports, publications, scientific presentations, policy briefs, and working papers that present summary statistics, analysis and conclusions derived from primary or secondary data. Research outputs are distinct from datasets. Reporting on or sharing research results and outputs (e.g., summary statistics or tables) fulfills some of the objectives of the foundation’s global access principles, but does not satisfy the requirement of data access.
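
As an illustration only, the repository criteria and metadata items defined above could be recorded and checked in a few lines of Python. This is a minimal sketch under stated assumptions, not a foundation tool: the class names, field names, and the `unmet_criteria` function are hypothetical, and the check only looks for the presence of a reference link rather than confirming that a search engine actually returns the dataset.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

MIN_ACCESS_YEARS = 5  # stated in the document: at least five years of access


@dataclass
class DatasetMetadata:
    # Hypothetical field names mirroring the items listed under "Metadata".
    year_of_production: int
    content: str
    data_dictionary: dict  # variable name -> plain-language description
    methodology: str
    known_quality_issues: list = field(default_factory=list)
    completeness: str = ""


@dataclass
class RepositoryEntry:
    # Hypothetical record of how one dataset is hosted.
    available_from: date
    available_until: date
    anonymized: bool
    metadata: Optional[DatasetMetadata] = None
    reference_links: list = field(default_factory=list)


def unmet_criteria(entry: RepositoryEntry) -> list:
    """Return the repository criteria this entry does not yet satisfy."""
    problems = []
    start, end = entry.available_from, entry.available_until
    # Compare (year, month, day) tuples so a 29 February start date is handled.
    required_end = (start.year + MIN_ACCESS_YEARS, start.month, start.day)
    if (end.year, end.month, end.day) < required_end:
        problems.append("accessible for fewer than five years")
    if entry.metadata is None:
        problems.append("metadata missing")
    if not entry.anonymized:
        problems.append("personally identifiable information not removed")
    if not entry.reference_links:
        problems.append("no reference link, so not discoverable")
    return problems


# Example: a three-year hosting window with no metadata and no reference link.
example = RepositoryEntry(
    available_from=date(2014, 1, 1),
    available_until=date(2017, 1, 1),
    anonymized=True,
)
print(unmet_criteria(example))
```

The example entry is reported as falling short on three criteria (access period, metadata, and discoverability). A checklist like this can flag gaps in a draft plan, but decisions about which datasets to share and how remain with the Program Officer, as described under Cost-effectiveness.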