
NIH and Biomedical ‘Big Data’
Belinda Seto, Ph.D.
Deputy Director
National Institute of Biomedical Imaging and Bioengineering
Myriad Data Types
Genomic
Other ‘Omic
Imaging
Phenotypic
Exposure
Clinical
Data and Informatics Working Group
acd.od.nih.gov/diwg.htm
Overarching Themes
• At a pivotal point:
  • Risk failing to capitalize on technology advances
  • Bordering on “institutional malpractice”
• Cultural changes at NIH are essential
• Aim to develop new opportunities for:
  • Data sharing
  • Data analysis
  • Data integration
• Long-term NIH commitment is required
NIH is Tackling the ‘Big Data’ Problem
1. New NIH Leadership Position:
Associate Director for Data Science (ADDS)
2. New Internal NIH Governing/Oversight Body:
Scientific Data Council (SDC)
3. New Trans-NIH Initiative:
Big Data to Knowledge (BD2K)
What’s in a Name?
Big Data
Bioinformatics
Computational Biology
Biomedical Informatics
Information Science
Biostatistics
Quantitative Biology
Data Science
Associate Director for Data Science: Overview
• NIH Data Science ‘Programmatic Czar’ (aka Point Person, Strategic Leader, etc.)
• Reports to NIH Director
• Eric Green, Acting
• Search underway (Eric Green & Jim Anderson, Co-Chairs of Search Committee)
Associate Director for Data Science: Responsibilities
• Principal advisor to NIH Director and NIH leadership
• Provides vision and leadership in data science
• Chair, Scientific Data Council (and thus chief steward of Scientific Data Council responsibilities)
• Program lead for Big Data to Knowledge (BD2K)
• Coordinates data science activities, both within and outside of NIH
• Leads long-term NIH strategic planning in data science
• NIH leader responsible for promoting trans-NIH, national, and global policies for data sharing
• Coordination with NIH Chief Information Officer
Scientific Data Council: Overview
• High-level internal NIH group
• Chaired by Associate Director for Data Science
• Reports to NIH Steering Committee
• Trans-NIH representation
Scientific Data Council: Membership
Acting Chair:
Eric Green (Acting ADDS & NHGRI)
Members:
James Anderson (DPCPSI)
Sally Rockey (OER)
Michael Gottesman (OIR)
Kathy Hudson (OD)
Andrea Norris (CIT)
Judith Greenberg (NIGMS)
Betsy Humphreys (NLM)
Douglas Lowy (NCI)
John J. McGowan (NIAID)
Alan Koretsky (NINDS)
Michael Lauer (NHLBI)
Belinda Seto (NIBIB)
Acting Executive Secretary:
Allison Mandich (NHGRI)
Scientific Data Council: Responsibilities
• Trans-NIH programmatic leadership and coordination of data science activities
• Oversight of BD2K
• Trans-NIH intellectual and programmatic ‘Hub’ for data science (coordination and convening functions)
• Coordination with data science activities beyond NIH (e.g., other government agencies, other funding agencies, and private sector)
• Long-term NIH strategic planning in data science
• Major role in data sharing policy development and oversight
• Coordination with ‘parallel’ Administrative Data Council
Big Data to Knowledge (BD2K): Overview
• Major trans-NIH initiative addressing an NIH imperative and key roadblock
• Aims to be catalytic and synergistic
• Overarching goal:
  By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data
http://bd2k.nih.gov
BD2K: Four Programmatic Areas
I. Facilitating Broad Use of Biomedical
Big Data
II. Developing and Disseminating
Analysis Methods and Software for
Biomedical Big Data
III. Enhancing Training for Biomedical
Big Data
IV. Establishing Centers of Excellence
for Biomedical Big Data
BD2K: Four Programmatic Areas
IA. Facilitating Broad Use of Biomedical Big Data: Data Catalog
• RFI responses received June 25
  • 62 responses received
• Data Catalog Workshop held Aug 21-22
  • Fran Berman, chair
  • Jenny Larkin (NHLBI), Ron Margolis (NIDDK), co-organizers
BD2K: Four Programmatic Areas
IB. Facilitating Broad Use of Biomedical Big Data: Data/Metadata Standards
• Frameworks for Community-based Standards Efforts Workshop
  • September 25-26
  • Susanna Sansone & David Kennedy, co-chairs
  • Mike Huerta (NLM), Leslie Derr (OD), co-organizers
BD2K: Four Programmatic Areas
IC. Facilitating Broad Use of Biomedical Big Data: Enabling Research Use of Clinical Data
• Workshop September 11-12
  • Robert Cardiff & Dan Masys, co-chairs
  • Leslie Derr (OD), Jerry Sheehan (NLM), co-organizers
  • Webcast with real-time, online discussion forum
• To identify actionable steps that NIH can take to accelerate the use of clinical data in research
• Near- and long-term needs for research, infrastructure, standards and policies
• Organizers are collecting information about relevant initiatives
BD2K: Four Programmatic Areas
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data
• FOAs for BD2K-specific software needs in FY15
  • RFI issued August 8, responses due Sept 6
  • 4 topic areas: data visualization, compression/reduction, provenance, wrangling
• Software Catalogue Workshop:
  • Feb 18-19, 2014
  • Chairs: Asif Dhar and Owen White
BD2K: Four Programmatic Areas
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data
• Updated broad-based software development FOAs (“BISTI”), notice of intent to publish
• Cloud computing:
  • Joint BD2K-Infrastructure Plus working group initiated
  • Ongoing discussion with NCI, joint survey results being written up
  • Ongoing discussion with commercial providers
BD2K: Four Programmatic Areas
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data
• Dynamic Community Engagement: micro-blog and Twitter developed for BD2K workshops
BD2K: Four Programmatic Areas
III. Enhancing Training for Biomedical Big Data
• RFI, >100 responses received
• Workshop held July 29-30
  • Karen Bandeen-Roche, Zak Kohane, co-chairs
  • Michelle Dunn (NCI), Bettie Graham (NHGRI), organizers
  • Webcast, archived
BD2K: Four Programmatic Areas
III. Enhancing Training for Biomedical Big Data: Workshop Recommendations
• Opportunity for extraction of knowledge from Big Data is often highest at the interface of at least two disciplines; training programs should be designed to work at interfaces
• Training programs should be designed to provide skills to work effectively in Team Science
• Dual mentoring should be encouraged
• Flexibility needed to encourage innovation and to take best advantage of local expertise and talent
• Trainees need access to large data sets
BD2K: Four Programmatic Areas
III. Enhancing Training for Biomedical Big Data: Workshop Recommendations (continued)
• Training in quantitative science and experimental design will be increasingly important to clinical researchers and even clinicians
• Principles of reproducible research must be stressed
• There are training needs across the full spectrum of scientists, in terms of both experience and activities
• The jobs that need to be done in effective Big Data science may not correspond to traditional academic jobs
• A diverse workforce should be a major goal of data science training activities
BD2K: Four Programmatic Areas
IV. Establishing Centers of Excellence for Biomedical Big Data
• Investigator-initiated centers
  • FOA released July 22
  • Applications due November 20
  • Technical Information Webinar Sept 12
• NIH-initiated centers
  • LINCS-BD2K Data Coordination and Integration Center (+ $2.5M from Common Fund)
  • Principles being developed
Nature | News & Views:
Alzheimer's disease: From big data to mechanism
Vivek Swarup & Daniel H. Geschwind
This work is also exemplary in demonstrating the
extraordinary value of publicly available data
resources. Published data on human gene expression,
Alzheimer's disease GWAS and neuroimaging provide the
pillars of Rhinn and collaborators' paper. Integrative analyses
of these data by the authors, and previously by others, weaken
the view that substantive biological experimentation only
takes place at the wet bench, and highlight the value of
innovative re-analyses of existing data.
Closing Thoughts
• The biomedical research enterprise is undergoing a major ‘phase change’ with respect to Big Data and data science
• Trans-NIH problem needing trans-NIH solutions
• Solutions include multifaceted cultural changes
• New NIH plans are:
  • Mission critical
  • Transformational
  • Transitional: en route to longer-term commitment
Questions?