Jerry Sheehan

BioMedical Data Everywhere:
Recent Developments in Data Management and Policy at NIH
Jerry Sheehan
Assistant Director for Policy Development
National Library of Medicine - National Institutes of Health
sheehanjr@nlm.nih.gov
CASC Fall Meeting
September 8, 2011, Arlington, VA
National Library of Medicine:
More than a Library
www.nlm.nih.gov
• World’s largest medical library
– >12 million physical artifacts (books, journals, technical reports, photographs)
– >22,000 print and electronic serial subscriptions
– Historical collection of rare and old medical works
• Intramural research laboratories
– Lister Hill Nat’l Center for Biomedical Comms.
– National Center for Biotechnology Information
• Extramural research and training
– ~ 100 research projects per year, $36M
– 18 funded research training sites, 250 trainees
• Health data standards and vocabularies
• Information resources and services
– Publications and metadata
– Genomic, chemical, clinical trial data
– Environmental health and toxicology data
– Disaster information services & systems
– Medical images, analytical tools
NLM Information Resources
• Publications
– Citations/metadata (PubMed)
– Full-text articles (PubMed Central)
• Data
– Genomic (GenBank, dbGaP, GEO, GeneTests)
– Clinical trials (ClinicalTrials.gov)
– Drug (RxNorm, DailyMed, Pillbox)
– Chemical (PubChem)
– Environmental & toxicology
• Images
– Visible Human
– Spine x-rays, cervical images
– Historical photos
• Synthesized information
– Evidence summaries
– Guidelines
– Consumer health information (MedlinePlus)
• Vocabulary resources
– Unified Medical Language System
– Standard clinical terms (SNOMED)
– Health data interchange
– Biomedical terms
• Software & Tools
– APIs
– Natural language processing
– Image analysis
– Mobile apps
PubMed/Medline: Journal Citations
http://www.pubmed.gov
CONTENT
• 21+ million citations and abstracts
– 700,000 added per year
– 50%+ link to full text
• 5500+ journals
– 120-130 added per year
USAGE (2010)
• 120+ million visitors
• 2 million searches per day
• 2.4 billion page views
• Google, Bing, others
• Content used by outside developers
• Mobile version
[Graph: growth in Medline, the fully indexed subset of PubMed, which accounts for approximately 90% of all PubMed citations. Original graph: http://www.nlm.nih.gov/bsd/stats/cit_added.html]
QUALITY
PubMed Central: Full-Text Articles
www.pubmedcentral.gov
2.2+ million full-text articles; 26 thousand more added per month
Typical weekday usage:
• 420,000 different users
• 740,000 articles retrieved
Annually
• ~ 99% of articles downloaded at least once
• 28% downloaded more than 100 times
ClinicalTrials.gov
http://clinicaltrials.gov/
Registry and Results Database
• Federally and privately supported trials
• Conducted in the United States and 170+ countries
• Mandatory submission for some trials
Current content
• 100,000+ registered trials
• 330 new registrations/week
• 3,000+ results (summary) of approved products
o Outcome measures
o Statistical analyses
o Adverse events
[Chart: Studies Registered at ClinicalTrials.gov since May 1, 2005; vertical axis from 0 to 120,000]
Usage (2010)
• 28,000 visitors per day
dbGaP
Repository for NIH-funded GWA studies
As of Aug 2011:
• 161 studies
• 2,045 data sets
• 2,727 documents
• 5,890 analyses
• 128,190 variables
PubChem
• Database of biological activities of small molecules
• Repository for data from NIH Molecular Libraries program
As of August 2011:
• 85 million deposited substance records
o Representing more than 30 million chemically unique compounds
• 500 thousand bioassay records
o Representing more than 130 million experimental bioactivity results
ToxMap: Environmental Health Maps
[Annotated screenshot; callouts:]
Almost 900
In English & Spanish
> 170 tutorials
> 75 anatomy videos
> 125 surgery videos
~ 40,000 links
~ 1,000 drugs
100 supplements
> 1,200 links to ClinicalTrials.gov
15-20 stories added daily
Since 2006
English & bilingual issues
> 40 languages
> 250 topics
> 3,300 links
Over 100 directories of doctors, hospitals, clinics & libraries
~ 3,500 articles
> 2,000 images
MedlinePlus: Trusted Health Information
www.medlineplus.gov
[Map: 100+ million visits in the United States in 2010, by state. Labeled values: ME 298K, NH 270K, VT 240K, MA 2.2M, RI 307K, CT 834K, NJ 4.1M, DE 117K, MD 1.7M]
MEDLINEPLUS USAGE
150 million visitors in 2010
420,000 visitors per day
MEDLINEPLUS MOBILE
Streamlined content tailored to the user’s particular type of cell phone or tablet.
MEDLINEPLUS CONNECT
Links from diagnosis, drug, and laboratory information in EHR/PHR to relevant material in MedlinePlus.
Genetic test means an analysis of human DNA, RNA, chromosomes, proteins, or metabolites, if the analysis detects genotypes, mutations, or chromosomal changes. Genetic test does not include an analysis of proteins or metabolites that is directly related to a manifested disease, disorder, or pathological condition.
NLM is Not Alone:
Growing interest in data at NIH
“[High throughput technologies] provide us with the opportunity to ask questions that have the word ‘ALL’ in them. What are ALL the transcripts in a cell? What are ALL the protein interactions? . . . Those kinds of questions are now approachable, especially if we do the right job of making really powerful databases publicly accessible to all those who need them and empower investigators in small labs as well as big labs to plunge into that kind of mindset.”
- Francis S. Collins, MD, PhD [Director, NIH]
http://report.nih.gov/biennialreport/
http://report.nih.gov/UploadDocs/Biomed_Info_Resources_FY08_09.pdf
Select NIH Data Initiatives
• NDAR – National Database for Autism Research (NIMH)
– Repository for NIH-funded autism studies and centers of excellence
– Genomic, phenotypic, imaging data and associated information
• ADNI – Alzheimer’s Disease Neuroimaging Initiative (NIA)
– Multisite study, public-private partnership, validated biomarkers
– Centralized fMRI and PET data, linked clinical database
• NIDDK Data Repository
– Archival datasets from NIDDK-funded studies (diabetes, digestive, kidney)
– 29 datasets to date; more than 100 access requests in 2009-10
• BTRIS – Biomedical Translational Research Information System (CC)
– Repository for data from NIH intramural clinical studies
– Allows aggregation and analysis across multiple Institute studies
Data Sharing Policies
NIH Public Access Policy (journal articles)
NIH GWAS Policy: dbGaP
NIH Sequence Data Sharing Policy: GenBank, GEO
Clinical Trials Info: ClinicalTrials.gov
IC or domain-specific policies
• Autism Research – National Database for Autism Research
• NIAAA Genetics of Alzheimer’s
• Alzheimer’s Disease Neuroimaging Initiative (LONI Repository)
• Others. . .
NIH Data Sharing Policy (data sharing plan)
Recent Guidance for NIH Data Sharing Plans
http://grants.nih.gov/grants/sharing_key_elements_data_sharing_plan.pdf
NLM 175th Anniversary