How to make ImmPort data
fit for secondary use
Barry Smith
http://ontology.buffalo.edu/smith
Goals of ImmPort
• Accelerate a more collaborative and coordinated
research environment
• Create an integrated database that broadens the
usefulness of scientific data
• Advance the pace and quality of scientific discovery
• Integrate relevant data sets from participating
laboratories, public and government databases, and
private data sources
• Promote rapid availability of important findings
• Provide analysis tools to advance immunological
research
Improve immunology research through enhanced
• Collaboration
• Coordination
• Discoverability
• Integration
• Analyzability
Hypothesis: all of these ends will be promoted by
describing ImmPort data using terms from shared
high quality ontologies
ImmPort data is already being tagged
with ontology terms
For example
• where data is prepared to meet FDA requirements
• where data is published to meet NIH mandates for
reusability
• in the post-submission phase, where data is analyzed by
third parties
But this tagging is
• partial
• uncoordinated
• dependent on ontologies and analysis tools of varying quality
SDY 165: Characterization of in vitro Stimulated
B Cells from Human Subjects shared to SemiPublic Workspace (SPW) Project
During the human B cell (Bc) recall response, rapid cell division
results in multiple Bc subpopulations. RNA microarray and
functional analyses showed that proliferating CD27lo cells are a
transient pre-plasmablast population, expressing genes associated
with Bc receptor editing. Undivided cells had an active
transcriptional program of non-ASC B cell functions, including
cytokine secretion and costimulation, suggesting a link between
innate and adaptive Bc responses. Transcriptome analysis suggested
a gene regulatory network for CD27lo and CD27hi Bc
differentiation.
• In vitro stimulated B cells from human subjects
• B cell receptor editing
PubMed ID 22468229
Discoverability: examples
• Find [ImmPort] data pertaining to in vitro
stimulated B cells from human subjects
• Find studies of genes associated with B cell
receptor editing in human subjects
• Find all data in public and government
databases relating to B cell receptor editing
Discoverability through literature
search
Two queries:
– In vitro stimulated B cells from human subjects
– B cell receptor editing
on
• PubMed
• MeSH (Medical Subject Headings)
• Google
PubMed retrieves 144 results for “In vitro
stimulated B cells from human subjects” –
Zand paper not found
PubMed retrieves 0 results for
“Zand[Author] AND In vitro stimulated
B cells from human subjects”
PubMed retrieves 179 results for “B cell
receptor editing” – Zand paper not found
MeSH results for “In vitro stimulated B
cells from human subjects”
MeSH results for “B Cell receptor
editing”
Google retrieves 180 results for “In vitro
stimulated B cells from human subjects” –
Zand paper not found
Jackpot
How to make this [ImmPort data]
SDY 165: Characterization of in vitro Stimulated B Cells from Human
Subjects shared to Semi-Public Workspace (SPW) Project
During the human B cell (Bc) recall response, rapid cell division results in
multiple Bc subpopulations. RNA microarray and functional analyses
showed that proliferating CD27lo cells are a transient pre-plasmablast
population, expressing genes associated with Bc receptor editing.
Undivided cells had an active transcriptional program of non-ASC B cell
functions, including cytokine secretion and costimulation, suggesting a
link between innate and adaptive Bc responses. Transcriptome analysis
suggested a gene regulatory network for CD27lo and CD27hi Bc
differentiation.
discoverable?
B cell receptor editing
GO:0002452
GO definition
GO provides a definition and a position in the GO hierarchy;
the hierarchy allows logical reasoning
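The kind of reasoning a hierarchy allows can be sketched as follows. This is a minimal illustration only: the parent links and the study tags are hypothetical and are not copied from GO; only the label "B cell receptor editing" and the study name SDY 165 come from the slides. "SDY X" is a made-up placeholder.

```python
# Toy is_a hierarchy: child label -> parent labels.
# These edges are illustrative assumptions, not the actual GO structure.
IS_A = {
    "B cell receptor editing": ["receptor editing"],
    "T cell receptor editing": ["receptor editing"],
    "receptor editing": ["immune system process"],
    "immune system process": [],
}

def ancestors(term):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen

def studies_about(query, tags):
    """Find studies tagged with `query` or with any of its subtypes."""
    return sorted(
        study for study, term in tags.items()
        if term == query or query in ancestors(term)
    )

# Hypothetical tagging of each study with a single term.
TAGS = {"SDY 165": "B cell receptor editing", "SDY X": "T cell receptor editing"}
print(studies_about("receptor editing", TAGS))
```

A query for the broader term also returns studies tagged only with the narrower terms, which a plain string search cannot do.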
GOPubMed: 179 results for “B cell
receptor editing”
(B cell receptor editing Zand) AND ("Zand"[au])
Why are zero documents retrieved?
Proposal
1. Tag ImmPort SDY abstracts with GO URIs
2. Publish the results to the GO Annotation database
During the human B cell recall response, rapid cell division
results in multiple B cell subpopulations. RNA microarray
and functional analyses showed that proliferating CD27lo
cells are a transient pre-plasmablast population, expressing
genes associated with B cell receptor editing. Undivided
cells had an active transcriptional program of non-ASC B
cell functions, including cytokine secretion and
costimulation, suggesting a link between innate and
adaptive B cell responses. Transcriptome analysis suggested a
gene regulatory network for CD27lo and CD27hi B cell
differentiation.
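Step 1 of the proposal might look something like the sketch below. It assumes a simple label lookup rather than any particular annotation tool. The only real term used is GO:0002452, the ID shown earlier for "B cell receptor editing". The URI follows the usual OBO PURL pattern, and the function names are hypothetical.

```python
import re

# Label -> GO identifier. GO:0002452 is the ID given on the slide for
# "B cell receptor editing"; a real tagger would load many more terms.
GO_TERMS = {"B cell receptor editing": "GO:0002452"}

def go_uri(go_id):
    """Turn 'GO:0002452' into an OBO-style PURL."""
    return "http://purl.obolibrary.org/obo/" + go_id.replace(":", "_")

def tag_abstract(text, terms=GO_TERMS):
    """Return (label, URI, offset) for each term label found in the text.

    Offsets refer to the text after whitespace normalization.
    """
    normalized = re.sub(r"\s+", " ", text)  # undo line wrapping
    hits = []
    for label, go_id in terms.items():
        for match in re.finditer(re.escape(label), normalized, re.IGNORECASE):
            hits.append((label, go_uri(go_id), match.start()))
    return hits

abstract = """proliferating CD27lo cells are a transient pre-plasmablast
population, expressing genes associated with B cell receptor
editing."""
for label, uri, offset in tag_abstract(abstract):
    print(label, uri, offset)
```

Exact label matching misses abbreviations such as "Bc receptor editing", so a real tagger would also need synonym lists or curator review.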
But GO is not enough
See http://ncorwiki.buffalo.edu/index.php/Immunology_Ontologies
• immune disorders
• infectious diseases
• allergies
• immune epitopes, etc.
For special case of Flow Cytometry and CyTOF:
ImmPort Ontology Meeting, Stanford, September
4-5, 2013: http://x.co/1W1Om
Files in SDY 165
lk_race.txt
American Indian or Alaska Native
Asian
Black or African American
Native Hawaiian or Other Pacific Islander
Not_Specified
Other
Unknown
White
ImmPort Templates
https://immport.niaid.nih.gov/immportWeb/experimental/displaySubmitTemplates.do
ImmPort Templates: Race
https://immport.niaid.nih.gov/immportWeb/experimental/displaySubmitTemplates.do
ImmPort Templates
How to specify Race if Race = ‘Other’?
ImmPort Templates
How to specify “Subject Phenotype”?
NG / BISC proposal
create controlled vocabularies (ontology drop down
lists) for fields currently populated by submitters with
free text
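A drop-down list amounts to checking each submitted value against a fixed vocabulary. The sketch below uses the values from lk_race.txt shown earlier. The function name and the matching rules (ignore case and surrounding whitespace) are assumptions made for illustration, not ImmPort behavior.

```python
# Values listed in lk_race.txt for SDY 165.
RACE_VOCABULARY = [
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "Not_Specified",
    "Other",
    "Unknown",
    "White",
]

def check_value(value, vocabulary=RACE_VOCABULARY):
    """Return the vocabulary term matching `value`, or None if there is none.

    Matching ignores case and surrounding whitespace so that trivial
    free-text variation does not cause a rejection.
    """
    lookup = {term.casefold(): term for term in vocabulary}
    return lookup.get(value.strip().casefold())

print(check_value(" asian "))    # normalized to the vocabulary term
print(check_value("Caucasian"))  # not in the vocabulary, so rejected
```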
Files in SDY 165
lk_sample_type
proposal: where controlled vocabularies exist,
provide definitions for all terms
Two kinds of definitions
• human readable definitions support consistency
of data entry
• logical definitions
– allow logical analysis of data
– support aggregation of data
– allow automatic validation of consistent data entry
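What logical definitions buy can be sketched in a few lines. Everything here is hypothetical: the sample types, their parent classes and the "derived_from" relation are invented for illustration and are not taken from lk_sample_type or from any existing ontology.

```python
from collections import Counter

# Hypothetical logical definitions: each sample type is defined by a
# parent class (is_a) and a required source material (derived_from).
DEFINITIONS = {
    "PBMC": {"is_a": "cell sample", "derived_from": "blood"},
    "Serum": {"is_a": "fluid sample", "derived_from": "blood"},
    "Plasma": {"is_a": "fluid sample", "derived_from": "blood"},
}

def validate(record):
    """Return a list of problems found in one sample record."""
    definition = DEFINITIONS.get(record["sample_type"])
    if definition is None:
        return ["unknown sample type: " + record["sample_type"]]
    if record["source"] != definition["derived_from"]:
        return ["source '%s' contradicts definition of %s"
                % (record["source"], record["sample_type"])]
    return []

def aggregate(records):
    """Count valid records by the parent class of their sample type."""
    return Counter(
        DEFINITIONS[r["sample_type"]]["is_a"] for r in records if not validate(r)
    )

records = [
    {"sample_type": "PBMC", "source": "blood"},
    {"sample_type": "Plasma", "source": "blood"},
    {"sample_type": "Serum", "source": "urine"},  # inconsistent entry
]
print(aggregate(records))
print(validate(records[2]))
```

The same definitions serve both purposes: they flag the inconsistent record and they let the remaining records be rolled up under their parent classes.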
Definitions can often be taken over from already
existing public domain ontologies such as GO
• use of ready-made definitions supports discoverability,
and creates automatic linkage to huge bodies of public
domain data
ImmPort Antibody Registry (Diehl et al.)
from BD Lyoplate Screening Panels Human Surface
Markers
Discoverability
Where did this lk_sample_type list
come from?
CDISC
• Clinical Data Interchange Standards
Consortium
• http://www.cdisc.org/
CDISC Glossary
SDTM
• Study Data Tabulation Model, developed by
CDISC and used for submissions to the FDA
– for Race, Gender, Ethnicity, …
– no human readable definitions
– no logical definitions
Jan 2013: release of CDISC SDTM Model by
CDISC2RDF (Kerstin Forsberg of AstraZeneca)
PHUSE (EU, Roche, AstraZeneca, FDA, …)
project to incorporate ontology technology
into CDISC
BRIDG
• http://bridgmodel.nci.nih.gov/files/BRIDG_Model_3.2_html/index.htm
• Biomedical Research Integrated Domain
Group (BRIDG) Project
BRIDG 3.2 Domain Analysis Model
Other strategies to simplify creation of
structured data for submission into
ImmPort
• ELN: Electronic Lab Notebooks
– PRIME: “Contur ELN has been automating the process
of data deposition into ImmPort, making it much
easier for our researchers to submit data to ImmPort”
• CTMS: Clinical Trial Management Systems
• EHR: Electronic Health Records
– experiments to prepopulate EHR data into CTMS and
from there into case report forms (and into ImmPort?)
• Minimal Information Checklists
MiFlowCyt: Minimum Information about
a Flow Cytometry Experiment
Checklist strategy for creating public
data repositories via journals
• 75% of articles in Cytometry A are MiFlowCyt
compliant
• Result: a growing repository of flow cytometry
data (flowrepository.org)
• OBI = Ontology for Biomedical Investigations,
an ontology to support creation of structured
data about clinical and biological experiments
http://mibbi.sourceforge.net/portal.shtml
Proposal
Advertise on the ImmPort website best (= most
successful) practices from
• ELN: Electronic Lab Notebooks
• CTMS: Clinical Trial Management Systems
• EHR: Electronic Health Records
• Minimal Information Checklists
NIAID Sample Data Sharing Plan
(Last Reviewed February 12, 2013)
• Sharing of data generated by this project is an essential part of our proposed
activities and will be carried out in several different ways.
• Presentations at national scientific meetings. … it is expected that approximately
four presentations at national meetings would be appropriate. …
• Annual lectureship. A lectureship has brought to the University distinguished
scientists and clinicians …
• Newsletter. The [disease interest group] publishes a newsletter …
• Web site of the Interest Group. The [interest group] currently maintains a Web
site where information [about the disease] is posted …
• Annual [Disease] Awareness week….
• SAGE Library Data. It is our explicit intention that these [Serial analysis of gene
expression] data will be placed in a readily accessible public database. …
– but how will these data be described?
Proposal
All data sharing plans for NIAID-funded research
should require:
• paper abstracts and SDY summaries be tagged
with ontology terms
• tables and figures in papers be tagged with
ontology terms
See http://ncorwiki.buffalo.edu/index.php/Immunology_Ontologies
ImmPort Ontology Meeting, Stanford,
September 4-5, 2013: http://x.co/1W1Om
Further information from phismith@buffalo.edu