ImmPort - Buffalo Ontology Site

Ontology-Informed Knowledge Representation in the
Immunology Database and Analysis Portal (ImmPort)
Megan Kong1, Carl Dahlke2, Diane Xiang1, David Karp4, Richard H. Scheuermann1,3
1 Department of Pathology, 3 Division of Biomedical Informatics, 4 Division of Rheumatology, U.T. Southwestern Medical Center, Dallas, TX
2 Health Information Systems, Northrop Grumman, Inc., Rockville, MD
ImmPort Purpose and History
• NIH/NIAID/DAIT would like to:
– maximize the return on the public investment in basic, translational and clinical research
– allow investigators to more effectively extract meaningful information from the vast amounts of data generated from advanced research technologies
– => data sharing policy
• Bioinformatics Integration Support Contract (BISC) to support data sharing for all DAIT-funded programs - basic, translational and clinical research
• Immunology Database and Analysis Portal (ImmPort) - www.ImmPort.org
– Archive and manage basic and clinical research data
– Integrate these research data with extensive biological knowledge
– Support analysis of these integrated data
Home page
• Support for many large projects that use a variety of different experiment methodologies, including flow cytometry (FCM)
Immune Function and Biodefense in Children, Elderly, and Immunocompromised Populations Program
Population Genetics Analysis Program: Immunity to Vaccines/Infections Program
HLA Region Genetics in Immune-Mediated Diseases Program
Immune Modeling Centers
Other Consortium Projects
Immune Function and Biodefense in Children, Elderly, and Immunocompromised Populations Program
Grants/Contracts/Projects:

Project Title: An Improved Influenza A Vaccine for Rapid Protection of the Elderly
Institution: The Wistar Institute
Principal Investigator: Hildegund C. J. Ertl, M.D.
Objective: To conduct pre-clinical studies needed to develop a vaccine for the elderly that would provide protection in the event of a bioterror attack with influenza A virus.
Number of Subjects: 600
Mechanistic Assays: ELISA, ELISPOT, Flow Cytometry

Project Title: Protective Immunity in Transplant Recipients
Institution: Emory Transplant Center
Principal Investigator: Larsen, Christian, Ph.D.
Objective: To determine the effects of chronic immunosuppressive therapies on adaptive, innate and specific immunity.
Number of Subjects: 90
Mechanistic Assays: ELISA, ELISPOT, Microarray, RT-PCR

Project Title: Kinetic analysis of immunologic repletion and influenza vaccine responsiveness
Institution: Children's Hospital of Philadelphia
Principal Investigator: Sullivan, Kathleen, Ph.D.
Objective: To propose a comprehensive analysis of the immunologic response to killed trivalent influenza vaccine in different immunocompromised populations in order to understand how to improve vaccine responses.
Number of Subjects: 54
Mechanistic Assays: ELISPOT, Sequencing, Flow Cytometry

Project Title: Immune Function and Biodefense in Children, Elderly, and Immunocompromised Populations: TLRs in Innate Immunity and the Induction of Adaptive Immunity in the Neonate and Infant
Institution: University of Washington
Principal Investigator: Wilson, Christopher, M.D.
Objective: To define comprehensively and in molecular and cellular detail differences in recognition and response to microbe-derived danger signals between adults, neonates and infants, and how these, in turn, contribute to differences in innate immunity and the induction of antigen-specific (adaptive) immunity.
Number of Subjects: 17 adults, 17 neonates
Mechanistic Assays: RT-PCR, ELISA, Flow Cytometry, Microarray

Project Title: Responses to Influenza Vaccination in Systemic Lupus
Institution: Oklahoma Medical Research Foundation (OMRF)
Principal Investigator: Thompson, Linda, Ph.D.
Objective: To understand why many patients with systemic lupus erythematosus (SLE) fail to make adequate responses to immunization with the influenza vaccine.
Number of Subjects: 60
Mechanistic Assays: ELISPOT, ELISA, multiplex RT-PCR

Project Title: Immune function and Biodefense in Children, Elderly and Immunocompromised Populations
Institution: Oregon Health and Science University
Principal Investigator: Nikolich-Zugich, Janko, M.D., Ph.D.
Objective: To characterize immune markers and mechanisms in the elderly that determine their vulnerability to infectious and bioterrorism agents in categories A-C.
Number of Subjects: 130
Mechanistic Assays: Flow Cytometry, ELISA, RT-PCR, Gene expression

Project Title: Immune Response to Virus Infection During Pregnancy
Institution: Mt. Sinai School of Medicine
Principal Investigator: Moran, Thomas, Ph.D.
Objective: To determine whether the different trimesters of pregnancy, characterized by unique hormonal environments, are associated with (a) identifiable, discrete changes in maternal systemic immunity and/or (b) recognizable alterations in susceptibility to select bio-defense pathogens and/or (c) differential responses to influenza vaccination.
Number of Subjects: 75
Mechanistic Assays: Flow Cytometry, multiplex ELISA, RT-PCR

Project Title: Rochester Biodefense
Institution: University of Rochester
Principal Investigator: Sanz, Ignacio, M.D.
Objective: To identify the specific immune defects that make immunocompromised populations especially susceptible to bioterrorist attack.
Number of Subjects: 280
Mechanistic Assays: Flow Cytometry

Project Title: Innate Immune Responsiveness in the Elderly and the Immunosuppressed
Institution: Yale School of Medicine
Principal Investigator: Fikrig, Erol, M.D.
Objective: This proposal will explore the hypothesis that altered innate immune responsiveness in the elderly and the immunosuppressed contributes to vaccine unresponsiveness or disease susceptibility.
Number of Subjects: 1160
Mechanistic Assays: SNP genotyping, Flow Cytometry, ELISA
ImmPort Overview - System Components
• Semi-public web-based database and analysis portal
  – Multi-level access control
  – Data sharing
• Data
  – Reference data
    • Types - Gene structure, protein function, polymorphisms, metabolic, regulatory, signaling and other networks, protein-protein, gene-gene, host-pathogen interactions
    • Sources - NCBI, Uniprot, Swissprot, BIND, Reactome
  – Experiment data
    • Methodologies – microarrays, SNP genotyping, flow cytometry, ELISA, ELISPOT, qPCR, imaging, etc.
    • Metadata (defining how the experiment was performed) - common features of all experiments
    • Primary results - from all experiment measurement techniques
    • Processed results
    • Interpreted results
    • Analytical metadata
  – Clinical research data
    • Study design – from clinical protocol
    • Participant characteristics/phenotype – from assessments captured in CRFs and laboratory testing
• Query tools
  – To support retrieval of reference and experiment data based on specified criteria
  – Pre-defined QBE
  – Customized semantic queries
• Analysis tools
  – Statistical analysis of distributions
  – Biological network analysis
    • Quantification of topological parameters
    • Module identification
  – Cell population analysis
    • Standard FACS statistics
    • Novel population identification based on high dimensional data clustering
  – Measurement of immune response (e.g., ELISA, ELISPOT)
  – Gene expression analysis
    • Filtering/normalization
    • Clustering
    • Classification
  – Genetic analysis
    • LD analysis
    • TagSNP selection
    • Haplotype reconstruction
    • Genotype-phenotype association
• Visualization tools
  – Genome display, including genes, introns, exons, SNPs, tagSNPs, etc.
  – Genetic analysis results
  – Gene expression results
  – Networks, pathways, and molecular interactions
  – Graphing and charting of statistical results
• Ontology
  – Thesaurus function
  – Organize terms and define relationships
Varicella:
Immune Response to Varicella Vaccination in Subjects with Atopic Dermatitis Compared to Nonatopic Controls
Objectives:
To determine if children with AD have Varicella-specific cell mediated immune (CMI) responses to varicella vaccination that differ from those of nonatopic controls (Th2)
Endpoints:
1. ELISPOT: number of Varicella-specific T cells
2. ELISA: levels of Varicella-specific antibodies
3. Flow cytometry: perforin generation using 2-color flow cytometry staining for both CD8 and perforin. (Perforin levels, which may contribute to defects in CD8+ cytotoxic T lymphocyte function.)
Driving Projects - ADVN
Why OBX?
• Challenges
– Extensive phenotypic characteristics of interest
– Complex study designs of treatments, assessments, samplings, etc.
– Few standards for how they are described or captured
– Each supported project captures its data in a different format
– Integration with mechanistic studies of specimens
• Evaluation of existing clinical data standards
– BRIDG – designed for supporting the conduct of clinical trials, not a study repository optimized for analysis
– CDISC SDTM – specification for SAS transport files to the FDA
– CDISC CDASH – models clinical encounters, but not sequence of events; requires use of EPOCHs (contiguous blocks of time)
– None designed to manage mechanistic studies
• Ontology-Based eXtensible Data Model (OBX)
– Logical ontology structure (BFO/OBI) => extensible data model
– Data element value sets derived from ontology hierarchy
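As a minimal sketch of the last point, assuming a toy is-a hierarchy, a data element's value set could be derived by collecting every descendant of a chosen parent term. The terms, the `IS_A` table, and the `value_set` function below are illustrative assumptions; they are not copied from BFO, OBI, or the ImmPort schema.

```python
# Hypothetical is-a hierarchy: child term -> parent term.
# The terms are illustrative only; they are not taken from BFO or OBI.
IS_A = {
    "assay": "planned process",
    "ELISA": "assay",
    "ELISPOT": "assay",
    "flow cytometry assay": "assay",
    "2-color flow cytometry assay": "flow cytometry assay",
    "blood draw": "planned process",
}


def value_set(parent, is_a=IS_A):
    """Return all terms that are (transitively) a kind of `parent`, sorted."""
    # Invert the child -> parent table into parent -> [children].
    children = {}
    for child, par in is_a.items():
        children.setdefault(par, []).append(child)

    # Depth-first walk collecting every descendant of `parent`.
    found = []
    stack = [parent]
    while stack:
        term = stack.pop()
        for child in children.get(term, []):
            found.append(child)
            stack.append(child)
    return sorted(found)


# Allowed values for a hypothetical "assay type" data element.
assay_types = value_set("assay")
```

Adding a new term under "assay" in the hierarchy extends the value set without any change to the data model, which is the kind of extensibility the slide describes.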
Entities of Interest
• Objects
– Human subjects
– Specimens
– Materials
– Plans
• Qualities – clinical phenotype
– Malar rash
– WBC count
– Weight
– Blood glucose levels
– Atopic dermatitis
• Processes
– Recruitment, enrollment, approval
– Assays, physical exams
– Blood draw, PBMC isolation
– Treatment, surgery (interventions)
• Context
– Time
– Environment
• Roles
– Principal investigator, study coordinator, study participant
– Reagent, therapeutic
OBX Framework
OBX Study Design
OBX Biomaterial Transformation
Assay Database Schema
Lab test mapping into OBX
Blood Test:
ADVN Field | ADVN Field Description | ImmPort Table | ImmPort Attribute
ID | Subject ID | Lab Test | Subject ID
SITE | Study Site | will not capture |
VISITDT | Visit Date | Lab Test | Visit Date
HGB | Hemoglobin (g/dl) | Lab Test | Type = Blood Test; Subtype = Hemoglobin Measurement; Test Result Value = HGB; Test Result Unit = grams per deciliter
CSHGB | Hemoglobin Clinical Significant | Lab Test | Clinical Significance = CSHGB
HCT | Hematocrit (%) | Lab Test | Type = Blood Test; Subtype = Hematocrit Measurement; Test Result Value = HCT; Test Result Unit = percentage
CSHCT | Hematocrit Clinical Significant | Lab Test | Clinical Significance = CSHCT
WBC | Total WBC (x1000/μl) | Lab Test | Type = Blood Test; Subtype = Total White Blood Cell Measurement; Test Result Value = WBC; Test Result Unit = 1000/μl
CSWBC | Total WBC Clinical Significant | Lab Test | Clinical Significance = CSWBC
NEUT | Neutrophils (x1000/μl) | Lab Test | Type = Blood Test; Subtype = Neutrophils Measurement; Test Result Value = NEUT; Test Result Unit = 1000/μl
CSNEUT | Neutrophils Clinical Significant | Lab Test | Clinical Significance = CSNEUT
LYMP | Lymphocytes (x1000/μl) | Lab Test | Type = Blood Test; Subtype = Lymphocytes Measurement; Test Result Value = LYMP; Test Result Unit = 1000/μl
CSLYMP | Lymphocytes Clinical Significant | Lab Test | Clinical Significance = CSLYMP
EOSIN | Eosinophils (x1000/μl) | Lab Test | Type = Blood Test; Subtype = Eosinophils Measurement; Test Result Value = EOSIN; Test Result Unit = 1000/μl
CSEOS | Eosinophils Clinical Significant | Lab Test | Clinical Significance = CSEOS
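The mapping can be sketched in code as follows. This is an illustration only, not the ImmPort loader: the function name `map_blood_test`, the dictionary layout, and the sample row are assumptions, and the sketch assumes each measurement field is paired with its own clinical-significance field (for example WBC with CSWBC). The ADVN field names, subtypes, and units are those listed in the mapping.

```python
# Each ADVN measurement field maps to one ImmPort Lab Test record.
# Tuple layout: (ADVN value field, ADVN clinical-significance field, subtype, unit).
# Field names come from the mapping; the code structure is an assumption.
BLOOD_TEST_MAP = [
    ("HGB", "CSHGB", "Hemoglobin Measurement", "grams per deciliter"),
    ("HCT", "CSHCT", "Hematocrit Measurement", "percentage"),
    ("WBC", "CSWBC", "Total White Blood Cell Measurement", "1000/μl"),
    ("NEUT", "CSNEUT", "Neutrophils Measurement", "1000/μl"),
    ("LYMP", "CSLYMP", "Lymphocytes Measurement", "1000/μl"),
    ("EOSIN", "CSEOS", "Eosinophils Measurement", "1000/μl"),
]


def map_blood_test(advn_row):
    """Turn one ADVN blood-test row (a dict) into a list of Lab Test records."""
    records = []
    for value_field, cs_field, subtype, unit in BLOOD_TEST_MAP:
        if value_field not in advn_row:
            continue  # skip measurements absent from this row
        records.append({
            "Subject ID": advn_row["ID"],
            "Visit Date": advn_row["VISITDT"],
            "Type": "Blood Test",
            "Subtype": subtype,
            "Test Result Value": advn_row[value_field],
            "Test Result Unit": unit,
            "Clinical Significance": advn_row.get(cs_field),
        })
    # SITE is never copied: the mapping marks it "will not capture".
    return records


# Made-up example row, for illustration only.
example = {"ID": "S001", "SITE": "A", "VISITDT": "2008-01-15",
           "HGB": 13.2, "CSHGB": "No", "WBC": 6.1, "CSWBC": "No"}
lab_tests = map_blood_test(example)
```

Keeping the mapping in a single table (`BLOOD_TEST_MAP`) means a new blood test is one added row rather than new code, at the cost of assuming every test follows the same value/significance pattern.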
Load File Examples
FLOCK v2.0 STEPS
1. File Conversion - Convert binary .fcs file into a data matrix
2. Data Cleansing - Remove boundary events (noise) in FSC and SSC dimensions
3. Data Shrinking - Collapse data toward distribution modes
4. Normalization - Z-score normalization for values in each dimension: (x_i - µ)/SD
5. Dimension Selection - Select most informative dimensions based on measures of dispersion and distortion
6. FLOCK LoD
i. Partition each dimension to generate a hyper-grid
ii. Identify dense hyper-regions in hyper-grid
iii. Merge neighboring dense hyper-regions to define hyper-region groups (n)
iv. Determine centroids for each hyper-region group
v. Use n centroids to seed single round of distance-based clustering
7. FLOCK HiD - Refine population definition based on histogram partitioning
8. Group Merging - Merge close hyper-region groups based on [distance metric]
9. Centroid Calculation - Compute centroid for each hyper-region group
10. Clustering - Cluster events to nearest centroid
11. Population statistics - Summarize population proportions, intensity levels, etc.
12. Visualization
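Steps 4 and 6 above can be sketched in plain Python. This is not the FLOCK implementation: equal-width bins, the `min_count` density threshold, the rule that cells touching in any direction count as neighbors, and the function names are all assumptions made for illustration. Data cleansing, shrinking, dimension selection, and the HiD refinement are omitted.

```python
import math
from collections import defaultdict
from itertools import product


def zscore(events):
    """Step 4: z-score each dimension, (x_i - mean) / SD (population SD)."""
    dims = list(zip(*events))
    means = [sum(d) / len(d) for d in dims]
    sds = [math.sqrt(sum((x - m) ** 2 for x in d) / len(d)) or 1.0
           for d, m in zip(dims, means)]
    return [tuple((x - m) / s for x, m, s in zip(e, means, sds)) for e in events]


def grid_cluster(events, bins=4, min_count=3):
    """Step 6 (sketch): grid-density clustering; returns (centroids, labels)."""
    data = zscore(events)
    ndim = len(data[0])
    lows = [min(e[d] for e in data) for d in range(ndim)]
    highs = [max(e[d] for e in data) for d in range(ndim)]

    def cell_of(e):
        # Index of the equal-width bin that holds e in each dimension.
        idx = []
        for d in range(ndim):
            width = (highs[d] - lows[d]) / bins or 1.0
            idx.append(min(int((e[d] - lows[d]) / width), bins - 1))
        return tuple(idx)

    # i. and ii. Build the hyper-grid and keep the dense hyper-regions.
    cells = defaultdict(list)
    for e in data:
        cells[cell_of(e)].append(e)
    dense = {c for c, members in cells.items() if len(members) >= min_count}

    # iii. Merge neighboring dense hyper-regions (flood fill over adjacent cells).
    groups, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            c = stack.pop()
            group.append(c)
            for step in product((-1, 0, 1), repeat=ndim):
                nb = tuple(i + s for i, s in zip(c, step))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(group)

    # iv. Centroid of each hyper-region group.
    centroids = []
    for group in groups:
        pts = [e for c in group for e in cells[c]]
        centroids.append(tuple(sum(p[d] for p in pts) / len(pts) for d in range(ndim)))
    if not centroids:
        return [], [None] * len(data)  # no dense region found

    # v. Single round of distance-based assignment to the nearest centroid.
    labels = [min(range(len(centroids)), key=lambda k: math.dist(e, centroids[k]))
              for e in data]
    return centroids, labels


# Two well-separated synthetic blobs (made-up data, two dimensions).
events = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2)]
centroids, labels = grid_cluster(events)
```

The number of populations is not supplied by the user here; it falls out of how many groups of dense cells remain after merging, which is the property that makes a grid-based approach attractive for finding novel populations.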
17 B Cell Populations in Blood
[Figure, panel A: B cell populations (PB, GSM, GNSM, UM1-2, UM3-4, N1-3, DNM) plotted on CD38, B220, CD27, IgD, CD24 and IgG axes]
Population characteristics
Markers: CD27, IgD, IgG, CD38, CD24, B220
Population | Color | Proportion | Putative cell type
N1 | Gray | 48.94% | naïve (CD38+) [Bm2?]
N2 | Magenta | 4.69% | naïve (CD38-)
N3 | Purple | 4.41% | naïve (CD38+ B220low)
UM1 | Darkred | 1.55% | unswitched memory (CD38+)
UM2 | Salmon | 0.94% | unswitched memory (CD38-) [Bm1?]
UM3 | Darkblue | 6.16% | IgDlow unswitched memory (CD38+)
UM4 | Green | 11.50% | IgDlow unswitched memory (CD38-)
GSM1 | Grayishgreen | 0.36% | switching memory (IgD+ IgG+ CD38+)
GSM2 | Yellow | 4.05% | switched memory (CD38+) [early Bm5?]
GSM3 | Blue | 4.40% | switched memory (CD38-) [late Bm5?]
GNSM1 | Cyan | 4.84% | IgD- IgG- memory
GNSM2 | Darkgreen | 3.84% | IgD- IgG- memory
GNSM3 | Teal | 1.30% | IgD- IgG- memory
GNSM4 | Orange | 0.51% | IgD- IgG- memory
DNSM1 | Pink | 0.85% | double negative memory (IgG+)
DNSM2 | Darkgray | 0.91% | double negative memory (IgG-)
PB | Red | 0.75% | plasmablasts
PB marker profile: CD27 high, IgD -, IgG -, CD38 high, CD24 -, B220 low
Summary Statistics
B cell component of the Cell Ontology
http://www.obofoundry.org/
Summary
• OBX is an ontology-informed clinical research data model
• The incorporated ontological framework provides extensibility and straightforward incorporation of ontology terms as value sets
• The ontological framework also facilitated distributed data model development and guides ongoing mapping strategies
• Successfully loaded several large, distinct clinical studies
• Developed user interfaces for data extraction and analysis
• Refinement through use – mapping, loading, extracting, analysis
• Term submission to OBO Foundry ontologies, especially OBI
Acknowledgments
UT Southwestern
Yu (Max) Qian
Jamie Lee
Megan Kong
Jennifer Cai
Jie Huang
Nishanth Marthandan
Diane Xiang
Young Bun Kim
Paula Guidry
Eva Sadat
David Karp
Northrop Grumman
Carl Dahlke
John Campbell
Liz Thompson
Jeff Wiser
Mike Attasi
OBO Foundry
BFO
OBI
Supported by NIH N01AI40076