Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Joel Saltz MD, PhD
Director Center for Comprehensive
Informatics
Objectives
• Reproducible anatomic/functional characterization at
gross level (Radiology) and fine level (Pathology)
• Integration of anatomic/functional characterization with
multiple types of “omic” information
• Create categories of jointly classified data to describe
pathophysiology, predict prognosis, response to
treatment
• Data modeling standards, semantics
• Data research issues
In Silico Program Objectives (from NCI)
• In silico is an expression used to mean “performed on computer or via
computer simulation.” (Wikipedia)
• In silico science centers: support investigator-initiated, hypothesis-driven research in the etiology, treatment, and prevention of cancer
using in silico methods
– Generating and publishing novel cancer research findings leveraging
caBIG tools and infrastructure
– Identifying novel bioinformatics processes and tools to exploit existing
data resources
• Encouraging the development of additional data resources and caBIG
analytic services
• Assessing the capabilities of current caBIG tools
• Emory, Columbia, Georgetown, Fred Hutchinson Cancer Research Center,
Translational Genomics Research Institute
In Silico Center for
Brain Tumor Research
Specific Aims:
1. Influence of necrosis/hypoxia on gene expression and
genetic classification.
2. Molecular correlates of high
resolution nuclear morphometry.
3. Gene expression profiles
that predict glioma progression.
4. Molecular correlates of MRI
enhancement patterns.
Integrative Analysis: Tumor Microenvironment
• Structural and functional
differentiation within tumor
• Molecular pathways are time
and space dependent
• “Field effects” – gradient of
genetic, epigenetic changes
• Radiology, microscopy, high
throughput genetic, genomic,
epigenetic studies, flow
cytometry, microCT,
nanotechnologies …
• Create biomarkers to
understand disease
progression, response to
treatment
Tumors are organs consisting of
many interdependent cell types
From John E. Niederhuber, M.D., Director,
National Cancer Institute, NIH; presented at
Integrating and Leveraging the Physical
Sciences to Open a New Frontier in Oncology,
Feb 2008
Informatics Requirements
• Parallel initiatives
Pathology, Radiology,
“omics”
• Exploit synergies
between all initiatives
to improve ability to
forecast survival &
response.
[Diagram labels: Radiology Imaging; Patient Outcome; “Omic” Data; Pathologic Features]
In Silico Center for
Brain Tumor Research
Key Data Sets
REMBRANDT: Gene expression and genomics data
set of all glioma subtypes
The Cancer Genome Atlas (TCGA): Rich “omics” set
of GBM, digitized Pathology and Radiology
Vasari Feature Set: Standardized annotation of
gliomas of all subtypes
TCGA Research Network
Digital Pathology
Neuroimaging
Distinguishing Among the Gliomas
“There are also many cells which appear to be
transitions between gigantic oligodendroglia
and astrocytes. It is impossible to classify
them as belonging in either group”
Bailey P, Bucy PC. Oligodendrogliomas of the brain.
J Pathol Bacteriol 1929; 32:735
Nuclear Qualities
Oligodendroglioma
Astrocytoma
Progression to GBM
Anaplastic Astrocytoma
(WHO grade III)
Glioblastoma
(WHO grade IV)
TCGA Neuropathology Attributes
120 TCGA specimens; 3 Reviewers
Presence and Degree of:
Microvascular hyperplasia
– Complex/glomeruloid
– Endothelial hyperplasia
Necrosis
– Pseudopalisading pattern
– Zonal necrosis
Inflammation
– Macrophages/histiocytes
– Lymphocytes
– Neutrophils
Differentiation:
– Small cell component
– Gemistocytes
– Oligodendroglial
– Multi-nucleated/giant cells
– Epithelial metaplasia
– Mesenchymal metaplasia
Other Features
– Perineuronal/perivascular satellitosis
– Entrapped gray or white matter
– Micro-mineralization
TCGA Whole Slide Images
Feature Extraction
Jun Kong
Astrocytoma vs Oligodendroglioma
Overlap in genetics, gene expression, histology
Astrocytoma vs Oligodendroglioma
• Assess nuclear size (area and
perimeter), shape (eccentricity,
circularity, major axis, minor axis, Fourier
shape descriptor and extent ratio),
intensity (average, maximum, minimum,
standard error) and texture (entropy,
energy, skewness and kurtosis).
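The size, shape, and intensity descriptors listed above can be sketched for a single segmented nucleus as follows. This is a minimal illustration, not the pipeline used in the study: the function name `nuclear_features`, the binary-mask input, the edge-count perimeter, and the moment-based ellipse axes are all assumptions, and the Fourier shape descriptor and texture features are omitted.

```python
import math

def nuclear_features(mask, intensity):
    """Compute simple size, shape, and intensity descriptors for one nucleus.

    mask      -- 2D list of 0/1 values marking the nucleus pixels (assumed input)
    intensity -- 2D list of grayscale values with the same shape as mask
    """
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    area = len(pixels)
    inside = set(pixels)

    # Perimeter: count pixel edges that face background (4-connectivity).
    perimeter = sum((r + dr, c + dc) not in inside
                    for r, c in pixels
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    # Central second moments give the axes of the equivalent ellipse.
    mr = sum(r for r, _ in pixels) / area
    mc = sum(c for _, c in pixels) / area
    srr = sum((r - mr) ** 2 for r, _ in pixels) / area
    scc = sum((c - mc) ** 2 for _, c in pixels) / area
    src = sum((r - mr) * (c - mc) for r, c in pixels) / area
    spread = math.sqrt((srr - scc) ** 2 + 4 * src ** 2)
    lam_major = (srr + scc + spread) / 2
    lam_minor = max((srr + scc - spread) / 2, 0.0)  # guard against rounding
    eccentricity = math.sqrt(1 - lam_minor / lam_major) if lam_major > 0 else 0.0

    # Extent ratio: nucleus area divided by bounding-box area.
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

    values = [intensity[r][c] for r, c in pixels]
    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": 4 * math.pi * area / perimeter ** 2,
        "eccentricity": eccentricity,
        "major_axis": 4 * math.sqrt(lam_major),
        "minor_axis": 4 * math.sqrt(lam_minor),
        "extent": area / bbox_area,
        "mean_intensity": sum(values) / area,
        "max_intensity": max(values),
        "min_intensity": min(values),
    }
```

Counting exposed pixel edges overestimates the perimeter of smooth contours, so circularity values from this sketch are only comparable with each other, not with values from a contour-based method.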
Nuclear Qualities
Which features carry most prognostic significance?
Which features correlate with genetic alterations?
Machine-based Classification of TCGA GBMs (J Kong)
Whole slide scans from 14 TCGA GBMs (69 slides)
7 purely astrocytic in morphology; 7 with 2+ oligo component
399,233 nuclei analyzed for astro/oligo features
Cases were categorized based on ratio of oligo/astro cells
TCGA Gene
Expression Query:
c-Met overexpression
Nuclear Feature Analysis: TCGA
• Using the parallel computation infrastructure of Sun Grid
Engine, we analyzed 4096x4096 image tiles from 213 whole-slide TCGA images of permanent tissue sections.
• Approximately 90 million nuclei segmented.
• 79 patients:
57 are diagnosed as GBM (‘oligo 0’)
17 are classified as GBM with ‘oligo 1’,
5 as GBM with ‘oligo 2+’.
• With each data file including all nuclear features from one
patient, all nuclei were classified and color-coded blue, green,
or red for scores of 1–3, 4–6, and 7–10,
respectively.
Nuclear Feature Analysis: TCGA
• The ratio of nuclei classified ≤ 5 to >5 was computed for each
of the 79 patients.
• Ratios associated with ‘oligo 0’ and ‘oligo 2+’ patient
populations were compared with a two-sample t-test (p=0.0145)
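The per-patient ratio and the group comparison can be sketched as below. The slides do not say which variant of the two-sample t-test was used, so the pooled-variance (Student) form here is an assumption, as are the function names. The sketch returns the t statistic and degrees of freedom only; turning them into a p-value needs a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance

def low_to_high_ratio(scores, cutoff=5):
    """Ratio of nuclei scored <= cutoff to nuclei scored > cutoff for one patient."""
    low = sum(1 for s in scores if s <= cutoff)
    high = len(scores) - low
    return low / high if high else math.inf

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

In use, `low_to_high_ratio` would be applied to each patient's list of nuclear scores, and `two_sample_t` to the resulting ratios for the ‘oligo 0’ and ‘oligo 2+’ groups.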
Discriminating Features (Grade 1 vs. Grade 7-10)
Determine the influence of tumor
microenvironment on gene expression
profiling and genetic classification using
TCGA data
GBM
Necrosis =
Severe Hypoxia
Microvascular
Hyperplasia
3 Gene Families are Altered in GBM:
RTK, p53 and RB
GBM: necrosis, hypoxia, angiogenesis
and gene expression
• Does the presence or degree of necrosis within
digitized frozen section slides correlate with specific
gene expression patterns or determine algorithm-based unsupervised clustering of GBM gene
expression categories?
• Does the presence or degree of necrosis influence
the type of angiogenesis or pro-angiogenic gene
expression patterns within human gliomas?
GBM:
% Necrosis
TCGA: GBM Frozen Sections
• 179 cases were assessed for % necrosis on frozen
section slides for TCGA quality assurance.
• Cox-based regression analysis for % necrosis vs. gene
expression (795 probe sets; 647 distinct genes)
Network Analysis based on % Necrosis
Carlos Moreno, David Gutman
Aim 4. Identify correlates of MRI enhancement
patterns in astrocytic neoplasms with
underlying vascular changes and gene
expression profiles.
[Diagram: No enhancement, Normal Vessels, Stable lesion; ?; Rim-enhancement, Vascular Changes, Rapid progression]
Angiogenesis Segmentation
[Pipeline diagram: H&E Image → Color Deconvolution → Hematoxylin Image and Eosin Image; Eosin intensity image]
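The color deconvolution step that separates an H&E image into hematoxylin and eosin channels can be sketched for one pixel as follows. The slides do not give stain vectors or an implementation; the values below are the ones commonly quoted for H&E from Ruifrok and Johnston, the third (residual) vector is taken as the cross product of the two stains, and all names are illustrative assumptions.

```python
import math

# Assumed RGB optical-density vectors for the two stains (not from the slides).
H_RGB = (0.65, 0.70, 0.29)   # hematoxylin
E_RGB = (0.07, 0.99, 0.11)   # eosin

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hematoxylin, eosin, and a residual channel orthogonal to both.
STAINS = (unit(H_RGB), unit(E_RGB), unit(cross(H_RGB, E_RGB)))

def optical_density(rgb):
    """Beer-Lambert conversion of 8-bit RGB to per-channel optical density."""
    return tuple(-math.log10(max(c, 1) / 255.0) for c in rgb)

def deconvolve_pixel(rgb):
    """Return (hematoxylin, eosin, residual) amounts for one RGB pixel."""
    od = optical_density(rgb)

    def system(replaced=None):
        # Columns are the stain vectors; one column may be swapped for od.
        return [[od[row] if col == replaced else STAINS[col][row]
                 for col in range(3)] for row in range(3)]

    base = det3(system())
    # Cramer's rule: solve od = h * H + e * E + r * R.
    return tuple(det3(system(i)) / base for i in range(3))
```

Applying `deconvolve_pixel` to every pixel and keeping the second component gives an eosin intensity image of the kind the pipeline passes on to vessel segmentation. A production version would use a vectorized matrix inverse rather than per-pixel Cramer's rule.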
Angiogenic Segmentation
[Pipeline diagram: Eosin Image → Spatial Norm. → Density Calculation → Density Image → Object ID → Boundary Smoothing → Segmented Vessels]
States of Angiogenesis
Endothelial Hypertrophy
Endothelial Hyperplasia
Complex Microvascular Hyperplasia
Lee Cooper, Sharath Cholleti
Vessel Characterization
• Bifurcation detection
Vasari Imaging Criteria
(Adam Flanders, TJU; Dan Rubin, Stanford, Lori Dodd, NCI)
• Require standardized validated feature sets to
describe de novo disease.
• Fundamental obstacle to new imaging criteria
as treatment biomarkers is
lack of standard terminology:
– To define a comprehensive set of imaging
features of cancer
– For reporting imaging results
– To provide a more quantitative, reproducible
basis for assessing baseline disease and
treatment response
Defining Rich Set of Qualitative and
Quantitative Image Biomarkers
• Community-driven ontology development
project; collaboration with ASNR
• Imaging features (5 categories)
– Location of lesion
– Morphology of lesion margin (definition, thickness,
enhancement, diffusion)
– Morphology of lesion substance (enhancement, PS
characteristics, focality/multicentricity, necrosis, cysts, midline
invasion, cortical involvement, T1/FLAIR ratio)
– Alterations in vicinity of lesion (edema, edema
crossing midline, hemorrhage, pial invasion, ependymal invasion,
satellites, deep WM invasion, calvarial remodeling)
– Resection features (extent of nCE tissue, CE tissue, resected
components)
Results: Reader Agreement
• High inter-observer agreement among
the three readers
– (kappa = 0.68, p<0.001)
• Percentage agreement was also high for most features
individually
– 22 of 30 features (73%) had agreement greater than
50%
– Twelve features (40%) had >80% agreement
– No feature had less than 20% agreement
• Feature agreement rose substantially when used with
tolerance (+/- 1).
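The effect of a +/- 1 tolerance on percentage agreement can be illustrated with a short sketch. The slides do not define how agreement among the three readers was computed; the version below counts agreeing reader pairs across all cases, which is an assumption, and it does not reproduce the kappa statistic.

```python
from itertools import combinations

def percent_agreement(ratings, tolerance=0):
    """Share of reader pairs, over all cases, whose scores differ by at most `tolerance`.

    ratings -- list of per-case tuples holding one ordinal score per reader
    """
    agree = total = 0
    for case in ratings:
        for a, b in combinations(case, 2):
            total += 1
            agree += abs(a - b) <= tolerance
    return agree / total
```

Calling the function once with `tolerance=0` and once with `tolerance=1` on the same ratings shows how much of the disagreement on an ordinal feature comes from adjacent categories.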
Preliminary Relationships of Features to Survival
• Cox proportional hazards models were fit to
each of the thirty features related to overall
survival.
• Features associated with lower survival
included (p<.0001):
– Proportion of enhancing tissue at baseline.
– Thick or nodular enhancement characteristics.
– Contralateral hemisphere invasion.
• Proportion of non contrast enhancing tumor
(nCET) had positive correlation with survival.
• Tumor size at baseline had no relationship to
survival.
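Fitting a Cox proportional hazards model to one feature at a time can be sketched as below. This is a minimal single-covariate fit by Newton-Raphson on the partial likelihood with Breslow handling of tied times; the function name and interface are assumptions, and the analysis reported here presumably used a statistical package rather than code like this.

```python
import math

def cox_univariate(times, events, x, iterations=25):
    """Fit h(t | x) = h0(t) * exp(beta * x) for a single covariate.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    x      -- one feature value per patient
    Returns (beta, standard_error).
    """
    beta = 0.0
    info = 0.0
    for _ in range(iterations):
        score = 0.0
        info = 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients contribute only to risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0               # first derivative
            info += s2 / s0 - (s1 / s0) ** 2      # negative second derivative
        if info == 0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-9:
            break
    se = 1 / math.sqrt(info) if info > 0 else math.inf
    return beta, se
```

A positive `beta` means higher feature values go with higher hazard (shorter survival), and `exp(beta)` is the hazard ratio per unit of the feature. The sketch has no step-size control, so it can diverge when a feature separates early and late deaths perfectly.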
Coupling in silico methodology with a clinical trial:
Will Treatment work and if not, why not?
Example: Avastin and Glioblastoma as
in RTOG-0825 (plus institutional
accrual)
Treatment: Radiation therapy and
Avastin (anti-angiogenesis)
Predict and Explain: Genetic, gene
expression, microRNA, Pathology,
Imaging
Imaging/RT reproducibility, Integration
with EMR, PACS, RT systems
Overview
Crucial to Leverage Institutional Data
[Diagram: Ohio State Information Warehouse Infrastructure]
Data acquisition sources: ADT, Lab, Respiratory, Blood, Endoscopy, Cardiology, Siemens Img, CPOE, OR system, Patient Mgmt, Dictated reports, Pathology reports, Patient Billing, Practice Plans, Pt Satisfaction, Cancer Genetics, Wound Images, Tissue, Web, Pulmonary, Genomic Data, Error Report
Transfer: Real time, Daily, Weekly, Monthly
Data Integration; Information Warehouse; Meta Data; DeIdentification; Honest Broker
User Access: Multi-Dimensional Analysis & Data Mining; Ad-hoc Query; Text Mining, NLP; Image Analysis; Web Scorecards & Dashboards
Users: Business, Clinical, Research, External, Wound Center Research, Benchmarking
Annotations and Imaging Markup (AIM)
Annotations and Imaging Markup Developer (AIM) provides a standard for medical image annotation and markup
for images used in the research space, and in particular, the image based cancer clinical trial. It is notable that
there is no existing standard for radiology annotation and markup. The caBIG® program is working with almost
every standards body such as DICOM to elicit consensus regarding use of AIM as the accepted standard for
radiology annotation and markup, and is positioned to extend AIM to digital pathology.
The pixel at the tip of the
arrow [coordinates (x,y)]
in this image
[DICOM: 1.2.814.234543.23243]
represents the
Ascending Thoracic Aorta
[SNOMED:A3310657]
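The example annotation above ties a point in an identified image to a coded anatomic concept. A simplified stand-in for that structure might look like the sketch below. The class and field names are hypothetical and are not the AIM schema; the coordinates are placeholders, since the slide only gives them as (x,y).

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PointAnnotation:
    """Hypothetical, simplified stand-in for an AIM point annotation."""
    image_uid: str     # DICOM unique identifier of the annotated image
    x: float           # pixel column of the marked point
    y: float           # pixel row of the marked point
    vocabulary: str    # controlled terminology the code comes from
    code: str          # concept code within that terminology
    label: str         # human-readable meaning of the code

annotation = PointAnnotation(
    image_uid="1.2.814.234543.23243",
    x=120.0, y=85.0,   # placeholder coordinates, not from the slide
    vocabulary="SNOMED",
    code="A3310657",
    label="Ascending Thoracic Aorta",
)

# A flat dictionary form is convenient for serialization or querying.
record = asdict(annotation)
```

Keeping the image identifier, geometry, and coded meaning together in one record is what lets annotations be queried by anatomy or finding rather than by file name.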
AIM Data Service Emory (AIME) is a caGrid data
service that manages AIM documents in XML
databases. AIME supports the query,
enumerationQuery, queryByTransfer, submit,
and submitByTransfer methods.
MicroAIM
• Provide a semantically enabled data model for pathology
and microscopy image markups and observations
• Goal: interoperable data exchange and knowledge
sharing
• Ongoing collaborative efforts to standardize via caBIG
and Association for Pathology Informatics
• microAIM data service provides comprehensive query
support, and can be efficiently implemented
through either an XML-based or a relational approach.
Pathology AIM; Semantic Annotation and Spatial
Reasoning
Observation on a single or multiple objects
Support multiple objects and associate
different observations to each object
Calculation on a single or multiple objects
Class of type to represent mask or field
value
Represent provenance information for
computed markup and annotation
Identify all instances of a set of cell types
Complex spatial queries: Quantify areas of
glomeruloid microvascular proliferation
within X microns of pseudopalisading
necrosis
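The complex spatial query above (areas of glomeruloid microvascular proliferation within X microns of pseudopalisading necrosis) can be sketched in a much reduced form. The `Region` record, the kind labels, and the use of centroid-to-centroid distance are all assumptions made for illustration; a real query would work on region boundaries in a spatial database.

```python
import math
from dataclasses import dataclass

@dataclass
class Region:
    kind: str         # hypothetical label, e.g. "glomeruloid_mvp"
    x_um: float       # centroid x position in microns
    y_um: float       # centroid y position in microns
    area_um2: float   # region area in square microns

def area_within(regions, target_kind, near_kind, max_dist_um):
    """Total area of target regions whose centroid lies within
    max_dist_um of the centroid of any region of kind near_kind."""
    anchors = [r for r in regions if r.kind == near_kind]
    total = 0.0
    for r in regions:
        if r.kind != target_kind:
            continue
        if any(math.hypot(r.x_um - a.x_um, r.y_um - a.y_um) <= max_dist_um
               for a in anchors):
            total += r.area_um2
    return total
```

The pairwise distance loop is quadratic in the number of regions, which is one reason queries of this kind over whole-slide annotations call for spatial indexing.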
Use of caBIG Tools in Emory In Silico Brain Tumor Research Center
[Diagram: Enterprise Architecture 2.0]
Diagram labels: VCDE/ARCH; In vivo Imaging; ICR; Osirix; NBIA; caB2B; TCGA Portal; Clear Canvas; Science; caIntegrator 2; AIME; XIP/AVT; CIP: Research & Clinical Trials; TBPT; Modified Workstation; Extended AIME Service
CBIIT Research Team: Carl Schaefer, Jinghui Zhang
Data Science Research Challenges Driven by
In Silico Discovery Research
• Data integration that targets multiple data sources with
conflicting metadata and conflicting data
• Efficient methods for semantic query that targets
questions involving complex multi-scale features
associated with petascale and exascale ensembles of
highly annotated images
• Computer assisted annotation and markup for very large
datasets
• Systems to support combinations of structured and
irregular accesses to exascale datasets
Data Science Research Challenges
• Structural and semantic metadata management: how to
manage tradeoff between flexibility and curation
• Data and semantic modeling infrastructures and policies
able to scale to handle distributed systems with an
aggregate of 10^9 or more data models/concepts
• Three dimensional (time dependent) reconstruction,
feature detection and annotation of 3-D microscopy
imagery
• Workflow infrastructure for large scale data intensive
computations
Final Data Science Challenge:
Large Dataset Size
– Basic small mouse is 10 cm^3
– 1 μm resolution – very roughly
10^13 bytes/mouse
– Molecular data (spatial location)
multiply by 10^2
– Vary genetic composition,
environmental manipulation,
systematic mechanisms for
varying genetic expression;
multiply by 10^3
Total: 10^18 bytes per big science
animal experiment
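The size estimate above can be checked with a few lines of arithmetic. The one assumption added here is one byte per cubic-micron voxel, which is what makes the per-mouse figure come out at ten terabytes.

```python
UM_PER_CM = 10_000                      # 1 cm = 10^4 micrometers

mouse_volume_cm3 = 10                   # "basic small mouse"
voxels = mouse_volume_cm3 * UM_PER_CM ** 3   # voxels at 1 micrometer resolution
bytes_per_voxel = 1                     # assumption, not stated on the slide
bytes_per_mouse = voxels * bytes_per_voxel

molecular_factor = 10 ** 2              # molecular data with spatial location
experiment_factor = 10 ** 3             # genetic and environmental variation

total_bytes = bytes_per_mouse * molecular_factor * experiment_factor
print(f"{bytes_per_mouse:.0e} bytes per mouse, {total_bytes:.0e} bytes in total")
```

The total of 10^18 bytes is one exabyte, which is why the preceding challenges refer to petascale and exascale datasets.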
Thanks to:
• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish
Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos
Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen,
Adam Flanders, Joel Saltz (Director)
• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella
co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads
• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam
Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and
many others
• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi
Schreibmann, Paul Pantalone
• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti,
Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David
Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J.
Foran (Rutgers)
Thanks!