Mining Medical Images
R. Bharat Rao
Glenn Fung
Balaji Krishnapuram
Jinbo Bi
Murat Dundar
Vikas Raykar
Shipeng Yu
Sriram Krishnan
Xiang Zhou
Arun Krishnan
Marcos Salganicoff
Luca Bogoni
Matthias Wolf
Anna Jerebko
Jonathan Stoeckel
Copyright © 2009 Siemens Medical Solutions USA, Inc. All rights reserved.
Outline of the talk

Mining medical images

Computer aided diagnosis (CAD)

Key data mining challenges

Clinical impact

Lessons learnt
Several thousand units of the products described in this paper have
been commercially deployed in hospitals around the world since 2004
Medical Imaging
• 1895: X-rays used for broken bones and for locating foreign objects
• 1970: Computed tomography (CT) for 3-D imaging
• As resolution has increased, in-vivo imaging has become widely used to locate medical abnormalities for diagnosis and surgery planning
Digital Mammogram
CT Scan
Mining medical imaging data
• Increased resolution has resulted in data overload
• Increased total study time
• An increase in data does not always translate to improved diagnosis
• Goal: automatically extract the actionable information from the imaging data in order to ensure
    • improvement in patient care
    • simultaneous reduction in total study time
Raw imaging data -> knowledge-based data-mining algorithms (computer-aided diagnosis/detection, CAD) -> clinically relevant information
Computer-aided diagnosis/detection (CAD)
CAD technologies support the physician by drawing attention to structures in the image that may require further review.
• Used as a second reader
• Improves the detection performance of a radiologist
• Reduces mistakes related to misinterpretation
• The principal value of CAD is determined by carefully measuring its incremental value in normal clinical practice
Lung CAD
Identify suspicious regions called nodules (which are
known to be precursors of cancer) in CT scans of the lung.
Colon PEV (Polyp Enhanced Viewer)
Identify suspicious regions called polyps in CT scans of
the colon.
Mammo CAD
Identify abnormal masses/calcifications in digital
mammograms.
PECAD and MammoCAD are only sold outside the US.
PE CAD
Pulmonary Embolism (PE) is a sudden blockage in a pulmonary artery caused by an
embolus that is formed in one part of the body and travels to the lungs in the
bloodstream through the heart.
PECAD and MammoCAD are only sold outside the US.
CAD
• Goal is to detect potentially malignant
    • nodules (lung)
    • polyps (colon)
    • lesions (breast)
    • pulmonary emboli (lung)
  in medical images such as CT scans, X-rays, MRI, etc.
• Early detection provides the best prognosis
Typical CAD architecture
Image [ X-ray | CT scan | MRI ]
-> Candidate generation: potential candidates (> 90% sensitivity, 60-300 FP/image)
-> Feature computation
-> Lesion classification
-> Location of lesions (> 80% sensitivity, 2-5 FP/image)
Focus of the current talk
Key Data Mining Challenges
Target: high accuracy (sensitivity > 80% at 2-5 FP/image)
1. The breakdown of assumptions
2. Highly unbalanced data
3. Feature computation cost
4. Incorporating domain knowledge
5. No objective ground truth
The breakdown of assumptions
Task: classify a region on a mammogram as a lesion or not a lesion.
Traditional classification algorithms (neural networks, support vector machines, logistic regression, ...) make two key assumptions, both often violated in CAD:
(1) Training samples are independent
(2) The objective is to maximize classification accuracy over all candidates
Violation 1: Training examples are correlated
Candidate generation produces many spatially adjacent candidates.
Hence there is a high level of correlation among candidates.
Correlations also exist across different images, detector types, and hospitals.
Violation 2: Candidate level accuracy is not important
Most algorithms maximize classification accuracy, i.e. they try to classify every candidate correctly.
Several candidates from candidate generation (CG) point to the same lesion in the breast.
A lesion is detected if at least one of them is detected, so it is fine to miss adjacent overlapping candidates.
Hence CAD system accuracy is measured in terms of per-lesion, per-image, or per-patient sensitivity.
So why not optimize the performance metric we use to evaluate our system?
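To make the difference between candidate-level and lesion-level evaluation concrete, here is a minimal sketch of the two quantities quoted throughout the talk: per-lesion sensitivity and false positives (FP) per image. The dictionary keys (`image_id`, `lesion_id`, `score`) and the example values are assumptions made for illustration; lesions that candidate generation misses entirely are not represented in this toy input.

```python
from collections import defaultdict

def cad_metrics(candidates, threshold):
    """Return (per-lesion sensitivity, false positives per image).

    Each candidate is a dict with hypothetical keys:
      image_id  - which image the candidate came from
      lesion_id - id of the lesion it overlaps, or None for a negative
      score     - classifier output; accepted if score >= threshold
    """
    lesions = set()               # every (image, lesion) pair seen
    detected = set()              # lesions hit by at least one accepted candidate
    images = set()
    false_positives = defaultdict(int)
    for c in candidates:
        images.add(c["image_id"])
        accepted = c["score"] >= threshold
        if c["lesion_id"] is not None:
            key = (c["image_id"], c["lesion_id"])
            lesions.add(key)
            if accepted:
                detected.add(key)
        elif accepted:
            false_positives[c["image_id"]] += 1
    sensitivity = len(detected) / len(lesions) if lesions else float("nan")
    fp_per_image = sum(false_positives.values()) / len(images) if images else float("nan")
    return sensitivity, fp_per_image

example = [
    {"image_id": "a", "lesion_id": 1, "score": 0.9},     # hit on lesion 1
    {"image_id": "a", "lesion_id": 1, "score": 0.2},     # missed, but lesion 1 is already found
    {"image_id": "a", "lesion_id": None, "score": 0.7},  # false positive
    {"image_id": "b", "lesion_id": 2, "score": 0.3},     # lesion 2 is missed
    {"image_id": "b", "lesion_id": None, "score": 0.1},  # correctly rejected
]
sensitivity, fp_per_image = cad_metrics(example, threshold=0.5)
```

The second candidate on lesion 1 is rejected, yet it costs nothing at the lesion level because a sibling candidate already detected that lesion.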
Solution 1: Multiple Instance Learning
Fung et al. 2006; Bi et al. 2007; Raykar et al. 2008; Krishnapuram et al. 2008
How do we acquire labels?
A candidate that overlaps with a radiologist's mark is a positive. The rest are negative.
Single instance learning: classify every candidate correctly.
Multiple instance learning (positive bag): classify at least one candidate correctly.
We have modified SVM and logistic regression for multiple instance learning
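One common way to express the "at least one candidate in a positive bag" objective is a noisy-OR over instance probabilities from a logistic model. The sketch below is an illustration under that assumption and does not reproduce the exact formulations of the cited papers; the weights, features, and bag assignments are made-up example values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_probabilities(w, X, bag_ids):
    """Noisy-OR: probability that each bag contains at least one positive."""
    p = sigmoid(X @ w)                      # per-candidate probabilities
    bags = np.unique(bag_ids)
    probs = np.array([1.0 - np.prod(1.0 - p[bag_ids == b]) for b in bags])
    return bags, probs

def mil_log_loss(w, X, bag_ids, bag_labels):
    """Negative log-likelihood over bags; bag_labels maps bag id -> 0 or 1."""
    bags, pb = bag_probabilities(w, X, bag_ids)
    y = np.array([bag_labels[int(b)] for b in bags])
    eps = 1e-12                             # guards against log(0)
    return -np.sum(y * np.log(pb + eps) + (1 - y) * np.log(1 - pb + eps))

def instance_log_loss(w, X, y):
    """Ordinary single-instance logistic loss, for comparison."""
    p = sigmoid(X @ w)
    eps = 1e-12
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Bag 0: three candidates on the same lesion, only one scored confidently.
# Bag 1: a single negative candidate.
X = np.array([[3.0], [-2.0], [-2.0], [-3.0]])
w = np.array([1.0])
bag_ids = np.array([0, 0, 0, 1])
mil_loss = mil_log_loss(w, X, bag_ids, {0: 1, 1: 0})
sil_loss = instance_log_loss(w, X, np.array([1, 1, 1, 0]))
```

In this toy case the bag-level loss is small because one candidate already detects the lesion, while the instance-level loss still penalizes the two overlapping candidates that were scored low.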
Simple Illustration
Accounts for correlation during training
• Single instance learning:
    • Reject as many negative candidates as possible.
    • Detect as many positives as possible.
• Multiple instance learning:
    • Reject as many negative candidates as possible.
    • Detect at least one candidate in a positive bag.
Solution 2: Batch Classification
Vural et al., 2009
Accounts for correlation during testing
Change the decision boundary during test time.
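The idea of letting correlated test candidates influence each other can be illustrated with a deliberately simple stand-in: blend each candidate's score with a similarity-weighted average of the scores of nearby candidates in the same image. This is not the formulation of Vural et al., only a sketch of the general idea; the Gaussian similarity, `length_scale`, and `weight` are all assumptions.

```python
import numpy as np

def batch_smooth_scores(scores, positions, length_scale=5.0, weight=0.5):
    """Blend each score with the similarity-weighted mean of its neighbours.

    scores    - classifier outputs for all candidates in one image (the batch)
    positions - candidate coordinates, shape (n, d)
    Candidates with no meaningful neighbours keep their original score.
    """
    scores = np.asarray(scores, dtype=float)
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    sim = np.exp(-sq_dist / (2.0 * length_scale ** 2))
    np.fill_diagonal(sim, 0.0)              # a candidate is not its own neighbour
    row_sum = sim.sum(axis=1)
    has_neighbours = row_sum > 1e-12
    neighbour_avg = scores.copy()
    neighbour_avg[has_neighbours] = (sim @ scores)[has_neighbours] / row_sum[has_neighbours]
    return (1.0 - weight) * scores + weight * neighbour_avg

# Three adjacent candidates (the third is borderline) and one isolated candidate.
raw = [2.0, 1.5, -0.2, -1.0]
pos = [[0, 0], [1, 0], [2, 0], [100, 100]]
smoothed = batch_smooth_scores(raw, pos)
```

With a decision threshold at zero, the borderline third candidate is pulled to the positive side by its confident neighbours, which has the same effect as moving the decision boundary for that batch, while the isolated candidate is left unchanged.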
Skewed data and expensive features
Challenges:
1. Highly unbalanced class distribution (less than 1% are abnormal)
2. Huge number of experimentally engineered features
3. Many of them are irrelevant or redundant
4. Feature computation is expensive
5. Stringent run-time requirements
Solutions:
1. Feature selection / sparse classifiers
2. Cascaded classification architecture
Cascaded classification architecture
Bi, et al. 2006
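The run-time benefit of a cascade comes from early rejection: cheap features are evaluated first, and expensive features are computed only for candidates that survive. The sketch below shows that generic control flow and is not the specific training or architecture of Bi et al.; the feature names (`size_mm`, `shape_score`) and thresholds are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (scoring function, rejection threshold).
Stage = Tuple[Callable[[dict], float], float]

def run_cascade(candidate: dict, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated) for one candidate."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(candidate) < threshold:
            return False, i          # rejected early; later stages never run
    return True, len(stages)

calls = {"expensive": 0}             # counts how often the costly feature is computed

def cheap_score(c: dict) -> float:
    return c["size_mm"]              # stands in for an inexpensive feature

def expensive_score(c: dict) -> float:
    calls["expensive"] += 1
    return c["shape_score"]          # stands in for a costly feature

stages: List[Stage] = [(cheap_score, 3.0), (expensive_score, 0.5)]
candidates = [
    {"size_mm": 1.0, "shape_score": 0.9},   # rejected by the cheap stage
    {"size_mm": 5.0, "shape_score": 0.2},   # rejected by the expensive stage
    {"size_mm": 6.0, "shape_score": 0.8},   # passes both stages
]
results = [run_cascade(c, stages) for c in candidates]
```

Because the vast majority of candidates are negatives, most of them can be discarded in the first stage, so the expensive features are computed for only a small fraction of the data.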
Novel AND-OR training of cascades
Dundar and Bi 2007
Incorporating domain knowledge
We know that lesions have different shapes/sizes/appearance
Gated Classification architecture
Incorporating domain knowledge
Dundar et al. 2007
• Exploit different sub-classes of false positives
Subjective Ground truth
Raykar et al. 2009
Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0).
Lesion ID   Radiologist 1   Radiologist 2   Radiologist 3   Radiologist 4   Truth (unknown)
12          0               0               0               0               x
32          0               1               0               0               x
10          1               1               1               1               x
11          0               0               1               1               x
24          0               1               1               1               x
23          0               0               1               0               x
40          0               1               1               0               x
In practice there is a substantial amount of disagreement.
We have no knowledge of the actual golden ground truth.
Getting absolute ground truth (e.g. biopsy) can be expensive.
We have proposed an EM algorithm to simultaneously learn the ground truth and the classifier.
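To show how EM can estimate a hidden truth from disagreeing annotators, here is a simplified sketch applied to the radiologist labels listed on this slide. It alternates between estimating each radiologist's sensitivity and specificity and updating the probability that each lesion is malignant. It is a simplification and not the method of Raykar et al.: the classifier that their algorithm learns jointly is replaced here by a single prevalence value, which makes it close in spirit to the classical Dawid-Skene estimator. The initialization by soft majority vote and the clipping constant `eps` are assumptions.

```python
import numpy as np

def em_truth_estimate(Y, n_iter=50, eps=1e-6):
    """Estimate P(lesion is malignant) from binary labels of several annotators.

    Y has shape (n_lesions, n_annotators) with entries 0 or 1.
    Returns (mu, sensitivity, specificity).
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=1)                       # soft majority vote as the start
    for _ in range(n_iter):
        # M-step: prevalence and per-annotator sensitivity / specificity
        p = np.clip(mu.mean(), eps, 1 - eps)
        sens = np.clip((mu @ Y) / max(mu.sum(), eps), eps, 1 - eps)
        spec = np.clip(((1 - mu) @ (1 - Y)) / max((1 - mu).sum(), eps), eps, 1 - eps)
        # E-step: posterior probability of malignancy for each lesion
        a = np.prod(sens ** Y * (1 - sens) ** (1 - Y), axis=1)   # P(labels | malignant)
        b = np.prod(spec ** (1 - Y) * (1 - spec) ** Y, axis=1)   # P(labels | benign)
        mu = p * a / (p * a + (1 - p) * b)
    return mu, sens, spec

lesion_ids = [12, 32, 10, 11, 24, 23, 40]
labels = np.array([
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
])
mu, sens, spec = em_truth_estimate(labels)
posterior = dict(zip(lesion_ids, mu))
```

Unlike a plain majority vote, the estimate weights each radiologist by how reliable they appear to be, so two annotators who disagree on a lesion do not necessarily cancel each other out.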
Key Data Mining Challenges
Challenge                                Solutions
1. Training/testing data is correlated   Multiple instance learning, batch classification
2. Evaluation metric is CAD specific     Multiple instance learning
3. Highly unbalanced data                Cascaded classifiers
4. Feature computation cost              Cascaded classifiers, feature selection methods
5. Incorporating domain knowledge        Gated classifiers, polyhedral classifiers
6. No objective ground truth             EM algorithm
Clinical Impact
1. How much can a radiologist benefit from using the CAD software?
2. CAD is mostly deployed in second reader mode.
3. Measure the improvement in performance of a radiologist with CAD.
4. Several independent clinical studies/trials have been conducted by our collaborators worldwide.
Lung CAD
1. FDA clinical validation study with 17 radiologists and 196 cases from 4 hospitals. Average reader AUC increased by 0.048 (p<0.001) because of CAD.
2. Recent study at NYU by Godoy et al. 2008:
            Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
Reader 1    56.2%                     66.0%                  9.8%
Reader 2    79.2%                     89.8%                  10.6%
3. New prototype also helps detect different kinds of nodules:
                         Mean sensitivity without CAD   Mean sensitivity with CAD   Increase in sensitivity
Solid nodules            60%                            85%                         15%
Part-solid nodules       80%                            95%                         15%
Ground glass opacities   75%                            86%                         11%
Colon PEV
Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker et al. 2007:
• Study with seven less-experienced readers
• Without PEV, average sensitivity was 0.810
• With PEV, average sensitivity was 0.908
• An increase of 9.8 percentage points in average sensitivity (p=0.0152)
PE CAD
Das et al. 2008 conducted a study with 43 patients to assess the sensitivity of detection of pulmonary embolism.
            Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
Reader 1    87%                       98%                    11%
Reader 2    82%                       93%                    11%
Reader 3    77%                       92%                    15%
Key data mining lessons
1. The true measure of impact is how much CAD helps radiologists.
2. Design algorithms to optimize the metric you care about.
3. Carefully analyze the assumptions behind off-the-shelf data-mining algorithms. In CAD most of these assumptions break down, so new methods need to be designed.
4. Domain knowledge is very important. Collaboration with radiologists is crucial in eliciting it.
Conclusions
1. Radiologists have access to orders of magnitude more data for diagnosing various cancers.
2. It is difficult and time-consuming to identify key clinical findings.
3. We described the data-mining challenges in commercially deployed CAD software.
4. Use of CAD as a second reader improves radiologists' detection performance.
5. This is a key opportunity for data mining technologies to impact patient care worldwide.
Acknowledgements
Dr. D. Naidich, MD, of New York University
Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation
Dr. M. Das, MD, of the University of Aachen
Dr. U. J. Schoepf, MD, of the Medical University of South Carolina
Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich
Alok Gupta, Ph.D., Ingo Schmuecking, MD,
Harald Steck, Ph.D., Stefan Niculescu, Ph.D., Romer Rosales, Ph.D.,
Sangmin Park, Ph.D., Gerardo Valadez, Ph.D.,
Maleeha Qazi, and the entire SISL team.