Medical Decision Making with Bayesian Networks and Influence
Bayesian Network for Predicting Invasive
and In-situ Breast Cancer using
Mammographic Findings
Jagpreet Chhatwal1
O. Alagoz1, E.S. Burnside1, H. Nassif1, E.A. Sickles2
1University of Wisconsin-Madison
2University of California, San Francisco
Outline
• Introduction
• Model Formulation
• Results
• Conclusions
Background: Facts
• Breast cancer is the most common non-skin cancer affecting
women in the U.S.
• Every two minutes a woman is diagnosed with breast cancer
• Estimated number of deaths from breast cancer in 2007: 40,460
Mammography
A low dose X-ray examination of breasts
• Mammography has been shown to be the most cost-effective diagnostic
procedure for early diagnosis of breast cancer
– The American Cancer Society recommends that women over the age
of 40 have a screening mammogram every year
– More than 20 million mammograms are performed in the US
annually
Breast Biopsy
Tissue-sampling procedure to confirm
the presence of cancer
• Types of biopsies: needle aspiration
and surgical
• Estimated number of biopsies
performed annually: 700,000
• 55−85% of breast biopsies result in
benign (non-cancerous) findings
• Estimated overspending on benign
biopsies: $250 million
• Significant anxiety associated with
biopsy
Invasive and In-situ Cancer
• Nearly all breast cancer arises in the
milk ducts of the breast
• Invasive cancer
• Ductal carcinoma in-situ (DCIS)
• DCIS lesions contain cells that appear to
be cancerous, but not all such lesions behave
as cancer
Invasive and In-situ Cancer (Cont.)
• DCIS is a non-invasive malignant condition with a very
favorable prognosis.
• Depending on the grade of the DCIS and the expected life
span of older women, DCIS often will not cause morbidity
or mortality for many years, if ever.
• Invasive breast cancer has an increased risk of axillary node
metastasis or distant disease.
– It quickly results in morbidity and mortality (also in older women).
Diagnosis or Over-diagnosis
• Only some DCIS lesions will eventually become invasive
cancer.
• What percent will become invasive cancer is not known
• Which DCIS will become invasive is not known
• Detecting DCIS on mammograms may benefit those
women whose DCIS would become invasive cancer.
• Detecting DCIS may potentially harm those women who
have breast surgery but whose DCIS would never become
invasive cancer.
• Incidence of DCIS has increased significantly, with the same
predominance in older women as invasive breast cancer.
Objective
To build a quantitative model to predict the risk of
DCIS and invasive breast cancer using patient
demographic factors and mammography findings.
Data Source
• Mammography data collected at the University of California, San
Francisco Medical Center between 1997 and 2007
• Combination of structured data and variables extracted
from dictated text reports
– Patient demographic factors
– Imaging features according to the standardized Breast
Imaging Reporting and Data System (BI-RADS) lexicon
• 2,211 malignant biopsy records
– 1,544 invasive cancer and 667 DCIS.
Sample Text-report
• “Possible clustered microcalcifications, right breast. FINDINGS:
Spot compression magnification mammography of the right breast
was performed. In the right upper outer breast, there is a cluster of
amorphous appearing microcalcifications. These are slightly suspicious
for malignancy and therefore biopsy is recommended. No other
suspicious clusters of microcalcifications are present. There are few
scattered microcalcifications elsewhere in the right breast.
Recommend needle localization followed by surgical biopsy…”
MBNi
• Developed a Mammography Bayesian Network for
Invasive and In-situ cancer risk prediction (MBNi)
• Structural Training
– Learning the network structure is an NP-hard problem
– Used the Tree Augmented Naïve Bayes (TAN) algorithm
– Software: WEKA (Waikato Environment for Knowledge Analysis)
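A minimal sketch of how TAN structure learning is usually described: weight every pair of features by their conditional mutual information given the class, keep a maximum-weight spanning tree over the features, and make the class a parent of every feature. This is an illustration in plain Python, not WEKA's implementation and not the MBNi training code; the feature names and the toy data below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xi, xj, c):
    """Estimate I(Xi; Xj | C) from three equal-length lists of discrete values."""
    n = len(c)
    n_c = Counter(c)
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_ijc = Counter(zip(xi, xj, c))
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # P(a, b, k) * log[ P(a, b | k) / (P(a | k) * P(b | k)) ]
        total += (count / n) * log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_structure(columns, labels):
    """Return {feature: parent feature or None}.

    The class node is an implicit extra parent of every feature.
    """
    names = list(columns)
    weight = {
        frozenset(pair): conditional_mutual_information(
            columns[pair[0]], columns[pair[1]], labels
        )
        for pair in combinations(names, 2)
    }
    # Prim's algorithm for a maximum-weight spanning tree,
    # rooted (arbitrarily) at the first feature.
    parent = {names[0]: None}
    while len(parent) < len(names):
        u, v = max(
            ((p, q) for p in parent for q in names if q not in parent),
            key=lambda edge: weight[frozenset(edge)],
        )
        parent[v] = u
    return parent


# Hypothetical toy data: finding_a and finding_b always agree,
# finding_c is independent of both within each class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "finding_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "finding_b": [0, 0, 1, 1, 0, 0, 1, 1],
    "finding_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
structure = tan_structure(columns, labels)
print(structure)
```

The spanning-tree step is what keeps the search tractable: each feature gets at most one feature parent besides the class, so the structure can be found in polynomial time instead of searching over all possible graphs.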
MBNi
Performance Measures
• Sensitivity and Specificity

            Patients with disease    Patients without disease
Test +      a                        b
Test -      c                        d

Sensitivity = a/(a+c)
Specificity = d/(b+d)

• Receiver Operating Characteristic (ROC) Curve
– Graphical plot of sensitivity versus 1-specificity for varying
cut-off points (thresholds)
– Area under the ROC curve (Az)
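As a rough illustration of these definitions, the sketch below computes sensitivity and specificity from the four cells a, b, c, d, then traces an ROC curve by sweeping the cut-off over every predicted score and integrates it with the trapezoidal rule. The counts, scores, and labels are made-up toy values, not results from the study, and the function names are only illustrative.

```python
def sensitivity_specificity(a, b, c, d):
    """a = test+ with disease, b = test+ without disease,
    c = test- with disease, d = test- without disease."""
    return a / (a + c), d / (b + d)


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs, one per distinct cut-off.

    A case is called positive when its score is >= the cut-off.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal rule over consecutive curve points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical 2x2 table.
sens, spec = sensitivity_specificity(a=40, b=10, c=10, d=40)

# Hypothetical predicted probabilities and true labels (1 = disease).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(sens, spec, auc)
```

On this toy data the area equals the fraction of (diseased, non-diseased) pairs in which the diseased case receives the higher score, which is the usual interpretation of Az.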
Performance Measures (Cont.)
• Precision and Recall

            Patients with disease    Patients without disease
Test +      a                        b
Test -      c                        d

Recall = a/(a+c)
Precision = a/(a+b)

• Precision-Recall (PR) Curve
– Graphical plot of precision versus recall for varying cut-off
points (thresholds)
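The same 2x2 table gives precision and recall, and sweeping the cut-off over the predicted scores gives the points of a PR curve. The sketch below is only an illustration with made-up counts and scores; the function names are hypothetical.

```python
def precision_recall(a, b, c):
    """a = test+ with disease, b = test+ without disease,
    c = test- with disease."""
    return a / (a + b), a / (a + c)


def pr_points(scores, labels):
    """(recall, precision) pairs, one per distinct cut-off.

    A case is called positive when its score is >= the cut-off, so every
    cut-off taken from the scores yields at least one positive call.
    """
    pos = sum(labels)
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((tp / pos, tp / (tp + fp)))
    return points


# Hypothetical 2x2 table cells.
precision, recall = precision_recall(a=40, b=10, c=10)

# Hypothetical predicted probabilities and true labels (1 = disease).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pr_curve = pr_points(scores, labels)
print(precision, recall, pr_curve)
```

Unlike specificity, precision depends on how common the disease is in the data, which is one reason a PR curve is often reported alongside an ROC curve when the classes are imbalanced.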
Validation Technique
• 10-fold cross-validation
– The data set is divided into 10 folds (1, 2, 3, …, 10)
– Each fold serves once as the test fold while the other folds are the training folds
– Merge tested folds for performance analysis
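The procedure can be sketched as follows: shuffle the record indices, split them into 10 folds, train on nine folds, score the held-out fold, and collect every held-out score into one merged list for ROC or PR analysis. This is a generic illustration, not the study's code; the plain random (unstratified) split, the `train` and `predict` callables, and the toy "model" are all assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_scores(records, labels, train, predict, k=10, seed=0):
    """Return one held-out score per record, merged across all test folds."""
    scores = [None] * len(records)
    for test_idx in k_fold_indices(len(records), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        model = train(
            [records[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        for i in test_idx:
            scores[i] = predict(model, records[i])
    return scores


# Hypothetical stand-in model: it memorises the positive rate of the
# training folds and returns it as the score for every test record.
def train(train_records, train_labels):
    return sum(train_labels) / len(train_labels)


def predict(model, record):
    return model


records = list(range(20))      # placeholder records
labels = [i % 2 for i in records]
folds = k_fold_indices(len(records))
merged = cross_validated_scores(records, labels, train, predict)
print(merged)
```

Because every record is scored exactly once by a model that never saw it during training, the merged list can be passed to a single ROC or PR analysis instead of averaging ten per-fold curves.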
Performance: ROC Curve
Performance: PR Curve
Older versus Younger Women
• Mammography is known to perform better in older
women
• We stratified our data set into two parts as follows:
– Mammography data of women younger than 50 (177
DCIS and 361 invasive cancers)
– Mammography data of women older than 65 (219
DCIS and 600 invasive cancers)
Performance: ROC Curve (Older versus Younger Women)
P = 0.039
Performance: PR Curve (Older versus Younger Women)
P = 0.038
Conclusions
• Our MBNi can predict the risk of DCIS versus invasive
cancer and may be superior in older women.
• Our MBNi has the potential to aid clinical
management decisions such as the need for increased
sampling at biopsy and the appropriate selection of
surgical interventions.
• Our MBNi is a step towards shared decision-making
and may empower older women to better manage their
health in the context of their co-morbidities and life
expectancy.
Ongoing and Future Research
• Validation of “text extraction” features
• Three-class prediction model – Benign, DCIS and
Invasive cancer
• Predict the risk by breast disease type
• Ensemble learning:
– Logistic Regression
– Artificial Neural Networks
– Bayesian Networks
– Support Vector Machines
Thank You!