Bayesian Network for Predicting Invasive and In-situ Breast Cancer using Mammographic Findings

Jagpreet Chhatwal¹, O. Alagoz¹, E.S. Burnside¹, H. Nassif¹, E.A. Sickles²
¹University of Wisconsin-Madison
²University of California, San Francisco

Outline
• Introduction
• Model Formulation
• Results
• Conclusions

Background: Facts
• Breast cancer is the most common non-skin cancer affecting women in the U.S.
• Every two minutes a woman is diagnosed with breast cancer
• Estimated number of breast cancer deaths in 2007: 40,460

Mammography
A low-dose X-ray examination of the breasts
• Mammography has been shown to be the most cost-effective procedure for the early detection of breast cancer
  – The American Cancer Society recommends that women aged 40 and older have a screening mammogram every year
  – More than 20 million mammograms are performed in the U.S. annually

Breast Biopsy
A tissue-sampling procedure to confirm the presence of cancer
• Types of biopsies: needle aspiration and surgical
• Estimated number of biopsies performed annually: 700,000
• 55–85% of breast biopsies yield benign (non-cancerous) findings
• Estimated overspending on benign biopsies: $250 million
• Biopsy is associated with significant patient anxiety

Invasive and In-situ Cancer
• Nearly all breast cancer arises in the milk ducts of the breast
• Invasive cancer
• Ductal carcinoma in-situ (DCIS)
• DCIS lesions contain cells that appear to be cancer, but not all such lesions behave as cancer

Invasive and In-situ Cancer (Cont.)
• DCIS is a non-invasive malignant condition with a very favorable prognosis.
• Depending on the grade of the DCIS and the expected life span of older women, DCIS often will not cause morbidity or mortality for many years, if ever.
• Invasive breast cancer carries an increased risk of axillary lymph node metastasis or distant disease, and it quickly results in morbidity and mortality (also in older women).
Diagnosis or Over-diagnosis
• Only some DCIS lesions will eventually become invasive cancer.
  – What percentage will become invasive cancer is not known.
  – Which DCIS lesions will become invasive is not known.
• Detecting DCIS on mammograms may benefit women whose DCIS would have become invasive cancer.
• Detecting DCIS may harm women who undergo breast surgery even though their DCIS would never have become invasive cancer.
• The incidence of DCIS has increased significantly and, like invasive breast cancer, it predominates in older women.

Objective
To build a quantitative model that predicts the risk of DCIS and invasive breast cancer from patient demographic factors and mammography findings.

Data Source
• Mammography data from the University of California, San Francisco Medical Center, 1997 to 2007
• A combination of structured data and variables extracted from dictated text reports:
  – Patient demographic factors
  – Imaging features described with the standardized Breast Imaging Reporting and Data System (BI-RADS) lexicon
• 2,211 malignant biopsy records: 1,544 invasive cancers and 667 DCIS

Sample Text Report
• “Possible clustered microcalcifications, right breast. FINDINGS: Spot compression magnification mammography of the right breast was performed. In the right upper outer breast, there is a cluster of amorphous appearing microcalcifications. These are slightly suspicious for malignancy and therefore biopsy is recommended. No other suspicious clusters of microcalcifications are present. There are few scattered microcalcifications elsewhere in the right breast.
Recommend needle localization followed by surgical biopsy…”

MBNi
• Developed a Mammography Bayesian Network for Invasive and In-situ cancer risk prediction (MBNi)
• Structure learning
  – Finding the optimal Bayesian network structure is an NP-hard problem in general
  – Used the Tree Augmented Naïve Bayes (TAN) algorithm
  – Software: WEKA (Waikato Environment for Knowledge Analysis)

MBNi
[Figure: MBNi]

Performance Measures
Sensitivity and Specificity

              Patients with disease    Patients without disease
  Test +      a                        b
  Test −      c                        d

Sensitivity = a/(a+c)
Specificity = d/(b+d)

Receiver Operating Characteristic (ROC) Curve
• Graphical plot of sensitivity versus 1 − specificity for varying cut-off points (thresholds)
• Area under the ROC curve (Az)

Performance Measures (Cont.)
Precision and Recall

              Patients with disease    Patients without disease
  Test +      a                        b
  Test −      c                        d

Recall = a/(a+c)
Precision = a/(a+b)

Precision-Recall (PR) Curve
• Graphical plot of precision versus recall for varying cut-off points (thresholds)

Validation Technique
10-fold cross-validation
• The data set is split into 10 folds (1, 2, 3, …, 10)
• In each round, one fold is the test fold and the remaining nine are training folds
• The tested folds are merged for performance analysis

Performance: ROC Curve
[Figure: ROC curve]

Performance: PR Curve
[Figure: PR curve]

Older versus Younger Women
• Mammography is known to perform better in older women
• We stratified our data set into two parts:
  – Mammography data of women younger than 50 (177 DCIS and 361 invasive cancers)
  – Mammography data of women older than 65 (219 DCIS and 600 invasive cancers)

Performance: ROC Curve
[Figure: ROC curves] P = 0.039

Performance: PR Curve
[Figure: PR curves] P = 0.038

Conclusions
• Our MBNi can predict the risk of DCIS versus invasive cancer and may perform better in older women.
• Our MBNi has the potential to aid clinical management decisions, such as the need for increased sampling at biopsy and the appropriate selection of surgical interventions.
• Our MBNi is a step towards shared decision-making and may empower older women to better manage their health in the context of their co-morbidities and life expectancy.
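The MBNi structure was learned with WEKA's TAN algorithm. As a rough illustration of what a TAN structure search does (this is not WEKA's implementation, and not necessarily the exact variant used for MBNi), the Python sketch below follows the textbook procedure. It weights every pair of features by their conditional mutual information given the class, keeps a maximum-weight spanning tree over the features, directs its edges away from an arbitrary root, and treats the class as a parent of every feature. The function names `conditional_mutual_information` and `tan_structure`, the column-list data layout, and the choice of feature 0 as root are assumptions for this example. Parameter estimation (the conditional probability tables) is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) in nats for equal-length lists of discrete values."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # P(a,b,k) * log[P(a,b|k) / (P(a|k) P(b|k))], written with raw counts
        total += (count / n) * math.log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_structure(features, labels):
    """Return {child: parent} feature-to-feature edges of a TAN classifier.

    `features` is a list of columns (one list of discrete values per feature).
    The class node is implicitly a parent of every feature as well.
    """
    m = len(features)
    weight = {
        (i, j): conditional_mutual_information(features[i], features[j], labels)
        for i, j in combinations(range(m), 2)
    }
    # Prim's algorithm for a maximum-weight spanning tree rooted at feature 0;
    # each chosen edge is directed away from the root (tree node -> new node).
    in_tree = {0}
    parent = {}
    while len(in_tree) < m:
        best = None
        for u in sorted(in_tree):
            for v in range(m):
                if v in in_tree:
                    continue
                w = weight[(min(u, v), max(u, v))]
                if best is None or w > best[0]:
                    best = (w, u, v)
        _, u, v = best
        parent[v] = u
        in_tree.add(v)
    return parent


# Toy data (hypothetical): f1 copies f0, f2 is independent of f0 given the class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 1, 0, 1, 0, 1, 0, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(tan_structure([f0, f1, f2], labels))  # {1: 0, 2: 0}
```

Allowing each feature at most one other feature as a parent keeps the search polynomial (a spanning tree rather than an arbitrary graph), which is how TAN avoids the NP-hard general structure-learning problem noted on the MBNi slide.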
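The performance measures and the 10-fold cross-validation scheme described above can be sketched in plain Python, assuming 0/1 labels (1 = the positive class, for example invasive cancer) and one numeric risk score per record. All helper names here are hypothetical, and `fit_rate_by_value` is a toy stand-in for a real classifier such as MBNi. The authors' analysis was done in WEKA, which this sketch does not reproduce. The cells a, b, c, d follow the 2×2 table, ROC and PR points are computed at every distinct score threshold, and Az is approximated with the trapezoidal rule.

```python
def confusion_counts(y_true, y_pred):
    """Cells of the 2x2 table: a = TP, b = FP, c = FN, d = TN (labels are 0/1)."""
    pairs = list(zip(y_true, y_pred))
    a = sum(1 for t, p in pairs if t == 1 and p == 1)
    b = sum(1 for t, p in pairs if t == 0 and p == 1)
    c = sum(1 for t, p in pairs if t == 1 and p == 0)
    d = sum(1 for t, p in pairs if t == 0 and p == 0)
    return a, b, c, d


def sensitivity(a, b, c, d):  # also called recall
    return a / (a + c)


def specificity(a, b, c, d):
    return d / (b + d)


def precision(a, b, c, d):
    return a / (a + b)


def curve_points(y_true, scores):
    """ROC points (1 - spec, sens) and PR points (recall, precision), one per threshold.

    Assumes y_true contains both classes; thresholds run from strictest to loosest,
    so at least one record is predicted positive and precision is always defined.
    """
    roc, pr = [(0.0, 0.0)], []
    for thr in sorted(set(scores), reverse=True):
        cells = confusion_counts(y_true, [1 if s >= thr else 0 for s in scores])
        roc.append((1 - specificity(*cells), sensitivity(*cells)))
        pr.append((sensitivity(*cells), precision(*cells)))
    return roc, pr


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve (used here for Az)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def cross_validated_scores(X, y, fit, k=10):
    """k-fold CV: train on k-1 folds, score the held-out fold, merge all held-out scores.

    Record i goes to fold i % k (unshuffled for brevity; shuffle and stratify in practice).
    """
    pooled = [None] * len(y)
    for fold in range(k):
        test = [i for i in range(len(y)) if i % k == fold]
        train = [i for i in range(len(y)) if i % k != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            pooled[i] = model(X[i])
    return pooled


def fit_rate_by_value(X_train, y_train):
    """Hypothetical toy model: score = fraction of training positives with the same x."""
    totals, positives = {}, {}
    for x, t in zip(X_train, y_train):
        totals[x] = totals.get(x, 0) + 1
        positives[x] = positives.get(x, 0) + t
    prior = sum(y_train) / len(y_train)
    return lambda x: positives[x] / totals[x] if x in totals else prior


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc, pr = curve_points(y_true, scores)
print(round(area_under(roc), 4))  # 0.8889 (= 8/9)
```

Merging the held-out scores before drawing a single ROC or PR curve, as the validation slide describes, means every record is scored exactly once by a model that never saw it during training.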
Ongoing and Future Research
• Validation of “text extraction” features
• Three-class prediction model
  – Benign, DCIS, and invasive cancer
• Predicting the risk of specific breast disease types
• Ensemble learning:
  – Logistic Regression
  – Artificial Neural Networks
  – Bayesian Networks
  – Support Vector Machines

Thank You!