Uncovering Age-Specific Invasive and DCIS Breast Cancer Rules Using Inductive Logic Programming
Houssam Nassif, David Page, Mehmet Ayvaci, Jude Shavlik, Elizabeth S. Burnside

The American Cancer Society, Cancer Facts & Figures 2009.

[Figure: screening workflow. Routine screening mammogram → radiologist → radiology report. Abnormal finding? No → routine screening; Yes → biopsy → Benign or Malignant (cancer: In Situ or Invasive).]

Biopsy
• Biopsy is:
  – Costly
  – Invasive
  – Potentially painful
• Models based on the mammography report and personal data can help identify the cancer stage before biopsy.

Cancer Stages: In Situ
• Cancer cells are localized
• They have not spread beyond the basement membrane
[Figure: abnormal cells and basement membrane]

Cancer Stages: Invasive
• Cancer cells break through the basement membrane
• They invade the surrounding tissue
[Figure: abnormal cells and basement membrane]

Treatment
• In Situ:
  – Can develop into Invasive
  – Excellent prognosis, less intensive treatment
Treat it?
• “Overdiagnosis” (unnecessary treatment)
• Time to spread may be long
  – Patient may die of other causes

Problem Constraints
• Identify patient subgroups that would benefit most from treatment
• Use biopsy alternatives (like follow-up)
• Help patients make informed decisions (personalized medicine)

Task Formulation
• Given:
  – Radiology reports
  – No biopsy
• Do:
  – Identify patient subgroups
  – Specify Invasive/In Situ probabilities

Data
• Match mammograms to biopsies
• 1063 Invasive, 412 In Situ cases
• Radiology report fields: personal/family history, BI-RADS code, palpable lump, mass specs, calcifications, ...
• Biopsy fields: date, breast side, cancer stage, ...

Acronyms
• BI-RADS code: Breast Imaging Reporting and Data System
  – A number (0-6) summarizing the radiologist's opinion and findings concerning a mammogram
  – In increasing probability of malignancy: 1 < 2 < 3 < 0 < 4 < 5 < 6
• DCIS: Ductal Carcinoma In Situ
  – One [and only?] type of In Situ breast cancer

Age Matters
• Apply logistic regression
  – Different attributes predict cancer stage in different age groups
• Stratify data (approximating menopausal status):
  – Older cohort (age >= 65) (post-menopausal)
  – Middle cohort (50 <= age < 65) (peri-menopausal)
  – Younger cohort (age < 50) (pre-menopausal)

Age-Specific Attributes
• Find accurate age-specific attributes
• Inductive Logic Programming (ILP) offers benefits beyond logistic regression:
  – Human-comprehensible rules
  – Rules that target specific data subsets

Inductive Logic Programming
• Machine learning approach
• White-box classifier
• Constructs if-then rules
• Allows user interaction through background knowledge
• Operates on relational datasets

Example Record
Mammograms (left) matched to biopsies (right):
Record | Patient | Date    | BIRADS || Patient | Date    | Stage
10     | 100     | 08/2010 | 5      || 100     | 09/2010 | Invasive
11     | 100     | 02/2008 | 3      || 100     | 03/2008 | Benign
12     | 200     | 06/2009 | 4      || 200     | 07/2009 | In Situ
• Assign mammograms to biopsies
• Discard:
  – Record 11, since it is benign
  – Stage, since it is the target of prediction
• A non-relational learner extracts:
  – BIRADS(10,100,5)
  – BIRADS(12,200,4)

ILP Predicate Invention
• Link patient records, e.g.:
  – Old study (id, old id)
  – Old biopsy (id, old id, result)
  – Access old study/biopsy attributes
• Compare attributes, e.g.:
  – Mass size decrease (id, old id)
  – This-side breast BI-RADS code increase (id, old id)

Example Cont'd (same three records as above)
• Link records: OldStudy(10,11)
• Access previous-study predicates:
  – BIRADS(11,100,3)
  – OldBiopsy(10,11,Benign)
• Compare predicates:
  – BIRADSincrease(10,11,3)

Methodology
[Figure: Older cohort reports and Younger cohort reports → ILP classifier (Invasive vs. In Situ rules) → differential prediction → Older-specific Invasive/In Situ rules]

Differential Prediction
• Limit to the Older and Younger cohorts:
  – Maximize age and attribute differences
  – Leave out the peri-menopausal (Middle) cohort
• Define Invasive rules in Older as rules with:
  – Good Invasive prediction on Older
    • Precision > 60%, Recall > 10%
  – And significantly worse prediction on Younger
    • Precision difference p-value < 0.05

Invasive Rules in Older
1. The mammogram has a palpable lump in this-side breast.
2. The mammogram's indication for exam is “palpable lump”.
3. The mammogram's indication for exam is “palpable lump”, its other-side BI-RADS < 3, and its mass margin is not reported.

Palpable Lump
• Higher occurrence in Younger
• Tendency in Younger:
  – Rapid proliferation
  – Poor differentiation
  – In Situ thus more likely to be palpable
• Tendency in Older:
  – Slow growth
  – When big enough to be palpable, almost certainly Invasive

Invasive Rules in Older Cont'd
1. The mammogram has an old biopsy that was Invasive.
2. The mammogram has an old biopsy that was Invasive, and the biopsy happened within the same age group.
• Due to:
  – Longer life span of older women
  – Higher recurrence of invasive tumors

In Situ Rules in Younger
1. The mammogram has a personal history of cancer in this-side breast, this-side breast has a prior surgery, and its combined BI-RADS increased by at least 2 points compared to a previous study.
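The record-linking and comparison predicates from the worked example can be reproduced in a few lines. This is a hypothetical sketch, not the authors' ILP system: the names `Mammogram`, `old_study`, `birads_increase`, `cohort`, and `younger_in_situ_rule` are illustrative; OldStudy is assumed to link a mammogram to the same patient's most recent earlier mammogram; and the BI-RADS increase is assumed to be measured along the malignancy ordering 1 < 2 < 3 < 0 < 4 < 5 < 6, which reproduces BIRADSincrease(10,11,3) for codes 5 and 3 (a raw numeric difference would give 2). Which codes make up the "combined" BI-RADS in the Younger rule is not specified here, so the rule takes the increase as a plain input.

```python
from dataclasses import dataclass
from typing import List, Optional

# BI-RADS codes listed in increasing probability of malignancy: 1<2<3<0<4<5<6.
MALIGNANCY_ORDER = [1, 2, 3, 0, 4, 5, 6]
RANK = {code: i for i, code in enumerate(MALIGNANCY_ORDER)}


def cohort(age: int) -> str:
    """Age strata used to split the data (approximate menopausal status)."""
    if age >= 65:
        return "older"
    if age >= 50:
        return "middle"
    return "younger"


@dataclass
class Mammogram:
    record: int
    patient: int
    date: str    # "MM/YYYY", as in the example table
    birads: int


def month_index(date: str) -> int:
    """Turn 'MM/YYYY' into a single integer so studies can be ordered in time."""
    month, year = date.split("/")
    return int(year) * 12 + int(month)


def old_study(current: Mammogram, records: List[Mammogram]) -> Optional[Mammogram]:
    """Hypothetical OldStudy link: the same patient's most recent earlier mammogram."""
    earlier = [r for r in records
               if r.patient == current.patient
               and month_index(r.date) < month_index(current.date)]
    return max(earlier, key=lambda r: month_index(r.date), default=None)


def birads_increase(current: Mammogram, old: Mammogram) -> int:
    """Rise in malignancy rank (not raw BI-RADS number) between two studies."""
    return RANK[current.birads] - RANK[old.birads]


def younger_in_situ_rule(this_side_history: bool, this_side_prior_surgery: bool,
                         combined_birads_increase: int) -> bool:
    """The In Situ rule for Younger, with its three conditions as plain inputs."""
    return (this_side_history and this_side_prior_surgery
            and combined_birads_increase >= 2)


# The three example records (mammogram side only).
records = [
    Mammogram(10, 100, "08/2010", 5),
    Mammogram(11, 100, "02/2008", 3),
    Mammogram(12, 200, "06/2009", 4),
]
prev = old_study(records[0], records)
print(prev.record, birads_increase(records[0], prev))   # prints: 11 3
print(old_study(records[2], records))                   # prints: None
```

Record 11 is dropped as a training example because it is benign, yet it remains usable as the old study for record 10; that is exactly the relational information a non-relational learner, which sees each record in isolation, cannot exploit.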
Recurrence
• A recurrence is a better predictor of In Situ in Younger
• Contrast with the previous rules, where recurrence of an invasive tumor is a better predictor of Invasive in Older

Other Rules
• No rules met our criteria for:
  – In Situ in Older
  – Invasive in Younger
• Middle cohort behavior:
  – 2 rules like Older
  – 2 rules like Younger
  – 2 rules like neither

Probabilities
Rule               | Precision Older | Precision Younger | Recall Older | Recall Younger
Palpable 1         | 94%             | 87%               | 42%          | 65%
Palpable 2         | 95%             | 86%               | 35%          | 62%
Palpable 3         | 98%             | 87%               | 19%          | 41%
Invasive Biopsy 1  | 97%             | 86%               | 50%          | 18%
Invasive Biopsy 2  | 100%            | 86%               | 44%          | 18%
Younger Recurrence | 8%              | 67%               | 2%           | 11%

Problem Solutions
• Identify patient subgroups that would benefit most from treatment => Rule coverage
• Use biopsy alternatives (like follow-up) => Pre-biopsy mammography report
• Help patients make informed decisions (personalized medicine) => Assigning probabilities

Conclusion
• First differential predictive rule extraction method and application
• Personalized age-specific prediction
• New insight on:
  – Palpable lump
  – Recurrence
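The Probabilities table can be checked against the threshold part of the differential-prediction criterion (precision > 60% and recall > 10% in the target cohort). The following is a minimal sketch under stated assumptions. The p-value test on the precision difference cannot be recomputed from percentages alone, so it is omitted. Each rule's target cohort and predicted stage come from the rule slides. Reading a firing rule's precision in the patient's cohort as the stage probability is only one plausible interpretation of "assigning probabilities". All names (`RuleStats`, `passes_thresholds`, `stage_probability`) are hypothetical.

```python
from typing import NamedTuple, Tuple


class RuleStats(NamedTuple):
    target: str        # cohort the rule was selected for ("older" or "younger")
    stage: str         # stage the rule predicts
    prec_older: int    # precision in the Older cohort (%)
    prec_younger: int  # precision in the Younger cohort (%)
    rec_older: int     # recall in the Older cohort (%)
    rec_younger: int   # recall in the Younger cohort (%)


# Values copied from the Probabilities table.
TABLE = {
    "Palpable 1": RuleStats("older", "Invasive", 94, 87, 42, 65),
    "Palpable 2": RuleStats("older", "Invasive", 95, 86, 35, 62),
    "Palpable 3": RuleStats("older", "Invasive", 98, 87, 19, 41),
    "Invasive Biopsy 1": RuleStats("older", "Invasive", 97, 86, 50, 18),
    "Invasive Biopsy 2": RuleStats("older", "Invasive", 100, 86, 44, 18),
    "Younger Recurrence": RuleStats("younger", "In Situ", 8, 67, 2, 11),
}


def stats_for(rule: RuleStats, cohort: str) -> Tuple[int, int]:
    """(precision, recall) of a rule; the table only covers Older and Younger."""
    if cohort == "older":
        return rule.prec_older, rule.rec_older
    if cohort == "younger":
        return rule.prec_younger, rule.rec_younger
    raise ValueError(f"no table entry for cohort {cohort!r}")


def passes_thresholds(rule: RuleStats, min_precision: int = 60,
                      min_recall: int = 10) -> bool:
    """Threshold part of the differential criterion (p-value test needs raw counts)."""
    other = "younger" if rule.target == "older" else "older"
    prec, rec = stats_for(rule, rule.target)
    other_prec, _ = stats_for(rule, other)
    return prec > min_precision and rec > min_recall and prec > other_prec


def stage_probability(rule: RuleStats, cohort: str) -> Tuple[str, float]:
    """Read a firing rule's precision in the patient's cohort as a stage probability."""
    prec, _ = stats_for(rule, cohort)
    return rule.stage, prec / 100


for name, rule in TABLE.items():
    print(name, passes_thresholds(rule))                 # every row prints True
print(stage_probability(TABLE["Palpable 1"], "older"))   # ('Invasive', 0.94)
```

Under this reading, an older patient whose report matches Palpable 1 would be assigned a 94% probability of Invasive, while a younger patient matching the same rule would get 87%. The gap is small in absolute terms, but the criterion also requires it to be statistically significant on the underlying counts.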