View Learning: An Extension to SRL
An Application in Mammography

Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan

Background
- Breast cancer is the most common cancer
- Mammography is the only proven screening test
- At this time, approximately 61% of women have had a mammogram in the last 2 years
- This translates into 20 million mammograms per year

The Problem
- Radiologists interpret mammograms
- There is variability among radiologists, due to differences in training and experience
- Experts have higher cancer detection rates and fewer benign biopsies
- There is a shortage of experts

Common Mammography Findings
- Microcalcifications
- Masses
- Architectural distortion
[Figure: example images of calcifications, a mass, and architectural distortion]

Other Important Features
- Microcalcifications: shape, distribution, stability
- Masses: shape, margin, density, size, stability
- Associated findings
- Breast density

Other Variables Influence Risk
- Demographic risk factors: family history, hormone therapy, age

Standardization of Practice
- Passage of the Mammography Quality Standards Act (MQSA) in 1992
- Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
- A standardized lexicon, BI-RADS, was developed, incorporating 5 categories that include 43 unique descriptors

BI-RADS
Mass
- margins: circumscribed, microlobulated, obscured, indistinct, spiculated
- shape: round, oval, lobular, irregular
- density: high, equal, low, fat-containing
Calcifications
- typically benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
- intermediate: amorphous
- higher probability of malignancy: pleomorphic, fine/linear/branching
- distribution: clustered, linear, segmental, regional, diffuse/scattered
Architectural Distortion
Special Cases
- tubular density
- lymph node
Associated Findings
- skin thickening
- skin lesion
- trabecular thickening
- nipple retraction
- axillary adenopathy
- skin retraction
- asymmetric breast tissue
- focal asymmetric density

Mammography Database
- Radiologist interpretation of each mammogram
- A patient may have multiple mammograms
- A mammogram may have multiple abnormalities
- An expert-defined Bayes net determines whether an abnormality is malignant

Original Expert Structure
[Figure: the expert-defined Bayes net structure]

Mammography Database (sample rows):

Patient | Abnormality | Date | Mass Shape | Mass Size | Loc | Be/Mal
P1      | 1           | 5/02 | Spic       | 0.03      | RU4 | B
P1      | 2           | 5/04 | Var        | 0.04      | RU4 | M
P1      | 3           | 5/04 | Spic       | 0.04      | LL3 | B

Types of Learning
There is a hierarchy of "types" of learning that we can perform on the Mammography database.

Level 1: Parameters
[Figure: Bayes net with Be/Mal as parent of Shape and Size]
- Given: features (node labels, or fields in the database), data, Bayes net structure
- Learn: probabilities
- Note: the probabilities needed are Pr(Be/Mal), Pr(Shape|Be/Mal), and Pr(Size|Be/Mal)

Level 2: Structure
[Figure: Bayes net with an additional arc from Shape to Size]
- Given: features, data
- Learn: Bayes net structure and probabilities
- Note: with this structure, we now need Pr(Size|Shape,Be/Mal) instead of Pr(Size|Be/Mal)

Level 3: Aggregates
[Figure: Bayes net with a new "Avg size this date" node feeding Be/Mal]
- Given: features, data, and background knowledge: aggregation functions such as average, mode, max, etc.
- Learn: useful aggregate features, a Bayes net structure that uses these features, and probabilities
- New features may use other rows/tables
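As a concrete sketch of the Level-3 idea, the following Python computes the "avg size this date" aggregate feature from rows like those in the sample Mammography Database table. This is an illustration only, not the system's actual code; the row layout is an assumption based on the sample table.

```python
# Minimal sketch of a Level-3 aggregate feature: for each abnormality,
# the average mass size over all abnormalities recorded for the same
# patient on the same date (row layout assumed from the sample table).
from collections import defaultdict

# (patient, abnormality, date, mass shape, mass size) rows
rows = [
    ("P1", 1, "5/02", "Spic", 0.03),
    ("P1", 2, "5/04", "Var",  0.04),
    ("P1", 3, "5/04", "Spic", 0.04),
]

def avg_size_this_date(rows):
    """Return {abnormality_id: average size over the same patient and date}."""
    groups = defaultdict(list)
    for patient, _, date, _, size in rows:
        groups[(patient, date)].append(size)
    return {
        ab: sum(groups[(p, d)]) / len(groups[(p, d)])
        for p, ab, d, _, _ in rows
    }

print(avg_size_this_date(rows))  # {1: 0.03, 2: 0.04, 3: 0.04}
```

The aggregate becomes a new column of the view, so it can then be treated as an ordinary node in the Bayes net.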
Level 4: View Learning
[Figure: Bayes net with new "Shape change in abnormality at this location" and "Increase in average size of abnormalities" nodes, alongside Avg size this date, Be/Mal, Shape, and Size]
- Given: features, data, and background knowledge: aggregation functions and intensionally defined relations such as "increase" or "same location"
- Learn: useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities

Structure Learning Algorithms
Three different algorithms:
- Naïve Bayes
- Tree Augmented Naïve Bayes (TAN)
- Sparse Candidate algorithm

Naïve Bayes Net
- Simple and computationally efficient
[Figure: class value with an arc to each of Attr 1 ... Attr N]

Example TAN Net
- Also computationally efficient [Friedman, Geiger & Goldszmidt '97]
[Figure: class value with an arc to each attribute, plus a tree of arcs among the attributes]

TAN
- Arc from the class variable to each attribute
- Less restrictive than Naïve Bayes: each attribute is permitted at most one extra parent
- Polynomial time bound on constructing the network: O((# attributes)^2 * |training set|)
- Guaranteed to maximize LL(B_T | D)

TAN Algorithm
- Construct a complete graph over the attributes (excluding the class variable), where each edge weight is the conditional mutual information between its endpoints
- Find a maximum-weight spanning tree over the graph
- Pick a root in the tree and direct the edges away from it
- Add the edges of the directed tree to the network

General Bayes Net
[Figure: an unrestricted network over the class value and attributes]

Sparse Candidate [Friedman et al. '97]
- No restrictions on the directionality of arcs for the class attribute
- Limits the possible parents of each node to a small "candidate" set

Sparse Candidate Algorithm
- Greedy hill-climbing search with restarts; the initial structure is the empty graph
- Score graphs using the BDe metric (Cooper & Herskovits '92, Heckerman '96)
- Select the candidate set using an information metric
- Re-estimate the candidate set after each restart
- We looked at several initial structures: the expert structure, Naïve Bayes, and TAN
- Networks scored on tune-set accuracy

Our Initial Approach for Level 4
- Use ILP to learn rules predictive of "malignant"
- Treat the rules as intensional definitions of new fields
- The new view consists of the original table extended with the new fields

Using Views

malignant(A) :-
    massesStability(A, increasing),
    prior_mammogram(A, B, _),
    H0_BreastCA(B, hxDCorLC).

Sample Rule

malignant(A) :-
    BIRADS_category(A, b5),
    MassPAO(A, present),
    MassesDensity(A, high),
    HO_BreastCA(A, hxDCorLC),
    in_same_mammogram(A, B),
    Calc_Pleomorphic(B, notPresent),
    Calc_Punctate(B, notPresent).

Methodology
- 10-fold cross validation, split at the patient level
- Roughly 40 malignant cases and 6000 benign cases in each fold
- Without the ILP rules: 6 folds for the training set, 3 folds for the tuning set
- With ILP: 4 folds to learn the ILP rules, 3 folds for the training set, 2 folds for the tuning set
- TAN/Naïve Bayes don't require a tune set

Evaluation
- Precision-recall curves, pooled over all 10 folds
- Why not ROC curves? With many negatives, ROC curves look overly optimistic: a large change in the number of false positives yields only a small change in the ROC curve

ROC: Level 2 (TAN) vs. Level 1
[Figure: ROC curves]

Precision-Recall Curves
[Figure: precision-recall curves]

Related Work: ILP for Feature Construction
- Pompe & Kononenko, ILP '95
- Srinivasan & King, ILP '97
- Perlich & Provost, KDD '03
- Neville, Jensen, Friedland & Hay, KDD '03

Ways to Improve Performance
- Learn rules to predict "benign" as well as "malignant"
- Use Gleaner (Goadrich, Oliphant & Shavlik, ILP '04) to get a better spread of precision vs. recall in the learned rules
- Incorporate aggregation into the ILP runs themselves

Richer View Learning Approaches
- Learn rules predictive of other fields
- Use WARMR or other first-order clustering approaches
- Integrate structure learning and view learning: score a rule by how much it helps the current model when added

Integrated View/Structure Learning
A shape-change rule (the abnormality's shape differs from an earlier abnormality at the same location in the same patient):

sc(X) :-
    id(X, P), id(Y, P),
    loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2.

The same rule refined with a size-increase condition:

sc(X) :-
    id(X, P), id(Y, P),
    loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2,
    size(X, S1), size(Y, S2), S1 > S2.

Richer View Learning (cont.)
- Learning new tables: these are just rules for non-unary predicates
- Train on pairs of malignancies for the same mammogram or patient
- Train on pairs (triples, etc.) of fields, where pairs of values that appear in rows for malignant abnormalities are positive examples, while those that appear only in rows for benign abnormalities are negative examples

Conclusions
- Graphical models over databases were originally limited to the schema provided
- Humans find it useful to define new views of a database (new fields or tables intensionally defined from existing data)
- View learning appears to have promise for increasing the capabilities of graphical models over relational databases, and perhaps of other SRL approaches

WILD Group
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, Raghu Ramakrishnan, Jude Shavlik, David Page
Others: Hector Corrada-Bravo, Irene Ong, Mark Goadrich, Louis Oliphant, Bee-Chung Chen
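The pairs-of-fields training scheme described under "Richer View Learning" can be sketched in Python. This is a hypothetical illustration (field names and rows are invented), not the system's code: a pair of field values is positive if it occurs in any malignant row, and negative if it occurs only in benign rows.

```python
# Sketch of labeling (field_a, field_b) value pairs as ILP training
# examples: positives appear in some malignant row, negatives appear
# only in benign rows.  Field names and rows are illustrative only.
def label_value_pairs(rows, field_a, field_b, label_field="be_mal"):
    """Split value pairs into positive and negative example sets."""
    in_malignant, in_benign = set(), set()
    for row in rows:
        pair = (row[field_a], row[field_b])
        if row[label_field] == "M":
            in_malignant.add(pair)
        else:
            in_benign.add(pair)
    # A pair seen in any malignant row is positive; pairs seen only
    # in benign rows are negative.
    return in_malignant, in_benign - in_malignant

# Rows loosely based on the sample table (Be/Mal: B = benign, M = malignant)
rows = [
    {"shape": "Spic", "size": 0.03, "be_mal": "B"},
    {"shape": "Var",  "size": 0.04, "be_mal": "M"},
    {"shape": "Spic", "size": 0.04, "be_mal": "B"},
]
pos, neg = label_value_pairs(rows, "shape", "size")
print(sorted(pos))  # [('Var', 0.04)]
print(sorted(neg))  # [('Spic', 0.03), ('Spic', 0.04)]
```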