View Learning: An extension to SRL
An application in Mammography
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan
Background

- Breast cancer is the most common cancer among women
- Mammography is the only proven screening test
- At this time, approximately 61% of women have had a mammogram in the last 2 years
  - This translates into 20 million mammograms per year
The Problem

- Radiologists interpret mammograms
- Variability among radiologists, due to differences in training and experience
- Experts achieve higher cancer-detection rates and fewer benign biopsies
- There is a shortage of experts
Common Mammography Findings

- Microcalcifications
- Masses
- Architectural distortion

(figures: example images of calcifications, a mass, and architectural distortion)
Other Important Features

- Microcalcifications: shape, distribution, stability
- Masses: shape, margin, density, size, stability
- Associated findings
- Breast density
Other Variables Influence Risk

- Demographic risk factors:
  - Family history
  - Hormone therapy
  - Age
Standardization of Practice

- Passage of the Mammography Quality Standards Act (MQSA) in 1992
- Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
- A standardized lexicon, BI-RADS, was developed, incorporating 5 categories that include 43 unique descriptors
BI-RADS

(lexicon diagram rendered as an outline)

Mass
- Shape: round, oval, lobular, irregular
- Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
- Density: high, equal, low, fat-containing

Calcifications
- Typically benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
- Intermediate: amorphous
- Higher probability of malignancy: pleomorphic, fine/linear/branching
- Distribution: clustered, linear, segmental, regional, diffuse/scattered

Architectural distortion

Special cases: asymmetric breast tissue, focal asymmetric density, tubular density, lymph node

Associated findings: skin thickening, trabecular thickening, skin lesion, skin retraction, nipple retraction, axillary adenopathy
Mammography Database

- Radiologist interpretation of each mammogram
- A patient may have multiple mammograms
- A mammogram may have multiple abnormalities
- Expert-defined Bayes net for determining whether an abnormality is malignant
Original Expert Structure

(figure: the expert-defined Bayes net over the abnormality descriptors)
Mammography Database

Patient  Abnormality  Date  Mass Shape  Mass Size  Loc  Be/Mal  …
P1       1            5/02  Spic        0.03       RU4  B
P1       2            5/04  Var         0.04       RU4  M
P1       3            5/04  Spic        0.04       LL3  B
…        …            …     …           …          …    …
Types of Learning

- A hierarchy of "types" of learning that we can perform on the Mammography database
Level 1: Parameters

(figure: fixed structure with Be/Mal as parent of Shape and Size)

Given: Features (node labels, or fields in the database), Data, Bayes net structure.
Learn: Probabilities. Note: the probabilities needed are Pr(Be/Mal), Pr(Shape|Be/Mal), Pr(Size|Be/Mal) (a minimal estimation sketch follows).
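To make the parameter-learning step concrete, here is a minimal Python sketch of maximum-likelihood estimation for Pr(Be/Mal) and Pr(Shape|Be/Mal) over rows shaped like the table above; the in-line data and names are toy illustrations, not the real database.

from collections import Counter

# Toy rows shaped like the table: (Mass Shape, Be/Mal); values illustrative.
rows = [("Spic", "B"), ("Var", "M"), ("Spic", "B"), ("Var", "B"), ("Spic", "M")]
n = len(rows)

# Pr(Be/Mal): maximum-likelihood estimate from class counts.
class_counts = Counter(c for _, c in rows)
pr_class = {c: k / n for c, k in class_counts.items()}

# Pr(Shape | Be/Mal): joint counts divided by class counts.
joint_counts = Counter(rows)
pr_shape = {(s, c): k / class_counts[c] for (s, c), k in joint_counts.items()}

print(pr_class)                  # {'B': 0.6, 'M': 0.4}
print(pr_shape[("Spic", "B")])   # 2/3 of the benign toy rows are spiculated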
Level 2: Structure

(figure: learned structure with Be/Mal a parent of Shape and Size, plus an arc Shape → Size)

Given: Features, Data.
Learn: Bayes net structure and probabilities. Note: with this structure we now need Pr(Size|Shape,Be/Mal) instead of Pr(Size|Be/Mal).
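The note above amounts to indexing the Size CPT by both parents. A minimal sketch, again on toy rows with illustrative values:

from collections import Counter

# Toy rows: (shape, size_bin, be_mal); values illustrative.
rows = [("Spic", "small", "B"), ("Var", "large", "M"),
        ("Spic", "large", "B"), ("Var", "small", "B")]

parent_counts = Counter((s, c) for s, _, c in rows)   # (Shape, Be/Mal)
triple_counts = Counter(rows)                         # (Shape, Size, Be/Mal)

# Pr(Size | Shape, Be/Mal): the CPT required by the Level 2 structure.
pr_size = {(s, z, c): k / parent_counts[(s, c)]
           for (s, z, c), k in triple_counts.items()}
print(pr_size[("Spic", "small", "B")])   # 0.5 on this toy data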
Level 3: Aggregates

(figure: structure extended with an aggregate node, "Avg size this date")

Given: Features, Data, Background knowledge: aggregation functions such as average, mode, max, etc.
Learn: Useful aggregate features, a Bayes net structure that uses these features, and probabilities. New features may use other rows/tables (sketch below).
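A minimal sketch of one Level 3 aggregate, "average mass size for this patient on this date", computed from toy rows shaped like the table (names illustrative):

from collections import defaultdict

# Toy rows: (patient, date, size); one row per abnormality.
rows = [("P1", "5/02", 0.03), ("P1", "5/04", 0.04), ("P1", "5/04", 0.04)]

# Group sizes by (patient, date), then aggregate with an average.
groups = defaultdict(list)
for patient, date, size in rows:
    groups[(patient, date)].append(size)

avg_size = {key: sum(sizes) / len(sizes) for key, sizes in groups.items()}

# Each abnormality row gets the aggregate as a new feature.
features = [(p, d, s, avg_size[(p, d)]) for p, d, s in rows]
print(features)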
Level 4: View Learning

(figure: structure further extended with learned view nodes, "Shape change in abnormality at this location" and "Increase in average size of abnormalities")

Given: Features, Data, Background knowledge: aggregation functions and intensionally defined relations such as "increase" or "same location".
Learn: Useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities (a view-feature sketch follows).
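A minimal sketch of a view-defined feature, assuming toy rows and a hypothetical field layout: the Boolean "some earlier abnormality at the same location had a different shape", i.e. the "shape change" node in the figure.

# Toy rows: (patient, abn_id, date, shape, loc); these dates happen to
# compare correctly as strings, which keeps the sketch short.
rows = [("P1", 1, "5/02", "Spic", "RU4"),
        ("P1", 2, "5/04", "Var",  "RU4"),
        ("P1", 3, "5/04", "Spic", "LL3")]

def shape_change(row, table):
    # True if an earlier abnormality of the same patient, at the same
    # location, had a different shape: a view-defined Boolean feature.
    patient, _, date, shape, loc = row
    return any(p == patient and l == loc and d < date and s != shape
               for p, _, d, s, l in table)

for row in rows:
    print(row[1], shape_change(row, rows))   # abn 2 -> True, others False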
Structure Learning Algorithms

- Three different algorithms:
  - Naïve Bayes
  - Tree Augmented Naïve Bayes (TAN)
  - Sparse Candidate algorithm
Naïve Bayes Net

- Simple, computationally efficient

(figure: Class Value node with an arc to each of Attr 1 … Attr N)
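As a reminder of what this structure buys computationally, a minimal naïve Bayes scoring sketch; the CPT numbers are made up and would come from Level 1 parameter learning:

import math

# Toy CPTs; in practice these come from Level 1 parameter learning.
pr_class = {"B": 0.9, "M": 0.1}
pr_attr = {("Spic", "B"): 0.2, ("Spic", "M"): 0.7,
           ("RU4", "B"): 0.3, ("RU4", "M"): 0.3}

def log_posterior(cls, attrs):
    # log Pr(class) plus the sum of log Pr(attr | class).
    return math.log(pr_class[cls]) + sum(math.log(pr_attr[(a, cls)]) for a in attrs)

attrs = ["Spic", "RU4"]
scores = {c: log_posterior(c, attrs) for c in pr_class}
print(max(scores, key=scores.get))   # class with the highest score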
Example TAN Net

- Also computationally efficient [Friedman, Geiger & Goldszmidt '97]

(figure: Class Value node with an arc to each of Attr 1 … Attr N, plus a tree of augmenting arcs among the attributes)
TAN

- Arc from class variable to each attribute
- Less restrictive than naïve Bayes
  - Each attribute permitted at most one extra parent
- Polynomial time bound on constructing the network
  - O((#attributes)² × |training set|)
- Guaranteed to maximize LL(B_T | D)
TAN Algorithm

- Construct a complete graph between all the attributes (excluding the class variable)
  - Each edge is weighted by the conditional mutual information between its two vertices, given the class
- Find a maximum-weight spanning tree over this graph
- Pick a root in the tree and direct all edges away from it
- Add the directed tree's edges to the network (sketched below)
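A compact sketch of this construction on a toy dataset (attribute values and counts are illustrative); conditional mutual information is estimated from counts and the spanning tree is grown Prim-style:

import math
from collections import Counter
from itertools import combinations

# Toy dataset: each row is (attribute-value tuple, class); illustrative only.
data = [(("Spic", "small", "RU4"), "B"),
        (("Var",  "large", "RU4"), "M"),
        (("Spic", "large", "LL3"), "B"),
        (("Var",  "small", "LL3"), "B"),
        (("Var",  "large", "RU4"), "M")]
n_attrs, N = 3, len(data)

def cond_mutual_info(i, j):
    # I(X_i; X_j | C) estimated from counts.
    pxyc = Counter((x[i], x[j], c) for x, c in data)
    pxc  = Counter((x[i], c) for x, c in data)
    pyc  = Counter((x[j], c) for x, c in data)
    pc   = Counter(c for _, c in data)
    mi = 0.0
    for (xi, xj, c), k in pxyc.items():
        p_xyc = k / N
        mi += p_xyc * math.log((p_xyc * (pc[c] / N)) /
                               ((pxc[(xi, c)] / N) * (pyc[(xj, c)] / N)))
    return mi

# Maximum-weight spanning tree over the complete attribute graph (Prim-style).
weights = {(i, j): cond_mutual_info(i, j) for i, j in combinations(range(n_attrs), 2)}
in_tree, edges = {0}, []                 # root the tree at attribute 0
while len(in_tree) < n_attrs:
    i, j = max(((a, b) for (a, b) in weights
                if (a in in_tree) != (b in in_tree)),
               key=lambda e: weights[e])
    parent, child = (i, j) if i in in_tree else (j, i)
    edges.append((parent, child))        # direct edges away from the root
    in_tree.add(child)

print(edges)   # augmenting arcs; class-to-attribute arcs are added on top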
General Bayes Net

(figure: an unrestricted network over Class Value and Attr 1 … Attr N)
Sparse Candidate

- [Friedman, Nachman & Pe'er '99]
- No restrictions on the directionality of arcs at the class attribute
- Limits the possible parents of each node to a small "candidate" set
Sparse Candidate Algorithm

- Greedy hill-climbing search with restarts (sketched below)
  - Initial structure is the empty graph
  - Graphs scored using the BDe metric (Cooper & Herskovits '92; Heckerman '96)
- Selects each node's candidate set using an information metric
- Re-estimates the candidate sets after each restart
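A sketch of the restricted hill-climbing loop, with hypothetical node names and a stand-in score in place of BDe; candidate-set selection and restarts are omitted:

def ancestors(parents, v, seen=None):
    # All ancestors of v under the current parent sets.
    seen = set() if seen is None else seen
    for p in parents[v]:
        if p not in seen:
            seen.add(p)
            ancestors(parents, p, seen)
    return seen

def hill_climb(nodes, candidates, score, max_iters=100):
    # Greedy hill climbing over arc additions, with each node's parents
    # restricted to its candidate set (the Sparse Candidate restriction).
    parents = {v: set() for v in nodes}
    best = score(parents)
    for _ in range(max_iters):
        improved = False
        for child in nodes:
            for parent in candidates[child] - parents[child] - {child}:
                if child in ancestors(parents, parent):
                    continue                  # skip arcs that create a cycle
                parents[child].add(parent)
                s = score(parents)
                if s > best:
                    best, improved = s, True  # keep the arc
                else:
                    parents[child].discard(parent)
        if not improved:
            break
    return parents, best

# Toy usage: tiny candidate sets and a stand-in score that just counts
# arcs, so the search adds every legal candidate arc.
nodes = ["be_mal", "shape", "size"]
candidates = {"be_mal": {"shape"}, "shape": {"be_mal"}, "size": {"shape", "be_mal"}}
toy_score = lambda ps: sum(len(p) for p in ps.values())
print(hill_climb(nodes, candidates, toy_score))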
Sparse Candidate Algorithm

- We looked at several initial structures:
  - Expert structure
  - Naïve Bayes
  - TAN
- Networks scored on tuning-set accuracy
Our Initial Approach for Level 4

- Use ILP to learn rules predictive of "malignant"
- Treat the rules as intensional definitions of new fields
- The new view consists of the original table extended with the new fields
Using Views

malignant(A) :-
    massesStability(A, increasing),
    prior_mammogram(A, B, _),
    'H0_BreastCA'(B, hxDCorLC).
Sample Rule

malignant(A) :-
    'BIRADS_category'(A, b5),
    'MassPAO'(A, present),
    'MassesDensity'(A, high),
    'HO_BreastCA'(A, hxDCorLC),
    in_same_mammogram(A, B),
    'Calc_Pleomorphic'(B, notPresent),
    'Calc_Punctate'(B, notPresent).
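Treating a learned rule as an intensional definition of a new field can be pictured as follows; the predicate tables are toy stand-ins for the database relations used by the first rule above:

# Toy stand-ins for the database predicates used by a learned rule.
masses_stability = {1: "increasing", 2: "stable"}
prior_mammogram = {1: [2]}               # abnormality -> prior abnormalities
h_breast_ca = {2: "hxDCorLC"}

def rule_fires(a):
    # Intensional field: the first "Using Views" rule as a Boolean test.
    return (masses_stability.get(a) == "increasing" and
            any(h_breast_ca.get(b) == "hxDCorLC"
                for b in prior_mammogram.get(a, [])))

# Extend the view: original fields plus the rule's truth value as a new field.
view = [(a, rule_fires(a)) for a in masses_stability]
print(view)    # [(1, True), (2, False)]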
Methodology

- 10-fold cross-validation
- Split at the patient level
- Roughly 40 malignant cases and 6,000 benign cases in each fold
Methodology

- Without the ILP rules:
  - 6 folds for the training set
  - 3 folds for the tuning set
- With ILP:
  - 4 folds to learn ILP rules
  - 3 folds for the training set
  - 2 folds for the tuning set
- TAN/Naïve Bayes don't require a tuning set
Evaluation

- Precision and recall curves
- Why not ROC curves?
  - With many negatives, ROC curves look overly optimistic
  - A large change in the number of false positives yields only a small change in the ROC curve (see the worked numbers below)
- Results pooled over all 10 folds
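A worked illustration of this point, using counts on the scale of one fold (40 malignant, 6,000 benign) and two made-up operating points: tripling the false positives barely moves the ROC x-axis but collapses precision.

# Two hypothetical operating points, both recovering 30 of 40 malignancies.
pos, neg, tp = 40, 6000, 30

for fp in (100, 300):
    fpr = fp / neg                 # x-axis of an ROC curve
    precision = tp / (tp + fp)     # y-axis of a precision-recall curve
    print(f"FP={fp}: FPR={fpr:.3f}, precision={precision:.3f}")

# FP=100: FPR=0.017, precision=0.231
# FP=300: FPR=0.050, precision=0.091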
ROC: Level 2 (TAN) vs. Level 1

(figure: ROC curves)

Precision-Recall Curves

(figure: precision-recall curves)
Related Work: ILP for Feature Construction

- Pompe & Kononenko, ILP '95
- Srinivasan & King, ILP '97
- Perlich & Provost, KDD '03
- Neville, Jensen, Friedland & Hay, KDD '03
Ways to Improve Performance

- Learn rules to predict "benign" as well as "malignant"
- Use Gleaner (Goadrich, Oliphant & Shavlik, ILP '04) to get a better spread of precision vs. recall in the learned rules
- Incorporate aggregation into the ILP runs themselves
Richer View Learning Approaches

- Learn rules predictive of other fields
- Use WARMR or other first-order clustering approaches
- Integrate structure learning and view learning: score a rule by how much it helps the current model when added
Level 4: View Learning (recap of the earlier Level 4 slide)
Integrated View/Structure Learning

sc(X) :- id(X,P), id(Y,P), loc(X,L),
         loc(Y,L), date(Y,D1), date(X,D2),
         before(D1,D2), shape(X,Sh1),
         shape(Y,Sh2), Sh1 \= Sh2.

(figure: the candidate rule node is scored as a new feature alongside "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, and Size)
Integrated View/Structure Learning

sc(X) :- id(X,P), id(Y,P), loc(X,L),
         loc(Y,L), date(Y,D1), date(X,D2),
         before(D1,D2), shape(X,Sh1),
         shape(Y,Sh2), Sh1 \= Sh2, size(X,S1),
         size(Y,S2), S1 > S2.

(the candidate rule is refined by adding the size literals and re-scored against the same network)
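One way to picture "score a rule by how much it helps the current model": materialize the rule as a column, refit, and keep it only if a tuning-set score improves. Everything below is a hypothetical stub; fit and score stand in for Bayes-net training and evaluation.

def accept_rule(rule, columns, train, tune, fit, score):
    # Add the rule's truth values as a feature, refit the model, and
    # keep the rule only if the tuning-set score improves.
    extended = columns + [rule]
    baseline = score(fit(train, columns), tune)
    candidate = score(fit(train, extended), tune)
    return extended if candidate > baseline else columns

# Toy stubs: 'fit' just returns the column list, 'score' counts columns.
cols = ["shape", "size"]
keep = accept_rule("sc", cols, train=None, tune=None,
                   fit=lambda d, c: c, score=lambda m, t: len(m))
print(keep)    # ['shape', 'size', 'sc']: the stub score always improves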
Richer View Learning (Cont.)

- Learning new tables
  - Just rules for non-unary predicates
  - Train on pairs of malignancies from the same mammogram or patient
  - Train on pairs (triples, etc.) of fields, where pairs of values that appear in rows for malignant abnormalities are positive examples, while those that appear only in rows for benign abnormalities are negative examples
Conclusions

- Graphical models over databases were originally limited to the schema provided
- Humans find it useful to define new views of a database (new fields or tables intensionally defined from existing data)
- View learning appears to have promise for increasing the capabilities of graphical models over relational databases, and perhaps of other SRL approaches
WILD Group

- Jesse Davis
- Beth Burnside
- Inês Dutra
- Vítor Santos Costa
- Raghu Ramakrishnan
- Jude Shavlik
- David Page

Others:

- Hector Corrada-Bravo
- Irene Ong
- Mark Goadrich
- Louis Oliphant
- Bee-Chung Chen