Conference Presentation

Hierarchical Shape Classification
Using Bayesian Aggregation
Zafer Barutcuoglu
Christopher DeCoro
Princeton University
Shape Matching
• Given two shapes, quantify the difference between them
– Useful for search and retrieval, image processing, etc.
• A common approach is to use shape descriptors
– Map an arbitrary definition of shape into a representative vector
– Define a distance measure (e.g., Euclidean) to quantify similarity
– Examples include GEDT, SHD, REXT, etc.
• A common application is classification
– Given an example and a set of classes, which class is most appropriate for that example?
– Applicable to a wide range of problems
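The descriptor-distance idea above can be sketched in a few lines. This is a minimal illustration, not the actual GEDT/SHD/REXT implementations; the vectors are hypothetical toy descriptors:

```python
import math

def descriptor_distance(d1, d2):
    """Euclidean distance between two shape-descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def retrieve_nearest(query, database):
    """Index of the database descriptor closest to the query
    (the core operation behind search and retrieval)."""
    return min(range(len(database)),
               key=lambda i: descriptor_distance(query, database[i]))
```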
Hierarchical Classification
• Given a hierarchical set of classes,
• And a set of labeled examples for those classes
• Predict the hierarchically consistent classification of a novel example, using the hierarchy to improve performance.
Example courtesy of “The Princeton Shape Benchmark”, P. Shilane et al. (2004)
Motivation
• Given these, how can we predict classes for novel shapes?
• Conventional algorithms don’t apply directly to hierarchies
– Binary classification
– Multi-class (one-of-M) classification
• Using binary classification for each class can produce predictions that contradict the hierarchy
• Using multi-class classification over the leaf nodes loses information by ignoring the hierarchy
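The contradiction problem can be made concrete: given per-class binary predictions and a parent map, a quick check flags any class predicted positive while an ancestor is predicted negative. A minimal sketch; the class names are illustrative:

```python
def find_inconsistencies(predictions, parent):
    """Return classes predicted positive whose ancestor is predicted
    negative, i.e. predictions that contradict the hierarchy."""
    bad = []
    for cls, positive in predictions.items():
        if not positive:
            continue
        p = parent.get(cls)
        while p is not None:
            if not predictions.get(p, False):
                bad.append(cls)   # positive child under a negative ancestor
                break
            p = parent.get(p)
    return bad
```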
Other hierarchical classification methods, other domains
• TO ZAFER: I need something here about background information, other methods, your method, etc.
• Also, Szymon suggested a slide about conditional probabilities and Bayes nets in general. Could you come up with something very simplified and direct that would fit with the rest of the presentation?
Motivation (Example)
• Independent classifiers give an inconsistent prediction
– Classified as bird, but not classified as flying creature
• They also cause incorrect results
– Not classified as flying bird
– Incorrectly classified as dragon
Motivation (Example)
• We can correct this using our Bayesian Aggregation method
– Remove inconsistency at flying creature
• It also improves classification results
– Stronger prediction of flying bird
– No longer classifies as dragon
Naïve Hierarchical Consistency
[Diagram: a three-level chain (animal → biped → human) with predictions animal = YES, biped = NO, human = YES, corrected under three schemes: Independent, Top-down, and Bottom-up]
• Naïve top-down or bottom-up correction gives an unfair distribution of responsibility and correction
Our Method – Bayesian Aggregation
• Evaluate individual classifiers for each class
– Inconsistent predictions allowed
– Any classification algorithm can be used (e.g. kNN)
– Parallel evaluation
• Bayesian aggregation of predictions
– Inconsistencies resolved globally
Our Method - Implementation
• Shape descriptor: Spherical Harmonic Descriptor*
– Converts shape into 512-element vector
– Compared using Euclidean distance
• Binary classifier: k-Nearest Neighbors
– Finds the k nearest labeled training examples
– Novel example assigned to most common class
• Simple to implement, yet flexible
* “Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors”, M. Kazhdan et al. (2003)
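A minimal sketch of the kNN step described above, assuming descriptors are plain numeric vectors compared with Euclidean distance. This is an illustration, not the implementation used in the experiments:

```python
import math
from collections import Counter

def knn_predict(query, examples, labels, k=3):
    """Label a novel descriptor by majority vote among its k nearest
    labeled training examples under Euclidean distance."""
    dist = lambda e: math.sqrt(sum((a - b) ** 2 for a, b in zip(query, e)))
    # Indices of the k training examples closest to the query
    nearest = sorted(range(len(examples)), key=lambda i: dist(examples[i]))[:k]
    # Most common label among those neighbors
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```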
A Bayesian Framework
[Diagram: Bayes network in which each classifier output gi is attached to its true label yi — g1 → y1 (animal), g2 → y2 (biped), g3 → y3 (flying creature), g4 → y4 (superman)]
Given predictions g1...gN from kNN, find most likely true labels y1...yN
Classifier Output Likelihoods
P(y1...yN | g1...gN) = α P(g1...gN | y1...yN) P(y1...yN)
• Conditional independence assumption
– Classifier outputs depend only on their true labels
– Given its true label, an output is conditionally independent of all other labels and outputs
P(g1...gN | y1...yN) = ∏i P(gi | yi)
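Under this factorization, the joint likelihood is just a product of per-output terms. A minimal sketch, assuming for brevity a single lookup table of P(gi | yi) values shared across classes (the method keeps one table per class):

```python
import math

def joint_likelihood(g, y, p_g_given_y):
    """P(g1..gN | y1..yN) = prod_i P(gi | yi) under the conditional
    independence assumption. p_g_given_y maps (gi, yi) pairs to
    classifier output likelihoods."""
    return math.prod(p_g_given_y[(gi, yi)] for gi, yi in zip(g, y))
```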
Estimating P(gi | yi)
The confusion matrix, obtained using cross-validation:

                      Predicted negative   Predicted positive
Negative examples     #(g=0, y=0)          #(g=1, y=0)
Positive examples     #(g=0, y=1)          #(g=1, y=1)

e.g. P(g=0 | y=0) ≈ #(g=0,y=0) / [ #(g=0,y=0) + #(g=1,y=0) ]
Hierarchical Class Priors
P(y1...yN | g1...gN) = α P(g1...gN | y1...yN) P(y1...yN)
• Hierarchical dependency model
– Class prior depends only on children
P(y1...yN) = ∏i P(yi | ychildren(i))
• Enforces hierarchical consistency
– The probability of an inconsistent assignment is 0
– Bayesian inference will not allow inconsistency
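The zero-probability rule can be sketched directly: any assignment in which a parent is negative while some child is positive gets prior 0. A simplified illustration; the class names and probability tables below are hypothetical:

```python
def hierarchical_prior(y, children, p_cond):
    """P(y1..yN) = prod_i P(yi | y_children(i)). Assignments where a
    negative class has a positive child get probability 0."""
    prob = 1.0
    for cls, label in y.items():
        ch = tuple(y[c] for c in children.get(cls, ()))
        if label == 0 and any(ch):   # parent off, some child on
            return 0.0               # inconsistent assignment
        prob *= p_cond[cls][(label, ch)]
    return prob
```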
Conditional Probabilities
[Diagram: Bayes network with classifier outputs g1...g4 attached to true labels y1...y4]
• P(yi | ychildren(i))
– Inferred from known labeled examples
• P(gi | yi)
– Inferred by validation on held-out data
• We can now apply Bayesian inference algorithms
– The particular algorithm is independent of our method
– Results in globally consistent predictions
– Uses information present in the hierarchy to improve predictions
Applying Bayesian Aggregation
• Training phase produces a Bayes network
– From the hierarchy and training set, train classifiers
– Use cross-validation to generate conditional probabilities
– Use the probabilities to create the Bayes net
[Diagram: Training Set + Hierarchy → Classifiers → Cross-validation → Bayes Net]
• Test phase gives probabilities for novel examples
– For a novel example, apply the classifiers
– Use the classifier outputs and the existing Bayes net to infer the probability of membership in each class
[Diagram: Test Example → Classifiers → Bayes Net → Class Probabilities]
Experimental Results
• 2-fold cross-validation on each class using kNN
• Area Under the ROC Curve (AUC) for evaluation
– Real-valued predictor can be thresholded arbitrarily
– Probability that pos. example is predicted over a neg. example
• 169 of 170 classes were improved by our method
– Average AUC = +0.137 (+19% of old AUC)
– Old AUC = .7004 (27 had AUC of 0.5, random guessing)
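The AUC interpretation above (the probability that a positive example is scored above a negative one, ties counting half) can be computed directly from two score lists:

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example outscores the negative one; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```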
AUC Scatter Plot
[Scatter plot of AUC scores after vs. before Bayesian correction; x-axis: AUC for kNN, y-axis: AUC for kNN+Bayes, both from 0.5 to 1.0]
AUC Changes
• 169 of 170 classes were improved by our method
– Average AUC = +0.137 (+19% of old AUC)
– Old AUC = .7004 (27 had AUC of 0.5, random guessing)
Questions
Download