Slide 1: Hierarchical Shape Classification Using Bayesian Aggregation
Zafer Barutcuoglu, Christopher DeCoro
Princeton University

Slide 2: Shape Matching
• Given two shapes, quantify the difference between them
– Useful for search and retrieval, image processing, etc.
• A common approach is that of shape descriptors
– Map an arbitrary definition of shape into a representative vector
– Define a distance measure (e.g., Euclidean) to quantify similarity
– Examples include GEDT, SHD, REXT, etc.
• A common application is classification
– Given an example and a set of classes, which class is most appropriate for that example?
– Applicable to a large range of applications

Slide 3: Hierarchical Classification
• Given a hierarchical set of classes,
• and a set of labeled examples for those classes,
• predict the hierarchically consistent classification of a novel example, using the hierarchy to improve performance.
Example courtesy of "The Princeton Shape Benchmark", P. Shilane et al. (2004)

Slide 4: Motivation
• Given these, how can we predict classes for novel shapes?
• Conventional algorithms don't apply directly to hierarchies
– Binary classification
– Multi-class (one-of-M) classification
• Using binary classification for each class can produce predictions that contradict the hierarchy
• Using multi-class classification over the leaf nodes loses information by ignoring the hierarchy

Slide 5: Other hierarchical classification methods, other domains
• TO ZAFER: I need something here about background information, other methods, your method, etc.
• Also, Szymon suggested a slide about conditional probabilities and Bayes nets in general. Could you come up with something very simplified and direct that would fit with the rest of the presentation?
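The descriptor-and-distance idea above can be sketched as follows. Everything here is an illustrative stand-in: the shape names, the tiny 4-element vectors (a real SHD is a 512-element vector), and the `euclidean`/`retrieve` helpers are not from the talk.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two descriptor vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database):
    # Rank database shapes by ascending distance to the query descriptor,
    # so the most similar shapes come first.
    return sorted(database, key=lambda item: euclidean(query, item[1]))

# Toy database of (name, descriptor) pairs with made-up descriptors.
db = [
    ("biplane", [0.9, 0.1, 0.3, 0.0]),
    ("human",   [0.1, 0.8, 0.2, 0.7]),
    ("jet",     [0.7, 0.3, 0.5, 0.2]),
]
query = [0.85, 0.15, 0.35, 0.05]
ranked = retrieve(query, db)
print([name for name, _ in ranked])  # nearest shapes first
```

Classification reduces to the same machinery: a kNN classifier takes the top k of this ranking and votes on the class labels.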
Slide 6: Motivation (Example)
• Independent classifiers give an inconsistent prediction
– Classified as bird, but not as flying creature
• They also cause incorrect results
– Not classified as flying bird
– Incorrectly classified as dragon

Slide 7: Motivation (Example)
• We can correct this using our Bayesian Aggregation method
– Removes the inconsistency at flying creature
• It also improves the classification results
– Stronger prediction of flying bird
– No longer classified as dragon

Slide 8: Naïve Hierarchical Consistency
• Independent predictions can be inconsistent: animal YES, biped NO, human YES
• Naïve fixes force consistency top-down (the parent's NO overrides the child) or bottom-up (the child's YES overrides the parent)
• Unfair distribution of responsibility and correction

Slide 9: Our Method – Bayesian Aggregation
• Evaluate individual classifiers for each class
– Inconsistent predictions allowed
– Any classification algorithm can be used (e.g., kNN)
– Parallel evaluation
• Bayesian aggregation of predictions
– Inconsistencies resolved globally

Slide 10: Our Method – Implementation
• Shape descriptor: Spherical Harmonic Descriptor*
– Converts a shape into a 512-element vector
– Descriptors compared using Euclidean distance
• Binary classifier: k-Nearest Neighbors (kNN)
– Finds the k nearest labeled training examples
– Assigns the novel example to the most common class among them
• Simple to implement, yet flexible
* "Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors", M. Kazhdan et
al. (2003)

Slide 11: A Bayesian Framework
[Figure: Bayes network pairing each classifier output gi with its true label yi, for the classes animal (g1, y1), biped (g2, y2), flying creature (g3, y3), and superman (g4, y4)]
• Given predictions g1...gN from kNN, find the most likely true labels y1...yN

Slide 12: Classifier Output Likelihoods

P(y1...yN | g1...gN) = α P(g1...gN | y1...yN) P(y1...yN)

• Conditional independence assumption
– Classifier outputs depend only on their true labels
– Given its true label, an output is conditionally independent of all other labels and outputs

P(g1...gN | y1...yN) = ∏i P(gi | yi)

Slide 13: Estimating P(gi | yi)
• The confusion matrix, obtained using cross-validation:

                      Predicted negative   Predicted positive
  Negative examples   #(g=0, y=0)          #(g=1, y=0)
  Positive examples   #(g=0, y=1)          #(g=1, y=1)

• e.g., P(g=0 | y=0) ≈ #(g=0,y=0) / [ #(g=0,y=0) + #(g=1,y=0) ]

Slide 14: Hierarchical Class Priors

P(y1...yN | g1...gN) = α P(g1...gN | y1...yN) P(y1...yN)

• Hierarchical dependency model
– The prior for a class depends only on its children

P(y1...yN) = ∏i P(yi | ychildren(i))

• Enforces hierarchical consistency
– The probability of an inconsistent assignment is 0
– Bayesian inference will not allow inconsistency

Slide 15: Conditional Probabilities
• P(yi | ychildren(i)) – inferred from known labeled examples
• P(gi | yi) – inferred by validation on held-out data
• We can now apply Bayesian inference algorithms
– The particular algorithm is independent of our method
– Results in globally consistent predictions
– Uses the information present in the hierarchy to improve predictions

Slide 16: Applying Bayesian Aggregation
• The training phase produces a Bayes network
– From the hierarchy and the training set, train the classifiers
– Use cross-validation to generate conditional probabilities
– Use the probabilities to create the Bayes net
– Hierarchy + Training Set → Classifiers → Cross-validation → Bayes Net
• The test phase gives class probabilities for novel examples
– For a novel example, apply the classifiers
– Use the classifier outputs and the existing Bayes net to infer the probability of membership in each class
– Test Example → Classifiers → Bayes Net → Class Probabilities
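A minimal sketch of the aggregation step described above, under stated assumptions: a three-class chain hierarchy (animal → biped → human), made-up P(g | y) tables, and a uniform prior over consistent assignments in place of the children-conditional prior. Real use would estimate P(g | y) from cross-validation confusion matrices and run a proper Bayes-net inference algorithm rather than brute-force enumeration.

```python
from itertools import product as cartesian

# Chain hierarchy animal(0) -> biped(1) -> human(2); parent[c] is c's parent.
parent = {0: None, 1: 0, 2: 1}

# Illustrative P(g | y) per class: p_g_given_y[c][y][g].
# In the actual method these come from cross-validation confusion matrices.
p_g_given_y = {
    0: {0: {0: 0.9,  1: 0.1},  1: {0: 0.2,  1: 0.8}},
    1: {0: {0: 0.8,  1: 0.2},  1: {0: 0.3,  1: 0.7}},
    2: {0: {0: 0.85, 1: 0.15}, 1: {0: 0.25, 1: 0.75}},
}

def consistent(y):
    # Hierarchical consistency: a positive label requires a positive parent.
    return all(y[c] == 0 or p is None or y[p] == 1 for c, p in parent.items())

def posterior(g):
    # Enumerate all consistent label assignments; inconsistent ones get
    # probability 0. Score each by the factored likelihood prod_i P(gi | yi),
    # then normalize (the alpha term on the likelihood slide).
    scores = {}
    for y in cartesian([0, 1], repeat=len(parent)):
        if not consistent(y):
            continue
        like = 1.0
        for c in parent:
            like *= p_g_given_y[c][y[c]][g[c]]
        scores[y] = like
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# Inconsistent raw kNN outputs: animal YES, biped NO, human YES.
post = posterior((1, 0, 1))
best = max(post, key=post.get)
print(best)  # -> (1, 1, 1): the biped prediction is corrected upward
```

Note how the correction is global: the strong human evidence pulls the biped label up, rather than one level simply overriding another as in the naïve top-down or bottom-up schemes.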
Slide 17: Experimental Results
• 2-fold cross-validation on each class using kNN
• Area Under the ROC Curve (AUC) used for evaluation
– A real-valued predictor can be thresholded arbitrarily
– AUC is the probability that a positive example is ranked above a negative example
• 169 of 170 classes were improved by our method
– Average AUC gain = +0.137 (+19% of the old AUC)
– Old average AUC = 0.7004 (27 classes had an AUC of 0.5, i.e., random guessing)

Slide 18: AUC Scatter Plot
[Figure: scatterplot of AUC scores after vs. before Bayesian correction; x-axis: AUC for kNN, y-axis: AUC for kNN+Bayes, both ranging from 0.5 to 1.0]

Slide 19: AUC Changes
• 169 of 170 classes were improved by our method
– Average AUC gain = +0.137 (+19% of the old AUC)
– Old average AUC = 0.7004 (27 classes had an AUC of 0.5, i.e., random guessing)

Slide 20: Questions
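The AUC measure used in the evaluation equals the probability that a randomly chosen positive example outscores a randomly chosen negative one, which a pair-counting sketch makes concrete (the scores below are made up, not from the experiments):

```python
def auc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs where the positive example
    # outscores the negative one; ties count as half a win.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative classifier scores for one class.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
print(auc(pos, neg))  # 11/12: 1.0 = perfect ranking, 0.5 = random guessing
```

This rank-based view is why AUC suits a real-valued predictor: it summarizes performance over every possible threshold at once.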