Does one size really fit all? Evaluating classifiers in Bag-of-Visual-Words classification
Christian Hentschel, Harald Sack
Hasso Plattner Institute

Agenda
1. Content-based Image Classification – Motivation
2. Bag-of-Visual-Words
3. Bag-of-Visual-Words Classification
  ■ Classifier Evaluation
  ■ Model Visualization
4. Conclusion

Content-based Image Classification
■ Find all photos that show ... [enter your favorite concept here]

Content-based Image Classification (2)
■ Training:
  □ Positive images (that depict a concept)
  □ Negative images (that don't)
■ Classification:
  □ Test image: does it depict the concept (or not)?

Bag-of-Visual-Words
■ Origin: text classification
  □ e.g. task: classify forum posts into "insult" (positive) and "not insult" (negative)
  □ D1: "haha... at least get your insults straight you idiot!!. ..."
  □ D2: "You're one of my favorite commenters."
  □ Vocabulary: "idiot": 1, "favorite": 2, "to": 3, "you": 4, "at": 5, "least": 6, "commenter": 7, …
  □ D1 → [1, 2, 1, 1, 2, 0, 0, …]
  □ D2 → [1, 1, 1, 1, 0, 1, 1, …]

Bag-of-Visual-Words (2)
■ Learn a decision rule (e.g. linear SVM)
  □ i.e. learn feature weights
[Adapted from A. Mueller, https://github.com/amueller/ml-berlin-tutorial]

Bag-of-Visual-Words (3)
■ Examples of visual words: Airplanes, Motorbikes, Faces, Wild Cats, Leaves, People, Bikes [Schmid, 2013]

Bag-of-Visual-Words (4)
■ (BoVW pipeline illustration)

Bag-of-Visual-Words Classification
■ De-facto standard: kernel-based Support Vector Machines
  □ Decision rule: f(x) = sign(Σ_i α_i y_i K(x_i, x) + b)
  □ Kernel function: K(x, x') = exp(−γ · d(x, x'))
  □ Distance metric: chi-square distance, d(x, x') = Σ_k (x_k − x'_k)² / (x_k + x'_k)

Bag-of-Visual-Words Classification (2)
■ Testing different classification models
  □ Metric: Average Precision (AP, area under the precision–recall curve)
■ Test dataset: Caltech-101
  □ 100 + 1 object classes
  □ 31–800 images per class
■ Tested classifiers:
  □ Naïve Bayes, k-NN, Logistic Regression
  □ SVM: linear SVM, RBF-kernel SVM, Chi2-kernel SVM
  □ Ensemble methods: Random Forest, AdaBoost
  □ Hyperparameters optimized by grid search using cross-validation

Bag-of-Visual-Words Classification – Results
■ Mean AP scores over all classes:
  □ Chi2-kernel SVM: 0.67
  □ AdaBoost: 0.63
  □ Random Forest: 0.61
  □ RBF-kernel SVM: 0.59
  □ Linear SVM: 0.55
  □ Logistic Regression: 0.55
  □ k-NN: 0.52
  □ Naïve Bayes: 0.48

Bag-of-Visual-Words Classification – Results (2)
■ Gap in mAP between best (Chi2-SVM) and worst (Naïve Bayes): 0.19
  □ Poor performance of Naïve Bayes and k-NN, but fast training
■ Superior performance of kernel-based SVMs, but:
  □ The kernel function (Chi2 vs. Gaussian RBF) is crucial:
    – ensemble methods outperform the Gaussian RBF
    – the Gaussian RBF is only slightly better than the linear SVM
  □ Increased evaluation time:
    – the complex kernel function is evaluated between every support vector and each test example
    – ensemble methods reduce classification time
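
Sketch – BoVW Feature Extraction (illustrative)
■ To make the pipeline above concrete: local descriptors are clustered into a visual vocabulary and every image is encoded as a histogram of visual-word counts. The descriptor extractor (e.g. dense SIFT), vocabulary size, and clustering method below are assumptions for illustration, not the exact setup behind the reported results.

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def build_vocabulary(descriptors_per_image, n_words=1000, seed=0):
        """Cluster all local descriptors (one array per image) into n_words visual words."""
        stacked = np.vstack(descriptors_per_image)
        return MiniBatchKMeans(n_clusters=n_words, random_state=seed).fit(stacked)

    def bovw_histogram(descriptors, vocabulary):
        """Assign each local region to its nearest visual word and count occurrences."""
        words = vocabulary.predict(descriptors)
        hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)  # L1-normalised visual-word histogram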
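
Sketch – Classifier Comparison for One Concept (illustrative)
■ A minimal one-vs-rest comparison of the Chi2-kernel SVM (via a precomputed kernel) and AdaBoost, each tuned by grid search with cross-validation and scored by average precision. X and y (BoVW histograms and binary labels), the parameter grids, and the gamma heuristic are assumptions for illustration, not the configuration behind the reported AP scores.

    from sklearn.svm import SVC
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.metrics import average_precision_score
    from sklearn.metrics.pairwise import chi2_kernel

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Chi2-kernel SVM: K(x, x') = exp(-gamma * sum((x_k - x'_k)^2 / (x_k + x'_k)))
    gamma = 1.0 / X_train.shape[1]                      # illustrative heuristic
    K_train = chi2_kernel(X_train, gamma=gamma)         # (n_train, n_train)
    K_test = chi2_kernel(X_test, X_train, gamma=gamma)  # (n_test, n_train)
    svm = GridSearchCV(SVC(kernel="precomputed"), {"C": [0.1, 1, 10, 100]},
                       scoring="average_precision", cv=5)
    svm.fit(K_train, y_train)
    ap_svm = average_precision_score(y_test, svm.decision_function(K_test))

    # AdaBoost: no kernel matrix needed, much cheaper at test time
    ada = GridSearchCV(AdaBoostClassifier(random_state=0),
                       {"n_estimators": [100, 200, 400]},
                       scoring="average_precision", cv=5)
    ada.fit(X_train, y_train)
    ap_ada = average_precision_score(y_test, ada.decision_function(X_test))

    print(f"AP Chi2-SVM: {ap_svm:.2f}   AP AdaBoost: {ap_ada:.2f}")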

Bag-of-Visual-Words Classification – Results (3)
■ Correlation between training set size and average precision

Bag-of-Visual-Words Classification – Results (4)
■ Outliers:
  □ "minaret"
  □ "leopards"

Bag-of-Visual-Words Classification – Model Visualization
■ Visualize the impact of individual image regions on the classification result
  □ Pipeline: local region descriptor → BoVW vector → feature weights
  □ Use ensemble methods
    – no kernel function
    – AdaBoost: direct indicator of feature importance (mean decrease in impurity)

Example visualizations
■ "leopards"
■ "minaret"
■ "car_side"
■ "watch"

Conclusion
■ Kernel-based SVMs are the best choice when aiming for accuracy
  □ The kernel function is crucial
  □ Evaluation time cost is high
■ Ensemble methods are the runner-up
  □ Fast evaluation
  □ Offer an intuitive visualization of model parameters
■ Visual analytics reveal deficiencies in datasets
  □ Improperly chosen training data affects classification results

Thank you for your attention!
Christian Hentschel, Harald Sack
Hasso Plattner Institute
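
Backup – Model Visualization Sketch (illustrative)
■ A minimal sketch of the visualization idea above: each local image region is weighted by the feature importance (mean decrease in impurity) of the visual word it was assigned to. The fitted AdaBoost model, the k-means vocabulary, and the per-region descriptors with their (x, y) positions for one test image are assumed to exist; the helper name and the center-point accumulation are illustrative choices, not the authors' implementation.

    import numpy as np

    def region_importance_map(descriptors, positions, vocabulary, model, image_shape):
        """Heat map scoring how strongly each local region supports the concept."""
        words = vocabulary.predict(descriptors)         # visual word per region
        weights = model.feature_importances_[words]     # mean decrease in impurity
        heatmap = np.zeros(image_shape[:2], dtype=float)
        for (x, y), w in zip(positions, weights):
            heatmap[int(y), int(x)] += w                # accumulate at region centers
        return heatmap

■ Overlaying such a map on a test image is what makes dataset deficiencies visible, as in the "minaret" and "leopards" outlier classes discussed above.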