Does one size really fit all?

Evaluating classifiers in Bag-of-Visual-Words classification
Christian Hentschel, Harald Sack
Hasso Plattner Institute
Agenda
1. Content-based Image Classification – Motivation
2. Bag-of-Visual-Words
3. Bag-of-Visual-Words Classification
■ Classifier Evaluation
■ Model Visualization
4. Conclusion
Content-based Image Classification
Find all photos that show [enter your favorite concept here...]!
Does one size
really fit all?
Christian Hentschel,
09-18-2014
Chart 3
Content-based Image Classification (2)
Training:
■ Positive images (that depict a concept)
■ Negative images (that don’t)
Classification:
■ Test whether an image depicts the concept (or not)
Bag-of-Visual-Words
■ Origin - text classification
□ e.g. task: classify forum posts into “insult” (positive) and “not insult” (negative)
D1: "haha... at least get your insults straight you idiot!!..."
D2: "You're one of my favorite commenters."
Vocabulary: { “idiot”: 1, “favorite”: 2, “to”: 3, “you”: 4, “at”: 5, “least”: 6, “commenter”: 7, … }
D1 [1, 2, 1, 1, 2, 0, 0,…]
D2 [1, 1, 1, 1, 0, 1, 1,…]
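The mapping from documents to count vectors can be sketched as follows. This is a minimal illustration, not the tooling used for the slides: the tokenizer, the alphabetical vocabulary order, and the function names are assumptions, so the resulting vectors will not match the illustrative numbers above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct token a column index (sorted for reproducibility)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return the term-count vector of `text` over `vocabulary`."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector


d1 = "haha... at least get your insults straight you idiot!!..."
d2 = "You're one of my favorite commenters."
vocabulary = build_vocabulary([d1, d2])
v1 = bag_of_words(d1, vocabulary)
v2 = bag_of_words(d2, vocabulary)
```

Word order is discarded: each document is represented only by how often each vocabulary entry occurs in it.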
Bag-of-Visual-Words (2)
■ Learn a decision rule (e.g. linear SVM)
□ i.e. learn feature weights
[Adapted from A. Mueller, https://github.com/amueller/ml-berlin-tutorial]
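A linear decision rule scores a count vector by a weighted sum of its entries and classifies by the sign of that score. The sketch below only applies such a rule. The three-word vocabulary, the weights, and the bias are made-up values for illustration, not weights learned by an SVM.

```python
def linear_decision(x, weights, bias=0.0):
    """Return (score, label) with score = w . x + b and label = +1 or -1."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return score, (1 if score > 0 else -1)


# Hypothetical weights for the vocabulary ["idiot", "favorite", "you"]:
# a positive weight pushes a post towards "insult", a negative one away from it.
weights = [2.0, -1.5, 0.1]
bias = -0.5

score_1, label_1 = linear_decision([1, 0, 1], weights, bias)  # contains "idiot", "you"
score_2, label_2 = linear_decision([0, 1, 1], weights, bias)  # contains "favorite", "you"
```

Because each weight belongs to exactly one vocabulary entry, the learned weights can be read directly as a measure of how strongly a word argues for or against the concept.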
Bag-of-Visual-Words (3)
■ Examples for visual words: Airplanes, Motorbikes, Faces, Wild Cats, Leaves, People, Bikes [Schmid, 2013]
Bag-of-Visual-Words (4)
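The visual counterpart of the term-count vector can be sketched as follows, assuming the common setup in which each local descriptor is assigned to its nearest visual word (codebook entry) by Euclidean distance. The two-dimensional descriptors and the two-entry codebook are toy values. Real descriptors have many more dimensions, and the codebook is typically learned by clustering.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """Count how many local descriptors fall on each visual word."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, codebook)] += 1
    return histogram


codebook = [(0.0, 0.0), (10.0, 10.0)]                # toy visual words
descriptors = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]   # toy local descriptors
histogram = bovw_histogram(descriptors, codebook)
```

As with text, the spatial arrangement of the local regions is discarded and only the word frequencies remain.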
Bag-of-Visual Words Classification
■ De-facto standard: kernel-based Support Vector Machines
□ Decision rule:
□ Kernel-Function:
□ Distance metric:
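The three ingredients named above can be sketched in a few lines. This assumes one widely used formulation: a decision value that sums kernel responses over the support vectors, an exponential kernel, and either the chi-square or the squared Euclidean distance. The exact formulas shown on the slide may differ (for instance by a factor of 1/2 in the chi-square distance), and all names and values below are illustrative.

```python
import math


def chi2_distance(x, y):
    """Chi-square distance between two non-negative histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def squared_euclidean(x, y):
    """Squared Euclidean distance, as used by the Gaussian RBF kernel."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def exponential_kernel(x, y, distance, gamma=1.0):
    """K(x, y) = exp(-gamma * d(x, y)) for the chosen distance d."""
    return math.exp(-gamma * distance(x, y))


def svm_decision(x, support_vectors, dual_coefs, bias, distance, gamma=1.0):
    """f(x) = sum_i c_i * K(sv_i, x) + b, where c_i = alpha_i * y_i."""
    return sum(c * exponential_kernel(sv, x, distance, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


support_vectors = [[1.0, 0.0], [0.0, 1.0]]  # toy support vectors
dual_coefs = [1.0, -1.0]                    # toy alpha_i * y_i values
value = svm_decision([1.0, 0.0], support_vectors, dual_coefs, 0.0, chi2_distance)
```

The decision function needs one kernel evaluation per support vector for every test example, so evaluation cost grows with the number of support vectors.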
Bag-of-Visual Words Classification (2)
■ Testing different classification models
□ Average Precision (AP, area under the precision-recall curve)
■ Test Dataset
□ Caltech-101
– 100 + 1 object classes
– 31 – 800 images per class
■ Tested Classifiers:
□ Naïve Bayes, k-NN, Logistic Regression
□ SVM: linear SVM, RBF kernel SVM, Chi2-kernel SVM
□ Ensemble Methods: Random Forest, AdaBoost
□ Hyperparameters optimized by grid search using cross-validation (CV)
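Average precision can be computed from a ranked list as sketched below. This uses one common non-interpolated estimate of the area under the precision-recall curve (the mean of the precision values at the rank of each positive example). Whether the slides use exactly this variant is an assumption, and the labels and scores are toy values.

```python
def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of every positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_class_results):
    """Average the AP over classes; each entry is a (labels, scores) pair."""
    aps = [average_precision(labels, scores) for labels, scores in per_class_results]
    return sum(aps) / len(aps)


ap_mixed = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
ap_perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.1])
m_ap = mean_average_precision([([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
                               ([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.1])])
```

Averaging the per-class AP values gives the mean AP (mAP) used to compare classifiers across all classes.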
Bag-of-Visual Words Classification – Results
■ Mean AP scores over all classes:
□ Chi2-kernel SVM: 0.67
□ AdaBoost: 0.63
□ Random Forest: 0.61
□ RBF kernel SVM: 0.59
□ Linear SVM: 0.55
□ Logistic Regression: 0.55
□ k-NN: 0.52
□ Naïve Bayes: 0.48
Bag-of-Visual Words Classification – Results (2)
■ Difference in mAP between best (Chi2-kernel SVM) and worst (Naïve Bayes): 0.19
□ Poor performance of Naïve Bayes and k-NN, but fast training
■ Superior performance of kernel-based SVMs, but:
□ Kernel function (Chi2 vs. Gaussian RBF) is crucial:
– Ensemble methods outperform Gaussian RBF
– Gaussian RBF only slightly better than linear SVM
□ Increased evaluation time:
– Complex kernel function between each support vector (SV) and a test example
– Ensemble methods reduce classification time
Bag-of-Visual Words Classification – Results (3)
■ Correlation between training set size and average precision:
Bag-of-Visual Words Classification – Results (4)
■ Outliers:
□ “minaret”
□ “leopards”
Bag-of-Visual Words Classification – Model Visualization
■ Visualize impact of individual image regions on classification result
Local Region → Descriptor → BoVW Vector → Feature Weights
□ Use ensemble methods
– No kernel function
– AdaBoost: direct indicator for feature importance: mean decrease in impurity
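The mapping from feature importances back to image regions can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes each local region is reduced to a pixel position plus the index of its visual word, and that one importance score per visual word is already available (for example, the mean decrease in impurity reported by a trained ensemble). The function name and all values are hypothetical.

```python
def importance_heatmap(keypoints, importances, width, height):
    """Accumulate visual-word importance at each keypoint position.

    keypoints:   iterable of (x, y, word_index) tuples, one per local region
    importances: one importance score per visual word
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for x, y, word in keypoints:
        if 0 <= x < width and 0 <= y < height:  # skip keypoints outside the image
            heatmap[y][x] += importances[word]
    return heatmap


importances = [0.7, 0.0, 0.3]  # hypothetical scores for three visual words
keypoints = [(0, 0, 0), (1, 0, 2), (1, 0, 0), (2, 1, 1)]
heatmap = importance_heatmap(keypoints, importances, width=3, height=2)
```

Regions whose visual words carry high importance light up in the heatmap, which makes it possible to see what the classifier actually relies on for a given class.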
“minaret”
■ “leopards”
■ “minaret”
■ “car_side”
■ “watch”
Conclusion
■ Kernel-based SVMs are the best choice when aiming for accuracy
□ Kernel function is crucial
□ Evaluation time cost is high
■ Ensemble methods are the runner-up
□ Fast evaluation
□ Offer intuitive visualization of model parameters
■ Visual analytics reveal deficiencies in datasets
□ Improperly chosen training data affects classification results
Thank you
for your attention!
Christian Hentschel, Harald Sack
Hasso Plattner Institute